Matrix import wizard for description data

After selecting Target: Description a window as shown below will be displayed. The window is separated in three areas. On the left side you see the descriptor settings for the actual file column (presently not visible). On the right side you see the list of available import steps. In the middle part the details of the selected import steps are shown.

 

Choosing the File

As a first step, choose the File from where the data should be imported. The currently supported format is tab-separated text. Then choose the Encoding of the file, e.g. Unicode.  The Header lines specifies the number of header lines, usually at least the first line which typically contains the descriptor names (see image below). The Start line and End line will automatically be set according to your data. You may change these to restrict the data lines, i.e. the descriptions/items that shall be imported. The not imported parts in the file are indicated with a gray background. If your data contains e.g. date information or floating point values where notations differ between countries (e.g. 3.14 - 3,13), choose the Language / Country to ensure a correct interpretation of your data. Finally you can select a prepared Schema (see sections Testing and Import below) for the import.

 

Choosing the descriptor types

In the first line of the file section the descriptor type of the file columns are displayed. The value Ignore indicates that the column shall not be imported. To adjust the descriptor type, select a cell in the file column and chose a Descriptor type in the left section of the panel (see image below). Available descriptor types are Categorical, Quantitative, Text and Sequence.

For categorical descriptors you may specify State splitters, if one item has more than one categorical state. To add a splitter click on the button, to remove it use the button. By clicking the Test states button you get a list of categorical states found in the file between start line and end line and the number of occurrences (field Count, see image above). With this function you may test the effect of the state splitters.

For quantitative descriptors you have to specify the Statistical measure that shall be used for the imported values. By clicking the Test values button you get a list of values in the file between start line and the line number in the file (see image above).

If a value cannot be interpreted as a number, it is marked with a light red background colour. You may edit the erroneous values in the test result panel (see below).

For text and sequence descriptors no additional adjustments are possible. As long as no descriptor columns have been defined, a reminder is visible on top of the left section:

In our example column 1 (date) will be adjusted as text descriptor and columns 2 (functional group) and 3 (sublocation) as categorical descriptors.

 

Table data

To set the source for the data columns, select the step of a table listed underneath the File step. Be aware that for the Descriptor table not the file columns will be selected but the file rows in the header lines. In the image below you see an example of the descriptor data columns, where the values for id and label are taken from the first line in the file. Additionally the descriptors will be assigned to the descriptor tree "Descriptor tree for matrix".  

Remark: Although descriptortree_id is a numeric value in the database, the externally visible descriptor tree name is expected here. Since this name is unambigious, the corresponding id will be determined during import.

The option Address index allows the selection of the column number (for descriptor) rsp. the line number (for description) as value. See below an example for the Description table data columns. For id the line number has been selected to get an unambigious address value. The label has been composed of the line number, three data columns and some fixed text parts. Finally all description data are assigned to the project "Matrix text".

Remark: Although project_id is a numeric value in the database, the externally visible project name is expected here. Since this name is unambigious, the corresponding id will be determined during import.

A reminder in the header line will show you what actions are still needed to import the data into the table:

The handling of the columns is almost the same as described in the chapter columns of the table oriented import wizard.

 

Testing

To test if all requirements for the import are met use the Testing step. You can navigate to a certain data cell by using the Column: and Line: controls. As an alternative select the data cell ent click on button . Finally click on the Test data button. If there are still unmet requirements, these will be listed in a window as shown below. In the window below you can see the following test actions:

  1. Insert of a quantitative descriptor "biomass"
  2. Insert of a descriptor tree node (assignemnt to tree "Matrix test tree")
  3. Insert of a recommended statistical measure (measure "Mean") for the descriptor tree node
  4. Insert of description "Biomass 3 [CSP01 woody NW_c]" for project "Matrix test"
  5. Insert of a statistical measure for "Mean" with value 28.2020000...

You may save the schema file by a click on button Save schema. If you click on button Generate flat files according to your adjustments data and matching schema files for the table oriented import wizards will be generated. This option may be used, if you want to import additional data, e.g. a descriptor's measurement unit, that are not supported by the matrix wizard. Since all mapping relevant data are stored in the selected import session, you may import the basic descriptor and description data using the matrix wizard and append additional data with the table oriented import wizards. In this case the generated flat data and schema files might be useful to have a defined starting point.    

 

Import

With the last step you can finally start to import the data into the database. If you want to repeat the import with the same settings and data of the same structure, you can save a schema of the current settings (see below).

Schedule for matrix import of tab-separated text files into DiversityDescriptions
Target within DiversityDescriptions: Description
Schedule version: 1 Database version: 03.00.17
Lines: 2 - 757 Header lines: 1
Encoding: ANSI Language: US

Tables

Descriptor (Descriptor)
Merge handling: Merge
Column in table ? Copy Pre Post File pos. Value Source
id ? 1 File
label 1 File
descriptortree_id Descriptor tree for matrix Interface

Description (Description)
Merge handling: Merge
Column in table ? Copy Pre Post File pos. Value Source
id ? Index
label Biomass Index
+ [ 0 File
+ 2 File
+ ] 3 File
project_id Matrix test Interface

Descriptor columns

File pos. Descriptor type Additional data
0 Categorical
Splitters
1 Text
2 Categorical
Splitters
3 Categorical
Splitters
4 Quantitative
Statistical measure
Mean (= average)

File cells that could not be imported will be marked with a red background while imported lines are marked green. If you want to save lines that produce errors during the import in a separate file, use the Save failed lines option. The protocol of the import will contain all settings acording to the used schema and an overview containing the number of inserted, updated, unchanged and failed lines (see below).

Protocol
Responsible: Link (DB-User: Workshop)
Date: Donnerstag, 20. April 2017, 15:53:24
Server: training.diversityworkbench.de
Database: DiversityDescriptions_Workshop
Descriptor columns total: 5
Descriptors imported: 5
Import lines total: 756
Descriptions imported: 756
Cells imported: 3780