Matrix Wizard

Matrix import wizard for tab separated lists

The table oriented import wizard works fine if you have separate lists for descriptor and description data. Usually this type of tables is generated by an export of data from a database. A typical example for that cases is described in the import wizardtutorial. If no dedicated application for collecting description data is available, most commonly a spreadsheet program like MS Excel or Open Office Calc is used to collect the description data. Typically the table columns represent a single character (=descriptor) and the table rows represent the items (=description or sample data). Importing data from such a “matrix” into Diversity Descriptions with the table oriented import wizard usually requires a lot of manual adaptions. Therefore the specialized “Matrix Wizard” was designed to import the most important descriptor and description data in a single import step.

As usual you should create a new project and install a descriptor tree to collect the dedicated descriptors. Then choose Data → Import → Import wizard → Matrix wizard … from the menu. As know from the import wizard, a window to create or select a import session will be shown.

After selecting or creating an import session a window as shown below will open that will lead you through the import of the data.

With the selection box Target: you may select which data shall be imported:

Select Description to import summary data
Select Sampling event to import sample data

Remark: Example files and XML schemas to import summary or sample data using the matrix wizard are provided in the tutorialfiles or may be downloaded from the Diversity Descriptions example file repository. Find the example data in folders “Biomass as description” and “Biomass as sample”

Matrix Wizard Description

Matrix import wizard for description data

After selecting Target: Description a window as shown below will be displayed. The window is separated in three areas. On the left side you see the descriptor settings for the actual file column (presently not visible). On the right side you see the list of available import steps. In the middle part the details of the selected import steps are shown.

Choosing the File

As a first step, choose the File from where the data should be imported. The currently supported format is tab-separated text. Then choose the Encoding of the file, e.g. Unicode. The Header lines specifies the number of header lines, usually at least the first line which typically contains the descriptor names (see image below). The Start line and End line will automatically be set according to your data. You may change these to restrict the data lines, i.e. the descriptions/items that shall be imported. The not imported parts in the file are indicated with a gray background. If your data contains e.g. date information or floating point values where notations differ between countries (e.g. 3.14 - 3,13), choose the Language / Country to ensure a correct interpretation of your data. Finally you can select a prepared Schema (see sections Testing and Import below) for the import.

Choosing the descriptor types

In the first line of the file section the descriptor type of the file columns are displayed. The value Ignore indicates that the column shall not be imported. To adjust the descriptor type, select a cell in the file column and chose a Descriptor type in the left section of the panel (see image below). Available descriptor types are Categorical, Quantitative, Text and Sequence.

For categorical descriptors you may specify State splitters, if one item has more than one categorical state. To add a splitter click on the add button, to remove it use the button. By clicking the Test states button you get a list of categorical states found in the file between start line and end line and the number of occurrences (field Count, see image above). With this function you may test the effect of the state splitters.

For quantitative descriptors you have to specify the Statistical measure that shall be used for the imported values. By clicking the Test values button you get a list of values in the file between start line and the line number in the file (see image above).

If a value cannot be interpreted as a number, it is marked with a light red background colour. You may edit the erroneous values in the test result panel (see below).

For text and sequence descriptors no additional adjustments are possible. As long as no descriptor columns have been defined, a reminder is visible on top of the left section:

Please select a descriptor type = At least for one file column a descriptor type is needed.

In our example column 1 (date) will be adjusted as text descriptor and columns 2 (functional group) and 3 (sublocation) as categorical descriptors.

Table data

To set the source for the data columns, select the step of a table listed underneath the File step. Be aware that for the Descriptor table not the file columns will be selected but the file rows in the header lines. In the image below you see an example of the descriptor data columns, where the values for id and label are taken from the first line in the file. Additionally the descriptors will be assigned to the descriptor tree “Descriptor tree for matrix”.

Remark: Although descriptortree_id is a numeric value in the database, the externally visible descriptor tree name is expected here. Since this name is unambigious, the corresponding id will be determined during import.

The option Address index allows the selection of the column number (for descriptor) rsp. the line number (for description) as value. See below an example for the Description table data columns. For id the line number has been selected to get an unambigious address value. The label has been composed of the line number, three data columns and some fixed text parts. Finally all description data are assigned to the project “Matrix text”.

Remark: Although project_id is a numeric value in the database, the externally visible project name is expected here. Since this name is unambigious, the corresponding id will be determined during import.

A reminder in the header line will show you what actions are still needed to import the data into the table:

Please select at least one decisive column = If data will be imported depends on the content of decisive colums, so at least one must be selected.
Please select the position in the file = The position in the file must be given if the data for a column should be taken from the file.
From file or For all = For every you have to decide whether the data are taken from the file or a value is entered for all
Please select a value from the list = You have to select a value from the provided list

The handling of the columns is almost the same as described in the chapter columns of the table oriented import wizard.

Testing

To test if all requirements for the import are met use the Testing step. You can navigate to a certain data cell by using the Column: and Line: controls. As an alternative select the data cell ent click on button . Finally click on the Test data button. If there are still unmet requirements, these will be listed in a window as shown below. In the window below you can see the following test actions:

Insert of a quantitative descriptor “biomass”
Insert of a descriptor tree node (assignemnt to tree “Matrix test tree”)
Insert of a recommended statistical measure (measure “Mean”) for the descriptor tree node
Insert of description “Biomass 3 [CSP01 woody NW_c]” for project “Matrix test”
Insert of a statistical measure for “Mean” with value 28.2020000…

You may save the schema file by a click on button Save schema. If you click on button Generate flat files according to your adjustments data and matching schema files for the table oriented import wizards will be generated. This option may be used, if you want to import additional data, e.g. a descriptor’s measurement unit, that are not supported by the matrix wizard. Since all mapping relevant data are stored in the selected import session, you may import the basic descriptor and description data using the matrix wizard and append additional data with the table oriented import wizards. In this case the generated flat data and schema files might be useful to have a defined starting point.

Import

With the last step you can finally start to import the data into the database. If you want to repeat the import with the same settings and data of the same structure, you can save a schema of the current settings (see below).


Schedule for matrix import of tab-separated text files into DiversityDescriptions
Target within DiversityDescriptions: Description
Schedule version:	1	Database version:	03.00.17
Lines:	2 - 757	Header lines:	1
Encoding:	ANSI	Language:	US

Tables

Descriptor (Descriptor)
Merge handling: Merge

Column in table	?	File pos.	Value	Source
id	?	1		File
label		1		File
descriptortree_id			Descriptor tree for matrix	Interface

Description (Description)
Merge handling: Merge

Column in table	?	Pre	Post	File pos.	Value	Source
id	?					Index
label		Biomass				Index
+		[		0		File
+				2		File
+			]	3		File
project_id					Matrix test	Interface

Descriptor columns

File pos.

Descriptor type

Additional data

Categorical

Splitters

Text

Categorical

Splitters

Categorical

Splitters

Quantitative

Mean (= average)

File cells that could not be imported will be marked with a red background while imported lines are marked green. If you want to save lines that produce errors during the import in a separate file, use the Save failed lines option. The protocol of the import will contain all settings acording to the used schema and an overview containing the number of inserted, updated, unchanged and failed lines (see below).

Protocol


Responsible:	Link	(DB-User: Workshop)
Date:	Donnerstag, 20. April 2017, 15:53:24
Server:	training.diversityworkbench.de
Database:	DiversityDescriptions_Workshop
Descriptor columns total:	5
Descriptors imported:	5
Import lines total:	756
Descriptions imported:	756
Cells imported:	3780

Matrix Wizard Sample

Matrix import wizard for sample data

After selecting Target: Sampling event a window as shown below will be displayed. The window is separated in 3 areas. On the left side you see the descriptor settings for the actual file column (presently not visible). On the right side you see the list of available import steps. In the middle part the details of the selected import steps are shown.