Import
There are two options to import data:
For a short introduction, see the video tutorial.
In the menu choose Data - Import -
GFBio to open a window as shown below. Enter
the login data (User + Password) and the Key of the project.
To retrieve the data click on the Connect
button. The software will retrieve the data provided for the project as
shown below. Information concerning agents will be taken from the
selected DiversityAgents database and the project selected within this
database. The GFBio portal does not enforce roles for all the agents entered. If you want to add a role for agents where no role has been given
in the GFBio portal, you may choose a default.
If you get an error message about missing identifier types, please open Administration - Identifier types… and insert the missing types.
If all needed types are available click Start
import to import the data either into an existing project or a new
project that you may include in an existing parent project.
The examples below are from the module DiversityAgents, but are valid for any other module as well.
With the current solution please ensure that there are no concurrent imports in the same database.
For some imports, e.g. for Collections in DiversityCollection, you will be reminded to update the cache tables for the hierarchies.
With this import routine, you can import data from text files (as
tab-separated lists) into the database. A short introduction is provided in a video tutorial.
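As an illustration, a minimal tab-separated import file could look like the following sketch (the column names and values are hypothetical examples, not prescribed by the software; in the real file the columns are separated by tabs):

    AgentName      GivenName   InheritedName   AgentTitle
    Smith, John    John        Smith           Dr.
    Doe, Jane      Jane        Doe             Prof.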
Choose Data → Import → Wizard → Agent from the menu. A window as shown below will open that will lead you through the import of the data. The window is separated into 3 areas. On the left side, you see a list of possible data-related import steps according to the type of data you choose for the import. On the right side you see the list of currently selected import steps. In the middle part the details of the selected import steps are shown.
In the selection list on the left side of the window (see below) all possible import steps for the data are listed according to the type of data you want to import.
The import of certain tables can be parallelized. To add parallels, click on the button (see below). To remove parallels, use the button. Only selected ranges will appear in the list of the steps on the right (see below).
To import information for logging columns, such as who created and changed the data, click on the Include logging columns button in the header line. This will include additional substeps for every step containing the logging columns (see below). If you do not import these data, they will be filled automatically with default values like the current time and user.
You can either import your data as new data or Attach them to data in the database. Select the import step Attachment from the list. All tables that are selected and contain columns at which you can attach data are listed (see below). Either choose the first option Import as new data or one of the attachment columns offered, like SeriesCode in the table Series in the example below.
If you select a column for attachment, this column will be marked with a blue background (see below and chapter Table data).
You can either import your data as new data or Merge them with data in the database. Select the import step Merge from the list. For every table you can choose between Insert, Merge, Update and Attach (see below).
The Insert option will import the data from the file independently of existing data in the database.
The Merge option will compare the data
from the file with those in the database according to the
Key columns (see below). If no matching data are
found in the database, the data from the file will be imported.
Otherwise the data will be updated.
The Update option will compare the data
from the file with those in the database according to the
Key columns. Only matching data found in the
database will be updated.
The Attach option will compare the data from the file with those in the database according to the Key columns. The found data will not be changed, but will be used as reference data in dependent tables.
Empty content will be ignored, e.g. for the Merge or Update option. To remove content you have to enter the value NULL. As long as the column allows empty values, the content will be removed using the NULL value.
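As a rough sketch of these four options and of the NULL handling, assuming each dataset is a simple dictionary and "db" is a list of existing datasets (all names here are illustrative, not the actual implementation):

    def matches(row, existing, key_columns):
        return all(row.get(k) == existing.get(k) for k in key_columns)

    def import_row(db, row, mode, key_columns):
        match = next((e for e in db if matches(row, e, key_columns)), None)
        if mode == "Insert":            # import independently of existing data
            db.append(row)
        elif mode == "Merge":           # update if found, otherwise insert
            if match is None:
                db.append(row)
            else:
                update(match, row)
        elif mode == "Update":          # only matching data are updated
            if match is not None:
                update(match, row)
        elif mode == "Attach":          # found data stay unchanged and serve
            return match                # as reference for dependent tables

    def update(existing, row):
        for column, value in row.items():
            if value == "":             # empty content is ignored
                continue
            if value == "NULL":         # explicit NULL removes the content
                existing[column] = None
            else:
                existing[column] = value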
To set the source for the columns in the file, select the step of a
table listed underneath the Merge step. All
columns available for importing data will be listed in the central part
of the window. In the example shown below, the first column is used to
attach the new data to data in the database.
A reminder in the header line will show you which actions are still needed to import the data into the table:
The handling of the columns is described in the chapter Columns.
To test if all requirements for the import are met, use the Testing step. You can use a certain line in the file for your test and then click on the Test data in line: button. If there are still unmet requirements, these will be listed in a window as shown below.
If finally all requirements are met, the testing function will try to write the data into the database and display any errors that occurred as shown below. All datasets marked with a red background produced an error.
To see the list of all errors, double click in the error list window in the header line (see below).
If finally no errors are left, your data are ready for import. The colors in the table nodes in the tree indicate the handling of the datasets (see below). The colors of the table columns indicate whether a column is decisive, a key column or an attachment column.
If you suspect that the import file contains data already present in the database, you may test this and extract only the missing lines into a new file. Choose the attachment column (see chapter Attaching data) and click on the button Check for already present data. The data already present in the database will be marked red (see below). Click on the button Save missing data as text file to store the data not present in the database in a new file for the import. The import of agents contains the option Use default duplicate check for AgentName, which is selected by default. To use this option, the column AgentName must be filled according to the generation of the name by the insert trigger of the table Agent (InheritedNamePrefix + ' ' + InheritedName + ', ' + GivenName + ' ' + GivenNamePostfix + ', ' + InheritedNamePostfix + ', ' + AgentTitle; for details, see the documentation of the database).
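As a sketch of the expected format, following the concatenation quoted above (simplified: the actual trigger omits empty parts and their separators; see the database documentation):

    def agent_name(inherited_name_prefix, inherited_name, given_name,
                   given_name_postfix, inherited_name_postfix, agent_title):
        # Build the AgentName in the order used by the insert trigger.
        return (inherited_name_prefix + " " + inherited_name + ", "
                + given_name + " " + given_name_postfix + ", "
                + inherited_name_postfix + ", " + agent_title)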
If you happen to get a file with content as shown below, you may have selected the wrong encoding or the encoding is incompatible. Please try to save the original file as UTF-8 and select this encoding for the import.
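A minimal sketch for re-saving a file as UTF-8, assuming the source encoding is known (the source encoding "cp1252" and the file names are only examples):

    # Read the file with its original encoding and write it back as UTF-8.
    with open("import.txt", encoding="cp1252") as source:
        text = source.read()
    with open("import_utf8.txt", "w", encoding="utf-8") as target:
        target.write(text)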
With the last step you can finally start to import the data into the database. If you want to repeat the import with the same settings and data of the same structure, you can save a schema of the current settings (see below). You can optionally include a description of your schema, and with the button you can generate a file containing only the description.
Schedule for import of tab-separated text files into DiversityAgents
Lines that could not be imported will be marked with a red background while imported lines are marked green (see below).
If you want to save lines that produce errors during the import in a separate file, use the Save failed lines option. The protocol of the import will contain all settings according to the used schema and an overview containing the number of inserted, updated, unchanged and failed lines (see below).
A description of the schema may be included in the schema itself or generated as a separate file with a click on the Import button. This file will be located in a separate directory Description to avoid confusion with import schemas. An example of a description file is shown below, containing common settings, the treatment of the file columns and interface settings as defined in the schema.
If the content of a file should be imported into a certain column of a
table, mark it with the checkbox.
The import depends on the data found in the file where certain columns
can be selected as decisive. Only those lines will be imported where
data are found in any of these
decisive columns. To mark a column as
decisive, click on the
icon at the beginning of the line (see below).
In the example shown below, the file column Organism 2 was marked as decisive. Therefore only the two lines containing content in this column will be imported.
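Conceptually, the decisive check corresponds to the following sketch (the column name is taken from the example above; the function name is illustrative):

    # A line is only imported if any decisive column contains content.
    decisive_columns = ["Organism 2"]

    def line_is_imported(row):
        return any(row.get(column, "").strip() for column in decisive_columns)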
For the options Merge,
Update and
Attach the import compares the data from the file with those already
present in the database. This comparison is done via key columns.
To make a column a key column, click on the
icon at
the beginning of the line. You can define as many key columns as you
need to ensure a valid comparison of the data.
The data imported into the database can either be taken From file, or the same value that you enter into the window or select from a list can be used For all datasets. If you choose the From file option, a window as shown below will pop up. Just click on the column from which the data should be taken and click OK (see below).
If you choose the For all option, you
can either enter text, select a value from a list or use a
checkbox for YES or NO.
The data imported may be transformed e.g. to adapt them to a format
demanded by the database. For further details please see the chapter
Transformation.
If data in the source file are missing in subsequent lines as shown below, you can use the Copy line option to fill in the missing data: the blue values are copied into empty fields during the import. Click on the button to ensure that missing values are filled in from previous lines.
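The Copy line behavior corresponds roughly to this sketch (names are illustrative):

    # Empty fields are filled with the value from the previous line.
    def copy_from_previous_lines(rows):
        previous = {}
        for row in rows:
            for column, value in row.items():
                if not value:                    # missing in this line
                    row[column] = previous.get(column, "")
            previous = row
        return rows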
In addition to the transformation of the values from the file, you may add a pre- and a postfix. These will be added after the transformation of the text. Double-click in the field to see or edit the content. The pre- and postfix values will only be used if the file contains data for the current position.
If for any reason a column that should take its content from the imported file is missing the position in the file, or you want to change the position, click on the button. In case a position is present, this button will show the number of the column. A window as shown below will pop up where you can select and change the position in the file.
The content of a column can be composed from the content of several columns in the file. To add additional file columns, click on the button. A window as shown below will pop up, showing the columns selected so far, where the sequence is indicated in the header line. The first column is marked with a blue background while the added columns are marked with a green background (see below). To remove an added column, use the button (see below).
The button opens a window displaying the
information about the column. For certain datatypes additional options
are included (see Pre- and Postfix).
The data imported may be transformed, e.g. to adapt them to a format demanded by the database. A short introduction is provided in a video tutorial.
Click on the
button to open a window as shown
below.
Here you can enter 4 types of transformation that should be applied to your data: Cut out parts, Translate contents from the file, apply regular expressions (RegEx) or Replace text in the data from the file. All transformations will be applied in the sequence in which they were entered. Finally, if a prefix and/or a postfix are defined, these will be added after the transformation. To remove a transformation, select it and click on the button.
With the cut transformation you can restrict the data taken from the file to a part of the text in the file. This is done by splitters and the position after splitting. In the example below, the month of a date should be extracted from the information. To achieve this, the splitter '.' is added and then the position is set to 2. You can change the direction of the sequence with the button Seq, starting at the first position or starting at the last position. Click on the button Test the transformation to see the result of your transformation.
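As a sketch of the cut transformation from the example above (split at the splitter and take the part at the given position; names are illustrative):

    def cut(value, splitter=".", position=2, from_last=False):
        parts = value.split(splitter)
        if from_last:                   # direction changed via the Seq button
            parts.reverse()
        return parts[position - 1]

    cut("24.VI.1999")   # -> "VI" (the month part of the date)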
The translate transformation translates values from the file into values entered by the user. In the example above, the values of the month cut out from the date string should be translated from roman into numeric notation. To do this, click on the button to add a translation transformation (see below). To list all different values present in the data, click on the button. A list as shown below will be created. You may as well use the and buttons to add or remove values from the list, or the button to clear the list. Then enter the translations as shown below. Use the save button to save entries and the Test the transformation button to see the result.
To load a predefined list for the transformation, use the button. A window as shown below will open. Choose the encoding of the data in your translation source, check whether the first line contains the column definition, and click on the button to open a file. Click OK to use the values from the file for the translation.
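A sketch of the translation from the example (roman month notation from the cut step above is translated into numeric notation; names are illustrative):

    translation = {"I": "1", "II": "2", "III": "3", "IV": "4", "V": "5",
                   "VI": "6", "VII": "7", "VIII": "8", "IX": "9",
                   "X": "10", "XI": "11", "XII": "12"}

    def translate(value):
        # Values without an entry in the list are passed through unchanged.
        return translation.get(value, value)

    translate("VI")   # -> "6"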
The RegEx transformation using regular expressions will transform the values according to the entered Regular expression and Replace by values. For more details please see documentation about regular expressions.
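As a sketch, assuming the Regular expression and Replace by values shown here (the pattern, which rearranges a date, is only an example):

    import re

    def regex_transform(value,
                        pattern=r"(\d{1,2})\.(\d{1,2})\.(\d{4})",
                        replace_by=r"\3-\2-\1"):
        # Apply the regular expression and the Replace by value.
        return re.sub(pattern, replace_by, value)

    regex_transform("31.12.1999")   # -> "1999-12-31"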
The replacement transformation replaces any text in the data by a text
specified by the user. In the example shown below, the text "." is
replaced by "-".
The Σ calculation transformation performs a calculation on numeric values, dependent on an optional condition. In the example below, 2 calculations were applied to convert 2-digit values into 4-digit years.
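A sketch of the example: two conditional calculations turn 2-digit values into 4-digit years (the pivot value 30 is an assumption for illustration):

    def to_four_digit_year(value):
        year = int(value)
        if year >= 100:          # already a 4-digit year, leave unchanged
            return year
        if year > 30:            # first calculation: condition "value > 30"
            return year + 1900   # e.g. 99 -> 1999
        return year + 2000       # second calculation: e.g. 5 -> 2005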
The filter transformation compares the values from the file with a value entered by the user. As a result you can either Import content of column in file or Import a fixed value. To select another column that should be compared, click on the button and choose a column from the file in the window that will open. If the column that should be compared is not the column of the transformation, the number of the column will be shown instead of the symbol. To add further filter conditions, use the button. For the combination of the conditions you can choose between AND and OR.
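Conceptually, a single filter condition corresponds to this sketch (names are illustrative; several conditions would be combined with AND or OR):

    def filter_transform(row, compare_column, compare_value,
                         import_column, fixed_value):
        # Compare the value in the file with the user-entered value.
        if row[compare_column] == compare_value:
            return row[import_column]   # Import content of column in file
        return fixed_value              # Import a fixed value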
This tutorial demonstrates the import of a small file into the database. The following data should be imported (the example file is included in the software): At the end of this tutorial you will have imported several datasets and practiced most of the possibilities provided by the import wizard. The import is done in 2 steps to demonstrate the attachment functionality of the wizard.
Choose Data → Import → Wizard → Import Specimen ... from the menu. A window as shown below will open that will lead you through the import of the data. The window is separated into 3 areas. On the left side, you see a list of possible data-related import steps according to the type of data you choose for the import. On the right side you see the list of currently selected import steps. In the middle part the details of the selected import steps are shown.
As a first step, choose the File from which the data should be imported. The currently supported format is tab-separated text. Then choose the Encoding of the file, e.g. Unicode. The Start line and End line will automatically be set according to your data. You may change these to restrict the data lines that should be imported. The not imported parts of the file are indicated with a gray background as shown below. If the First line contains the column definition, this line will not be imported as well. If your data contain e.g. date information where notations differ between countries (e.g. 31.4.2013 vs. 4.31.2013), choose the Language / Country to ensure a correct interpretation of your data. Finally you can select a prepared Schema (see chapter Schema below) for the import.
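A sketch of how these file settings apply, assuming a tab-separated file (the encoding "utf-16" corresponds to the example "Unicode"; all values are examples, not defaults of the wizard):

    def read_import_file(path, encoding="utf-16", start_line=2, end_line=None,
                         first_line_is_header=True):
        with open(path, encoding=encoding, newline="") as f:
            lines = f.read().splitlines()
        # The column definition line is not imported as data.
        header = lines[0].split("\t") if first_line_is_header else None
        # Restrict the imported data to the range Start line .. End line.
        data = lines[start_line - 1:end_line]
        return header, [line.split("\t") for line in data]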
In the selection list on the left side of the window (see below) all possible import steps for the data are listed according to the type of data you want to import.
Certain tables can be imported in parallel. To add parallels click on
the button (see below). To remove parallels, use the
button. Only selected ranges will appear in the
list of the steps on the right (see below).
To import information for logging columns, such as who created and changed the data, click on the button in the header line. This will include additional substeps for every step containing the logging columns (see below). If you do not import these data, they will be filled automatically with default values like the current time and user.
You can either import your data as new data or Attach them to data in the database. Select the import step Attachment from the list. All tables that are selected and contain columns at which you can attach data are listed (see below). Either choose the first option Import as new data or one of the attachment columns offered, like SeriesCode in the table Series in the example below.
If you select a column for attachment, this column will be marked with a blue background (see below and chapter Table data).
You can either import your data as new data or Merge them with data in the database. Select the import step Merge from the list. For every table you can choose between Insert, Merge, Update and Attach (see below).
The Insert option will import the data from the file independently of existing data in the database. The Merge option will compare the data from the file with those in the database according to the Key columns (see below). If no matching data are found in the database, the data from the file will be imported, otherwise the data will be updated.
The Update option will compare the data
from the file with those in the database according to the
Key columns. Only matching data found in the
database will be updated.
The Attach option will compare the data from the file with those in the database according to the Key columns. The found data will not be changed, but will be used as reference data in dependent tables.
To set the source for the columns in the file, select the step of a table listed underneath the Merge step. All columns available for importing data will be listed in the central part of the window. In the example shown below, the first column is used to attach the new data to data in the database.
A reminder in the header line will show you what actions are still needed to import the data into the table:
The handling of the columns is described in the chapter Columns.
To test if all requirements for the import are met, use the Testing step. You can use a certain line in the file for your test and then click on the Test data in line: button. If there are still unmet requirements, these will be listed in a window as shown below.
If finally all requirements are met, the testing function will try to write the data into the database and display any errors that occurred as shown below. All datasets marked with a red background produced an error.
To see the list of all errors, double click on the error list window in the header line (see below).
If finally no errors are left, your data are ready for import. The colors in the table nodes in the tree indicate the handling of the datasets: INSERT, MERGE, UPDATE, No difference, Attach, No data. The colors of the table columns indicate whether a column is decisive, a key column or an attachment column.
In case you get an error because you cannot specify the analysis, you may have to enter an analysis. Choose Administration - Analysis from the menu. If no analysis is available, create a new analysis and link it to your project and the taxonomic groups that are imported. For more details see the chapter Analysis.
With the last step you can finally start to import the data into the database. If you want to repeat the import with the same settings and data of the same structure, you can save a schema of the current settings.
Lines that could not be imported will be marked with a red background while imported lines are marked green (see below).
If you want to save lines that produce errors during the import in a separate file, use the Save failed lines option. The protocol of the import will contain all settings according to the used schema and an overview containing the number of inserted, updated, unchanged and failed lines (see below).