Diversity Collection
Import Wizard
The import wizard is the general way to import data into a Diversity Workbench module database. It imports tab-separated text files (tsv) into the database tables. The key import step, the definition of a mapping from the tab-separated columns in the text file to the corresponding database table columns, is designed in the import wizard. As the mapping can sometimes be cumbersome to develop, the import wizard allows you to save the mapping for repeated imports of equally structured tsv files.
The examples below are from the module
DiversityAgents, but are valid for any other module as well.
With the current solution please ensure that there are no concurrent imports in the same database.
With this import routine, you can import data from text files (as
tab-separated lists) into the database. A short introduction is
provided in a video.
Choose Data → Import → Wizard → Agent from the menu. A window as shown
below will open that will lead you through the import of the data.
The window is divided into 3 areas. On the left side, you see a list of possible data-related import steps according to the type of data you choose for the import. On the right side, you see the list of currently selected import steps. In the middle part, the details of the selected import steps are shown.

Choosing the File and Settings
- File: As a first step, choose the File from which the data should be imported. The currently supported format is tab-separated text. Choosing a file will automatically set the default directory for the import files. To avoid setting this directory, deselect the option Adapt default directory in the context menu of the button to open the file.
- Encoding: Choose the Encoding of the file, e.g. Unicode. The preferred encoding is UTF8.
- Lines: The Start line and End line will automatically be set according to your data. You may change these to restrict the data lines that should be imported. The parts of the file that are not imported are indicated with a gray background, as shown below.
- First line: The option First line contains the column definition decides whether the first line is treated as column definition and is therefore not imported.
- Duplicates: To avoid duplicate imports you can select the option Use the default duplicate check (see a video for an explanation).
- Language: If your data contains e.g. date information where notations differ between countries (e.g. 31.4.2013 - 4.31.2013), choose the Language / Country to ensure a correct interpretation of your data.
- Line break: With the option Translate \r\n to line break, the character sequence \r\n in the data will be translated into a line break in the database.
- SQL statements: To save all SQL statements that are generated during a test or import, you can check the option Record all SQL statements.
- Schema: Finally you can select a prepared Schema (see chapter Schema below) for the import.
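
The effect of the line range, first-line and line-break settings above can be pictured with a small sketch. It is not the wizard's implementation: the function name `read_import_lines` and its parameters are hypothetical, and translating the literal sequence \r\n into a single "\n" character is an assumption. For a real file, the text would be read with the chosen encoding, e.g. `open(path, encoding="utf-8")`.

```python
import csv
import io


def read_import_lines(text, start_line=1, end_line=None,
                      first_line_is_header=True, translate_breaks=False):
    """Return (header, rows) for the selected line range of a tab-separated text."""
    # QUOTE_NONE: treat quote characters as ordinary data, split on tabs only.
    rows = list(csv.reader(io.StringIO(text), delimiter="\t",
                           quoting=csv.QUOTE_NONE))
    header = rows[0] if (first_line_is_header and rows) else None
    selected = []
    for number, row in enumerate(rows, start=1):
        if first_line_is_header and number == 1:
            continue  # the column definition is not imported as data
        if number < start_line or (end_line is not None and number > end_line):
            continue  # outside the Start line / End line range
        if translate_breaks:
            # Replace the literal four characters \r\n by a line break.
            row = [cell.replace("\\r\\n", "\n") for cell in row]
        selected.append(row)
    return header, selected


sample = "Name\tNotes\nMiller\tfirst\\r\\nsecond\nSmith\tplain\nJones\tskipped\n"
header, rows = read_import_lines(sample, start_line=2, end_line=3,
                                 translate_breaks=True)
print(header)
print(rows)
```

Line 4 of the sample lies outside the selected range and is therefore left out, which corresponds to the gray background in the wizard.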

Choosing the data ranges
In the selection list on the left side of the window (see below) all
possible import steps for the data are listed according to the type of
data you want to import.

Certain tables can be imported in parallel. To add parallels click on
the add button (see below). To remove parallels, use the corresponding
button. Only selected ranges will appear in the list of the steps on
the right (see below).

To import information of logging columns like who created and changed
the data, click on the
include logging columns button in the header line. This will include additional substeps for every step containing the
logging columns (see below). If you do not import these data, they will
be automatically filled by default values like the current time and
user.

Attaching data
You can either import your data as new data or Attach them to data in
the database. Select the import step Attachment from the list. All
tables that are selected and contain columns to which you can attach
data are listed (see below). Either choose the first option Import as
new data or one of the attachment columns offered, such as SeriesCode
in the table Series in the example below.

If you select a column for attachment, this column will be marked with a
blue background (see below and chapter Table data).

Merging data
You can either import your data as new data or Merge them with data in
the database. Select the import step Merge from the list. For every
table you can choose between Insert, Merge, Update and Attach (see
below).
The Insert option will import the data from the file independent of
existing data in the database.
The Merge option will compare the data from the file with those in the
database according to the Key columns (see below). If no matching data
are found in the database, the data from the file will be imported.
Otherwise the data will be updated.
The Update option will compare the data from the file with those in the
database according to the Key columns. Only matching data found in the
database will be updated.
The Attach option will compare the data from the file with those in the
database according to the Key columns. The found data will not be
changed, but used as reference data in dependent tables.
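
The difference between the four options can be illustrated with a minimal in-memory sketch. It is not the wizard's code: the function `apply_line`, the list-of-dicts table and the returned status words are hypothetical, and skipping an Attach line without a match is an assumption.

```python
def apply_line(table, line, key_columns, mode):
    """Apply one file line to an in-memory table (a list of dicts)."""
    def matches(row):
        return all(row.get(key) == line.get(key) for key in key_columns)

    if mode == "insert":
        table.append(dict(line))      # existing data are not consulted
        return "inserted"

    existing = next((row for row in table if matches(row)), None)
    if mode == "merge":
        if existing is None:
            table.append(dict(line))  # no match: import as new data
            return "inserted"
        existing.update(line)         # match: update the found data
        return "updated"
    if mode == "update":
        if existing is None:
            return "skipped"          # only matching data are updated
        existing.update(line)
        return "updated"
    if mode == "attach":
        # The found data stay unchanged and only serve as a reference.
        return "attached" if existing is not None else "skipped"
    raise ValueError(f"unknown mode: {mode}")


table = [{"SeriesCode": "S1", "Description": "old"}]
keys = ["SeriesCode"]
r1 = apply_line(table, {"SeriesCode": "S1", "Description": "new"}, keys, "merge")
r2 = apply_line(table, {"SeriesCode": "S2", "Description": "x"}, keys, "update")
r3 = apply_line(table, {"SeriesCode": "S2", "Description": "x"}, keys, "merge")
r4 = apply_line(table, {"SeriesCode": "S1", "Description": "other"}, keys, "attach")
print(r1, r2, r3, r4)
```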

Empty content will be ignored, e.g. for the Merge or Update option. To
remove content you have to enter the value NULL. As long as the column
allows empty values, the content will be removed using the NULL value.
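
As a rough sketch of this rule, assuming a hypothetical helper `values_to_write` that receives one file line as a dict of column names and text values:

```python
def values_to_write(line):
    """Map the values of one file line to the changes written by Merge or Update."""
    changes = {}
    for column, value in line.items():
        if value == "":
            continue  # empty content is ignored, existing data stay untouched
        # The text NULL removes the content (None stands for SQL NULL here).
        changes[column] = None if value == "NULL" else value
    return changes


changes = values_to_write({"Notes": "", "Abbreviation": "NULL", "GivenName": "Ann"})
print(changes)
```

Whether the database accepts the removal still depends on the column allowing empty values.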
Table data
To set the source for the columns in the file, select the step of a
table listed underneath the
Merge step. All
columns available for importing data will be listed in the central part
of the window. In the example shown below, the first column is used to
attach the new data to data in the database.

A reminder in the header line will show you which actions are still
needed to import the data into the table:
- Please select at least one column = No column has been selected so far.
- Please select at least one decisive column = Whether data will be imported depends on the content of decisive columns, so at least one must be selected.
- Please select the position in the file = The position in the file must be given if the data for a column should be taken from the file.
- Please select at least one column for comparison = For all merge types other than insert, columns for comparison with data in the database are needed.
- From file or For all = For every column you have to decide whether the data are taken from the file or a value is entered for all datasets.
- Please select a value from the list = You have to select a value from the provided list.
- Please enter a value = You have to enter a value used for all datasets.
The handling of the columns is described in the chapter Columns.
Testing
- To test if all requirements for the import are met use the
Testing step. You can use a certain line in
the file for your test and then click on the Test data in line: button. If there are still
unmet requirements, these will be listed in a window as shown below.

If finally all requirements are met, the testing function will try to
write the data into the database and display any errors that occurred as
shown below. All datasets marked with a red
background produced an error.

To see the list of all errors, double click in the error list
window in the header line (see
below).

If finally no errors are left, your data are ready for import. The
colors in the table nodes in the tree indicate the handling of the
datasets:
- INSERT
- MERGE
- UPDATE
- No difference
- Attach
- No data
The colors of the table columns indicate whether a column is decisive,
a key column or an attachment column.
If you suspect that the import file contains data already present in
the database, you may test this and extract only the missing lines into
a new file. Choose the attachment column (see chapter Attaching data)
and click on the button Check for already present data. The data
already present in the database will be marked red (see below). Click
on the button Save missing data as text file to store the data not
present in the database in a new file for the import. The import of
agents contains the option Use default duplicate check for AgentName,
which is selected by default. For this option to work, the column
AgentName must be filled according to the generation of the name by the
insert trigger of the table Agent (InheritedNamePrefix + ' ' +
Inheritedname + ', ' + GivenName + ' ' + GivenNamePostfix + ', ' +
InheritedNamePostfix + ', ' + AgentTitle; for details, see the
documentation of the database).
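
The name pattern quoted above can be written out as a plain concatenation. This is only a sketch of the pattern as stated in this chapter: the function `build_agent_name` and the sample values are hypothetical, and how the actual insert trigger treats missing name parts is not covered here and should be checked in the database documentation.

```python
def build_agent_name(inherited_name_prefix, inherited_name, given_name,
                     given_name_postfix, inherited_name_postfix, agent_title):
    """Concatenate the name parts in the order given in the text above."""
    return (inherited_name_prefix + " " + inherited_name + ", "
            + given_name + " " + given_name_postfix + ", "
            + inherited_name_postfix + ", " + agent_title)


agent_name = build_agent_name("von", "Example", "Anna", "B.", "Jr.", "Dr.")
print(agent_name)
```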

If you happen to get a file with a content as shown below, you may have
selected the wrong encoding or the encoding is incompatible. Please try
to save the original file as UTF8 and select this encoding for the
import.
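
The typical symptom of a wrong encoding can be reproduced in a few lines. The sketch below assumes a file that was saved as UTF-8 but read as Windows-1252; the helper `convert_to_utf8` and its parameters are hypothetical and only show one way to re-save a file as UTF-8 outside the wizard.

```python
original = "Müller"
utf8_bytes = original.encode("utf-8")

# Reading UTF-8 bytes with the wrong encoding garbles non-ASCII characters.
garbled = utf8_bytes.decode("cp1252")
# Reading them with the matching encoding restores the text.
restored = utf8_bytes.decode("utf-8")
print(garbled, restored)


def convert_to_utf8(source_path, target_path, source_encoding):
    """Re-save a text file as UTF-8, given the encoding it was written in."""
    with open(source_path, encoding=source_encoding, newline="") as source:
        text = source.read()
    with open(target_path, "w", encoding="utf-8", newline="") as target:
        target.write(text)
```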

Import
- With the last step you can finally start to import the data into the
database. If you want to repeat the import with the same settings and
data of the same structure, you can save a schema of the current
settings (see below). You can optionally include a description of your
schema, and with the corresponding button you can generate a file
containing only the description.
Schedule for import of tab-separated text files into DiversityAgents
- Target within DiversityAgents: Agent
- Database version: 02.01.13
- Schedule version: 1
- Use default duplicate check: ✔
- Lines: 2 - 7
- First line contains column definition: ✔
- Encoding: UTF8
- Language: US
Lines that could not be imported will be marked with a red background
while imported lines are marked green (see below).

If you want to save lines that produce errors during the import in a
separate file, use the Save failed lines option. The protocol of the
import will contain all settings according to the used schema and an
overview containing the number of inserted, updated, unchanged and
failed lines (see below).

Description
- A description of the schema may be included in the schema itself or,
with a click on the Import button, generated as a separate file. This
file will be located in a separate directory Description to avoid
confusion with import schemas. An example of a description file is
shown below, containing common settings, the treatment of the file
columns and interface settings as defined in the schema.
Columns

If the content of a file should be imported into a certain column of a
table, mark it with the
checkbox.
Decisive columns
The import depends on the data found in the file, where certain columns
can be selected as decisive. Only those lines will be imported where
data are found in any of these decisive columns. To mark a column as
decisive, click on the icon at the beginning of the line (see below).

In the example shown below, the file column Organims 2 was marked as
decisive. Therefore only the two lines containing content in this
column will be imported.
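
The rule for decisive columns can be sketched as a simple filter. The function `lines_to_import` and the column names are hypothetical, and treating whitespace-only content as empty is an assumption.

```python
def lines_to_import(lines, decisive_columns):
    """Keep only lines with content in at least one decisive column."""
    return [line for line in lines
            if any(line.get(column, "").strip() for column in decisive_columns)]


lines = [
    {"ColumnA": "first", "ColumnB": "has content"},
    {"ColumnA": "second", "ColumnB": ""},
    {"ColumnA": "third", "ColumnB": "has content too"},
]
selected = lines_to_import(lines, ["ColumnB"])
print([line["ColumnA"] for line in selected])
```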

Key columns
For the options Merge, Update and Attach the import compares the data
from the file with those already present in the database. This
comparison is done via key columns. To make a column a key column,
click on the icon at the beginning of the line. You can define as many
key columns as you need to ensure a valid comparison of the data.
Source
The data imported into the database can either be taken From file, or
the same value, which you enter into the window or select from a list,
can be used For all datasets. If you choose the From file option, a
window as shown below will pop up. Just click in the column from which
the data should be taken and click OK (see below).

If you choose the For all option, you can either enter text, select a
value from a list or use a checkbox for YES or NO.
The data imported may be
transformed e.g. to adapt them to a format
demanded by the database. For further details please see the chapter
Transformation.
Copy
If data in the source file are missing in subsequent lines as shown
below,

you can use the Copy line option to fill in missing data as shown
below, where the blue values are copied into empty fields during the
import. Click on the corresponding button to ensure that missing values
are filled in from previous lines.
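
What the Copy line option does corresponds to a forward fill. The following is a minimal sketch with a hypothetical function `copy_missing_values` that works on rows given as lists of text values; column positions are zero-based here.

```python
def copy_missing_values(rows, copy_columns):
    """Fill empty fields in the given columns with the value of the previous line."""
    previous = {}
    filled = []
    for row in rows:
        new_row = list(row)
        for index in copy_columns:
            if new_row[index] == "":
                new_row[index] = previous.get(index, "")  # copy from above
            else:
                previous[index] = new_row[index]          # remember last value
        filled.append(new_row)
    return filled


rows = [["Berlin", "A1"], ["", "A2"], ["", "A3"], ["Munich", "B1"], ["", "B2"]]
filled = copy_missing_values(rows, copy_columns=[0])
print(filled)
```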

Prefix and Postfix
In addition to the transformation of the values from the file, you may
add a prefix and a postfix. These will be added after the
transformation of the text. Double-click in the field to see or edit
the content. The prefix and postfix values will only be used if the
file contains data for the current position.
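
In other words, an empty value stays empty instead of becoming a bare prefix or postfix. A minimal sketch, with the hypothetical function `add_prefix_postfix` and made-up sample values:

```python
def add_prefix_postfix(value, prefix="", postfix=""):
    """Add prefix and postfix only if the file contains data at this position."""
    if value == "":
        return value  # nothing in the file: no prefix or postfix is written
    return prefix + value + postfix


with_data = add_prefix_postfix("1234", prefix="No. ", postfix=" (old)")
without_data = add_prefix_postfix("", prefix="No. ", postfix=" (old)")
print(repr(with_data), repr(without_data))
```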
Column selection
If, for any reason, a column that should take its content from the
imported file lacks the position in the file, or you want to change the
position, click on the corresponding button. If a position is present,
this button will show the number of the column. A window as shown below
will pop up where you can select and change the position in the file.

Multi column
The content of a column can be composed from the content of several
columns in the file. To add additional file columns, click on the
corresponding button. A window as shown below will pop up, showing you
the columns selected so far, where the sequence is indicated in the
header line. The first column is marked with a blue background while
the added columns are marked with a green background (see below).

To remove an added column, use the
button (see
below).

The information button opens a window displaying the information about
the column. For certain datatypes additional options are included (see
Prefix and Postfix).
Transformation
The data imported may be transformed, e.g. to adapt them to a format
demanded by the database. A short introduction is provided in a video.
Click on the corresponding button to open a window as shown below.

Here you can enter several types of transformation that should be
applied to your data: Cut out parts, Translate contents from the file,
RegEx (apply regular expressions) or Replace text in the data from the
file. Calculation and Filter transformations are described at the end
of this chapter. All transformations will be applied in the sequence in
which they were entered. Finally, if a prefix and/or a postfix are
defined, these will be added after the transformation. To remove a
transformation, select it and click on the corresponding button.
Cut
With the cut transformation you can restrict the data taken from the
file to a part of the text in the file. This is done by splitters and
the position after splitting. In the example below, the month of a date
should be extracted from the information. To achieve this, the splitter
'.' is added and then the position is set to 2. You can change the
direction of the sequence with the button Seq, starting either at the
first position or at the last position. Click on the button Test the
transformation to see the result of your transformation.
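
The idea of splitters and positions can be sketched as follows. The function `cut` is hypothetical, positions are counted from 1 as in the text, and returning an empty value for a position that does not exist is an assumption.

```python
import re


def cut(value, splitters, position, from_end=False):
    """Split the value at any of the splitters and return the part at the position."""
    pattern = "|".join(re.escape(splitter) for splitter in splitters)
    parts = re.split(pattern, value)
    if from_end:
        parts.reverse()  # count positions starting at the last part
    if 1 <= position <= len(parts):
        return parts[position - 1]
    return ""


month = cut("12.IV.2013", ["."], 2)
year = cut("12.IV.2013", ["."], 1, from_end=True)
print(month, year)
```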

Translate
The translate transformation translates values from the file into
values entered by the user. In the example above, the values of the
month cut out from the date string should be translated from Roman into
numeric notation. To do this, click on the corresponding button to add
a translation transformation (see below). To list all different values
present in the data, click on the corresponding button. A list as shown
below will be created. You may also use the corresponding buttons to
add or remove values from the list or to clear the list. Then enter the
translations as shown below. Use the save button to save entries and
the Test the transformation button to see the result.

To load a predefined list for the transformation, use the corresponding
button. A window as shown below will open. Choose the encoding of the
data in your translation source, indicate whether the first line
contains the column definition and click on the open button to open a
file. Click OK to use the values from the file for the translation.
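
A translation is essentially a lookup table. The sketch below uses a hypothetical function `translate` and a hand-written table for the Roman month example; passing values without an entry through unchanged is an assumption.

```python
ROMAN_MONTHS = {
    "I": "1", "II": "2", "III": "3", "IV": "4", "V": "5", "VI": "6",
    "VII": "7", "VIII": "8", "IX": "9", "X": "10", "XI": "11", "XII": "12",
}


def translate(value, translations):
    """Return the translation of a value, or the value itself if none is defined."""
    return translations.get(value, value)


translated = [translate(value, ROMAN_MONTHS) for value in ["IV", "XII", "unknown"]]
print(translated)
```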

Regular expression
The RegEx transformation will transform the values according to the
entered Regular expression and Replace by values. For more details
please see documentation about regular expressions.
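
As an illustration of a pattern with a replacement, the following sketch reorders a date. It uses Python's `re` module, so the syntax for referring to groups in the replacement (`\1`, `\2`, ...) is Python's and may differ from the syntax expected by the wizard; the function `regex_transform` is hypothetical.

```python
import re


def regex_transform(value, pattern, replace_by):
    """Replace every match of the pattern in the value by the replacement."""
    return re.sub(pattern, replace_by, value)


pattern = r"^(\d{2})\.(\d{2})\.(\d{4})$"   # day.month.year
iso_date = regex_transform("30.04.2013", pattern, r"\3-\2-\1")
untouched = regex_transform("unknown", pattern, r"\3-\2-\1")
print(iso_date, untouched)
```

Values that do not match the expression are left as they are.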

Replacement
The
replacement transformation replaces any text in the data by a text
specified by the user. In the example shown below, the text "." is
replaced by "-".

Calculation
The Σ calculation transformation performs a calculation on numeric values,
dependent on an optional condition. In the example below, 2 calculations
were applied to convert 2-digit values into 4-digit years.
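
Two conditional calculations of this kind could look as follows. This is only a sketch: the function `calculate`, the pivot value 30 and the added constants are assumptions chosen for illustration, not the settings of the example.

```python
import operator


def calculate(value, operation, operand, condition=None):
    """Apply the operation to a numeric value if the optional condition holds."""
    number = int(value)
    if condition is not None and not condition(number):
        return value  # condition not met: value stays unchanged
    return str(operation(number, operand))


def to_four_digit_year(value):
    """Apply two conditional calculations one after the other."""
    value = calculate(value, operator.add, 2000, condition=lambda n: n < 30)
    value = calculate(value, operator.add, 1900, condition=lambda n: 30 <= n < 100)
    return value


years = [to_four_digit_year(value) for value in ["13", "87", "05", "1999"]]
print(years)
```

Because the transformations run in sequence, the second condition must not match results of the first one.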

Filter
The filter transformation compares the values from the file with a
value entered by the user. As a result you can either Import content of
column in file or Import a fixed value. To select another column that
should be compared, click on the corresponding button and choose a
column from the file in the window that will open. If the column that
should be compared is not the column of the transformation, the number
of the column will be shown instead of the symbol. To add further
filter conditions use the add button. For the combination of the
conditions you can choose between AND and OR.
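
The combination of filter conditions can be sketched like this. The function `filter_value` is hypothetical; comparing for equality only and returning an empty value when the conditions are not met are assumptions.

```python
def filter_value(row, conditions, combine="AND", result_column=None,
                 fixed_value=None):
    """Return the column content or a fixed value if the conditions are met.

    conditions is a list of (column_index, expected_value) pairs.
    """
    checks = [row[index] == expected for index, expected in conditions]
    passed = all(checks) if combine == "AND" else any(checks)
    if not passed:
        return ""
    return fixed_value if fixed_value is not None else row[result_column]


row = ["Quercus", "plant", "leaf"]
a = filter_value(row, [(1, "plant")], result_column=0)
b = filter_value(row, [(1, "plant"), (2, "root")], "AND", result_column=0)
c = filter_value(row, [(1, "plant"), (2, "root")], "OR", fixed_value="yes")
print(repr(a), repr(b), repr(c))
```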

Tutorial
This tutorial demonstrates the import of a small file into the database.
The example file with the data to be imported is included in the
software. At the end of this tutorial you will have imported several
datasets and practiced most of the possibilities provided by the import
wizard. The import is done in 2 steps to demonstrate the attachment
functionality of the wizard.
Import of the collection events
Choose Data → Import → Wizard → import Specimen ... from the menu. A
window as shown below will open. This will lead you through the import
of the data. The window is divided into 3 areas. On the left side, you
see a list of possible data-related import steps according to the type
of data you choose for the import. On the right side, you see the list
of currently selected import steps. In the middle part, the details of
the selected import steps are shown.

Choosing the File
As a first step, choose the File from which the data should be
imported. The currently supported format is tab-separated text. Then
choose the Encoding of the file, e.g. Unicode. The Start line and End
line will automatically be set according to your data. You may change
these to restrict the data lines that should be imported. The parts of
the file that are not imported are indicated with a gray background, as
shown below. If the option First line contains the column definition is
selected, this line will not be imported either. If your data contains
e.g. date information where notations differ between countries (e.g.
31.4.2013 - 4.31.2013), choose the Language / Country to ensure a
correct interpretation of your data. Finally you can select a prepared
Schema (see chapter Schema below) for the import.

Choosing the data ranges
In the selection list on the left side of the window (see below) all
possible import steps for the data are listed according to the type of
data you want to import.

Certain tables can be imported in parallel. To add parallels click on
the add button (see below). To remove parallels, use the corresponding
button. Only selected ranges will appear in the list of the steps on
the right (see below).

To import information of logging columns like who created and changed
the data, click on the include logging columns button in the header
line. This will include additional substeps for every step containing
the logging columns (see below). If you do not import these data, they
will be automatically filled by default values like the current time
and user.

Attaching data
You can either import your data as new data or Attach them to data in
the database. Select the import step Attachment from the list. All
tables that are selected and contain columns to which you can attach
data are listed (see below). Either choose the first option Import as
new data or one of the attachment columns offered, such as SeriesCode
in the table Series in the example below.

If you select a column for attachment, this column will be marked with a
blue background (see below and chapter Table data).

Merging data
You can either import your data as new data or Merge them with data in
the database. Select the import step Merge from the list. For every
table you can choose between Insert, Merge, Update and Attach (see
below).
The Insert option will import the data from the file independent of
existing data in the database.
The Merge option will compare the data from the file with those in the
database according to the Key columns (see below). If no matching data
are found in the database, the data from the file will be imported,
otherwise the data will be updated.
The Update option will compare the data from the file with those in the
database according to the Key columns. Only matching data found in the
database will be updated.
The Attach option will compare the data from the file with those in the
database according to the Key columns. The found data will not be
changed, but used as reference data in dependent tables.

Table data
To set the source for the columns in the file, select the step of a
table listed underneath the Merge step. All columns available for
importing data will be listed in the central part of the window. In the
example shown below, the first column is used to attach the new data to
data in the database.

A reminder in the header line will show you what actions are still
needed to import the data into the table:
- Please select at least one column = No column has been selected so far.
- Please select at least one decisive column = Whether data will be imported depends on the content of decisive columns, so at least one must be selected.
- Please select the position in the file = The position in the file must be given if the data for a column should be taken from the file.
- Please select at least one column for comparison = For all merge types other than insert, columns for comparison with data in the database are needed.
- From file or For all = For every column you have to decide whether the data are taken from the file or a value is entered for all datasets.
- Please select a value from the list = You have to select a value from the provided list.
- Please enter a value = You have to enter a value used for all datasets.
The handling of the columns is described in the chapter Columns.
Testing
To test if all requirements for the import are met use the Testing
step. You can use a certain line in the file for your test and then
click on the Test data in line: button. If there are still unmet
requirements, these will be listed in a window as shown below.

If finally all requirements are met, the testing function will try to
write the data into the database and display any errors that occurred
as shown below. All datasets marked with a red background produced an
error.

To see the list of all errors, double click in the error list window in
the header line (see below).

If finally no errors are left, your data are ready for import. The
colors in the table nodes in the tree indicate the handling of the
datasets: INSERT, MERGE, UPDATE, No difference, Attach, No data. The
colors of the table columns indicate whether a column is decisive, a
key column or an attachment column.
In case you get an error because you cannot specify the analysis, you
may have to enter an analysis. Choose Administration - Analysis from
the menu. If no analysis is available, create a new analysis and link
it to your project and the taxonomic groups that are imported. For more
details see the chapter Analysis.
Import
With the last step you can finally start to import the data into the
database. If you want to repeat the import with the same settings and
data of the same structure, you can save a schema of the current
settings.
Lines that could not be imported will be marked with a red background
while imported lines are marked green (see below).

If you want to save lines that produce errors during the import in a
separate file, use the Save failed lines option. The protocol of the
import will contain all settings according to the used schema and an
overview containing the number of inserted, updated, unchanged and
failed lines (see below).
