Import
There are two options to import data:
For a short introduction, see the video tutorial.
In the menu choose Data - Import -
GFBio to open a window as shown below. Enter
the login data (User + Password) and the Key of the project.
To retrieve the data click on the Connect
button. The software will retrieve the data provided for the project as
shown below. Information concerning agents will be taken from the
selected DiversityAgents database and the project selected within this
database. The GFBio portal does not enforce roles for all the agents entered. If you want to add a role for agents where no role has been given
in the GFBio portal, you may choose a default.
If you get an error message about missing identifier types, please open Administration - Identifier types… and insert the missing types.
If all needed types are available click Start
import to import the data either into an existing project or a new
project that you may include in an existing parent project.
The examples below are from the module DiversityAgents, but are valid for any other module as well.
With the current solution please ensure that there are no concurrent imports in the same database.
For some imports, e.g. for Collections in DiversityCollection, you will be reminded to update the cache tables for the hierarchies.
With this import routine, you can import data from text files (as
tab-separated lists) into the database. A short introduction is provided in a video tutorial.
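As an illustration, a minimal tab-separated import file could look like the following sketch (the column names and values are hypothetical examples, not prescribed by the software; in the real file the columns are separated by tabs):

    AgentName      GivenName   InheritedName   AgentTitle
    Smith, John    John        Smith           Dr.
    Doe, Jane      Jane        Doe             Prof.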
Choose Data → Import → Wizard → Agent from the menu. A window as shown below will open that will lead you through the import of the data. The window is separated into 3 areas. On the left side, you see a list of possible data-related import steps according to the type of data you choose for the import. On the right side you see the list of currently selected import steps. In the middle part the details of the selected import steps are shown.
In the selection list on the left side of the window (see below) all possible import steps for the data are listed according to the type of data you want to import.
The import of certain tables can be parallelized. To add parallels, click on the button (see below). To remove parallels, use the button. Only selected ranges will appear in the list of the steps on the right (see below).
To import information for logging columns, such as who created and changed the data, click on the Include logging columns button in the header line. This will include additional substeps for every step containing the logging columns (see below). If you do not import these data, they will be filled automatically with default values like the current time and user.
You can either import your data as new data or Attach them to data in the database. Select the import step Attachment from the list. All tables that are selected and contain columns at which you can attach data are listed (see below). Either choose the first option Import as new data or one of the attachment columns offered, like SeriesCode in the table Series in the example below.
If you select a column for attachment, this column will be marked with a blue background (see below and chapter Table data).
You can either import your data as new data or Merge them with data in the database. Select the import step Merge from the list. For every table you can choose between Insert, Merge, Update and Attach (see below).
The Insert option will import the data from the file independently of existing data in the database.
The Merge option will compare the data
from the file with those in the database according to the
Key columns (see below). If no matching data are
found in the database, the data from the file will be imported.
Otherwise the data will be updated.
The Update option will compare the data
from the file with those in the database according to the
Key columns. Only matching data found in the
database will be updated.
The Attach option will compare the data from the file with those in the database according to the Key columns. The found data will not be changed, but will be used as reference data in dependent tables.
Empty content will be ignored, e.g. for the Merge or Update option. To remove content you have to enter the value NULL. As long as the column allows empty values, the content will be removed using the NULL value.
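As a rough sketch of these four options and of the NULL handling, assuming each dataset is a simple dictionary and "db" is a list of existing datasets (all names here are illustrative, not the actual implementation):

    def matches(row, existing, key_columns):
        return all(row.get(k) == existing.get(k) for k in key_columns)

    def import_row(db, row, mode, key_columns):
        match = next((e for e in db if matches(row, e, key_columns)), None)
        if mode == "Insert":            # import independently of existing data
            db.append(row)
        elif mode == "Merge":           # update if found, otherwise insert
            if match is None:
                db.append(row)
            else:
                update(match, row)
        elif mode == "Update":          # only matching data are updated
            if match is not None:
                update(match, row)
        elif mode == "Attach":          # found data stay unchanged and serve
            return match                # as reference for dependent tables

    def update(existing, row):
        for column, value in row.items():
            if value == "":             # empty content is ignored
                continue
            if value == "NULL":         # explicit NULL removes the content
                existing[column] = None
            else:
                existing[column] = value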
To set the source for the columns in the file, select the step of a
table listed underneath the Merge step. All
columns available for importing data will be listed in the central part
of the window. In the example shown below, the first column is used to
attach the new data to data in the database.
A reminder in the header line will show you which actions are still needed to import the data into the table:
The handling of the columns is described in the chapter Columns.
To test if all requirements for the import are met, use the Testing step. You can use a certain line in the file for your test and then click on the Test data in line: button. If there are still unmet requirements, these will be listed in a window as shown below.
If finally all requirements are met, the testing function will try to write the data into the database and display any errors that occurred as shown below. All datasets marked with a red background produced an error.
To see the list of all errors, double click in the error list window in the header line (see below).
If finally no errors are left, your data are ready for import. The colors in the table nodes in the tree indicate the handling of the datasets (see below). The colors of the table columns indicate whether a column is decisive, a key column or an attachment column.
If you suspect that the import file contains data already present in the database, you may test this and extract only the missing lines into a new file. Choose the attachment column (see chapter Attaching data) and click on the button Check for already present data. The data already present in the database will be marked red (see below). Click on the button Save missing data as text file to store the data not present in the database in a new file for the import. The import of agents contains the option Use default duplicate check for AgentName, which is selected by default. To use this option, the column AgentName must be filled according to the generation of the name by the insert trigger of the table Agent (InheritedNamePrefix + ' ' + InheritedName + ', ' + GivenName + ' ' + GivenNamePostfix + ', ' + InheritedNamePostfix + ', ' + AgentTitle; for details, see the documentation of the database).
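As a sketch of the expected format, following the concatenation quoted above (simplified: the actual trigger omits empty parts and their separators; see the database documentation):

    def agent_name(inherited_name_prefix, inherited_name, given_name,
                   given_name_postfix, inherited_name_postfix, agent_title):
        # Build the AgentName in the order used by the insert trigger.
        return (inherited_name_prefix + " " + inherited_name + ", "
                + given_name + " " + given_name_postfix + ", "
                + inherited_name_postfix + ", " + agent_title)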
If you happen to get a file with content as shown below, you may have selected the wrong encoding or the encoding is incompatible. Please try to save the original file as UTF-8 and select this encoding for the import.
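A minimal sketch for re-saving a file as UTF-8, assuming the source encoding is known (the source encoding "cp1252" and the file names are only examples):

    # Read the file with its original encoding and write it back as UTF-8.
    with open("import.txt", encoding="cp1252") as source:
        text = source.read()
    with open("import_utf8.txt", "w", encoding="utf-8") as target:
        target.write(text)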
With the last step you can finally start to import the data into the database. If you want to repeat the import with the same settings and data of the same structure, you can save a schema of the current settings (see below). You can optionally include a description of your schema, and with the button you can generate a file containing only the description.
Schedule for import of tab-separated text files into DiversityAgents
Lines that could not be imported will be marked with a red background while imported lines are marked green (see below).
If you want to save lines that produce errors during the import in a separate file, use the Save failed lines option. The protocol of the import will contain all settings according to the used schema and an overview containing the number of inserted, updated, unchanged and failed lines (see below).
A description of the schema may be included in the schema itself or generated as a separate file with a click on the Import button. This file will be located in a separate directory Description to avoid confusion with import schemas. An example of a description file is shown below, containing common settings, the treatment of the file columns and interface settings as defined in the schema.
If the content of a file should be imported into a certain column of a
table, mark it with the checkbox.
The import depends on the data found in the file where certain columns
can be selected as decisive. Only those lines will be imported where
data are found in any of these
decisive columns. To mark a column as
decisive, click on the
icon at the beginning of the line (see below).
In the example shown below, the file column Organism 2 was marked as decisive. Therefore only the two lines containing content in this column will be imported.
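Conceptually, the decisive check corresponds to the following sketch (the column name is taken from the example above; the function name is illustrative):

    # A line is only imported if any decisive column contains content.
    decisive_columns = ["Organism 2"]

    def line_is_imported(row):
        return any(row.get(column, "").strip() for column in decisive_columns)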
For the options Merge,
Update and
Attach the import compares the data from the file with those already
present in the database. This comparison is done via key columns.
To make a column a key column, click on the
icon at
the beginning of the line. You can define as many key columns as you
need to ensure a valid comparison of the data.
The data imported into the database can either be taken From file, or the same value that you enter into the window or select from a list can be used For all datasets. If you choose the From file option, a window as shown below will pop up. Just click on the column from which the data should be taken and click OK (see below).
If you choose the For all option, you
can either enter text, select a value from a list or use a
checkbox for YES or NO.
The data imported may be transformed e.g. to adapt them to a format
demanded by the database. For further details please see the chapter
Transformation.
If data in the source file are missing in subsequent lines as shown below, you can use the Copy line option to fill in the missing data: the blue values are copied into empty fields during the import. Click on the button to ensure that missing values are filled in from previous lines.
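The Copy line behavior corresponds roughly to this sketch (names are illustrative):

    # Empty fields are filled with the value from the previous line.
    def copy_from_previous_lines(rows):
        previous = {}
        for row in rows:
            for column, value in row.items():
                if not value:                    # missing in this line
                    row[column] = previous.get(column, "")
            previous = row
        return rows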
In addition to the transformation of the values from the file, you may add a pre- and a postfix. These will be added after the transformation of the text. Double-click in the field to see or edit the content. The pre- and postfix values will only be used if the file contains data for the current position.
If for any reason a column that should take its content from the imported file is missing the position in the file, or you want to change the position, click on the button. In case a position is present, this button will show the number of the column. A window as shown below will pop up where you can select and change the position in the file.
The content of a column can be composed from the content of several columns in the file. To add additional file columns, click on the button. A window as shown below will pop up, showing the columns selected so far, where the sequence is indicated in the header line. The first column is marked with a blue background while the added columns are marked with a green background (see below). To remove an added column, use the button (see below).
The button opens a window displaying the
information about the column. For certain datatypes additional options
are included (see Pre- and Postfix).
The data imported may be transformed, e.g. to adapt them to a format demanded by the database. A short introduction is provided in a video tutorial.
Click on the
button to open a window as shown
below.
Here you can enter 4 types of transformation that should be applied to your data: Cut out parts, Translate contents from the file, apply regular expressions (RegEx) or Replace text in the data from the file. All transformations will be applied in the sequence in which they were entered. Finally, if a prefix and/or a postfix are defined, these will be added after the transformation. To remove a transformation, select it and click on the button.
With the cut transformation you can restrict the data taken from the file to a part of the text in the file. This is done by splitters and the position after splitting. In the example below, the month of a date should be extracted from the information. To achieve this, the splitter '.' is added and then the position is set to 2. You can change the direction of the sequence with the button Seq, starting at the first position or starting at the last position. Click on the button Test the transformation to see the result of your transformation.
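As a sketch of the cut transformation from the example above (split at the splitter and take the part at the given position; names are illustrative):

    def cut(value, splitter=".", position=2, from_last=False):
        parts = value.split(splitter)
        if from_last:                   # direction changed via the Seq button
            parts.reverse()
        return parts[position - 1]

    cut("24.VI.1999")   # -> "VI" (the month part of the date)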
The translate transformation translates values from the file into values entered by the user. In the example above, the values of the month cut out from the date string should be translated from roman into numeric notation. To do this, click on the button to add a translation transformation (see below). To list all different values present in the data, click on the button. A list as shown below will be created. You may as well use the and buttons to add or remove values from the list, or the button to clear the list. Then enter the translations as shown below. Use the save button to save entries and the Test the transformation button to see the result.
To load a predefined list for the transformation, use the button. A window as shown below will open. Choose the encoding of the data in your translation source, check whether the first line contains the column definition, and click on the button to open a file. Click OK to use the values from the file for the translation.
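A sketch of the translation from the example (roman month notation from the cut step above is translated into numeric notation; names are illustrative):

    translation = {"I": "1", "II": "2", "III": "3", "IV": "4", "V": "5",
                   "VI": "6", "VII": "7", "VIII": "8", "IX": "9",
                   "X": "10", "XI": "11", "XII": "12"}

    def translate(value):
        # Values without an entry in the list are passed through unchanged.
        return translation.get(value, value)

    translate("VI")   # -> "6"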
The RegEx transformation using regular expressions will transform the values according to the entered Regular expression and Replace by values. For more details please see documentation about regular expressions.
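As a sketch, assuming the Regular expression and Replace by values shown here (the pattern, which rearranges a date, is only an example):

    import re

    def regex_transform(value,
                        pattern=r"(\d{1,2})\.(\d{1,2})\.(\d{4})",
                        replace_by=r"\3-\2-\1"):
        # Apply the regular expression and the Replace by value.
        return re.sub(pattern, replace_by, value)

    regex_transform("31.12.1999")   # -> "1999-12-31"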
The replacement transformation replaces any text in the data by a text
specified by the user. In the example shown below, the text "." is
replaced by "-".
The Σ calculation transformation performs a calculation on numeric values, dependent on an optional condition. In the example below, 2 calculations were applied to convert 2-digit values into 4-digit years.
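A sketch of the example: two conditional calculations turn 2-digit values into 4-digit years (the pivot value 30 is an assumption for illustration):

    def to_four_digit_year(value):
        year = int(value)
        if year >= 100:          # already a 4-digit year, leave unchanged
            return year
        if year > 30:            # first calculation: condition "value > 30"
            return year + 1900   # e.g. 99 -> 1999
        return year + 2000       # second calculation: e.g. 5 -> 2005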
The filter transformation compares the values from the file with a value entered by the user. As a result you can either Import content of column in file or Import a fixed value. To select another column that should be compared, click on the button and choose a column from the file in the window that will open. If the column that should be compared is not the column of the transformation, the number of the column will be shown instead of the symbol. To add further filter conditions, use the button. For the combination of the conditions you can choose between AND and OR.
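Conceptually, a single filter condition corresponds to this sketch (names are illustrative; several conditions would be combined with AND or OR):

    def filter_transform(row, compare_column, compare_value,
                         import_column, fixed_value):
        # Compare the value in the file with the user-entered value.
        if row[compare_column] == compare_value:
            return row[import_column]   # Import content of column in file
        return fixed_value              # Import a fixed value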
This tutorial demonstrates the import of a small file into the database. The following data should be imported (the example file is included in the software): At the end of this tutorial you will have imported several datasets and practiced most of the possibilities provided by the import wizard. The import is done in 2 steps to demonstrate the attachment functionality of the wizard.
Choose Data → Import → Wizard → Import Specimen ... from the menu. A window as shown below will open that will lead you through the import of the data. The window is separated into 3 areas. On the left side, you see a list of possible data-related import steps according to the type of data you choose for the import. On the right side you see the list of currently selected import steps. In the middle part the details of the selected import steps are shown.
As a first step, choose the File from which the data should be imported. The currently supported format is tab-separated text. Then choose the Encoding of the file, e.g. Unicode. The Start line and End line will automatically be set according to your data. You may change these to restrict the data lines that should be imported. The not imported parts of the file are indicated with a gray background as shown below. If the First line contains the column definition, this line will not be imported as well. If your data contain e.g. date information where notations differ between countries (e.g. 31.4.2013 vs. 4.31.2013), choose the Language / Country to ensure a correct interpretation of your data. Finally you can select a prepared Schema (see chapter Schema below) for the import.
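A sketch of how these file settings apply, assuming a tab-separated file (the encoding "utf-16" corresponds to the example "Unicode"; all values are examples, not defaults of the wizard):

    def read_import_file(path, encoding="utf-16", start_line=2, end_line=None,
                         first_line_is_header=True):
        with open(path, encoding=encoding, newline="") as f:
            lines = f.read().splitlines()
        # The column definition line is not imported as data.
        header = lines[0].split("\t") if first_line_is_header else None
        # Restrict the imported data to the range Start line .. End line.
        data = lines[start_line - 1:end_line]
        return header, [line.split("\t") for line in data]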
In the selection list on the left side of the window (see below) all possible import steps for the data are listed according to the type of data you want to import.
Certain tables can be imported in parallel. To add parallels click on
the button (see below). To remove parallels, use the
button. Only selected ranges will appear in the
list of the steps on the right (see below).
To import information for logging columns, such as who created and changed the data, click on the button in the header line. This will include additional substeps for every step containing the logging columns (see below). If you do not import these data, they will be filled automatically with default values like the current time and user.
You can either import your data as new data or Attach them to data in the database. Select the import step Attachment from the list. All tables that are selected and contain columns at which you can attach data are listed (see below). Either choose the first option Import as new data or one of the attachment columns offered, like SeriesCode in the table Series in the example below.
If you select a column for attachment, this column will be marked with a blue background (see below and chapter Table data).
You can either import your data as new data or Merge them with data in the database. Select the import step Merge from the list. For every table you can choose between Insert, Merge, Update and Attach (see below).
The Insert option will import the data from the file independently of existing data in the database. The Merge option will compare the data from the file with those in the database according to the Key columns (see below). If no matching data are found in the database, the data from the file will be imported, otherwise the data will be updated.
The Update option will compare the data
from the file with those in the database according to the
Key columns. Only matching data found in the
database will be updated.
The Attach option will compare the data from the file with those in the database according to the Key columns. The found data will not be changed, but will be used as reference data in dependent tables.
To set the source for the columns in the file, select the step of a table listed underneath the Merge step. All columns available for importing data will be listed in the central part of the window. In the example shown below, the first column is used to attach the new data to data in the database.
A reminder in the header line will show you what actions are still needed to import the data into the table:
The handling of the columns is described in the chapter Columns.
To test if all requirements for the import are met, use the Testing step. You can use a certain line in the file for your test and then click on the Test data in line: button. If there are still unmet requirements, these will be listed in a window as shown below.
If finally all requirements are met, the testing function will try to write the data into the database and display any errors that occurred as shown below. All datasets marked with a red background produced an error.
To see the list of all errors, double click on the error list window in the header line (see below).
If finally no errors are left, your data are ready for import. The colors in the table nodes in the tree indicate the handling of the datasets: INSERT, MERGE, UPDATE, No difference, Attach, No data. The colors of the table columns indicate whether a column is decisive, a key column or an attachment column.
In case you get an error because you cannot specify the analysis, you may have to enter an analysis. Choose Administration - Analysis from the menu. If no analysis is available, create a new analysis and link it to your project and the taxonomic groups that are imported. For more details see the chapter Analysis.
With the last step you can finally start to import the data into the database. If you want to repeat the import with the same settings and data of the same structure, you can save a schema of the current settings.
Lines that could not be imported will be marked with a red background while imported lines are marked green (see below).
If you want to save lines that produce errors during the import in a separate file, use the Save failed lines option. The protocol of the import will contain all settings according to the used schema and an overview containing the number of inserted, updated, unchanged and failed lines (see below).