Wizard Advanced Tutorial

Import wizard - tutorial for advanced functions

The second part of the import wizard tutorial is dedicated to some advanced functions of the import wizard. When data are imported from the file formats DELTA or SDD, no import mapping information is stored, because all logical references are completely satisfied within the data files. The starting point of this tutorial, which was taken from a real-life example, is a database imported from a DELTA file. For the datasets a lot of pictures are available on a web server. A web application reads the data from the original database (from which the DELTA file was generated) and gets the information about available pictures from a second database to display both in a browser. From the original database several tables were extracted, and now the pictures shall be appended to the imported data.

 

Overview of the data tables and necessary import steps  

Step 1 - Preparations: Data import from DELTA file and new import session  

Step 2 - Import of categorical state mapping 

Step 3 - Import of descriptor mapping 

Step 4 - Import of description mapping 

Step 5 - Import of resources for descriptors 

Step 6 - Import of resources for categorical states 

Step 7 - Import of resources for descriptions 

Step 8 - Import of resource variants 

 

Subsections of Wizard Advanced Tutorial

Wizard Advanced Tutorial Step 1

Step 1 - Preparations: Data import from DELTA file and new import session

Choose Data -> Import -> Import DELTA … (see Import DELTA file) from the menu and import the DELTA file to project “Deemy” (see below). If the original database contains special characters, e.g. the German letters “ä”, “ö” or “ü”, it is recommended to specify the export character set “Unicode” or “UTF” if the application allows that. If the character set “ANSI” or “ASCII” was used, you may try the corresponding encoding setting to get a satisfactory import result. The option “Accept comma as decimal separator” was checked, because the export was done on a German computer system, where a value like “3.14” is exported as “3,14”.
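The two import options mentioned above can be illustrated with a minimal Python sketch. It is not part of the application; the function names `decode_export` and `parse_decimal` are hypothetical, and the sketch assumes that “ANSI” on a German Windows system means code page 1252.

```python
def decode_export(raw: bytes) -> str:
    """Decode exported text, trying UTF-8 first and falling back to Windows-1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: "ANSI" on a German Windows system means code page 1252.
        return raw.decode("cp1252")


def parse_decimal(text: str, accept_comma: bool = True) -> float:
    """Parse a number, optionally accepting ',' as the decimal separator."""
    cleaned = text.strip()
    if accept_comma:
        cleaned = cleaned.replace(",", ".")
    return float(cleaned)


# The same text decodes correctly from either character set.
print(decode_export("Größe".encode("cp1252")))
print(decode_export("Größe".encode("utf-8")))
# A value exported on a German system.
print(parse_decimal("3,14"))
```

Without the comma option, `parse_decimal("3,14", accept_comma=False)` raises a `ValueError`, which corresponds to an import that rejects the German number format.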

Close the window above and choose Data -> Import -> Wizard -> Organize session … from the menu. A window as shown below will open. Click the New button to create a new import session, select project “Deemy” and enter a session description. Finally click the Save button to store the data (see below).

 

When you now click on button Mapping you can see that no mapping data are available (see below).

 

 

Next: Step 2 - Import of categorical state mapping 

Wizard Advanced Tutorial Step 2

Step 2 - Import of categorical state mapping

In the Import session form choose Import mapping -> Descriptor … from the menu. A window as shown below opens that will lead you through the import of the descriptor mapping data. 

The only available import step Descriptor Mapping is already selected at the left side of the window. Now choose the File from which the data should be imported and open file “Deemy_CHAR.txt”. The chosen encoding ANSI of the file is sufficient. The file column “CharName” contains the descriptor names and file column “CID” the external ID needed for the import of the categorical state mapping (see below).

In the step table at the right side you find the import step Descriptor Mapping. Click on it and in the center window the assignment data for the internal “object_id” and the “external_key” are displayed. In column “object_id” click on the corresponding button to make this the decisive column, then click on From file to select the column “CharName” as data source. In column “external_key” click on From file to select the column “CID” as data source. After that the columns should look as shown below.

Remark: In the import wizards for the import mapping, “object_id” always represents the internal ID of the database. The matching database entry is searched by comparing the label of the database entry to the selected file column. If there are several descriptors (or descriptions) with identical names, the import will generate errors. For categorical states a special handling is available if the state names are not unique.
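The matching rule from the remark can be sketched in Python as follows. This is only an illustration of the idea, not the wizard's implementation: `build_mapping`, the descriptor labels and the internal IDs are hypothetical, and file lines are simply counted from 1.

```python
from collections import defaultdict


def build_mapping(db_labels, file_rows, label_col, key_col):
    """Map external keys to internal IDs by comparing labels.

    db_labels: dict of internal ID -> label as stored in the database.
    Returns (mapping, errors); a label must match exactly one database entry.
    """
    ids_by_label = defaultdict(list)
    for internal_id, label in db_labels.items():
        ids_by_label[label].append(internal_id)

    mapping, errors = {}, []
    for line_no, row in enumerate(file_rows, start=1):
        candidates = ids_by_label.get(row[label_col], [])
        if len(candidates) == 1:
            mapping[row[key_col]] = candidates[0]
        else:
            # Zero matches or several matches: the line cannot be mapped.
            errors.append((line_no, row[label_col], len(candidates)))
    return mapping, errors


# Hypothetical database content and file lines.
db_labels = {101: "Mantle colour", 102: "Ramification", 103: "Ramification"}
file_rows = [
    {"CharName": "Mantle colour", "CID": "1"},
    {"CharName": "Ramification", "CID": "3"},
    {"CharName": "Unknown", "CID": "9"},
]
mapping, errors = build_mapping(db_labels, file_rows, "CharName", "CID")
print(mapping)
print(errors)
```

In this sketch the second line fails because its label occurs twice, and the third because its label does not occur at all.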

 

Testing

To test if all requirements for the import are met use the Testing step. You can use a certain line in the file for your test and then click on the Test data in line: button. If there are still unmet requirements, these will be listed in a window. In our example no error occurred and the test for the first data line is shown below.

 

Import

With the last step you can start to import the data into the database. If you want to repeat the import with the same settings and data of the same structure, you can save a schema of the current settings (see below).

 

 

Append categorical state mapping

Close the import form for descriptors. In the Import session form choose Import mapping -> Categorical state … from the menu and open file “Deemy_CS.txt” (see below). 

The only available import step CategoricalState Mapping is already selected at the left side of the window. In the step table at the right side you find the import step CategoricalState Mapping, too. Click on it and in the center window the assignment data for the internal “object_id”, the “parent_key” and the “external_key” are displayed. In column “object_id” click on the corresponding button to make this the decisive column, then click on From file to select the column “CharStateName” as data source. In column “parent_key” you have to specify the parent of the categorical state, i.e. the external descriptor ID. Therefore click on From file to select the column “CID” as data source. In column “external_key” click on From file to select the column “StateID” as data source. After that the columns should look as shown below.

In the source database of this example not only the categorical states as known in DiversityDescriptions are present, but also some “pseudo states” that represent statistical measures of quantitative descriptors or the data status value “not applicable”. The real categorical states can be recognized by a numeric value in file column “CS”. In any case the import wizard checks whether a categorical state with the label specified in file column “CharStateName” exists in the database. Therefore let’s do a first test for some selected file lines.

 

Testing

To test if all requirements for the import are met use the Testing step. You can use a certain line in the file for your test and then click on the Test data in line: button. If there are still unmet requirements, these will be listed in a window. Perform the import test for file lines 2, 13 and 12 (see below).

 

The file line 2 refers to parent “CID=1”, which belongs to a text descriptor. The pseudo state “(internal)” was not found as a categorical state in the database, therefore no import is performed for this file line.

 

The file line 13 refers to parent “CID=3”, which belongs to a categorical descriptor. The categorical state “monopodial-pinnate” was found exactly once in the database, therefore the import test was successful.

 

The file line 12 refers to parent “CID=3”, which belongs to a categorical descriptor. But the categorical state “absent” was found 152 times in the database, so it was not possible to identify the correct categorical state. The error message already gives a hint how to solve the problem: To get an unambiguous match, the (external) descriptor ID must be specified additionally.

Select the import step CategoricalState Mapping and click on the button at the end of line “object_id”. Select file column “CID”, which contains the reference to the descriptor, and enter the separator character | (pipe symbol) in field Pre.: of the new line. Additionally click on the transformation button in the first line of “object_id”. In the transformation window insert one replacement: Replace <br> by <br />. This transformation is necessary, because the formatting tag “<br>” is converted to the standardized format “<br />” during export from the original database and import from DELTA. You can check that transformation by the test functions for lines 1860 and 3555. After that the column should look as shown below.

The import test with file line 12 now gives a positive result as shown below.
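The combined “object_id” value configured above (state name, pipe symbol, external descriptor ID) can be pictured with the following Python sketch. The functions `object_key` and `find_state`, all internal IDs and the sample state “short<br />long” are hypothetical; how the wizard resolves the key internally is an assumption made for illustration only.

```python
def object_key(row: dict) -> str:
    """Build the lookup value: transformed state name, '|', external descriptor ID."""
    state_name = row["CharStateName"].replace("<br>", "<br />")
    return state_name + "|" + row["CID"]


def find_state(states, descriptor_by_cid, row):
    """Return the internal state ID matching the combined key.

    states: list of (state_id, descriptor_id, label) tuples from the database.
    descriptor_by_cid: descriptor mapping, external "CID" -> internal descriptor ID.
    """
    name, _, cid = object_key(row).rpartition("|")
    descriptor_id = descriptor_by_cid[cid]
    matches = [sid for sid, did, label in states
               if did == descriptor_id and label == name]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} matches for {name!r} in descriptor {cid}")
    return matches[0]


# Hypothetical database content: "absent" exists in more than one descriptor.
states = [
    (11, 203, "absent"),
    (12, 203, "monopodial-pinnate"),
    (27, 205, "absent"),
    (31, 205, "short<br />long"),
]
descriptor_by_cid = {"3": 203, "5": 205}

print(object_key({"CharStateName": "absent", "CID": "3"}))
print(find_state(states, descriptor_by_cid, {"CharStateName": "absent", "CID": "3"}))
print(find_state(states, descriptor_by_cid, {"CharStateName": "short<br>long", "CID": "5"}))
```

The state name alone would match two entries here; only together with the descriptor reference does the lookup become unambiguous.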

 

Import

With the last step you can start to import the data into the database. If you want to repeat the import with the same settings and data of the same structure, you can save a schema of the current settings. The imported data lines are marked green (see below).

 

Next: Step 3 - Import of descriptor mapping 

Wizard Advanced Tutorial Step 3

Step 3 - Import of descriptor mapping

Close the import form for the categorical state mapping. In the Import session form choose Clear mapping -> Descriptor from the menu (see below) and answer the following question with “Yes”. This mapping is based on data column “CID” and was needed in the previous step to append the categorical state mapping data. For the picture import the descriptor mapping based on data column “CharID” is required.

In the Import session form choose Import mapping -> Descriptor … from the menu and open file “Deemy_CHAR.txt”. The file column “CharName” contains the descriptor names and file column “CharID” the foreign ID (see below).

In the step table at the right side you find the import step Descriptor Mapping. Click on it and in the center window the assignment data for the internal “object_id” and the “external_key” are displayed. In column “object_id” click on the corresponding button to make this the decisive column, then click on From file to select the column “CharName” as data source. In column “external_key” click on From file to select the column “CharID” as data source. After that the columns should look as shown below.

 

Testing

To test if all requirements for the import are met use the Testing step. You can use a certain line in the file for your test and then click on the Test data in line: button. If there are still unmet requirements, these will be listed in a window. In our example no error occurred and the test for the first data line is shown below.

 

Import

With the last step you can start to import the data into the database. If you want to repeat the import with the same settings and data of the same structure, you can save a schema of the current settings (see below).

 

Next: Step 4 - Import of description mapping 

Wizard Advanced Tutorial Step 4

Step 4 - Import of description mapping

Close the import form for descriptors. In the Import session form choose Import mapping -> Description … from the menu and open file “Deemy_ITEM.txt” (see below). 

The only available import step Description Mapping is already selected at the left side of the window. In the step table at the right side you find the import step Description Mapping, too. Click on it and in the center window the assignment data for the internal “object_id” and the “external_key” are displayed. In column “object_id” click on the corresponding button to make this the decisive column, then click on From file to select the column “ItemName” as data source. In column “external_key” click on From file to select the column “ItemID” as data source. After that the columns should look as shown below.

 

Testing

To test if all requirements for the import are met use the Testing step. You can use a certain line in the file for your test and then click on the Test data in line: button. If there are still unmet requirements, these will be listed in a window. In our example no error occurred and the test for the first data line is shown below.

 

 

Import

With the last step you can start to import the data into the database. If you want to repeat the import with the same settings and data of the same structure, you can save a schema of the current settings. The imported data lines are marked green (see below).

In this example the failed lines are caused by item names that occur twice in the database. This problem can be fixed by renaming the ambiguous entries in the database and the import file, e.g. to “Alnirhiza cystidiobrunnea + Alnus 1” and “Alnirhiza cystidiobrunnea + Alnus 2”, and to “Lactarius omphaliformis Romagn. + Alnus 1” and “Lactarius omphaliformis Romagn. + Alnus 2”.
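The renaming scheme can be prepared outside the application, for instance with a small Python helper like the following sketch. The function name `suggest_unique_names` and the item “Some unique item” are hypothetical; the numbered names still have to be applied manually to both the database and the import file.

```python
from collections import Counter


def suggest_unique_names(names):
    """Append a running number to every name that occurs more than once."""
    totals = Counter(names)
    seen = Counter()
    result = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            result.append(f"{name} {seen[name]}")
        else:
            result.append(name)
    return result


item_names = [
    "Alnirhiza cystidiobrunnea + Alnus",
    "Some unique item",  # hypothetical entry without a duplicate
    "Alnirhiza cystidiobrunnea + Alnus",
]
for new_name in suggest_unique_names(item_names):
    print(new_name)
```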

 

Next: Step 5 - Import of resources for descriptors 

Wizard Advanced Tutorial Step 5

Step 5 - Import of resources for descriptors

Close the import wizard for the mapping data and the import session window. Now choose Data -> Import -> Wizard -> Import resources -> Descriptor resources … from the menu. A window as shown below will open to select an import session. Select the session for project “Deemy”.

After clicking [OK] the following window opens that will lead you through the import of the descriptor resource data. Open file “Deemy_RSC.txt” (see below).

 

Selecting the data ranges

In the selection list on the left side of the window all possible import steps for the data are listed according to the type of data you want to import. The step Descriptor is already selected, additionally check the step Descriptor resource (see below).

We attach the descriptor resource values to the descriptors, therefore we will not change anything in the descriptor but will attach data. In import step Attachment at the right side select Descriptor id (see below). 

Select the import step Merge from the list. For Descriptor we select the Attach option because this table shall not be changed; for the other step Insert should already be selected, because a new entry has to be inserted (see below).

In the step table at the right side you find the import steps Descriptor and Descriptor resource and below them the data groups of the import steps. Deselect every column from import step Descriptor except “id”. Mark the “id” column as Key column for comparison during attachment and click on From file to select the column “CharID” as data source. The “id” column of import step Descriptor now looks as shown below.

In the import step Descriptor resource click on Resource ID and in the center window the assignment data for the resource id (“id”) are displayed. Click on the corresponding button to make this the decisive column, then click on From file to select the column “Resource” as data source. After that the column should look as shown below.

Click on Resource name. The center window shows the data column “label”. Click on From file in the “label” line to select file column “Resource”. After the resource number the value in data column “Caption” shall be inserted, included in brackets, if it is present. Click on the button at the end of line “label” and select column “Caption”. Enter  ( (blank and opening bracket) in field Pre.: and ) in field Post.: of the new line. After that the column should look as shown below.

Finally click on Sequence number. In the center window select the data column “display_order”, click on From file and select file column “DisplayOrder” (see below).
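The way the “label” value is assembled from several file columns with Pre.: and Post.: texts can be sketched in Python as shown here. The helper `compose` and the sample values are hypothetical; the sketch assumes, as described above for “Caption”, that a column's Pre.: and Post.: texts are only emitted when the column has a value.

```python
def compose(parts):
    """Concatenate (value, pre, post) triples, skipping parts with an empty value."""
    return "".join(pre + value + post for value, pre, post in parts if value)


# "Resource" column first, then "Caption" wrapped in " (" and ")".
print(compose([("4711", "", ""), ("habit", " (", ")")]))
# Without a caption only the resource number remains.
print(compose([("4711", "", ""), ("", " (", ")")]))
```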

 

Testing

To test if all requirements for the import are met use the Testing step. The test for the first data line is shown below.

 

Import

With the last step you can start to import the data into the database. If you want to repeat the import with the same settings and data of the same structure, you can save a schema of the current settings. There are 86 lines that were not imported due to duplicate entries (see below).

The failed lines are caused by duplicate entries, i.e. the resource was already imported for the descriptor.

 

Next: Step 6 - Import of resources for categorical states 

Wizard Advanced Tutorial Step 6

Step 6 - Import of resources for categorical states

Close the import wizard for the descriptor resources. Now choose Data -> Import -> Wizard -> Import resources -> State resources … from the menu, select the session for project “Deemy”. The following window opens that will lead you through the import of the categorical state resource data. Open file “Deemy_RSC.txt” (see below).

 

Selecting the data ranges

In the selection list on the left side of the window all possible import steps for the data are listed according to the type of data you want to import. Deselect the step Descriptor; it is not needed, since the categorical states have been assigned unambiguous external IDs in step 2. Check the steps Categorical state and State resource (see below).

We attach the state resource values to the categorical states, therefore we will not change anything in the categorical state but will attach data. In import step Attachment at the right side select Categorial state id (see below). 

Select the import step Merge from the list. For Categorical state we select the Attach option because this table shall not be changed; for the other step Insert should already be selected, because a new entry has to be inserted (see below).

In the step table at the right side you find the import steps Categorical state and State resource and below them the data groups of the import steps. Deselect every column from import step Categorical state except “id”. Mark the “id” column as Key column for comparison during attachment and click on From file to select the column “StateID” as data source. The “id” column of import step Categorical state now looks as shown below.

In the import step State resource click on Resource ID and in the center window the assignment data for the resource id (“id”) are displayed. Click on the corresponding button to make this the decisive column, then click on From file to select the column “Resource” as data source. After that the column should look as shown below.

Click on Resource name. The center window shows the data column “label”. Click on From file in the “label” line to select file column “Resource”. After the resource number the value in data column “Caption” shall be inserted, included in brackets, if it is present. Click on the button at the end of line “label” and select column “Caption”. Enter  ( (blank and opening bracket) in field Pre.: and ) in field Post.: of the new line. After that the column should look as shown below.

Finally click on Sequence number. In the center window select the data column “display_order”, click on From file and select file column “DisplayOrder” (see below).

 

Testing

To test if all requirements for the import are met use the Testing step. The test for the second data line is shown below.

 

Import

With the last step you can start to import the data into the database. If you want to repeat the import with the same settings and data of the same structure, you can save a schema of the current settings. There are 561 lines that were not imported due to duplicate entries (see below).

The failed lines are caused by duplicate entries, i.e. the resource was already imported for the categorical state.

 

Next: Step 7 - Import of resources for descriptions 

Wizard Advanced Tutorial Step 7

Step 7 - Import of resources for descriptions

Close the import wizard for the state resources. Now choose Data -> Import -> Wizard -> Import resources -> Description resources … from the menu and select the session for project “Deemy”. The following window opens that will lead you through the import of the description resource data. Open file “Deemy_RSC.txt” (see below).

 

Selecting the data ranges

In the selection list on the left side of the window all possible import steps for the data are listed according to the type of data you want to import. Step Description is already selected. Additionally check step Description resource (see below).

We attach the description resource values to the descriptions, therefore we will not change anything in the description but will attach data. In import step Attachment at the right side select Description id (see below). 

Select the import step Merge from the list. For Description we select the Attach option because this table shall not be changed; for the other step Insert should already be selected, because a new entry has to be inserted (see below).

In the step table at the right side you find the import steps Description and Description resource and below them the data groups of the import steps. Deselect every column from import step Description except “id”. Mark the “id” column as Key column for comparison during attachment and click on From file to select the column “ItemID” as data source. The “id” column of import step Description now looks as shown below.

In the import step Description resource click on Resource ID and in the center window the assignment data for the resource id (“id”) are displayed. Click on the corresponding button to make this the decisive column, then click on From file to select the column “Resource” as data source. After that the column should look as shown below.

Click on Resource name. The center window shows the data column “label”. Click on From file in the “label” line to select file column “Resource”. After the resource number the value in data column “Caption” shall be inserted, included in brackets, if it is present. Click on the button at the end of line “label” and select column “Caption”. Enter  ( (blank and opening bracket) in field Pre.: and ) in field Post.: of the new line. After that the column should look as shown below.

Finally click on Sequence number. In the center window select the data column “display_order”, click on From file and select file column “DisplayOrder” (see below).

 

Testing

To test if all requirements for the import are met use the Testing step. The test for the data line 717 is shown below.

 

Import

With the last step you can start to import the data into the database. If you want to repeat the import with the same settings and data of the same structure, you can save a schema of the current settings. There are 177 lines that were not imported due to duplicate entries (see below).

The failed lines are caused by duplicate entries, i.e. the resource was already imported for the description.

 

Next: Step 8 - Import of resource variants 

Wizard Advanced Tutorial Step 8

Step 8 - Import of resource variants

The import wizards used in step 5 up to step 7 allow appending a resource variant to one resource. Those wizards can be used most efficiently if the data that are needed for the resource table and the resource variant are located in the same file. In our example there is the complication that the resource reference points in the opposite direction compared to the original database. In DiversityDescriptions a resource references e.g. a descriptor, and one or more resource variants reference the resource. In the original database several entities, e.g. descriptors or states, may reference the same picture.

During the import of the resources we used the picture number as the external key of the resources. Together with their parent key, e.g. a descriptor ID, this gives unambiguous entries although the external resource ID alone is ambiguous. Now we want to create a resource variant, containing the URL of the picture, for each resource entry with the same external resource ID.

Since this “multiple” import is not a standard feature of the import wizard, the following description shows a workaround: During the import the first resource entry with a matching “Resource ID” that is not referenced by any Resource variant will be available for data update and appending of a new resource variant. A repeated import with the same settings will find the next resource entry, and so on until all ambiguous resource entries are processed.
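The effect of repeating the import can be simulated with a short Python sketch. It only models the behaviour described above and is not the wizard's code: `import_pass`, `import_until_done`, the sample records and the address `https://example.org/pictures/` are hypothetical, and each picture ID is assumed to occur once in the picture file.

```python
def import_pass(resources, image_rows, base_url):
    """One import run: each file line fills the first matching resource without a variant."""
    imported = 0
    for row in image_rows:
        target = next(
            (r for r in resources
             if r["external_id"] == row["PID"] and r["variant_url"] is None),
            None,
        )
        if target is not None:
            target["variant_url"] = base_url + row["FileName"]
            imported += 1
    return imported


def import_until_done(resources, image_rows, base_url):
    """Repeat the import run until a run imports nothing; return the counts per run."""
    counts = []
    while True:
        imported = import_pass(resources, image_rows, base_url)
        counts.append(imported)
        if imported == 0:
            return counts


# Picture 17 is used by three entities, picture 18 by one (hypothetical data).
resources = [
    {"parent": ("descriptor", 1), "external_id": "17", "variant_url": None},
    {"parent": ("state", 5), "external_id": "17", "variant_url": None},
    {"parent": ("description", 9), "external_id": "17", "variant_url": None},
    {"parent": ("descriptor", 2), "external_id": "18", "variant_url": None},
]
image_rows = [
    {"PID": "17", "FileName": "p17.jpg"},
    {"PID": "18", "FileName": "p18.gif"},
]
run_counts = import_until_done(resources, image_rows, "https://example.org/pictures/")
print(run_counts)
```

The number of runs needed is one more than the largest number of entities sharing a picture, because the final run only confirms that nothing is left to import.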

Close the import wizard for the description resources. Now choose Data -> Import -> Wizard -> Import resources -> Resource variants … from the menu and select the session for project “Deemy”. The following window opens that will lead you through the import of the resource variant data. Open file “Deemy_IMG.txt” (see below).

 

Selecting the data ranges

In the selection list on the left side of the window all possible import steps for the data are listed according to the type of data you want to import. The available steps Update resource and Resource variant are already selected.

We want to update some fields of the resource table with values from the data file and attach the resource variant to the resource. In import step Attachment at the right side select Update resource id (see below). Note: With this import wizard only the update of resources is supported.

Select the import step Merge from the list. For Update resource we keep the Merge option because this table shall be updated, for the other step Insert should already be selected, because a new entry has to be inserted (see below).

In the import step Update resource click on Resource ID and in the center window the assignment data for the resource id (“id”) are displayed. Click on the corresponding button to make this the decisive column. Mark the “id” column as Key column for comparison during attachment and click on From file to select the column “PID” as data source. After that the column should look as shown below.

Click on Resource name. The center window shows the data column “label” and “detail”. Deselect the “label” entry and select “detail”. Click on From file to select the column “SourceTitle” as data source and enter Source:  in field Pre.: (double-click in the field to open a separate edit window). Now click on the button at the end of line “detail”, select file column “Volume” and enter , vol.  in field Pre.:. Repeat the last step for file columns “Pages” (, p. ) and “ReferenceNotes” (, notes: ). After that the column should look as shown below. 

Click on Resource rights and in the center window the assignment data for the resource rights are displayed. Select “rights_text”. Click on From file to select the column “Author” as data source and enter ©  (Alt+0169 and a blank) in field Pre.:. Now click on the button at the end of line “rights_text”, select file column “DateYear” and enter ,  in field Pre.:. After that the column should look as shown below. 

In the import step Resource variant click on Resource link. The center window shows the data column “url”. Click on the corresponding button to make this the decisive column and on From file in the “url” line to select file column “FileName”. Double-click on the text box after Pre.: to open a separate edit window. Here enter the web address of the picture server where the files are located and confirm with “OK”. After that the column should look as shown below.

Click on Variant type. In the center window select the data column “variant_id”, click on For all: and select the value “good quality” (see below).

Click on the import step Resource variant to find some ungrouped fields. In the center window select the data column “pixel_width”, click on From file and select the file column “WidthD”. Now select the data column “pixel_height”, click on From file and select the file column “HeightD”. Finally select the data column “mime_type”, click on From file and select the file column “FileName”. Click on the transformation button to define a transformation. In the transformation window click on the cut transformation, enter Position: 2 and enter the splitter character . (period) to extract the file extension. Now insert a translation table and insert the values contained in the file column. “gif” shall be converted to image/gif, “jpg” will become image/jpeg (see below).

After that the columns should look as shown below.
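The transformation chain for “mime_type” (cut at the period, take the second part, translate it) corresponds roughly to the following Python sketch. The function name and the sample file names are hypothetical; passing unknown extensions through unchanged is an assumption of this sketch, not documented behaviour of the wizard.

```python
# Translation table as entered in the transformation window.
MIME_BY_EXTENSION = {"gif": "image/gif", "jpg": "image/jpeg"}


def mime_type_from_filename(file_name: str) -> str:
    """Split at '.', take part number 2 (counted from 1) and translate it."""
    parts = file_name.split(".")
    extension = parts[1] if len(parts) > 1 else ""
    # Assumption: values missing from the translation table are left unchanged.
    return MIME_BY_EXTENSION.get(extension, extension)


print(mime_type_from_filename("img0001.jpg"))
print(mime_type_from_filename("plate.gif"))
```

Because the cut takes the second part, a file name that contains more than one period would not yield its real extension with this setting.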

 

Testing

To test if all requirements for the import are met use the Testing step. The test for the first data line is shown below.

 

Import

With the last step you can start to import the data into the database. If you want to repeat the import with the same settings and data of the same structure, you can save a schema of the current settings. As mentioned in the introduction, the import step has to be repeated until no further resource variants are imported. At the first run 789 lines were imported (see below).

At the second run, started by another click on Start import, 152 lines were imported (see below).

Finally, at the seventh run no further line is imported (see below).

 

Wizard Advanced Tutorial Step Overview

Overview of the data tables and necessary import steps

From the original database several tables have been extracted that contain the descriptor names, categorical state names and description names, together with their internal IDs in the foreign database. Additionally there is a table that assigns picture IDs to the IDs of descriptors, categorical states and descriptions. The last table connects the picture IDs to file names. In DiversityDescriptions resources are represented by the table “Resource”, which holds some general information and is linked to descriptors, categorical states or descriptions. Table “Resource variant” holds the URL of the resources and each table row is assigned to one entry in table “Resource”.

Find below a part of the table “Deemy_RSC.txt”, which corresponds quite well to the table “Resource” in DiversityDescriptions. It references either a description (“ItemID”), a descriptor (“CharID”) or a categorical state (“StateID”).

The value in column “Resource” corresponds to column “PID” of the table “Deemy_IMG.txt” (see below), where the picture file name is specified. Since all pictures are accessible over a URL containing that file name, this table can be used for import to data table “Resource variant” in DiversityDescriptions. 
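The relation between the two tables can be pictured as a simple join on “Resource” = “PID”. The Python sketch below uses the column names from the text; the function name, the sample rows and the assumption that unused reference columns are empty strings are hypothetical.

```python
def join_resources_with_images(rsc_rows, img_rows):
    """Attach the picture file name to every resource line via Resource = PID."""
    file_by_pid = {row["PID"]: row["FileName"] for row in img_rows}
    joined = []
    for row in rsc_rows:
        # Exactly one of the three reference columns is expected to be filled.
        owner = next(
            ((col, row[col]) for col in ("ItemID", "CharID", "StateID") if row.get(col)),
            None,
        )
        joined.append({
            "owner": owner,
            "resource": row["Resource"],
            "file": file_by_pid.get(row["Resource"]),
        })
    return joined


# Hypothetical sample lines of "Deemy_RSC.txt" and "Deemy_IMG.txt".
rsc_rows = [
    {"ItemID": "", "CharID": "30", "StateID": "", "Resource": "17"},
    {"ItemID": "5", "CharID": "", "StateID": "", "Resource": "17"},
    {"ItemID": "", "CharID": "", "StateID": "301", "Resource": "18"},
]
img_rows = [
    {"PID": "17", "FileName": "p17.jpg"},
    {"PID": "18", "FileName": "p18.gif"},
]
joined = join_resources_with_images(rsc_rows, img_rows)
for entry in joined:
    print(entry)
```

The sample shows picture 17 being referenced by two different entities, which is the situation that makes the resource variant import more involved.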

To import the picture data, first the data in table “Deemy_RSC.txt” must be appended to the existing descriptors, categorical states and descriptions. Then the data from table “Deemy_IMG.txt” must be appended to the resource entries. Since the basic data were imported from a DELTA file, the mapping information needed to append the resource data has not been stored. Therefore the mapping information must first be imported from three additional tables.

 

Mapping data

To allow appending of resource data to the existing database objects, we first must create the mapping information of the external IDs of the foreign database to the actual IDs in DiversityDescriptions. Find below the table “Deemy_Char.txt”, which contains the descriptor name (“CharName”), the internal “CharID” and an external “CID”. 

For the picture import each descriptor must be mapped to its “CharID”, which can be done by a special mapping import available in the Import session form. When we now take a look at the “Deemy_CS.txt” (see below), which contains the categorical state data, we discover a problem: The categorical states contain the required “StateID”, but they are connected to their descriptors by the value “CID”, not “CharID”.

This problem can be solved by importing the descriptor mapping twice: First the descriptor mapping is imported by using the “CID” and the categorical states are appended to the descriptors. Then the descriptor mapping is cleared and imported again, this time using the final value from column “CharID”.
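The order of the two descriptor mapping imports can be sketched in Python as follows. This is a simplified model under stated assumptions: the function names, descriptor names and all ID values are hypothetical, and every name is assumed to match exactly one database entry.

```python
def descriptor_mapping(char_rows, db_descriptor_ids, key_col):
    """Map the external key in key_col to the internal descriptor ID, matched by name."""
    return {row[key_col]: db_descriptor_ids[row["CharName"]] for row in char_rows}


def state_mapping(cs_rows, db_state_ids, descriptor_by_cid):
    """Map "StateID" to the internal state ID, resolving the parent via "CID"."""
    return {
        row["StateID"]: db_state_ids[(descriptor_by_cid[row["CID"]], row["CharStateName"])]
        for row in cs_rows
    }


# Hypothetical database content: names -> internal IDs.
db_descriptor_ids = {"Ramification": 203, "Mantle surface": 205}
db_state_ids = {(203, "absent"): 11, (205, "absent"): 27}

char_rows = [
    {"CharName": "Ramification", "CharID": "30", "CID": "3"},
    {"CharName": "Mantle surface", "CharID": "50", "CID": "5"},
]
cs_rows = [
    {"StateID": "301", "CID": "3", "CharStateName": "absent"},
    {"StateID": "501", "CID": "5", "CharStateName": "absent"},
]

# First import: descriptor mapping by "CID", used only to resolve the states.
by_cid = descriptor_mapping(char_rows, db_descriptor_ids, "CID")
states = state_mapping(cs_rows, db_state_ids, by_cid)

# The descriptor mapping is then cleared and imported again by "CharID".
by_char_id = descriptor_mapping(char_rows, db_descriptor_ids, "CharID")

print(states)
print(by_char_id)
```

After the second import the state mapping from the first pass remains valid, while the descriptors are addressable by the “CharID” values that the picture table uses.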

The last table is “Deemy_Item.txt”, which contains the mapping information for the descriptions. Here the data column “ItemID” must be mapped to the descriptions (see below).

 

Next: Step 1 - Preparations: Data import from DELTA file and new import session