Wizard Tutorial Step Epilogue

Epilogue

When you close the import wizard and start a query for descriptions of project “Agricultural survey”, you will find the three datasets and the imported descriptor data (see image below).

Finally, two more aspects of the import wizard are worth discussing in retrospect. The first concerns the mapping of external and internal keys and the role of the import session. The second takes a closer look at the role of the “ID” columns during import.

Mapping of external and internal keys

When opening the import wizard, you have to select or create an import session. Imports into Diversity Descriptions usually require at least two import operations, e.g. one for descriptors and one for descriptions. The description data reference descriptors or categorical states. Within the database, these relations are built on numeric keys that the database generates when the corresponding objects are created. In the external data files, the relations are usually built on numbers assigned by the user (“QuestionNumber”) or on the entity names. Consequently, the external key of each imported object has to be mapped to the internal key assigned by the database, so that later import operations can resolve their references.

The import session stores the external and internal key values in separate database tables and thereby ties the different import operations together. Each import session is assigned to one project, but several import sessions may be created for the same project. To view the mapping data, open the menu item Data -> Import -> Wizard -> Organize sessions …, select the session and click the button Mapping (see image below).
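The role of the import session can be pictured as a lookup table from external keys to internal IDs that survives between import operations. The following Python sketch is not Diversity Descriptions code: the class `ImportSession`, its methods, and the counter that simulates the database's ID generation are hypothetical, and the question numbers are invented.

```python
from itertools import count


class ImportSession:
    """Hypothetical model of an import session bound to one project."""

    def __init__(self, project: str) -> None:
        self.project = project
        # (object type, external key) -> internal numeric ID
        self.mapping: dict[tuple[str, str], int] = {}
        # Stand-in for the database, which assigns IDs when objects are created
        self._id_generator = count(1)

    def register(self, object_type: str, external_key: str) -> int:
        """Create an object and record which internal ID its external key received."""
        internal_id = next(self._id_generator)
        self.mapping[(object_type, external_key)] = internal_id
        return internal_id

    def resolve(self, object_type: str, external_key: str) -> int:
        """Find the internal ID for a key imported by an earlier operation."""
        return self.mapping[(object_type, external_key)]


session = ImportSession("Agricultural survey")

# Import operation 1: descriptors, keyed by their (invented) question numbers
for question_number in ["1", "2", "3"]:
    session.register("Descriptor", question_number)

# Import operation 2: a description value refers to question number "2"
descriptor_id = session.resolve("Descriptor", "2")
```

In this sketch an unknown external key raises `KeyError`, which stands for a reference that no earlier import operation has created.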

Selecting import columns for the “ID” fields

In addition to the tutorial steps, let us take a closer look at the role of the “ID” fields. In principle, the most important IDs during import are the Descriptor ID and the Categorical state ID of the descriptor import. To decide which file column to use for these values, it is important to know how these elements are referenced in the other files.

For the descriptor import, take a look at the description data table (see above), which is part of the tutorial example. The descriptor is referenced by column “QuestionNumber”, which matches the column of the same name in the descriptor data table (see below). The natural approach is therefore to use this column as input for the Descriptor ID during the descriptor import. In most practical cases the descriptors will have such a numbering column, which is then used in the referencing table. There is, however, more variety in the way the categorical states are listed in the descriptor data file and referenced by the description data file.
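The decision described above, whether a candidate column of the descriptor file covers every reference in the description file, can be sketched in Python. The comma-separated layout, the function names and the sample rows are assumptions for illustration; only the column name `QuestionNumber` comes from the tutorial.

```python
import csv
import io

# Invented excerpts of the two files (comma-separated for this sketch)
DESCRIPTOR_FILE = """QuestionNumber,Descriptor
1,Descriptor A
2,Descriptor B
"""

DESCRIPTION_FILE = """Description,QuestionNumber,Value
Field 1,1,Yes
Field 1,2,Others
Field 2,3,No
"""


def read_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def unresolved_references(id_values: set[str], references: list[str]) -> list[str]:
    """Return references that no value of the candidate ID column can satisfy."""
    return [ref for ref in references if ref not in id_values]


# Candidate for the Descriptor ID: the QuestionNumber column of the descriptor file
candidate = {row["QuestionNumber"] for row in read_rows(DESCRIPTOR_FILE)}
references = [row["QuestionNumber"] for row in read_rows(DESCRIPTION_FILE)]

missing = unresolved_references(candidate, references)
```

An empty result would indicate that the column is suitable as Descriptor ID; here the invented reference "3" has no matching descriptor row.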

In the tutorial, the first complication is that all possible states of a descriptor are concatenated, separated by semicolons, into a single column of the descriptor data file. This requires some effort during transformation, because the states have to be split into single values. The question is: what is the Categorical state ID? The answer can be found in the description data table above, because there the state name is explicitly used as the reference. Therefore, the state name must also be used as the Categorical state ID during the descriptor import.
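The splitting step can be sketched in Python. The function names, the record field names and the sample cell are hypothetical; the semicolon separator and the rule that the state name doubles as the Categorical state ID come from the text above.

```python
def split_states(cell: str, separator: str = ";") -> list[str]:
    """Split a concatenated state list into single state names, ignoring blanks."""
    return [part.strip() for part in cell.split(separator) if part.strip()]


def state_records(question_number: str, states_cell: str) -> list[dict[str, str]]:
    """Produce one record per state; the state name serves as Categorical state ID."""
    return [
        {"Descriptor ID": question_number, "Categorical state ID": name, "State": name}
        for name in split_states(states_cell)
    ]


# Invented descriptor row: question number "2" with three states in one column
records = state_records("2", "Clay; Sand;Loam;")
```

Stripping whitespace and dropping empty parts keeps stray spaces or a trailing separator from producing bogus states.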

In Diversity Descriptions, categorical state names must be unique within their descriptor, but different descriptors may have states with the same name. In our example this occurs with the two boolean descriptors (states “Yes” and “No”) and with the state value “Others”, which is used by two descriptors. Because the state name alone then does not identify a state unambiguously, it is generally recommended to specify the descriptor when importing categorical summary data, as demonstrated in the tutorial.
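Why the descriptor has to be part of the state reference can be shown with a small Python sketch. The descriptor numbers and the ID assignment are invented; the state names “Yes”, “No” and “Others” are those mentioned above. Keying states by name alone merges states of different descriptors, whereas a (descriptor, state name) pair stays unique.

```python
from collections import defaultdict

# (descriptor, state name) pairs; descriptor numbers are invented
STATES = [
    ("4", "Yes"), ("4", "No"),  # first boolean descriptor
    ("5", "Yes"), ("5", "No"),  # second boolean descriptor
    ("7", "Others"), ("9", "Others"),
]


def ambiguous_state_names(states: list[tuple[str, str]]) -> set[str]:
    """Return state names that occur under more than one descriptor."""
    descriptors_per_name: dict[str, set[str]] = defaultdict(set)
    for descriptor, name in states:
        descriptors_per_name[name].add(descriptor)
    return {name for name, descs in descriptors_per_name.items() if len(descs) > 1}


# Keyed by name only: later entries overwrite earlier ones
by_name = {name: i for i, (_, name) in enumerate(STATES, start=1)}
# Keyed by descriptor and name: every state keeps its own ID
by_pair = {key: i for i, key in enumerate(STATES, start=1)}
```

With name-only keys, six states collapse into three entries; the pair keys keep all six apart.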