Cache Database Transfer

Transfer of the data

To transfer the data you have several options:

  • Single transfer: Transfer the data of a single project
  • Transfer via bcp: Transfer the data via bcp
  • Bulk transfer: Transfer the data of all projects and sources selected for the schedule-based transfer

With the respective button you can decide whether the data should be checked for updates. If this option is active, the program will compare the contents and decide if a transfer is needed. A needed transfer is indicated by a red border of the transfer button. If you transferred only a part of the data, this is indicated by a thin red border for the current session. The context menu entry View differences of the button will show the accession numbers of the datasets with changes since the last transfer (see below).

 

Embargo

If an embargo has been defined in DiversityProjects, a message will be shown (see below) and you will get a warning if you start a transfer. The automatic transfer for projects with an embargo will be blocked.

 

Competing transfer

If a competing transfer is active for the same step, this will be indicated as shown below. While this transfer is active, any further transfer for this step will be blocked.

If this competing transfer is due to e.g. a crash and is no longer active, you have to remove the block to proceed with the transfer of the data. To do so you have to reset the status of the transfer. Check the scheduler as shown below. This will activate the button.

Now click on the button to open the window for setting the scheduler options as shown below.

To finally remove the block by the [Active transfer], click on the button. This will remove the block and you can proceed with the transfer of the data.

 

Single transfer

To transfer the data for a certain project, click on the button in the Cache- or Postgres data range (see below).

A window as shown below will open, listing all data ranges for the transfer. With the button you can set the timeout for the transfer of the data, where 0 means infinite and is recommended for large amounts of data. With the button you can switch the generation of a report on or off. To stop the execution in case of an error, click on the button; its state will change and the execution will stop in case of an error. Click on the Start transfer button to start the transfer.

After the data are transferred, the number and date of the transferred data are visible as shown below.

After the transfer, successful transfers are indicated by a corresponding symbol and failures by an error symbol. The reason for a failure is shown if you click on the button. For the transfer to the Postgres database, the numbers of records in the source and the target will be listed as shown below, so deviating numbers can be detected. For the detection of certain errors it may help to activate the logging as described in the chapter Transfer Settings.

 

To inspect the first 100 lines of the transferred data click on the button.

 

Bulk transfer

To transfer the data of all projects selected for the schedule-based transfer, click on the button in the Cache or Postgres data range (see below).

Together with the transfer of the data, reports will be generated and stored in the reports directory. Click on the button in the Timer area to open this directory. To inspect data in the default schemas (dbo for SQL-Server and public for Postgres) outside the project schemata, use the buttons shown in the image above.

 

Transfer as background process

Transfer data from database to cache database and all Postgres databases

To transfer the data as a background process use the following arguments:

  • CacheTransfer
  • Server of the SQL-Server database
  • Port of the SQL-Server
  • Database with the source data
  • Server for Postgres cache database
  • Port for Postgres server
  • Name of the Postgres cache database
  • Name of Postgres user
  • Password of Postgres user

For example:

C:\DiversityWorkbench\DiversityCollection> DiversityCollection.exe CacheTransfer snsb.diversityworkbench.de 5432 DiversityCollection 127.0.0.1 5555 DiversityCollectionCache PostgresUser myPostgresPassword

The application will transfer the data according to the settings, generate a protocol as described above and quit automatically after the transfer of the data. For an introduction see a short tutorial. The user starting the process needs Windows authentication with access to the SQL-Server database and sufficient rights to transfer the data. The sources and projects within DiversityCollection will be transferred according to the settings (inclusion, filter, days and time). The transfer will be documented in report files; click on the button to access these files. For a simulation of this transfer click on the Transfer all data according to the settings button at the top of the form. This will ignore the time restrictions defined in the settings and start an immediate transfer of all selected data.

To generate transfer reports and document every step performed by the software during the transfer of the data use a different first argument:

  • CacheTransferWithLogging
  • ...

C:\DiversityWorkbench\DiversityCollection> DiversityCollection.exe CacheTransferWithLogging snsb.diversityworkbench.de 5432 DiversityCollection 127.0.0.1 5555 DiversityCollectionCache PostgresUser myPostgresPassword

To transfer only the data from the main database into the cache database use a different first argument:

  • CacheTransferCacheDB
  • ...

To transfer only the data from the cache database into the Postgres database use a different first argument:

  • CacheTransferPostgres
  • ...

The remaining arguments correspond to the list above. The generated report files are located in the directory .../ReportsCacheDB and the single steps are written into the file DiversityCollectionError.log.
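For example, a call restricted to the transfer into the cache database could look like the following (the remaining arguments are the same as in the first example; a call with CacheTransferPostgres is analogous):

C:\DiversityWorkbench\DiversityCollection> DiversityCollection.exe CacheTransferCacheDB snsb.diversityworkbench.de 5432 DiversityCollection 127.0.0.1 5555 DiversityCollectionCache PostgresUser myPostgresPassword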

History

For every transfer of the data along the pipeline, the settings (e.g. the versions of the databases), the number of transferred records, the execution time and additional information are stored. Click on the button in the respective part to get a list of all previous transfers together with these data (see below).

Infrastructure

Sources

The transfer of the data from the main databases into the cache database is handled via views and stored procedures in the cache database that are generated by the program. The views provide the data in the main databases while the procedures refer to these views to copy the data from the tables in the main database into the cache database. The views are listed in the table [Module]Source, where [Module] is the name of the module without the leading “Diversity”.

As an example for the module DiversityAgents, the cache database contains a view Agents_TNT_TaxRefagents, where the source is the module DiversityAgents, the server is TNT and the project is TaxRefagents. The views for a source are listed in the table AgentSource. This view refers to the corresponding data in the module. The procedure procTransferAgent selects the data in the view and copies them into the table Agent.
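A much simplified sketch of such a pair of view and procedure is given below. The columns and the linked-server path are illustrative assumptions; the objects actually generated by the program contain the full column set of the table.

-- Simplified sketch; columns and linked-server path are illustrative:
-- view providing the data of the source project in the module database
CREATE VIEW [dbo].[Agents_TNT_TaxRefagents] AS
SELECT A.[AgentURI], A.[AgentName]
FROM [TNT].[DiversityAgents].[dbo].[Agent] AS A;
GO

-- procedure copying the data of the view into the cache table
CREATE PROCEDURE [dbo].[procTransferAgent] AS
BEGIN
    DELETE FROM [dbo].[Agent]
    WHERE [AgentURI] IN (SELECT [AgentURI] FROM [dbo].[Agents_TNT_TaxRefagents]);
    INSERT INTO [dbo].[Agent] ([AgentURI], [AgentName])
    SELECT [AgentURI], [AgentName] FROM [dbo].[Agents_TNT_TaxRefagents];
END;
GO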

For source objects the schema is dbo.

Projects

The transfer of the data from the main database into the cache database is handled via views and stored procedures in the cache database that are generated by the program. The views provide the data in the main database while the procedures refer to these views to copy the data from the tables in the main database into the cache database.

As an example the cache database contains a view ViewAnalysis that selects data in the table Analysis in the main database. The procedure [Project].procPublishAnalysis refers to the view and copies the data into the table [Project].CacheAnalysis where [Project] is the schema of the project.
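A minimal sketch of such a publishing procedure, with illustrative schema and column names, might look as follows:

-- Simplified sketch; schema and columns are illustrative:
CREATE PROCEDURE [Project_Example].[procPublishAnalysis] AS
BEGIN
    DELETE FROM [Project_Example].[CacheAnalysis];
    INSERT INTO [Project_Example].[CacheAnalysis] ([AnalysisID], [DisplayText])
    SELECT [AnalysisID], [DisplayText]
    FROM [dbo].[ViewAnalysis];
END;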

The transfers together with information on the versions, settings etc. are listed in the table ProjectTransfer.

Subsections of Transfer

Cache Database Transfer

Transfer of the data triggered by a batch command

Transfer as background process

Transfer data from database to cache database and all Postgres databases

The server has to start the application with a Windows account that has access to the database server and the databases, with the following permissions/roles:

  • DiversityCollection:
    • Role CacheAdmin
  • DiversityProjects:
    • Role DiversityWorkbenchUser
  • Additional sources, e.g. DiversityTaxonNames:
    • In every database from which data should be transferred: Role DiversityWorkbenchUser or corresponding roles


To transfer the data as a background process use the arguments and the example call as listed at the beginning of this chapter. The application will transfer the data according to the settings, generate a protocol as described above and quit automatically after the transfer of the data.

 

Cache Database

Transfer Settings

Settings for the transfer of the projects in the cache database

To edit the general settings for the transfer, click on the button in the main form. A window as shown below will open. Here you can set the timeout for the transfer in minutes. The value 0 means that no time limit is set and the program will try indefinitely to transfer the data.

The transfer via bcp does not rely on these numbers and uses a much faster way of transfer, yet it needs a detailed configuration. If you are connected to a Postgres database, the transfer directory and the bash file for conversion and import can be set in the last line.

 

Logging

For the detection of certain errors it may help to log the events of the transfer by activating the logging. The logging is set per application, not per database. So to detect errors in a transfer started by a scheduler on a server, you have to activate the logging in the application started by the server. The log is written into the error log of the application. To see the content of the log, either use the Protocol section or select Help - ErrorLog from the menu in the main window.

 

Stop on error

In case of an error during the transfer you can stop the transfer. The stop in case of an error is set per application, not per database. So to stop a transfer started by a scheduler on a server in case of errors, you have to activate this option in the application started by the server.

 

Scheduled transfer

The scheduled transfer is meant to be launched on a server on a regular basis, e.g. once a week, once a day, every hour etc. The transfer of the data via the scheduled transfer will take place according to the settings. This means the program will check if the next planned time for a data transfer has passed and only then start to transfer the data. To include a source in the schedule, check the selector for the scheduler. To set the time and days scheduled for a transfer, click on the button. A window as shown below will open where you can select the time and the day(s) of the week when the transfer should be executed.

The planned points in time are shown in the form as shown below.

The protocol of the last transfer can be seen in the window above or if you click on the button. If an error occurred, it can be inspected with a click on the button.

If another transfer on the same source has been started, no further transfer will be started. In the program this competing transfer is shown as below.

You can remove this block with a click on the button. In the window that opens (see below) click on the button. This will also remove error messages from previous transfers.

A further option for restricting the transfers is the comparison of the dates of the transfers. Click on the button to activate this option. In this state the program will compare the dates of the transfers and execute a transfer only if new data are available.


Diversity Collection

Transfer Protocol

The protocol of any transfer is written into the error log of the application. To examine the messages of the transfers within the cache database you may, for every single step, click on the button in the transfer window in case an error occurred. In the main window the protocol of all transfers is shown in the Protocol part. A click on the button requeries and shows the protocol. The first option will show the line numbers of the protocol and the second option will restrict the output to failures and errors stored in the protocol. By default both options are selected and the number of lines is restricted to 1000. In case of longer protocols change the number of lines for the output. The button will show the protocol in the default editor of your computer. The button will clear the protocol.

 

Cache Database

Transfer Via BCP

Transfer of data from SQL-Server to Postgres via bcp

To transfer large amounts of data from the SQL-Server database to a Postgres database the normal path is clearly too slow. For these cases an alternative option is provided that transfers the data via a csv file in a shared folder on the server where the Postgres databases are hosted.

If the shared directory, the mount point and the bash file are set (see below, on the first tab [Update, Sources]), the batch transfer can be activated. Click e.g. on the button to set the values. The window that will open offers a list with existing entries for the respective value. Please ask the Postgres server administrator for details.

To use the batch transfer for a project, click on the checkbox as shown below.

The image for the transfer will change accordingly. Now the data of every table within the project will be transferred via bcp. To return to the standard transfer, just deselect the checkbox.

To stop the execution in case of an error, click on the button in the window that will open after you clicked the button to start the transfer. The button will change its state and the execution will stop in case of an error. The objects created for the transfer of the data (csv file, tables) will not be removed, to enable an inspection.

Details about the setup are described in the chapter Setup for the transfer via bcp.

Details about the export are described in the chapter Export using the transfer via bcp.

 

 

Subsections of Transfer Via BCP

Cache Database

Transfer Via BCP Export

Export using the transfer via bcp

Overview

The image shows an overview of the steps described below. The numbers in the arrows refer to the described steps.   

 

Export

Initialization of the batch export

In the shared directory, folders will be created according to .../&lt;Database&gt;/&lt;Schema&gt;. The data will be exported into a csv file in the created schema (resp. project) folder in the shared directory. The initialization of the export is performed by the procedure procBcpInitExport (see below and [ 1 ] in the overview image above).
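The core of this initialization is the creation of the target directory. A minimal, hypothetical sketch, assuming the shared folder u:\postgres and the database and schema names used in the examples below (the actual procedure may differ):

-- Hypothetical sketch of the directory creation via xp_cmdshell:
EXEC xp_cmdshell 'mkdir "U:\postgres\DiversityCollectionCache_Test\Project_GFBio202000316SNSB"';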

Creation of view as source for the batch export

To export the data into the file, a view (see below) will be created transforming the data in the table of the SQL-Server database according to the requirements of the export (quoting and escaping of existing quotes). The creation of the view is performed by the procedure procBcpViewCreate (see above and [ 2 ] in the overview image above).

The views provide the data from the SQL-Server tables in the sequence as defined in the Postgres tables and perform a reformatting of the string values (see example below).

CASE WHEN [AccessionNumber] IS NULL THEN NULL ELSE '"' + REPLACE([AccessionNumber], '"', '""') + '"' END AS [AccessionNumber]

 

Export of the data into a csv file in the transfer directory

The data will be exported using the procedure procBcpExport (see above and [ 3 ] in the overview image above) into a csv file in the directory created in the shared folder (see below).
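Internally such an export could correspond to a bcp queryout call executed via xp_cmdshell. The following line is only a hedged sketch (the view, file and path names are illustrative); the option -T uses a trusted connection and -w produces the UTF-16LE output mentioned below:

EXEC xp_cmdshell 'bcp "SELECT * FROM [DiversityCollectionCache_Test].[Project_GFBio202000316SNSB].[ViewBcpCacheCollectionSpecimen]" queryout "U:\postgres\DiversityCollectionCache_Test\Project_GFBio202000316SNSB\CacheCollectionSpecimen.csv" -T -w';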

Creation of a table as target for the intermediate storage of the data

For the intermediate storage of the data, a temporary table (..._Temp) is created (see below and [ 4 ] in the overview image above). This table is the target of the bash import described below.

 

Bash conversion and import of the data into the intermediate storage table

As Postgres accepts only UTF-8 without the Byte Order Mark (BOM), the exported csv file must be converted into UTF-8 without BOM. For this purpose scripts are provided for every Windows SQL-Server instance (/database/exchange/bcpconv_INSTANCE). These scripts accept the UTF-16LE file that should be converted as an argument, depending on the name of the instance, e.g. for the INSTANCE 'devel':

/database/exchange/bcpconv_devel DiversityCollectionCache_Test/Project_GFBio202000316SNSB/CacheCollectionSpecimen.csv

The scripts are designed as shown below (for INSTANCE 'devel'):

#!/bin/bash
iconv -f UTF16LE -t UTF-8 /database/exchange/devel/$1 | sed '1s/^\xEF\xBB\xBF//' | tr -d '\r\000'

As a first step, iconv converts the file from UTF-16LE to UTF-8.

As the next step, sed removes the Byte Order Mark (BOM).

As the final step, tr removes carriage returns and NULL characters.

Import with COPY

The script above is used as the source for the import into Postgres using the COPY command of psql, as shown in the example below (for the INSTANCE 'devel').

COPY \"Project_GFBio202000316SNSB\".\"CacheCollectionSpecimen_Temp\" FROM PROGRAM \'bash /database/exchange/bcpconv_devel DiversityCollectionCache_Test/Project_GFBio202000316SNSB/CacheCollectionSpecimen.csv\' with delimiter E\'\\t\' csv;

The option with delimiter E'\t' sets the tab sign as delimiter and the option csv sets the format of the file.

Within the csv file, empty fields are taken as NULL values and quoted empty strings ("") are taken as empty strings. All strings must be enclosed in quotation marks ("...") and quotation marks (") within the strings must be replaced by 2 quotation marks ("") - see the example below. This conversion is performed in the view described above.

any\"string  &rarr;  \"any\"\"string\"

The bash conversion and the import with COPY relate to [ 5 ] in the overview image above. The COPY command is followed by a test for the existence of the created file.

 

Backup of the data from the main table

Before removing the old data from the main table, these data are stored in a backup table (see below and [ 6 ] in the overview image above). 
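Such a backup table can be created in Postgres with a single statement; the following is a minimal sketch with illustrative names:

-- Hypothetical sketch: copy the current content of the main table into a backup table
CREATE TABLE "Project_GFBio202000316SNSB"."CacheCollectionSpecimen_Backup" AS
TABLE "Project_GFBio202000316SNSB"."CacheCollectionSpecimen";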

 

Removing data from the main table

After the backup was created and the new data are ready for import, the main table is prepared: as a first step the old data are removed (see [ 7 ] in the overview image above).

 

Getting the primary key of the main table

To enable a fast insertion of the data, the primary key must be removed. Therefore the definition of this key must be stored for its recreation after the import (see below and [ 8 ] in the overview image above).
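In Postgres such a definition can be read from the system catalogs; a minimal sketch (the table name is illustrative) could be:

-- Hypothetical sketch: retrieve the definition of the primary key for later recreation
SELECT c.conname, pg_get_constraintdef(c.oid) AS pk_definition
FROM pg_constraint c
WHERE c.contype = 'p'
  AND c.conrelid = '"Project_GFBio202000316SNSB"."CacheCollectionSpecimen"'::regclass;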

 

Removing the primary key from the main table

After the definition of the primary key has been extracted from the table, the primary key is removed (see below and [ 9 ] in the overview image above). 
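Dropping the key is then a single statement; again a hedged sketch with illustrative names:

-- Hypothetical sketch: remove the primary key before the bulk insert
ALTER TABLE "Project_GFBio202000316SNSB"."CacheCollectionSpecimen"
DROP CONSTRAINT "CacheCollectionSpecimen_pkey";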

 

Inserting data from the intermediate storage into the main table and clean up

After the data and the primary key have been removed, the new data are transferred from the temporary table into the main table (see [ 10 ] in the overview image above) and the number of records is compared to the number in the temporary table to ensure a correct transfer.
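A sketch of this final insert and count comparison (names are illustrative):

-- Hypothetical sketch: move the data and compare the record counts
INSERT INTO "Project_GFBio202000316SNSB"."CacheCollectionSpecimen"
SELECT * FROM "Project_GFBio202000316SNSB"."CacheCollectionSpecimen_Temp";

SELECT (SELECT count(*) FROM "Project_GFBio202000316SNSB"."CacheCollectionSpecimen")
     = (SELECT count(*) FROM "Project_GFBio202000316SNSB"."CacheCollectionSpecimen_Temp")
       AS counts_match;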

In case of a failure the old data will be restored from the backup table (see [ 11 ] in the overview image above). 

After the data have been imported, the primary key is restored (see [ 12 ] in the overview image above). Finally the intermediate table and the backup table (see [ 13 ] in the overview image above) and the csv file (see [ 14 ] in the overview image above) are removed.
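The clean-up could look like the following sketch (names are illustrative; the key is recreated from the definition stored earlier):

-- Hypothetical sketch: restore the primary key and remove the helper objects
ALTER TABLE "Project_GFBio202000316SNSB"."CacheCollectionSpecimen"
ADD CONSTRAINT "CacheCollectionSpecimen_pkey" PRIMARY KEY ("CollectionSpecimenID");
DROP TABLE "Project_GFBio202000316SNSB"."CacheCollectionSpecimen_Temp";
DROP TABLE "Project_GFBio202000316SNSB"."CacheCollectionSpecimen_Backup";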

 

 

 

Cache Database

Transfer Via BCP Setup

Setup for the transfer via bcp

There are several preconditions to the export via bcp.  

bcp

The export relies on the external bcp software that must be installed on the database server. To test if bcp is installed, start the command line and type bcp. If bcp is installed you get a result like shown below.

 

SQL-Server

The transfer via bcp is not possible with SQL-Server Express editions. To use this option you need at least SQL-Server Standard edition.

To enable the transfer, the xp_cmdshell option must be enabled on the SQL-Server. Choose Facets from the context menu as shown below.

In the window that will open change XPCmdShellEnabled to True as shown below.
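As an alternative to the facet dialog, the option can be enabled with the standard SQL-Server configuration commands:

-- Enable xp_cmdshell via sp_configure (equivalent to setting the facet)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;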

 

Login cmdshell

The update script for the cache database will create a login cmdshell used for the transfer of the data (see below). This login will get a random password and the permission to connect to the server is denied. It is needed to execute the transfer procedures (see below) and should not be changed. The login is set to disabled as shown below.

Stored procedures

Furthermore, 4 procedures and a table for the export of the data into a shared directory on the Postgres server are created (see below).

  

 

Login PostgresBulkExport

You have to create a Windows login (e.g. PostgresBulkExport) on the server with full access to the shared directory and set this account as the server proxy:

Open Windows Administrative Tools - Computer Management - System Tools - Local Users and Groups, left-click on the folder Users and select New User…

In the window that will open enter the details for the user (e.g. PostgresBulkExport) and click OK to create the user.

In the File Explorer select the shared folder where the files from the export will be stored. From the context menu of the shared folder (e.g. u:\postgres) select Properties. In the tab Security click on the button Edit…; in the window that will open click on the button Add…, type or select the user (e.g. PostgresBulkExport) and click OK.

Select the new user in the list, check Full Control in the Allow column of the permissions, then leave with the button OK and close the window with the button OK.

Go through the corresponding steps for the share (e.g. u:) to ensure access.

As an alternative you may use SQL to create the login:

CREATE LOGIN [DOMAIN\LOGIN] FROM WINDOWS;

where DOMAIN is the domain of the Windows server and LOGIN is the name of the Windows user.

exec sp_xp_cmdshell_proxy_account 'DOMAIN\LOGIN','[?password?]';

where DOMAIN is the domain of the Windows server, LOGIN is the name of the Windows user and [?password?] is the password of the login.

After the user is created and the permissions are set, set this user as the cmdshell proxy (see below).

Create a new login:

  • Open SQL Server Management Studio
  • select Security
  • select Logins
    • in the context menu select New Login…
    • in the window that will open, click on the Search… button
      • type in the object name (e.g. PostgresBulkExport)
      • click on the button Check Names (the name gets expanded with the domain name)
      • click the button OK twice to leave the window and create the new login

In the context menu select Properties and include the login in the users of the cache database (see below).

On the database server add this user as new login and add it to the cache database. This login must be in the database role CacheUser to be able to read the data that should be exported (see below).

 

Login PostgresTransfer

Another login (e.g. PostgresTransfer) on the Windows server is needed with read only access to the shared directory to read the exported data and import them into the Postgres database. This login is used by the Postgres server to access the shared directory. It does not need to have interactive login permissions.

 

Shared directory on Windows-server

On the Windows side the program needs an accessible shared directory in which sub-directories will be created and into which the files will be exported. This directory is made accessible (shared folder) via CIFS/SMB with read-only access for the login used for the data transfer (e.g. PostgresTransfer). This shared directory is mounted on the Postgres server using CIFS (/database/exchange/INSTANCE). The INSTANCE is an identifier/server name for the Windows server. If several Windows servers deliver data to one Postgres server, these INSTANCE acronyms need to be unique for each Windows server.

On the Windows server this directory is made available as "postgres". The directory will be mounted depending on the INSTANCE under the path /database/exchange/INSTANCE. The corresponding fstab entries for the INSTANCEs 'bfl' and 'devel' are:

/etc/fstab:

//bfl/postgres          /database/exchange/bfl    cifs  ro,username=PostgresTransfer,password=[?password?]
//DeveloperDB/postgres  /database/exchange/devel  cifs  ro,username=PostgresTransfer,password=[?password?]

  • //[?Instance?]/postgres: the UNC path to the shared folder
  • /database/exchange/[?Instance?]: the local path on the Postgres server where the shared folder is mounted
  • PostgresTransfer: the Windows user name used to access the shared folder
  • [?password?]: the password of this Windows user

These mounts are needed for every Windows server resp. SQL-Server instance. Instead of fstab you may use systemd for mounting.
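A corresponding systemd mount unit could look like the following hedged sketch for the INSTANCE 'devel' (the unit file name must match the mount path):

# /etc/systemd/system/database-exchange-devel.mount
[Unit]
Description=Shared bcp transfer directory for INSTANCE 'devel'

[Mount]
What=//DeveloperDB/postgres
Where=/database/exchange/devel
Type=cifs
Options=ro,username=PostgresTransfer,password=[?password?]

[Install]
WantedBy=remote-fs.target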

 

Bugfixing and error handling

If you get the following error:

An error occurred during the execution of xp_cmdshell. A call to 'LogonUserW' failed with error code: '1326'.

This normally means that the password is wrong, but it could also be a policy problem. First recreate the proxy user with the correct password:

use [Master];
-- Erase proxy user:
exec sp_xp_cmdshell_proxy_account Null;
go

-- Create proxy user:
exec sp_xp_cmdshell_proxy_account '[?domain?]\PostgresBulkExport','[?your password for PostgresBulkExport?]';
go

If this does not work, change the policy: add the proxy user to the group allowed to log on as a batch job.

Open the Local Security Policy

  • Local Policies
    • User Rights Assignment
      • Log on as a batch job

Select Add user or group …, type the user into the object name field (e.g. PostgresBulkExport) and leave the Local Security Policy editor by clicking the button OK twice.

For further information see https://www.databasejournal.com/features/mssql/xpcmdshell-for-non-system-admin-individuals.html

NOT FOR PRODUCTION!

Setup of a small test routine.

Remove the cmdshell account after your tests and recreate it with the CacheDB update scripts from DiversityCollection (replace the [?marked values?] with the values of your environment):

USE [master];
CREATE LOGIN [cmdshell] WITH PASSWORD = '[?your password for cmdshell?]', CHECK_POLICY = OFF;
CREATE USER [cmdshell] FOR LOGIN [cmdshell];
GRANT EXEC ON xp_cmdshell TO [cmdshell];

-- setup proxy
CREATE LOGIN [[?domain?]\PostgresBulkExport] FROM WINDOWS;
exec sp_xp_cmdshell_proxy_account '[?domain?]\PostgresBulkExport','[?your password for PostgresBulkExport?]';

USE [[?DiversityCollection cache database?]];
CREATE USER [cmdshell] FOR LOGIN [cmdshell];

-- allow execution to non privileged users/roles
grant execute on [dbo].[procCmdShelltest] to [CacheUser];
grant execute on [dbo].[procCmdShelltest] to [[?domain?]\AutoCacheTransfer];

-- recreate proxy user:
exec sp_xp_cmdshell_proxy_account Null;
go

-- create proxy user:
exec sp_xp_cmdshell_proxy_account '[?domain?]\PostgresBulkExport','[?your password for PostgresBulkExport?]';
go

-- make a test:
create PROCEDURE [dbo].[procCmdShelltest]
with execute as 'cmdshell'
AS
SELECT user_name();  -- should always return 'cmdshell'
exec xp_cmdshell 'dir';
go

execute [dbo].[procCmdShelltest];
go

-- should return two result sets:
-- 'cmdshell'
-- and the file listing of the current folder

-- check the proxy user:
-- the proxy credential will be called ##xp_cmdshell_proxy_account##
select credential_identity from sys.credentials where name = '##xp_cmdshell_proxy_account##';

-- should be the assigned windows user, e.g. [?domain?]\PostgresBulkExport

 

Settings for the bulk transfer

To enable the bulk transfer via bcp, enter the path of the directory on the Postgres server, the mount point of the Postgres server and the bash file as shown below.

To use the batch transfer for a project, click on the checkbox as shown below.

The image for the transfer will change accordingly. Now the data of every table within the project will be transferred via bcp. To return to the standard transfer, just deselect the checkbox.

 

Role pg_execute_server_program

The user executing the transfer must be in the role pg_execute_server_program.
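Granting this role is a single statement; the user name below is only an example:

-- Example: grant the role needed for COPY ... FROM PROGRAM to the transferring user
GRANT pg_execute_server_program TO "PostgresUser";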

 

Cache Database Restrictions

Restrictions for the data transfer into the cache database

The restrictions of the published data are defined in the main database via projects, data withholding and embargos.

In the cache database further restrictions can be set for every project. These include the following options.

The collectors may be anonymized. For the localisation systems it is possible to restrict the precision of the coordinates. To set these restrictions, click on the button. A window as shown below will open.

Restrict the localisation systems, the site properties, the taxonomic groups, the material categories and the analyses by moving the entries between the Not published and Published lists. For an overview see a short tutorial.

Specimen

A filter for the specimen can be added in the first tab by the creation of a restriction applied to the specimens within a project, as shown below and explained in the video.

Define the filter with the options described below. Test the restriction with the button and save it with the button. The restrictions will be converted into a SQL statement as shown below that will be applied in the filter for the transfer (see the sketch after the list below). With the button you can remove the filter.

  • order by: The values that should be displayed in the result list
  • / : Hide resp. show the area for the restriction
  • : Start the query according to the query conditions
  • : Convert the query conditions into a restriction
  • Project: Select the projects that should be included in the restriction
    • + or |: Whether the projects should be combined with AND resp. OR
    • : Add projects to the list for the restriction as shown below:
      • : Remove a project from the list
  • Specimen Accession number: Filter for the accession number of the specimen
  • Storage: Whether a specimen part should be present
  • Transaction: Whether a transaction should be present
  • Image: Whether a specimen image should be present
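Such a generated SQL statement could, for instance, resemble the following sketch; the table names and conditions are entirely illustrative and do not reproduce the statement actually generated by the program:

-- Hypothetical example of a generated restriction on the specimens of a project
SELECT s.[CollectionSpecimenID]
FROM [dbo].[CollectionSpecimen] s
WHERE s.[AccessionNumber] LIKE 'SNSB%'
  AND EXISTS (SELECT * FROM [dbo].[CollectionSpecimenImage] i
              WHERE i.[CollectionSpecimenID] = s.[CollectionSpecimenID]);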