Cache Database Infrastructure

Infrastructure for the cache database

For the administration of the data that are published via the cache database, certain tables as shown below are used. These are either placed in the schema dbo or a schema named according to the published project, e.g. Project_Test for a project with the name “Test”. Additionally some basic data are stored in dedicated tables of the main DiversityDescriptions database.

 

Tables in the main database

In the main DiversityDescriptions database there is a number of tables holding the cache database name, information about datawithholding and mapping information of the IDs. This information is needed to restore the cache database in case of loss. For the database diagram take a look at the database section of this manual.  

 

Central tables in the cache database

There are a number of tables placed in the schema dbo that are accessible by all projects.

 

Published project tables

The central published project tables contain the information about the projects that are published together with the target (Postgres) databases and the packages including optional add-ons into which they had been transferred. This information is used to ensure a recovery in case of a loss of the targets.

 

 

Source tables

To access sources from other modules (e.g. DiversityReferences) there are tables for the storage of the principal access to the modules and a number of tables containing the data (depending on the module).

Access tables

These tables contain the principal access like the name of the view defined to access the data. The example below lists the tables defined for the module DiversityReferences, but there are corresponding tables for every module accessed by the cache database. 

Data tables

These tables contain the data provided by the module and therefore depend on the module. The example below lists the tables defined for the module DiversityReferences, but there are corresponding tables for every module accessed by the cache database. Since data from various source databases may be acumulated in the cache database, in general all the data tables include the BaseURL as part of their keys to ensure unambiguousness.  

To access the data in the source database for the module views are generated by the client. The name of these views are composed according to the name of the database and the project to ensure a unique name. Furthermore letters are appended to identify subordinated tables. These are stored in the table “<module>Source” and are used by the client for a transfer of the data from the module database into the tables in the cache database. The example below lists the view names for the module DiversityReferences. In this example the source database “DiversityReferences_Test” and project “DALI” result in the main table name. By appending “_R” the view name for subordinated table “ReferenceRelator” is built. This gives the views References_Test_DALI and References_Test_DALI_R.

 

Project tables in the cache database

These tables contain the data of the projects with every project having its own schema. The tables correspond to the tables in the main database of the module with according the following assignment. In the third columns the views of the cache database are listed to access the DiversityDescriptions data. The view access besides the main tables listed in the second table column and the ID mapping tables. For the summary data (CacheDescription) additionally subordinated tables, e.g. Modifier, are accessed to resolve relations in DiversityDescriptions as simple character strings.

 

Besides the tables mentioned above, the auxilliary tables ProjectLockedDescriptor and ProjectLockedScope contain the descriptor IDs and scope types that shall be excluded from the transfer to the cache database. The auxilliary tables ProjectPublishedTranslation and ProjectPublishedDescriptorTree contain language codes of translations (columns label, wording and detail of source tables Descriptor, CategoricalState and Description) and the descriptor tree IDs that shall be included in the cache database transfer. Together with the extended query parameter, which are stored in the columns FilterCommand and FilterParameter of the table ProjectPublished, they build the transferrestrictions of the cache database. Finally, in table CacheMetadata some data from the DiversityProjects database are stored.

The main tables CacheItem, CacheCharacter and CacheState have a numeric key (ItemID, CharID and StateID), which is identical to the unique key in the main database. However, in the cache database the main adress attributes are IID, CID and CS. CID and SD are in principle the descriptor and categorical state sequence numbers, where the mapping algorith guarantees unique ascending values. In this adressing schema a single state is identified by the combination of CID and CS.

Additionally in table CacheState the recommended statistical measures of quantitative descriptors are inclueded, where the measure code (e.g. “Min”, “Max” or “Mean”) is inserted in CS. In table CacheDescription, which holds the single descriptor or data status values, the CID and CS are specified for a specific categorical state. For quantitative data in CS the measurement type is identified by the measure code. For text and molecular sequence data CS is supplied with the fixed texts “TE” and “MS”. In case of descriptor status data CS is set NULL. Instead the data status code is inserted in column Status.  

 

Project procedures for the data transfer into the project tables

For every project table there is a set of procedures that transfers the data from the main database into the cache table. The names of these procedures are procPublish + the name of the target table without “Cache” e.g. procPublishCharacter for the transfer of the data into the table CacheCharacter. The first steps of the data transfer perform an update the ID mapping tables in the main database. This is done in the procedures procPublishMappingItem, procPublishCharacter and procPublishState, which call dedicated procedures in the DiversityDescriptions database. As mentioned above, the original IDs (ItemID, CharID and StateID) are stored together resulting mapped IDs (IID, CID and CS) in the cache database tables. To view the mapping information, the views CacheMappingItem, CacheMappingCharacter and CacheMappingState select the appropriate values from the cache tables.

 

List of tables mentioned above

Table ProjectPublished

The projects published via the cache database (Details about the projects are defined in DiversityProjects)

Table ProjectTarget

The targets of the projects, i.e. the Postgres databases

Table ProjectTargetPackage

Packages for projects as documented in the table Package in the Postgres database

Column Data type Description Nullable
ProjectID int Refers to ProjectID in table ProjectTarget NO
TargetID int Referes to TargetID in table ProjectTarget NO
Package nvarchar (50) Package installed for this project target NO

Table ProjectTransfer

The transfers of data of a project

Table ReferenceRelator

Column Data type Description Nullable
BaseURL varchar (500) - NO
RefID int - NO
Role nvarchar (3) - NO
Sequence int - NO
Name nvarchar (255) - NO
AgentURI varchar (255) - YES
SortLabel nvarchar (255) - YES
Address nvarchar (1000) - YES
SourceView nvarchar (128) - NO

Table ReferenceTitle

Table ReferenceTitleSource

Table ReferenceTitleSourceTarget

The targets of the projects, i.e. the Postgres databases

Table ReferenceTitleSourceView

Table Target

The postgres databases as targets for the data

Column Data type Description Nullable
TargetID int ID of the target on a postgres server, PK NO
Server nvarchar (255) Name of IP of the Server NO
Port smallint Port for accessing the server NO
DatabaseName nvarchar (255) The name of the database NO

Table CacheCharacter

Character (= descriptors, features) define variables

Table CacheCharacterTree

The descriptor trees

Column Data type Description Nullable
CharTreeID int Database-internal ID of this record (primary key) NO
CharTreeName nvarchar (255) Descriptor tree name NO

Table CacheCharacterTreeNode

The character tree nodes

Table CacheDescription

The description data in the database

Table CacheItem

The description item in the database

Table CacheMetadata

Column Data type Description Nullable
ProjectID int - NO
ProjectTitle nvarchar (400) - YES
ProjectTitleCode nvarchar (254) - YES
StableIdentifier nvarchar (500) - YES
TechnicalContactName nvarchar (254) - YES
TechnicalContactEmail nvarchar (254) - YES
TechnicalContactPhone nvarchar (254) - YES
TechnicalContactAddress nvarchar (254) - YES
ContentContactName nvarchar (254) - YES
ContentContactEmail nvarchar (254) - YES
ContentContactPhone nvarchar (254) - YES
ContentContactAddress nvarchar (254) - YES
OtherProviderUDDI nvarchar (254) - YES
DatasetTitle nvarchar (254) - YES
DatasetDetails nvarchar (MAX) - YES
DatasetCoverage nvarchar (254) - YES
DatasetURI nvarchar (254) - YES
DatasetIconURI nvarchar (254) - YES
DatasetVersionMajor nvarchar (254) - YES
DatasetCreators nvarchar (254) - YES
DatasetContributors nvarchar (254) - YES
DatasetGUID nvarchar (254) - YES
DateCreated nvarchar (254) - YES
DateModified nvarchar (254) - YES
SourceID nvarchar (254) - YES
SourceInstitutionID nvarchar (254) - YES
OwnerOrganizationName nvarchar (254) - YES
OwnerOrganizationAbbrev nvarchar (254) - YES
OwnerContactPerson nvarchar (254) - YES
OwnerContactRole nvarchar (254) - YES
OwnerAddress nvarchar (254) - YES
OwnerTelephone nvarchar (254) - YES
OwnerEmail nvarchar (254) - YES
OwnerURI nvarchar (254) - YES
OwnerLogoURI nvarchar (254) - YES
IPRText nvarchar (254) - YES
IPRDetails nvarchar (254) - YES
IPRURI nvarchar (254) - YES
CopyrightText nvarchar (254) - YES
CopyrightDetails nvarchar (254) - YES
CopyrightURI nvarchar (254) - YES
TermsOfUseText nvarchar (500) - YES
TermsOfUseDetails nvarchar (254) - YES
TermsOfUseURI nvarchar (254) - YES
DisclaimersText nvarchar (254) - YES
DisclaimersDetails nvarchar (254) - YES
DisclaimersURI nvarchar (254) - YES
LicenseText nvarchar (254) - YES
LicensesDetails nvarchar (254) - YES
LicenseURI nvarchar (254) - YES
AcknowledgementsText nvarchar (254) - YES
AcknowledgementsDetails nvarchar (254) - YES
AcknowledgementsURI nvarchar (254) - YES
CitationsText nvarchar (254) - YES
CitationsDetails nvarchar (254) - YES
CitationsURI nvarchar (254) - YES
RecordBasis nvarchar (254) - YES
KindOfUnit nvarchar (254) - YES
HigherTaxonRank nvarchar (254) - YES
TaxonomicGroup nvarchar (254) - YES
BaseURL varchar (254) - YES
RecordURI nvarchar (500) - YES
ProjectLanguageCode nvarchar (3) - YES

Table CacheResource

The available resources

Table CacheScope

The scope of the description

Table CacheState

The states available for characters

Table CacheTranslation

The available translations

Column Data type Description Nullable
TranslationID int Database-internal ID of this record (primary key) NO
EntityID int Database-internal ID of the referenced record NO
LanguageCode nvarchar (3) Three-character language code of the translation NO
SourceColumn nvarchar (255) Name of the original table column NO
TargetTable nvarchar (255) Name of the target table: “State”, “Character” or “Item” NO
Text nvarchar (MAX) Translated text YES

Table ProjectLockedDescriptor

The descriptors (=characters) that shall not be published

Column Data type Description Nullable
DescriptorID int Database-internal descriptor ID of descriptive data that shall no be published (primary key) NO

Table ProjectLockedScope

The scope types that shall not be published

Column Data type Description Nullable
ScopeType nvarchar (255) Scope types that shall not be pulished (primary key) NO

Table ProjectPublishedDescriptorTree

The descriptor tree IDs that shall be published

Column Data type Description Nullable
DescriptorTreeID int IDs of descriptor trees that shall be published (primary key) NO

Table ProjectPublishedTranslation

The translation languages that shall be published

Column Data type Description Nullable
LanguageCode nvarchar (3) Three-letter language codes of translations that shall be published (primary key) NO