Summarize Data

Summarize data

Currently the following functions for summarizing data are available:

Summarize descriptions : Summarize the data of selected descriptions and create a new one or update an existing description.

Summarize sample data : Summarize the sample data of selected descriptions and update their summary data.

 

Summarizing methods

Depending on the selected summarizing function, either the descriptor data or the sampling data of the selected descriptions form the data source. The data is summarized according to its data type.

 

Categorical summary data

To build categorical summary data, the individual categorical states of the data sources are accumulated. In general, text notes can be entered for each state. To summarize the text notes, the different notes are accumulated (each appended on a new line) if they are not yet included in the summary note.

If modifiers have to be processed, each combination of a categorical state and a modifier is treated as a separate value. Notes are summarized separately for each of these tuples. Only modifiers that are assigned to a descriptor as recommended modifiers (see Editing the descriptor - Descriptor tree tab) are evaluated; all other modifier values are ignored.

Categorical descriptors may be marked as “exclusive”, which means that only one state may be selected. If the appropriate option is set for the summarization, the categorical state that is selected most often in the source data will be used in the target.
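The accumulation rules above can be sketched in a few lines of Python. This is only an illustration, not the application's actual code: the function name `summarize_categorical`, the tuple layout of the input and the exact-match test for "already included" notes are assumptions made for the example.

```python
from collections import Counter

def summarize_categorical(sources, exclusive=False):
    """Accumulate (state, modifier) pairs and their notes.

    `sources` is a hypothetical list of (state, modifier, note) tuples;
    modifier and note may be None.
    """
    notes = {}          # (state, modifier) -> distinct notes in first-seen order
    counts = Counter()  # state -> how often it was selected in the sources
    for state, modifier, note in sources:
        collected = notes.setdefault((state, modifier), [])
        counts[state] += 1
        # Assumption: a note counts as "already included" if it matches exactly.
        if note and note not in collected:
            collected.append(note)
    if exclusive and counts:
        # Keep only the most frequently selected state (first seen wins ties).
        winner = counts.most_common(1)[0][0]
        notes = {key: value for key, value in notes.items() if key[0] == winner}
    # Each distinct note is appended on a new line.
    return {key: "\n".join(value) for key, value in notes.items()}
```

Called with `exclusive=True`, the sketch mimics the option for "exclusive" descriptors by dropping every state except the most frequent one.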

 

Quantitative summary data

To build quantitative summary data from the numeric values of the data sources, all recommended statistical measures (see Editing the descriptor - Descriptor tree tab) for the processed descriptor are calculated. Text notes are accumulated for each statistical measure in the same way as described for categorical summary data.

If modifiers have to be processed, all modifiers are accumulated for each statistical measure. Since the resulting quantitative summary data allows only one modifier value per statistical measure, the most frequently used modifier is inserted.
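As a rough illustration of the modifier rule, the following Python sketch picks the most frequently used modifier and attaches it to each calculated measure. The names `most_common_modifier`, `summarize_quantitative` and `MEASURES` are hypothetical, only three measures are included, and the sketch assumes that every source value carries at most one modifier.

```python
import statistics
from collections import Counter

# Hypothetical subset of measures, keyed by an illustrative name.
MEASURES = {"Min": min, "Max": max, "Mean": statistics.mean}

def most_common_modifier(modifiers):
    """Return the modifier used most often, ignoring missing (None) entries."""
    counts = Counter(m for m in modifiers if m is not None)
    return counts.most_common(1)[0][0] if counts else None

def summarize_quantitative(records, recommended):
    """`records` is a list of (value, modifier); `recommended` lists measure names."""
    values = [value for value, _ in records]
    modifier = most_common_modifier(m for _, m in records)
    # Only one modifier is allowed per measure, so the winner is used for each.
    return {name: (MEASURES[name](values), modifier) for name in recommended}
```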

 

Text descriptor data

To summarize the text descriptor data, the different texts are accumulated (each appended on a new line) if they are not yet included in the summary text. Text notes are accumulated in the same way.
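A minimal Python sketch of this accumulation could look as follows. The function name `accumulate_text` is hypothetical, and the sketch assumes that "already included" means the text occurs anywhere in the summary (a substring test); the application may compare texts differently.

```python
def accumulate_text(summary, texts):
    """Append each text on a new line unless the summary already contains it."""
    for text in texts:
        # Assumption: containment is checked as a plain substring match.
        if text and text not in summary:
            summary = f"{summary}\n{text}" if summary else text
    return summary
```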

 

Molecular sequence data

To summarize the molecular sequence data, the different sequences are accumulated (each appended on a new line) if they are not yet included in the target sequence. Text notes are accumulated in the same way.

 

 

Jan 14, 2025

Subsections of Summarize Data

Statistical Measures

Statistical measures

The following table gives an overview of the available statistical measures. The column Calculation gives a hint on how the measure is calculated from the basic data rows.

Name | Abbr. | Calculation
Lower range limit (human estimate) | - | x_1..x_n sorted list: x_1
Upper range limit (human estimate) | + | x_1..x_n sorted list: x_n
Central or typical value (human estimate) | centr. | x_1..x_n sorted list: x_{n/2}
Lower range limit (legacy data stat. meth. unknown) | -(?) | x_1..x_n sorted list: x_1
Upper range limit (legacy data stat. meth. unknown) | +(?) | x_1..x_n sorted list: x_n
Central or typical value (legacy data stat. meth. unknown) | centr.(?) | x_1..x_n sorted list: x_{n/2}
Minimum value | Min | Absolute smallest value
Maximum value | Max | Absolute largest value
Mean (= average) | μ | μ = (1/n) ∑_{i=1..n} x_i
Harmonic mean | | hμ = n / ∑_{i=1..n} (1/x_i); hμ = 0 if any x_i = 0
Geometric mean | | gμ = (∏_{i=1..n} x_i)^{1/n}
Mode | mode | Value that appears most often (ambiguous!)
Median | med. | x_1..x_n sorted list; n odd: med = x_{(n+1)/2}; n even: med = (x_{n/2} + x_{n/2+1}) / 2
Interquartile mean (= average) | IQM | x_1..x_n sorted list; IQM = μ(x_{0.25n+1}..x_{0.75n+1})
Variance (sample; df = n-1) | Var. | S_{n-1} = (1/(n-1)) ∑_{i=1..n} (x_i - μ)^2
Variance (population; df = n; rarely applicable!) | Var. (pop.) | S_n = (1/n) ∑_{i=1..n} (x_i - μ)^2
Standard deviation (sample) | s.d. | σ_{n-1} = √S_{n-1}
Standard deviation (population; df = n; rarely applicable!) | s.d. (pop.) | σ_n = √S_n
Mean deviation | m.d. | md = (1/n) ∑_{i=1..n} |x_i - μ|
Mean deviation from median | m.d.m. | mdm = (1/n) ∑_{i=1..n} |x_i - med|
Coefficient of variation (sample) | CV | CV = σ_n / μ
Corrected coefficient of variation (sample) | CVC | CVC = σ_{n-1} / μ
Total range | TR | x_1..x_n sorted list; TR = x_n - x_1
Interquartile range | IQR | x_1..x_n sorted list; IQR = x_{0.75n+1} - x_{0.25n+1}
Standard error of mean | s.e. | σ_x = σ_{n-1} / √n
Standard error of variance (of multiple samples) | s.e.(var.) | S_x = S_{n-1} / n
Skewness | Skw. | γ_1 = (1/n) ∑_{i=1..n} ((x_i - μ)/σ)^3
Kurtosis | Kurt. | Kurt = (1/n) ∑_{i=1..n} ((x_i - μ)/σ)^4
Sample size | n | Number of values
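To make some of the formulas concrete, here is a small Python sketch that computes a handful of the measures from a list of raw values. It is an illustration only: the function name `basic_measures` and the dictionary keys are hypothetical, it needs at least two values, and it does not claim to reproduce the application's rounding or edge-case handling.

```python
import math

def basic_measures(values):
    """Compute a few of the tabulated measures from raw values (n >= 2)."""
    x = sorted(values)                      # x_1..x_n sorted list
    n = len(x)
    mean = sum(x) / n
    if n % 2:                               # n odd: med = x_{(n+1)/2}
        med = x[(n + 1) // 2 - 1]
    else:                                   # n even: mean of the two middle values
        med = (x[n // 2 - 1] + x[n // 2]) / 2
    squares = sum((v - mean) ** 2 for v in x)
    var_sample = squares / (n - 1)          # df = n-1
    var_pop = squares / n                   # df = n
    return {
        "Min": x[0],
        "Max": x[-1],
        "mean": mean,
        "med": med,
        "Var": var_sample,
        "Var_pop": var_pop,
        "sd": math.sqrt(var_sample),
        "md": sum(abs(v - mean) for v in x) / n,
        "mdm": sum(abs(v - med) for v in x) / n,
        "TR": x[-1] - x[0],
        "se": math.sqrt(var_sample) / math.sqrt(n),
        "n": n,
    }
```

Note that the zero-based list indices in the code are shifted by one against the one-based indices used in the table.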

 

Adjust display sequence

You may adjust the display sequence of the statistical measures for the whole database. Choose Administration -> Database -> Statistical measures … from the menu. A window will open as shown below.

  

The sequence number column (“No.”) determines the display sequence in various forms. You may order the entries by clicking on a column header. With the arrow buttons you can move the selected entries up or down within the table; two further buttons shift the selected entries to the top or the bottom of the table. After ordering the entries, click the renumber button to renumber the table entries (starting with “1” for the first table entry) and make the changes effective. To save the changes in the database, leave the form with the OK button; to discard all changes, click the Abort button.

 

May 3, 2024

Summarize Descriptions

Summarize description data

With this form you can summarize the information of selected descriptions and store the summarized data in a new description or update an existing one. After connecting to a database, select Edit -> Summarize descriptions … from the menu.

 

Description selection

In the first tab, Description selection, you search for the source descriptions that are to be summarized into a new or an existing target description (see image below).

You have to select at least the mandatory parameter Project and start a query with the corresponding button. Superfluous entries may be removed from the result list with the corresponding button. For a detailed description of the query control please refer to section Query.

 

Descriptor selection

In tab Descriptor selection, mark the descriptors that are to be summarized in table column OK (see image below). Buttons are available to select or deselect all descriptors and to invert all selections. Separate buttons are available to select all categorical, quantitative, text or sequence descriptors. With a further button you may select a descriptor tree or a descriptor tree node to select all descriptors assigned to the selected element.

   

For quantitative descriptors you must specify the statistical measure that supplies the values for summarization. When the descriptor table is filled, the recommended measures of each quantitative descriptor are checked: if one of the measures “Mean (= average)”, “Central or typical value (human estimate)”, “Central or typical value (legacy data stat. meth. unknown)”, “Mode” or “Sample size” is available (priority in this sequence), it is pre-selected in column Measure. You may modify this setting for each single descriptor or change it for all selected descriptors with the corresponding button.
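The pre-selection order can be expressed as a simple priority lookup. The following Python sketch is illustrative only; the names `PRIORITY` and `preselect_measure` are hypothetical, and the measure names are taken from the list above.

```python
# Measures in the priority order stated in the text.
PRIORITY = [
    "Mean (= average)",
    "Central or typical value (human estimate)",
    "Central or typical value (legacy data stat. meth. unknown)",
    "Mode",
    "Sample size",
]

def preselect_measure(recommended):
    """Return the highest-priority measure found in `recommended`, else None."""
    available = set(recommended)
    return next((name for name in PRIORITY if name in available), None)
```

If none of the five measures is recommended for a descriptor, the sketch returns `None`, which corresponds to leaving column Measure without a pre-selection.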

 

Generator options

In section Summarize options you may specify to Ignore notes and to Ignore modifier values of the source descriptions. If you select Restrict exclusive descriptors, the most often selected categorical state of an “exclusive” descriptor will be set instead of accumulating all source values. Accumulate scopes will collect all scope values, and Accumulate resources all resources, of the source descriptions into the target. Write item count will enter notes containing information about the number of collected items in the summary data. If you additionally select Write detailed notes, the source description IDs will be listed in the notes. Finally, Write summary information inserts a summary about the summarized descriptions into the target description details (see image below).

In section Target description you may select the Target project (usually the same as the source description’s). You may either create a New description and enter the description name or Update a description selected from a drop-down list.

In section Status data you may control the summarization behaviour for every descriptor data status value:

  • Ignore    
    If in a summarized description the corresponding data status is present for a descriptor value, it will be summarized but the descriptor data status will not be set in the target description.
  • Summarize
    As for Ignore the descriptor values will be summarized. Additionally the descriptor data status will be set in the target description.
  • Omit data
    If in a summarized description the corresponding data status is present for a descriptor value, it will not be summarized and the descriptor data status will not be set in the target description.
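The three status options boil down to two yes/no decisions per descriptor value: whether the value is summarized and whether the data status is set in the target. A minimal Python sketch, with the hypothetical names `StatusMode` and `apply_status_mode`:

```python
from enum import Enum

class StatusMode(Enum):
    IGNORE = "Ignore"
    SUMMARIZE = "Summarize"
    OMIT_DATA = "Omit data"

def apply_status_mode(mode, has_status):
    """Return (summarize_value, set_status_in_target) for one descriptor value.

    `has_status` tells whether the data status is present for the value.
    """
    if not has_status:
        # The option only matters when the status is actually present.
        return True, False
    if mode is StatusMode.OMIT_DATA:
        return False, False
    # Ignore and Summarize both keep the value; only Summarize sets the status.
    return True, mode is StatusMode.SUMMARIZE
```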

After checking the settings, click the Start generator button. During processing the icon of the button changes and you may abort by clicking the button. The Preview area shows a table with the generated or updated target description (coloured background, which may be changed with the corresponding button) and the summarized source descriptions (grey background). If the target description has been modified, this is indicated by a yellow background of the description title (see image above). Updated values are shown as blue text. By double-clicking on a field in the preview table you may view the contents in a separate browser window. To save the updated values and close the window, click the OK button. To exit without saving, click Abort. In this case you will be asked if you want to save modified data.

Since building the preview table may take some time, especially if a lot of descriptors have been selected, you may use the Recalculate button to restart the summary process. In this case the descriptor columns will not be rebuilt; only the summary data will be calculated and updated. This feature may be useful if you change some settings and want to update the data. If you use the Recalculate button with an empty preview table, only the description titles will be displayed. In any case you can view the summarized data using the button for viewing the description details (see below).

Separate buttons let you store all changed entries or discard all changes and reload the data. A further button opens the description details of the currently selected entry in a separate browser window (see image below).

  

 

Jan 14, 2025

Summarize Sampling Data

Summarize sample data

With this form you can summarize the sample data of selected descriptions and update their summary data. After connecting to a database, select Edit -> Summarize sample data … from the menu.

 

Description selection

In the first tab, Description selection, you search for the descriptions that are to be updated with their summarized sample data (see image below).

You have to select at least the mandatory parameter Project and start a query with the corresponding button. Superfluous entries may be removed from the result list with the corresponding button. For a detailed description of the query control please refer to section Query.

 

Descriptor selection

In tab Descriptor selection, mark the descriptors that are to be summarized in table column OK (see image below). Buttons are available to select or deselect all descriptors and to invert all selections. Separate buttons are available to select all categorical, quantitative, text or sequence descriptors. With a further button you may select a descriptor tree or a descriptor tree node to select all descriptors assigned to the selected element.

   

 

Generator options

In section Summarize options you may specify to Ignore notes and to Ignore modifier values of the description’s sample data. If you select Restrict exclusive descriptors, the most often selected categorical state of an “exclusive” descriptor will be set instead of accumulating all source values. Accumulate scopes will collect all scope values of the sampling events (geographic areas) and sampling units (specimen) in the description summary. Write item count will enter notes containing information about the number of collected items in the summary data. If you additionally select Write detailed notes, the source description IDs will be listed in the notes. Finally, Write summary information inserts a summary about the summarized sampling events into the target description details (see image below).

In section Descriptions and sampling events you find the descriptions from the query result list in tab Description selection and their sampling events. You may exclude single sampling events or even the whole description from the summarization. In the latter case the description data will not be changed.

After checking the settings, click the Start generator button. During processing the icon of the button changes and you may abort by clicking the button. The Preview area shows a table with the updated target descriptions (coloured background, which may be changed with the corresponding button) and the summarized sampling units (grey background). If the target description has been modified, this is indicated by a yellow background of the description title (see image above). Updated values are shown as blue text. By double-clicking on a field in the preview table you may view the contents in a separate browser window. To save the updated values and close the window, click the OK button. To exit without saving, click Abort. In this case you will be asked if you want to save modified data.

Since building the preview table may take some time, especially if a lot of descriptors have been selected, you may use the Recalculate button to restart the summary process. In this case the descriptor columns will not be rebuilt; only the summary data will be calculated and updated. This feature may be useful if you change some settings and want to update the data. If you use the Recalculate button with an empty preview table, only the description titles will be displayed. In any case you can view the summarized data using the button for viewing the description details (see below).

Separate buttons let you store all changed entries or discard all changes and reload the data. A further button opens the description details of the currently selected entry in a separate browser window (see image below).