Overview
The THREDDS Data Manager (TDM) creates indexes for GRIB featureCollections, in a process separate from the TDS. This allows lengthy file scanning and reindexing to happen in the background. The TDS uses the existing indices until notified that new ones are ready.
The TDM shares the TDS configuration, including threddsConfig.xml
and the server configuration catalogs.
On server startup, it reads through the catalogs and finds GRIB featureCollections
with a <tdm>
element and adds them to a list.
It can index once or periodically, depending on how you configure the <tdm>
element.
If you change the configuration, you must restart the TDM.
- For static datasets, let the TDM create the indexes, then start the TDS.
- For dynamic datasets, the TDM should run continually, and can send messages to the TDS when a dataset changes.
Installing The TDM
Get the current jar linked from the TDS Download Page
The TDM can be run from anywhere on the local machine, but by convention we create a directory $tds.content.root.path}/tdm
, and run the TDM from there.
Create a shell script to run the TDM, for example runTdm.sh
:
<JAVA> <JVM options> -Dtds.content.root.path=<content directory> -jar <TDM jar> [-tds <tdsServers>] [-cred <user:passwd>] [-showOnly] [-log level]
<JAVA>
Large collections need a lot of memory, so use a 64-bit JVM<JVM options>
-Xmx4g*
to give it 4 Gbytes of memory (for example). More is better.
-Dtds.content.root.path=<content directory>
this passes the content directory as a system property. The thredds configuration catalogs andthreddsConfig.xml
are found in<content directory>/thredds
. Use an absolute path.-jar tdm-5.0.jar
: execute the TDM from the jar file-tds <tdsServers>
: (optional) list of TDS servers to notify. If more than one, separate with commas, with no blanks. Specify only the scheme, host and optional port with a trailing slash for example:http://localhost:8081/
-cred <user:passwd>
: (optional) if you send notifications, the TDS will authenticate using this user name and password. If you do not include this option, you will be prompted for the password on startup, and the user name will be set totdm
.-showOnly
: (optional) if this is present, just show the featureCollections that will be indexed and exit.-log level
: (optional) set the log4j logging level =DEBUG
,INFO
(default),WARN
,ERROR
Example:
/opt/jdk/bin/java -Xmx4g -Dtds.content.root.path=/opt/tds/content -jar tdm-5.0.jar -tds "http://thredds.unidata.ucar.edu/,http://thredds2.unidata.ucar.edu:8081/"
Troubleshooting
- Make sure the
<JVM Options>
, including-Dtds.content.root.path
, come before the-jar <TDM jar>
- The
<content directory>
does not include the/thredds
subdirectory, e.g./opt/tds/content
not/opt/tds/content/thredds
. - Regarding permissions:
- You must run the TDM as a user who has read and write permission into the data directories, so it can write the index files (OR)
- If you are using GRIB index redirection, the TDM must have read access to the data directories, and write access to the index directories.
Running The TDM:
- Upon server startup, if
-tds
was used, but-cred
was not, you will be prompted for the password for thetdm
user password. This allows you to start up the TDM without putting the password into a startup script. Note that usertdm
should be given only the role oftdsTrigger
, which only gives rights to trigger collection reloading. - The TDM will write index files into the data directories or index directories.
The index files will have extensions
gbx9
andncx4
. - For each
featureCollection
, a log file is created in the TDM working directory, with namefc.<collectionName>.log
. Monitor these logs to look for problems with the indexing. - If you start the TDS in a shell, its best to put in the background so it can run independent of the shell:
^Z (this is Control-Z)
bg
Sending Triggers To The TDS
The TDM scans the files in the feature collection, and when it detects that the collection has changed, rewrites the index files.
If enabled, it will send a trigger message to the TDS, and the TDS will reload that dataset.
To enable this, you must configure the TDS with the tdsTrigger
role, and add the user tdm
with that role.
Typically, you do that by editing the ${tomcat_home}/conf/tomcat-user.xml
file, e.g.:
<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
<role ... />
<role rolename="tdsTrigger"/>
<user ... />
<user username="tdm" password="secret" roles="tdsTrigger"/>
</tomcat-users>
tdm
user has only the tdsTrigger
role.
If you don’t want to allow external triggers, for example if your datasets are static, simply don’t enable the tdsTrigger
role in Tomcat.
You can also set trigger="false"
in the update
element in your catalog:
<update startup="never" trigger="false" />
Catalog Configuration Examples
Example configuration in the TDS configuration catalogs.
Point the TDM to the content directory using -Dtds.content.root.path=<content directory>
on the TDM command line.
Static Dataset:
<featureCollection name="NOMADS CFSRR" featureType="GRIB2" harvest="true"
path="grib/NOMADS/cfsrr/timeseries">
<metadata inherited="true">
<dataType>GRID</dataType>
<dataFormat>GRIB-2</dataFormat>
</metadata>
<collection name="NOMADS-cfsrr-timeseries" spec="/san4/work/jcaron/cfsrr/**/.*grib2$"
dateFormatMark="#cfsrr/#yyyyMM" timePartition="directory"/>
<tdm rewrite="always"/>
</featureCollection>
rewrite="always"
tells the TDM to index this dataset upon TDM startup.- A log file will be written to
fc.NOMADS-cfsrr-timeseries.log
in the TDM working directory. - The TDS will use the existing indexes, it does not monitor any changes in the dataset.
Dynamic dataset:
<featureCollection name="DGEX-Alaska_12km" featureType="GRIB2" harvest="true"
path="grib/NCEP/DGEX/Alaska_12km">
<metadata inherited="true">
<dataType>GRID</dataType>
<dataFormat>GRIB-2</dataFormat>
</metadata>
<collection name="DGEX-Alaska_12km"
spec="/data/ldm/pub/native/grid/NCEP/DGEX/Alaska_12km/.*grib2$"
dateFormatMark="#DGEX_Alaska_12km_#yyyyMMdd_HHmm"
timePartition="file"
olderThan="5 min"/>
<tdm rewrite="true" rescan="0 0/15 * * * ? *" trigger="allow"/>
<update startup="never" trigger="allow" />
</featureCollection>
<tdm>
element for the TDMrewrite="test"
tells the TDM to test for dataset changesrescan="0 0/15 * * * ? *"
rescan directories every 15 minutes.
<update>
element for the TDSstartup="never"
tells the TDS to read in thefeatureCollection
when starting up, using the existing indicestrigger="allow"
enables the TDS to receive messages from the TDM when the dataset has changed
GCPass1
This is a utility program to examine the files in a collection before actually indexing them.
Example:
java -Xmx2g -classpath tdm-4.6.jar thredds.tdm.GCpass1 -spec "Q:/cdmUnitTest/gribCollections/rdavm/ds083.2/PofP/**/.*grib1" -useCacheDir "C:/temp/cache/" > gcpass1.out
Command Line Arguments:
Usage: thredds.tdm.GCpass1 [options]
Options:
-h, --help
Display this help and exit
Default: false
-isGrib2
Is Grib2 collection.
Default: false
-partition
Partition type: none, directory, file
Default: directory
-regexp
Collection regexp string, exactly as in the <featureCollection>.
-rootDir
Collection rootDir, exactly as in the <featureCollection>.
-spec
Collection specification string, exactly as in the <featureCollection>.
-useCacheDir
Set the Grib index cache directory.
-useTableVersion
Use Table version to make seperate variables.
Default: false
- You must have
spec
or (regexp
androotDir
). - If
useCacheDir
is not set, indexes will be in the data directories.
Sample Output:
FeatureCollectionConfig name= 'GCpass1' collectionName= 'GCpass1' type= 'GRIB1' # <1>
spec= 'B:/rdavm/ds083.2/grib1/**/.*grib1'
timePartition= directory
#files #records #vars #runtimes #gds
Directory B:\rdavm\ds083.2\grib1 # <2>
Directory B:\rdavm\ds083.2\grib1\1999 # <3>
B:\rdavm\ds083.2\grib1\1999\1999.07 total 1 244 63 1 1 1999-07-30T18:00:00Z - 1999-07-30T18:00:00Z # <4>
B:\rdavm\ds083.2\grib1\1999\1999.08 total 119 29046 66 119 1 1999-08-01T00:00:00Z - 1999-08-31T18:00:00Z
B:\rdavm\ds083.2\grib1\1999\1999.09 total 89 21755 66 89 1 1999-09-01T00:00:00Z - 1999-09-30T12:00:00Z
B:\rdavm\ds083.2\grib1\1999\1999.10 total 62 15128 63 62 1 1999-10-01T00:00:00Z - 1999-10-31T12:00:00Z
B:\rdavm\ds083.2\grib1\1999\1999.11 total 97 23816 66 97 1 1999-11-01T00:00:00Z - 1999-11-30T18:00:00Z
B:\rdavm\ds083.2\grib1\1999\1999.12 total 120 29512 66 120 1 1999-12-01T00:00:00Z - 1999-12-31T18:00:00Z
B:\rdavm\ds083.2\grib1\1999 total 488 119501 66 488 1 1999-07-30T18:00:00Z - 1999-12-31T18:00:00Z # <5>
Directory B:\rdavm\ds083.2\grib1\2000 #<3>
B:\rdavm\ds083.2\grib1\2000\2000.01 total 124 30504 64 124 1 2000-01-01T00:00:00Z - 2000-01-31T18:00:00Z # <4>
B:\rdavm\ds083.2\grib1\2000\2000.02 total 116 28536 64 116 1 2000-02-01T00:00:00Z - 2000-02-29T18:00:00Z
B:\rdavm\ds083.2\grib1\2000\2000.03 total 124 30504 64 124 1 2000-03-01T00:00:00Z - 2000-03-31T18:00:00Z
B:\rdavm\ds083.2\grib1\2000\2000.04 total 120 29520 64 120 1 2000-04-01T00:00:00Z - 2000-04-30T18:00:00Z
...
B:\rdavm\ds083.2\grib1\2014\2014.11 total 120 34560 76 120 1 2014-11-01T00:00:00Z - 2014-11-30T18:00:00Z
B:\rdavm\ds083.2\grib1\2014\2014.12 total 67 19296 76 67 1 2014-12-01T00:00:00Z - 2014-12-17T12:00:00Z
B:\rdavm\ds083.2\grib1\2014 total 1403 444544 116 1403 1 2014-01-01T00:00:00Z - 2014-12-17T12:00:00Z #<5>
B:\rdavm\ds083.2\grib1 total 22347 6546693 118 22347 1 1999-07-30T18:00:00Z - 2014-12-17T12:00:00Z #<5>
#files #records #vars #runtimes #gds
grand total 22347 6546693 118 22347 1 #<6>
referenceDate (22347) #<7>
1999-07-30T18:00:00Z - 2014-12-17T12:00:00Z: count = 22347
table version (2) #<8>
7-0-1: count = 3188
7-0-2: count = 6543505
variable (118) #<9>
5-wave_geopotential_height_anomaly_isobaric_10: count = 22076
5-wave_geopotential_height_isobaric_10: count = 22344
Absolute_vorticity_isobaric_10: count = 581022
Albedo_surface_Average: count = 6922
...
gds (1) # <10>
1645598069: count = 6546693
gdsTemplate (1) # <11>
0: count = 6546693
vertCoordInGDS (0) # <12>
predefined (0) # <13>
thin (0) # <14>
- The Feature Collection configuration
- The top-level directory
- Subdirectory
- Partitions - in this case these are directories because this is a directory partition.
- number of files in the partition
- number of records in the partition
- number of separate variables in the partition. Inhomogeneous partitions look more complex to the user.
- number of runtimes in the partition
- number of horizontal (GDS), which are turned into groups
- the starting and ending runtime. Look for overlapping partitions
- Sum of subpartitions for this partition
- Grand sum over all partitions
- Summary (n, start/end) of run dates
- list of all table versions found, count of number of records for each. Possibility that variables that should be separated by table version.
- list of all variables found, count of number of records for each. Possibility that stray records are in the collection.
- list of all GDS hashes found, count of number of records for each. Possibility of spurious differences with GDS hashes.
- list of all GDS templates found, count of number of records for each
- count of records that have vertical coordinates in the GDS (GRIB1 only)
- count of records that have predefined GDS (GRIB1 only) Possibility of unknown predefined GDS.
- count of records that have Quasi/Thin Grid (GRIB1 only)