The featureCollection element is a way to tell the TDS to serve collections of CDM Feature Datasets.
Currently, this is used for gridded and point datasets whose time and spatial coordinates are recognized by the CDM software stack.
This allows the TDS to automatically create logical datasets composed of collections of files, and to subset them in coordinate space through the WMS, WCS, NetcdfSubset, and CDM Remote Feature services.
Feature Collections have undergone continual development and refinement in recent versions of the TDS, and as you upgrade there are (mostly) minor changes to configuration and usage.
The featureCollection element was first introduced in TDS 4.2, replacing the fmrcDataset element in earlier versions.
TDS 4.2 allowed featureType = FMRC, Point, and Station.
TDS 4.3 added featureType = GRIB, used for collections of GRIB files.
TDS 4.5 changed this usage to featureType = GRIB1 or GRIB2.
TDS 5.0 added refinements for performance.
Only serve GRIB files with featureType=GRIB1 or GRIB2. Do not use FMRC or NcML Aggregations on GRIB files.
A fair amount of the complexity of feature collections is managing the collection of files on the server, both in creating indexes for performance, and in managing collections that change. For high-performance servers, it is necessary to let a background process manage indexing, and the THREDDS Data Manager (TDM) is now available for that purpose.
This document gives an overview of Feature Collections, as well as a complete syntax of allowed elements.
Specific topics covered here are:
- Example catalog elements
- Description of elements in TDS configuration catalogs
- The Date Extractor
- Using external triggers for updating collections
- Static vs Changing Datasets (non-GRIB, GRIB)
- NcML Modifications (non-GRIB)
For Feature Type specific information, see:
- FMRC Collections
- Point Collections
- GRIB Collections
- GRIB specific configuration
- GRIB Collection FAQs
Example Catalog Elements
A featureCollection element is a subtype of the dataset element.
It defines a logical dataset for the TDS.
All of the elements that can be used inside of a dataset element can be used inside of a featureCollection element.
Example 1: Simple Case Using Defaults
<featureCollection name="NCEP Polar Stereographic" featureType="GRIB2" path="grib/NCEP/NAM/Polar_90km"> <!-- 1 -->
  <collection name="NCEP-NAM-Polar_90km" spec="/data/ldm/pub/native/grid/NCEP/NAM/Polar_90km/NAM_Polar_90km_.*\.grib2$"/> <!-- 2 -->
</featureCollection>
- A GRIB2 Feature Collection dataset is defined, with the “human readable” name of “NCEP Polar Stereographic”. Its URL path(s) will look like http://server/thredds/<service>/grib/NCEP/NAM/Polar_90km/... The Dataset ID is automatically set to the path, so the URL of its dataset page is derived from the path as well.
- Defines the files in the collection as any file in the directory /data/ldm/pub/native/grid/NCEP/NAM/Polar_90km/ that matches the regular expression NAM_Polar_90km_.*\.grib2$. In this case, it means any filename starting with NAM_Polar_90km_ and ending with .grib2. The collection name is NCEP-NAM-Polar_90km, which is used for index file names, etc.
Example 2: Specify The Options Explicitly
<featureCollection name="NCEP NAM Alaska(11km)" featureType="GRIB2" path="grib/NCEP/NAM/Alaska_11km">
  <metadata inherited="true">
    <serviceName>GribServices</serviceName> <!-- 1 -->
    <documentation type="summary">NCEP GFS Model : AWIPS 230 grid</documentation> <!-- 2 -->
  </metadata>
  <!-- 3 to 6: spec, dateFormatMark, timePartition, olderThan -->
  <collection spec="/data/ldm/pub/native/grid/NCEP/NAM/Alaska_11km/.*grib2$"
              name="NAM_Alaska_11km"
              dateFormatMark="#NAM_Alaska_11km_#yyyyMMdd_HHmm"
              timePartition="file"
              olderThan="5 min"/>
  <update startup="nocheck"/> <!-- 7 -->
  <tdm rewrite="test" rescan="0 0/15 * * * ? *" /> <!-- 8 -->
</featureCollection>
- Arbitrary metadata can be added to the catalog. Here, we indicate to use the service called GribServices (not shown, but likely a compound service that includes all the services you want to provide).
- A documentation element of type summary is added to the catalog for this dataset.
- The collection consists of all files ending with grib2 in the directory /data/ldm/pub/native/grid/NCEP/NAM/Alaska_11km/.
- A date will be extracted from the filename, and the files will then be sorted by date. This is important if the lexicographic ordering differs from the date order.
- Partitioning will happen at the file level.
- Only include files whose lastModified date is more than 5 minutes old. This is to exclude files that are actively being created.
- Instruct the TDS to use the collection index if it already exists, without testing if it’s up-to-date.
- Instruct the TDM to examine all the files to detect if they have changed since the index was written. Rescan every 15 minutes.
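The GribServices service referenced in Example 2 is not shown in the source. As a rough sketch only, a compound service of that name might be declared along these lines. The member service names, the choice of member services, and the base paths are all assumptions chosen to mirror the services mentioned at the top of this document; the exact values differ between TDS versions and installations.

```xml
<!-- Hypothetical compound service. Member services and base paths are assumptions. -->
<service name="GribServices" serviceType="Compound" base="">
  <service name="opendap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <service name="ncss" serviceType="NetcdfSubset" base="/thredds/ncss/"/>
  <service name="wms" serviceType="WMS" base="/thredds/wms/"/>
  <service name="wcs" serviceType="WCS" base="/thredds/wcs/"/>
  <service name="cdmremote" serviceType="CdmRemote" base="/thredds/cdmremote/"/>
</service>
```

A featureCollection would then pick this up through the inherited serviceName element, as in Example 2.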
Description Of Elements In TDS Configuration Catalogs
A featureCollection is a kind of dataset element, and so can contain the same elements and attributes as that element.
Following is the XML Schema definition for the featureCollection element:
<xsd:element name="featureCollection" substitutionGroup="dataset">
  <xsd:complexType>
    <xsd:complexContent>
      <xsd:extension base="DatasetType">
        <xsd:sequence>
          <xsd:element type="collectionType" name="collection"/>
          <xsd:element type="updateType" name="update" minOccurs="0"/>
          <xsd:element type="tdmType" name="tdm" minOccurs="0"/>
          <xsd:element type="protoDatasetType" name="protoDataset" minOccurs="0"/>
          <xsd:element type="fmrcConfigType" name="fmrcConfig" minOccurs="0"/>
          <xsd:element type="pointConfigType" name="pointConfig" minOccurs="0"/>
          <xsd:element type="gribConfigType" name="gribConfig" minOccurs="0"/>
          <xsd:element type="fileSortType" name="filesSort" minOccurs="0" />
          <xsd:element ref="ncml:netcdf" minOccurs="0"/>
        </xsd:sequence>
        <xsd:attribute name="featureType" type="featureTypeChoice" use="required"/>
        <xsd:attribute name="path" type="xsd:string" use="required"/>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>

<xsd:simpleType name="featureTypeChoice">
  <xsd:union memberTypes="xsd:token">
    <xsd:simpleType>
      <xsd:restriction base="xsd:token">
        <xsd:enumeration value="FMRC"/>
        <xsd:enumeration value="GRIB1"/>
        <xsd:enumeration value="GRIB2"/>
        <xsd:enumeration value="Point"/>
        <xsd:enumeration value="Station"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:union>
</xsd:simpleType>
Here is an example featureCollection as you might put it into a TDS catalog:
<featureCollection name="Metar Station Data" harvest="true" featureType="Station" path="nws/metar/ncdecoded"> <!-- 1 -->
  <metadata inherited="true"> <!-- 2 -->
    <serviceName>fullServices</serviceName>
    <documentation type="summary">Metars: hourly surface weather observations</documentation>
    <documentation xlink:href="http://metar.noaa.gov/" xlink:title="NWS/NOAA information"/>
    <keyword>metar</keyword>
    <keyword>surface observations</keyword>
  </metadata>
  <collection name="metars" spec="/data/ldm/pub/decoded/netcdf/surface/metar/Surface_METAR_#yyyyMMdd_HHmm#.nc$" /> <!-- 3 -->
  <update startup="test" rescan="0 0/15 * * * ? *"/> <!-- 4 -->
  <protoDataset choice="Penultimate" /> <!-- 5 -->
  <pointConfig datasetTypes="cdmrFeature Files"/> <!-- 6 -->
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"> <!-- 7 -->
    <attribute name="Conventions" value="CF-1.6"/>
  </netcdf>
</featureCollection>
- A featureCollection is declared, using the name and harvest attributes declared by the dataset element. The featureType is a mandatory attribute defining the type of the feature collection. The path is also required, and defines what the URL of this collection will be. It must be unique over the entire TDS. If an ID attribute is not specified on the featureCollection, the path attribute is used as the ID (this is a recommended idiom).
- As is usual with dataset elements, a block of metadata can be declared that will be inherited by all the datasets.
- The collection of files is defined. Each dataset is assigned a nominal time by extracting a date from the filename.
- Specify that the collection is updated when the TDS starts, and in a background thread every 15 minutes.
- The prototype dataset is the next-to-last in the collection when sorted by time.
- Configuration specific to the Point feature type: expose a cdmrRemote service on the entire collection, and serve all the component files using the default service, in this example the compound service fullServices.
- This NcML wraps each dataset in the collection. This attribute overrides any existing one in the datasets; it tells the CDM to parse the station information using the CF Conventions.
The collection element defines the collection of datasets. For example:
<collection spec="/data/ldm/pub/native/satellite/3.9/WEST-CONUS_4km/WEST-CONUS_4km_3.9_.*gini$" dateFormatMark="#WEST-CONUS_4km_3.9_#yyyyMMdd_HHmm" name="WEST-CONUS_4km" olderThan="15 min" />
The XML Schema for the collection element:
<xsd:complexType name="collectionType">
  <xsd:attribute name="spec" type="xsd:string" use="required"/> <!-- 1 -->
  <xsd:attribute name="name" type="xsd:token"/> <!-- 2 -->
  <xsd:attribute name="olderThan" type="xsd:string" /> <!-- 3 -->
  <xsd:attribute name="dateFormatMark" type="xsd:string"/> <!-- 4 -->
  <xsd:attribute name="timePartition" type="xsd:string"/> <!-- 5 -->
</xsd:complexType>
spec (required): collection specification string. In this example, the collection contains all files in the directory /data/ldm/pub/native/satellite/3.9/WEST-CONUS_4km/ whose filename matches the regular expression WEST-CONUS_4km_3.9_.*gini$, where .* means “match any number of characters” and gini$ means “ends with the characters gini”. If you wanted to match .gini, you would need to escape the “.”, i.e., \.gini$.
name (required): the collection name, which must be unique for all collections served by your TDS. This is used for external triggers, for the CDM collection index files, and for logging and debugging messages. If missing, the name attribute on the <featureCollection> element is used. However, we recommend that you create a unique, immutable name for the dataset collection, and put it in this name attribute of the collection element.
olderThan (optional): Only files whose lastModified date is older than this are included. This is used to exclude files that are in the process of being written. However, it only applies to newly found files; that is, once a file is in the collection it is not removed because it was updated.
dateFormatMark (optional): This defines a DateExtractor, which is applied to each file in the collection to assign it a date, which is used for sorting, getting the latest file, and possibly for time partitioning. In this example, the string WEST-CONUS_4km_3.9_ is located in each file path, then the SimpleDateFormat template yyyyMMdd_HHmm is applied to the next characters of the filename to create a date. A DateExtractor can also be defined in the collection specification string, but in that case the date must be contained just in the file name, as opposed to the complete file path which includes all of the parent directory names. Use this OR a date extractor in the specification string, but not both.
timePartition (optional): Currently only used by GRIB collections; see the GRIB Collections documentation for more information.
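To make the "use one or the other" rule for date extraction concrete, the sketch below shows the same collection written both ways. Option A repeats the example above. Option B is an assumption-laden variant that embeds the date extractor in the spec string; it presumes the files are named like WEST-CONUS_4km_3.9_20110101_1200.gini (a hypothetical name) so that the date sits entirely within the file name.

```xml
<!-- Option A: separate dateFormatMark, matched against the full file path -->
<collection name="WEST-CONUS_4km"
            spec="/data/ldm/pub/native/satellite/3.9/WEST-CONUS_4km/WEST-CONUS_4km_3.9_.*gini$"
            dateFormatMark="#WEST-CONUS_4km_3.9_#yyyyMMdd_HHmm"
            olderThan="15 min"/>

<!-- Option B (sketch): date extractor embedded in spec, matched against the file name only -->
<collection name="WEST-CONUS_4km"
            spec="/data/ldm/pub/native/satellite/3.9/WEST-CONUS_4km/WEST-CONUS_4km_3.9_#yyyyMMdd_HHmm#\.gini$"
            olderThan="15 min"/>
```

Only one of the two elements would appear in a given featureCollection.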
The protoDataset element provides control over the choice of the prototype dataset for the collection.
The prototype dataset is used to populate the metadata for the feature collection.
Note that this is not used by GRIB feature collections.
<protoDataset choice="Penultimate" change="0 2 3 * * ? *">
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <attribute name="featureType" value="timeSeries"/>
  </netcdf>
</protoDataset>
The XML Schema definition for the protoDataset element:
<xsd:complexType name="protoDatasetType">
  <xsd:sequence>
    <xsd:element ref="ncml:netcdf" minOccurs="0"/> <!-- 1 -->
  </xsd:sequence>
  <xsd:attribute name="choice" type="protoChoices"/> <!-- 2 -->
  <xsd:attribute name="change" type="xsd:string"/> <!-- 3 -->
</xsd:complexType>
ncml:netcdf (optional): ncml elements that modify the prototype dataset.
choice = [First | Random | Penultimate | Latest]: select prototype from a time ordered list, using the first, a randomly selected one, the next to last, or the last dataset in the list. The default is Penultimate.
change = "cron expr" (optional): On rolling datasets, you need to change the prototype periodically, otherwise it will get deleted eventually. This attribute specifies when the protoDataset should be re-selected, using a cron expression.
The choice of the protoDataset matters when the datasets are not homogeneous:
- Global and variable attributes are taken from the prototype dataset.
- If a variable appears in the prototype dataset, it will appear in the feature collection dataset. If it doesn’t appear in other datasets, it will have missing data for those times.
- If a variable does not appear in the prototype dataset, it will not appear in the feature collection dataset, even if it appears in other datasets.
For collections that change, the update element provides options to update the collection, either synchronously (while a user request waits) or asynchronously (in a background task, so that requests do not wait).
For GRIB collections, updating of the collection by the TDS is no longer supported, for either static or dynamic collections (use the TDM and the tdm element for this).
However, even for GRIB collections, the update element can be used to control whether collections can be re-read by the TDS using an external trigger.
<update startup="test" rescan="0 0/30 * * * ? *" trigger="false"/>
- When the TDS starts up, test whether the dataset has been updated, and recheck it in a background process every 30 minutes.
- Do not allow external triggers.
- Note: This cannot be used for GRIB collections; see the tdm element instead.
<update recheckAfter="15 min" />
- Test if the dataset has been updated only when a request comes in for it, and the dataset hasn’t been checked for 15 minutes.
<update startup="never" trigger="allow" />
- Never update the collection indices, but allow an external program (such as the TDM) to send a trigger telling the TDS that it should reread the collection into memory.
- This is useful for large collections of data where even testing if a dataset has changed can be costly.
The XML Schema definition for the update element:
<xsd:complexType name="updateType">
  <xsd:attribute name="recheckAfter" type="xsd:string" /> <!-- 1 -->
  <xsd:attribute name="rescan" type="xsd:token"/> <!-- 2 -->
  <xsd:attribute name="trigger" type="collectionUpdateType"/> <!-- 3 -->
  <xsd:attribute name="startup" type="collectionUpdateType"/> <!-- 4 -->
</xsd:complexType>
recheckAfter: This will cause a new scan whenever a request comes in and this much time has elapsed since the last scan. The request will wait until the scan is finished and a new collection is built (if needed), and so is called synchronous updating. This option will be ignored if you are using the rescan attribute or if you have a tdm element.
rescan: uses a cron expression to specify when the collection should be re-scanned in a background task, and re-tested to see if it has changed. This is called asynchronous updating.
trigger: if set to allow (default), then external triggering will be allowed. This allows collections to be updated by an external program (or person using a browser) sending an explicit “trigger” URL to the server. This URL is protected by HTTPS, so you must enable triggers for this to work. Set this to false to disable triggering.
startup: [never | nocheck | testIndexOnly | test | always] (default never). The collection is read on server startup and tested for whether it is up to date, depending on the collectionUpdateType:
<xsd:simpleType name="collectionUpdateType">
  <xsd:union memberTypes="xsd:token">
    <xsd:simpleType>
      <xsd:restriction base="xsd:token">
        <xsd:enumeration value="never"/> <!-- 1 -->
        <xsd:enumeration value="nocheck"/> <!-- 2 -->
        <xsd:enumeration value="testIndexOnly"/> <!-- 3 -->
        <xsd:enumeration value="test"/> <!-- 4 -->
        <xsd:enumeration value="always"/> <!-- 5 -->
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:union>
</xsd:simpleType>
never: the collection is used as it is, and no checking is done. The collection index must already exist. Use this for very large collections that you don’t want to inadvertently scan.
nocheck: the collection index is used if it exists, without checking whether it’s up-to-date. If it doesn’t exist, build it.
testIndexOnly: the collection index is used if it exists and it is newer than all of its immediate children.
test (or true): the collection’s data files are scanned and the new collection of children is compared to the old collection. If there are any changes, the index is rebuilt.
always: the collection is always re-scanned and the indices are rebuilt.
tdm Element (GRIB Only)
You must use the tdm element for GRIB collections; the update element no longer applies.
The TDM is a separate process that uses the same configuration catalogs as the TDS, and updates GRIB collections in the background.
<tdm rewrite="test" />
- This example tells the TDM (not the TDS) to test if the dataset has changed (with respect to any existing indexes on disk), and if so update it. If no indexes exist on disk at the time the TDM is run, then create them. Once the test is complete and any indexes are created, the TDM will not check again unless the process is stopped and a new session of the TDM is started.
<tdm rewrite="test" rescan="0 4,19,34,49 * * * ? *" />
- This example tells the TDM (not the TDS) to test if the dataset has changed 4 times every hour, specifically at 4, 19, 34, and 49 minutes past the hour. If the collection has changed, new indices will be created, and a trigger will be sent to the TDS to tell it to re-read the collection so that the new data will show up in the TDS client catalogs.
The TDM sends the trigger by executing an HTTP GET request to the TDS. It is sent when the TDM has done a rescan, the collection has changed, and a new collection index was made. The trigger tells the TDS to read in the new collection index.
The XML Schema definition for the tdm element:
<xsd:complexType name="tdmType">
  <xsd:attribute name="rewrite" type="collectionUpdateType"/> <!-- 1 -->
  <xsd:attribute name="rescan" type="xsd:token"/> <!-- 2 -->
</xsd:complexType>
rewrite: one of the collectionUpdateTypes, except for never. The most useful value is test.
rescan: uses a cron expression to specify when the collection should be re-scanned.
When a collection shows a list of files, the files will be sorted by increasing name.
To use a decreasing sort, use the filesSort element inside the featureCollection element:
<featureCollection ... > ... <filesSort increasing = "false" /> </featureCollection>
Feature Collections sometimes (Point, FMRC, and time partitioned GRIB) need to know how to sort the collection of files.
In those cases you need to have a date in the filename, and you need to either specify a date extractor in the specification string or include a dateFormatMark attribute.
If the date is in the filename only, you can use the collection specification string. A
applied to the file
would result in the extraction of the date
In this case,
- the characters in the filename before the first # symbol are counted.
- the characters between the # symbols are counted to determine the position of the characters to extract from the filename (positions 18 through 30).
- the SimpleDateFormat pattern yyyyMMdd_HHmm is applied to the extracted characters.
When the date is in the directory name and not completely in the filename, you must use the
dateFormatMark. For example with a file path:
In this case, the # characters delineate the substring match on the entire pathname, not just the file name. Immediately following the match comes the string to be parsed by the SimpleDateFormat pattern, in this example:
Note that the /Run_ is enclosed in single quotes. This tells SimpleDateFormat to interpret these characters literally, and they must match characters in the filename exactly.
You might also need to put the SimpleDateFormat before the substring match. For example, in the following structure:
Let’s pretend “stuff” differs for each subdirectory, so you can’t match on it. However, you can match on Experiment, so you can use:
Note that whatever you match on must be unique in the pathname.
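As a hedged illustration of the two dateFormatMark placements described above, the sketch below uses invented directory layouts, collection names, and file names; none of them come from a real server. The use of ** in spec to descend into subdirectories is also an assumption about the collection specification syntax and should be checked against your TDS version.

```xml
<!-- Hypothetical layout: /dataroot/model/20111226/Run_1200.grib1
     Match "model/" in the full path, then parse what follows it.
     The quoted '/Run_' is literal text for SimpleDateFormat. -->
<collection name="model_runs"
            spec="/dataroot/model/**/Run_.*\.grib1$"
            dateFormatMark="#model/#yyyyMMdd'/Run_'HHmm"/>

<!-- Hypothetical layout: /data/analysis/(varying directory)/20111226/Experiment_0001.grib1
     The date pattern comes first and the matched substring "/Experiment" follows it. -->
<collection name="experiments"
            spec="/data/analysis/**/Experiment_.*\.grib1$"
            dateFormatMark="yyyyMMdd#/Experiment#"/>
```

In both cases the matched substring ("model/" or "/Experiment") has to occur exactly once in each file path for the extraction to be unambiguous.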
The TDS supports a REST interface that allows authorized users to send a trigger to the TDS to tell it to reread a feature collection. The URL for the trigger is:
name is the collection name, and
type is a collectionUpdateType.
- Typically, the trigger is used by the TDM, but it can also be used manually or by another program.
- Triggering is password-protected and uses TLS/SSL (see Enable TLS/SSL Encryption for more information).
- You must give the role tdsTrigger to any user you want to have the right to send a trigger.
- To enable the TDM trigger, create a user named tdm and give that user the role tdsTrigger.
You can see a list of the Feature Collection datasets (and manually send a trigger=nocheck to specific datasets) from the TDS admin page.
To have access to this page, the user must have the role tdsConfig (see enabling Remote Management).
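The roles and the tdm user described above could be set up roughly as follows. This sketch assumes the TDS runs in Apache Tomcat with the default file-based user database (tomcat-users.xml); the admin user name and the placeholder passwords are hypothetical, and a production server would normally store digested passwords or use a different realm.

```xml
<!-- tomcat-users.xml (assumed to live in the servlet container's conf directory) -->
<tomcat-users>
  <role rolename="tdsTrigger"/>
  <role rolename="tdsConfig"/>
  <!-- The TDM sends triggers as this user. Replace the placeholder password. -->
  <user username="tdm" password="CHANGE_ME" roles="tdsTrigger"/>
  <!-- Hypothetical administrator who can view the admin page and send triggers manually. -->
  <user username="admin" password="CHANGE_ME" roles="tdsConfig,tdsTrigger"/>
</tomcat-users>
```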
Static Versus Changing Datasets (Not GRIB)
Static Collection - Small Or Rarely-Used
If you have a collection that doesn’t change, do not include an update element.
The first time the dataset is accessed, it will be read in and then never changed.
Static Collection - Fast Response
If you have a collection that doesn’t change, but you want to have it ready for requests, then use:
<update startup ="always" />
The dataset will be scanned at startup time and then never changed.
Large Static Collection
You have a large collection, which takes a long time to scan. You must carefully control when/if it will be scanned.
<update startup ="nocheck" />
The dataset will be read in at start up time by using the existing indexes (if they exist). If indexes don’t exist, they will be created on start up.
If it occasionally changes, then you want to manually tell it when to re-scan:
<update startup ="nocheck" trigger="allow" />
The dataset will be read in at start up time by using the existing indexes, and you manually tell it when to rebuild the index. You must enable triggers.
Changing Collection - Small or Rarely Used
For collections that change but are rarely used, use the recheckAfter attribute on the update element.
This minimizes unneeded processing for lightly used collections.
This is also a reasonable strategy for small collections which don’t take very long to scan.
<update recheckAfter="15 min" />
Do not also include the rescan attribute. If you do, the recheckAfter attribute will be ignored.
Changing Collection - Fast response
When you want to ensure that requests are answered as quickly as possible, read the collection at startup and update it in the background using the rescan attribute:
<update startup="test" rescan="0 20 * * * ? *" />
This cron expression says to rescan the collection files “every hour at 20 past the hour”, and rebuild the dataset if needed.
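The seven-field expressions used for rescan throughout this document appear to follow Quartz-style cron syntax. Under that assumption, the fields read left to right as shown in the comment below, and the three schedules that appear in this document can be annotated as follows. The field interpretation is a sketch, not a statement of what a particular TDS version accepts.

```xml
<!-- Assumed field order:
     seconds  minutes  hours  day-of-month  month  day-of-week  year -->

<!-- Second 0 of minute 20 of every hour -->
<update startup="test" rescan="0 20 * * * ? *"/>

<!-- Every 15 minutes, starting on the hour (minutes 0, 15, 30, 45) -->
<update startup="test" rescan="0 0/15 * * * ? *"/>

<!-- Once a day at 03:00:00 -->
<update startup="test" rescan="0 0 3 * * ? *"/>
```

Only one update element would be used per featureCollection; the three are listed together here purely for comparison.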
Sporadically Changing Collection
To externally control when a collection is updated, enable remote triggers, and when the dataset changes, send a trigger to the TDS.
Static Versus Changing GRIB Datasets
Changing GRIB Collection (Small Or Large)
You have a GRIB collection that changes. The TDS can only scan indices at startup time. You must use the TDM to detect any changes.
<tdm rewrite="test" rescan="0 0/15 * * * ? *" />
The dataset will be read in at startup time by the TDS using the existing indexes, and will be scanned by the TDM every 15 minutes, which can be configured to send a trigger as needed.
For very large collections, the rescan schedule should be carefully considered.
<tdm rewrite="test" rescan="0 0 3 * * ? *" />
The dataset will be read in at TDS startup time by using the existing indexes (they must already exist). The TDM will test if it’s changed “once a day at 3 am”, and send a trigger to the TDS if needed.
Very Large GRIB Collection That Doesn’t Change
You have a very large collection, which takes a long time to scan. You must carefully control when/if it will be scanned.
<update trigger="false"/> <tdm rewrite="test"/>
The TDS never scans the collection; it always uses existing indices (which must already exist). Run the TDM first, then after the indices are made, you can stop the TDM and start the TDS. Since the collection does not change, there is no need to tell the TDS to re-read the collection, so disable triggering. If the collection is updated, the TDM will need to be run again, and the TDS will need to be restarted.
NcML is no longer used to define the collection, but it may still be used to modify the feature collection dataset for FMRC or Point (not GRIB).
<featureCollection featureType="FMRC" name="RTOFS Forecast Model Run Collection" path="fmrc/rtofs">
  <collection spec="c:/rps/cf/rtofs/.*ofs_atl.*\.grib2$" recheckAfter="10 min" olderThan="5 min"/> <!-- 1 -->
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"> <!-- 2 -->
    <variable name="time">
      <attribute name="units" value="hours since 1953-11-29T08:57"/>
    </variable>
  </netcdf>
  <protoDataset>
    <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"> <!-- 3 -->
      <attribute name="speech" value="I'd like to thank all the little people..."/>
      <variable name="mixed_layer_depth">
        <attribute name="long_name" value="mixed_layer_depth @ surface"/>
        <attribute name="units" value="m"/>
      </variable>
    </netcdf>
  </protoDataset>
</featureCollection>
- The collection is defined by a collection element, allowing any number of forecast times per file.
- When you want to modify the component files of the collection, you put an NcML element inside the featureCollection element. This modifies the component files before they are turned into a gridded dataset. In this case we have fixed the time coordinate units attribute; otherwise the individual files would not get recognized as Grid datasets and the feature collection would fail.
- When you want to modify the resulting FMRC dataset, you put an NcML element inside the protoDataset element. In this case we have added a global attribute named speech and 2 attributes on the variable named mixed_layer_depth.
featureCollections are your friend: a featureCollection may be updated in the background, but aggregations will only be updated when the user makes a request (synchronously), which means the user has to wait until the update is complete (gonna have a bad time).