The featureCollection element tells the TDS to serve collections of CDM Feature Datasets. Currently this is used for gridded and point datasets whose time and spatial coordinates are recognized by the CDM software stack. It allows the TDS to automatically create logical datasets composed of collections of files, and to allow subsetting in coordinate space on them, e.g. through the WMS, WCS, and NetCDF Subset Service.
Feature Collections have been undergoing continual development and refinement in recent versions of the TDS, and as you upgrade there are (mostly) minor changes to configuration and usage. The featureCollection element was first introduced in TDS 4.2, replacing the fmrcDataset element of earlier versions. TDS 4.2 allowed featureType = FMRC, Point, and Station. TDS 4.3 added featureType = GRIB, used for collections of GRIB files. TDS 4.5 changed this usage to featureType = GRIB1 or GRIB2. Currently, one should only serve GRIB files with featureType = GRIB1 or GRIB2. One should not use FMRC or NcML Aggregations on GRIB files.
Much of the complexity of feature collections lies in managing the collection of files on the server: creating indexes for performance, and handling collections that change. For high-performance servers, it is necessary to let a background process manage indexing, and the THREDDS Data Manager (TDM) is available for that purpose.
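For example, the TDM runs as a stand-alone Java process pointed at the same content directory that the TDS uses; the jar version, paths, and server URL below are illustrative placeholders:

java -Xmx4g -Dtds.content.root.path=/opt/tds/content -jar tdm-5.4.jar -tds "http://localhost:8080/thredds"

See the TDM documentation for the full set of command-line options.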
This document gives an overview of Feature Collections, as well as a complete syntax of allowed elements. For Feature Type specific information, see:
Also see:
The featureCollection element is a subtype of the dataset element. It defines a logical dataset for the TDS. All of the elements that can be used inside a dataset element can also be used inside a featureCollection element.
<featureCollection name="NCEP Polar Stereographic" featureType="GRIB2" path="grib/NCEP/NAM/Polar_90km">
  <collection name="NCEP-NAM-Polar_90km" spec="/data/ldm/pub/native/grid/NCEP/NAM/Polar_90km/NAM_Polar_90km_.*\.grib2$"/>
</featureCollection>
<featureCollection name="NCEP NAM Alaska(11km)" featureType="GRIB2" path="grib/NCEP/NAM/Alaska_11km">
  <metadata inherited="true">
    <serviceName>GribServices</serviceName>
    <documentation type="summary">NCEP GFS Model : AWIPS 230 (G) Grid. Global Lat/Lon grid</documentation>
  </metadata>
  <collection spec="/data/ldm/pub/native/grid/NCEP/NAM/Alaska_11km/.*grib2$"
              name="NAM_Alaska_11km"
              dateFormatMark="#NAM_Alaska_11km_#yyyyMMdd_HHmm"
              timePartition="file"
              olderThan="5 min"/>
  <update startup="nocheck" trigger="allow"/>
  <tdm rewrite="test" rescan="0 0/15 * * * ? *" />
  <gribConfig datasetTypes="TwoD Best Latest" />
</featureCollection>
A featureCollection is a kind of dataset element, and so can contain the same elements and attributes as a dataset element. Following is the XML Schema definition for the featureCollection element:
<xsd:element name="featureCollection" substitutionGroup="dataset">
  <xsd:complexType>
    <xsd:complexContent>
      <xsd:extension base="DatasetType">
        <xsd:sequence>
          <xsd:element type="collectionType" name="collection"/>
          <xsd:element type="updateType" name="update" minOccurs="0"/>
          <xsd:element type="tdmType" name="tdm" minOccurs="0"/>
          <xsd:element type="protoDatasetType" name="protoDataset" minOccurs="0"/>
          <xsd:element type="fmrcConfigType" name="fmrcConfig" minOccurs="0"/>
          <xsd:element type="pointConfigType" name="pointConfig" minOccurs="0"/>
          <xsd:element type="gribConfigType" name="gribConfig" minOccurs="0"/>
          <xsd:element ref="ncml:netcdf" minOccurs="0"/>
        </xsd:sequence>
        <xsd:attribute name="featureType" type="featureTypeChoice" use="required"/>
        <xsd:attribute name="path" type="xsd:string" use="required"/>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>
Here is an example featureCollection as you might put it into a TDS catalog:
<featureCollection name="Metar Station Data" harvest="true" featureType="Station" path="nws/metar/ncdecoded">
  <metadata inherited="true">
    <serviceName>fullServices</serviceName>
    <documentation type="summary">Metars: hourly surface weather observations</documentation>
    <documentation xlink:href="http://metar.noaa.gov/" xlink:title="NWS/NOAA information"/>
    <keyword>metar</keyword>
    <keyword>surface observations</keyword>
  </metadata>
  <collection name="metars" spec="/data/ldm/pub/decoded/netcdf/surface/metar/Surface_METAR_#yyyyMMdd_HHmm#.nc$" />
  <update startup="test" rescan="0 0/15 * * * ? *"/>
  <protoDataset choice="Penultimate" />
  <pointConfig datasetTypes="cdmrFeature Files"/>
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <attribute name="Conventions" value="CF-1.6"/>
  </netcdf>
</featureCollection>
A collection element defines the collection of datasets. Example:
<collection spec="/data/ldm/pub/native/satellite/3.9/WEST-CONUS_4km/WEST-CONUS_4km_3.9_.*gini$"
            dateFormatMark="#WEST-CONUS_4km_3.9_#yyyyMMdd_HHmm"
            name="WEST-CONUS_4km"
            olderThan="15 min" />
The XML Schema for the collection element:
<xsd:complexType name="collectionType">
  <xsd:attribute name="spec" type="xsd:string" use="required"/>
  <xsd:attribute name="name" type="xsd:token"/>
  <xsd:attribute name="olderThan" type="xsd:string" />
  <xsd:attribute name="dateFormatMark" type="xsd:string"/>
  <xsd:attribute name="timePartition" type="xsd:string"/>
</xsd:complexType>
where:
- spec (required): the collection specification string, naming the directory to scan and a regular expression (with optional date extractor) that the filenames must match.
- name: the collection name, which should be unique across all collections served by this TDS; it is used in index file names and logging.
- olderThan: exclude files whose last-modified time is more recent than this, to avoid reading files that are still being written.
- dateFormatMark: a date extractor applied to the full pathname, used when the date cannot be extracted with the collection specification string.
- timePartition: partition the collection in time, e.g. "file" to make each file its own partition.
Some Feature Collections (Point, FMRC (usually), and time-partitioned GRIB) need to know how to sort the collection of files. In those cases you must have a date in the filename, and either specify a date extractor in the collection specification string or include a dateFormatMark attribute.
1. If the date is in the filename only, you can use the collection specification string, aka a spec:
/data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km/GFS_Alaska_191km_#yyyyMMdd_HHmm#\.grib1$
applied to the file /data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km/GFS_Alaska_191km_20111226_1200.grib1 would extract the date 2011-12-26T12:00:00.
In this case, #yyyyMMdd_HHmm# is positional: it counts the characters before the first '#', then extracts the characters at those positions in the filename (here positions 17 through 30) and applies the SimpleDateFormat pattern yyyyMMdd_HHmm to them.
2. When the date is in the directory name and not completely in the filename, you must use the dateFormatMark. For example with a file path
/data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km/20111226/Run_1200.grib1
Use
dateFormatMark="#Alaska_191km/#yyyyMMdd'/Run_'HHmm"
In this case, the '#' characters delineate the substring to match in the entire pathname. Immediately following the matched substring comes the string to be given to SimpleDateFormat, in this example:
yyyyMMdd'/Run_'HHmm
Note that /Run_ is enclosed in single quotes. This tells SimpleDateFormat to interpret these characters literally, and they must match the characters in the filename exactly.
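A minimal Java sketch of this parsing step (illustrative only; the class and variable names are ours, not the TDS implementation):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class DateExtractSketch {
    public static void main(String[] args) throws ParseException {
        // Portion of the pathname following the "#Alaska_191km/#" substring match
        String matched = "20111226/Run_1200";
        // '/Run_' is quoted in the pattern, so SimpleDateFormat matches it literally
        SimpleDateFormat extract = new SimpleDateFormat("yyyyMMdd'/Run_'HHmm");
        Date date = extract.parse(matched);
        // Re-format in an ISO-like form for display
        System.out.println(new SimpleDateFormat("yyyy-MM-dd'T'HH:mm").format(date));
        // prints 2011-12-26T12:00
    }
}
```

If the quotes were omitted, SimpleDateFormat would try to interpret R, u, n, etc. as pattern letters and the parse would fail.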
You might also need to put the SimpleDateFormat pattern before the substring match. For example, in the following path, stuff differs for each subdirectory, so you can't match on it:
/dataroot/stuff/20111226/Experiment-02387347.grib1
However, you can match on Experiment so you can use:
dateFormatMark="yyyyMMdd#/Experiment#"
Note that whatever you match on must be unique in the pathname.
Provides control over the choice of the prototype dataset for the collection. The prototype dataset is used to populate the metadata for the feature collection. Example:
<protoDataset choice="Penultimate" change="0 2 3 * * ? *">
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <attribute name="featureType" value="timeSeries"/>
  </netcdf>
</protoDataset>
The XML Schema definition for the protoDataset element:
<xsd:complexType name="protoDatasetType">
  <xsd:sequence>
    <xsd:element ref="ncml:netcdf" minOccurs="0"/>
  </xsd:sequence>
  <xsd:attribute name="choice" type="protoChoices"/>
  <xsd:attribute name="change" type="xsd:string"/>
</xsd:complexType>
where:
- choice: which dataset in the collection to use as the prototype: First, Random, Penultimate (next-to-last; the default), or Latest.
- change: a cron expression specifying when to re-choose the prototype dataset, for collections that change.
- The optional ncml:netcdf element modifies the prototype dataset with NcML.
The choice of the protoDataset matters when the datasets are not homogeneous, since the prototype's metadata is used for the entire collection.
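For example, if newer files in a changing collection carry the most up-to-date metadata, a sketch (using only the attributes described above) that selects the most recent file as the prototype:

<protoDataset choice="Latest" />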
For collections that change, the update element provides options to update the collection, either synchronously (while a user request waits) or asynchronously (in a background task, so that requests do not wait). If there is no update element, the dataset is considered static, and the indexes are never updated by the TDS. (To force the indexes to be rebuilt, delete the collection index, usually <collection root directory>/<dataset name>.ncx.) Examples:
<update startup="test" rescan="0 0/30 * * * ? *" trigger="false"/>
<update recheckAfter="15 min" />
<update startup="never" trigger="allow" />
The XML Schema definition for the update element:
<xsd:complexType name="updateType">
  <xsd:attribute name="startup" type="xsd:token"/>
  <xsd:attribute name="recheckAfter" type="xsd:string" />
  <xsd:attribute name="rescan" type="xsd:token"/>
  <xsd:attribute name="trigger" type="xsd:token"/>
</xsd:complexType>
where:
- startup: what to do at TDS startup: never (use existing indexes only), nocheck (read in using existing indexes without checking the collection), test (check whether the collection has changed and update if needed), or always (rebuild the indexes).
- recheckAfter: check whether the collection has changed when a request comes in and this much time has elapsed since the last check.
- rescan: a cron expression specifying when to rescan the collection in a background task.
- trigger: if "allow", the collection can be updated by sending a remote trigger to the TDS; "false" disables triggers.
For GRIB collections, dynamic updating of the collection by the TDS is no longer supported (use the TDM for this). Therefore recheckAfter and rescan are ignored on an update element for a GRIB collection.
You must use the tdm element for GRIB collections that change. The TDM is a separate process that uses the same configuration catalogs as the TDS and updates GRIB collections in the background. Example:
<tdm rewrite="test" rescan="0 4,19,34,49 * * * ? *" />
The XML Schema definition for the tdm element:
<xsd:complexType name="tdmType">
  <xsd:attribute name="rewrite" type="xsd:token"/>
  <xsd:attribute name="rescan" type="xsd:token"/>
</xsd:complexType>
where:
- rewrite: "test" rebuilds the indexes only if the collection has changed; "always" rebuilds them every time the TDM runs.
- rescan: a cron expression specifying when the TDM should rescan the collection.
There are several ways to update a feature collection when it changes, specified by attributes on the update element:
If you have a collection that doesn't change, do not include an update element. The first time that the dataset is accessed, it will be read in and then never changed.
If you have a collection that doesn't change, but you want to have it ready for requests, then use:
<update startup="always" />
The dataset will be scanned at startup time and then never changed.
You have a large collection, which takes a long time to scan. You must carefully control when/if it will be scanned.
<update startup="nocheck" />
The dataset will be read in at startup time using the existing indexes (if they exist). If the indexes don't exist, they will be created at startup.
If it occasionally changes, then you want to manually tell it when to rescan:
<update startup="nocheck" trigger="allow" />
The dataset will be read in at startup time by using the existing indexes, and you manually tell it when to rebuild the index. You must enable triggers.
For collections that change but are rarely used, use the recheckAfter attribute on the update element. This minimizes unneeded processing for lightly used collections. This is also a reasonable strategy for small collections which don't take very long to build.
<update recheckAfter="15 min" />
Do not include both a recheckAfter and a rescan attribute. If you do, the recheckAfter will be ignored.
When you want to ensure that requests are answered as quickly as possible, read it at startup and also update the collection in the background using rescan:
<update startup="test" rescan="0 20 * * * ? *" />
This cron expression says to rescan the collection files every hour at 20 past the hour, and rebuild the dataset if needed.
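These rescan attributes use cron expressions in the Quartz format, with fields for seconds, minutes, hours, day-of-month, month, day-of-week, and (optionally) year. For example, a sketch that rescans every 30 minutes, on the hour and the half hour:

<update startup="test" rescan="0 0,30 * * * ? *" />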
To externally control when a collection is updated, use:
<update trigger="allow" />
You must enable remote triggers, and when the dataset changes, send a message to a special URL in the TDS.
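As an illustration (the exact admin path and required permissions vary by TDS version, so treat this URL shape as an assumption and consult the TDS documentation on remote triggers), a trigger request typically has the form:

https://server:port/thredds/admin/collection/trigger?collection=NAM_Alaska_11km&trigger=nocheck

where the collection parameter matches the name attribute of the collection element.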
You have a GRIB collection that changes. The TDS can only scan/write indices at startup time. You must use the TDM to detect any changes.
<update startup="test" trigger="allow"/>
<tdm rewrite="test" rescan="0 0/15 * * * ? *" trigger="allow"/>
The dataset will be read in at startup time by the TDS using the existing indexes, and will be scanned by the TDM every 15 minutes, which will send a trigger as needed.
You have a very large collection, which takes a long time to scan. You must carefully control when/if it will be scanned.
<update startup="never"/>
<tdm rewrite="test"/>
The TDS never scans the collection; it always uses existing indexes, which must already exist. Run the TDM first; once the indexes are made, you can stop the TDM and start the TDS.
You have a very large collection which changes, and takes a long time to scan. You must carefully control when/if it will be scanned.
<update startup="never" trigger="allow"/>
<tdm rewrite="test" rescan="0 0 3 * * ? *" />
The dataset will be read in at startup time using the existing indexes, which must exist. The TDM will test whether the collection has changed once a day at 3 am, and send a trigger to the TDS if needed.
NcML is no longer used to define the collection, but it may still be used to modify the feature collection dataset, for FMRC or Point (not GRIB).
<featureCollection featureType="FMRC" name="RTOFS Forecast Model Run Collection" path="fmrc/rtofs">
  <collection spec="c:/rps/cf/rtofs/.*ofs_atl.*\.grib2$" recheckAfter="10 min" olderThan="5 min"/>
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <variable name="time">
      <attribute name="units" value="hours since 1953-11-29T08:57"/>
    </variable>
  </netcdf>
  <protoDataset>
    <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
      <attribute name="speech" value="I'd like to thank all the little people..."/>
      <variable name="mixed_layer_depth">
        <attribute name="long_name" value="mixed_layer_depth @ surface"/>
        <attribute name="units" value="m"/>
      </variable>
    </netcdf>
  </protoDataset>
</featureCollection>
where:
- The collection element defines the collection of files, as usual.
- The netcdf element directly inside the featureCollection applies NcML modifications to the component files of the collection.
- The netcdf element inside the protoDataset modifies the prototype dataset, which supplies the collection-level metadata.