GRIB Feature Collection Datasets are collections of GRIB records, which contain gridded data, typically from numerical model output. Because of the complexity of how GRIB data is written and stored, the TDS has provided specialized handling of GRIB datasets since version 4.3, called GRIB Feature Collections.
The GRIB Collections framework has been rewritten in CDM version 4.5, in order to handle large collections efficiently. Version 4.5 requires Java 7. Some of the new capabilities in version 4.5 are:
Implementation notes:
The GRIB Collections framework has been rewritten in CDM version 4.6, in order to handle very large collections efficiently. Oh wait, we already did that in 4.5. Sorry, it wasn't good enough.
Also see:
1) <featureCollection featureType="GRIB1" name="rdavm partition none" path="gribCollection/none">
2)   <metadata inherited="true">
3)     <dataFormat>GRIB-2</dataFormat> <!--not used -->
       <serviceName>all</serviceName>
       <dataType>Grid</dataType>
     </metadata>
4)   <collection name="ds083.2-none"
5)       spec="Q:/cdmUnitTest/gribCollections/rdavm/ds083.2/PofP/**/.*grib1"
6)       timePartition="none"/>
7)   <update startup="never" trigger="allow"/>
8)   <tdm rewrite="test" rescan="0 0/15 * * * ? *" trigger="allow"/>
9)   <gribConfig datasetTypes="TwoD Latest Best" />
   </featureCollection>
The above example generates a TwoD and a Best dataset for the entire collection, a reference to the latest dataset, and one dataset for each reference time in the collection; the per-reference-time datasets become nested datasets in the catalog. These datasets are named by their index files, in the form <collection-name>-<referenceTime>.ncx3, e.g. GFS-Puerto_Rico-20141110-000000.ncx3.
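The naming of the per-reference-time datasets can be sketched in a few lines of Python. This is only an illustration: the function name `index_file_name` is hypothetical, and the timestamp layout (`yyyyMMdd-HHmmss`, joined to the collection name with a hyphen) is inferred from the example name GFS-Puerto_Rico-20141110-000000.ncx3 rather than taken from the TDS source.

```python
from datetime import datetime


def index_file_name(collection_name: str, reference_time: datetime) -> str:
    """Build an index file name for one reference time.

    Assumption: the layout is inferred from the example
    GFS-Puerto_Rico-20141110-000000.ncx3 in the text.
    """
    stamp = reference_time.strftime("%Y%m%d-%H%M%S")
    return f"{collection_name}-{stamp}.ncx3"


# Three model runs on the same day, six hours apart.
runs = [datetime(2014, 11, 10, hour) for hour in (0, 6, 12)]
names = [index_file_name("GFS-Puerto_Rico", run) for run in runs]
```

Each entry in `names` corresponds to one nested dataset in the generated catalog.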
The simplified catalog is:
<dataset name="NCEP GFS Puerto_Rico (191km)">
<metadata inherited="true">
<serviceName>VirtualServices</serviceName>
<dataType>GRID</dataType>
<dataFormat>GRIB-2</dataFormat>
</metadata>
<dataset name="Full Collection (Reference / Forecast Time) Dataset" ID="fmrc/NCEP/GFS/Puerto_Rico/TwoD" urlPath="fmrc/NCEP/GFS/Puerto_Rico/TwoD">
<documentation type="summary">Two time dimensions: reference and forecast; full access to all GRIB records</documentation>
</dataset>
<dataset name="Best NCEP GFS Puerto_Rico (191km) Time Series" ID="fmrc/NCEP/GFS/Puerto_Rico/Best" urlPath="fmrc/NCEP/GFS/Puerto_Rico/Best">
<documentation type="summary">Single time dimension: for each forecast time, use GRIB record with smallest offset from reference time</documentation>
</dataset>
<dataset name="Latest Collection for NCEP GFS Puerto_Rico (191km)" urlPath="latest.xml">
<serviceName>latest</serviceName>
</dataset>
<catalogRef xlink:href="/thredds/catalog/fmrc/NCEP/GFS/Puerto_Rico/GFS-Puerto_Rico-20141110-000000.ncx3/catalog.xml" />
<catalogRef xlink:href="/thredds/catalog/fmrc/NCEP/GFS/Puerto_Rico/GFS-Puerto_Rico-20141110-060000.ncx3/catalog.xml" />
<catalogRef xlink:href="/thredds/catalog/fmrc/NCEP/GFS/Puerto_Rico/GFS-Puerto_Rico-20141110-120000.ncx3/catalog.xml" />
</dataset>
The catalogRefs are links to virtual datasets, formed from the collection of records for the specified reference time, and independent of which file stores them.
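The two ways of viewing the records can be illustrated with a small Python sketch. It assumes a simplified `Record` type (hypothetical, not a CDM class) that carries only a reference time, a forecast time, and a label for where the record is stored. Grouping by reference time mirrors the per-reference-time virtual datasets, and the "Best" selection follows the summary in the catalog: for each forecast time, keep the record with the smallest offset from its reference time.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import NamedTuple


class Record(NamedTuple):
    reference_time: datetime  # model run time
    forecast_time: datetime   # valid time of the field
    source: str               # hypothetical label for the storing file


def by_reference_time(records):
    """Group records by reference time, regardless of which file holds them."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.reference_time].append(rec)
    return dict(groups)


def best_time_series(records):
    """For each forecast time, keep the record with the smallest offset."""
    best = {}
    for rec in records:
        current = best.get(rec.forecast_time)
        offset = rec.forecast_time - rec.reference_time
        if current is None or offset < current.forecast_time - current.reference_time:
            best[rec.forecast_time] = rec
    return [best[t] for t in sorted(best)]


run00 = datetime(2014, 11, 10, 0)
run06 = datetime(2014, 11, 10, 6)
records = [Record(run00, run00 + timedelta(hours=h), "run00") for h in (0, 6, 12)]
records += [Record(run06, run06 + timedelta(hours=h), "run06") for h in (0, 6, 12)]

per_run = by_reference_time(records)
best = best_time_series(records)
```

Here the 06:00 and 12:00 forecast times exist in both runs, and the sketch picks the record from the later run because its offset is smaller.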
Now suppose that we modify the above example and use timePartition="directory":
<featureCollection featureType="GRIB1" name="rdavm partition directory" path="gribCollection/pofp">
<metadata inherited="true">
<serviceName>all</serviceName>
<dataType>Grid</dataType>
</metadata>
<collection name="ds083.2-directory" spec="Q:/cdmUnitTest/gribCollections/rdavm/ds083.2/PofP/**/.*grib1" timePartition="directory"/>
<update startup="test" />
<gribConfig datasetTypes="TwoD Latest Best" />
</featureCollection>
<featureCollection name="NAM-Polar90" featureType="GRIB" path="grib/NCEP/NAM/Polar90">
  <metadata inherited="true">
    <dataFormat>GRIB-2</dataFormat>
  </metadata>
  <collection spec="G:/mlode/polar90/.*grib2$"
1)    timePartition="file"
2)    dateFormatMark="#NAM_Polar_90km_#yyyyMMdd_HHmm" />
3) <update startup="true" trigger="allow"/>
</featureCollection>
A time partition generates one collection dataset, one dataset for each partition, and one dataset for each individual file in the collection:
<dataset name="NAM-Polar90" ID="grib/NCEP/NAM/Polar90">
  <catalogRef xlink:href="/thredds/catalog/grib/NCEP/NAM/Polar90/collection/catalog.xml" xlink:title="collection"/>
  <catalogRef xlink:href="/thredds/catalog/grib/NCEP/NAM/Polar90/NAM-Polar90_20110301/catalog.xml" xlink:title="NAM-Polar90_20110301">
    <catalogRef xlink:href="/thredds/catalog/grib/NCEP/NAM/Polar90/NAM-Polar90_20110301/files/catalog.xml" xlink:title="files" />
  </catalogRef>
  <catalogRef xlink:href="/thredds/catalog/grib/NCEP/NAM/Polar90/NAM-Polar90_20110302/catalog.xml" xlink:title="NAM-Polar90_20110302">
    <catalogRef xlink:href="/thredds/catalog/grib/NCEP/NAM/Polar90/NAM-Polar90_20110302/files/catalog.xml" xlink:title="files" name="" />
  </catalogRef>
  ...
</dataset>
De-referencing the catalogRefs, and simplifying:
<dataset name="NAM-Polar90" ID="grib/NCEP/NAM/Polar90">
1) <dataset name="NAM-Polar90-collection" urlPath="grib/NCEP/NAM/Polar90/collection"/>
2) <dataset name="NAM-Polar90_20110301" urlPath="grib/NCEP/NAM/Polar90/NAM-Polar90_20110301/collection">
3)   <dataset name="NAM_Polar_90km_20110301_0000.grib2" urlPath="grib/NCEP/NAM/Polar90/files/NAM_Polar_90km_20110301_0000.grib2"/>
     <dataset name="NAM_Polar_90km_20110301_0600.grib2" urlPath="grib/NCEP/NAM/Polar90/files/NAM_Polar_90km_20110301_0600.grib2"/>
     ...
   </dataset>
4) <dataset name="NAM-Polar90_20110302-collection" urlPath="grib/NCEP/NAM/Polar90/NAM-Polar90_20110302/collection">
     <dataset name="NAM_Polar_90km_20110302_0000.grib2" urlPath="grib/NCEP/NAM/Polar90/files/NAM_Polar_90km_20110302_0000.grib2"/>
     <dataset name="NAM_Polar_90km_20110302_0600.grib2" urlPath="grib/NCEP/NAM/Polar90/files/NAM_Polar_90km_20110302_0600.grib2"/>
     ...
   </dataset>
   ...
</dataset>
So the datasets generated from a Time Partition with name, path, and partitionName are:
dataset | catalogRef | name | path |
---|---|---|---|
collection | path/collection/catalog.xml | name | path/name/collection |
partitions | path/partitionName/catalog.xml | partitionName | path/partitionName/collection |
individual files | path/partitionName/files/catalog.xml | filename | path/files/filename |
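The rows of this table can be generated mechanically, as the following Python sketch suggests. It is an illustration under stated assumptions, not the TDS implementation: the function name `time_partition_rows` is hypothetical, the date is pulled from each file name with a regular expression that fits the NAM-Polar90 file names, and partitions are named `<name>_<yyyyMMdd>` as in that example. The paths follow the table as written, so compare them with a real catalog before relying on them.

```python
import re
from collections import defaultdict


def time_partition_rows(name, path, filenames):
    """Return (dataset, catalogRef, name, path) rows for a time partition."""
    partitions = defaultdict(list)
    for filename in filenames:
        # Assumption: the date appears as yyyyMMdd_HHmm in the file name.
        match = re.search(r"(\d{8})_\d{4}", filename)
        if match is None:
            continue  # skip files without a recognizable date
        partitions[f"{name}_{match.group(1)}"].append(filename)

    rows = [("collection", f"{path}/collection/catalog.xml",
             name, f"{path}/{name}/collection")]
    for partition_name in sorted(partitions):
        rows.append(("partition", f"{path}/{partition_name}/catalog.xml",
                     partition_name, f"{path}/{partition_name}/collection"))
        for filename in sorted(partitions[partition_name]):
            rows.append(("file", f"{path}/{partition_name}/files/catalog.xml",
                         filename, f"{path}/files/{filename}"))
    return rows


files = [
    "NAM_Polar_90km_20110301_0000.grib2",
    "NAM_Polar_90km_20110301_0600.grib2",
    "NAM_Polar_90km_20110302_0000.grib2",
]
rows = time_partition_rows("NAM-Polar90", "grib/NCEP/NAM/Polar90", files)
```

With these three files the sketch yields one collection row, two partition rows, and three file rows.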
When a Grib Collection contains multiple horizontal domains (i.e. distinct Grid Definition Sections (GDS)), each domain is placed into a separate group. As a rule, one can't tell whether there are separate domains without reading the files. If you open such a collection through the CDM (e.g. using ToolsUI), you would see a dataset that contains groups. The TDS, however, separates groups into different datasets, so that each dataset has only a single (unnamed, aka root) group.
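The idea of splitting a collection into one dataset per horizontal domain can be sketched in Python as follows. This is a simplified model, not CDM code: records are reduced to `(gds_hash, variable_name)` pairs, the function name `split_by_gds` and the fallback name `gds<hash>` are hypothetical, and the variable names are made up. The hash values and group names are borrowed from the RFC configuration in this document.

```python
from collections import defaultdict


def split_by_gds(records, group_names=None):
    """Split (gds_hash, variable) pairs into one group per GDS hash.

    group_names optionally maps a GDS hash to a readable group name;
    unnamed hashes fall back to a hypothetical 'gds<hash>' label.
    """
    group_names = group_names or {}
    groups = defaultdict(list)
    for gds_hash, variable in records:
        groups[gds_hash].append(variable)
    return {group_names.get(h, f"gds{h}"): members for h, members in groups.items()}


names = {
    1815048381: "KALR:Southeast RFC",
    -1819879011: "KFWR:West Gulf RFC",
}
records = [
    (1815048381, "Total_precipitation"),   # hypothetical variable names
    (-1819879011, "Total_precipitation"),
    (1815048381, "Temperature"),
]
datasets = split_by_gds(records, names)
```

Each key of `datasets` stands for one dataset as the TDS would present it, containing only the variables defined on that domain.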
<featureCollection name="RFC" featureType="GRIB" path="grib/NPVU/RFC">
<metadata inherited="true">
<dataFormat>GRIB-1</dataFormat>
<serviceName>all</serviceName>
</metadata>
<collection spec="/tds2012data/grib/rfc/ZETA.*grib1$" dateFormatMark="yyyyMMdd#.grib1#"/>
1) <gribConfig>
<gdsHash from="-752078894" to="1193085709"/>
<gdsName hash='-1960629519' groupName='KTUA:Arkansas-Red River RFC'/>
<gdsName hash='-1819879011' groupName='KFWR:West Gulf RFC'/>
<gdsName hash='-1571856555' groupName='KORN:Lower Mississippi RFC'/>
<gdsName hash='-1491065322' groupName='KKRF:Missouri Basin RFC'/>
<gdsName hash='-1017807718' groupName='TSJU:San Juan PR WFO'/>
<gdsName hash='-1003775954' groupName='NCEP-QPE National Mosaic'/>
<gdsName hash='-529497359' groupName='KRHA:Middle Atlantic RFC'/>
<gdsName hash='289752153' groupName='KRSA:California-Nevada RFC-6hr'/>
<gdsName hash='424971237' groupName='KRSA:California-Nevada RFC-1hr'/>
<gdsName hash='511861653' groupName='KTIR:Ohio Basin RFC'/>
<gdsName hash='880498701' groupName='KPTR:Northwest RFC'/>
<gdsName hash='1123818409' groupName='KTAR:Northeast RFC'/>
<gdsName hash='1174418106' groupName='KNES-National Satellite Analysis'/>
<gdsName hash='1193085709' groupName='KMSR:North Central RFC'/>
<gdsName hash='1464276934' groupName='KSTR:Colorado Basin RFC'/>
<gdsName hash='1815048381' groupName='KALR:Southeast RFC'/>
</gribConfig>
</featureCollection>
For each group, this generates one collection dataset, and one dataset for each individual file in the group:
<catalog>
  <dataset name="KALR:Southeast RFC" urlPath="grib/NPVU/RFC/KALR-Southeast-RFC/collection">
    <catalogRef xlink:href="/thredds/catalog/grib/NPVU/RFC/KALR-Southeast-RFC/files/catalog.xml" xlink:title="files" name="" />
  </dataset>
  <dataset name="KFWR:West Gulf RFC" urlPath="grib/NPVU/RFC/KFWR-West-Gulf-RFC/collection">
    <catalogRef xlink:href="/thredds/catalog/grib/NPVU/RFC/KFWR-West-Gulf-RFC/files/catalog.xml" xlink:title="files" name="" />
  </dataset>
  ...
</catalog>
Note that the groups are sorted by name, and that there is no overall collection for the dataset. Simplifying:
<catalog>
1) <dataset name="KALR:Southeast RFC" urlPath="grib/NPVU/RFC/KALR-Southeast-RFC/collection">
2)   <dataset name="ZETA_KALR_NWS_152_20120111.grib1" urlPath="grib/NPVU/RFC/files/ZETA_KALR_NWS_152_20120111.grib1"/>
     <dataset name="ZETA_KALR_NWS_160_20120111.grib1" urlPath="grib/NPVU/RFC/files/ZETA_KALR_NWS_160_20120111.grib1"/>
     ...
   </dataset>
3) <dataset name="KFWR:West Gulf RFC" urlPath="grib/NPVU/RFC/KFWR-West-Gulf-RFC/collection">
     <dataset name="ZETA_KFWR_NWS_152_20120111.grib1" urlPath="grib/NPVU/RFC/files/ZETA_KFWR_NWS_152_20120111.grib1"/>
     <dataset name="ZETA_KFWR_NWS_161_20120110.grib1" urlPath="grib/NPVU/RFC/files/ZETA_KFWR_NWS_161_20120110.grib1"/>
     ...
   </dataset>
   ...
</catalog>
So the datasets generated from a Grib Collection with groupName and path are:
dataset | catalogRef | name | path |
---|---|---|---|
group collection | | groupName | path/groupName/collection |
individual files | path/groupName/files/catalog.xml | filename | path/files/filename |
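The mapping from group names to URL paths can be sketched in Python. The sanitizing rule below (colons and whitespace become hyphens) is only inferred from the examples, where "KALR:Southeast RFC" appears in URLs as KALR-Southeast-RFC; the real TDS rule may handle more characters. The function names `group_url_segment` and `group_rows` are hypothetical.

```python
import re


def group_url_segment(group_name):
    """Turn 'KALR:Southeast RFC' into 'KALR-Southeast-RFC' (inferred rule)."""
    return re.sub(r"[:\s]+", "-", group_name.strip())


def group_rows(path, files_by_group):
    """Return (dataset, catalogRef, name, path) rows, groups sorted by name."""
    rows = []
    for group_name in sorted(files_by_group):
        segment = group_url_segment(group_name)
        # The group collection row has no catalogRef of its own.
        rows.append(("group collection", None, group_name,
                     f"{path}/{segment}/collection"))
        for filename in sorted(files_by_group[group_name]):
            rows.append(("file", f"{path}/{segment}/files/catalog.xml",
                         filename, f"{path}/files/{filename}"))
    return rows


files_by_group = {
    "KFWR:West Gulf RFC": ["ZETA_KFWR_NWS_152_20120111.grib1"],
    "KALR:Southeast RFC": ["ZETA_KALR_NWS_152_20120111.grib1"],
}
rows = group_rows("grib/NPVU/RFC", files_by_group)
```

Sorting the group names reproduces the ordering noted above, with KALR listed before KFWR.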
Here is a time partitioned dataset with multiple groups:
<featureCollection name="NCDC-CFSR" featureType="GRIB" path="grib/NCDC/CFSR">
  <metadata inherited="true">
    <dataFormat>GRIB-2</dataFormat>
  </metadata>
  <collection spec="G:/nomads/cfsr/timeseries/**/.*grb2$"
1)    timePartition="directory"
2)    dateFormatMark="#timeseries/#yyyyMM"/>
  <update startup="true" trigger="allow"/>
  <gribConfig>
3)  <gdsHash from="1450218978" to="1450192070"/>
4)  <gdsName hash='1450192070' groupName='FLX GaussianT382'/>
    <gdsName hash='2079260842' groupName='FLX GaussianT62'/>
    ...
5)  <intvFilter excludeZero="true"/>
  </gribConfig>
</featureCollection>
A time partition with multiple groups generates an overall collection dataset for each group, a collection dataset for each group in each partition, and a dataset for each individual file:
<dataset name="NCDC-CFSR" ID="grib/NCDC/CFSR">
1) <catalogRef xlink:href="/thredds/catalog/grib/NCDC/CFSR/collection/catalog.xml" xlink:title="collection" name="" />
4) <catalogRef xlink:href="/thredds/catalog/grib/NCDC/CFSR/200808/catalog.xml" xlink:title="200808" name="" />
8) <catalogRef xlink:href="/thredds/catalog/grib/NCDC/CFSR/200809/catalog.xml" xlink:title="200809" name="" />
   ...
</dataset>
De-referencing the catalogRefs, and simplifying:
<dataset name="NCDC-CFSR" ID="grib/NCDC/CFSR">
1) <dataset name="NCDC-CFSR">
2)   <dataset name="FLX GaussianT382" urlPath="grib/NCDC/CFSR/NCDC-CFSR/FLX-GaussianT382"/>
3)   <dataset name="FLX GaussianT62" urlPath="grib/NCDC/CFSR/NCDC-CFSR/FLX-GaussianT62"/>
     ...
   </dataset>
4) <dataset name="200808" >
5)   <dataset name="FLX GaussianT382" urlPath="grib/NCDC/CFSR/200808/FLX-GaussianT382">
6)     <catalogRef xlink:href="/thredds/catalog/grib/NCDC/CFSR/200808/FLX-GaussianT382/files/catalog.xml" xlink:title="files" name="" />
     </dataset>
7)   <dataset name="FLX GaussianT62" urlPath="grib/NCDC/CFSR/200808/FLX-GaussianT62">
       <catalogRef xlink:href="/thredds/catalog/grib/NCDC/CFSR/200808/FLX-GaussianT62/files/catalog.xml" xlink:title="files" name="" />
     </dataset>
     ...
   </dataset>
8) <dataset name="200809" >
     ...
   </dataset>
</dataset>
So the datasets generated from a Time Partition with name, path, groupName, and partitionName are:
dataset | catalogRef | name | path |
---|---|---|---|
overall collection for group | path/groupName/collection/catalog.xml | groupName | path/name/groupName |
collection for partition and group | path/partitionName/catalog.xml | groupName | path/partitionName/groupName |
individual files | path/partitionName/groupName/files/catalog.xml | partitionName/filename | path/files/filename |
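The first two rows of this table can be sketched in Python for the NCDC-CFSR example. This is a minimal illustration: the function names are hypothetical, and the conversion of a group name into a URL segment (colons and spaces replaced by hyphens) is an assumption inferred from URLs such as grib/NCDC/CFSR/200808/FLX-GaussianT382.

```python
from itertools import product


def url_segment(group_name):
    """Assumed rule: colons and spaces in a group name become hyphens."""
    return group_name.replace(":", "-").replace(" ", "-")


def partition_group_paths(name, path, partition_names, group_names):
    """Return (overall paths, per-partition paths) for every group."""
    segments = [url_segment(g) for g in sorted(group_names)]
    # One overall collection per group: path/name/groupName
    overall = [f"{path}/{name}/{seg}" for seg in segments]
    # One collection per partition and group: path/partitionName/groupName
    per_partition = [
        f"{path}/{part}/{seg}"
        for part, seg in product(sorted(partition_names), segments)
    ]
    return overall, per_partition


overall, per_partition = partition_group_paths(
    "NCDC-CFSR",
    "grib/NCDC/CFSR",
    ["200808", "200809"],
    ["FLX GaussianT382", "FLX GaussianT62"],
)
```

Two partitions and two groups give two overall collections and four partition-level collections, which matches the shape of the simplified catalog above.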