GRIB Feature Collection datasets are collections of GRIB records, which contain gridded data, typically from numerical model output. Because of the complexity of how GRIB data is written and stored, the TDS has developed specialized handling of GRIB datasets, called GRIB Feature Collections.
- The user specifies the collection of GRIB-1 or GRIB-2 files, and the software turns them into a dataset.
- The indexes, once written, allow fast access and scalability to very large datasets.
- Multiple horizontal domains are supported and placed into separate groups.
- Interval time coordinates are fully supported.
See also:
- Feature Collection overview
- GRIB-specific configuration
- GRIB Collection FAQ
- GRIB Feature Collection Tutorial
- CDM GRIB Collection Processing
- CDM GRIB Tables
Multiple Dataset Collections
When a GRIB Collection contains multiple runtimes and the valid times (forecast times) overlap, a TwoD time dataset is created. From that, a Best time dataset is also created.
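To make the overlap condition concrete, here is a minimal Java sketch, not CDM code: TwoDCheck and validTimesOverlap are hypothetical names, and each run is modeled simply as a map entry from its runtime to the valid times it produces. It reports an overlap when the same valid time appears in two different runs, which is the situation in which a TwoD dataset is created.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the CDM/TDS API.
final class TwoDCheck {
    private TwoDCheck() {}

    /**
     * Returns true if some valid (forecast) time is produced by more than one
     * runtime, i.e. the runs overlap and a TwoD view would be needed.
     */
    static boolean validTimesOverlap(Map<Instant, List<Instant>> validTimesByRuntime) {
        if (validTimesByRuntime.size() < 2) {
            return false; // a single runtime cannot overlap with another run
        }
        // Remembers the first runtime seen for each valid time.
        Map<Instant, Instant> runForValidTime = new HashMap<>();
        for (Map.Entry<Instant, List<Instant>> run : validTimesByRuntime.entrySet()) {
            for (Instant valid : run.getValue()) {
                Instant earlier = runForValidTime.putIfAbsent(valid, run.getKey());
                if (earlier != null && !earlier.equals(run.getKey())) {
                    return true; // the same valid time occurs in two different runs
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Instant run00 = Instant.parse("2024-01-01T00:00:00Z");
        Instant run12 = Instant.parse("2024-01-01T12:00:00Z");
        Map<Instant, List<Instant>> runs = Map.of(
            run00, List.of(run00, run00.plusSeconds(6 * 3600), run00.plusSeconds(12 * 3600)),
            run12, List.of(run12, run12.plusSeconds(6 * 3600)));
        System.out.println(validTimesOverlap(runs)); // true: 12Z is in both runs
    }
}
```

A collection with a single runtime, or with runs whose valid times never coincide, returns false, so no TwoD view would be needed in this model.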
When a GRIB Collection contains multiple horizontal domains (i.e. distinct Grid Definition Sections (GDS)), each domain gets placed into a separate group (CDM) or Dataset (TDS).
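The grouping rule can be sketched as follows, assuming each record carries some key that identifies its GDS; here a hypothetical integer gdsHash stands in for it. GribRecordRef and DomainGrouping are illustrative names, not part of the CDM API, and the record syntax needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record type: only the fields needed for grouping are modeled.
record GribRecordRef(String variable, int gdsHash) {}

final class DomainGrouping {
    private DomainGrouping() {}

    // Each distinct GDS (horizontal domain) gets its own group of records.
    static Map<Integer, List<GribRecordRef>> groupByDomain(List<GribRecordRef> records) {
        Map<Integer, List<GribRecordRef>> groups = new LinkedHashMap<>(); // first-seen order
        for (GribRecordRef r : records) {
            groups.computeIfAbsent(r.gdsHash(), k -> new ArrayList<>()).add(r);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<GribRecordRef> records = List.of(
            new GribRecordRef("Temperature", 101),
            new GribRecordRef("Temperature", 202),
            new GribRecordRef("Pressure", 101));
        // Two distinct domains, so two groups.
        System.out.println(groupByDomain(records).keySet()); // [101, 202]
    }
}
```

When every record shares one GDS, the map has a single entry, which matches the case where no separate groups are needed.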
Collection endpoints are of the form:
- path: collection path
- partitionName: used to disambiguate multiple dataset types within a collection (_TwoD or Best)
- groupName: used only when there are multiple groups (horizontal coordinates); the group name, or empty
The GRIB Collections framework has been rewritten in CDM version 4.5, in order to handle large collections efficiently. Some of the new capabilities in version 4.5 are:
- GRIB Collections now keep track of both the reference time and the valid time. The collection is partitioned by reference time.
- A collection with a single reference time will have a single partition with a single time coordinate.
- A collection with multiple reference times will have partitions for each reference time, plus a PartitionCollection that represents the entire collection. Very large collections should be partitioned by directory and/or file, creating a tree of partitions.
- A PartitionCollection has two datasets (kept in separate groups): the TwoD dataset and the Best dataset.
- The TwoD dataset has two time coordinates, reference time (a.k.a. run time) and forecast time (a.k.a. valid time), and corresponds to the FMRC. The forecast time is two-dimensional, corresponding to all the times available for each reference time.
- The Best dataset has a single forecast time coordinate, the same as 4.3 GRIB Collections and FMRC Best datasets. If there are multiple GRIB records corresponding to the same forecast time, the record with the smallest offset from its reference time is used.
- The featureType attribute is now
- For each GRIB file, a GRIB index is written, named <grib filename>.gbx9. Once written, this never has to be rewritten.
- For each reference time, a CDM index is written, named <collection.referenceTime>.ncx2. This occasionally has to be rewritten when new CDM versions are released, or if you modify your GRIB configuration.
- For each PartitionCollection, a CDM index is written, named <collection name>.ncx2. This must be rewritten if any of the collection files change.
- The CDM indexes use the extension .ncx2 so that they can coexist with the .ncx indexes of previous versions.
If you are upgrading to 4.5, and no longer running earlier versions, remove the ncx files (but save the
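Before deleting anything, it can help to list the old-style indexes first. The Java helper below is a hypothetical dry run (OldIndexFinder is an illustrative name, not a TDS tool): it walks a directory tree with java.nio.file.Files.walk and selects only files whose names end in .ncx, so .gbx9 files and the newer .ncx2 indexes never match. It only prints what it would remove.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical dry-run helper: lists old-style .ncx indexes, deletes nothing.
final class OldIndexFinder {
    private OldIndexFinder() {}

    static boolean isOldCdmIndex(Path p) {
        // "x.ncx2" does not end with ".ncx", and ".gbx9" is never matched.
        Path name = p.getFileName();
        return name != null && name.toString().endsWith(".ncx");
    }

    static List<Path> findOldIndexes(Path root) throws IOException {
        // Files.walk returns a lazily populated stream that must be closed.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(OldIndexFinder::isOldCdmIndex)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findOldIndexes(root)) {
            System.out.println("would remove: " + p);
        }
    }
}
```

Keeping the listing step separate from any deletion makes it easy to check that no .gbx9 file is on the list before acting on it.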
- For large collections, especially if they change, the THREDDS Data Manager (TDM) must be run as a separate process to update the index files. In general, it is strongly recommended to run the TDM and to configure the TDS to only read, and never write, the indexes.
- Collections with millions of records are now feasible. The Java 7 NIO2 package is used to scan directories efficiently.
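The partitioning and Best rules above can be sketched in a few lines of Java. This is a simplified model, not the CDM implementation: Rec, PartitionSketch and their methods are hypothetical, records carry only a reference time, a valid time and an id, and the record syntax needs Java 16 or later. partitionByReferenceTime produces one partition per reference time (the valid times in each partition form one row of the two-dimensional TwoD forecast time), and best keeps, for each valid time, the record with the smallest offset from its reference time.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, simplified record: only the two times and an id are modeled.
record Rec(Instant referenceTime, Instant validTime, String id) {
    // Offset of the valid time from its reference time.
    Duration offset() {
        return Duration.between(referenceTime, validTime);
    }
}

// Illustrative sketch, not the CDM implementation.
final class PartitionSketch {
    private PartitionSketch() {}

    // One partition per reference time, sorted by reference time.
    static TreeMap<Instant, List<Rec>> partitionByReferenceTime(List<Rec> records) {
        TreeMap<Instant, List<Rec>> partitions = new TreeMap<>();
        for (Rec r : records) {
            partitions.computeIfAbsent(r.referenceTime(), k -> new ArrayList<>()).add(r);
        }
        return partitions;
    }

    // Best view: for each valid time, keep the record with the smallest offset
    // from its reference time (for a fixed valid time, the most recent run).
    static TreeMap<Instant, Rec> best(List<Rec> records) {
        TreeMap<Instant, Rec> best = new TreeMap<>();
        for (Rec r : records) {
            best.merge(r.validTime(), r,
                (current, candidate) ->
                    candidate.offset().compareTo(current.offset()) < 0 ? candidate : current);
        }
        return best;
    }

    public static void main(String[] args) {
        Instant run00 = Instant.parse("2024-01-01T00:00:00Z");
        Instant run06 = Instant.parse("2024-01-01T06:00:00Z");
        Instant valid12 = Instant.parse("2024-01-01T12:00:00Z");
        List<Rec> recs = List.of(
            new Rec(run00, valid12, "run00+12h"),
            new Rec(run06, valid12, "run06+6h"),
            new Rec(run06, run06, "run06+0h"));
        System.out.println(partitionByReferenceTime(recs).size()); // 2
        System.out.println(best(recs).get(valid12).id());         // run06+6h
    }
}
```

With a single reference time, partitionByReferenceTime returns one partition and best is simply that run's records keyed by valid time.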
The GRIB Collections framework has been rewritten in CDM version 4.6, in order to handle very large collections efficiently. Oh wait, we already did that in 4.5. Sorry, it wasn’t good enough.
- TimePartition can now be set to file, a time period, or none. Details here.
- Reference times are handled more efficiently; e.g., typically only one index file needs to be written.
- Global attributes are promoted to dataset properties in the catalog
- Internal changes:
- Internal memory use has been reduced.
- Runtime objects are now immutable, which makes caching possible.
- RandomAccessFiles are kept in a separate pool, so they can be cached independently of the Collection objects.
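As a rough illustration of the three TimePartition choices, the hypothetical Java sketch below derives a partition key for a file: the file name for file, the start of the enclosing period for a time period, and one shared key for none. The names PartitionMode, TimePartitionKey and keyFor, and the choice to align periods to the Unix epoch in UTC, are assumptions made for this example, not TDS behavior.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical names; these are not TDS classes or configuration values.
enum PartitionMode { FILE, PERIOD, NONE }

final class TimePartitionKey {
    private TimePartitionKey() {}

    // Returns the key of the partition that a file would fall into.
    static String keyFor(PartitionMode mode, String fileName, Instant referenceTime, Duration period) {
        switch (mode) {
            case FILE:
                return fileName; // each file is its own partition
            case PERIOD: {
                long size = period.getSeconds();
                if (size <= 0) {
                    throw new IllegalArgumentException("period must be positive");
                }
                // Assumption: periods are aligned to the Unix epoch (UTC).
                long start = Math.floorDiv(referenceTime.getEpochSecond(), size) * size;
                return Instant.ofEpochSecond(start).toString();
            }
            default:
                return "all"; // NONE: the whole collection is a single partition
        }
    }

    public static void main(String[] args) {
        Instant ref = Instant.parse("2024-03-05T18:00:00Z");
        Duration day = Duration.ofDays(1);
        String file = "model_20240305_18.grib2";
        System.out.println(keyFor(PartitionMode.FILE, file, ref, day));   // model_20240305_18.grib2
        System.out.println(keyFor(PartitionMode.PERIOD, file, ref, day)); // 2024-03-05T00:00:00Z
        System.out.println(keyFor(PartitionMode.NONE, file, ref, day));   // all
    }
}
```

Files that share a key end up in the same partition, so a one-day period groups all runs from the same UTC day together.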
The GRIB Collections framework has been significantly improved in CDM version 5.0, in order to handle very large collections efficiently.
- Collection index files now use the suffix .ncx4. These will be rewritten the first time you access the files.
- The gbx9 files do NOT need to be rewritten, which is good because those are the slow ones.
- You no longer need to specify the dataType, as it is added automatically.
- It is recommended not to specify the set of services used, and instead to accept the default set of services.
- You no longer need to specify the