As of CDM version 4.3, GRIB datasets are handled as collections of GRIB files. A GRIB file is an unordered collection of GRIB records. A GRIB dataset is therefore a collection of GRIB records in one or more files. A GRIB dataset can only operate on local files. A THREDDS Data Server (TDS) can make GRIB datasets remotely accessible, e.g. through OPeNDAP, WMS, or the NetCDF Subset Service (NCSS).
The CDM can only read GRIB files; it cannot write them. It can, however, rewrite GRIB into netCDF using CF Conventions. Before version 4.3.13, it could only write netCDF-3 format files, which are typically 4-20 times larger than GRIB. As of 4.3.13, the CDM can write to netCDF-4 format, with file sizes comparable to GRIB, typically within a factor of two.
A GRIB collection must follow these homogeneity constraints:
In addition:
For each GRIB file, a GRIB index file is written with suffix .gbx9. This file contains everything in the GRIB file except the data. Generally it is 300-1000 times smaller than the original file. Once written, it typically never has to be rewritten. If the GRIB file changes, the CDM should detect that and rewrite the index file. If there is any doubt about that, delete the index file and let it get recreated.
For each GRIB collection, a CDM collection index file is written with suffix .ncx3. This file contains all the metadata and the coordinates for the collection. It is usually fairly small (a few dozen KBytes to a few MBytes for a large collection), and once created, makes accessing the GRIB collection very fast. In general it will be updated if needed, but one can always delete it and let it be recreated.
If one opens a single GRIB file in the CDM, a gbx9 and ncx3 file will be created for that file. If one opens a collection of multiple GRIB files, a gbx9 file is created for each file, and one ncx3 file is created for the entire collection.
Both kinds of index files are binary, private formats for the CDM, whose format may change as needed. Your application should not depend in any way on the details of these formats.
Moving GRIB files. When GRIB index files (gbx9) are created, they store the name of the GRIB data file. However, this name is used only for debugging, so you can move the data files and the gbx9 files as needed. The CDM index files (ncx3) also store the names of the GRIB data files, and (usually) need the GRIB files to exist at those locations. So if you move the data files, it is best to delete the ncx3 files and let them be recreated.
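The cleanup step described above (delete the ncx3 files after a move and let the CDM recreate them) can be scripted. The following is a minimal sketch using only the Java standard library. The class `GribIndexCleaner` and its method are hypothetical names, not part of the CDM, and the sketch assumes that the index files live in a single directory and can be recognized purely by their suffix.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the CDM): removes index files so that they are rebuilt. */
final class GribIndexCleaner {

  /**
   * Deletes every regular file directly inside dir whose name ends with suffix
   * (for example ".ncx3") and returns the paths that were deleted.
   */
  static List<Path> deleteIndexFiles(Path dir, String suffix) throws IOException {
    // Collect the matches first, then delete, so the directory is not modified while it is listed.
    List<Path> matches = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      for (Path p : entries) {
        if (Files.isRegularFile(p) && p.getFileName().toString().endsWith(suffix)) {
          matches.add(p);
        }
      }
    }
    for (Path p : matches) {
      Files.delete(p);
    }
    return matches;
  }
}
```

Calling `GribIndexCleaner.deleteIndexFiles(dir, ".ncx3")` on the directory that holds the collection index leaves the GRIB data files and the gbx9 files untouched; the ncx3 file is then rebuilt the next time the collection is opened.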
The use of external tables in GRIB is quite problematic (read here for more details). Nonetheless, GRIB files are in wide use internationally and contain invaluable data. The CDM is a general-purpose GRIB reading library that makes GRIB data available through the CDM/NetCDF API, that is, as multidimensional data arrays and CF-compliant metadata and coordinates.
Because of flaws in the design of GRIB and flaws in actual practice when writing GRIB, any general purpose GRIB reader can only make a best effort in interpreting arbitrary GRIB records. It is therefore necessary, for anything other than casual use, to carefully examine the output of CDM GRIB datasets and compare this against the documentation. In particular, GRIB records may refer to local tables that are missing or incorrect in the CDM, and they may override standard WMO tables without the CDM being able to detect that they are doing so. It is often necessary for users to contact the data producer to obtain the correct tables for the particular dataset they want to read. This is also necessary for other GRIB reading tools like wgrib (NCEP) and gribex (ECMWF).
The CDM has a number of ways to allow you to use new tables or override incorrect ones globally or by dataset. The good news is that if users contribute these fixes back to the CDM, everyone can take advantage of them and the set of "correct" datasets will grow. The WMO has greatly improved the process of using the standard tables, and hopefully GRIB data producers will continue to improve methods for writing GRIB and maintaining local tables.
The CDM is used primarily to open single GRIB files, and the TDS is used to manage large and very large collections of files. Here is a summary of the ways that an application might use the CDM to open GRIB files.
Pass the local data file location to any of the standard dataset opening classes:
The GRIB Index (.gbx9) and GRIB Collection index (.ncx3) files will be created as needed.
If the GRIB Collection index (.ncx3) already exists, one can pass that to any of the standard dataset opening classes. In this case, the collection is created by reading the ncx3 file, with no checking against the original data file(s). The original data files are only accessed when data is requested from them. Be careful not to move the data files once the index files are created. If you do need to move the data files, it is best to recreate the index files.
You can use a GRIB <featureCollection> element to define the GRIB Collection and to generate the CDM index (ncx3) file.
For simple cases, you can create the ncx3 file based on a collection spec using ToolsUI: IOSP/GRIB1(2)/GribCollection. Enter the collection spec and hit Enter. To write the index file, hit the "Write Index" button on the right. Give it a memorable name and hit Save. It is currently not possible to pass GRIB Collection Configuration elements in this way.
In versions 4.2 and before, GRIB files were typically aggregated using NcML Aggregations. While this could work if the GRIB files were truly homogeneous, in practice it often had problems; the aggregation would appear OK, but in fact be incorrect in various subtle ways. This was one of the motivations for developing GRIB collections, which collect the GRIB records into multidimensional arrays and can (mostly) figure out the right thing to do without user intervention. NcML Aggregations on GRIB files are not supported in versions 4.3 and above. You must use GRIB collections.
You can use NcML to open a single GRIB file, and modify the way GRIB records are processed. All of the configuration options that you can use inside the TDS <gribConfig> element can be used inside the <iospParam> element of the NcML, for example:
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="E:/ncep/NDFD_CONUS_5km_conduit_20120119_1800.grib2">
  <iospParam>
    <gdsHash from="-2121584860" to="28944332"/>
    <pdsHash>
      <useTableVersion>true</useTableVersion>
    </pdsHash>
  </iospParam>
</netcdf>
See GRIB Collection Configuration for a description of all of the options.
Note that you cannot use NcML to open a collection of GRIB files. You must generate the GRIB Collection index file in a separate step.
A GRIB file is an unordered collection of GRIB records. A GRIB record consists of a single 2D (x, y) slice of data. The CDM library reads a GRIB file and creates 2-, 3-, 4-, or 5-dimensional Variables (time, ensemble, z, y, x) by finding the records with the same parameter but different time / level / ensemble coordinates. This amounts to guessing the dataset schema and the intent of the data provider, and is unfortunately a bit arbitrary. Most of our testing is against the NCEP operational models from the IDD, so the results are influenced by those. Deciding how to group the GRIB records into CDM Variables is one of the main sources of problems. The CDM uses the following GRIB fields to construct a unique variable:
The GRIB-1 variable name is:
%paramName[_%level][_layer][_%interval][_%statName]
where:
%paramName = parameter name from GRIB-1 table 2 (cleaned up); if unknown, use VAR_%d-%d-%d-%d (see below)
%level = short form of level name from GRIB-1 table 3, if defined
_layer = added if it is a vertical layer (literal)
%interval = time interval name (e.g. "12_hour" or "mixed")
%statName = name of statistical type if applicable, from GRIB-1 table 5
The GRIB-1 variable id is:
VAR_%d-%d-%d-%d[_L%d][_layer][_I%s][_S%d]
where:
%d-%d-%d-%d = center-subcenter-tableVersion-paramNo
L%d = level type (octet 10 of PDS), if defined
_layer = added if it is a vertical layer (literal)
I%s = interval name (e.g. "12_hour" or "mixed") if a time interval
S%d = stat type (octet 21 of PDS) if applicable
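To make the bracket notation in the two GRIB-1 templates above concrete, here is a small sketch that assembles both strings, treating each bracketed part as optional. `Grib1NameSketch` and its methods are hypothetical and are not the CDM's own code; the sample values ("Temperature", "isobaric", center 7, table version 2, parameter 11, level type 100) are illustrative inputs only.

```java
/** Hypothetical illustration of the GRIB-1 naming templates; not the CDM's own code. */
final class Grib1NameSketch {

  /** %paramName[_%level][_layer][_%interval][_%statName]; pass null for parts that do not apply. */
  static String variableName(String paramName, String level, boolean isLayer,
                             String interval, String statName) {
    StringBuilder sb = new StringBuilder(paramName);
    if (level != null) sb.append('_').append(level);
    if (isLayer) sb.append("_layer");
    if (interval != null) sb.append('_').append(interval);
    if (statName != null) sb.append('_').append(statName);
    return sb.toString();
  }

  /** VAR_%d-%d-%d-%d[_L%d][_layer][_I%s][_S%d]; pass null for parts that do not apply. */
  static String variableId(int center, int subcenter, int tableVersion, int paramNo,
                           Integer levelType, boolean isLayer, String interval, Integer statType) {
    StringBuilder sb = new StringBuilder("VAR_")
        .append(center).append('-').append(subcenter).append('-')
        .append(tableVersion).append('-').append(paramNo);
    if (levelType != null) sb.append("_L").append(levelType);
    if (isLayer) sb.append("_layer");
    if (interval != null) sb.append("_I").append(interval);
    if (statType != null) sb.append("_S").append(statType);
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(variableName("Temperature", "isobaric", false, null, null)); // Temperature_isobaric
    System.out.println(variableId(7, 0, 2, 11, 100, false, null, null));            // VAR_7-0-2-11_L100
  }
}
```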
The GRIB-2 variable name is:
%paramName[_error][_%level][_layer][_%interval][_%statName][_%ensDerivedType][_probability_%probName]
where:
%paramName = parameter name from GRIB-2 table 4.2 (cleaned up); if unknown, use VAR_%d-%d-%d_FROM%d-%d = VAR_discipline-category-paramNo_FROM_center-subcenter
%level = short form of level name from GRIB-2 table 4.5, if defined
_layer = added if it is a vertical layer (literal)
%interval = time interval name (e.g. "12_hour" or "mixed")
%statName = name of statistical type if applicable, from GRIB-2 table 4.10
%ensDerivedType = name of ensemble derived type if applicable, from GRIB-2 table 4.7
%probName = name of probability type if applicable
The GRIB-2 variable id is:
VAR_%d-%d-%d[_error][_L%d][_layer][_I%s_S%d][_D%d][_Prob_%s]
where:
VAR_%d-%d-%d = discipline-category-paramNo
L%d = level type code
I%s = time interval name (e.g. "12_hour" or "mixed")
S%d = statistical type code if applicable
D%d = derived type code if applicable
Prob_%s = probability name if applicable
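The GRIB-2 id template above can be read the same way: every bracketed group is emitted only when it applies. The sketch below is a hypothetical illustration (`Grib2IdSketch` is not a CDM class). It assumes that the `[_I%s_S%d]` group is all-or-nothing, since the template brackets the interval and the statistic together, and the sample codes are illustrative inputs.

```java
/** Hypothetical illustration of the GRIB-2 variable id template; not the CDM's own code. */
final class Grib2IdSketch {

  /** VAR_%d-%d-%d[_error][_L%d][_layer][_I%s_S%d][_D%d][_Prob_%s]; pass null for parts that do not apply. */
  static String variableId(int discipline, int category, int paramNo, boolean isError,
                           Integer levelType, boolean isLayer, String interval, Integer statType,
                           Integer derivedType, String probName) {
    StringBuilder sb = new StringBuilder("VAR_")
        .append(discipline).append('-').append(category).append('-').append(paramNo);
    if (isError) sb.append("_error");
    if (levelType != null) sb.append("_L").append(levelType);
    if (isLayer) sb.append("_layer");
    // Assumption: interval and statistic are bracketed together, so emit both or neither.
    if (interval != null && statType != null) {
      sb.append("_I").append(interval).append("_S").append(statType);
    }
    if (derivedType != null) sb.append("_D").append(derivedType);
    if (probName != null) sb.append("_Prob_").append(probName);
    return sb.toString();
  }

  public static void main(String[] args) {
    // discipline 0, category 1, parameter 8, level type 1, 6-hour interval, statistic code 1
    System.out.println(variableId(0, 1, 8, false, 1, false, "6_hour", 1, null, null)); // VAR_0-1-8_L1_I6_hour_S1
  }
}
```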
See ucar.nc2.grib.grib1.Grib1Rectilyser.cdmVariableHash() and ucar.nc2.grib.grib2.Grib2Rectilyser.cdmVariableHash() for complete details.
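The grouping step described in this section, in which unordered 2D records that share a parameter are collected into one multidimensional variable, can be illustrated with a deliberately simplified sketch. Everything here is hypothetical: `Rec` stands in for a GRIB record, and the grouping key uses only the parameter number and level type, whereas the CDM's cdmVariableHash() methods take more fields into account.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical illustration of grouping 2D records into variables; not the CDM's algorithm. */
final class GribGroupingSketch {

  /** Stand-in for a GRIB record: the metadata that identifies one 2D (x, y) slice. */
  static final class Rec {
    final int paramNo;
    final int levelType;
    final double level;
    final int forecastHour;

    Rec(int paramNo, int levelType, double level, int forecastHour) {
      this.paramNo = paramNo;
      this.levelType = levelType;
      this.level = level;
      this.forecastHour = forecastHour;
    }
  }

  /** The coordinate values collected for one variable. */
  static final class Var {
    final SortedSet<Integer> times = new TreeSet<>();
    final SortedSet<Double> levels = new TreeSet<>();
    int recordCount;
  }

  /** Groups records by a simplified key (parameter number and level type), in any input order. */
  static Map<String, Var> group(List<Rec> records) {
    Map<String, Var> vars = new TreeMap<>();
    for (Rec r : records) {
      String key = r.paramNo + "-" + r.levelType;
      Var v = vars.computeIfAbsent(key, k -> new Var());
      v.times.add(r.forecastHour);   // distinct values become the time coordinate
      v.levels.add(r.level);         // distinct values become the vertical coordinate
      v.recordCount++;
    }
    return vars;
  }
}
```

In this sketch, a variable with two time values and two levels corresponds to a (time, z, y, x) array, and comparing `times.size() * levels.size()` with `recordCount` gives a quick hint about missing or duplicate records.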
One can use the CDM to process GRIB records individually, without building the CDM multidimensional variables. Note that this functionality is not part of a supported public API, and is subject to change. However, these APIs are relatively stable.
For GRIB1 reading, use the classes in ucar.nc2.grib.grib1:
// RandomAccessFile here is ucar.unidata.io.RandomAccessFile, not java.io.RandomAccessFile
RandomAccessFile raf = new RandomAccessFile(filepath, "r");
Grib1RecordScanner reader = new Grib1RecordScanner(raf);
while (reader.hasNext()) {
  ucar.nc2.grib.grib1.Grib1Record gr1 = reader.next();
  // do good stuff
}
raf.close();
or similarly for GRIB2, use the classes in ucar.nc2.grib.grib2:
// RandomAccessFile here is ucar.unidata.io.RandomAccessFile, not java.io.RandomAccessFile
RandomAccessFile raf = new RandomAccessFile(filepath, "r");
Grib2RecordScanner scan = new Grib2RecordScanner(raf);
while (scan.hasNext()) {
  ucar.nc2.grib.grib2.Grib2Record gr2 = scan.next();
  // do stuff
}
raf.close();
The details vary a bit between GRIB1 and GRIB2. To read the data from a GRIB1 record:
float[] data = gr1.readData(raf);
To read the data from a GRIB2 record:
Grib2SectionDataRepresentation drs = gr2.getDataRepresentationSection();
float[] data = gr2.readData(raf, drs.getStartingPosition());
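For orientation, the sketch below shows what locating individual GRIB messages involves at the byte level, using only the Java standard library. It is not how Grib1RecordScanner or Grib2RecordScanner are implemented, and `GribFramingSketch` is a hypothetical name. It relies only on the message framing of the GRIB formats: a message starts with the characters "GRIB", octet 8 holds the edition number, the total message length is in octets 5-7 (edition 1) or octets 9-16 (edition 2), and the message ends with "7777".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of GRIB message framing; not the CDM's record scanners. */
final class GribFramingSketch {

  /** Position, edition, and total length of one GRIB message. */
  static final class Message {
    final int offset;
    final int edition;
    final long length;

    Message(int offset, int edition, long length) {
      this.offset = offset;
      this.edition = edition;
      this.length = length;
    }
  }

  /** Reads count bytes starting at pos as an unsigned big-endian integer. */
  static long unsigned(byte[] b, int pos, int count) {
    long v = 0;
    for (int i = 0; i < count; i++) v = (v << 8) | (b[pos + i] & 0xff);
    return v;
  }

  /** True if the bytes at pos equal the given ASCII text. */
  static boolean matches(byte[] b, int pos, String ascii) {
    for (int i = 0; i < ascii.length(); i++) {
      if (b[pos + i] != (byte) ascii.charAt(i)) return false;
    }
    return true;
  }

  /** Finds all well-framed GRIB messages in the buffer, skipping any bytes between them. */
  static List<Message> scan(byte[] b) {
    List<Message> found = new ArrayList<>();
    int pos = 0;
    while (pos + 16 <= b.length) {
      if (matches(b, pos, "GRIB")) {
        int edition = b[pos + 7] & 0xff;                        // octet 8 in both editions
        long length = edition == 1 ? unsigned(b, pos + 4, 3)    // octets 5-7
                    : edition == 2 ? unsigned(b, pos + 8, 8)    // octets 9-16
                    : -1;
        boolean fits = length >= 16 && length <= b.length - pos;
        if (fits && matches(b, pos + (int) length - 4, "7777")) {
          found.add(new Message(pos, edition, length));
          pos += (int) length;                                  // jump to the next message
          continue;
        }
      }
      pos++;                                                    // not a message start; keep looking
    }
    return found;
  }
}
```

The sketch takes a byte array for simplicity; real GRIB files are usually far too large to load into memory at once, which is one reason to use the random-access scanners shown above for actual work.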