Overview
A Forecast Model Run Collection (FMRC) is a collection of forecast model runs, each uniquely identified by the start of the model run, called the model run time (also known as the analysis time, generating time, or reference time). Each model run has a series of forecast times. A collection of these runs therefore has two time coordinates: the run time and the forecast time. An FMRC creates a 2D time collection dataset, and then creates various 1D time subsets out of it. See this poster for a detailed example.
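To make the two time coordinates concrete, here is a minimal, self-contained Java sketch. It does not use the netCDF-Java library; the class name TwoDTimeSketch, the run spacing, and the forecast offsets are hypothetical values chosen only for illustration. It lays out a small collection as (run time, forecast time) pairs and then pulls two kinds of 1D time subsets out of it.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Illustration only: a hypothetical class, not part of netCDF-Java or the TDS.
class TwoDTimeSketch {

    // One cell of the 2D time collection: a run time paired with a forecast time.
    static final class Cell {
        final Instant runTime;
        final Instant forecastTime;

        Cell(Instant runTime, Instant forecastTime) {
            this.runTime = runTime;
            this.forecastTime = forecastTime;
        }

        // Offset = forecast time - run time, in hours.
        long offsetHours() {
            return Duration.between(runTime, forecastTime).toHours();
        }
    }

    // Build the 2D collection; every run is given the same offsets (assumed values).
    static List<Cell> build(Instant firstRun, int nRuns, int runSpacingHours, int[] offsetsHours) {
        List<Cell> cells = new ArrayList<>();
        for (int r = 0; r < nRuns; r++) {
            Instant run = firstRun.plus(Duration.ofHours((long) r * runSpacingHours));
            for (int offset : offsetsHours) {
                cells.add(new Cell(run, run.plus(Duration.ofHours(offset))));
            }
        }
        return cells;
    }

    // 1D subset: all forecast times that belong to a single run.
    static List<Cell> oneRun(List<Cell> cells, Instant runTime) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            if (c.runTime.equals(runTime)) out.add(c);
        }
        return out;
    }

    // 1D subset: all cells valid at a single forecast time, drawn from different runs.
    static List<Cell> oneForecastTime(List<Cell> cells, Instant forecastTime) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            if (c.forecastTime.equals(forecastTime)) out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        Instant first = Instant.parse("2024-01-01T00:00:00Z");
        List<Cell> cells = build(first, 3, 6, new int[] {0, 6, 12});
        for (Cell c : oneForecastTime(cells, Instant.parse("2024-01-01T12:00:00Z"))) {
            System.out.println("run " + c.runTime + " offset " + c.offsetHours() + "h");
        }
    }
}
```

Holding the run time fixed gives one model run; holding the forecast time fixed gives successively shorter forecasts of the same moment.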
Previously this functionality was provided using FMRC Aggregation through NcML and the <datasetFmrc> element in the TDS configuration catalog.
As of TDS 4.2, that implementation is now deprecated and <featureCollection> elements are the correct way to provide this functionality.
As of TDS 4.3, GRIB files should only be served with a featureCollection of featureType=GRIB, i.e., not with FMRC.
Typically, FMRC is used for collections of model runs stored in netCDF/CF files.
Constraints On FMRC
- The component files of the collection must all be recognized as Grid Feature datasets by the CDM software.
- Each component file must have a single reference time.
- The times and variables for a model run can be in a single file or spread out among multiple files.
- The model runs are assumed to be homogeneous, that is, they contain the same collection of variables and attributes, and they must be on the same horizontal and vertical grid. The model runs can differ only in their time and runtime coordinates and the actual data values.
Notes
- It’s best if the reference time is part of the filename, in a way that can be extracted with a DateExtractor.
- If there is a serviceType=HTTPServer for the Feature Collection, it is removed from the virtual datasets (all except the Files datasets).
- If an ID attribute is not specified on the featureCollection, the path attribute is used as the ID. This is a preferred idiom.
- Note that for the case when a model run dataset is in a single file, it may be different than the same file as seen through the corresponding _Files dataset, if regularize is enabled. In that case, the time coordinates will be regularized across all model run datasets in the collection.
- The FMRC virtual dataset is assembled by examining the Grid Coordinate Systems of the component files. One can use NcML to fix some problems in the component files.
fmrcConfig Element
Defines options on feature collections with featureType=FMRC.
<fmrcConfig regularize="false" datasetTypes="TwoD Best Files Runs" />
<fmrcConfig regularize="false" datasetTypes="Files">
  <bestDataset name="Best_Exclude_Spinup" offsetsGreaterEqual="0"/>
</fmrcConfig>
where:
- regularize: If true, then the runs for a given hour (from 0Z) are assumed to have the same forecast time coordinates. For example, if you have 4 model runs per day (e.g., 0, 6, 12, 18Z) and many days of model runs, then all the 6Z runs for all days will have the same time coordinates, etc. This “regularizes” time coordinates, and is useful when there may be missing forecast times, which may result in creating a new time coordinate. Leave this set to false unless you really have a series of runs with uniform offsets.
- datasetTypes: lists the dataset types that are exposed in the TDS catalog. The possible values are:
  - TwoD: dataset with two time dimensions (run time and forecast time), which contains all the data in the collection.
  - Best: dataset using the latest model data available for each possible forecast hour.
  - Files: each component file of the collection is available separately, as in a datasetScan. A “latest” file will be added if there is a “latest” Resolver service in the catalog.
  - Runs: A model run dataset contains all the data for one run time.
  - ConstantForecasts: A constant forecast dataset is created from all the data that have the same forecast time. This kind of dataset has successively shorter forecasts of the same endpoint.
  - ConstantOffsets: A constant offset dataset is created from all the data that have the same offset from the beginning of the run.
- bestDataset: you can define your own “best dataset”. This uses the same algorithm as the Best dataset above, but excludes data based on its offset hour. In the above example, a Best dataset is created with offset hours less than 0 excluded.
  - name: the human visible name of the defined Best dataset, must be unique within the fmrcConfig element. Do not use best.ncd, fmrc.ncd, runs, files, forecast, or offset.
  - offsetsGreaterEqual: forecast offset hours (forecast time - run time) less than this value are excluded.
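The selection rule described for Best and bestDataset can be sketched in a few lines of plain Java. This is an illustration of the rule as stated here (latest run wins for each forecast time, offsets below the threshold are dropped), not the TDS implementation; the class name BestDatasetSketch and the use of plain hour counts instead of real dates are assumptions made for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: a hypothetical class, not the TDS implementation.
class BestDatasetSketch {

    /**
     * offsetsByRun: run time (hours since an arbitrary origin) -> forecast offsets in hours.
     * offsetsGreaterEqual: offsets smaller than this value are excluded.
     * Returns: forecast time (hours since the same origin) -> run time chosen for it.
     */
    static TreeMap<Long, Long> best(Map<Long, long[]> offsetsByRun, long offsetsGreaterEqual) {
        TreeMap<Long, long[]> oldestFirst = new TreeMap<>(offsetsByRun);
        TreeMap<Long, Long> chosen = new TreeMap<>();
        // Visit runs from oldest to newest, so a newer run overwrites an older one
        // whenever both cover the same forecast time.
        for (Map.Entry<Long, long[]> e : oldestFirst.entrySet()) {
            long run = e.getKey();
            for (long offset : e.getValue()) {
                if (offset < offsetsGreaterEqual) continue; // excluded offset
                chosen.put(run + offset, run);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<Long, long[]> runs = new HashMap<>();
        runs.put(0L, new long[] {0, 3, 6, 9});
        runs.put(6L, new long[] {0, 3, 6, 9});
        runs.put(12L, new long[] {0, 3, 6, 9});
        System.out.println("all offsets:   " + best(runs, 0));
        System.out.println("offsets >= 3h: " + best(runs, 3));
    }
}
```

Raising the threshold makes each run contribute only its later forecast hours, so some forecast times fall back to an older run and the earliest forecast times may disappear entirely.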
Notes:
- If an fmrcConfig element is not present, the default is regularize=false, and datasetTypes="TwoD Best Files Runs". Specifying your own fmrcConfig completely overrides the datasetTypes default.
- When using FMRC for gridded data that doesn’t have 2D times, be sure to put regularize=false (or leave it off).
Working With FMRC In ToolsUI
The ToolsUI FMRC tab allows you to view internal structures of an FMRC.
You can pass it a collection specification string or a file with a featureCollection element in it.
Working With FMRC In Client Software
Opening An FMRC
Use the static open method on ucar.nc2.ft.fmrc.Fmrc:
public static Fmrc open(String collection, Formatter errlog, Formatter debug);
The collection may be one of:
- collection specification string
- catalog:catalogURL
- filename.ncml
Run Date
If a dateFormatMark is given, a DateExtractor extracts the run-date from the filename or URL.
Otherwise, each dataset must contain a global attribute named either _CoordinateModelBaseDate or _CoordinateModelRunDate.
The GRIB IOSP reader automatically adds this global attribute.
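As a rough illustration of filename-based run date extraction, the sketch below reads a date out of a filename using a mark in which the date pattern follows a '#'. It is not the library's DateExtractor: the class and method names are hypothetical, it assumes the characters before the first '#' only fix the position of the date, it assumes a fixed-width pattern without quoted literals, and it ignores time zones.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Illustration only: a hypothetical helper, not the netCDF-Java DateExtractor.
class RunDateSketch {

    /**
     * mark: text containing a date pattern introduced by '#',
     *       for example "GFS_CONUS_80km_#yyyyMMdd_HHmm".
     * filename: the file name to read the run date from.
     */
    static LocalDateTime extractRunDate(String mark, String filename) {
        int start = mark.indexOf('#');
        if (start < 0) {
            throw new IllegalArgumentException("no '#' in mark: " + mark);
        }
        // The pattern runs to a closing '#' if there is one, else to the end of the mark.
        int end = mark.indexOf('#', start + 1);
        String pattern = (end < 0) ? mark.substring(start + 1) : mark.substring(start + 1, end);

        // Assumption: the date text is exactly as long as the pattern.
        int stop = start + pattern.length();
        if (filename.length() < stop) {
            throw new IllegalArgumentException("filename too short for mark: " + filename);
        }
        String dateText = filename.substring(start, stop);
        return LocalDateTime.parse(dateText, DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        System.out.println(extractRunDate(
                "GFS_CONUS_80km_#yyyyMMdd_HHmm",
                "GFS_CONUS_80km_20240115_0600.grib1"));
    }
}
```

Putting the reference time in the filename in a fixed position is what makes this kind of extraction possible without opening the file.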
Forecast Date
Each file is opened as a GridDataset:
gds = GridDataset.open( mfile.getPath());
and the forecast time coordinates are extracted from the grid coordinate system.
There is no need to specify forecastModelRunCollection vs forecastModelRunSingleCollection, nor timeUnitsChange.
This is detected automatically.
Regular
If regularize is true, then all runs for a given offset hour (from 0Z) are assumed to have the same forecast time coordinates.
This obviates the need for the FMRC definition files which previously were used on the Unidata data server motherlode.
This evens out time coordinates, and compensates for missing forecast times in the IDD feed.
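One way to picture regularization is sketched below in plain Java. The text here does not spell out the exact rule, so the sketch assumes that the regularized coordinate for an hour of the day is the union of the forecast offsets seen in any run starting at that hour. The class name RegularizeSketch and the use of integer hour counts are hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustration only: a hypothetical class showing one possible reading of "regularize".
class RegularizeSketch {

    /**
     * offsetsByRunHour: run time in hours since 0Z of the first day -> offsets present in that run.
     * Returns: hour of day (0..23) -> union of the offsets of all runs starting at that hour.
     */
    static Map<Integer, TreeSet<Integer>> offsetsByHourOfDay(Map<Integer, int[]> offsetsByRunHour) {
        Map<Integer, TreeSet<Integer>> result = new TreeMap<>();
        for (Map.Entry<Integer, int[]> e : offsetsByRunHour.entrySet()) {
            int hourOfDay = Math.floorMod(e.getKey(), 24);
            TreeSet<Integer> union = result.computeIfAbsent(hourOfDay, k -> new TreeSet<>());
            for (int offset : e.getValue()) {
                union.add(offset);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, int[]> runs = new HashMap<>();
        runs.put(0, new int[] {0, 6, 12});      // day 1, 0Z
        runs.put(6, new int[] {0, 6, 12});      // day 1, 6Z
        runs.put(24, new int[] {0, 12});        // day 2, 0Z, the 6h forecast is missing
        runs.put(30, new int[] {0, 6, 12, 18}); // day 2, 6Z
        System.out.println(offsetsByHourOfDay(runs));
    }
}
```

Under this assumption, a run with a missing forecast time shares the coordinate of the other runs at the same hour instead of introducing a new time coordinate of its own.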
Persistent Caching
For each file, an fmrInv.xml document is made that records the essential grid information from that file.
It is cached in a persistent Berkeley Database (bdb) key/value store, so that this work only has to be done the first time the file is accessed in an FMRC.
Each collection becomes a separate bdb database, and each file in the collection is an element in the database, with the filename as the key and the fmrInv.xml as the value.
When a collection is scanned, any filenames already in the database are reused.
Any new ones are read and added to the database.
Any entries in the database that no longer have a filename associated with them are deleted.
The ToolsUI collections tab allows you to delete a database or individual elements.
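The scan behaviour described above (reuse known files, read new ones, drop stale entries) can be sketched with an in-memory map standing in for the bdb store. This is only an illustration: the class name InventoryCacheSketch, the reader function, and the string used as the inventory value are all hypothetical, and nothing here touches Berkeley DB.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.function.Function;

// Illustration only: an in-memory stand-in for the persistent key/value store.
class InventoryCacheSketch {

    // filename -> inventory text (the real store keeps fmrInv.xml content as the value)
    final Map<String, String> store = new HashMap<>();

    // Counts how many files actually had to be read.
    int reads = 0;

    // Reconcile the store with the files found by the current scan.
    void scan(Collection<String> filenames, Function<String, String> readInventory) {
        // Delete entries whose file is no longer part of the collection.
        store.keySet().retainAll(new HashSet<>(filenames));
        for (String name : filenames) {
            if (!store.containsKey(name)) {
                // New file: read it once and remember the result.
                store.put(name, readInventory.apply(name));
                reads++;
            }
            // Known file: the stored inventory is reused, nothing to do.
        }
    }

    public static void main(String[] args) {
        InventoryCacheSketch cache = new InventoryCacheSketch();
        Function<String, String> reader = name -> "inventory of " + name;
        cache.scan(Arrays.asList("a.nc", "b.nc"), reader);
        cache.scan(Arrays.asList("b.nc", "c.nc"), reader);
        System.out.println("files read: " + cache.reads + ", cached: " + cache.store.keySet());
    }
}
```

The point of the design is that an expensive file open happens at most once per file, no matter how often the collection is rescanned.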
Conversion Of <datasetFmrc> To <featureCollection>
There is no need to specify forecastModelRunCollection versus forecastModelRunSingleCollection, nor timeUnitsChange.
This is detected automatically.
Old Way #1
<datasetFmrc name="NCEP-GFS-CONUS_80km" collectionType="ForecastModelRuns" harvest="true"
             path="fmrc/NCEP/GFS/CONUS_80km"> <!-- 1 -->
  <metadata inherited="true">
    <documentation type="summary">good stuff</documentation>
  </metadata>
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" enhance="true"> <!-- 2 -->
    <aggregation dimName="run" type="forecastModelRunCollection"
                 fmrcDefinition="NCEP-GFS-CONUS_80km.fmrcDefinition.xml" recheckEvery="15 min">
      <scan location="/data/ldm/pub/native/grid/NCEP/GFS/CONUS_80km/" suffix=".grib1"
            dateFormatMark="GFS_CONUS_80km_#yyyyMMdd_HHmm" subdirs="true" olderThan="5 min"/>
    </aggregation>
  </netcdf>
  <fmrcInventory location="/data/ldm/pub/native/grid/NCEP/GFS/CONUS_80km/"
                 suffix=".grib1"
                 fmrcDefinition="NCEP-GFS-CONUS_80km.fmrcDefinition.xml" /> <!-- 3 -->
  <addTimeCoverage datasetNameMatchPattern="GFS_CONUS_80km_([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})00.grib1$"
                   startTimeSubstitutionPattern="$1-$2-$3T$4:00:00"
                   duration="240 hours"/>
</datasetFmrc>
where:
- datasetFmrc replaced by featureCollection
  - optional collectionType=ForecastModelRuns → mandatory featureType=FMRC
- NcML netcdf element describing the aggregation is now done by collection element
  - aggregation dimName, type, and fmrcDefinition are no longer needed
  - netcdf scan location, suffix, subdirs, and dateFormatMark are replaced by collection spec
- fmrcInventory and addTimeCoverage elements are no longer needed.
Old Way #2
<datasetFmrc name="RTOFS Forecast Model Run Collection" path="fmrc/rtofs">
  <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <variable name="mixed_layer_depth"> <!-- 1 -->
      <attribute name="long_name" value="mixed_layer_depth @ surface"/>
      <attribute name="units" value="m"/>
    </variable>
    <aggregation dimName="runtime" type="forecastModelRunSingleCollection" timeUnitsChange="true" recheckEvery="10 min">
      <variable name="time"> <!-- 2 -->
        <attribute name="units" value="hours since "/>
      </variable>
      <scanFmrc location="c:/rps/cf/rtofs" regExp=".*ofs_atl.*\.grib2$"
                runDateMatcher="#ofs.#yyyyMMdd" forecastOffsetMatcher="HHH#.grb.grib2#" subdirs="true"
                olderThan="10 min"/> <!-- 3 -->
    </aggregation>
  </netcdf>
</datasetFmrc>
where:
- On the outside of the aggregation, attributes are being added/modified for the existing variable mixed_layer_depth in the resulting FMRC dataset.
- On the inside of the aggregation, an attribute is being added/modified for the existing variable time for each dataset in the collection. Typically, you need to do this in order to make the component files into a gridded dataset.
- The collection is defined by a scanFmrc element, creating a forecastModelRunSingleCollection with one forecast time per file.
New Way
<featureCollection name="NCEP-GFS-CONUS_80km" featureType="FMRC" harvest="true"
                   path="fmrc/NCEP/GFS/CONUS_80km">
  <metadata inherited="true">
    <documentation type="summary">good stuff</documentation>
  </metadata>
  <collection spec="/data/ldm/pub/native/grid/NCEP/GFS/CONUS_80km/GFS_CONUS_80km_#yyyyMMdd_HHmm#.grib1"
              recheckAfter="15 min"
              olderThan="5 min"/> <!-- 1 -->
  <update startup="true" rescan="0 5 3 * * ? *" /> <!-- 2 -->
  <protoDataset choice="Penultimate" change="0 2 3 * * ? *" /> <!-- 3 -->
  <fmrcConfig regularize="true"
              datasetTypes="TwoD Best Files Runs ConstantForecasts ConstantOffsets" /> <!-- 4 -->
</featureCollection>
- collection spec element
  - collection recheckAfter is the same as aggregation recheckEvery
  - collection olderThan is the same as scan olderThan
- update (optional) allows control over when the featureCollection is updated.
- protoDataset (optional) allows control over the selection of the prototypical dataset.
- fmrcConfig (optional) allows control over which FMRC virtual datasets are made available.