Overview
A Forecast Model Run Collection (FMRC) is a collection of forecast model runs which can be uniquely identified by the start of the model run, called the model run time, (also called the analysis time or generating time or reference time). Each model run has a series of forecast times. A collection of these runs therefore has two time coordinates, the run time and the forecast time. An FMRC creates a 2D time collection dataset, and then creates various 1D time subsets out of it. See this poster for a detailed example.
Previously this functionality was provided using FMRC Aggregation through NcML and the <fmrcDataset>
element in the TDS configuration catalog.
As of TDS 4.2, that implementation is now deprecated and <featureCollection>
elements are the correct way to provide this functionality.
As of 4.3, one should only serve GRIB files with featureCollection=GRIB
, e.g., not with FMRC.
Typically, FMRC is used for collections of model runs stored in netCDF/CF files.
Constraints On FMRC
- The component files of the collection must all be recognized as Grid Feature datasets by the CDM software.
- Each component file must have a single reference time.
- The times and variables for a model run can be in a single file or spread out among multiple files.
- The model runs are assumed to be homogenous, that is, they contain the same collection of variables and attributes, and they must be on the same horizontal and vertical grid. The model runs can differ only in their time and runtime coordinates and the actual data values.
Notes
- It’s best if the reference time is part of the filename, in a way that can be extracted with a DateExtractor.
- If there is a
serviceType=HTTPServer
for the Feature Collection, it is removed from the virtual datasets (all except the Files datasets). - If an
ID
attribute is not specified on thefeatureCollection
, thepath
attribute is used as theID
. This is a preferred idiom. - Note that for the case when a model run dataset is in a single file, it may be different than the same file as seen through the corresponding
_Files
dataset, ifregularize
is enabled. In that case, the time coordinates will be regularized across all model run datasets in the collection. - The FMRC virtual dataset is assembled by examining the Grid Coordinate Systems of the component files. One can use NcML to fix some problems in the component files.
fmrcConfig
Element
Defines options on feature collections with featureType=FMRC
.
<fmrcConfig regularize="false" datasetTypes="TwoD Best Files Runs" />
<fmrcConfig regularize="false" datasetTypes="Files">
<bestDataset name="Best_Exclude_Spinup" offsetsGreaterEqual="0"/>
</fmrcConfig>
where:
-
regularize
: Iftrue
, then the runs for a given hour (from0Z
) are assumed to have the same forecast time coordinates. For example, if you have 4 model runs per day (e.g.:0
,6
,12
,18Z
) and many days of model runs, then all the6Z
runs for all days will have the same time coordinates, etc. This “regularizes” time coordinates, and is useful when there may be missing forecast times, which may result in creating a new time coordinate. Leave this tofalse
unless you really have a series of runs with uniform offsets. -
datasetTypes
: list the dataset types that are exposed in the TDS catalog. The possible values are:TwoD
: dataset with two time dimensions (run time and forecast time), which contains all the data in the collection.Best
: dataset using the latest model data available for each possible forecast hour.Files
: each component file of the collection is available separately, as in adatasetScan
. A “latest” file will be added if there is a “latest” Resolver service in the catalog.Runs
: A model run dataset contains all the data for one run time.ConstantForecasts
: A constant forecast dataset is created from all the data that have the same forecast time. This kind of dataset has successively shorter forecasts of the same endpoint.ConstantOffsets
: A constant offset dataset is created from all the data that have the same offset from the beginning of the run.
-
bestDataset
: you can define your own “best dataset”. This uses the same algorithm as theBest
dataset above, but excludes data based on its offset hour. In the above example, aBest
dataset is created with offset hours less than0
excluded.name
: the human visible name of the definedBest
dataset, must be unique within thefmrcConfig
element. Do not usebest.ncd
,fmrc.ncd
,runs
,files
,forecast
, oroffset
.offsetsGreaterEqual
: forecast offset hours (forecast time - run time) less than this value are excluded.
Notes:
- If an
fmrcConfig
element is not present, the default isregularize=false
, anddatasetTypes="TwoD Best Files Runs"
. Specifying your ownfmrcConfig
completely overrides the datasetTypes default. - When using FMRC for gridded data that doesn’t have
2D
times, be sure to putregularize=false
(or leave it off).
Working With FMRC In ToolsUI
The ToolsUI FMRC
tab allows you to view internal structures of an FMRC.
You can pass it a collection specification string or a file with a featureCollection
element in it.
Working With FMRC In Client Software
Opening An FMRC
Use static method on ucar.nc2.ft.fmrc.Fmrc
:
public static Fmrc open(String collection, Formatter errlog, Formatter debug);
The collection may be one of:
- collection specification string
catalog:catalogURL
filename.ncml
Run Date
If a dateFormatMark
is given, a DateExtractor extracts the run-date from the filename or URL.
Otherwise, there must be global attributes _CoordinateModelBaseDate
or _CoordinateModelRunDate
inside each dataset.
The GRIB IOSP reader automatically adds this global attribute.
Forecast Date
Each file is opened as a GridDataset:
gds = GridDataset.open( mfile.getPath());
and the forecast time coordinates are extracted from the grid coordinate system.
There is no need to specify forecastModelRunCollection
vs forecastModelRunSingleCollectionc
, nor timeUnitsChange
.
This is detected automatically.
Regular
If true, then all runs for a given offset hour (from 0Z
) are assumed to have the same forecast time coordinates.
This obviates the need for the FMRC definition files which previously were used on the Unidata data server motherlode.
This evens out time coordinates, and compensates for missing forecast times in the IDD feed.
Persistent Caching
An fmrInv.xml
file is made which records the essential grid information from each file.
It is cached in a persistent Berkeley Database (bdb) key/value store, so that it only has to be done the first time the file is accessed in an FMRC.
Each collection becomes a separate bdb
database, and each file in the collection is an element in the database, with the filename as the key and the fmrInv.xml
as the value.
When a collection is scanned, any filenames already in the database are reused.
Any new ones are read and added to the database.
Any entries in the database that no longer have a filename associated with them are deleted.
ToolsUI collections
tab allows you to delete database or individual elements.
Conversion Of <datasetFmrc>
Yo <featureCollection>
There is no need to specify forecastModelRunCollection
versus forecastModelRunSingleCollection
, nor timeUnitsChange
.
This is detected automatically.
Old Way #1
<datasetFmrc name="NCEP-GFS-CONUS_80km" collectionType="ForecastModelRuns" harvest="true"
path="fmrc/NCEP/GFS/CONUS_80km"> <!-- 1 -->
<metadata inherited="true">
<documentation type="summary">good stuff</documentation>
</metadata>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" enhance="true"> <!-- 2 -->
<aggregation dimName="run" type="forecastModelRunCollection"
fmrcDefinition="NCEP-GFS-CONUS_80km.fmrcDefinition.xml" recheckEvery="15 min">
<scan location="/data/ldm/pub/native/grid/NCEP/GFS/CONUS_80km/" suffix=".grib1"
dateFormatMark="GFS_CONUS_80km_#yyyyMMdd_HHmm" subdirs="true" olderThan="5 min"/>
</aggregation>
</netcdf>
<fmrcInventory location="/data/ldm/pub/native/grid/NCEP/GFS/CONUS_80km/"
suffix=".grib1"
fmrcDefinition="NCEP-GFS-CONUS_80km.fmrcDefinition.xml" /> <!-- 3 -->
<addTimeCoverage datasetNameMatchPattern="GFS_CONUS_80km_([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})00.grib1$"
startTimeSubstitutionPattern="$1-$2-$3T$4:00:00"
duration="240 hours"/>
</datasetFmrc>
where:
datasetFmrc
replaced byfeatureCollection
- optional
collectionType=ForecastModelRuns
→ mandatoryfeatureType=FMRC
- optional
- NcML
netcdf
element describing the aggregation is now done bycollection
elementaggregation
dimName
,type
, andfmrcDefinition
are no longer needednetcdf scan
location
,suffix
,subdirs
, anddateFormatMark
are replaced bycollection spec
fmrcInventory
andaddTimeCoverage
elements are no longer needed.
Old Way #2
<datasetFmrc name="RTOFS Forecast Model Run Collection" path="fmrc/rtofs">
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<variable name="mixed_layer_depth"> <!-- 1 -->
<attribute name="long_name" value="mixed_layer_depth @ surface"/>
<attribute name="units" value="m"/>
</variable>
<aggregation dimName="runtime" type="forecastModelRunSingleCollection" timeUnitsChange="true" recheckEvery="10 min">
<variable name="time"> <!-- 2 -->
<attribute name="units" value="hours since "/>
</variable>
<scanFmrc location="c:/rps/cf/rtofs" regExp=".*ofs_atl.*\.grib2$"
runDateMatcher="#ofs.#yyyyMMdd" forecastOffsetMatcher="HHH#.grb.grib2#" subdirs="true"
olderThan="10 min"/> <!-- 3 -->
</aggregation>
</netcdf>
</datasetFmrc>
where:
- On the outside of the aggregation, attributes are being added/modified for the existing variable
mixed_layer_depth
in the resulting FMRC dataset. - On the inside of the aggregation, an attribute is being added/modified for the existing variable
time
for each dataset in the collection. Typically, you need to do this in order to make the component files into a gridded dataset. - The collection is defined by a
scanFmrc
element, creating aforecastModelRunSingleCollection
with one forecast time per file.
New Way
<featureCollection name="NCEP-GFS-CONUS_80km" featureType="FMRC" harvest="true"
path="fmrc/NCEP/GFS/CONUS_80km">
<metadata inherited="true">
<documentation type="summary">good stuff</documentation>
</metadata>
<collection spec="/data/ldm/pub/native/grid/NCEP/GFS/CONUS_80km/GFS_CONUS_80km_#yyyyMMdd_HHmm#.grib1"
recheckAfter="15 min"
olderThan="5 min"/> <!-- 1 -->
<update startup="true" rescan="0 5 3 * * ? *" /> <!-- 2 -->
<protoDataset choice="Penultimate" change="0 2 3 * * ? *" /> <!-- 3 -->
<fmrcConfig regularize="true"
datasetTypes="TwoD Best Files Runs ConstantForecasts ConstantOffsets" /> <!-- 4 -->
</featureCollection>
collection
spec
elementcollection
recheckAfter
is the same asaggregation
recheckEvery
collection
olderThan
is the same asscan
olderThan
update
(optional) allows control over when thefeatureCollection
is updated.protoDataset
(optional) allows control over the selection of theprototypical
dataset.fmrcConfig
(optional) allows control over which FMRC virtual datasets are made available.