An NcML document is an XML document that uses the NetCDF Markup Language to define a CDM dataset. NcML can be embedded directly into the TDS catalogs to achieve a number of powerful features, shown below. This embedded NcML is only useful in the TDS server catalogs; it is not meaningful to a THREDDS client, and so is not included in the client catalogs.
One can put an NcML element inside a dataset element, in which case it is a self-contained NcML dataset, or inside a datasetScan element, where it modifies a regular dataset. In both cases, we call the result a virtual dataset, and you cannot serve a virtual dataset with a file-serving protocol like FTP or HTTP. However, you can use subsetting services like OPeNDAP, WCS, WMS and NetcdfSubset.
Using NcML in a dataset element

NcML embedded in a TDS dataset element creates a self-contained NcML dataset. The TDS dataset does not refer to a data root, because the NcML contains its own location. The TDS dataset must have a unique URL path (this is true for all TDS datasets), but unlike a regular dataset, it does not have to match a data root.
You can use NcML to modify an existing CDM dataset:
    <catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
             xmlns:xlink="http://www.w3.org/1999/xlink"
             name="TDS workshop test 1" version="1.0.2">
1)    <service name="ncdods" serviceType="OPENDAP" base="/thredds/dodsC/"/>
2)    <dataset name="Example NcML Modified" ID="ExampleNcML-Modified" urlPath="ExampleNcML/Modified.nc">
        <serviceName>ncdods</serviceName>
3)      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
                location="/machine/tds/workshop/ncml/example1.nc">
4)        <variable name="Temperature" orgName="T"/>
5)        <variable name="RelativeHumidity" orgName="rh">
6)          <attribute name="long_name" value="relatively humid"/>
            <attribute name="units" value="percent (%)"/>
7)          <remove type="attribute" name="description"/>
          </variable>
        </netcdf>
      </dataset>
    </catalog>
1) A service is defined that allows the virtual dataset to be served through OPeNDAP. Make sure that the base attribute is exactly as shown.
2) The virtual dataset is created and given a urlPath of ExampleNcML/Modified.nc. The urlPath is essentially arbitrary, but must be unique within the TDS, and you should maintain a consistent naming convention to ensure uniqueness, especially for large collections of data. It's also important to give the dataset a unique ID.
3) The NcML element is embedded directly inside the dataset element, and it references the file at /machine/tds/workshop/ncml/example1.nc. Note that you must declare the NcML namespace exactly as shown.
4) The variable named T in the original file is renamed Temperature.
5) The variable named rh in the original file is renamed RelativeHumidity.
6) Two attributes of rh are defined, long_name and units. If these already exist, they are replaced.
7) The attribute of rh called description is removed.

Let's look at serving a file directly vs serving it through NcML:
    <catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
             xmlns:xlink="http://www.w3.org/1999/xlink"
             name="TDS workshop test 2" version="1.0.2">
      <service name="ncdods" serviceType="OPENDAP" base="/thredds/dodsC/"/>
1)    <datasetRoot path="test/ExampleNcML" location="/machine/tds/workshop/ncml/" />
2)    <dataset name="Example Dataset" ID="Example" urlPath="test/ExampleNcML/example1.nc">
        <serviceName>ncdods</serviceName>
      </dataset>
3)    <dataset name="Example NcML Modified" ID="Modified" urlPath="ExampleNcML/Modified.nc">
        <serviceName>ncdods</serviceName>
4)      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
                location="/machine/tds/workshop/ncml/example1.nc">
          <variable name="Temperature" orgName="T"/>
        </netcdf>
      </dataset>
    </catalog>
1) A datasetRoot is defined that associates the URL path test/ExampleNcML with the disk location /machine/tds/workshop/ncml/.
2) A dataset is created with a urlPath of test/ExampleNcML/example1.nc. The first part of the path is matched to the datasetRoot, so that the full dataset location is /machine/tds/workshop/ncml/example1.nc. This file is served directly by this dataset element.
3) A virtual dataset is defined by the embedded NcML. The virtual dataset is given an (arbitrary) urlPath of ExampleNcML/Modified.nc.
4) The NcML element references the file at /machine/tds/workshop/ncml/example1.nc. The only modification is to rename the variable T to Temperature.
Here is an example that defines a dataset using NcML aggregation.
    <catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
             xmlns:xlink="http://www.w3.org/1999/xlink"
             name="TDS workshop test 3" version="1.0.2">
1)    <service name="ncdods" serviceType="OPENDAP" base="/thredds/dodsC/" />
2)    <dataset name="Example NcML Agg" ID="ExampleNcML-Agg" urlPath="ExampleNcML/Agg.nc">
3)      <serviceName>ncdods</serviceName>
4)      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
5)        <aggregation dimName="time" type="joinExisting">
6)          <scan location="/machine/tds/workshop/ncml/cg/" dateFormatMark="CG#yyyyDDD_HHmmss"
                  suffix=".nc" subdirs="false"/>
          </aggregation>
        </netcdf>
      </dataset>
    </catalog>
1) An OPeNDAP service is defined called ncdods.
2) A THREDDS dataset is defined, which must have a urlPath that is unique within the TDS, in this case ExampleNcML/Agg.nc.
3) The dataset uses the ncdods service.
4) An NcML netcdf element is embedded inside the THREDDS dataset element.
5) An NcML aggregation of type joinExisting is declared, using the existing time dimension as the aggregation dimension.
6) All the files in the directory /machine/tds/workshop/ncml/cg/ that end with .nc will be scanned to create the aggregation. A dateFormatMark is used to define the time coordinates, indicating there is exactly one time coordinate in each file.
Using NcML in a datasetScan element

If an NcML element is added to a DatasetScan, it will modify all of the datasets contained within the DatasetScan. It is not self-contained, however, since it gets its location from the datasets that are dynamically scanned.
1)    <datasetScan name="Ocean Satellite Data" ID="ocean/sat" path="ocean/sat"
                   location="/machine/tds/workshop/ncml/ocean/">
        <filter>
          <include wildcard="*.nc" />
        </filter>
2)      <metadata inherited="true">
          <serviceName>ncdods</serviceName>
          <dataType>Grid</dataType>
        </metadata>
3)      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
          <attribute name="Conventions" value="CF-1.0"/>
        </netcdf>
      </datasetScan>
1) A datasetScan element is created whose contained datasets start with URL path ocean/sat, and whose contents are all the files in the directory /machine/tds/workshop/ncml/ocean/ which end in .nc.
2) All contained datasets inherit metadata indicating that they use the ncdods service and are of type Grid.
3) Each dataset has the global attribute Conventions="CF-1.0" added to it. Note that the netcdf element has no location attribute; the location is implicitly supplied by the datasets found by the datasetScan.
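As a rough illustration of what the datasetScan above produces, the sketch below shows the kind of nested dataset element a client might see for one scanned file. The file name sst_20060101.nc is hypothetical, and the exact form of the generated name and ID is an assumption rather than something the example states. The embedded NcML does not appear, since it is only used on the server.

```xml
<!-- Sketch of one generated catalog entry; sst_20060101.nc is a hypothetical file name. -->
<!-- The urlPath is the datasetScan path ("ocean/sat") followed by the file name. -->
<dataset name="sst_20060101.nc" ID="ocean/sat/sst_20060101.nc"
         urlPath="ocean/sat/sst_20060101.nc"/>
```

Assuming the ncdods service uses the base /thredds/dodsC/ as in the earlier catalogs, this dataset would be requested at /thredds/dodsC/ocean/sat/sst_20060101.nc, and the response would carry the added Conventions attribute.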
The scan element in the NcML aggregation is similar in purpose to the datasetScan element, but be careful not to confuse the two.

The datasetScan element is more powerful, and has more options for filtering, etc. Its job is to create nested dataset elements inside the datasetScan, and so it has various options to add information to those nested datasets. It has a generalized framework (CrawlableDataset) for crawling other things besides file directories. The scan element's job is to easily specify what files go into an NcML aggregation, and those individual files are hidden inside the aggregation dataset. It can only scan file directories. In the future, some of the capabilities of datasetScan will migrate into NcML scan.
Let's look at using a DatasetScan and an Aggregation scan on the same collection of files. Download catalogScan.xml, place it in your TDS ${tomcat_home}/content/thredds directory and add a catalogRef to it from your main catalog.
    <?xml version="1.0" encoding="UTF-8"?>
    <catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
             xmlns:xlink="http://www.w3.org/1999/xlink"
             name="TDS workshop test 4" version="1.0.2">
      <service name="ncdods" serviceType="OPENDAP" base="/thredds/dodsC/"/>
1)    <dataset name="Example NcML Agg" ID="ExampleNcML-Agg" urlPath="ExampleNcML/Agg.nc">
        <serviceName>ncdods</serviceName>
2)      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
          <aggregation dimName="time" type="joinExisting" recheckEvery="4 sec">
            <scan location="/machine/tds/workshop/ncml/cg/" dateFormatMark="CG#yyyyDDD_HHmmss"
                  suffix=".nc" subdirs="false"/>
          </aggregation>
        </netcdf>
      </dataset>
3)    <datasetScan name="CG Data" ID="cg/files" path="cg/files" location="/machine/tds/workshop/ncml/cg/">
        <metadata inherited="true">
          <serviceName>ncdods</serviceName>
          <dataType>Grid</dataType>
        </metadata>
        <filter>
4)        <include wildcard="*.nc"/>
        </filter>
5)      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
          <attribute name="Yoyo" value="Ma"/>
        </netcdf>
      </datasetScan>
    </catalog>
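1) The aggregation dataset is given the urlPath ExampleNcML/Agg.nc.
2) The recheckEvery attribute only applies when using a scan element.
3) A datasetScan element is created whose contained datasets start with URL path cg/files, and which scans the directory /machine/tds/workshop/ncml/cg/.
4) Only files that end in .nc are included.
5) The embedded NcML adds the global attribute Yoyo="Ma" to each dataset found by the datasetScan.

Start or restart your TDS and look at those datasets through the HTML interface and through ToolsUI.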
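The catalogRef mentioned in this exercise could be sketched as follows. The title text is arbitrary, and the relative href assumes that catalogScan.xml sits in the same directory as your main catalog; adjust both to your installation.

```xml
<!-- Place inside the <catalog> element of the main catalog, which must declare -->
<!-- xmlns:xlink="http://www.w3.org/1999/xlink". The title is an arbitrary label. -->
<catalogRef xlink:title="NcML scan exercise" xlink:href="catalogScan.xml" name=""/>
```

After a restart, the referenced catalog should appear as a link on the main catalog page of the HTML interface.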
Using NcML in a featureCollection element

Here we show a brief example of modifying files with NcML in a featureCollection element.

Download catalogFmrcNcml.xml, place it in the ${tomcat_home}/content/thredds directory and add a catalogRef to it from your main catalog:
    <?xml version="1.0" encoding="UTF-8"?>
    <catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
             xmlns:xlink="http://www.w3.org/1999/xlink"
             name="Unidata THREDDS Data Server" version="1.0.3">
      <service name="ncdods" serviceType="OPENDAP" base="/thredds/dodsC/"/>
      <featureCollection featureType="FMRC" name="GOMOOS" harvest="true" path="fmrc/USGS/GOMOOS">
        <metadata inherited="true">
          <serviceName>ncdods</serviceName>
          <dataFormat>netCDF</dataFormat>
          <documentation type="summary">Munge this with NcML</documentation>
        </metadata>
        <collection spec="/machine/tds/workshop/ncml/gomoos/gomoos.#yyyyMMdd#.cdf$"/>
        <protoDataset>
1)        <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
            <attribute name="History" value="Processed by Kraft"/>
          </netcdf>
        </protoDataset>
2)      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
          <variable name="time">
            <attribute name="units" value="days since 2006-11-01 00:00 UTC"/>
          </variable>
          <attribute name="Conventions" value="CF-1.0"/>
        </netcdf>
      </featureCollection>
    </catalog>
1) The protoDataset is modified by adding a global attribute History="Processed by Kraft".
2) Each component dataset is modified by setting the units attribute of the time variable and by adding a global attribute Conventions="CF-1.0".

You might wonder why the global attribute Conventions="CF-1.0" is not put on the protoDataset instead of on each individual dataset. The reason is that in an FMRC, each dataset is converted into a GridDataset, and then combined into the FMRC. So the modifications in 2) are what is needed for the individual datasets to be correctly interpreted as Grid datasets. The modifications to the protoDataset are then applied to the resulting FMRC 2D dataset.
When things go wrong, it's best to first debug the aggregation outside of the TDS:

1) Go to the TDS catalog and find the problem dataset. Inside the <dataset> element will be a <netcdf> element; that is the NcML aggregation. Extract it and put it in a file called "test.ncml".
2) Add the XML header to the top of the file: <?xml version="1.0" encoding="UTF-8"?>
3) Remove the recheckEvery attribute if present (in the catalog shown earlier it sits on the <aggregation> element that contains the <scan>).
4) Make sure that the <scan> location is available on the machine where you are running ToolsUI.
5) In ToolsUI, navigate to test.ncml and try to open it.
6) If that works, add the recheckEvery attribute back and open the dataset, then reopen it after a new file has arrived (and the recheckEvery time has passed). Generally you make recheckEvery very short while testing.
7) Put the NcML back into the TDS catalog without the recheckEvery attribute. See if OPeNDAP access works.
8) Add the recheckEvery attribute (if needed) and test again.

Remember that you can't use HTTPServer for NcML datasets. Use only the subsetting services OPeNDAP, WCS, WMS, and NetcdfSubset.
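As a sketch of the first debugging steps, this is roughly what test.ncml could look like when extracted from the joinExisting aggregation example shown earlier, with the XML header added and no recheckEvery attribute. It assumes the scan directory /machine/tds/workshop/ncml/cg/ is readable from the machine running ToolsUI.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- test.ncml: the <netcdf> element copied out of the TDS catalog, unchanged -->
<!-- apart from the added XML header and the removed recheckEvery attribute. -->
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <scan location="/machine/tds/workshop/ncml/cg/" dateFormatMark="CG#yyyyDDD_HHmmss"
          suffix=".nc" subdirs="false"/>
  </aggregation>
</netcdf>
```

If this file opens cleanly outside the TDS, the problem is more likely in the surrounding catalog configuration than in the NcML itself.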