What Is Metadata?
The term metadata refers to “data about data”. The term is ambiguous, as it is used for two fundamentally different concepts. Structural metadata is about the design and specification of data structures and is more properly called “data about the containers of data”; descriptive metadata, on the other hand, is about individual instances of application data, the data content.
Metadata Tour On Several TDS Sites
Introduction
A simple catalog may contain very minimal information about its datasets, at minimum just a name, a service and a URL for each dataset. An enhanced catalog is one in which you have added enough metadata that it’s possible to create a Digital Library record for import into one of the Data Discovery Centers like IDN.
The THREDDS catalog specification allows you to add all kinds of metadata, in fact, you can put any information you want into metadata elements by using separate XML namespaces. The TDS comes with an example enhanced catalog that contains a recommended set of metadata that you can use as a template. We recommend that you aim for this level of metadata in all the datasets you want to publish.
Annotated Example
The example enhanced catalog lives at ${tomcat_home}/content/thredds/enhancedCatalog.xml
:
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
xmlns:xlink="http://www.w3.org/1999/xlink"
name="Unidata THREDDS/IDD Data Server" version="1.0.1"> <!-- 1 -->
<service name="latest" serviceType="Resolver" base="" /> <!-- 2 -->
<service name="both" serviceType="Compound" base=""> <!-- 3 -->
<service name="ncdods" serviceType="OPENDAP" base="/thredds/dodsC/" />
<service name="HTTPServer" serviceType="HTTPServer" base="/thredds/fileServer/" />
</service>
<dataset name="NCEP Model Data"> <!-- 4 -->
<metadata inherited="true"> <!-- 5 -->
<serviceName>both</serviceName>
<authority>edu.ucar.unidata</authority>
<dataType>Grid</dataType>
<dataFormat>NetCDF</dataFormat>
<documentation type="rights">Freely available</documentation>
<documentation xlink:href="http://www.emc.ncep.noaa.gov/modelinfo/index.html"
xlink:title="NCEP Model documentation" />
<creator>
<name vocabulary="DIF">DOC/NOAA/NWS/NCEP</name>
<contact url="http://www.ncep.noaa.gov/" email="http://www.ncep.noaa.gov/mail_liaison.shtml" />
</creator>
<publisher>
<name vocabulary="DIF">UCAR/UNIDATA</name>
<contact url="http://www.unidata.ucar.edu/" email="support@unidata.ucar.edu" />
</publisher>
<timeCoverage>
<end>present</end>
<duration>14 days</duration>
</timeCoverage>
</metadata>
<datasetScan name="ETA Model/CONUS 80 km" ID="NCEP-ETA"
path="testEnhanced" location="content/dodsC/data/"> <!-- 6 -->
<metadata inherited="true"> <!-- 7 -->
<documentation
type="summary">NCEP North American Model : AWIPS 211 (Q) Regional - CONUS (Lambert Conformal).
Model runs are made at 12Z and 00Z, with analysis and forecasts every 6 hours out to 60 hours.
Horizontal = 93 by 65 points, resolution 81.27 km, LambertConformal projection.
Vertical = 1000 to 100 hPa pressure levels.</documentation> <!-- 8 -->
<geospatialCoverage> <!-- 9 -->
<northsouth>
<start>26.92475</start>
<size>15.9778</size>
<units>degrees_north</units>
</northsouth>
<eastwest>
<start>-135.33123</start>
<size>103.78772</size>
<units>degrees_east</units>
</eastwest>
</geospatialCoverage>
<variables vocabulary="CF-1"> <!-- 10 -->
<variable name="Z_sfc" vocabulary_name="geopotential_height"
units="gp m">Geopotential height, gpm</variable>
</variables>
</metadata>
<filter> <!-- 11 -->
<include wildcard="*eta_211.nc" />
</filter>
<addDatasetSize/>
<addProxies/>
<simpleLatest />
</addProxies>
<addTimeCoverage datasetNameMatchPattern="([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})_eta_211.nc$"
startTimeSubstitutionPattern="$1-$2-$3T$4:00:00" duration="60 hours" />
</datasetScan>
</dataset>
</catalog>
Annotations
- This is the standard
catalog
element for version 1.0.1. The only thing you should change is the name. - You need this service in order to use the
addProxies
child element of thedatasetScan
element. - This is a compound service gives access to the datasets both through OpenDAP and through HTTP file transfer.
- This is a collection level dataset that we added in order to demonstrate factoring out information. It’s not particularly needed in this example, which only contains one nested dataset (the datasetScan at (6)), but for more complicated situations it’s very useful.
- The metadata element that’s part of the collection dataset at (4). Because it has
inherited=true
, everything in it will apply to the collection’s nested datasets. The specific fields are ones that often can be factored out in this way, but your catalog may be different.serviceName
: indicates that all the nested datasets will use the compound service named both.authority
: used to create globally unique dataset IDs. Note the use of reverse DNS naming, which guarantees global uniqueness.dataType
: all datasets are of type Grid.dataFormat
: all datasets have file type NetCDF.rights
: a documentation element indicating who is allowed to use the data.documentation
: an example of how to embed links to web pages.creator
: who created the dataset. Note that we used standard names from IDN DIF vocabulary.publisher
: who is serving the datasettimeCoverage
: the time range that the collection of data covers. In this example, there are 14 days of data available in the collection, ending with the present time.
- The
datasetScan
element dynamically creates a subcatalog by scanning the directory named bylocation
. Thename
attribute is used as the title of DL records, so try to make it concise yet descriptive. TheID
is also very important. See here for a full description of thedatasetScan
element. - This metadata also applies to all the dynamically created datasets in the
datasetScan
element. - The summary
documentation
is used as a paragraph-length summary of the dataset in Digital Libraries. Anyone searching for your data will use this to decide if it’s the data they are looking for. - The
geospatialCoverage
is a lat/lon (and optionally elevation) bounding box for the dataset. - The
variables
element list the data variables available in the dataset. - There are a number of special instructions for datasetScan (see here for the gory details).
The
filter
element specifies which files and directories to include or exclude from the catalog. TheaddDatasetSize
element indicates that adataSize
element should be added to each atomic dataset. TheaddProxies
element causes resolver datasets to be added at each collection level when accessed resolve to the latest dataset at that collection level. This is useful for real-time collections. TheaddTimeCoverage
dynamically adds atimeCoverage
element to the individual datasets in the collection, which will override the timeCoverage inherited from the collection dataset metadata at (5). This is useful for the common case that all the datasets in a collection differ only in their time coverage.
Resources
Metadata Standards
There are a number of existing metadata standards available for describing datasets. These include:
- Dublin Core - general library discovery metadata standard
- International Directory Network (IDN) - standard for geophysical data
- ISO 19115 - standard for geophysical data (FGDC is merging/synchronizing with this ISO standard)
Including Metadata Records in THREDDS Catalogs
Any metadata records can be included directly in or referenced from a THREDDS metadata
element.
Here is an example of how to include a Dublin Core record directly in a THREDDS metadata
element:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>NCEP GFS Model - Alaska 191km </dc:title>
<dc:creator>NOAA/NCEP</dc:creator>
...
</metadata>
Here is an example of how to reference a metadata
record (xlink
attributes are used):
<metadata xlink:title="NCEP GFS Model - Alaska 191km"
xlink:href="http://server/dc/ncep.gfs.alaska_191km.xml" />
What’s The Difference Between Metadata And Documentation?
When the material is an XML file meant for software to read, use a metadata
element.
When it’s an HTML page meant for a human to read, use a documentation
element:
<documentation xlink:title="My Data" xlink:href="http://my.server/md/data1.html" />