netCDF Attribute Convention for Dataset Discovery - Issues and ToDo List

Ethan Davis

last updated 26 September 2005

To Do List

Add examples for each attribute
Add sample file

Issues

Mappings between standard variable names
Need standard names for variable name conventions, e.g., "CF-1.0", "GCMD", "GRIB-1".
Would URIs be better?
Could do it like XML namespaces: a prefix maps to a URI, and values are prefixed with the domain prefix, e.g. "cf:sea_surface_temperature". That way a separate attribute to indicate the domain is not needed, and having a mixed-domain list is simpler. How should the prefix/URI mappings be defined? A domain attribute of the form "prefix1:uri1, prefix2:uri2, ..."?
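
A minimal sketch of how the prefix/URI idea above might be read by a tool, assuming the "prefix1:uri1, prefix2:uri2, ..." form. The function names and the example.org URIs are hypothetical, used only for illustration; nothing here is part of the proposal.

```python
def parse_domain_attribute(value):
    """Parse 'prefix1:uri1, prefix2:uri2' into a dict of prefix -> URI."""
    mappings = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first colon only, because URIs contain colons too.
        prefix, sep, uri = entry.partition(":")
        if not sep or not prefix.strip() or not uri.strip():
            raise ValueError(f"malformed prefix mapping: {entry!r}")
        mappings[prefix.strip()] = uri.strip()
    return mappings


def resolve_name(qualified, mappings):
    """Split 'cf:sea_surface_temperature' into (domain URI, name)."""
    prefix, sep, name = qualified.partition(":")
    if not sep:
        raise ValueError(f"no domain prefix in {qualified!r}")
    if prefix not in mappings:
        raise KeyError(f"undeclared prefix {prefix!r}")
    return mappings[prefix], name


# Hypothetical URIs, for illustration only.
domains = parse_domain_attribute(
    "cf:http://example.org/cf-1.0, gcmd:http://example.org/gcmd"
)
uri, name = resolve_name("cf:sea_surface_temperature", domains)
```

One consequence of this form: a URI that itself contains a comma would be split incorrectly, so the separator would need to be chosen or escaped with care.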

Mappings between standard keywords

Need standard names for standard keyword conventions, e.g., "AGU Index Terms" (http://www.agu.org/pubs/indexterms/); "GCMD Science Keywords" (http://gcmd.gsfc.nasa.gov/Resources/valids/gcmd_parameters.html)

Geospatial coverage:

Where should a textual description of the spatial coverage go in the THREDDS metadata?
Attribute lat/lon min/max (simple, though units might be confusing) vs. THREDDS northsouth/eastwest start/size (solves the dateline problem). What about geoX, geoY, geoZ? Then units would make more sense.
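
To illustrate why start/size avoids the dateline problem, here is a small sketch. It assumes longitudes in degrees east and uses hypothetical helper names; it is not the THREDDS implementation.

```python
def to_start_size(lon_west, lon_east):
    """Convert west/east longitude bounds (degrees east) to (start, size)."""
    size = (lon_east - lon_west) % 360.0
    # Bounds that differ by a full circle describe global coverage, not zero width.
    if size == 0.0 and lon_west != lon_east:
        size = 360.0
    return lon_west, size


def contains(start, size, lon):
    """True if longitude lon falls inside the box, even across the dateline."""
    return (lon - start) % 360.0 <= size


# A box from 170E to 170W crosses the dateline and is 20 degrees wide.
# Stored as min/max it would be min=-170, max=170, which reads as 340 degrees wide.
start, size = to_start_size(170.0, -170.0)
```

With start/size the box above is unambiguous: contains(start, size, 180.0) is true and contains(start, size, 0.0) is false.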

reference, source, and comment should be added

Attribute	Description	THREDDS	CF	NUG	COARDS
reference	Published or web-based references that describe the data or methods used to produce it.		reference
source	The method of production of the original data.		source
comment	Miscellaneous information about the data.		comment
Remove publisher information (not appropriate as information internal to a dataset that can be published at multiple sites).

Creator and Publisher don't really indicate who a user should contact if they have trouble with the dataset. Should there be a "main contact" as well? [from Peter Cornillon]. MY RESPONSE: I guess this might depend on the kind of trouble a user is having. Access trouble should probably go to the publisher, whereas data quality issues should perhaps be sent to the creator or ...

Seem to have crossed from "discovery" metadata to "use" metadata in the variable attributes list (units e.g.). If going to do that, should also add "missing value" flag metadata. [from Peter Cornillon]

THREDDS metadata issues:

Figure out how to handle Dublin Core temporal coverage qualifiers ("valid", "created", etc) in terms of metadata/date@type and metadata/timeCoverage/{start|end}@type (problem is that some can be ranges and others not but timeCoverage). Removed "valid" and

???

Comparing Proposed Attributes to CF Attributes

CF Attribute	Proposed Attribute	Discussion
comment	comment (add to proposal?)	This seems to be a very general slot for comments on the data, the project, the processing. I'm not sure how this would fit into the data discovery arena. Could just be used as text to feed into a free text search (extension to summary?). I.e., how would this be mapped into THREDDS metadata (maybe documentation).
comment	summary	More general than a summary.
	acknowledgement	I don't see a place for this in CF. Maybe comment.
history	history	Pretty direct mapping from CF to this proposal. How compare with processing_level?
institution	creator_name creator_url creator_email	Good semantic mapping to/from CF (in proposal, creator can be individual or institution). However, the more structured nature of the creator_* attributes might cause problems with an actual mapping to/from the more free text nature of the institution attribute.
	contributor_name contributor_role	Another possible mapping (contributor can also be individual or institution).
	id naming_authority	Not good match. The id/naming_authority pair is intended to provide a "globally" unique ID for a dataset; doesn't have to be related to creation of dataset.
	project	Kind of one level above creator/institution. More of a "why was this dataset created" rather than "where was it created".
	summary	The summary is intended as a human readable description of the dataset that can be used in free text searches. Should probably contain creator/institution information.
references	references (add to proposal?)	Certainly good information to have but I'm not sure how this would be used in data discovery.
source	source (add to proposal?)	Seems like this would be a good addition to the proposed attributes. This information should probably also be in the summary attribute.
	processing_level	As Jonathan said, this is a bit vague. However, some places have specific processing level terminology. Do we want to allow for specifying controlled lists of values?
	project	I think project fits better in the creator/institution area than source.
	summary	The summary is intended as a human readable description of the dataset that can be used in free text searches. Should probably contain source information.
standard_name	standard_name	Direct mapping between CF and this proposal. The only change is to allow use of non-CF standard name values (which should only be done if the CF convention is not being followed). This is done by indicating in the standard_name_vocabulary attribute the name of the variable name controlled vocabulary that is being used. ???: For a CF file, values must be from CF standard name table. Do we want to allow CF compliant files to have alternate "standard names"? If so, need to not use "standard_name".
title	title	Direct mapping between CF and this proposal.
	time_coverage_* geospatial_*	Some points from Jonathan: 1) can deduce info from coordinate variables; 2) need to be rewritten if subselection is made. We do need some way to bubble this information up to tools that harvest dataset discovery information that won't be CF aware (some digital libraries won't even be all that data aware). We're also looking (in THREDDS) at containing this info at the catalog level. So, maybe that is a better solution.
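
As a rough illustration of Jonathan's two points in the time_coverage_*/geospatial_* row, the sketch below derives coverage attributes from coordinate values and shows that a subselection changes them. The specific attribute names (geospatial_lat_min, time_coverage_start, and so on) and the function name are assumptions for this example, and plain Python lists stand in for coordinate variables.

```python
from datetime import datetime


def coverage_attributes(lats, lons, times):
    """Derive discovery-style coverage attributes from coordinate values."""
    return {
        "geospatial_lat_min": min(lats),
        "geospatial_lat_max": max(lats),
        "geospatial_lon_min": min(lons),
        "geospatial_lon_max": max(lons),
        "time_coverage_start": min(times).isoformat(),
        "time_coverage_end": max(times).isoformat(),
    }


lats = [10.0, 20.0, 30.0]
lons = [-100.0, -90.0]
times = [datetime(2005, 9, 1), datetime(2005, 9, 26)]

full = coverage_attributes(lats, lons, times)

# After a subselection the stored attributes are stale and must be recomputed.
subset = coverage_attributes(lats[:2], lons, times[:1])
```

A harvester that is not CF aware could read the resulting attributes directly, at the cost of keeping them in sync whenever the data are subset.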
