netCDF Attribute Convention for Dataset Discovery - Issues and ToDo List

Ethan Davis

last updated 26 September 2005


To Do List



Issues

  • Mappings between standard keywords
  • Geospatial coverage:
  • reference, source, and comment should be added
    | Attribute | Description | THREDDS | CF | NUG | COARDS |
    |-----------|-------------|---------|----|-----|--------|
    | reference | Published or web-based references that describe the data or methods used to produce it. | | reference | | |
    | source | The method of production of the original data. | | source | | |
    | comment | Miscellaneous information about the data. | | comment | | |
  • Remove publisher information (not appropriate as metadata internal to a dataset, since the same dataset can be published at multiple sites).
  • Creator and Publisher don't really indicate who a user should contact if they have trouble with the dataset. Should there be a "main contact" as well? [from Peter Cornillon]. MY RESPONSE: I guess this might depend on the kind of trouble a user is having. Users with access trouble should probably contact the publisher, whereas data quality issues should perhaps be sent to the creator or ...
  • We seem to have crossed from "discovery" metadata into "use" metadata in the variable attributes list (e.g., units). If we are going to do that, we should also add "missing value" flag metadata. [from Peter Cornillon]
  • THREDDS metadata issues:
  • ???


  • Comparing Proposed Attributes to CF Attributes


    | CF Attribute | Proposed Attribute | Discussion |
    |--------------|--------------------|------------|
    | comment | comment (add to proposal?) | This seems to be a very general slot for comments on the data, the project, or the processing. I'm not sure how this would fit into the data discovery arena. It could just be used as text to feed into a free-text search (an extension to summary?). How would this be mapped into THREDDS metadata (maybe documentation)? |
    | comment | summary | More general than a summary. |
    | | acknowledgement | I don't see a place for this in CF. Maybe comment. |
    | history | history | Pretty direct mapping from CF to this proposal. How does it compare with processing_level? |
    | institution | creator_name, creator_url, creator_email | Good semantic mapping to/from CF (in the proposal, the creator can be an individual or an institution). However, the more structured nature of the creator_* attributes might cause problems with an actual mapping to/from the more free-text nature of the institution attribute. |
    | institution | contributor_name, contributor_role | Another possible mapping (a contributor can also be an individual or an institution). |
    | institution | id, naming_authority | Not a good match. The id/naming_authority pair is intended to provide a "globally" unique ID for a dataset; it doesn't have to be related to the creation of the dataset. |
    | institution | project | Kind of one level above creator/institution. More of a "why was this dataset created" than a "where was it created". |
    | institution | summary | The summary is intended as a human-readable description of the dataset that can be used in free-text searches. It should probably contain creator/institution information. |
    | references | references (add to proposal?) | Certainly good information to have, but I'm not sure how this would be used in data discovery. |
    | source | source (add to proposal?) | Seems like this would be a good addition to the proposed attributes. This information should probably also be in the summary attribute. |
    | source | processing_level | As Jonathan said, this is a bit vague. However, some places have specific processing level terminology. Do we want to allow for specifying controlled lists of values? |
    | source | project | I think project fits better in the creator/institution area than source. |
    | source | summary | The summary is intended as a human-readable description of the dataset that can be used in free-text searches. It should probably contain source information. |
    | standard_name | standard_name | Direct mapping between CF and this proposal. The only change is to allow use of non-CF standard name values (which should only be done if the CF convention is not being followed). This is done by indicating in the standard_name_vocabulary attribute the name of the controlled vocabulary of variable names that is being used. ???: For a CF file, values must be from the CF standard name table. Do we want to allow CF-compliant files to have alternate "standard names"? If so, we cannot use "standard_name" for them. |
    | title | title | Direct mapping between CF and this proposal. |
    | | time_coverage_*, geospatial_* | Some points from Jonathan: 1) this info can be deduced from coordinate variables; 2) it needs to be rewritten if a subselection is made. We do need some way to bubble this information up to tools that harvest dataset discovery information and won't be CF aware (some digital libraries won't even be all that data aware). We're also looking (in THREDDS) at containing this info at the catalog level. So, maybe that is a better solution. |
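
The two points attributed to Jonathan in the time_coverage_* / geospatial_* discussion can be illustrated with a minimal Python sketch. It is not part of the proposal. The helper `coverage_attributes`, the sample coordinate values, and the specific attribute names (`geospatial_lat_min`, `time_coverage_start`, and so on, chosen to fit the geospatial_* and time_coverage_* patterns) are assumptions made for illustration, and plain lists stand in for a dataset's coordinate variables.

```python
from datetime import datetime, timedelta, timezone


def coverage_attributes(lat, lon, hours_since_origin, origin):
    """Derive coverage attributes from 1-D coordinate values.

    Hypothetical helper. The attribute names are assumed, and the
    longitude range is a naive min/max that ignores dateline crossing.
    """
    start = origin + timedelta(hours=min(hours_since_origin))
    end = origin + timedelta(hours=max(hours_since_origin))
    return {
        "geospatial_lat_min": min(lat),
        "geospatial_lat_max": max(lat),
        "geospatial_lon_min": min(lon),
        "geospatial_lon_max": max(lon),
        "time_coverage_start": start.isoformat(),
        "time_coverage_end": end.isoformat(),
    }


# Made-up values standing in for the coordinate variables of a dataset.
origin = datetime(2005, 9, 26, tzinfo=timezone.utc)
lat = [30.0, 35.0, 40.0, 45.0]
lon = [-110.0, -105.0, -100.0]
hours = [0, 6, 12, 18]

# Point 1: the coverage can be deduced from the coordinate values.
full = coverage_attributes(lat, lon, hours, origin)

# Point 2: after a subselection (two southernmost latitudes, first two
# time steps) the attributes have to be recomputed.
sub = coverage_attributes(lat[:2], lon, hours[:2], origin)

# Attributes that would be wrong if copied unchanged into the subset.
stale = {name for name in full if full[name] != sub[name]}
```

In this example `stale` contains `geospatial_lat_max` and `time_coverage_end`, the two values a harvesting tool that is not CF aware would read incorrectly if the attributes were carried through the subsetting step without being rewritten.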