NetCDF
4.8.1
The netCDF data model is the way that we think about data. The data model of dimensions, variables, and attributes, which define the Classic Model, was extended starting with netCDF-4.0. The new Enhanced Data Model supports the classic model in a completely backward-compatible way, while allowing access to new features such as groups, multiple unlimited dimensions, and new types, including user-defined types.
For maximum interoperability with existing code, new data should be created with the Classic Model.
The classic netCDF data model consists of variables, dimensions, and attributes. This way of thinking about data was introduced with the very first netCDF release, and is still the core of all netCDF files.
In version 4.0, the netCDF data model was expanded. See The Enhanced Data Model.
Variables - N-dimensional arrays of data. In the classic model, a variable has one of six types (char, byte, short, int, float, double).
Dimensions - Describe the axes of the data arrays. A dimension has a name and a length. An unlimited dimension has a length that can be expanded at any time, as more data are written to it. A classic-model file can contain at most one unlimited dimension.
Attributes - Annotate variables or files with small notes or supplementary metadata. Attributes are always scalar values or 1D arrays, and each one is associated with either a variable or the file as a whole. Although there is no enforced limit, the user is expected to keep attributes small.
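The three building blocks above can be sketched as a small in-memory model. This is a toy illustration in plain Python, not the netCDF library API: the class names (`Dimension`, `Variable`, `ClassicDataset`), the use of `None` to mark the unlimited dimension, and the example names and units are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional

# The six types of the classic model, as listed above.
CLASSIC_TYPES = {"char", "byte", "short", "int", "float", "double"}


@dataclass
class Dimension:
    name: str
    length: Optional[int]  # None marks the unlimited dimension (toy convention)


@dataclass
class Variable:
    name: str
    dtype: str
    dims: tuple  # names of the dimensions that shape this array
    attrs: dict = field(default_factory=dict)


@dataclass
class ClassicDataset:
    dims: dict = field(default_factory=dict)
    variables: dict = field(default_factory=dict)
    attrs: dict = field(default_factory=dict)  # attributes of the file as a whole

    def add_dim(self, name, length=None):
        # The classic model permits at most one unlimited dimension.
        if length is None and any(d.length is None for d in self.dims.values()):
            raise ValueError("classic model allows at most one unlimited dimension")
        self.dims[name] = Dimension(name, length)
        return self.dims[name]

    def add_var(self, name, dtype, dims):
        if dtype not in CLASSIC_TYPES:
            raise ValueError(f"{dtype!r} is not one of the six classic types")
        missing = [d for d in dims if d not in self.dims]
        if missing:
            raise ValueError(f"undefined dimensions: {missing}")
        self.variables[name] = Variable(name, dtype, tuple(dims))
        return self.variables[name]


def build_example():
    """Build a tiny dataset: one unlimited axis, one fixed axis, one variable."""
    ds = ClassicDataset()
    ds.add_dim("time")          # unlimited
    ds.add_dim("latitude", 6)   # fixed length
    temp = ds.add_var("temperature", "float", ("time", "latitude"))
    temp.attrs["units"] = "celsius"          # attribute on a variable
    ds.attrs["title"] = "toy classic dataset"  # attribute on the file
    return ds
```

Calling `build_example().add_dim("station")` raises `ValueError` in this sketch, because `time` is already unlimited.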
With netCDF-4, the netCDF data model was extended in a backward-compatible way.
The new data model, which is known as the “Common Data Model”, is part of an effort here at Unidata to find a common engineering language for the development of scientific data solutions. It contains the variables, dimensions, and attributes of the classic data model, but adds:
groups - A way of hierarchically organizing data, similar to directories in a Unix file system.
user-defined types - The user can now define compound types (like C structures), enumeration types, variable length arrays, and opaque types.
These features may only be used when working with a netCDF-4/HDF5 file. Files created in classic formats cannot support groups or user-defined types (see netcdf_format).
With netCDF-4/HDF5 files, the user may define groups, which may contain variables, dimensions, and attributes. In this way, a group acts as a container for the classic netCDF dataset. But netCDF-4/HDF5 files can have many groups, organized hierarchically.
Each file begins with at least one group, the root group. The user may then add more groups, receiving a new ncid for each group created.
Since each group functions as a complete netCDF classic dataset, it is possible to have variables with the same name in two or more different groups, within the same netCDF-4/HDF5 data file.
Dimensions have a special scope: they may be seen by all variables in their group, and all descendant groups. This allows the user to define dimensions in a top-level group, and use them in many sub-groups.
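The scoping rules for groups can be illustrated with a small tree structure. The sketch below is a toy model in plain Python and makes no claim about how the library implements lookup; the `Group` class, its method names, and the group and dimension names are hypothetical.

```python
class Group:
    """Toy model of a netCDF-4 group (not the library's data structure)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.dims = {}       # dimension name -> length
        self.variables = {}  # variable name -> tuple of dimension names
        self.children = {}   # child group name -> Group

    def add_group(self, name):
        child = Group(name, parent=self)
        self.children[name] = child
        return child

    def find_dim(self, name):
        """Search this group first, then each ancestor up to the root."""
        group = self
        while group is not None:
            if name in group.dims:
                return group, group.dims[name]
            group = group.parent
        raise KeyError(name)

    def add_var(self, name, dim_names):
        for dim in dim_names:
            self.find_dim(dim)  # raises KeyError if the dimension is not visible
        self.variables[name] = tuple(dim_names)


def build_tree():
    root = Group("/")
    root.dims["latitude"] = 6    # defined once in the root group
    root.dims["longitude"] = 12

    model_a = root.add_group("model_a")
    model_b = root.add_group("model_b")
    model_a.dims["level"] = 2    # visible in model_a and its descendants only

    # The same variable name may appear in two different groups.
    model_a.add_var("temperature", ("level", "latitude", "longitude"))
    model_b.add_var("temperature", ("latitude", "longitude"))
    return root
```

In this sketch both sub-groups resolve `latitude` to the root group's definition, while `model_b.find_dim("level")` raises `KeyError`, since `level` belongs to a sibling group rather than an ancestor.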
Since it may be necessary to write code which works with all types of netCDF data files, we also introduce the ability to create netCDF-4/HDF5 files which follow all the rules of the classic netCDF model. That is, these files are in HDF5, but will not support multiple unlimited dimensions, user-defined types, groups, etc. They act just like a classic netCDF file.
NetCDF can be used to store many kinds of data, but it was originally developed for the Earth science community.
NetCDF views the world of scientific data in the same way that an atmospheric scientist might: as sets of related arrays. There are various physical quantities (such as pressure and temperature) located at points at a particular latitude, longitude, vertical level, and time.
A scientist might also like to store supporting information, such as the units, or some information about how the data were produced.
The axis information (latitude, longitude, level, and time) would be stored as netCDF dimensions. Dimensions have a length and a name.
The physical quantities (pressure, temperature) would be stored as netCDF variables. Variables are N-dimensional arrays of data, with a name and an associated set of netCDF dimensions.
It is also customary to add one variable for each dimension, to hold the values along that axis. These variables are called “coordinate variables.” The latitude coordinate variable would be a one-dimensional variable (with latitude as its dimension), and it would hold the latitude values at each point along the axis.
The additional bits of metadata would be stored as netCDF attributes.
Attributes are always single values or one-dimensional arrays. (This works out well for a string, which is a one-dimensional array of ASCII characters.)
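The mapping described above, from axes to dimensions, quantities to variables, and metadata to attributes, can be written down as plain Python data. This is only a sketch: the axis lengths, start values, and unit strings are illustrative assumptions, and the coordinate-variable check relies on the netCDF convention that a coordinate variable carries the same name as its dimension, which the text above does not spell out.

```python
# Dimensions: a name and a length for each axis (lengths are illustrative).
dims = {"time": 2, "level": 2, "latitude": 6, "longitude": 12}


def axis(start, step, count):
    """Evenly spaced coordinate values along one axis."""
    return [start + step * i for i in range(count)]


variables = {
    # Coordinate variables: one-dimensional, holding the values along an axis.
    "latitude": {
        "dims": ("latitude",),
        "data": axis(25.0, 5.0, dims["latitude"]),
        "attrs": {"units": "degrees_north"},
    },
    "longitude": {
        "dims": ("longitude",),
        "data": axis(-125.0, 5.0, dims["longitude"]),
        "attrs": {"units": "degrees_east"},
    },
    # Data variables: N-dimensional arrays over a set of dimensions.
    "pressure": {
        "dims": ("time", "level", "latitude", "longitude"),
        "attrs": {"units": "hPa"},
    },
    "temperature": {
        "dims": ("time", "level", "latitude", "longitude"),
        "attrs": {"units": "celsius"},
    },
}


def shape(var_name):
    """Shape of a variable, derived from the lengths of its dimensions."""
    return tuple(dims[d] for d in variables[var_name]["dims"])


def is_coordinate_variable(var_name):
    """True if the variable is 1-D and named after its own dimension."""
    return variables[var_name]["dims"] == (var_name,)


def attr_is_valid(value):
    """Attributes are single values or one-dimensional arrays."""
    if isinstance(value, (int, float, str)):
        return True  # a string counts as a 1-D array of characters
    return isinstance(value, (list, tuple)) and all(
        isinstance(v, (int, float)) for v in value
    )
```

Here `shape("temperature")` evaluates to `(2, 2, 6, 12)`, and `is_coordinate_variable` is true for `latitude` and `longitude` but false for `pressure` and `temperature`.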
The pres_temp_4D_wr.c/pres_temp_4D_rd.c examples show how to write and read a file containing some four-dimensional pressure and temperature data, including all the metadata needed.