Data in a netCDF file may be one of the external types described below, or may be a user-defined data type (see User Defined Types).
# External Data Types {#external_types}
The atomic external types supported by the netCDF interface are:

- char: 8-bit characters intended for representing text
- byte: 8-bit signed integers
- ubyte: 8-bit unsigned integers
- short: 16-bit signed integers
- ushort: 16-bit unsigned integers
- int: 32-bit signed integers
- uint: 32-bit unsigned integers
- int64: 64-bit signed integers
- uint64: 64-bit unsigned integers
- float (or real): 32-bit IEEE floating-point
- double: 64-bit IEEE floating-point
- string: variable-length strings of characters

The ubyte, ushort, uint, int64, and uint64 types require the CDF-5 or netCDF-4 formats; string requires netCDF-4.
These types were chosen to provide a reasonably wide range of trade-offs between data precision and number of bits required for each value. These external data types are independent from whatever internal data types are supported by a particular machine and language combination.
These types are called "external", because they correspond to the portable external representation for netCDF data. When a program reads external netCDF data into an internal variable, the data is converted, if necessary, into the specified internal type. Similarly, if you write internal data into a netCDF variable, this may cause it to be converted to a different external type, if the external type for the netCDF variable differs from the internal type.
The separation of external and internal types and automatic type conversion have several advantages. You need not be aware of the external type of numeric variables, since automatic conversion to or from any desired numeric type is available. You can use this feature to simplify code, by making it independent of external types, using a sufficiently wide internal type, e.g., double precision, for numeric netCDF data of several different external types. Programs need not be changed to accommodate a change to the external type of a variable.
If conversion to or from an external numeric type is necessary, it is handled by the library.
Converting from one numeric type to another may result in an error if the target type is not capable of representing the converted value. For example, an internal short integer type may not be able to hold data stored externally as an integer. When accessing an array of values, a range error is returned if one or more values are out of the range of representable values, but other values are converted properly.
Note that mere loss of precision in type conversion does not return an error. Thus, if you read double precision values into a single-precision floating-point variable, for example, no error results unless the magnitude of the double precision value exceeds the representable range of single-precision floating point numbers on your platform. Similarly, if you read a large integer into a float incapable of representing all the bits of the integer in its mantissa, this loss of precision will not result in an error. If you want to avoid such precision loss, check the external types of the variables you access to make sure you use an internal type that has adequate precision.
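For example, the following sketch (the file name, variable name, and array length are illustrative) reads a numeric variable of any external type into a double array; the library performs any needed conversion during the call:

```c
#include <netcdf.h>

/* Read a numeric variable of any external type into a double array;
 * the library converts each value on read.  The file name, variable
 * name, and array length are illustrative. */
int read_as_double(const char *path)
{
    int ncid, varid, status;
    double data[100];                       /* assumes 100 values */

    if ((status = nc_open(path, NC_NOWRITE, &ncid)))
        return status;
    if ((status = nc_inq_varid(ncid, "temperature", &varid)))
        return status;

    /* Conversion from the external type (short, int, float, ...) to
     * double happens inside this call; a narrower internal type could
     * instead return NC_ERANGE for out-of-range values. */
    status = nc_get_var_double(ncid, varid, data);

    nc_close(ncid);
    return status;
}
```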
The names for the primitive external data types (char, byte, ubyte, short, ushort, int, uint, int64, uint64, float or real, double, string) are reserved words in CDL, so the names of variables, dimensions, and attributes must not be type names.
It is possible to interpret byte data as either signed (-128 to 127) or unsigned (0 to 255). However, when reading byte data to be converted into other numeric types, it is interpreted as signed.
For the correspondence between netCDF external data types and the data types of a language see Variables.
# Data Structures in Classic Files

The only kind of data structure directly supported by the netCDF classic abstraction, i.e. CDF-1, 2, and 5 formats, is a collection of named arrays with attached vector attributes. NetCDF is not particularly well-suited for storing linked lists, trees, sparse matrices, ragged arrays or other kinds of data structures requiring pointers.
It is possible to build other kinds of data structures in netCDF classic formats, from sets of arrays by adopting various conventions regarding the use of data in one array as pointers into another array. The netCDF library won't provide much help or hindrance with constructing such data structures, but netCDF provides the mechanisms with which such conventions can be designed.
The following netCDF classic example stores a ragged array ragged_mat using an attribute row_index to name an associated index variable giving the index of the start of each row. In this example, the first row contains 12 elements, the second row contains 7 elements (19 - 12), and so on. (NetCDF-4 includes native support for variable length arrays. See below.)
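A minimal sketch of this convention using the C interface (the file name, dimension names, and sizes are assumptions made for illustration; only ragged_mat, row_index, and the start offsets come from the example above):

```c
#include <string.h>
#include <netcdf.h>

/* Ragged-array convention: the rows of ragged_mat are packed end to
 * end in a 1-D variable, and the row_index attribute names an index
 * variable ("row_start" here) holding the start of each row. */
int define_ragged(void)
{
    int ncid, elem_dim, row_dim, ragged_varid, index_varid;
    /* Row starts: row 0 has 12 elements, row 1 has 7 (19 - 12), ... */
    static const int row_start[3] = {0, 12, 19};

    nc_create("ragged.nc", NC_CLOBBER, &ncid);
    nc_def_dim(ncid, "elements", 25, &elem_dim);   /* total packed length */
    nc_def_dim(ncid, "rows", 3, &row_dim);

    nc_def_var(ncid, "ragged_mat", NC_FLOAT, 1, &elem_dim, &ragged_varid);
    nc_def_var(ncid, "row_start", NC_INT, 1, &row_dim, &index_varid);

    /* By convention, row_index names the index variable. */
    nc_put_att_text(ncid, ragged_varid, "row_index",
                    strlen("row_start"), "row_start");

    nc_enddef(ncid);
    nc_put_var_int(ncid, index_varid, row_start);
    /* ... write the packed values of ragged_mat here ... */
    return nc_close(ncid);
}
```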
As another example, netCDF variables may be grouped within a netCDF classic dataset by defining attributes that list the names of the variables in each group, separated by a conventional delimiter such as a space or comma. Using a naming convention for attribute names for such groupings permits any number of named groups of variables. A particular conventional attribute for each variable might list the names of the groups of which it is a member. Use of attributes, or variables that refer to other attributes or variables, provides a flexible mechanism for representing some kinds of complex structures in netCDF datasets.
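For instance, a global attribute could list the members of one such group; the attribute and variable names below are invented for illustration:

```c
#include <string.h>
#include <netcdf.h>

/* Hypothetical convention: a global attribute lists, comma-separated,
 * the variables belonging to a named group. */
static int tag_surface_group(int ncid)
{
    const char *members = "sst,slp,wind_speed";   /* illustrative names */
    return nc_put_att_text(ncid, NC_GLOBAL, "group_surface",
                           strlen(members), members);
}
```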
# User Defined Types

NetCDF supported six data types through version 3.6.0 (char, byte, short, int, float, and double). Starting with version 4.0, many new data types are supported (unsigned int types, strings, compound types, variable length arrays, enums, opaque).
In addition to the new atomic types, the user may define types. User-defined types have the following characteristics:

- Types are defined in define mode, and must be fully defined before they are used. New types may be added to a file by re-entering define mode.
- Once defined, a type may be used to create a variable or attribute.
- Types may be nested in complex ways. For example, a compound type may contain an array of VLEN types, each containing variable length arrays of some other compound type, and so on. Users are cautioned to keep types simple; reading data of complex types can be challenging for Fortran users.
- Types may be defined in any group in the data file, but they are always available globally in the file.
- Types cannot have attributes (but variables of the type may have attributes).
- Only files created with the netCDF-4/HDF5 mode flag (NC_NETCDF4) but without the classic model flag (NC_CLASSIC_MODEL) may use user-defined types or the new atomic data types.
Once types are defined, use their IDs like any other type ID when defining variables or attributes. Use the following families of functions to access attribute and variable data of user-defined type:

- nc_put_att() / nc_get_att()
- nc_put_var() / nc_get_var()
- nc_put_var1() / nc_get_var1()
- nc_put_vara() / nc_get_vara()
- nc_put_vars() / nc_get_vars()
## Compound Types

Compound types allow the user to combine atomic and user-defined types into C-like structs. Since user-defined types may be used within a compound type, compound types may contain nested compound types.
Users define a compound type, and (in their C code) a corresponding C struct. They can then use nc_put_vara() and related functions to write multi-dimensional arrays of these structs, and nc_get_vara() calls to read them.
While structs, in general, are not portable from platform to platform, the HDF5 layer (when installed) performs the magic required to figure out your platform's idiosyncrasies and adjust to them. The end result is that HDF5 compound types (and therefore netCDF-4 compound types) are portable.
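A sketch of this workflow (the file, type, field, dimension, and variable names are all illustrative):

```c
#include <stddef.h>   /* offsetof */
#include <netcdf.h>

/* An observation record mirrored by a netCDF compound type; the
 * struct layout and all names are illustrative. */
struct obs {
    int   station_id;
    float temperature;
};

int write_compound(void)
{
    int ncid, dimid, varid;
    nc_type obs_typeid;
    struct obs data[2] = { {42, 21.5f}, {43, 19.0f} };
    size_t start[1] = {0}, count[1] = {2};

    /* User-defined types require the netCDF-4 format. */
    nc_create("obs.nc", NC_NETCDF4, &ncid);

    /* Define the compound type and its fields to match the C struct. */
    nc_def_compound(ncid, sizeof(struct obs), "obs_t", &obs_typeid);
    nc_insert_compound(ncid, obs_typeid, "station_id",
                       offsetof(struct obs, station_id), NC_INT);
    nc_insert_compound(ncid, obs_typeid, "temperature",
                       offsetof(struct obs, temperature), NC_FLOAT);

    /* Use the new type ID like any other type when defining a variable. */
    nc_def_dim(ncid, "obs", 2, &dimid);
    nc_def_var(ncid, "observations", obs_typeid, 1, &dimid, &varid);

    /* Write the array of structs directly. */
    nc_put_vara(ncid, varid, start, count, data);
    return nc_close(ncid);
}
```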
For more information on creating and using compound types, see Compound Types in The NetCDF C Interface Guide.
## VLEN Types

Variable length arrays can be used to create a ragged array of data, in which one of the dimensions varies in size from point to point.
An example of VLEN use would be to store a 1-D array of dropsonde data, in which the data at each drop point is of variable length.
There is no special restriction on the dimensionality of VLEN variables. It's possible to have 2D, 3D, 4D, etc. data, in which each point contains a VLEN.
A VLEN has a base type (that is, the type that it is a VLEN of). This may be one of the atomic types (forming, for example, a variable length array of NC_INT), or it can be another user defined type, like a compound type.
With VLEN data, special memory allocation and deallocation procedures must be followed, or memory leaks may occur.
Compression is permitted but may not be effective for VLEN data, because the compression is applied to structures containing lengths and pointers to the data, rather than the actual data.
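The following sketch (file, type, dimension, and variable names are illustrative) defines a VLEN of NC_INT, writes two rows of different lengths, and frees the buffers that the library allocates on read:

```c
#include <netcdf.h>

/* Write a 1-D variable of VLEN-of-int: one variable-length row per point. */
int write_vlen(void)
{
    int ncid, dimid, varid;
    nc_type vlen_typeid;
    int row0[] = {1, 2, 3}, row1[] = {4, 5};
    nc_vlen_t data[2];

    data[0].len = 3; data[0].p = row0;    /* each element is a length */
    data[1].len = 2; data[1].p = row1;    /* and a pointer to the data */

    nc_create("vlen.nc", NC_NETCDF4, &ncid);
    nc_def_vlen(ncid, "row_of_ints", NC_INT, &vlen_typeid);
    nc_def_dim(ncid, "drops", 2, &dimid);
    nc_def_var(ncid, "dropsonde", vlen_typeid, 1, &dimid, &varid);
    nc_put_var(ncid, varid, data);
    return nc_close(ncid);
}

/* On read, the library allocates the per-element buffers; release them
 * with nc_free_vlens() when done, or memory will leak. */
int read_vlen(void)
{
    int ncid, varid;
    nc_vlen_t data[2];

    nc_open("vlen.nc", NC_NOWRITE, &ncid);
    nc_inq_varid(ncid, "dropsonde", &varid);
    nc_get_var(ncid, varid, data);
    /* ... use data[i].len and data[i].p ... */
    nc_free_vlens(2, data);
    return nc_close(ncid);
}
```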
For more information on creating and using variable length arrays, see Variable Length Arrays in The NetCDF C Interface Guide.
## Opaque Types

Opaque types allow the user to store arrays of data blobs of a fixed size.
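A minimal sketch (the 16-byte size and all names are illustrative):

```c
#include <netcdf.h>

/* Store an array of fixed-size 16-byte blobs using an opaque type. */
int write_opaque(void)
{
    int ncid, dimid, varid;
    nc_type blob_typeid;
    unsigned char blobs[2][16] = {{0}};   /* two 16-byte blobs */

    nc_create("opaque.nc", NC_NETCDF4, &ncid);
    nc_def_opaque(ncid, 16, "blob16_t", &blob_typeid);   /* each value is 16 bytes */
    nc_def_dim(ncid, "n", 2, &dimid);
    nc_def_var(ncid, "blobs", blob_typeid, 1, &dimid, &varid);
    nc_put_var(ncid, varid, blobs);
    return nc_close(ncid);
}
```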
For more information on creating and using opaque types, see Opaque Type in The NetCDF C Interface Guide.
## Enum Types

Enum types allow the user to specify an enumeration.
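A minimal sketch (the type name, member names, and values are illustrative):

```c
#include <netcdf.h>

/* Define an enum type over NC_BYTE and use it as a variable's type. */
int write_enum(void)
{
    int ncid, dimid, varid;
    nc_type cloud_typeid;
    const signed char clear = 0, cumulus = 1, stratus = 2;
    signed char data[3] = {0, 2, 1};      /* values of the enum's base type */

    nc_create("enum.nc", NC_NETCDF4, &ncid);
    nc_def_enum(ncid, NC_BYTE, "cloud_t", &cloud_typeid);
    nc_insert_enum(ncid, cloud_typeid, "Clear",   &clear);
    nc_insert_enum(ncid, cloud_typeid, "Cumulus", &cumulus);
    nc_insert_enum(ncid, cloud_typeid, "Stratus", &stratus);

    nc_def_dim(ncid, "obs", 3, &dimid);
    nc_def_var(ncid, "cloud_type", cloud_typeid, 1, &dimid, &varid);
    nc_put_var(ncid, varid, data);
    return nc_close(ncid);
}
```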
For more information on creating and using enum types, see Enum Type in The NetCDF C Interface Guide.
# Type Conversion

Each netCDF variable has an external type, specified when the variable is first defined. This external type determines whether the data is intended for text or numeric values, and if numeric, the range and precision of numeric values.
If the netCDF external type for a variable is char, only character data representing text strings can be written to or read from the variable. No automatic conversion of text data to a different representation is supported.
If the type is numeric, however, the netCDF library allows you to access the variable data as a different type and provides automatic conversion between the numeric data in memory and the data in the netCDF variable. For example, if you write a program that deals with all numeric data as double-precision floating point values, you can read netCDF data into double-precision arrays without knowing or caring what the external types of the netCDF variables are. On reading netCDF data, integers of various sizes and single-precision floating-point values will all be converted to double-precision, if you use the data access interface for double-precision values. Of course, you can avoid automatic numeric conversion by using the netCDF interface for a value type that corresponds to the external data type of each netCDF variable, where such value types exist.
The automatic numeric conversions performed by netCDF are easy to understand, because they behave just like assignment of data of one type to a variable of a different type. For example, if you read floating-point netCDF data as integers, the result is truncated towards zero, just as it would be if you assigned a floating-point value to an integer variable. Such truncation is an example of the loss of precision that can occur in numeric conversions.
Converting from one numeric type to another may result in an error if the target type is not capable of representing the converted value. For example, an integer may not be able to hold data stored externally as an IEEE floating-point number. When accessing an array of values, a range error is returned if one or more values are out of the range of representable values, but other values are converted properly.
Note that mere loss of precision in type conversion does not result in an error. For example, if you read double precision values into an integer, no error results unless the magnitude of the double precision value exceeds the representable range of integers on your platform. Similarly, if you read a large integer into a float incapable of representing all the bits of the integer in its mantissa, this loss of precision will not result in an error. If you want to avoid such precision loss, check the external types of the variables you access to make sure you use an internal type that has a compatible precision.
Whether a range error occurs in writing a large floating-point value near the boundary of representable values may depend on the platform. The largest floating-point value you can write to a netCDF float variable is the largest floating-point number representable on your system that is less than 2 to the 128th power. The largest double precision value you can write to a double variable is the largest double-precision number representable on your system that is less than 2 to the 1024th power.
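A sketch of checking for a range error when writing in-memory doubles to a variable whose external type is float (the function and parameter names are illustrative; it assumes ncid and varid refer to an open file and a 1-D NC_FLOAT variable):

```c
#include <stdio.h>
#include <netcdf.h>

/* Write doubles to an NC_FLOAT variable, reporting a range error if
 * any value cannot be represented as a float. */
int write_doubles_to_float_var(int ncid, int varid,
                               const double *vals, size_t n)
{
    size_t start = 0;
    int status = nc_put_vara_double(ncid, varid, &start, &n, vals);

    if (status == NC_ERANGE)
        fprintf(stderr, "some values exceeded the range of float; "
                        "in-range values were still converted\n");
    return status;
}
```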
The _uchar and _schar functions were introduced in netCDF-3 to eliminate an ambiguity, and support both signed and unsigned byte data. In netCDF-2, whether the external NC_BYTE type represented signed or unsigned values was left up to the user. In netCDF-3, we treat NC_BYTE as signed for the purposes of conversion to short, int, long, float, or double. (Of course, no conversion takes place when the internal type is signed char.) In the _uchar functions, we treat NC_BYTE as if it were unsigned. Thus, no NC_ERANGE error can occur converting between NC_BYTE and unsigned char.
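For example (a sketch; the buffer length is arbitrary, and ncid and varid are assumed to refer to an open file and an NC_BYTE variable), the same byte data can be read either way:

```c
#include <netcdf.h>

/* Read the same NC_BYTE variable as signed or as unsigned bytes. */
void read_byte_both_ways(int ncid, int varid)
{
    signed char   s[10];    /* NC_BYTE treated as signed: -128 .. 127 */
    unsigned char u[10];    /* NC_BYTE treated as unsigned: 0 .. 255  */

    nc_get_var_schar(ncid, varid, s);
    nc_get_var_uchar(ncid, varid, u);   /* no NC_ERANGE possible for NC_BYTE */
}
```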