Mapping between the CDM and NetCDF-4 Data Models

last modified: June 2014

The CDM data model is close to, but not identical to, the NetCDF-4 data model. When reading netCDF-4 files, one is interested in the mapping from netCDF-4 to CDM. This mapping is relatively stable. As of version 4.3, the CDM can write to netCDF-4 files, and one is interested in the mapping from CDM to netCDF-4. This mapping is still being developed, e.g., to give users some control where needed.

NetCDF-4 intentionally supports a simpler data model than HDF5, which means there are HDF5 files that cannot be converted to netCDF-4. See: https://www.unidata.ucar.edu/software/netcdf/docs/faq.html#fv15


Data Model Differences

DataTypes

From netCDF-4 to CDM

From CDM to netCDF-4

Type Definitions

From netCDF-4 to CDM

From CDM to netCDF-4

Attributes

In CDM, an attribute type may only be a scalar or 1D array of signed byte, short, int, long, float, double, or String. A char type is mapped to a String.
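
The constraint above can be sketched as a small check. This is a minimal illustration in plain Java, not the CDM API: the class name CdmAttributeSketch and its methods are hypothetical, and the boxed and array classes are stand-ins assumed here for the CDM types the text lists.

```java
import java.util.Set;

final class CdmAttributeSketch {
    // Scalar types the text lists as valid for a CDM attribute.
    private static final Set<Class<?>> SCALARS = Set.of(
        Byte.class, Short.class, Integer.class, Long.class,
        Float.class, Double.class, String.class);

    // The 1D array forms of the same types.
    private static final Set<Class<?>> ARRAYS_1D = Set.of(
        byte[].class, short[].class, int[].class, long[].class,
        float[].class, double[].class, String[].class);

    // Maps char data to a String, as the text describes; other values pass through.
    static Object normalize(Object value) {
        if (value instanceof Character) return ((Character) value).toString();
        if (value instanceof char[]) return new String((char[]) value);
        return value;
    }

    // True if the (normalized) value is a scalar or 1D array of an allowed type.
    static boolean isValid(Object value) {
        if (value == null) return false;
        Class<?> c = normalize(value).getClass();
        return SCALARS.contains(c) || ARRAYS_1D.contains(c);
    }

    public static void main(String[] args) {
        System.out.println(isValid(new double[] {1.0, 2.0})); // 1D array of double
        System.out.println(isValid(new int[][] {{1}, {2}}));  // 2D array
        System.out.println(normalize(new char[] {'a', 'b'})); // char data becomes a String
    }
}
```

Anything that fails this check, such as a multidimensional array, would need to be converted before it could be stored as a CDM attribute.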

From netCDF-4 to CDM

From CDM to netCDF-4


Differences between netCDF-4 C and Java libraries for netCDF4 files

Unsigned types

Enum Typedefs

Attributes

Creation Order

Compound field Types


Differences between netCDF-4 C and Java libraries for HDF5 files

Fixed length Strings with anonymous dimension

Anonymous dimensions

Time datatype (HDF type 2)

 


Internal Notes

1) char arrays are interpreted as arrays of UTF-8 bytes (Strings) when they are attributes, but data arrays are not: they are run through unsignedToShort() and cast to char. This seems like trouble.
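
A small sketch of why the two paths can disagree. It uses only the Java standard library; the unsignedToShort() helper below is a local stand-in assumed to behave like the step named in the note (widening a byte to its unsigned value), and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class CharDecodingSketch {
    // Stand-in for the unsignedToShort() step: widen a byte to 0..255.
    static short unsignedToShort(byte b) {
        return (short) (b & 0xff);
    }

    // Attribute path: the whole byte array is decoded as UTF-8.
    static String asAttribute(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Data path: each byte becomes one char, with no multi-byte decoding.
    static String asData(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length);
        for (byte b : raw) {
            sb.append((char) unsignedToShort(b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9.
        byte[] raw = "\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(asAttribute(raw).length()); // one decoded character
        System.out.println(asData(raw).length());      // one char per byte
    }
}
```

For pure ASCII content both paths give the same text; they diverge as soon as a multi-byte UTF-8 sequence appears.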

2) nc4 allows arbitrary composition of vlen. CDM tries to map these to a variable-length dimension, to get a ragged array, rather than making the vlen part of the data type. But Arrays are rectangular, so it's a difficult fit.

Could define ArrayRagged, which maps to C multidim arrays.

It's natural to map

 int data(x,y,*) -> int(*) data(x,y)

but it doesn't generalize well to nested vlens. The nc4 solution is to declare each type separately and chain them:

 int(*) type1;
 type1(*) type2;
 type2 data(x,y);

Array.isVariableLength(): an IOSP might return ArrayInt for int data(*). It needs to return ArrayObject for int data(3,*), with Array.isVariableLength() true.

int(*)     returns ArrayInt
int(3,*)   returns ArrayObject(3) with ArrayInt(*) inside
int(*,3)   returns Array(n,3), whatever n happens to be.
int(3,*,*) returns ArrayObject(3) with ArrayObject(*) inside with ArrayInt(*) inside.
int(*,3,*) returns ArrayObject(n) with ArrayObject(3) inside with ArrayInt(*) inside.
int(*,*,3) returns ArrayObject(n) with ArrayInt(*,3) inside, OR ArrayObject(n) with ArrayObject(*) with ArrayInt(3) inside.

struct {
  int i1;
  float vf(*);
} s(3);

is like float(3,*) -> ArrayObject(3) with ArrayFloat(*), inside the ArrayStructure.
This is getting out of control.
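
To make the int(3,*) and int(*,3) cases from note 2 concrete, here is a sketch using plain Java arrays. The arrays are stand-ins assumed for illustration: int[][] with uneven rows plays the role of ArrayObject(3) holding ArrayInt(*), and none of this is the CDM Array API. The class and method names are hypothetical.

```java
final class RaggedArraySketch {
    // int data(3,*): fixed outer dimension of 3, each element a
    // variable-length int[] (a ragged array).
    static int[][] fixedOuter() {
        return new int[][] { {1, 2, 3}, {}, {4} };
    }

    // int data(*,3): only the outermost dimension varies, so once n is
    // known the result is an ordinary rectangular n x 3 array.
    static int[][] variableOuter(int n) {
        return new int[n][3];
    }

    // True if every row has the same length as the first row.
    static boolean isRectangular(int[][] a) {
        for (int[] row : a) {
            if (row.length != a[0].length) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isRectangular(fixedOuter()));     // rows differ in length
        System.out.println(isRectangular(variableOuter(5))); // all rows have length 3
    }
}
```

The sketch shows why only a vlen in the outermost position fits a rectangular Array: in every other position the inner lengths can differ from element to element.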

3) Attributes: in nc4 they can be user-defined types; in CDM, a 1-dim array of primitives or Strings.

netcdf tst_enums {
  types:
    ubyte enum Bradys {Mike = 8, Carol = 7, Greg = 6, Marsha = 5, Peter = 4, Jan = 3, Bobby = 2, Whats-her-face = 1, Alice = 0} ;

// global attributes:
  Bradys :brady_attribute = Alice, Peter, Mike ;
}

netcdf R:/testdata/netcdf4/nc4/tst_enums.nc {
 types:
  enum Bradys { 'Alice' = 0, 'Whats-her-face' = 1, 'Bobby' = 2, 'Jan' = 3, 'Peter' = 4, 'Marsha' = 5, 'Greg' = 6, 'Carol' = 7, 'Mike' = 8};

 :brady_attribute = "Alice", "Peter", "Mike";
}
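
In the two dumps above, the first shows brady_attribute with the enum type Bradys, while the second shows the same attribute as Strings. A minimal sketch of that conversion follows, assuming the attribute holds the numeric enum values 0, 4 and 8 and each is replaced by its member name. The class and method names are hypothetical and this is not the CDM implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class EnumAttributeSketch {
    // Value-to-name table copied from the Bradys enum typedef in the dumps.
    static final Map<Integer, String> BRADYS = Map.of(
        0, "Alice", 1, "Whats-her-face", 2, "Bobby", 3, "Jan", 4, "Peter",
        5, "Marsha", 6, "Greg", 7, "Carol", 8, "Mike");

    // Replaces each numeric enum value with its member name.
    static List<String> toNames(int[] values) {
        List<String> names = new ArrayList<>();
        for (int v : values) {
            String name = BRADYS.get(v);
            if (name == null) {
                throw new IllegalArgumentException("no enum member for value " + v);
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(toNames(new int[] {0, 4, 8}));
    }
}
```

The result is a 1-dim String attribute, which fits the CDM attribute model, at the cost of no longer recording that the values came from an enum type.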