Writing an IOSP - Details | netCDF-Java Documentation

Registering a new IOSP

You must register your IOSP at runtime before it can open any files. See runtime_loading to learn how to register your IOSP.

Note: When registering, an instance of the class will be created. This means that there must be a default constructor that has no arguments. Since there is no way to call any other constructor, the simplest thing is to not define a constructor for your class, and the compiler will add a default constructor for you. If you do define a constructor with arguments, you must also explicitly add a no-argument constructor.

If you contribute your IOSP and it becomes part of the CDM release, it will automatically be registered in the NetcdfFile class.

Troubleshooting

When registering an IOSP, the following exceptions may occur:

InstantiationException: occurs when no default constructor exists
ClassNotFoundException: occurs when the IOSP class name passed in to the register function as a String cannot be found by the NetcdfFile ClassLoader (this almost always means that your classpath is wrong)
IllegalAccessException: is thrown if you do not have the rights to access the IOSP class
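
The constructor rule and the first two exceptions above can be reproduced with plain Java reflection. The following is a minimal sketch, assuming that registration by class name loads and instantiates the class reflectively; the library's internals are not shown in this document. The names GoodIosp, BadIosp, and tryInstantiate are hypothetical. The sketch uses the deprecated Class.newInstance only because that call reports a missing no-argument constructor as an InstantiationException, matching the list above.

```java
// Hypothetical stand-ins for IOSP classes; they are not part of netCDF-Java.
class GoodIosp {
  // No constructor declared: the compiler supplies a no-argument one.
}

class BadIosp {
  private final String name;

  // Declaring only this constructor suppresses the implicit no-argument one.
  BadIosp(String name) {
    this.name = name;
  }
}

class RegistrationSketch {
  // Loads a class by name and instantiates it; returns a short status string.
  @SuppressWarnings("deprecation")
  static String tryInstantiate(String className) {
    try {
      Class<?> clazz = Class.forName(className);
      clazz.newInstance(); // requires an accessible no-argument constructor
      return "ok";
    } catch (ClassNotFoundException e) {
      return "ClassNotFoundException"; // name not found on the classpath
    } catch (InstantiationException e) {
      return "InstantiationException"; // no no-argument constructor
    } catch (IllegalAccessException e) {
      return "IllegalAccessException"; // class or constructor not accessible
    }
  }

  public static void main(String[] args) {
    System.out.println(tryInstantiate(GoodIosp.class.getName()));
    System.out.println(tryInstantiate(BadIosp.class.getName()));
    System.out.println(tryInstantiate("no.such.package.MissingIosp"));
  }
}
```

Running a check like this against your own class name before registering can separate classpath problems from constructor problems.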

IOSP lifecycle and thread safety

An IOSP is registered by passing in the IOSP class. An object of that class is immediately instantiated and stored. This object is used when NetcdfFile queries all the IOSPs by calling isValidFile. This makes the querying as fast as possible. Since there is only one such IOSP object for the library, the isValidFile method must be thread-safe. To make it thread-safe, isValidFile must modify only local (stack) variables, not instance or class variables.

When an IOSP claims a file, a new IOSP object is created and open is called on it. Therefore each dataset that is opened has its own IOSP instance assigned to it, and so the other methods of the IOSP do not need to be thread-safe. The NetcdfFile keeps a reference to this IOSP object. When the client releases all references to the NetcdfFile, the IOSP will be garbage collected.
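
The lifecycle described above can be sketched in plain Java: one shared prototype per registered provider answers isValidFile, and a fresh instance is created for each file that is claimed. This is only an illustration under stated assumptions. SimpleProvider, MagicProvider, and ProviderRegistry are hypothetical names, the file is reduced to a byte array, and a Supplier stands in for the class-based registration that the library uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, much-simplified provider interface (not the real IOServiceProvider).
interface SimpleProvider {
  boolean isValidFile(byte[] header); // called on the shared prototype

  void open(byte[] header); // called on a per-file instance
}

class MagicProvider implements SimpleProvider {
  private byte[] contents; // per-file state, written only by open()

  @Override
  public boolean isValidFile(byte[] header) {
    // Touches only the parameter, so concurrent calls cannot interfere.
    return header.length >= 2 && header[0] == 'M' && header[1] == 'G';
  }

  @Override
  public void open(byte[] header) {
    this.contents = header.clone();
  }
}

class ProviderRegistry {
  private static final class Entry {
    final SimpleProvider prototype;
    final Supplier<? extends SimpleProvider> factory;

    Entry(SimpleProvider prototype, Supplier<? extends SimpleProvider> factory) {
      this.prototype = prototype;
      this.factory = factory;
    }
  }

  private final List<Entry> entries = new ArrayList<>();

  // Instantiates one prototype immediately, as registration does.
  void register(Supplier<? extends SimpleProvider> factory) {
    entries.add(new Entry(factory.get(), factory));
  }

  // Returns a new provider dedicated to this file, or null if nobody claims it.
  SimpleProvider open(byte[] header) {
    for (Entry e : entries) {
      if (e.prototype.isValidFile(header)) {
        SimpleProvider perFile = e.factory.get();
        perFile.open(header);
        return perFile;
      }
    }
    return null;
  }
}
```

Because every opened file receives its own MagicProvider, only isValidFile runs on an object shared between threads.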

Important IOSP methods

The `isValidFile` method

 // Check if this is a valid file for this IOServiceProvider. 
 public boolean isValidFile( ucar.unidata.io.RandomAccessFile raf) throws IOException;

The isValidFile method must quickly and accurately determine if the file is one that the IOSP knows how to read. If this is done incorrectly, it will interfere with reading other file types. As described in the previous section, it must be thread-safe. It must also not assume what state the RandomAccessFile is in. If the file is not yours, return false as quickly as possible. An IOException must be thrown only if the file is corrupted. Since it's unlikely that you can tell whether a file of an arbitrary type is corrupt, you should probably catch IOExceptions and return false instead.

Example 1:

public boolean isValidFile(RandomAccessFile raf) throws IOException {
  // 1) Start reading at the first byte of the file
  raf.seek(0);
  // 2) Read 8 bytes and convert to String
  byte[] b = new byte[8];
  raf.read(b);
  String test = new String(b);
  // 3) Compare to known patterns
  return test.equals(pattern1) || test.equals(pattern2);
}

Note: The file is assumed bad if the IOSP cannot read the first 8 bytes of the file. It is hard to imagine a valid file of less than 8 bytes. Still, be careful of your assumptions.
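
One way to avoid depending on that assumption is to check the file length and read the header with a call that either fills the buffer or fails. The sketch below uses java.io.RandomAccessFile from the standard library rather than ucar.unidata.io.RandomAccessFile, so that it is self-contained. The signature "MYFORMAT" and the helper check are hypothetical. It compares raw bytes instead of building a String, which avoids any dependence on the platform character set.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class MagicCheckSketch {
  // Hypothetical 8-byte signature, used only for this illustration.
  static final byte[] MAGIC = "MYFORMAT".getBytes(StandardCharsets.US_ASCII);

  // Returns true only if the file starts with MAGIC; short or unreadable files yield false.
  static boolean isValidFile(RandomAccessFile raf) {
    try {
      if (raf.length() < MAGIC.length) {
        return false; // too short to hold the signature
      }
      raf.seek(0); // do not assume the current file position
      byte[] header = new byte[MAGIC.length];
      raf.readFully(header); // fills the whole buffer or throws
      return Arrays.equals(header, MAGIC);
    } catch (IOException e) {
      return false; // not being able to read the header means "not ours"
    }
  }

  // Demo helper: writes the text to a temporary file and runs the check on it.
  static boolean check(String contents) {
    try {
      Path tmp = Files.createTempFile("magic", ".bin");
      try {
        Files.write(tmp, contents.getBytes(StandardCharsets.US_ASCII));
        try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r")) {
          return isValidFile(raf);
        }
      } finally {
        Files.deleteIfExists(tmp);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(check("MYFORMAT and more data")); // true
    System.out.println(check("MYF")); // false: shorter than the signature
    System.out.println(check("NOTMINE!")); // false: wrong signature
  }
}
```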

Example 2:

public boolean isValidFile(RandomAccessFile raf) {
  try {
    // 1) The IOSP will read in numbers that it expects to be in big-endian format.
    // It must not assume what mode the RandomAccessFile is in.
    raf.order(RandomAccessFile.BIG_ENDIAN);
    raf.seek(0);
    // 2) It creates a BufrInput object and delegates the work to it.
    // Since this is a local instance, this is thread-safe.
    BufrInput bi = new BufrInput(raf);
    return bi.isValidFile();
    // 3) Catch the IOExceptions
  } catch (IOException ex) {
    return false;
  }
}

Since the instantiated BufrInput is a local instance, this is a thread-safe example. Creating new objects should be avoided when possible for speed, but sometimes it’s necessary. Note: In this example, the IOSP catches IOExceptions and returns false; it would arguably be better for BufrInput to return null, following the rule that Exceptions should only be used in exceptional circumstances. Getting passed a file that is not yours is not exceptional.

Example 3 (BAD!):

private Grib1Input scanner;
private int edition;

public boolean isValidFile(RandomAccessFile raf) throws IOException {
  raf.seek(0);
  raf.order(RandomAccessFile.BIG_ENDIAN);
  scanner = new Grib1Input(raf);
  edition = scanner.getEdition();
  return (edition == 1);
}

In this example, isValidFile violates the thread-safe requirement since the Grib1Input and edition variables are instance variables.

The mistake might be because you want to use a scanner object and edition in the rest of the methods. Here’s the right way to do this:

private Grib1Input scanner;
private int edition;

public boolean isValidFile(RandomAccessFile raf) throws IOException {
  raf.seek(0);
  raf.order(RandomAccessFile.BIG_ENDIAN);
  Grib1Input scanner = new Grib1Input(raf);
  int edition = scanner.getEdition();
  return (edition == 1);
}

public void build(RandomAccessFile raf, Group.Builder rootGroup, CancelTask cancelTask)
    throws IOException {
  raf.seek(0);
  raf.order(RandomAccessFile.BIG_ENDIAN);
  scanner = new Grib1Input(raf);
  edition = scanner.getEdition();
  // ...
}

The isValidFile method creates local variables for everything it has to do. The build method has to repeat that, but it is allowed to store instance variables that can be used in the rest of the methods, for the duration of the IOSP object.

The `build` method

  // Open existing file, and populate it. Note that you cannot reference the NetcdfFile within this routine.
  public void build(RandomAccessFile raf, Group.Builder rootGroup, CancelTask cancelTask) throws IOException;

Once an IOSP returns true on isValidFile, a new IOSP object is created and build is called on it. The job of build is to examine the contents of the file and create CDM objects that expose all of the interesting information in the file, using the Group.Builder API. Sticking with the simple Netcdf-3 data model for now, this means populating the Group.Builder object with Dimension, Attribute, and Variable objects.

`Attribute`

An Attribute is a (name, value) pair, where name is a String, and value is a 1D array of Strings or numbers. Attributes are thought of as metadata about your data. All Attributes are read and kept in memory, so you should not put large data arrays in Attributes. You can add global Attributes that apply to the entire file:

rootGroup.addAttribute(new Attribute("Conventions", "CF-1.0"));
rootGroup.addAttribute(new Attribute("version", 42));

Or you can add Attributes that are contained inside a Variable and apply only to that Variable, using the Variable.Builder API:

Variable.Builder var = Variable.builder().setName("variable");
var.addAttribute(Attribute.builder("missing_value").setDataType(DataType.DOUBLE)
    .setValues(Array.factory(DataType.DOUBLE, new int[] {2}, new double[] {999.0, -999.0}))
    .build());

`Dimension`

A Dimension describes the index space for the multidimensional arrays of data stored in Variables. A Dimension has a String name and an integer length. In the Netcdf-3 data model, Dimensions are shared between Variables and stored globally.

rootGroup.addDimension(Dimension.builder("lat", 180).build());
rootGroup.addDimension(Dimension.builder("lon", 360).build());

`Variable`

The actual data is contained in Variables, which are containers for multidimensional arrays of data. In the Netcdf-3 data model, Variables can have type DataType.BYTE, DataType.CHAR, DataType.SHORT, DataType.INT, DataType.FLOAT, or DataType.DOUBLE.

If a Variable is unsigned (byte, short, or integer data types), you must add the _Unsigned attribute:

var.addAttribute(new Attribute("_Unsigned", "true"));

Here is an example creating a Variable of type short called “elevation”, adding several attributes to it, and adding it to the Group.Builder. The Dimensions lat and lon must already have been added. When setting Dimensions, the slowest-varying Dimension goes first (C/Java order).

rootGroup.addVariable(Variable.builder().setParentGroupBuilder(rootGroup).setName("elevation")
    .setDataType(DataType.SHORT).setDimensionsByName("lat lon")
    .addAttribute(new Attribute("units", "m"))
    .addAttribute(
        new Attribute("long_name", "digital elevation in meters above mean sea level"))
    .addAttribute(new Attribute("missing_value", (short) -9999)));

A special kind of Variable is a Coordinate Variable, which is used to name the coordinate values of a Dimension. A Coordinate Variable has the same name as its single dimension. For example:

Variable.Builder lat = Variable.builder().setParentGroupBuilder(rootGroup).setName("lat")
    .setDataType(DataType.FLOAT).setDimensionsByName("lat")
    .addAttribute(new Attribute("units", "degrees_north"));
rootGroup.addVariable(lat);

It is often convenient for IOSPs to set the data values of coordinate (or other) Variables.

Array data = Array.makeArray(DataType.FLOAT, 180, 90.0, -1.0);
lat.setCachedData(data, false);

Here, Array.makeArray is a convenience method that generates an evenly spaced array of length 180, starting at 90.0 and incrementing by -1.0. That array is then cached in the Variable, and used whenever a client asks for data from the Variable. If a Variable has cached data, then readData will never be called for it.
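
The arithmetic behind an evenly spaced array is simple enough to spell out. The following plain-Java sketch is not the library's implementation; the helper evenlySpaced is hypothetical and only shows which values a call with length 180, start 90.0, and increment -1.0 describes.

```java
class EvenSpacingSketch {
  // Builds n values: start, start + incr, start + 2 * incr, ...
  static float[] evenlySpaced(int n, double start, double incr) {
    float[] values = new float[n];
    for (int i = 0; i < n; i++) {
      values[i] = (float) (start + i * incr);
    }
    return values;
  }

  public static void main(String[] args) {
    float[] lat = evenlySpaced(180, 90.0, -1.0);
    System.out.println(lat[0]); // 90.0
    System.out.println(lat[lat.length - 1]); // -89.0
  }
}
```

Note that 180 values starting at 90.0 end at -89.0, since the last value is start + (n - 1) * incr. The number of generated values has to match the length of the Dimension that the Variable uses.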

The `readData` method

  // Read data from a top level Variable and return a memory resident Array.
  public ucar.ma2.Array readData(Variable v2, Section section) throws IOException, InvalidRangeException;

When a client asks to read data from a Variable, the data is taken from the Variable’s data cache, if it exists, or the readData method of the IOSP is called. The client may ask for all of the data, or it may ask for a hyperslab of data described by the Section parameter. The Section contains a java.util.List of ucar.ma2.Range objects, one for each Dimension in the Variable, in order of the Variable’s Dimensions.

Here is an example that assumes the data starts at the beginning of the file, is in big-endian format, and is stored as a regular array of 16-bit integers on disk:

Example 1: Reading the entire `Array`

raf.seek(0);
raf.order(RandomAccessFile.BIG_ENDIAN);
int size = (int) v2.getSize();
short[] arr = new short[size];

int count = 0;
while (count < size)
  arr[count++] = raf.readShort(); // copy into primitive array

Array data = Array.factory(DataType.SHORT, v2.getShape(), arr);
return data.section(wantSection.getRanges());

The RandomAccessFile reads 16-bit integers, advancing automatically. The Array.section method creates a logical section of the data array, returning just the section requested.
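
To make the idea of a section concrete, here is a plain-Java sketch that extracts a rectangular block from a two-dimensional array stored in row-major order (last index varying fastest). It is an illustration only: the helper section is hypothetical, it is limited to two dimensions with stride 1, and it copies the values, whereas the text above describes Array.section as producing a logical section.

```java
import java.util.Arrays;

class SectionSketch {
  // Copies rows [row0, row0 + nrows) and columns [col0, col0 + ncols) out of a
  // row-major array whose rows each hold fullCols elements.
  static short[] section(short[] full, int fullCols, int row0, int nrows, int col0, int ncols) {
    short[] out = new short[nrows * ncols];
    int dest = 0;
    for (int r = row0; r < row0 + nrows; r++) {
      for (int c = col0; c < col0 + ncols; c++) {
        out[dest++] = full[r * fullCols + c]; // row-major offset of element (r, c)
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // A 3 x 4 array holding the values 0..11 in storage order.
    short[] full = new short[12];
    for (int i = 0; i < full.length; i++) {
      full[i] = (short) i;
    }
    // Rows 1..2 and columns 1..2 of the 3 x 4 array.
    System.out.println(Arrays.toString(section(full, 4, 1, 2, 1, 2))); // [5, 6, 9, 10]
  }
}
```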

For large arrays, reading in all of the data can be too expensive. If your data has a regular layout, you can use the LayoutRegular helper object:

Example 2: Using `ucar.nc2.iosp.LayoutRegular` to read just the requested `Section`:

raf.order(RandomAccessFile.BIG_ENDIAN);
// size the result for the requested Section, not for the whole Variable
int size = (int) wantSection.computeSize();
int[] arr = new int[size];

// the Variable's data starts at byte 0 of the file
LayoutRegular layout = new LayoutRegular(0, v2.getElementSize(), v2.getShape(), wantSection);
while (layout.hasNext()) {
  Layout.Chunk chunk = layout.next();
  // each chunk is a contiguous run on disk that maps to a run in the result array
  raf.seek(chunk.getSrcPos());
  raf.readInt(arr, (int) chunk.getDestElem(), chunk.getNelems()); // copy into primitive array
}
return Array.factory(DataType.INT, wantSection.getShape(), arr);
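
The chunks that a layout helper hands back can be thought of as contiguous runs: a byte position in the file, a number of elements, and the index in the result array where those elements belong. The sketch below computes such runs for the simplest case, a two-dimensional row-major variable with stride 1. It is not the LayoutRegular algorithm, and the names ChunkSketch, Chunk, and chunks are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkSketch {
  // One contiguous run: where it starts in the file, how many elements it holds,
  // and where those elements go in the result array.
  static final class Chunk {
    final long srcPos;
    final int nelems;
    final int destElem;

    Chunk(long srcPos, int nelems, int destElem) {
      this.srcPos = srcPos;
      this.nelems = nelems;
      this.destElem = destElem;
    }
  }

  // Computes one chunk per wanted row for a row-major variable with `cols` columns
  // stored at byte offset startPos, for rows [row0, row0 + nrows) and
  // columns [col0, col0 + ncols).
  static List<Chunk> chunks(long startPos, int elemSize, int cols,
      int row0, int nrows, int col0, int ncols) {
    List<Chunk> result = new ArrayList<>();
    for (int r = 0; r < nrows; r++) {
      long srcElem = (long) (row0 + r) * cols + col0; // element offset within the variable
      result.add(new Chunk(startPos + srcElem * elemSize, ncols, r * ncols));
    }
    return result;
  }

  public static void main(String[] args) {
    // 16-bit elements, 4 columns, wanted block: rows 1..2, columns 1..2.
    for (Chunk c : chunks(0, 2, 4, 1, 2, 1, 2)) {
      System.out.println(c.srcPos + " " + c.nelems + " " + c.destElem);
    }
  }
}
```

Each run needs only one seek and one bulk read, which is why reading by chunk is cheaper than reading the whole array when the requested section is small.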

Example 3: Storing `Variable` specific information in SPobject

The previous examples essentially assumed a single data Variable whose data starts at byte 0 of the file. Typically you want to store various kinds of information on a per-variable basis, to make it easy and fast to respond to the readData request. For example, suppose there were multiple Variables starting at different locations in the file. You might compute these file offsets during the build call, storing that and other info in a VarInfo object:

class VarInfo {
  long filePos;
  int otherStuff;
}

private RandomAccessFile raf;

public void build(RandomAccessFile raf, Group.Builder rootGroup, CancelTask cancelTask)
    throws IOException {
  // save RandomAccessFile as instance variable
  this.raf = raf;
  // ...
  Variable.Builder elev = Variable.builder().setName("elevation").setDataType(DataType.SHORT);
  // .. add Variable attributes as above

  VarInfo vinfo = new VarInfo();
  // figure out where the elevation Variable's data starts
  vinfo.filePos = calcPosition();
  vinfo.otherStuff = 42;
  elev.setSPobject(vinfo);
  // add Variable
  rootGroup.addVariable(elev);
  // ...
}

public Array readData(Variable v2, Section wantSection)
    throws IOException, InvalidRangeException {
  VarInfo vinfo = (VarInfo) v2.getSPobject();

  raf.seek(vinfo.filePos);
  raf.order(RandomAccessFile.BIG_ENDIAN);
  int size = (int) v2.getSize();
  short[] arr = new short[size]; // elevation was declared as DataType.SHORT in build
  // ...

  return Array.factory(DataType.SHORT, v2.getShape(), arr);
}

The setSPobject and getSPobject methods on the Variable are for the exclusive use of the IOSP. Use them in any way you need.

The `buildFinish` method

  // Sometimes the builder needs access to the finished objects. This is called after ncfile.build()
  public void buildFinish(NetcdfFile ncfile);

Adding Coordinate System Information

Adding coordinate system information is the single most useful thing you can do to your datasets, to make them accessible to other programmers. As the IOSP writer, you are in the best position to understand the data in the file and correctly interpret it. You should, in fact, understand what the coordinate systems are at the same time you are deciding what the Dimension, Variables, and Attribute objects are.

Since there is no CoordinateSystem object directly stored in a netCDF file, CoordinateSystem information is encoded using a convention for adding Attributes, naming Variables and Dimensions, etc. in a standard way. The simplest and most direct way to add coordinate systems is to use the CDM _Coordinate Attribute Conventions. Another approach is to follow an existing convention. In particular, the CF Convention is an increasingly important one for gridded model data, and work is being done to make it applicable to other kinds of data.

When a client opens your file through the NetcdfFiles interface, they see exactly what Dimension, Variable, and Attribute objects you have populated the NetcdfFile object with, no more and no less. When a client uses the NetcdfDatasets interface in enhanced mode, the coordinate system information is parsed by a CoordSysBuilder object, and Coordinate Axis, Coordinate System, and Coordinate Transform objects are created and made available through the NetcdfDataset API. In some cases, new Variables, Dimensions and Attributes may be created. It's very important that the IOSP writer follow an existing Convention and ensure that the Coordinate System information is correctly interpreted, particularly if you want to take advantage of the capabilities of the CDM Scientific Datatype Layer, such as serving the data through WCS or the Netcdf Subset Service.
