Writing an IOSP for NetCDF-Java (version 4+)

We will work on an example lightning data test file, which looks like:

USPLN-LIGHTNING,2006-10-23T18:01:00,2006-10-23T18:01:00
2006-10-23T17:59:39,18.415434,-93.480526,-26.8,1
2006-10-23T17:59:40,5.4274766,-71.2189314,-31.7,1
2006-10-23T17:59:44,9.3568365,-76.8001513,-34.3,1
...

This is a text file with variable-length lines. We won't worry much about the nuances of the data; we just need to know that there are occasional header lines starting with USPLN-LIGHTNING, and a separate line for each lightning strike, with comma-delimited fields. The fields are:

  1. date of strike (GMT)
  2. latitude
  3. longitude
  4. intensity
  5. number of strokes

We will walk through implementing methods in a new IOSP:

public class UspLightning extends AbstractIOServiceProvider {

  public boolean isValidFile(RandomAccessFile raf) throws IOException {
    // TO BE IMPLEMENTED
  }

  public boolean isBuilder() {
    return true;
  }

  public void build(RandomAccessFile raf, Group.Builder rootGroup, CancelTask cancelTask)
      throws IOException {
    // TO BE IMPLEMENTED
  }

  public Array readData(Variable v2, Section section)
      throws IOException, InvalidRangeException {
    // NOT IMPLEMENTED IN THIS EXAMPLE
    return null;
  }

  public String getFileTypeId() {
    return "USPLN-LIGHTNING";
  }

  public String getFileTypeDescription() {
    return "Data from lightning data test file";
  }
}

Note that we have already implemented three methods: isBuilder returns true (indicating that this IOSP uses the version 4+ builder pattern), getFileTypeId returns a String identifier for our file type, and getFileTypeDescription returns a description of our IOSP.

Implementing isValidFile

First, we must identify our files. It's not foolproof, but we will assume that all our files start with the exact String USPLN-LIGHTNING, so our isValidFile method can look like this:

String token = "USPLN-LIGHTNING";
// 1) Make sure you are at the start of the file. In general, we won't be, since some other IOSP has also been
// reading from it.
raf.seek(0);
// 2) Read in the exact number of bytes of the desired String.
int n = token.length();
byte[] b = new byte[n];
if (raf.read(b) != n)
  return false; // the file is too short to contain the token
// 3) Turn it into a String and require an exact match.
String got = new String(b);
return got.equals(token);

Reading a file

To implement our build method, we will have to add all of the Variables, Attributes, and Dimensions to the empty Group.Builder. The Dimensions have to have the actual lengths in them, so we need to find out how many strike records there are. Since these are variable-length records, we have no choice but to read through the entire file. So we start by creating a private method to do so. We will ignore the occasional header lines and place each strike into a Strike object:

private int readAllData(RandomAccessFile raf)
    throws IOException, NumberFormatException, ParseException {
  ArrayList<Strike> records = new ArrayList<>();
  // 1) This allows us to parse date Strings.
  java.text.SimpleDateFormat isoDateTimeFormat =
      new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
  isoDateTimeFormat.setTimeZone(java.util.TimeZone.getTimeZone("GMT"));
  // 2) Make sure we are at the start of the file.
  raf.seek(0);
  while (true) {
    String line = raf.readLine();
    // 3) Read one line at a time. When finished, we get a null return.
    if (line == null)
      break;
    // 4) Skip the occasional header lines, which start with the same
    // "USPLN-LIGHTNING" token we tested for in isValidFile.
    if (line.startsWith("USPLN-LIGHTNING"))
      continue;
    // 5) A StringTokenizer will break the line up into tokens, using the "," character.
    // It turns out that raf.readLine() leaves the line endings on, so by including them here,
    // they will be ignored by the StringTokenizer.
    StringTokenizer stoker = new StringTokenizer(line, ",\r\n");
    while (stoker.hasMoreTokens()) {
      // 6) Get the comma-delimited tokens and parse them according to their data type.
      Date d = isoDateTimeFormat.parse(stoker.nextToken());
      double lat = Double.parseDouble(stoker.nextToken());
      double lon = Double.parseDouble(stoker.nextToken());
      double amp = Double.parseDouble(stoker.nextToken());
      int nstrikes = Integer.parseInt(stoker.nextToken());
      // 7) Store them in a Strike object and keep a list of them.
      Strike s = new Strike(d, lat, lon, amp, nstrikes);
      records.add(s);
    }
  }
  // 8) Return the number of records.
  return records.size();
}

private class Strike {
  int d;
  double lat, lon, amp;
  int n;

  Strike(Date d, double lat, double lon, double amp, int n) {
    // 9) We keep the date as a number of seconds since the epoch (1970-01-01 00:00:00 GMT).
    this.d = (int) (d.getTime() / 1000);
    this.lat = lat;
    this.lon = lon;
    this.amp = amp;
    this.n = n;
  }
}

Implementing the build method

Now we can populate the empty Group.Builder with the necessary objects in our build method, as follows:

// 1) Read through the data, find out how many records there are.
int n;
try {
  n = readAllData(raf);
} catch (ParseException e) {
  // 2) Not a very robust way to handle this; it would arguably be better
  // to discard individual malformed lines.
  throw new IOException("bad data", e);
}

// 3) Create a Dimension named record, of length n. Add it to the root group.
Dimension dim = Dimension.builder("record", n).build();
rootGroup.addDimension(dim);

// 4) Add a Variable named date. It has the single dimension named record.
// To be udunits compatible, we have decided to encode it as seconds since 1970-01-01 00:00:00,
// which we set as the units. We make it an integer data type.
rootGroup.addVariable(Variable.builder().setName("date").setDataType(DataType.INT)
    .addDimension(dim).addAttribute(new Attribute("long_name", "date of strike"))
    .addAttribute(new Attribute("units", "seconds since 1970-01-01 00:00;00")));

// 5) Similarly, we go through and add the other Variables, adding units and long_name attributes, etc.
rootGroup.addVariable(Variable.builder().setName("lat").setDataType(DataType.DOUBLE)
    .addDimension(dim).addAttribute(new Attribute("long_name", "latitude"))
    .addAttribute(new Attribute("units", "degrees_north")));

rootGroup.addVariable(Variable.builder().setName("lon").setDataType(DataType.DOUBLE)
    .addDimension(dim).addAttribute(new Attribute("long_name", "longitude"))
    .addAttribute(new Attribute("units", "degrees_east")));

rootGroup.addVariable(Variable.builder().setName("strikeAmplitude").setDataType(DataType.DOUBLE)
    .addDimension(dim).addAttribute(new Attribute("long_name", "amplitude of strike"))
    .addAttribute(new Attribute("units", "kAmps"))
    .addAttribute(new Attribute("missing_value", new Double(999))));

rootGroup.addVariable(Variable.builder().setName("strokeCount").setDataType(DataType.INT)
    .addDimension(dim).addAttribute(new Attribute("long_name", "number of strokes per flash"))
    .addAttribute(new Attribute("units", "")));

// 6) Add a few global attributes. In a real IOSP, we would try to make this much more complete.
rootGroup.addAttribute(new Attribute("title", "USPN Lightning Data"));
rootGroup.addAttribute(new Attribute("history", "Read directly by Netcdf Java IOSP"));

Implementing read methods

At this point we need to figure out how to implement the read methods. Since we have no Structures, we can ignore readNestedData. Of course, you are probably saying “we already read the data, are we just going to throw it away?” So for now, let's suppose that these files are always small enough that we can safely read all the data into memory. This allows us to create the data arrays during the open and cache them in the Variables.

First we’ll create some instance fields to hold our read data, one for each netCDF Variable:

private ArrayInt.D1 dateArray;
private ArrayDouble.D1 latArray;
private ArrayDouble.D1 lonArray;
private ArrayDouble.D1 ampArray;
private ArrayInt.D1 nstrokesArray;

The additional code for our readAllData method looks like this:

private int readAllData(RandomAccessFile raf)
    throws IOException, NumberFormatException, ParseException {
  ArrayList<Strike> records = new ArrayList<>();
  // ...
  // 1) Create the Strike records same as above ....
  // ...
  int n = records.size();
  int[] shape = new int[] {n};
  // 2) Once we know how many records there are, we create a 1D Array of that length.
  // For convenience we cast them to the rank and type specific Array subclass.
  dateArray = (ArrayInt.D1) Array.factory(DataType.INT, shape);
  latArray = (ArrayDouble.D1) Array.factory(DataType.DOUBLE, shape);
  lonArray = (ArrayDouble.D1) Array.factory(DataType.DOUBLE, shape);
  ampArray = (ArrayDouble.D1) Array.factory(DataType.DOUBLE, shape);
  nstrokesArray = (ArrayInt.D1) Array.factory(DataType.INT, shape);

  // 3) Loop through all the records and transfer the data into the corresponding Arrays.
  for (int i = 0; i < records.size(); i++) {
    Strike strike = records.get(i);
    dateArray.set(i, strike.d);
    latArray.set(i, strike.lat);
    lonArray.set(i, strike.lon);
    ampArray.set(i, strike.amp);
    nstrokesArray.set(i, strike.n);
  }

  return n;
}

Note: once we return from this method, the ArrayList of records and the Strike objects themselves are no longer referenced anywhere, so they will be garbage-collected. This keeps the data from taking twice as much space as needed.

Caching the read data

Back in the build method, we attach the in-memory Arrays to each Variable by calling setCachedData on its builder:

// ...
rootGroup.addVariable(Variable.builder().setName("date").setDataType(DataType.INT)
    .addDimension(dim).addAttribute(new Attribute("long_name", "date of strike"))
    .addAttribute(new Attribute("units", "seconds since 1970-01-01 00:00;00"))
    .setCachedData(dateArray, false));

rootGroup.addVariable(Variable.builder().setName("lat").setDataType(DataType.DOUBLE)
    .addDimension(dim).addAttribute(new Attribute("long_name", "latitude"))
    .addAttribute(new Attribute("units", "degrees_north")).setCachedData(latArray, false));
// ...
// do this for all variables

The method setCachedData sets the data array for a Variable. It must be the complete data array for the Variable, with the correct type and shape. Once it is set, the read method will never be called for that Variable; requests will always be satisfied from the cached data Array. If all Variables have cached data, the read method will never be called at all, so we don't need to implement it.
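Since every Variable here has cached data, readData only needs a defensive body. Here is a minimal sketch (the guard is our choice for this example, not something the API requires):

public Array readData(Variable v2, Section section)
    throws IOException, InvalidRangeException {
  // Never reached in this IOSP: every Variable was given cached data in build().
  throw new IllegalStateException("readData should not be called: all variables have cached data");
}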

Adding Coordinate Systems and Typed Dataset information

As an IOServiceProvider implementer, you presumably know everything there is to know about this data file. If you want your data file to be understood by the higher layers of the CDM, you should also add the coordinate system and typed dataset information that is needed. To do so, you need to understand the Conventions used by these layers. In this case, we have Point data, so we are going to use Unidata’s _Coordinate Conventions and Unidata’s Point Observation Conventions, which require us to add certain attributes. The payoff is that we can then look at our data through the Point tab of the ToolsUI.

The additional code in the build method looks like this:

// ...
// 1) Add attributes on time, lat, and lon variables that identify them as coordinate axes
rootGroup.addVariable(Variable.builder().setName("date").setDataType(DataType.INT)
    .addDimension(dim).addAttribute(new Attribute("long_name", "date of strike"))
    .addAttribute(new Attribute("units", "seconds since 1970-01-01 00:00;00"))
    .addAttribute(new Attribute(_Coordinate.AxisType, AxisType.Time.toString()))
    .setCachedData(dateArray, false));

rootGroup.addVariable(Variable.builder().setName("lat").setDataType(DataType.DOUBLE)
    .addDimension(dim).addAttribute(new Attribute("long_name", "latitude"))
    .addAttribute(new Attribute("units", "degrees_north"))
    .addAttribute(new Attribute(_Coordinate.AxisType, AxisType.Lat.toString()))
    .setCachedData(latArray, false));

rootGroup.addVariable(Variable.builder().setName("lon").setDataType(DataType.DOUBLE)
    .addDimension(dim).addAttribute(new Attribute("long_name", "longitude"))
    .addAttribute(new Attribute("units", "degrees_east"))
    .addAttribute(new Attribute(_Coordinate.AxisType, AxisType.Lon.toString()))
    .setCachedData(lonArray, false));
// ...
// 2) Add some global attributes identifying the Convention, the datatype,
// and which dimension to use to find the observations
rootGroup.addAttribute(new Attribute("Conventions", "Unidata Observation Dataset v1.0"));
rootGroup.addAttribute(new Attribute("cdm_data_type", "Point"));
rootGroup.addAttribute(new Attribute("observationDimension", "record"));

// 3) The Point data type also requires that the time range and lat/lon bounding box be specified as shown
// in global attributes.
MAMath.MinMax mm = MAMath.getMinMax(dateArray);
rootGroup.addAttribute(
    new Attribute("time_coverage_start", ((int) mm.min) + " seconds since 1970-01-01 00:00:00"));
rootGroup.addAttribute(
    new Attribute("time_coverage_end", ((int) mm.max) + " seconds since 1970-01-01 00:00:00"));

mm = MAMath.getMinMax(latArray);
rootGroup.addAttribute(new Attribute("geospatial_lat_min", new Double(mm.min)));
rootGroup.addAttribute(new Attribute("geospatial_lat_max", new Double(mm.max)));

mm = MAMath.getMinMax(lonArray);
rootGroup.addAttribute(new Attribute("geospatial_lon_min", new Double(mm.min)));
rootGroup.addAttribute(new Attribute("geospatial_lon_max", new Double(mm.max)));

Register your IOSP

We now have not only a working IOSP, but a PointObsDataset that can be displayed and georeferenced! You will need to register your IOSP. The easiest way is to do so at runtime (e.g., at application start-up) as follows:

NetcdfFiles.registerIOProvider(UspLightning.class);

For more information on registering an IOSP, see the documentation on runtime loading.

Once your IOSP is registered, calling NetcdfFiles.open or NetcdfDatasets.open on the provided data file will use your implementation.
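
For example, here is a minimal sketch of reading back the strike data once the IOSP is registered (the file name uspln_example.txt is a hypothetical placeholder):

try (NetcdfFile ncfile = NetcdfFiles.open("uspln_example.txt")) {
  // The lat Variable was created in our build method and is satisfied from cached data.
  Variable lat = ncfile.findVariable("lat");
  Array latData = lat.read();
  System.out.println("Read " + latData.getSize() + " strikes");
}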