Writing an IOSP - Overview | netCDF-Java Documentation


Writing an IOSP for netCDF-Java (version 4+)

A client uses NetcdfFiles, NetcdfDatasets, or one of the Scientific Feature Type APIs to read data from a CDM file. These provide a rich and sometimes complicated API to the client. Behind the scenes, however, when any of these APIs actually reads from a dataset, it uses a much simpler interface: the I/O Service Provider, or IOSP for short. The netCDF-Java library has many implementations of this interface, one for each file format that it knows how to read. This design pattern is called a Service Provider.

IOSPs are managed by the NetcdfFiles class. When a client requests a dataset (by calling NetcdfFiles.open), the file is opened as a ucar.unidata.io.RandomAccessFile (an improved version of java.io.RandomAccessFile). Each registered IOSP is then asked "is this your file?" by calling isValidFile. The first one that returns true claims the file. Your implementation of isValidFile must therefore be very fast and accurate.
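The "first IOSP that returns true claims it" rule can be illustrated with a small, self-contained sketch. It does not use the netCDF-Java classes: FormatProvider, MagicNumberProvider, findProvider, claimedBy, and the magic strings "ABCD" and "MYFM" are all hypothetical names invented for illustration, and java.io.RandomAccessFile stands in for ucar.unidata.io.RandomAccessFile. The check reads only the first few bytes and stores nothing about the file, which is one plausible way to keep isValidFile fast and stateless.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class IospDispatchSketch {

  // Hypothetical stand-in for the IOSP interface; only the two methods needed here.
  interface FormatProvider {
    boolean isValidFile(RandomAccessFile raf) throws IOException;
    String getFileTypeId();
  }

  // Claims files whose first bytes equal a fixed magic number (an assumed file layout).
  static final class MagicNumberProvider implements FormatProvider {
    private final byte[] magic; // immutable configuration, not per-file state
    private final String id;

    MagicNumberProvider(String magic, String id) {
      this.magic = magic.getBytes(StandardCharsets.US_ASCII);
      this.id = id;
    }

    @Override
    public boolean isValidFile(RandomAccessFile raf) throws IOException {
      if (raf.length() < magic.length) {
        return false; // too short to contain the magic number
      }
      raf.seek(0);
      byte[] head = new byte[magic.length];
      raf.readFully(head);
      return Arrays.equals(head, magic);
    }

    @Override
    public String getFileTypeId() {
      return id;
    }
  }

  // Asks each registered provider in order; the first one that returns true claims the file.
  static Optional<FormatProvider> findProvider(List<FormatProvider> registered,
      RandomAccessFile raf) throws IOException {
    for (FormatProvider provider : registered) {
      if (provider.isValidFile(raf)) {
        return Optional.of(provider);
      }
    }
    return Optional.empty();
  }

  // Writes the given content to a temporary file and reports which provider claims it.
  static String claimedBy(String fileContent) {
    List<FormatProvider> registered = List.of(
        new MagicNumberProvider("ABCD", "TypeA"),
        new MagicNumberProvider("MYFM", "MyFormat"));
    try {
      Path tmp = Files.createTempFile("iosp-sketch", ".bin");
      try {
        Files.write(tmp, fileContent.getBytes(StandardCharsets.US_ASCII));
        try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r")) {
          return findProvider(registered, raf).map(FormatProvider::getFileTypeId).orElse("none");
        }
      } finally {
        Files.deleteIfExists(tmp);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(claimedBy("MYFM payload")); // prints "MyFormat"
    System.out.println(claimedBy("??"));           // prints "none"
  }
}
```

Because providers are asked in registration order, a check that is too permissive can claim files that belong to a later provider, which is why accuracy matters as much as speed.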

The ucar.nc2.IOServiceProvider interface

When implementing an IOSP, your class should extend ucar.nc2.iosp.AbstractIOServiceProvider. This provides default implementations of some of the methods in the IOServiceProvider interface, so at a minimum you only have to implement a few methods:

public class MyIosp extends AbstractIOServiceProvider {

  /**
   * Methods that must be implemented
   */
  public boolean isValidFile(RandomAccessFile raf) throws IOException {
    // You must examine the file that is passed to you, and quickly and accurately determine if it can be opened
    // by this IOSP. You may not keep any state (i.e. store any information) in this call,
    // and it must be thread-safe.
  }

  public Array readData(Variable v2, Section section)
      throws IOException, InvalidRangeException {
    // Data will be read from Variable through this call. The Section defines the requested data subset.
  }

  public String getFileTypeId() {
    // See below for details on File Types.
  }

  public String getFileTypeDescription() {
    // See below for details on File Types.
  }

  public boolean isBuilder() {
    // This method should return true.
    // See notes below regarding the Builder pattern and API changes.
  }

  public void build(RandomAccessFile raf, Group.Builder rootGroup, CancelTask cancelTask)
      throws IOException {
    // If isValidFile returns true, the build method will be called.
    // This method should populate a CDM object from the RandomAccessFile, using the Group.Builder object.
    // If you need to do a lot of I/O, you should periodically check cancelTask.isCancel(), and if it's true,
    // return immediately. This allows users to cancel the opening of a dataset if it's taking too long.
  }

  /**
   * Methods with a default implementation that can optionally be overridden
   */
  public void buildFinish(NetcdfFile ncfile) {
    // Implement any clean-up or finish actions for your file type.
  }

  public String getFileTypeVersion() {
    // See below for details on File Types.
  }

  public ucar.ma2.Array readSection(ParsedSectionSpec cer)
      throws IOException, InvalidRangeException {
    // If you use Structures, data for Variables that are members of Structures are read through this method.
    // If you don't override, the default implementation in AbstractIOServiceProvider is used.
    // Override in order to improve performance.
  }

  public StructureDataIterator getStructureIterator(Structure s, int bufferSize)
      throws java.io.IOException {
    // If any of your top-level variables (not inside of a Structure) are Sequences,
    // this is how the data in them will be accessed, and you must implement it.
  }

  public boolean syncExtend() throws IOException {
    // If the file may change since it was opened, you may optionally implement this routine.
    // The changes must not affect any of the structural metadata.
    // For example, in the NetCDF-3 IOSP, we check to see if the record dimension has grown.
  }

  public Object sendIospMessage(Object message) {
    // This allows applications to pass an arbitrary object to the IOSP,
    // through the NetcdfFiles.open(location, buffer_size, cancelTask, spiObject) method.
    // As a rule, you should not count on having such special information available,
    // unless you are controlling all data access in an application.
  }

  public String getDetailInfo() {
    // Here you can pass any information that is useful to debugging.
    // It can be viewed through the ToolsUI application.
  }
}

You must define your file type and assign your IOSP a unique id with the getFileTypeId and getFileTypeDescription methods; you can optionally override getFileTypeVersion as well. See the CDM File Types documentation for more information.

Note: As of netCDF-Java version 5, IOSPs utilize a Builder design pattern to create immutable NetcdfFile objects. The Builder pattern replaces open and close with build and buildFinish. The isBuilder method indicates whether an IOSP is following the Builder pattern. Your IOSP should have an isBuilder method that returns true and should implement build instead of open.
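The idea behind the Builder pattern described in the note can be sketched without the library. Everything below is a hypothetical stand-in, not the netCDF-Java API: Dataset plays the role of the immutable NetcdfFile, DatasetBuilder the role of Group.Builder, and a BooleanSupplier replaces CancelTask.isCancel(). The sketch only shows the division of labor: build fills a mutable builder, checking for cancellation between units of work, and the finished object cannot be modified afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

final class BuilderIospSketch {

  // Hypothetical immutable result, standing in for a NetcdfFile.
  static final class Dataset {
    private final List<String> variableNames;

    Dataset(List<String> names) {
      this.variableNames = List.copyOf(names); // defensive, unmodifiable copy
    }

    List<String> variableNames() {
      return variableNames;
    }
  }

  // Hypothetical mutable builder, standing in for Group.Builder.
  static final class DatasetBuilder {
    private final List<String> names = new ArrayList<>();

    DatasetBuilder addVariable(String name) {
      names.add(name);
      return this;
    }

    Dataset build() {
      return new Dataset(names);
    }
  }

  // Analogous to an IOSP's build method: populate the builder from the parsed header,
  // checking the cancel flag before each unit of work and returning immediately if set.
  static void build(List<String> headerRecords, DatasetBuilder root, BooleanSupplier isCancel) {
    for (String record : headerRecords) {
      if (isCancel.getAsBoolean()) {
        return;
      }
      root.addVariable(record);
    }
  }

  // Plays the role of the library: create a builder, let the IOSP fill it, freeze the result.
  static List<String> open(List<String> headerRecords, BooleanSupplier isCancel) {
    DatasetBuilder root = new DatasetBuilder();
    build(headerRecords, root, isCancel);
    return root.build().variableNames();
  }

  public static void main(String[] args) {
    System.out.println(open(List.of("temperature", "pressure"), () -> false)); // prints [temperature, pressure]
    System.out.println(open(List.of("temperature", "pressure"), () -> true));  // prints []
  }
}
```

Keeping all mutation inside the builder means the object handed to clients needs no synchronization for reads, which is the usual motivation for this pattern.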

Design goals for IOSP implementations

Allow access to the dataset through the netCDF/CDM API
Allow user access to every interesting bit of information in the dataset
Hide details related to file format (e.g., links, file structure details)
Try to mimic data access efficiency of netCDF-3
Create good use metadata: accurate coordinate systems, enable classification by scientific data type
Create good discovery metadata in the global attributes: title, creator, version, date created, etc.
Follow standards and good practices

Design issues for IOSP implementors

What are the netCDF objects to expose? Should I use the netCDF-3 or the full netCDF-4/CDM data model? Attributes vs Variables?
How do I make data access efficient? What are the common use cases?
How much work should I do in the build (formerly open) method? Can/should I defer some processing?
Should I cache data arrays? Can I provide efficient strided access?
What should I do if the dataset is not self-contained: external tables, hardcoding?
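As one way to think about the strided-access question above, the following sketch reads a strided two-dimensional subset from row-major storage by computing each element's offset directly, so that only the requested elements are touched. It is an illustration under stated assumptions, not the library's Section or Array API: StridedReadSketch, readSubset, and the origin/shape/stride int arrays are hypothetical, and an in-memory double[] stands in for the file.

```java
import java.util.Arrays;

final class StridedReadSketch {

  // Reads shape[0] x shape[1] elements starting at origin, stepping by stride in each
  // dimension, from a row-major array with nCols columns. Only requested elements are visited.
  static double[] readSubset(double[] data, int nCols, int[] origin, int[] shape, int[] stride) {
    double[] out = new double[shape[0] * shape[1]];
    int k = 0;
    for (int i = 0; i < shape[0]; i++) {
      int row = origin[0] + i * stride[0];
      for (int j = 0; j < shape[1]; j++) {
        int col = origin[1] + j * stride[1];
        out[k++] = data[row * nCols + col]; // row-major offset of element (row, col)
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // A 3 x 4 array holding the values 0..11 in row-major order.
    double[] data = new double[12];
    for (int i = 0; i < data.length; i++) {
      data[i] = i;
    }
    // Rows 0 and 2, columns 1 and 3.
    double[] subset = readSubset(data, 4, new int[] {0, 1}, new int[] {2, 2}, new int[] {2, 2});
    System.out.println(Arrays.toString(subset)); // prints [1.0, 3.0, 9.0, 11.0]
  }
}
```

For data on disk, the same offset arithmetic would drive seeks; whether element-wise seeking beats reading a contiguous block and subsampling in memory depends on the stride and the storage layout.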