Registering a new IOSP
You must register your IOSP at runtime before it can open any files. See runtime_loading to learn how to register your IOSP.
Note: When registering, an instance of the class will be created. This means that there must be a default constructor that has no arguments. Since there is no way to call any other constructor, the simplest thing is to not define a constructor for your class, and the compiler will add a default constructor for you. If you do define a constructor with arguments, you must also explicitly add a no-argument constructor.
If you contribute your IOSP and it becomes part of the CDM release, it will automatically be registered in the NetcdfFile class.
Troubleshooting
When registering an IOSP, the following exceptions may occur:
InstantiationException: occurs when no default constructor exists
ClassNotFoundException: occurs when the IOSP class name passed in to the register function as a String cannot be found by the NetcdfFile ClassLoader (this almost always means that your classpath is wrong)
IllegalAccessException: is thrown if you do not have the rights to access the IOSP class
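The three failure modes can be reproduced with plain Java reflection. The following is a minimal sketch, not the CDM registration code: the classes GoodIosp, NoDefaultCtorIosp, PrivateCtorIosp, and RegistrationSketch and the tryRegister helper are hypothetical, and it assumes that registration loads the class by name and then calls its no-argument constructor reflectively.

```java
// Hypothetical stand-ins for IOSP classes; none of these are CDM classes.
class GoodIosp {
  // No constructor defined: the compiler supplies a no-argument one.
}

class NoDefaultCtorIosp {
  NoDefaultCtorIosp(int unused) {
    // Defining only this constructor suppresses the implicit no-argument one.
  }
}

class PrivateCtorIosp {
  private PrivateCtorIosp() {
    // Exists, but is not accessible from other classes.
  }
}

class RegistrationSketch {
  /** Loads the named class, instantiates it, and reports what happened. */
  @SuppressWarnings("deprecation") // Class.newInstance() is used to surface InstantiationException
  static String tryRegister(String className) {
    try {
      Object iosp = Class.forName(className).newInstance();
      return "registered " + iosp.getClass().getSimpleName();
    } catch (ClassNotFoundException e) {
      return "ClassNotFoundException"; // name not found on the classpath
    } catch (InstantiationException e) {
      return "InstantiationException"; // no no-argument constructor
    } catch (IllegalAccessException e) {
      return "IllegalAccessException"; // constructor or class not accessible
    }
  }
}
```

Calling tryRegister with the name of each class above exercises one outcome per class, and a misspelled class name exercises the classpath failure.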
IOSP lifecycle and thread safety
An IOSP is registered by passing in the IOSP class. An object of that class is immediately instantiated and stored. This object is used when NetcdfFile queries all the IOSPs by calling isValidFile. This makes the querying as fast as possible. Since there is only one IOSP object for the library, the isValidFile method must be made thread-safe. To make it thread-safe, isValidFile must modify only local (stack) variables, not instance or class variables.
When an IOSP claims a file, a new IOSP object is created and build is called on it. Therefore each dataset that is opened has its own IOSP instance assigned to it, and so the other methods of the IOSP do not need to be thread-safe. The NetcdfFile keeps a reference to this IOSP object. When the client releases all references to the NetcdfFile, the IOSP will be garbage collected.
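The lifecycle described above (one shared instance answering isValidFile, plus a fresh instance per opened file) can be sketched in plain Java. This is an illustration under stated assumptions, not the library's implementation: MiniIosp, MagicAIosp, and MiniRegistry are hypothetical names, and a byte array stands in for the file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, much-reduced stand-in for the IOSP interface.
interface MiniIosp {
  boolean isValidFile(byte[] header);
}

// Claims any "file" whose first byte is 'A'; touches only its argument.
class MagicAIosp implements MiniIosp {
  @Override
  public boolean isValidFile(byte[] header) {
    return header.length > 0 && header[0] == 'A';
  }
}

class MiniRegistry {
  // One shared prototype per registered class, used only for isValidFile.
  private final List<MiniIosp> prototypes = new ArrayList<>();

  void register(Class<? extends MiniIosp> iospClass) {
    prototypes.add(newInstance(iospClass));
  }

  // Returns a fresh instance of the first IOSP that claims the header, or null.
  MiniIosp open(byte[] header) {
    for (MiniIosp prototype : prototypes) {
      if (prototype.isValidFile(header)) {
        return newInstance(prototype.getClass());
      }
    }
    return null;
  }

  private static MiniIosp newInstance(Class<? extends MiniIosp> iospClass) {
    try {
      return iospClass.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("IOSP needs an accessible no-arg constructor", e);
    }
  }
}
```

Because every call to open hands back a new object, per-file state can safely live in instance fields, while the shared prototype must stay free of mutable state.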
Important IOSP methods
The isValidFile method
// Check if this is a valid file for this IOServiceProvider.
public boolean isValidFile( ucar.unidata.io.RandomAccessFile raf) throws IOException;
The isValidFile method must quickly and accurately determine if the file is one that the IOSP knows how to read. If this is done incorrectly, it will interfere with reading other file types. As described in the previous section, it must be thread-safe. It must also not assume what state the RandomAccessFile is in. If the file is not yours, return false as quickly as possible.
An IOException must be thrown only if the file is corrupted. Since it's unlikely that you can tell whether a file is corrupt for any file type, you should probably catch IOExceptions and return false instead.
Example 1:
public boolean isValidFile(RandomAccessFile raf) throws IOException {
  // 1) Start reading at the first byte of the file
  raf.seek(0);
  // 2) Read 8 bytes and convert to String
  byte[] b = new byte[8];
  raf.read(b);
  String test = new String(b);
  // 3) Compare to known patterns
  return test.equals(pattern1) || test.equals(pattern2);
}
Note: The file is assumed bad if the IOSP cannot read the first 8 bytes of the file. It is hard to imagine a valid file of less than 8 bytes. Still, be careful of your assumptions.
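A variant that makes the short-file assumption explicit might look like the sketch below. It is not CDM code: it uses java.io.RandomAccessFile as a stand-in for ucar.unidata.io.RandomAccessFile, the signatures MYFMT001 and MYFMT002 are invented for illustration, and checkBytes is a hypothetical helper that exists only to exercise the check against a temporary file.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class MagicCheck {
  // Hypothetical 8-byte signatures; a real IOSP would use its format's own.
  private static final String PATTERN1 = "MYFMT001";
  private static final String PATTERN2 = "MYFMT002";

  static boolean isValidFile(RandomAccessFile raf) {
    try {
      raf.seek(0); // do not assume the current file position
      byte[] b = new byte[8];
      raf.readFully(b); // throws EOFException if fewer than 8 bytes remain
      String test = new String(b, StandardCharsets.US_ASCII); // fixed charset, not the platform default
      return test.equals(PATTERN1) || test.equals(PATTERN2);
    } catch (IOException e) {
      return false; // too short or unreadable: not our file
    }
  }

  // Demo helper: writes the bytes to a temporary file and runs the check on it.
  static boolean checkBytes(byte[] contents) {
    try {
      File f = File.createTempFile("magic", ".bin");
      f.deleteOnExit();
      Files.write(f.toPath(), contents);
      try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
        return isValidFile(raf);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Using readFully means a file shorter than 8 bytes is rejected through the exception path instead of being compared against a partially filled buffer.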
Example 2:
public boolean isValidFile(RandomAccessFile raf) {
  try {
    // 1) The IOSP will read in numbers that it expects to be in big-endian format.
    //    It must not assume what mode the RandomAccessFile is in.
    raf.order(RandomAccessFile.BIG_ENDIAN);
    raf.seek(0);
    // 2) It creates a BufrInput object and delegates the work to it.
    //    Since this is a local instance, this is thread-safe.
    BufrInput bi = new BufrInput(raf);
    return bi.isValidFile();
    // 3) Catch the IOExceptions
  } catch (IOException ex) {
    return false;
  }
}
Since the instantiated BufrInput is a local instance, this is a thread-safe example. Creating new objects should be avoided when possible for speed, but sometimes it’s necessary.
Note: In this example, the IOSP catches IOExceptions and returns false; it would arguably be better for BufrInput to return null, following the rule that Exceptions should only be used in exceptional circumstances. Getting passed a file that is not yours is not exceptional.
Example 3 (BAD!):
private Grib1Input scanner;
private int edition;

public boolean isValidFile(RandomAccessFile raf) throws IOException {
  raf.seek(0);
  raf.order(RandomAccessFile.BIG_ENDIAN);
  scanner = new Grib1Input(raf);
  edition = scanner.getEdition();
  return (edition == 1);
}
In this example, isValidFile violates the thread-safety requirement because it assigns to scanner and edition, which are instance variables of the single shared IOSP object.
The mistake might arise because you want to use the scanner object and the edition in the rest of the methods. Here’s the right way to do this:
private Grib1Input scanner;
private int edition;

public boolean isValidFile(RandomAccessFile raf) throws IOException {
  raf.seek(0);
  raf.order(RandomAccessFile.BIG_ENDIAN);
  // Local variables only: these shadow the instance fields and leave them untouched
  Grib1Input scanner = new Grib1Input(raf);
  int edition = scanner.getEdition();
  return (edition == 1);
}

public void build(RandomAccessFile raf, Group.Builder rootGroup, CancelTask cancelTask)
    throws IOException {
  raf.seek(0);
  raf.order(RandomAccessFile.BIG_ENDIAN);
  // Instance fields may be set here, since this object serves a single file
  scanner = new Grib1Input(raf);
  edition = scanner.getEdition();
  // ...
}
The isValidFile method creates local variables for everything it has to do. The build method has to repeat that, but it is allowed to store instance variables that can be used in the rest of the methods, for the duration of the IOSP object.
The build method
// Open existing file, and populate it. Note that you cannot reference the NetcdfFile within this routine.
public void build(RandomAccessFile raf, Group.Builder rootGroup, CancelTask cancelTask) throws IOException;
Once an IOSP returns true on isValidFile, a new IOSP object is created and build is called on it. The job of build is to examine the contents of the file and create CDM objects that expose all of the interesting information in the file, using the Group.Builder API.
Sticking with the simple Netcdf-3 data model for now, this means populating the Group.Builder object with Dimension, Attribute, and Variable objects.
Attribute
An Attribute is a (name, value) pair, where name is a String, and value is a 1D array of Strings or numbers. Attributes are thought of as metadata about your data. All Attributes are read and kept in memory, so you should not put large data arrays in Attributes.
You can add global Attributes that apply to the entire file:
rootGroup.addAttribute(new Attribute("Conventions", "CF-1.0"));
rootGroup.addAttribute(new Attribute("version", 42));
Or you can add Attributes that are contained inside a Variable and apply only to that Variable, using the Variable.Builder API:
Variable.Builder var = Variable.builder().setName("variable");
var.addAttribute(Attribute.builder("missing_value").setDataType(DataType.DOUBLE)
.setValues(Array.factory(DataType.DOUBLE, new int[] {1, 2}, new double[] {999.0, -999.0}))
.build());
Dimension
A Dimension describes the index space for the multidimensional arrays of data stored in Variables. A Dimension has a String name and an integer length. In the Netcdf-3 data model, Dimensions are shared between Variables and stored globally.
rootGroup.addDimension(Dimension.builder("lat", 180).build());
rootGroup.addDimension(Dimension.builder("lon", 360).build());
Variable
The actual data is contained in Variables, which are containers for multidimensional arrays of data. In the Netcdf-3 data model, Variables can have type DataType.BYTE, DataType.CHAR, DataType.SHORT, DataType.INT, DataType.FLOAT, or DataType.DOUBLE.
If a Variable is unsigned (byte, short, or integer data types), you must add the _Unsigned attribute:
var.addAttribute(new Attribute("_Unsigned", "true"));
Here is an example creating a Variable of type short called “elevation”, adding several attributes to it, and adding it to the Group.Builder. The Dimensions lat and lon must already have been added. When setting Dimensions, the slowest-varying Dimension goes first (C/Java order).
rootGroup.addVariable(Variable.builder().setParentGroupBuilder(rootGroup).setName("elevation")
.setDataType(DataType.SHORT).setDimensionsByName("lat lon")
.addAttribute(new Attribute("units", "m"))
.addAttribute(
new Attribute("long_name", "digital elevation in meters above mean sea level"))
.addAttribute(new Attribute("missing_value", (short) -9999)));
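To make the "slowest-varying Dimension goes first" rule concrete, here is a small plain-Java sketch of how an element of a (lat, lon) array maps to a position in the flattened data. RowMajorIndex and offset are hypothetical names used only for illustration.

```java
class RowMajorIndex {
  // Linear offset of element (latIdx, lonIdx) in an array dimensioned (lat, lon).
  // lat is listed first, so it varies slowest; stepping lon moves one element,
  // stepping lat moves a whole row of nlon elements.
  static int offset(int latIdx, int lonIdx, int nlon) {
    return latIdx * nlon + lonIdx;
  }
}
```

With 360 longitudes, neighbouring longitudes sit next to each other in storage, while neighbouring latitudes are 360 elements apart.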
A special kind of Variable is a Coordinate Variable, which is used to name the coordinate values of a Dimension. A Coordinate Variable has the same name as its single dimension. For example:
Variable.Builder lat = Variable.builder().setParentGroupBuilder(rootGroup).setName("lat")
.setDataType(DataType.FLOAT).setDimensionsByName("lat")
.addAttribute(new Attribute("units", "degrees_north"));
rootGroup.addVariable(lat);
It is often convenient for IOSPs to set the data values of coordinate (or other) Variables.
Array data = Array.makeArray(DataType.FLOAT, 180, 90.0, -1.0);
lat.setCachedData(data, false);
Here, Array.makeArray is a convenience method that generates an evenly spaced array of length 180, starting at 90.0 with an increment of -1.0. That array is then cached in the Variable, and used whenever a client asks for data from the Variable. If a Variable has cached data, then readData will never be called for it.
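The values produced by such an evenly spaced array follow start + i * increment. The plain-Java sketch below shows that arithmetic only; EvenlySpaced and make are hypothetical names, and the code is not the implementation of Array.makeArray.

```java
class EvenlySpaced {
  // Returns n values: start, start + incr, start + 2 * incr, ...
  static float[] make(int n, double start, double incr) {
    float[] values = new float[n];
    for (int i = 0; i < n; i++) {
      values[i] = (float) (start + i * incr);
    }
    return values;
  }
}
```

For 180 values starting at 90.0 with an increment of -1.0, the last value is -89.0, so the array stops one step short of -90.0.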
The readData method
// Read data from a top level Variable and return a memory resident Array.
public ucar.ma2.Array readData(Variable v2, Section section) throws IOException, InvalidRangeException;
When a client asks to read data from a Variable, the data is taken from the Variable’s data cache, if it exists, or the readData method of the IOSP is called. The client may ask for all of the data, or it may ask for a hyperslab of data described by the Section parameter. The Section contains a java.util.List of ucar.ma2.Range objects, one for each Dimension in the Variable, in order of the Variable’s Dimensions.
Here is an example that assumes the data starts at the start of the file, is in big-endian format, and is stored as a regular array of 16-bit integers on disk:
Example 1: Reading the entire Array
raf.seek(0);
raf.order(RandomAccessFile.BIG_ENDIAN);
int size = (int) v2.getSize();
short[] arr = new short[size];
int count = 0;
while (count < size) {
  arr[count++] = raf.readShort(); // copy into primitive array
}
Array data = Array.factory(DataType.SHORT, v2.getShape(), arr);
return data.section(wantSection.getRanges());
The RandomAccessFile reads 16-bit integers, advancing automatically. The Array.section method creates a logical section of the data array, returning just the section requested.
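The same read-everything-then-subset idea can be tried without the CDM library. The sketch below is an assumption-laden illustration: ShortReader, readAll, and section are hypothetical names, a byte array stands in for the file, the data is taken to be 2D, and unlike the logical section described above this version copies the requested elements.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ShortReader {
  // Decodes count big-endian 16-bit integers from the start of the bytes.
  static short[] readAll(byte[] fileBytes, int count) {
    ByteBuffer buf = ByteBuffer.wrap(fileBytes).order(ByteOrder.BIG_ENDIAN);
    short[] arr = new short[count];
    for (int i = 0; i < count; i++) {
      arr[i] = buf.getShort(); // advances the buffer position by 2 bytes
    }
    return arr;
  }

  // Copies rows [row0, row0 + nrows) and columns [col0, col0 + ncols)
  // out of a row-major 2D array that has totalCols columns.
  static short[] section(short[] all, int totalCols, int row0, int nrows, int col0, int ncols) {
    short[] out = new short[nrows * ncols];
    for (int r = 0; r < nrows; r++) {
      System.arraycopy(all, (row0 + r) * totalCols + col0, out, r * ncols, ncols);
    }
    return out;
  }
}
```

The cost profile matches the example above: every element is decoded even when the caller wants only a small corner of the array.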
For large arrays, reading in all of the data can be too expensive. If your data has a regular layout, you can use the LayoutRegular helper object:
Example 2: Using ucar.nc2.iosp.LayoutRegular to read just the requested Section:
raf.seek(0);
raf.order(RandomAccessFile.BIG_ENDIAN);
// Size the destination for the requested Section, not for the whole Variable
int size = (int) wantSection.computeSize();
int[] arr = new int[size];
LayoutRegular layout = new LayoutRegular(0, v2.getElementSize(), v2.getShape(), wantSection);
while (layout.hasNext()) {
  Layout.Chunk chunk = layout.next();
  raf.seek(chunk.getSrcPos());
  raf.readInt(arr, (int) chunk.getDestElem(), chunk.getNelems()); // copy into primitive array
}
// The returned Array has the shape of the requested Section
return Array.factory(DataType.INT, wantSection.getShape(), arr);
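To show what the chunks in such a loop represent, here is a plain-Java sketch for the 2D case only. It is not how LayoutRegular is implemented (the real class handles arbitrary rank); ChunkSketch, Chunk, and chunks are hypothetical names, and the field names merely echo the getters used above.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkSketch {
  // One contiguous run of elements to copy from the file into the result array.
  static final class Chunk {
    final long srcPos;  // byte position of the run in the file
    final int nelems;   // number of contiguous elements in the run
    final int destElem; // starting element in the destination array

    Chunk(long srcPos, int nelems, int destElem) {
      this.srcPos = srcPos;
      this.nelems = nelems;
      this.destElem = destElem;
    }
  }

  // For a row-major 2D variable with totalCols columns starting at startPos,
  // emits one chunk per requested row covering columns [col0, col0 + ncols).
  static List<Chunk> chunks(long startPos, int elemSize, int totalCols,
      int row0, int nrows, int col0, int ncols) {
    List<Chunk> result = new ArrayList<>();
    for (int r = 0; r < nrows; r++) {
      long firstElem = (long) (row0 + r) * totalCols + col0;
      result.add(new Chunk(startPos + firstElem * elemSize, ncols, r * ncols));
    }
    return result;
  }
}
```

Each chunk costs one seek and one bulk read, so only the requested elements are ever read from disk, and the destination array needs room only for the requested section.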
Example 3: Storing Variable-specific information in SPobject
The previous examples essentially assumed a single data Variable whose data starts at byte 0 of the file. Typically you want to store various kinds of information on a per-variable basis, to make it easy and fast to respond to the readData request.
For example, suppose there were multiple Variables starting at different locations in the file. You might compute these file offsets during the build call, storing that and other info in a VarInfo object:
class VarInfo {
  long filePos;
  int otherStuff;
}

private RandomAccessFile raf;

public void build(RandomAccessFile raf, Group.Builder rootGroup, CancelTask cancelTask)
    throws IOException {
  // save RandomAccessFile as instance variable
  this.raf = raf;
  // ...
  Variable.Builder elev = Variable.builder().setName("elevation").setDataType(DataType.SHORT);
  // .. add Variable attributes as above
  VarInfo vinfo = new VarInfo();
  // figure out where the elevation Variable's data starts
  vinfo.filePos = calcPosition();
  vinfo.otherStuff = 42;
  elev.setSPobject(vinfo);
  // add Variable
  rootGroup.addVariable(elev);
  // ...
}

public Array readData(Variable v2, Section wantSection)
    throws IOException, InvalidRangeException {
  VarInfo vinfo = (VarInfo) v2.getSPobject();
  raf.seek(vinfo.filePos);
  raf.order(RandomAccessFile.BIG_ENDIAN);
  int size = (int) v2.getSize();
  int[] arr = new int[size];
  // ...
  return Array.factory(DataType.INT, v2.getShape(), arr);
}
The setSPobject and getSPobject methods on the Variable are for the exclusive use of the IOSP. Use them in any way you need.
The buildFinish method
// Sometimes the builder needs access to the finished objects. This is called after ncfile.build()
public void buildFinish(NetcdfFile ncfile);
Adding Coordinate System Information
Adding coordinate system information is the single most useful thing you can do to your datasets to make them accessible to other programmers. As the IOSP writer, you are in the best position to understand the data in the file and correctly interpret it. You should, in fact, understand what the coordinate systems are at the same time you are deciding what the Dimension, Variable, and Attribute objects are.
Since there is no CoordinateSystem object directly stored in a netCDF file, CoordinateSystem information is encoded using a convention for adding Attributes, naming Variables and Dimensions, etc. in a standard way. The simplest and most direct way to add coordinate systems is to use the CDM _Coordinate Attribute Conventions. Another approach is to follow an existing convention. In particular, the CF Convention is an increasingly important one for gridded model data, and work is being done to make it applicable to other kinds of data.
When a client opens your file through the NetcdfFiles interface, they see exactly what Dimension, Variable, and Attribute objects you have populated the NetcdfFile object with, no more and no less. When a client uses the NetcdfDatasets interface in enhanced mode, the coordinate system information is parsed by a CoordSysBuilder object, and Coordinate Axis, Coordinate System, and Coordinate Transform objects are created and made available through the NetcdfDataset API. In some cases, new Variables, Dimensions and Attributes may be created. It's very important that the IOSP writer follow an existing Convention and ensure that the Coordinate System information is correctly interpreted, particularly if you want to take advantage of the capabilities of the CDM Scientific Datatype Layer, such as serving the data through WCS or the Netcdf Subset Service.