Tutorial: Working with NetcdfFile

A NetcdfFile provides read-only access to datasets through the netCDF API (to write data, use NetcdfFormatWriter). Use the static NetcdfFiles.open methods to open a netCDF file, an HDF5 file, or any other file which has an IOServiceProvider implementation that can read the file with the NetCDF API. Use NetcdfDatasets.openFile for more general reading capabilities, including OPeNDAP, NcML, and THREDDS datasets.

Read access for some file types is provided through optional modules, which must be included in your netCDF build as artifacts. To see which module you need to include for your data, read more about CDM file types.

Opening a NetcdfFile

A simple way to open a NetcdfFile:

try (NetcdfFile ncfile = NetcdfFiles.open(pathToYourFileAsStr)) {
  // Do cool stuff here
} catch (IOException ioe) {
  // Handle less-cool exceptions here
  logger.log(yourOpenNetCdfFileErrorMsgTxt, ioe);
}

The NetcdfFiles class will open local files for which an IOServiceProvider implementation exists. The file types that the CDM can currently open are listed in the CDM file types documentation.

When you open any of these files, the IOSP populates the NetcdfFile with a set of Variable, Dimension, Attribute, and possibly Group, Structure, and EnumTypedef objects that describe what data is available for reading from the file. These objects are called the structural metadata of the dataset, and they are read into memory at the time the file is opened. The data itself is not read until requested.

If NetcdfFiles.open is given a filename that ends with .Z, .zip, .gzip, .gz, or .bz2, it will uncompress the file before opening, preferably in the same directory as the original file. See DiskCache for more details.
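To make the naming rule concrete, here is a minimal sketch of how an uncompressed file name could be derived from a compressed one. CompressedNames and uncompressedName are hypothetical names used only for illustration; they are not part of netCDF-Java, and the case-sensitive suffix match is an assumption.

```java
/** Hypothetical helper for illustration only; not part of netCDF-Java. */
final class CompressedNames {
  // Suffixes listed in the text above. Matching is case-sensitive (an assumption).
  private static final String[] SUFFIXES = {".Z", ".zip", ".gzip", ".gz", ".bz2"};

  /** Returns the name without its compression suffix, or the name unchanged if none matches. */
  static String uncompressedName(String filename) {
    for (String suffix : SUFFIXES) {
      if (filename.endsWith(suffix) && filename.length() > suffix.length()) {
        return filename.substring(0, filename.length() - suffix.length());
      }
    }
    return filename;
  }
}
```

For example, uncompressedName("obs.nc.gz") yields "obs.nc", while a name with none of these suffixes is returned as-is.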

Using ToolsUI to browse the metadata of a dataset

The NetCDF Tools User Interface (aka ToolsUI) is a program for browsing and debugging NetCDF files. You can download toolsUI.jar from the netCDF-Java downloads page. You can then run ToolsUI from the command line using a command similar to:

java -Xmx1g -jar toolsUI.jar

[Figure: ToolsUI Viewer tab]

In this screenshot, the Viewer tab is shown displaying a NetCDF file in a tree view (on the left), and a table view of the variables (on the right). By selecting a Variable, right-clicking to get the context menu, and choosing Show Declaration, you can also display the Variable’s declaration in CDL in a popup window. The NCDump Data option from the same context menu will allow you to dump all or part of a Variable’s data values from a window like this:

[Figure: ToolsUI NCDump Data window]

Note that you can edit the Variable’s ranges (T(0:30:10, 1, 0:3) in this example) to dump just a subset of the data. These are expressed with Fortran 90 array section syntax, using zero-based indexing. For example, varName( 12:22 , 0:100:2, :, 17) specifies an array section for a four-dimensional variable. The first dimension includes all the elements from 12 to 22 inclusive, the second dimension includes the elements from 0 to 100 inclusive with a stride of 2, the third includes all the elements in that dimension, and the fourth includes just the 18th element.

The following code to dump data from your program is equivalent to the above ToolsUI actions:

// varName is a string with the name of a variable, e.g. "T"
Variable v = ncfile.findVariable(varName);
if (v == null)
  return;
try {
  // sectionSpec is a string specifying a range of data, e.g. ":,1:2,0:3"
  Array data = v.readArray(new Section(sectionSpec));
  String arrayStr = NcdumpArray.printArray(data, varName, null);
  logger.log(arrayStr);
} catch (IOException | InvalidRangeException e) {
  logger.log(yourReadVarErrorMsgTxt, e);
}
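The section syntax described above can be worked through with plain Java. The sketch below is a hypothetical stand-in (SectionSpecSketch and Dim are invented names); in real code netCDF-Java's own Section and Range classes do this work. It handles only the forms used in this tutorial: "i", "first:last", "first:last:stride", and ":".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the section syntax; not the library's parser. */
final class SectionSpecSketch {

  /** One dimension of a section: first and last index (inclusive) plus a stride. */
  static final class Dim {
    final int first;
    final int last;
    final int stride;

    Dim(int first, int last, int stride) {
      this.first = first;
      this.last = last;
      this.stride = stride;
    }

    /** Number of elements selected along this dimension. */
    int length() {
      return (last - first) / stride + 1;
    }
  }

  /** Parses a spec such as "12:22,0:100:2,:,17"; shape gives each dimension's length. */
  static List<Dim> parse(String spec, int[] shape) {
    String[] parts = spec.split(",");
    if (parts.length != shape.length) {
      throw new IllegalArgumentException("spec rank does not match variable rank");
    }
    List<Dim> dims = new ArrayList<>();
    for (int d = 0; d < parts.length; d++) {
      String p = parts[d].trim();
      if (p.equals(":")) { // all elements in this dimension
        dims.add(new Dim(0, shape[d] - 1, 1));
        continue;
      }
      String[] f = p.split(":");
      int first = Integer.parseInt(f[0].trim());
      int last = f.length > 1 ? Integer.parseInt(f[1].trim()) : first;
      int stride = f.length > 2 ? Integer.parseInt(f[2].trim()) : 1;
      if (first < 0 || last < first || last >= shape[d] || stride < 1) {
        throw new IllegalArgumentException("invalid range in dimension " + d + ": " + p);
      }
      dims.add(new Dim(first, last, stride));
    }
    return dims;
  }
}
```

With a shape of {31, 3, 4}, the spec "0:30:10, 1, 0:3" selects 4, 1, and 4 elements along the three dimensions, because both ends of each range are inclusive.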

Reading data from a Variable

If you want all the data in a variable, use:

Array data = v.readArray();

When you want to subset the data, you have several options, each of which is the most convenient in some situations. Take, for example, the 3D variable T from the example above:

double T(time=31, lat=3, lon=4);

To extract the third time step with all lat and lon points, use:

int[] origin = new int[] {2, 0, 0};
int[] size = new int[] {1, 3, 4};
Array data = v.readArray(new Section(origin, size));

Or suppose you want to loop over all time steps, in a general way that handles a three-dimensional variable of any size:

int[] varShape = v.getShape();
int[] origin = new int[3];

int[] size = new int[] {1, varShape[1], varShape[2]};
// read each time step, one at a time
for (int i = 0; i < varShape[0]; i++) {
  origin[0] = i;
  Array data = v.readArray(new Section(origin, size));
  logger.log(NcdumpArray.printArray(data, "T", null));
}

Note that varShape holds the length of each dimension of the variable (the product of its values is the total number of elements that can be read); origin is the starting index in each dimension, and size is the number of elements to read in each dimension. This is different from the Fortran 90 array syntax, which uses the starting and ending array indices (inclusive):

Array data = v.readArray(new Section("2,0:2,1:3"));
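The relationship between the two conventions is simple arithmetic: the inclusive end index is origin + size - 1. A minimal sketch, using a hypothetical helper (OriginSizeSketch.toSpec is not a library method):

```java
/** Hypothetical helper for illustration only; not part of netCDF-Java. */
final class OriginSizeSketch {

  /** Converts origin/size arrays into an inclusive "first:last" section string. */
  static String toSpec(int[] origin, int[] size) {
    if (origin.length != size.length) {
      throw new IllegalArgumentException("origin and size must have the same rank");
    }
    StringBuilder sb = new StringBuilder();
    for (int d = 0; d < origin.length; d++) {
      if (size[d] < 1) {
        throw new IllegalArgumentException("size must be at least 1 in dimension " + d);
      }
      if (d > 0) {
        sb.append(',');
      }
      int last = origin[d] + size[d] - 1; // inclusive end index
      sb.append(origin[d]).append(':').append(last);
    }
    return sb.toString();
  }
}
```

For instance, origin {2, 0, 1} with size {1, 3, 3} gives "2:2,0:2,1:3", which selects the same elements as the string "2,0:2,1:3".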

If you want strided access, add a stride to the Fortran 90 section string:

Array data = v.readArray(new Section("2,0:2,0:3:2"));

Reading with Range Objects

For general programming, use the read method that takes a ucar.array.Section. A ucar.array.Range follows the Fortran 90 array syntax, taking the starting and ending indices (inclusive), and an optional stride:

List<Range> ranges = new ArrayList<>();
// List of Ranges equivalent to ("2,0:2,0:3:2")
ranges.add(new Range(2, 2));
ranges.add(new Range(0, 2));
ranges.add(new Range(0, 3, 2));
Array data = v.readArray(new Section(ranges));

For example, to loop over all time steps of the 3D variable T, taking every second lat and every second lon point:

// get variable shape
int[] varShape = v.getShape();
List<Range> ranges = new ArrayList<>();
ranges.add(null);
ranges.add(new Range(0, varShape[1] - 1, 2));
ranges.add(new Range(0, varShape[2] - 1, 2));

// loop time steps
for (int i = 0; i < varShape[0]; i++) {
  ranges.set(0, new Range(i, i));
  Array data = v.readArray(new Section(ranges));
  logger.log(NcdumpArray.printArray(data, "T", null));
}
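To see which elements a strided range actually picks, the following sketch applies the same first:last:stride rule to a plain Java array holding one time step of T as grid[lat][lon]. StrideSketch and its methods are hypothetical names for illustration and do not use the netCDF-Java API.

```java
/** Hypothetical illustration of strided selection on plain Java arrays. */
final class StrideSketch {

  /** Indices selected by first:last:stride, where last is inclusive. */
  static int[] indices(int first, int last, int stride) {
    int n = (last - first) / stride + 1;
    int[] out = new int[n];
    for (int i = 0; i < n; i++) {
      out[i] = first + i * stride;
    }
    return out;
  }

  /** Picks every latStride-th row and every lonStride-th column of grid[lat][lon]. */
  static double[][] subset(double[][] grid, int latStride, int lonStride) {
    int[] lats = indices(0, grid.length - 1, latStride);
    int[] lons = indices(0, grid[0].length - 1, lonStride);
    double[][] out = new double[lats.length][lons.length];
    for (int a = 0; a < lats.length; a++) {
      for (int b = 0; b < lons.length; b++) {
        out[a][b] = grid[lats[a]][lons[b]];
      }
    }
    return out;
  }
}
```

For a 3 x 4 grid with a stride of 2 in both dimensions, the selected lat indices are 0 and 2 and the selected lon indices are 0 and 2, so each time step yields a 2 x 2 result.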

The Section class encapsulates a list of Range objects and contains a number of useful methods for moving between lists of Ranges and origin and shape arrays. To create a Section from a list of Ranges:

Section section = new Section(rangeList);

// convert section to equivalent origin, size arrays
int[] origins = section.getOrigin();
int[] shape = section.getShape();

Reading scalar data

Data from a Variable is always read into an Array; however, the getScalar convenience method converts the Array to a scalar. If you know the data type, you can read any Variable into a scalar numeric:

int ival = ((Array<Integer>) intVar.readArray()).getScalar();
double dval = ((Array<Double>) doubleVar.readArray()).getScalar();

Iterating data in Arrays

Once you have read the data in, you usually have an Array object to work with. The shape of the Array will match the shape of the Variable (if all data was read) or the shape of the Section (if a subset was read). There are a number of ways to access data in the Array. Here is an example of accessing data in a 3D array, keeping track of index:

Array data = v.readArray();

int[] shape = data.getShape();
Index index = data.getIndex();
for (int i = 0; i < shape[0]; i++) {
  for (int j = 0; j < shape[1]; j++) {
    for (int k = 0; k < shape[2]; k++) {
      double dval = (double) data.get(index.set(i, j, k));
    }
  }
}

If you want to iterate over all the data in a variable of any rank, without keeping track of the indices, you can use the Array.iterator:

Array data = v.readArray();
double sum = 0.0;

Iterator ii = data.iterator();
while (ii.hasNext()) {
  sum += (double) ii.next();
}

If you know the Array’s rank and type, you can cast to the appropriately typed Array and use the get() method with one index per dimension, for example:

Array<Double> data = (Array<Double>) v.readArray();

int[] shape = data.getShape();
Index index = data.getIndex();
for (int i = 0; i < shape[0]; i++) {
  for (int j = 0; j < shape[1]; j++) {
    for (int k = 0; k < shape[2]; k++) {
      double dval = data.get(i, j, k);
    }
  }
}

Working with data read from a NetCDF (6.0+)

As of netCDF-Java 6.0, all objects used for reads and writes are immutable. Data read by netCDF-Java is returned as an immutable ucar.array.Array<T> object. If you would like to do any data manipulation on the read data, you will need to use the Arrays.copyPrimitiveArray method, which will return a 1D primitive array of the data:

Array<Double> dataSrc = (Array<Double>) v.readArray(); // data to be copied
double[] dest = (double[]) ucar.array.Arrays.copyPrimitiveArray(dataSrc);
// do something with copied data here
// ...
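Once the data is in a 1D primitive array, you need the original shape to locate an element. The sketch below assumes the copy is laid out in row-major order (last dimension varying fastest); that ordering is an assumption to verify for your data, and RowMajorSketch is a hypothetical helper, not a library class.

```java
/** Hypothetical helper for indexing a flat copy of 3D data; assumes row-major order. */
final class RowMajorSketch {

  /** Flat offset of element (i, j, k) in a row-major array with the given 3D shape. */
  static int offset(int[] shape, int i, int j, int k) {
    return (i * shape[1] + j) * shape[2] + k;
  }

  /** Mean over all (j, k) points at index t of the first dimension. */
  static double meanAtTime(double[] flat, int[] shape, int t) {
    double sum = 0.0;
    for (int j = 0; j < shape[1]; j++) {
      for (int k = 0; k < shape[2]; k++) {
        sum += flat[offset(shape, t, j, k)];
      }
    }
    return sum / (shape[1] * shape[2]);
  }
}
```

Because the copy is an ordinary double[], it can be modified freely, for example flat[offset(shape, 0, 0, 0)] = 0.0, without affecting the immutable Array it came from.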

Writing temporary files to the disk cache

There are a number of places where the library needs to write files to disk. If you end up using the file more than once, it's useful to cache these files.

  1. If a filename ends with .Z, .zip, .gzip, .gz, or .bz2, NetcdfFiles.open will write an uncompressed file of the same name, but without the suffix.
  2. The GRIB IOSP writes an index file with the same name and a .gbx extension. Other IOSPs may do similar things in the future.
  3. Nexrad2 files that are compressed will be uncompressed to a file with an .uncompress suffix.

Before NetcdfFiles.open writes the temporary file, it checks whether the file already exists. By default, it prefers to place the temporary file in the same directory as the original file. If it does not have write permission in that directory, by default it will use the directory ${user_home}/.unidata/cache/. You can change the directory by calling ucar.nc2.util.DiskCache.setRootDirectory(String cacheDir).
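The default placement rule can be sketched in a few lines of plain Java. This is only an illustration of the decision described above, not the library's implementation: CacheLocationSketch is a hypothetical class, and how DiskCache actually names files inside the cache directory is not covered here (the sketch simply reuses the file name).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical illustration of the placement rule; not the DiskCache implementation. */
final class CacheLocationSketch {

  /** The default cache directory named in the text: ${user_home}/.unidata/cache/. */
  static Path defaultCacheRoot() {
    return Paths.get(System.getProperty("user.home"), ".unidata", "cache");
  }

  /** Prefers the original file's directory; falls back to cacheRoot if it is not writable. */
  static Path chooseLocation(Path original, String tempName, Path cacheRoot) {
    Path parent = original.toAbsolutePath().getParent();
    if (parent != null && Files.isWritable(parent)) {
      return parent.resolve(tempName);
    }
    return cacheRoot.resolve(tempName);
  }
}
```

A call such as chooseLocation(Paths.get("/data/obs.nc.gz"), "obs.nc", defaultCacheRoot()) returns /data/obs.nc when /data is writable, and a path under the cache directory otherwise.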

You might want to always write temporary files to the cache directory, in order to manage them in a central place. To do so, call ucar.nc2.util.DiskCache.setCachePolicy(boolean alwaysInCache) with alwaysInCache = true. You may also want to limit the amount of space the disk cache uses (unless your data is always in writeable directories, so that the disk cache is never used). To scour the cache, call DiskCache.cleanCache(). For long-running applications, you might want to do this periodically in a background timer thread, as in the following example.

// 1) Get the current time and add 30 minutes to it
Calendar c = Calendar.getInstance(); // contains current startup time
c.add(Calendar.MINUTE, 30); // add 30 minutes to current time

// 2) Make a class that extends TimerTask; the run method is called by the Timer
class CacheScourTask extends java.util.TimerTask {
  public void run() {
    StringBuilder sbuff = new StringBuilder();
    // 3) Scour the cache, allowing 100 Mbytes of space to be used
    DiskCache.cleanCache(100 * 1000 * 1000, sbuff);
    sbuff.append("----------------------\n");
    // 4) Optionally log a message with the results of the scour.
    logger.log(sbuff.toString());
  }
}

// 5) Start up a timer that executes the cache scour task every 60 minutes, starting in 30 minutes
java.util.Timer timer = new java.util.Timer();
timer.scheduleAtFixedRate(new CacheScourTask(), c.getTime(), (long) 1000 * 60 * 60);

// 6) Make sure you cancel the timer before your application exits, or else the process will not terminate.
timer.cancel();
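To give a feel for what a size-limited scour involves, the sketch below decides which cached files to delete so that the rest fit within a byte budget. It assumes an oldest-first removal policy, which may differ from what DiskCache.cleanCache does; ScourSketch and Entry are hypothetical names, and the sketch works on in-memory descriptions instead of touching the file system.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of a size-limited scour; not the DiskCache implementation. */
final class ScourSketch {

  /** Minimal description of a cached file. */
  static final class Entry {
    final String name;
    final long bytes;
    final long lastModified; // milliseconds since the epoch

    Entry(String name, long bytes, long lastModified) {
      this.name = name;
      this.bytes = bytes;
      this.lastModified = lastModified;
    }
  }

  /** Keeps the newest files while they fit in maxBytes; returns the names of all others. */
  static List<String> toDelete(List<Entry> entries, long maxBytes) {
    List<Entry> newestFirst = new ArrayList<>(entries);
    newestFirst.sort(Comparator.comparingLong((Entry e) -> e.lastModified).reversed());
    List<String> delete = new ArrayList<>();
    long kept = 0;
    boolean full = false;
    for (Entry e : newestFirst) {
      if (!full && kept + e.bytes <= maxBytes) {
        kept += e.bytes;
      } else {
        full = true; // once the budget is exceeded, every older file is removed
        delete.add(e.name);
      }
    }
    return delete;
  }
}
```

With files of 60, 50, and 30 bytes (oldest to newest) and a budget of 100 bytes, only the oldest file is selected for deletion.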

Opening remote files on an HTTP Server

Files can be made accessible over the network by simply placing them on an HTTP (web) server, such as Apache. The server must be configured to set the Content-Length and Accept-Ranges: bytes headers. The client that wants to read these files uses the usual NetcdfFiles.open(String location, …) method, where the location contains the URL of the file, for example: https://www.unidata.ucar.edu/staff/caron/test/mydata.nc. To use this option, you need to have HttpClient.jar in your classpath.

The ucar.nc2 library uses the HTTP 1.1 Range request header to get ranges of bytes from the remote file, so the efficiency of remote access depends on how the data is accessed. Reading large contiguous regions of the file generally performs well, while skipping around the file and reading small amounts of data performs poorly. In many cases, reading data from a Variable gives good performance because a Variable’s data is stored contiguously and can be read with a minimal number of server requests. A record Variable, however, is spread out across the file, so it can incur a separate request for each record index. In that case you may do better copying the file to a local drive, or putting the file on a THREDDS server, which will subset the file more efficiently on the server.
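The difference between contiguous and record layouts shows up directly in the byte ranges a client has to request. The sketch below only formats HTTP Range header values and sends nothing over the network. RangeRequestSketch, its methods, and the offsets used are hypothetical, chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of HTTP byte ranges; does not perform any network I/O. */
final class RangeRequestSketch {

  /** Range header value for length bytes starting at offset; both ends are inclusive. */
  static String rangeValue(long offset, long length) {
    return "bytes=" + offset + "-" + (offset + length - 1);
  }

  /** One range per record, for a variable whose records are interleaved with other data. */
  static List<String> recordRanges(long firstOffset, long recordStride, long bytesPerRecord, int nRecords) {
    List<String> ranges = new ArrayList<>();
    for (int r = 0; r < nRecords; r++) {
      ranges.add(rangeValue(firstOffset + r * recordStride, bytesPerRecord));
    }
    return ranges;
  }
}
```

A contiguous variable of 1024 bytes at the start of the file needs the single range "bytes=0-1023", whereas a record variable with three 48-byte records spaced 1000 bytes apart needs three separate ranges.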

Opening remote files on AWS S3

Files stored as single objects on AWS S3 can also be accessed using NetcdfFiles and NetcdfDatasets. For more information, please see the object store section of the Dataset URL documentation.