Tutorial: Working with NetcdfFile
A NetcdfFile
provides read-only access to datasets through the netCDF API (to write data, use NetcdfFileWriteable
).
Use the static NetcdfFiles.open
methods to open a netCDF file, an HDF5 file, or any other file which has an IOServiceProvider
implementation that can read the file with the NetCDF API.
Use NetcdfDataset.openFile
for more general reading capabilities, including OPeNDAP, NcML, and THREDDS datasets.
Read access for some file types is provided through optional modules and must be included in your netCDF build as artifacts. To see what module you will need to include for your data, read more about CDM file types.
Opening a NetcdfFile
A simple way to open a NetcdfFile:
try (NetcdfFile ncfile = NetcdfFiles.open(pathToYourFileAsStr)) {
// Do cool stuff here
} catch (IOException ioe) {
// Handle less-cool exceptions here
logger.log(yourOpenNetCdfFileErrorMsgTxt, ioe);
}
The NetcdfFiles
class will open local files for which an IOServiceProvider
implementation exists.
The current set of files that can be opened by the CDM are here.
When you open any of these files, the IOSP populates the NetcdfFile
with a set of Variable
, Dimension
, Attribute
, and possibly Group
, Structure
, and EnumTypedef
objects that describe what data is available for reading from the file.
These objects are called the structural metadata of the dataset, and they are read into memory at the time the file is opened. The data itself is not read until requested.
If NetcdfFiles.open
is given a filename that ends with .Z
, .zip
, .gzip
, .gz, or .bz2
, it will uncompress the file before opening, preferably in the same directory as the original file.
See DiskCache for more details.
Using ToolsUI to browse the metadata of a dataset
The NetCDF Tools User Interface (aka ToolsUI) is a program for browsing and debugging NetCDF files. You can download toolsUI.jar from the netCDF-Java downloads page. You can then run ToolsUI from the command line using a command similar to:
java -Xmx1g -jar toolsUI.jar
In this screen shot, the Viewer tab is shown displaying a NetCDF file in a tree view (on the left), and a table view of the variables (on the right).
By selecting a Variable
, right clicking to get the context menu, and choosing Show Declaration, you can also display the Variable
’s declaration in CDL in a popup window.
The NCDump Data option from the same context menu will allow you to dump all or part of a Variable
’s data values from a window like this:
Note that you can edit the Variable
’s ranges (T(0:30:10, 1, 0:3)
in this example) to dump just a subset of the data.
These are expressed with Fortran 90 array section syntax, using zero-based indexing.
For example, varName( 12:22 , 0:100:2, :, 17)
specifies an array section for a four dimensional variable.
The first dimension includes all the elements from 12 to 22 inclusive, the second dimension includes the elements from 0 to 100 inclusive with a stride of 2, the third includes all the elements in that dimension, and the fourth includes just the 18th element.
The following code to dump data from your program is equivalent to the above ToolsUI actions:
// varName is a string with the name of a variable, e.g. "T"
Variable v = ncfile.findVariable(varName);
if (v == null)
return;
try {
// sectionSpec is string specifying a range of data, eg ":,1:2,0:3"
Array data = v.read(sectionSpec);
String arrayStr = Ncdump.printArray(data, varName, null);
logger.log(arrayStr);
} catch (IOException | InvalidRangeException e) {
logger.log(yourReadVarErrorMsgTxt, e);
}
Reading data from a Variable
If you want all the data in a variable, use:
Array data = v.read();
When you want to subset the data, you have a number of options, all of which have situations where they are the most convenient.
Take, for example, the 3D variable T
in the above example:
double T(time=31, lat=3, lon=4);
and you want to extract the third time step, and all lat and lon points, then use:
int[] origin = new int[] {2, 0, 0};
int[] size = new int[] {1, 3, 4};
Array data3D = v.read(origin, size);
// remove dimensions of length 1
Array data2D = data3D.reduce();
Notice that the result of reading a 3D Variable is a 3D Array. To make it a 2D array call Array.reduce(), which removes any dimensions of length 1.
Or suppose you want to loop over all time steps, and make it general to handle any sized 3 dimensional variable:
int[] varShape = v.getShape();
int[] origin = new int[3];
int[] size = new int[] {1, varShape[1], varShape[2]};
// read each time step, one at a time
for (int i = 0; i < varShape[0]; i++) {
origin[0] = i;
Array data2D = v.read(origin, size).reduce(0);
logger.log(Ncdump.printArray(data2D, "T", null));
}
In this case, we call reduce(0), to reduce dimension 0, which we know has length one, but leave the other two dimensions alone.
Note that varShape
holds the total number of elements that can be read from the variable; origin
is the starting index, and size
is the number of elements to read.
This is different from the Fortran 90 array syntax, which uses the starting and ending array indices (inclusive):
Array data = v.read("2,0:2,1:3");
If you want strided access, you can use the Fortran 90 string routine:
Array data = v.read("2,0:2,0:3:2");
Reading with Range Objects
For general programing, use the read method that takes a List
of ucar.ma2.Range
objects.
A Range
follows the Fortran 90 array syntax, taking the starting and ending indices (inclusive), and an optional stride:
List ranges = new ArrayList();
// List of Ranges equivalent to ("2,0:2,0:3:2")
ranges.add(new Range(2, 2));
ranges.add(new Range(0, 2));
ranges.add(new Range(0, 3, 2));
Array data = v.read(ranges);
For example, to loop over all time steps of the 3D variable T
, taking every second lat and every second lon point:
// get variable shape
int[] varShape = v.getShape();
List ranges = new ArrayList();
ranges.add(null);
ranges.add(new Range(0, varShape[1] - 1, 2));
ranges.add(new Range(0, varShape[2] - 1, 2));
// loop time steps
for (int i = 0; i < varShape[0]; i++) {
ranges.set(0, new Range(i, i));
Array data2D = v.read(ranges).reduce(0);
logger.log(Ncdump.printArray(data2D, "T", null));
}
The Section
class encapsulates a list of Range
objects and contains a number of useful methods for moving between
lists of Ranges
and origin, shape arrays. To create a Section
from a list of Ranges
:
// a builder is used to create or modify a section
Section.Builder builder = new Section.Builder();
builder.appendRanges(ranges);
Section section = builder.build();
// convert section to equivalent origin, size arrays
int[] origins = section.getOrigin();
int[] shape = section.getShape();
Reading Scalar Data
There are convenience routines in the Variable
class for reading scalar variables:
double dval = v.readScalarDouble();
float fval = v.readScalarFloat();
int ival = v.readScalarInt();
The readScalarDouble()
routine, for example, will read a scalar variable’s single value as a double, converting it to double if needed.
This can also be used for 1D variables with dimension length = 1, e.g.:
double height_above_ground(level=1);
Scalar routines are available in for data types: byte
, double
, float
, int
, long
, short
, and String
.
The String
scalar method:
String sval = v.readScalarString();
can be used on scalar String
or char
, as well as 1D char
variables of any size, such as:
char varname(name_strlen=77);
Manipulating data in Arrays
Once you have read the data in, you usually have an Array
object to work with.
The shape of the Array
will match the shape of the Variable
(if all data was read) or the shape of the Section
(if a subset was read).
There are a number of ways to access data in the Array
.
Here is an example of accessing data in a 3D array, keeping track of index:
Array data = v.read();
int[] shape = data.getShape();
Index index = data.getIndex();
for (int i = 0; i < shape[0]; i++) {
for (int j = 0; j < shape[1]; j++) {
for (int k = 0; k < shape[2]; k++) {
double dval = data.getDouble(index.set(i, j, k));
}
}
}
If you want to iterate over all the data in a variable of any rank, without keeping track of the indices, you can use the IndexIterator
:
Array data = v.read();
double sum = 0.0;
IndexIterator ii = data.getIndexIterator();
while (ii.hasNext()) {
sum += ii.getDoubleNext();
}
You can also just iterate over a subset of the data defined by a List
of Ranges
use the RangeIterator
.
The following iterates over every 5th point of each dimension in an Array
of arbitrary rank:
Array data = v.read();
int[] dataShape = data.getShape();
List ranges = new ArrayList();
for (int i = 0; i < dataShape.length; i++) {
ranges.add(new Range(0, dataShape[i] - 1, 5));
}
double sum = 0.0;
IndexIterator ii = data.getRangeIterator(ranges);
while (ii.hasNext()) {
sum += ii.getDoubleNext();
}
In these examples, the data will be converted to double if needed.
If you know the Array’s rank and type, you can cast to the appropriate subclass and use the get()
and set()
methods, for example:
ArrayDouble.D3 data = (ArrayDouble.D3) v.read();
int[] shape = data.getShape();
Index index = data.getIndex();
for (int i = 0; i < shape[0]; i++) {
for (int j = 0; j < shape[1]; j++) {
for (int k = 0; k < shape[2]; k++) {
double dval = data.get(i, j, k);
}
}
}
There are a number of index reordering methods that operate on an Array
, and return another Array
with the same backing data storage, but with the indices modified in various ways:
// public Array flip(int dim);
Array reversedData = data.flip(0); // reverse time dimension
// public Array permute(int[] dims);
Array permutedData = data.permute(new int[] {1, 2, 0}); // time becomes last dimension
// public Array section(int[] origin, int[] shape, int[] stride);
Array data1D = data.section(origin, shape, stride); // 1D array of measurements at first lat/lon
// public Array sectionNoReduce(int[] origin, int[] , int[] stride);
Array data3D = data.sectionNoReduce(origin, shape, stride); // 3D array of measurements
// public Array reduce();
Array reducedData = data3D.reduce(); // same as data1D
// public Array slice(int dim, int val);
Array slideData = data.slice(0, 10); // 2D array of measurements for 10th time step
// public Array transpose( int dim1, int dim2);
Array transposedData = data.transpose(1, 2); // transpose lat and lon dimensions
The backing data storage for an Array
is a 1D Java array of the corresponding type (double[]
for ArrayDouble
, etc) with length Array.getSize()
.
You can work directly with the Java array by extracting it from the Array
:
double[] javaArray = (double[]) data.get1DJavaArray(DataType.DOUBLE);
If the Array
has the same type as the request, and the indices have not been reordered, this will return the backing array, otherwise it will return a copy with the requested type and correct index ordering.
Writing temporary files to the disk cache
There are a number of places where the library needs to write files to disk. If you end up using the file more than once, its useful to cache these files.
- If a filename ends with
.Z
,.zip
,.gzip
,.gz
, or.bz2
,NetcdfFile.open
will write an uncompressed file of the same name, but without the suffix. - The GRIB IOSP writes an index file with the same name and a
.gbx
extension. Other IOSPs may do similar things in the future. - Nexrad2 files that are compressed will be uncompressed to a file with an
.uncompress
prefix.
Before NetcdfFile.open
writes the temporary file, it looks to see if it already exists.
By default, it prefers to place the temporary file in the same directory as the original file.
If it does not have write permission in that directory, by default it will use the directory ${user_home}/.unidata/cache/
.
You can change the directory by calling ucar.nc2.util.DiskCache.setRootDirectory(String cacheDir)
.
You might want to always write temporary files to the cache directory, in order to manage them in a central place.
To do so, call ucar.nc2.util.DiskCache.setCachePolicy( boolean alwaysInCache)
with parameter alwaysInCache = true
.
You may want to limit the amount of space the disk cache uses (unless you always have data in writeable directories, so that the disk cache is never used).
To scour the cache, call DiskCache.cleanCache()
.
For long-running applications, you might want to do this periodically in a background timer thread, as in the following example.
// 1) Get the current time and add 30 minutes to it
Calendar c = Calendar.getInstance(); // contains current startup time
c.add(Calendar.MINUTE, 30); // add 30 minutes to current time
// 2) Make a class that extends TimerTask; the run method is called by the Timer
class CacheScourTask extends java.util.TimerTask {
public void run() {
StringBuilder sbuff = new StringBuilder();
// 3) Scour the cache, allowing 100 Mbytes of space to be used
DiskCache.cleanCache(100 * 1000 * 1000, sbuff);
sbuff.append("----------------------\n");
// 4) Optionally log a message with the results of the scour.
logger.log(sbuff.toString());
}
}
// 5) Start up a timer that executes the cache scour task every 60 minutes, starting in 30 minutes
java.util.Timer timer = new Timer();
timer.scheduleAtFixedRate(new CacheScourTask(), c.getTime(), (long) 1000 * 60 * 60);
// 6) Make sure you cancel the time before you application exits, or else the process will not terminate.
timer.cancel();
Opening remote files on an HTTP Server
Files can be made accessible over the network by simply placing them on an HTTP (web) server, like Apache.
The server must be configured to set the Content-Length
and Accept-Ranges: bytes
headers.
The client that wants to read these files just uses the usual NetcdfFile.open(String location, …)
method to open a file.
The location contains the URL of the file, for example: https://www.unidata.ucar.edu/staff/caron/test/mydata.nc
.
In order to use this option you need to have HttpClient.jar
in your classpath.
The ucar.nc2
library uses the HTTP 1.1 protocol’s Range
command to get ranges of bytes from the remote file.
The efficiency of the remote access depends on how the data is accessed.
Reading large contiguous regions of the file should generally be good, while skipping around the file and reading small amounts of data will be poor.
In many cases, reading data from a Variable
should give good performance because a Variable
’s data is stored contiguously, and so can be read with a minimal number of server requests.
A record Variable
, however, is spread out across the file, so can incur a separate request for each record index.
In that case you may do better copying the file to a local drive, or putting the file into a THREDDS server which will more efficiently subset the file on the server.
Opening remote files on AWS S3
Files stored as single objects on AWS S3 can also be accessed using NetcdfFiles
and NetcdfDatasets
.
For more information, please see the object store section of the Dataset URL documentation.