The netCDF-Java library can read datasets from a variety of sources. The dataset is named using a Uniform Resource Locator (URL). This page summarizes how the netCDF-Java API uses URLs.
Special Note: When working with remote data services, it’s important to note that not all servers handle encoded URLs.
By default, netCDF-Java will encode illegal URI characters using percent encoding (e.g. [
will become %5B
).
If you have trouble accessing a remote dataset because of the encoding, set the Java System Property httpservices.urlencode
to "false"
using, for example System.setProperty("httpservices.urlencode", "false");
.
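As a rough illustration of the note above, the following sketch shows what percent-encoding does to the `[` character and how the system property can be set before any datasets are opened. It uses the JDK's `URLEncoder` purely to demonstrate the encoding; that is an assumption for illustration, not the encoder netCDF-Java uses internally. The class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class UrlEncodeSetting {
  /** Percent-encodes a string with the JDK encoder (illustration only, Java 10+). */
  static String percentEncode(String s) {
    return URLEncoder.encode(s, StandardCharsets.UTF_8);
  }

  /** Sets the system property named in the text so that URLs are not encoded. */
  static void disableUrlEncoding() {
    System.setProperty("httpservices.urlencode", "false");
  }

  public static void main(String[] args) {
    System.out.println(percentEncode("[")); // prints %5B
    disableUrlEncoding();
    System.out.println(System.getProperty("httpservices.urlencode")); // prints false
  }
}
```

Setting the property early, before the first remote dataset is opened, is probably the safest choice.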
ucar.nc2.NetcdfFile.open(String location)
Local Files
NetcdfFile
can work with local files, e.g.:
/usr/share/data/model.nc
file:/usr/share/data/model.nc
file:C:/share/data/model.nc
(NOTE: we advise using forward slashes everywhere, including Windows)
data/model.nc
(relative to the current working directory)
When using a file location that has an embedded :
character, e.g. C:/share/data/model.nc
, it's a good idea to add the file:
prefix, to prevent the C:
from being misinterpreted as a URL scheme.
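The advice above (forward slashes everywhere, and a `file:` prefix for paths with a drive letter) can be sketched as a small helper. The class and method names are hypothetical and this is not part of the netCDF-Java API; the returned string is what you would then pass to an open method.

```java
final class LocalLocation {
  /**
   * Hypothetical helper: converts backslashes to forward slashes and adds a
   * file: prefix when the path starts with a Windows drive letter such as C:.
   */
  static String normalize(String path) {
    String p = path.replace('\\', '/');
    boolean hasDriveLetter =
        p.length() >= 2 && Character.isLetter(p.charAt(0)) && p.charAt(1) == ':';
    return hasDriveLetter ? "file:" + p : p;
  }

  public static void main(String[] args) {
    System.out.println(normalize("C:\\share\\data\\model.nc")); // file:C:/share/data/model.nc
    System.out.println(normalize("/usr/share/data/model.nc")); // /usr/share/data/model.nc
  }
}
```

Locations that already carry a `file:` prefix, and relative paths such as `data/model.nc`, pass through unchanged.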
Remote Files
HTTP
NetcdfFile
can open remote files served over HTTP, for example:
- https://www.unidata.ucar.edu/software/netcdf-java/testdata/mydata1.nc
The HTTP server must support the HTTP Range request header (byte-range requests). Performance will be strongly affected by file format and the data access pattern.
To disambiguate HTTP remote files from OPeNDAP or other URLs, you can use httpserver:
instead of http:
, e.g.:
httpserver://www.unidata.ucar.edu/software/netcdf-java/testdata/mydata1.nc
Object Stores
NetcdfFiles
and NetcdfDatasets
can open files stored as single objects on any Object Store that supports the AWS RESTful API with byte-range requests, similar to HTTP.
This new functionality is not available in the now-deprecated NetcdfFile
and NetcdfDataset
open methods.
You will also need to include the cdm-s3
artifact in your build (visit the netcdf-java artifact guide for details).
Currently, this is not part of netcdfAll.jar
.
netCDF-Java implements a custom URI for identifying objects in an Object Store.
Using the generic URI syntax from RFC3986, the CDM will identify resources located in an object store as follows:
- scheme (required): defined to be cdms3
- authority (optional for AWS S3, otherwise required): If present, the authority component is preceded by a double slash (“//”) and is terminated by the next slash (“/”).
As with the generic URI syntax, the authority is composed of three parts:
- authority = [ userinfo "@" ] host [ ":" port ]
- userinfo (optional): name of the profile to be used by the AWS SDK
- host (required): host name of the object store
Note: If you need to supply a profile name when accessing an AWS S3 object, you must use the generic host name aws in order to have a valid URI.
- port (optional): default: 443
- path (required): path associated with the bucket
- may not be empty.
- the final path segment is interpreted to be the name of the object store's bucket.
- query (required): full or partial object key
- Only full keys can be used to read an object through the netCDF-Java API.
- Partial keys are treated as prefixes, and are used by netCDF-Java when, for example, performing bucket listing operations.
- fragment (optional): configuration options
- Configuration options may be passed in through fragment on the CDM S3 URI.
- Currently, only one configuration option is available and is used to describe a delimiter for keys that have been designed to be hierarchical. A commonly encountered case is that the object keys are the same as the file path on the system from which they were uploaded. In this case, the delimiter might be the “/” character. If the fragment is not used, netCDF-Java will assume there is no hierarchical structure to the object keys.
Example cdms3
URIs (Any S3 compatible Object Store):
- cdms3://profile_name@my.endpoint.edu/endpoint/path/bucket-name?super/long/key#delimiter=/
- cdms3://profile_name@my.endpoint.edu/bucket-name?super/long/key#delimiter=/
- cdms3://my.endpoint.edu/endpoint/path/bucket-name?super/long/key#delimiter=/
- cdms3://my.endpoint.edu/bucket-name?super/long/key#delimiter=/
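To make the component breakdown concrete, the sketch below parses the first example URI with the JDK's `java.net.URI` class, which follows the generic URI syntax. It only illustrates how the pieces line up; it is not how netCDF-Java itself parses `cdms3` URIs, and the class and method names are hypothetical.

```java
import java.net.URI;

final class Cdms3UriParts {
  /** Returns the final path segment, which the text says names the bucket. */
  static String bucketOf(URI uri) {
    String path = uri.getPath();
    return path.substring(path.lastIndexOf('/') + 1);
  }

  public static void main(String[] args) {
    URI uri = URI.create(
        "cdms3://profile_name@my.endpoint.edu/endpoint/path/bucket-name?super/long/key#delimiter=/");
    System.out.println(uri.getScheme());   // cdms3
    System.out.println(uri.getUserInfo()); // profile_name (the AWS profile)
    System.out.println(uri.getHost());     // my.endpoint.edu
    System.out.println(uri.getPort());     // -1, meaning no port was given
    System.out.println(bucketOf(uri));     // bucket-name
    System.out.println(uri.getQuery());    // super/long/key (the object key)
    System.out.println(uri.getFragment()); // delimiter=/
  }
}
```

Note that the authority-less form `cdms3:bucket-name?key` is treated by `java.net.URI` as an opaque URI, so `getPath()` would return null for it; this sketch covers only the form with an authority.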
Secure HTTP access is assumed by default.
Insecure HTTP access is attempted when one of the following ports is explicitly referenced in the authority portion of the cdms3
URI:
- 80
- 8008
- 8080
- 7001 (used by WebLogic)
- 9080 (used by WebSphere)
- 16080 (used by Mac OS X Server)
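The port rule above can be expressed as a simple lookup. This is a minimal sketch of the rule as stated on this page, written against `java.net.URI`; it is not the library's own implementation, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.util.Set;

final class Cdms3Transport {
  // Ports listed in the text as triggering plain (insecure) HTTP access.
  private static final Set<Integer> INSECURE_PORTS = Set.of(80, 8008, 8080, 7001, 9080, 16080);

  /** True if the URI explicitly names one of the listed ports. */
  static boolean usesPlainHttp(URI cdms3Uri) {
    // getPort() returns -1 when no port is present, which is never in the set.
    return INSECURE_PORTS.contains(cdms3Uri.getPort());
  }

  public static void main(String[] args) {
    System.out.println(usesPlainHttp(URI.create("cdms3://my.endpoint.edu:8080/bucket-name?super/long/key"))); // true
    System.out.println(usesPlainHttp(URI.create("cdms3://my.endpoint.edu/bucket-name?super/long/key")));      // false
  }
}
```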
Credentials
netCDF-Java uses the AWS SDK to manage credentials, even for non-AWS object stores.
One method for supplying credentials is through the use of a special credentials file, in which named profiles can be used to manage multiple sets of credentials.
References to profile_name
in the above examples correspond to a named profile in an AWS credentials file.
The default credentials file is located in your home directory at <home-dir>/.aws/credentials
.
The aws.sharedCredentialsFile
Java System property can be used to define a different credentials file, for example:
// AWS_G16_S3_URI_FULL is a cdms3 URI string defined elsewhere
System.setProperty("aws.sharedCredentialsFile", "C:/Users/me/mycredfile");
try (NetcdfFile ncfile = NetcdfFiles.open(AWS_G16_S3_URI_FULL)) {
...
} finally {
// clear the same property that was set above
System.clearProperty("aws.sharedCredentialsFile");
}
The format of the credentials file is:
[default]
aws_access_key_id={DEFAULT_ACCESS_KEY_ID}
aws_secret_access_key={DEFAULT_SECRET_ACCESS_KEY}
[profile-name1]
aws_access_key_id={PROFILE_NAME1_ACCESS_KEY_ID}
aws_secret_access_key={PROFILE_NAME1_SECRET_ACCESS_KEY}
region=us-east-1
[region-only-profile]
region=us-gov-west-1
The aws_access_key_id
and aws_secret_access_key
parameters are used to define your credentials, even for non-AWS S3 Object Store systems.
Note that an AWS region can be set for a given profile in this same file.
For more information, please see the AWS Documentation.
Example cdms3
URIs (specific to AWS S3):
- cdms3:bucket-name?super/long/key
- cdms3://profile_name@aws/bucket-name?super/long/key
Note: In order to supply a profile name (one way to set the region and/or credentials) while maintaining conformance to the URI specification, you may use “aws” as the host.
In addition to the use of the credentials file for setting the region, as described above, the region may be set using the aws.region
Java System Property, or the AWS_REGION
environment variable.
Note that a region set within the credentials file for the default
profile will take precedence over all others.
Possible values for the region code can be found in the AWS Regional endpoints documentation.
When running in AWS and accessing objects from S3, it is better to avoid the use of a credentials file when possible. One way to do that is to attach an IAM role to the EC2 instance or Lambda function in which your code is running. For more information on IAM roles, please visit the AWS User Guide.
The following examples show how one could access the same GOES 16 data file across a variety of Object Store technologies (special thanks to the NOAA Big Data project’s):
AWS S3 bucket in the US East 1 region (open access):
String region = "us-east-1";
String bucket = "noaa-goes16";
String key =
"ABI-L1b-RadC/2017/242/00/OR_ABI-L1b-RadC-M3C01_G16_s20172420002168_e20172420004540_c20172420004583.nc";
String cdmS3Uri = "cdms3:" + bucket + "?" + key;
System.setProperty("aws.region", region);
try (NetcdfFile ncfile = NetcdfFiles.open(cdmS3Uri)) {
// do cool stuff here
} finally {
System.clearProperty("aws.region");
}
Google Cloud Storage (open access):
String host = "storage.googleapis.com";
String bucket = "gcp-public-data-goes-16";
String key =
"ABI-L1b-RadC/2017/242/00/OR_ABI-L1b-RadC-M3C01_G16_s20172420002168_e20172420004540_c20172420004583.nc";
String cdmS3Uri = "cdms3://" + host + "/" + bucket + "?" + key;
try (NetcdfFile ncfile = NetcdfFiles.open(cdmS3Uri)) {
// do cool stuff here
}
Open Science Data Cloud (Ceph) (open access):
String host = "griffin-objstore.opensciencedatacloud.org";
String bucket = "noaa-goes16-hurricane-archive-2017";
String key =
"ABI-L1b-RadC/242/00/OR_ABI-L1b-RadC-M3C01_G16_s20172420002168_e20172420004540_c20172420004583.nc";
String cdmS3Uri = "cdms3://" + host + "/" + bucket + "?" + key;
try (NetcdfFile ncfile = NetcdfFiles.open(cdmS3Uri)) {
// do cool stuff here
}
File Types
The local or remote file must be one of the formats that the netCDF-Java library can read. We call this set of files Common Data Model files, or CDM files for short, to make clear that the NetCDF-Java library is not limited to netCDF files.
If the URL ends with .Z
, .zip
, .gzip
, .gz
, or .bz2
, the file is assumed to be compressed.
The netCDF-Java library will uncompress/unzip and write a new file without the suffix, then read from the uncompressed file.
Generally it prefers to place the uncompressed file in the same directory as the original file.
If it does not have write permission on that directory, it will use the cache directory defined by ucar.nc2.util.DiskCache
.
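The suffix rule described above can be sketched as follows. The helper only computes the name an uncompressed copy would have under that rule; it performs no decompression and does not reproduce the library's directory or cache handling. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Optional;

final class CompressedSuffix {
  // Suffixes the text lists as marking a compressed file.
  private static final List<String> SUFFIXES = List.of(".Z", ".zip", ".gzip", ".gz", ".bz2");

  /** Returns the location without its compression suffix, or empty if none matches. */
  static Optional<String> uncompressedName(String location) {
    for (String suffix : SUFFIXES) {
      if (location.endsWith(suffix)) {
        return Optional.of(location.substring(0, location.length() - suffix.length()));
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(uncompressedName("data/model.nc.gz")); // Optional[data/model.nc]
    System.out.println(uncompressedName("data/model.nc"));    // Optional.empty
  }
}
```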
ucar.nc2.dataset.NetcdfDataset.openFile(String location)
NetcdfDataset
adds another layer of functionality to the CDM data model, handling other protocols and optionally enhancing the dataset with Coordinate System information, scale/offset processing, dataset caching, etc.
openFile()
can open the same datasets as NetcdfFile
, plus those listed below.
openDataset()
calls NetcdfDataset.openFile()
, then optionally enhances the dataset.
acquireDataset()
allows dataset objects to be cached in memory for performance.
OPeNDAP datasets
NetcdfDataset
can open OPeNDAP datasets, which use a dods:
or http:
prefix, for example:
http://thredds.ucar.edu/thredds/dodsC/fmrc/NCEP/GFS/CONUS_95km/files/GFS_CONUS_95km_20070319_0600.grib1
dods://thredds.ucar.edu/thredds/models/NCEP/GFS/Global_5x2p5deg/GFS_Global_5x2p5deg_20070313_1200.nc
To avoid confusion with remote HTTP files, OPeNDAP URLs may use the dods:
prefix.
Also note that when passing an OPeNDAP dataset URL to the netCDF-Java library, do not include any of the access suffixes, e.g. .dods
, .ascii
, .dds
, etc.
For an http:
URL, we make a HEAD
request, and if it succeeds and returns a header with Content-Description="dods-dds"
or "dods_dds"
, then we open as OPeNDAP.
If it fails we try opening as an HTTP remote file.
Using the dods:
prefix makes it clear which protocol to use.
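If OPeNDAP URLs arrive from a browser or a catalog page, they often still carry an access suffix. The sketch below strips the suffixes named above and, optionally, swaps `http:` for `dods:` so that the protocol is unambiguous. The suffix list is an assumption based on the examples in this section (the text ends with "etc."), and the class and method names are hypothetical.

```java
final class OpendapLocation {
  // Access suffixes mentioned in this section; the real set may be larger.
  private static final String[] ACCESS_SUFFIXES = {".dods", ".ascii", ".dds", ".das"};

  /** Removes a trailing OPeNDAP access suffix, if one is present. */
  static String stripAccessSuffix(String url) {
    for (String suffix : ACCESS_SUFFIXES) {
      if (url.endsWith(suffix)) {
        return url.substring(0, url.length() - suffix.length());
      }
    }
    return url;
  }

  /** Replaces a leading http: with dods: and leaves other locations untouched. */
  static String useDodsScheme(String url) {
    return url.startsWith("http://") ? "dods://" + url.substring("http://".length()) : url;
  }

  public static void main(String[] args) {
    String raw = "http://thredds.ucar.edu/thredds/dodsC/fmrc/NCEP/GFS/CONUS_95km/files/"
        + "GFS_CONUS_95km_20070319_0600.grib1.dds";
    System.out.println(useDodsScheme(stripAccessSuffix(raw)));
  }
}
```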
The netCDF-Java NetcdfDatasets.open*
methods can also be used to read the binary response from an OPeNDAP server from a file on disk.
Note: one downside to this approach is that the entire dataset will be loaded into memory.
At a minimum, you will need to have saved the binary response (.dods
).
It is strongly recommended that you also save the Data Attribute Structure (.das
) as well, as this contains metadata for the dataset.
The two files must be located in the same directory and should only differ by file extension.
Once the files are in place, you may open the saved response by prepending the file:
protocol to the path to the .dods
file:
// pathToDodsFile looks like C:/Users/me/Downloads/cool-dataset.nc.dods
try (NetcdfFile ncf = NetcdfDatasets.openFile("file:" + pathToDodsFile, null)) {
// Do cool stuff here
}
In the example above, pathToDodsFile
should look like C:/Users/me/Downloads/cool-dataset.nc.dods
or /home/me/data/cool-dataset.nc.dods
.
Again, it is strongly recommended that cool-dataset.nc.das
exist, but its existence is technically optional (but you will not have metadata without it).
As an example, the following two URLs will provide an example of each type of file needed:
NcML datasets
NetcdfDataset
can open NcML datasets, which may be local or remote, and must end with a .xml
or .ncml
suffix, for example:
/usr/share/data/model.ncml
file:/usr/share/data/model.ncml
https://www.unidata.ucar.edu/software/netcdf-java/testdata/mydata1.xml
Because xml is so widely used, we recommend using the .ncml
suffix when possible.
THREDDS Datasets
NetcdfDataset
can open THREDDS datasets, which are contained in THREDDS Catalogs.
The general form is:
thredds:catalogURL#dataset_id
where catalogURL
is the URL of a THREDDS catalog, and dataset_id
is the ID
of a dataset inside of that catalog.
The thredds:
prefix ensures that it is understood as a THREDDS dataset.
Examples:
thredds:http://localhost:8080/test/addeStationDataset.xml#surfaceHourly
thredds:file:c:/dev/netcdf-java-2.2/test/data/catalog/addeStationDataset.xml#AddeSurfaceData
In the first case, http://localhost:8080/test/addeStationDataset.xml
must be a catalog containing a dataset with ID
surfaceHourly
.
The second case will open a catalog located at c:/dev/netcdf-java-2.2/test/data/catalog/addeStationDataset.xml
and find the dataset with ID
AddeSurfaceData
.
NetcdfDataset
will examine the thredds dataset object and extract the dataset URL, open it and return a NetcdfDataset
.
If there is more than one dataset access URL, it will choose a service that it understands.
You can modify the preferred services by calling thredds.client.catalog.tools.DataFactory.setPreferAccess()
.
The dataset metadata in the THREDDS catalog may be used to augment the metadata of the NetcdfDataset
.
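The `thredds:catalogURL#dataset_id` form can be taken apart with plain string handling, which may help when validating locations before opening them. This is a sketch under the assumption that the catalog URL itself contains no `#`; the class is hypothetical and not part of the netCDF-Java API.

```java
final class ThreddsLocation {
  final String catalogUrl;
  final String datasetId;

  private ThreddsLocation(String catalogUrl, String datasetId) {
    this.catalogUrl = catalogUrl;
    this.datasetId = datasetId;
  }

  /** Splits thredds:catalogURL#dataset_id at the first '#'. */
  static ThreddsLocation parse(String location) {
    String prefix = "thredds:";
    if (!location.startsWith(prefix)) {
      throw new IllegalArgumentException("not a thredds: location: " + location);
    }
    String rest = location.substring(prefix.length());
    int hash = rest.indexOf('#');
    if (hash < 0) {
      throw new IllegalArgumentException("missing #dataset_id: " + location);
    }
    return new ThreddsLocation(rest.substring(0, hash), rest.substring(hash + 1));
  }

  public static void main(String[] args) {
    ThreddsLocation loc =
        parse("thredds:http://localhost:8080/test/addeStationDataset.xml#surfaceHourly");
    System.out.println(loc.catalogUrl); // http://localhost:8080/test/addeStationDataset.xml
    System.out.println(loc.datasetId);  // surfaceHourly
  }
}
```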
THREDDS Resolver Datasets
NetcdfDataset
can open THREDDS Resolver datasets, which have the form
thredds:resolve:resolverURL
The resolverURL
must return a catalog with a single top level dataset, which is the target dataset.
For example:
thredds:resolve:https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_0p25deg/latest.xml
In this case, https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_0p25deg/latest.xml
returns a catalog containing the latest dataset in the grib/NCEP/GFS/Global_0p25deg
collection.
NetcdfDataset
will read the catalog, extract the THREDDS dataset, and open it as in the section above.
CdmRemote Datasets
NetcdfDataset
can open CDM Remote datasets, with the form
cdmremote:cdmRemoteURL
for example
cdmremote:http://server:8080/thredds/cdmremote/data.nc
The cdmRemoteURL
must be an endpoint for a cdmremote
web service, which provides index subsetting on remote CDM datasets.
DAP4 datasets
NetcdfDataset
can open datasets through the DAP4 protocol.
The following URL templates will be recognized as indicating the DAP4 protocol:
- dap4://, followed by the rest of the dataset URL
- https://, followed by the rest of the dataset URL and ending with #dap4
Examples:
dap4:http://thredds.ucar.edu:8080/thredds/fmrc/NCEP/GFS/CONUS_95km/files/GFS_CONUS_95km_20070319_0600.grib1
https://thredds.ucar.edu:8080/thredds/models/NCEP/GFS/Global_5x2p5deg/GFS_Global_5x2p5deg_20070313_1200.nc#dap4
Note that when passing a DAP4 dataset URL to the netCDF-Java library, do not include any of the access suffixes, e.g. .dmr
, .dap
, .dsr
, .xml
etc.
ucar.nc2.ft.FeatureDatasetFactoryManager.open()
FeatureDatasetFactory
creates Feature Datasets for Coverages (Grids), Discrete Sampling Geometry (Point) Datasets, Radial Datasets, etc.
These may be based on local files, or they may use remote access protocols.
FeatureDatasetFactoryManager
can open the same URLs that NetcdfDataset
and NetcdfFile
can open, plus the following:
CdmrFeature Datasets
FeatureDatasetFactoryManager
can open CdmRemote Feature Datasets, which have the form
cdmrFeature:cdmrFeatureURL
for example:
cdmrFeature:http://server:8080/thredds/cdmremote/data.nc
The cdmrFeatureURL
must be an endpoint for a cdmrFeature
web service, which provides coordinate subsetting on remote Feature Type datasets.
THREDDS Datasets
FeatureDatasetFactoryManager
can also open CdmRemote Feature Datasets, by passing in a dataset ID
in a catalog, exactly as in NetcdfDataset.open
as explained above.
The general form is
thredds:catalogURL#dataset_id
where catalogURL
is the URL of a THREDDS catalog, and dataset_id
is the ID
of a dataset inside of that catalog.
The thredds:
prefix ensures that the URL is understood as a THREDDS catalog and dataset.
Example:
thredds:http://localhost:8081/thredds/catalog/grib.v5/gfs_2p5deg/catalog.html#grib.v5/gfs_2p5deg/TwoD
If the dataset has a cdmrFeature
service, the FeatureDataset
will be opened through that service.
This can be more efficient than opening the dataset through the index-based services like OPeNDAP
and cdmremote
.
Collection Datasets
FeatureDatasetFactoryManager
can open collections of datasets specified with a collection specification string.
This has the form
collection:spec
FeatureDatasetFactoryManager
calls CompositeDatasetFactory.factory(wantFeatureType, spec)
if found, which returns a FeatureDataset
.
Currently only a limited number of Point Feature types are supported. This is an experimental feature.
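With this many prefixes in play, it can be useful to see them side by side. The sketch below classifies a location string by the prefixes described on this page. It is only a reading aid: the labels are made up for illustration, the class is hypothetical, and the library's actual dispatch logic (for example, probing an `http:` URL with a HEAD request) is not reproduced here.

```java
final class LocationPrefixes {
  // Order matters: thredds:resolve: must be tested before thredds:.
  private static final String[][] PREFIXES = {
    {"thredds:resolve:", "THREDDS resolver"},
    {"thredds:", "THREDDS dataset"},
    {"cdmrFeature:", "CdmrFeature"},
    {"cdmremote:", "CdmRemote"},
    {"collection:", "collection"},
    {"cdms3:", "object store"},
    {"httpserver:", "HTTP remote file"},
    {"dods:", "OPeNDAP"},
    {"dap4:", "DAP4"},
    {"file:", "local file"},
  };

  /** Returns an illustrative label for the location's prefix. */
  static String kindOf(String location) {
    for (String[] entry : PREFIXES) {
      if (location.startsWith(entry[0])) {
        return entry[1];
      }
    }
    // DAP4 may also be signalled by a trailing #dap4 fragment.
    return location.endsWith("#dap4") ? "DAP4" : "unprefixed";
  }

  public static void main(String[] args) {
    System.out.println(kindOf("cdmremote:http://server:8080/thredds/cdmremote/data.nc")); // CdmRemote
    System.out.println(kindOf("/usr/share/data/model.nc"));                               // unprefixed
  }
}
```

An "unprefixed" result covers plain paths as well as `http:` and `https:` URLs, which the library has to disambiguate by other means.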
NcML referenced datasets
NcML datasets typically reference other CDM datasets, using the location
attribute of the netcdf
element, for example:
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
location="file:/dev/netcdf-java-2.2/test/data/example1.nc">
...
The location
is passed to ucar.nc2.dataset.NetcdfDataset.openFile()
, and so can be any valid CDM dataset location.
In addition, an NcML referenced dataset location can be relative to the NcML file or the working directory:
- A relative URL resolved against the NcML location (e.g.
subdir/mydata.nc
). You must not use a file:
prefix in this case.
- An absolute file URL with a relative path (e.g.
file:data/mine.nc
). The file will be opened relative to the working directory.
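The difference between the two cases can be seen with generic URI resolution. The sketch below uses `java.net.URI.resolve`, on the assumption that it approximates how a relative location is resolved against the NcML file; it is not the library's own code, and the class and method names are hypothetical.

```java
import java.net.URI;

final class NcmlRelativeLocation {
  /** Resolves a referenced location against the location of the NcML file. */
  static String resolveAgainstNcml(String ncmlLocation, String referenced) {
    return URI.create(ncmlLocation).resolve(referenced).toString();
  }

  public static void main(String[] args) {
    String ncml = "file:/usr/share/data/model.ncml";
    // A relative URL is resolved against the directory of the NcML file.
    System.out.println(resolveAgainstNcml(ncml, "subdir/mydata.nc")); // file:/usr/share/data/subdir/mydata.nc
    // A location with its own file: scheme is absolute, so it is left unchanged.
    System.out.println(resolveAgainstNcml(ncml, "file:data/mine.nc")); // file:data/mine.nc
  }
}
```

Because `file:data/mine.nc` passes through resolution unchanged, its relative path ends up being interpreted against the working directory, as described above.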
There are a few subtle differences between using a location in NcML and passing a location to the NetcdfDataset.openFile()
and related methods:
- In NcML, you MUST always use forward slashes in your paths, even when on a Windows machine.
For example:
file:C:/data/mine.nc
.
NetcdfFile.open()
will accept backslashes on a Windows machine.
- In NcML, a relative URL is resolved against the NcML location.
In
NetcdfFile.open()
, it is interpreted as relative to the working directory.
NcML scan location
NcML aggregation scan
elements use the location
attribute to specify which directory to find files in, for example:
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<aggregation dimName="time" type="joinExisting">
<scan location="/data/model/" suffix=".nc" />
</aggregation>
</netcdf>
Allowable forms of the location for the scan directory are:
/usr/share/data/
file:/usr/share/data/
file:C:/share/data/model.nc
(NOTE: we advise using forward slashes everywhere, including Windows)
data/model.nc
(relative to the NcML directory)
file:data/model.nc
(relative to the current working directory)
When using a directory location that has an embedded :
character, e.g. C:/share/data/model.nc
, it's a really good idea to add the file:
prefix, to prevent the C:
from being misinterpreted as a URI scheme.
Note that this is a common mistake:
<scan location="D:\work\agg" suffix=".nc" />
On a Windows machine, this will try to scan D:/work/agg/D:/work/agg
.
Use
<scan location="D:/work/agg" suffix=".nc" />
or better
<scan location="file:D:/work/agg" suffix=".nc" />