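A collection specification string creates a collection of files by scanning file directories and looking for matches. It can optionally extract a date from a filename. Its parts, illustrated by the examples below, are a top directory, an optional "**" to include subdirectories, a regular expression that selects files, and an optional date template marked with '#' characters.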
/data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km/.*nc$
All files ending with "nc" in the directory /data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km. The ".*nc$" is a regular expression which tries to match the path name after the top directory /data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km/. The ".*" means "any number of any character" and the "nc$" means "ending with nc". If you want to make sure it ends with ".nc", you need:
/data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km/.*\.nc$
Since "." is a special character in regular expressions, one needs to escape it to match a literal ".", so "\.nc$" means match the characters "." "n" "c" at the end of the string.
It's generally important to use the '$' to indicate the end of the string, since a common convention is to write auxiliary files by naming them <org file>.<ext>, and you need to exclude those auxiliary files from the collection.
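The effect of the escaped "." and the trailing '$' can be checked with a short sketch. It uses Python's re module as a stand-in for the regular expression engine that the collection scanner uses, and the file names are hypothetical. Whether the scanner matches the whole name or only a prefix is not stated here, so both behaviors are shown.

```python
import re

# Hypothetical file names; "GFS_0600.nc.gbx" stands for an auxiliary file
# written next to a data file as <org file>.<ext>.
names = ["GFS_0600.nc", "GFS_0600.nc.gbx", "GFS_0600_nc", "GFS_0600.grib1"]

def select(regex, candidates):
    """Return the candidates whose whole name matches regex."""
    pattern = re.compile(regex)
    return [name for name in candidates if pattern.fullmatch(name)]

loose = select(r".*nc$", names)     # any name ending in "nc"
strict = select(r".*\.nc$", names)  # only names ending in ".nc"

# Without '$', an engine that accepts a prefix match lets the auxiliary file in.
unanchored = [name for name in names if re.match(r".*\.nc", name)]

print(loose)
print(strict)
print(unanchored)
```

With the '$' in place the auxiliary file is rejected under either matching behavior, which is why the anchor is worth writing even when it looks redundant.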
/data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km/**/.*\.nc$
All files ending with ".nc" in the directory /data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km and its subdirectories.
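As a rough illustration of what including subdirectories means, the sketch below scans a directory tree with and without descending into it. The function name scan and the file names are hypothetical. It assumes the regular expression is applied to the bare file name, and it is not the scanner's actual implementation.

```python
import os
import re
import tempfile

def scan(top_dir, name_regex, include_subdirs):
    """Return sorted paths under top_dir whose file name matches name_regex."""
    pattern = re.compile(name_regex)
    found = []
    for dirpath, dirnames, filenames in os.walk(top_dir):
        for name in filenames:
            if pattern.fullmatch(name):
                found.append(os.path.join(dirpath, name))
        if not include_subdirs:
            dirnames.clear()  # stop os.walk from descending below top_dir
    return sorted(found)

# Build a throwaway directory tree (hypothetical file names) to try it on.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "20090301"))
    for rel in ["a.nc", "a.nc.gbx", os.path.join("20090301", "b.nc")]:
        open(os.path.join(top, rel), "w").close()
    flat = [os.path.relpath(p, top) for p in scan(top, r".*\.nc$", False)]
    deep = [os.path.relpath(p, top) for p in scan(top, r".*\.nc$", True)]

print(flat)
print(deep)
```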
/data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km/**/GFS_Alaska_191km_#yyyyMMdd_HHmm#\.nc$
Search the directory /data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km and its subdirectories for files that match the regular expression:
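GFS_Alaska_191km_.............\.nc$
Each of the 13 characters of the date template between the '#' marks is represented by one unescaped ".". Remember that an unescaped "." matches any character. An escaped "\." matches the literal "." character.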
From the filename, extract the date by applying the SimpleDateFormat template yyyyMMdd_HHmm to the portion of the filename after
GFS_Alaska_191km_
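The two roles of the '#' marks (building the filename regular expression and locating the date text) can be sketched as follows. This is an illustration under stated assumptions, not the actual parser: the names split_spec and extract_date are hypothetical, the date template is assumed to be replaced by one "." per character, the text before the first '#' is assumed to be purely literal so that positions in the spec and in the filename line up, and Python's strptime format is a hand translation of the SimpleDateFormat template.

```python
import re
from datetime import datetime

# Hand translation of SimpleDateFormat templates to strptime formats.
# Only the template used in this document is covered (assumption).
STRPTIME_FORMATS = {"yyyyMMdd_HHmm": "%Y%m%d_%H%M"}

def split_spec(name_spec):
    """Split 'prefix#template#suffix' into (regex, date_start, template)."""
    first = name_spec.index("#")
    second = name_spec.index("#", first + 1)
    template = name_spec[first + 1:second]
    # One "." per template character, so the regex accepts any date text.
    regex = name_spec[:first] + "." * len(template) + name_spec[second + 1:]
    return regex, first, template

def extract_date(filename, name_spec):
    """Return the datetime encoded in filename, or None if it does not match."""
    regex, start, template = split_spec(name_spec)
    if re.fullmatch(regex, filename) is None:
        return None
    date_text = filename[start:start + len(template)]
    return datetime.strptime(date_text, STRPTIME_FORMATS[template])

name_spec = r"GFS_Alaska_191km_#yyyyMMdd_HHmm#\.nc$"
run_date = extract_date("GFS_Alaska_191km_20090301_0600.nc", name_spec)
print(run_date)
```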
The idea is to copy an example file path and then modify it step by step. Start with an example file path:
/data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km/20090301/GFS_Alaska_191km_20090301_0600.grib1
Modify it to include subdirectories:
/data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km/**/GFS_Alaska_191km_20090301_0600.grib1
Demarcate the part of the filename where the run date is encoded, using '#' chars:
/data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km/**/GFS_Alaska_191km_#20090301_0600#.grib1
Substitute a SimpleDateFormat:
/data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km/**/GFS_Alaska_191km_#yyyyMMdd_HHmm#.grib1
Make sure that the name ends with ".grib1":
/data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km/**/GFS_Alaska_191km_#yyyyMMdd_HHmm#\.grib1$
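To check a finished spec against the example path it was derived from, it helps to split it back into its pieces. The sketch below does this with a hypothetical helper, parse_spec. It assumes the filename part contains no "/" characters, and it is not the parser the software itself uses.

```python
def parse_spec(spec):
    """Split a collection spec into (top_dir, include_subdirs, name_spec)."""
    if "/**/" in spec:
        top_dir, name_spec = spec.split("/**/", 1)
        return top_dir, True, name_spec
    # No "**": everything after the last "/" is the filename part.
    top_dir, _, name_spec = spec.rpartition("/")
    return top_dir, False, name_spec

spec = (r"/data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km/**/"
        r"GFS_Alaska_191km_#yyyyMMdd_HHmm#\.grib1$")
example = ("/data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km/20090301/"
           "GFS_Alaska_191km_20090301_0600.grib1")

top_dir, include_subdirs, name_spec = parse_spec(spec)
# The example file must live somewhere under the top directory.
under_top = example.startswith(top_dir + "/")
print(top_dir, include_subdirs, name_spec, under_top)
```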
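The following characters have a special meaning in regular expressions and must be escaped with a backslash to match them literally:
.|*?+(){}[]^$\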
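When turning a literal file name into a regular expression, each of the characters .|*?+(){}[]^$\ needs a backslash in front of it. A minimal sketch of that escaping follows. The helper name escape_literal is hypothetical, and Python's re module stands in for the regular expression engine.

```python
import re

# The regular expression metacharacters listed in this document.
SPECIAL = ".|*?+(){}[]^$\\"

def escape_literal(text):
    """Prefix every metacharacter in text with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

name = "GFS_Alaska_191km_20090301_0600.grib1"
pattern = escape_literal(name)
print(pattern)

# The escaped pattern matches the exact name only.
exact = re.fullmatch(pattern, name) is not None
wrong = re.fullmatch(pattern, name.replace(".", "x")) is not None
```

Only the parts of a spec that should match literally are escaped this way. The '#' marks and any intended ".*" wildcards are added afterwards.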
<collection spec="/data/ldm/pub/native/grid/NCEP/GFS/Alaska_191km/**/GFS.*km.*grib$" dateFormatMark="yyyyMMdd_HHmm#.grib#$" />