Class CatalogCrawler


  • public class CatalogCrawler
    extends Object
    Crawl client catalogs
    Since:
    1/11/2015
    • Constructor Detail

      • CatalogCrawler

        public CatalogCrawler(CatalogCrawler.Type type,
                              int max,
                              CatalogCrawler.Filter filter,
                              CatalogCrawler.Listener listen,
                              CancelTask task,
                              PrintWriter out,
                              Object context)
        Constructor.
        Parameters:
        type - CatalogCrawler.Type
        max - if > 0, only process max datasets, then exit (random_direct_max only)
        filter - datasets rejected by the filter are not processed, nor are their descendants (may be null)
        listen - each dataset gets passed to the listener. if null, send the dataset name to standard out
        task - user can cancel the task (may be null)
        out - status messages are sent here (may be null)
        context - caller can pass this object to the Listener (e.g., used for thread safety)
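The parameter notes above can be illustrated with a minimal, self-contained sketch. It does not use the THREDDS classes: `CrawlerOptionsSketch`, `DatasetCallback`, `onDataset`, and `process` are hypothetical stand-ins invented for illustration, and the real `CatalogCrawler.Listener` interface may look different. The sketch only shows three behaviors the documentation describes: stopping after `max` datasets when `max > 0`, falling back to standard out when the listener is null, and handing the caller's `context` object back to the listener. It ignores the crawl `type`, although the source says `max` applies to `random_direct_max` only.

```java
import java.util.Arrays;
import java.util.List;

public class CrawlerOptionsSketch {

    /** Hypothetical callback; a stand-in, not the real CatalogCrawler.Listener. */
    public interface DatasetCallback {
        void onDataset(String datasetName, Object context);
    }

    /**
     * Processes dataset names in order and returns how many were processed.
     * If max > 0, processing stops after max datasets.
     * If callback is null, each name is printed to standard out instead.
     */
    public static int process(List<String> names, int max,
                              DatasetCallback callback, Object context) {
        int count = 0;
        for (String name : names) {
            if (max > 0 && count >= max) {
                break; // limit reached
            }
            if (callback != null) {
                callback.onDataset(name, context); // context is handed back unchanged
            } else {
                System.out.println(name);
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The context here is a StringBuilder owned by the caller.
        StringBuilder collected = new StringBuilder();
        int processed = process(Arrays.asList("a", "b", "c"), 2,
                (name, ctx) -> ((StringBuilder) ctx).append(name).append(' '),
                collected);
        System.out.println(processed + " datasets: " + collected);
    }
}
```

Passing state through `context` rather than capturing it in the listener lets one listener instance serve several crawls, each with its own state object, which is one plausible reading of the "thread safety" remark.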
    • Method Detail

      • crawl

        public int crawl(String catUrl)
                  throws IOException
        Open a catalog and crawl (depth first) all the datasets in it. Any that pass the filter are sent to the Listener. Catalogs are closed and their resources released as the crawl proceeds.
        Parameters:
        catUrl - url of catalog to open (xml, not html)
        Returns:
        number of catalogs (this + catrefs) opened and crawled
        Throws:
        IOException
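The depth-first crawl described for this method can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `CrawlSketch`, `Node`, and the use of `Predicate` and `Consumer` for the filter and listener are hypothetical stand-ins, and no catalog is actually opened over the network. The sketch shows the traversal order, how a rejected dataset hides its whole subtree, and how the return value counts the root catalog plus every catalog reference followed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class CrawlSketch {

    /** Hypothetical dataset node; a stand-in, not the THREDDS Dataset class. */
    public static final class Node {
        public final String name;
        public final boolean isCatalogRef; // true if this node refers to another catalog
        public final List<Node> children = new ArrayList<>();

        public Node(String name, boolean isCatalogRef) {
            this.name = name;
            this.isCatalogRef = isCatalogRef;
        }

        public Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Crawls depth first; returns catalogs opened (the root plus followed catalog refs). */
    public static int crawl(Node root, Predicate<Node> skip, Consumer<Node> listener) {
        int opened = 1; // the root catalog itself
        for (Node child : root.children) {
            opened += visit(child, skip, listener);
        }
        return opened;
    }

    private static int visit(Node node, Predicate<Node> skip, Consumer<Node> listener) {
        if (skip != null && skip.test(node)) {
            return 0; // rejected: neither this node nor its descendants are processed
        }
        int opened = node.isCatalogRef ? 1 : 0;
        if (listener != null) {
            listener.accept(node);
        } else {
            System.out.println(node.name); // no listener: print the dataset name
        }
        for (Node child : node.children) {
            opened += visit(child, skip, listener); // descend before visiting siblings
        }
        return opened;
    }

    public static void main(String[] args) {
        Node root = new Node("root", false)
                .add(new Node("a", false).add(new Node("a1", false)))
                .add(new Node("skipme", false).add(new Node("hidden", false)))
                .add(new Node("ref", true).add(new Node("r1", false)));
        int opened = crawl(root, n -> n.name.equals("skipme"), null);
        System.out.println("catalogs opened: " + opened);
    }
}
```

In this sketch the filter is checked before the listener is called, so a rejected node is never reported and none of its descendants are visited.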
      • crawl

        public int crawl(Catalog cat)
                  throws IOException
        Crawl a catalog that has already been opened.
        Parameters:
        cat - the catalog
        Returns:
        number of catalog references opened and crawled
        Throws:
        IOException
      • getNumReadFailures

        public int getNumReadFailures()