Class CatalogCrawler


  • public class CatalogCrawler
    extends Object
    This class crawls a catalog tree and sends each dataset it finds to a listener. You can get all or some of the datasets. A "direct" dataset is one for which hasAccess() is true, meaning it has one or more access elements.

    Example use:

     CatalogCrawler.Listener listener = new CatalogCrawler.Listener() {
       public void getDataset(InvDataset dd) {
         if (dd.isHarvest())
           doHarvest(dd);
       }
     };
     CatalogCrawler crawler = new CatalogCrawler(CatalogCrawler.USE_ALL_DIRECT, false, listener);
     
    • Constructor Detail

      • CatalogCrawler

        public CatalogCrawler(int type,
                              boolean skipDatasetScan,
                              CatalogCrawler.Listener listen)
        Constructor.
        Parameters:
        type - CatalogCrawler.USE_XXX constant: When you get to a dataset containing leaf datasets, do all, only the first, or a randomly chosen one.
        skipDatasetScan - if true, don't recurse into DatasetScan elements. This is useful if you are looking only for collection-level metadata.
        listen - this is called for each dataset.
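The "all, only the first, or a randomly chosen one" choice described for the type parameter can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: Mode, Node, Listener and CrawlSketch are hypothetical stand-ins invented for this sketch and are not thredds classes or the USE_XXX constants themselves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Stand-in types for illustration only; these are NOT the thredds classes.
public class CrawlSketch {
    // Hypothetical modes mirroring "all, only the first, or a randomly chosen one".
    enum Mode { ALL, FIRST, RANDOM }

    // Hypothetical minimal dataset node: a leaf has no children.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name, Node... kids) {
            this.name = name;
            for (Node k : kids) children.add(k);
        }
        boolean isLeaf() { return children.isEmpty(); }
    }

    // Hypothetical callback that receives each selected dataset.
    interface Listener { void getDataset(Node n); }

    // Depth-first walk: the leaves of each collection are filtered by mode,
    // and non-leaf children are always recursed into.
    static void crawl(Node ds, Mode mode, Random rnd, Listener listener) {
        List<Node> leaves = new ArrayList<>();
        for (Node c : ds.children) if (c.isLeaf()) leaves.add(c);

        if (!leaves.isEmpty()) {
            switch (mode) {
                case ALL:
                    for (Node l : leaves) listener.getDataset(l);
                    break;
                case FIRST:
                    listener.getDataset(leaves.get(0));
                    break;
                case RANDOM:
                    listener.getDataset(leaves.get(rnd.nextInt(leaves.size())));
                    break;
            }
        }
        for (Node c : ds.children) if (!c.isLeaf()) crawl(c, mode, rnd, listener);
    }

    // Convenience wrapper: returns the names seen by the listener, in visit order.
    static List<String> collect(Node root, Mode mode) {
        List<String> seen = new ArrayList<>();
        crawl(root, mode, new Random(0), n -> seen.add(n.name));
        return seen;
    }

    public static void main(String[] args) {
        Node root = new Node("root",
            new Node("a1"), new Node("a2"),
            new Node("sub", new Node("b1"), new Node("b2"), new Node("b3")));
        System.out.println(collect(root, Mode.ALL));   // [a1, a2, b1, b2, b3]
        System.out.println(collect(root, Mode.FIRST)); // [a1, b1]
    }
}
```

Sampling one leaf per collection keeps a crawl cheap when a collection holds many similar datasets and a single representative is enough.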
    • Method Detail

      • crawl

        public int crawl(String catUrl,
                         CancelTask task,
                         PrintWriter out,
                         Object context)
        Open a catalog and crawl (depth first) all the datasets in it. Close catalogs and release their resources as you go.
        Parameters:
        catUrl - url of catalog to open
        task - user can cancel the task (may be null)
        out - send status messages to here (may be null)
        context - caller can pass this object in (used for thread safety)
        Returns:
        number of catalog references opened and crawled
      • crawl

        public int crawl(InvCatalogImpl cat,
                         CancelTask task,
                         PrintWriter out,
                         Object context)
        Crawl a catalog that's already been opened. When you get to a dataset containing leaf datasets, do all, only the first, or a randomly chosen one.
        Parameters:
        cat - the catalog
        task - user can cancel the task (may be null)
        out - send status messages to here (may be null)
        context - caller can pass this object in (used for thread safety)
        Returns:
        number of catalog references opened and crawled
      • crawlDataset

        public void crawlDataset(InvDataset ds,
                                 CancelTask task,
                                 PrintWriter out,
                                 Object context,
                                 boolean release)
        Crawl this dataset recursively, sending every dataset to the listener.
        Parameters:
        ds - the dataset
        task - user can cancel the task (may be null)
        out - send status messages to here (may be null)
        context - caller can pass this object in (used for thread safety)
      • crawlDirectDatasets

        public void crawlDirectDatasets(InvDataset ds,
                                        CancelTask task,
                                        PrintWriter out,
                                        Object context,
                                        boolean release)
        Crawl this dataset recursively. Only direct datasets are sent to the listener.
        Parameters:
        ds - the dataset
        task - user can cancel the task (may be null)
        out - send status messages to here (may be null)
        context - caller can pass this object in (used for thread safety)
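The difference between crawling every dataset and crawling only direct ones can be sketched with a small recursive filter. This is a hedged illustration, not the library's code: Dataset and DirectCrawlSketch are hypothetical stand-ins, a BooleanSupplier stands in for the cancel task, and the only behaviour taken from the text above is that a direct dataset has one or more access elements and that the task and out arguments may be null.

```java
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Stand-in types for illustration only; these are NOT the thredds classes.
public class DirectCrawlSketch {
    // Hypothetical dataset: "direct" means it has at least one access element.
    static final class Dataset {
        final String name;
        final int accessCount;
        final List<Dataset> children = new ArrayList<>();
        Dataset(String name, int accessCount, Dataset... kids) {
            this.name = name;
            this.accessCount = accessCount;
            for (Dataset k : kids) children.add(k);
        }
        boolean hasAccess() { return accessCount > 0; }
    }

    // Recursive walk that records only direct datasets.
    // cancelled and out may both be null, mirroring the "may be null" parameters.
    static void crawlDirect(Dataset ds, BooleanSupplier cancelled,
                            PrintWriter out, List<String> found) {
        if (cancelled != null && cancelled.getAsBoolean()) return; // stop on cancel
        if (ds.hasAccess()) {
            found.add(ds.name);
            if (out != null) out.println("direct: " + ds.name);
        }
        for (Dataset child : ds.children) crawlDirect(child, cancelled, out, found);
    }

    public static void main(String[] args) {
        Dataset root = new Dataset("root", 0,
            new Dataset("grid", 1),
            new Dataset("collection", 0, new Dataset("point", 2)));
        List<String> found = new ArrayList<>();
        crawlDirect(root, null, new PrintWriter(System.out, true), found);
        System.out.println(found); // [grid, point]
    }
}
```

Collections without access elements are still traversed in this sketch; they are simply not reported.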