Class CrawlableDatasetFile

  • All Implemented Interfaces:
    CrawlableDataset
    Direct Known Subclasses:
    CrawlableDatasetAmazonS3

    public class CrawlableDatasetFile
    extends Object
    implements CrawlableDataset
    An implementation of CrawlableDataset where the dataset being represented is a local file (java.io.File).

    The constructor extends the allowed form of a CrawlableDataset path to allow file paths to be given in their native formats including Unix (/my/file), Windows (c:\my\file), and UNC file paths (\\myhost\my\file). However, the resulting CrawlableDataset path is normalized to conform to the allowed form of the CrawlableDataset path.
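The normalization described above can be pictured with a small stand-alone sketch. This page does not spell out the exact rules the class applies, so the code assumes that normalizing only means replacing backslashes with forward slashes. `NativePathSketch` and `normalize` are hypothetical names, not part of this API.

```java
// Hypothetical illustration only; not the library's implementation.
final class NativePathSketch {

    /** Converts a native file path to a slash-separated form (assumed rule). */
    static String normalize(String nativePath) {
        // Assumption: normalization amounts to swapping '\' for '/'.
        return nativePath.replace('\\', '/');
    }

    public static void main(String[] args) {
        System.out.println(normalize("/my/file"));             // /my/file
        System.out.println(normalize("c:\\my\\file"));         // c:/my/file
        System.out.println(normalize("\\\\myhost\\my\\file")); // //myhost/my/file
    }
}
```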

    This is the default implementation of CrawlableDataset used by CrawlableDatasetFactory if the class name handed to the createCrawlableDataset() method is null.

    Since:
    Jun 8, 2005 15:34:04 -0600
    • Constructor Detail

      • CrawlableDatasetFile

        public CrawlableDatasetFile(String path,
                                    Object configObj)
        Constructor required by CrawlableDatasetFactory.
        Parameters:
        path - the path of the CrawlableDataset being constructed.
        configObj - the configuration object required by CrawlableDatasetFactory; it is ignored.
      • CrawlableDatasetFile

        public CrawlableDatasetFile(File file)
        Constructs a CrawlableDatasetFile that represents the given local file.
    • Method Detail

      • getFile

        public File getFile()
        Provide access to the file that this CrawlableDataset represents.
        Returns:
        the file that this CrawlableDataset represents or null if it could not be obtained.
      • getName

        public String getName()
        Description copied from interface: CrawlableDataset
        Returns the dataset name, i.e., the last part of the dataset path.
        Specified by:
        getName in interface CrawlableDataset
        Returns:
        the dataset name, i.e., the last part of the dataset path.
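As a rough illustration of "the last part of the dataset path", the sketch below extracts the final segment of a slash-separated path. `DatasetNameSketch` and `lastSegment` are hypothetical names, and the handling of a trailing slash is an assumption rather than documented behavior.

```java
// Hypothetical illustration only; not the library's implementation.
final class DatasetNameSketch {

    /** Returns the text after the last '/' in a slash-separated path. */
    static String lastSegment(String path) {
        // Assumption: a single trailing slash is ignored, so "/my/dir/" yields "dir".
        String p = (path.length() > 1 && path.endsWith("/"))
                ? path.substring(0, path.length() - 1)
                : path;
        int idx = p.lastIndexOf('/');
        return idx < 0 ? p : p.substring(idx + 1);
    }

    public static void main(String[] args) {
        System.out.println(lastSegment("/my/file")); // file
        System.out.println(lastSegment("/my/dir/")); // dir
    }
}
```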
      • exists

        public boolean exists()
        Description copied from interface: CrawlableDataset
        Return true if the dataset represented by this CrawlableDataset actually exists, false if it does not or an I/O error occurs.
        Specified by:
        exists in interface CrawlableDataset
        Returns:
        true if the dataset represented by this CrawlableDataset actually exists.
      • isCollection

        public boolean isCollection()
        Description copied from interface: CrawlableDataset
        Return true if the dataset is a collection dataset.
        Specified by:
        isCollection in interface CrawlableDataset
        Returns:
        true if the dataset is a collection dataset.
      • getDescendant

        public CrawlableDataset getDescendant(String relativePath)
        Description copied from interface: CrawlableDataset
        Return the requested descendant of this dataset.
        Specified by:
        getDescendant in interface CrawlableDataset
        Parameters:
        relativePath - the path, relative to this dataset, of the requested dataset.
        Returns:
        the requested descendant of this dataset.
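For a file-backed dataset, resolving a descendant is conceptually close to resolving a relative path against a base `java.io.File`. The sketch below shows that idea with the standard `File(File, String)` constructor. `DescendantSketch` and `descendant` are hypothetical names, and nothing here is taken from the class's actual code.

```java
import java.io.File;

// Hypothetical illustration only; not the library's implementation.
final class DescendantSketch {

    /** Resolves relativePath against base using plain java.io.File semantics. */
    static File descendant(File base, String relativePath) {
        return new File(base, relativePath);
    }

    public static void main(String[] args) {
        File d = descendant(new File("data"), "2005/file.nc");
        System.out.println(d.getName());                 // file.nc
        System.out.println(d.getParentFile().getName()); // 2005
    }
}
```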
      • listDatasets

        public List<CrawlableDataset> listDatasets()
                                            throws IOException
        Description copied from interface: CrawlableDataset
        Returns the list of CrawlableDatasets contained in this collection dataset. The returned list will be empty if this collection dataset does not contain any children datasets. If this dataset is not a collection dataset, this method returns null.
        Specified by:
        listDatasets in interface CrawlableDataset
        Returns:
        Returns a list of the CrawlableDatasets contained in this collection dataset. The list will be empty if no datasets are contained in this collection dataset.
        Throws:
        IOException - if an I/O error occurs while accessing the children datasets.
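The three outcomes described for this method (a populated or empty list, null for a non-collection, and an IOException on access failure) can be mirrored with plain `java.io.File` calls. The sketch below is a stand-in under that assumption. `ListChildrenSketch` and `listChildren` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the library's implementation.
final class ListChildrenSketch {

    /** Lists the children of dir, following the contract described above. */
    static List<File> listChildren(File dir) throws IOException {
        if (!dir.isDirectory()) {
            return null; // not a collection: the documented result is null
        }
        File[] children = dir.listFiles();
        if (children == null) {
            // File.listFiles() returns null when an I/O error occurs.
            throw new IOException("Could not list children of " + dir);
        }
        return new ArrayList<>(Arrays.asList(children)); // empty if no children
    }

    public static void main(String[] args) throws IOException {
        List<File> children = listChildren(new File("."));
        System.out.println(children == null ? "not a collection" : children.size() + " children");
    }
}
```

Callers written against this contract need a null check before iterating, since an empty list and null mean different things.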
      • listDatasets

        public List<CrawlableDataset> listDatasets(CrawlableDatasetFilter filter)
                                            throws IOException
        Description copied from interface: CrawlableDataset
        Returns the list of CrawlableDatasets contained in this collection dataset that satisfy the given filter. The returned list will be empty if this collection dataset does not contain any children datasets that satisfy the given filter.
        Specified by:
        listDatasets in interface CrawlableDataset
        Parameters:
        filter - a CrawlableDataset filter (if null, accept all datasets).
        Returns:
        Returns a list of the CrawlableDatasets contained in this collection dataset that satisfy the given filter. The list will be empty if no contained datasets satisfy the filter.
        Throws:
        IOException - if an I/O error occurs while accessing the children datasets.
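The "null filter accepts all" rule can be sketched with a standard `java.util.function.Predicate` standing in for CrawlableDatasetFilter, whose own methods are not shown on this page. `FilteredListSketch` and `listChildren` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; Predicate<File> stands in for the real filter type.
final class FilteredListSketch {

    /** Lists the children of dir that pass the filter; a null filter accepts everything. */
    static List<File> listChildren(File dir, Predicate<File> filter) throws IOException {
        File[] children = dir.listFiles();
        if (children == null) {
            throw new IOException("Could not list children of " + dir);
        }
        List<File> accepted = new ArrayList<>();
        for (File child : children) {
            if (filter == null || filter.test(child)) {
                accepted.add(child);
            }
        }
        return accepted; // empty when nothing passes the filter
    }

    public static void main(String[] args) throws IOException {
        List<File> ncFiles = listChildren(new File("."), f -> f.getName().endsWith(".nc"));
        System.out.println(ncFiles.size() + " matching children");
    }
}
```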
      • getParentDataset

        public CrawlableDataset getParentDataset()
        Description copied from interface: CrawlableDataset
        Returns the parent CrawlableDataset or null if this dataset has no parent.
        Specified by:
        getParentDataset in interface CrawlableDataset
        Returns:
        the parent CrawlableDataset or null if this dataset has no parent.
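Because the parent lookup ends in null, walking up a hierarchy is a simple loop. The sketch below shows the same pattern with `java.io.File.getParentFile()`, which also returns null when there is no parent. `ParentWalkSketch` and `ancestorNames` are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; uses java.io.File rather than CrawlableDataset.
final class ParentWalkSketch {

    /** Collects the names of all ancestors of f, nearest first. */
    static List<String> ancestorNames(File f) {
        List<String> names = new ArrayList<>();
        // getParentFile() returns null once the path names no further parent.
        for (File p = f.getParentFile(); p != null; p = p.getParentFile()) {
            names.add(p.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(ancestorNames(new File("a/b/c.nc"))); // [b, a]
    }
}
```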
      • length

        public long length()
        Description copied from interface: CrawlableDataset
        Returns the size in bytes of the dataset, -1 if unknown.
        Specified by:
        length in interface CrawlableDataset
        Returns:
        the size in bytes of the dataset, -1 if unknown.
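Note that `java.io.File.length()` reports 0, not -1, for a file that does not exist, so a file-backed implementation has to map that case itself. The sketch below shows one way to do so. Treating anything that is not a regular file as "unknown" is an assumption, and `LengthSketch` and `lengthOrUnknown` are hypothetical names.

```java
import java.io.File;

// Hypothetical illustration only; not the library's implementation.
final class LengthSketch {

    /** Returns the file size in bytes, or -1 when it cannot be determined. */
    static long lengthOrUnknown(File f) {
        // Assumption: only existing regular files have a known size.
        return f.isFile() ? f.length() : -1L;
    }

    public static void main(String[] args) {
        System.out.println(lengthOrUnknown(new File("no-such-file-for-sketch")));
    }
}
```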
      • lastModified

        public Date lastModified()
        Description copied from interface: CrawlableDataset
        Returns the date the dataset was last modified, null if unknown.
        Specified by:
        lastModified in interface CrawlableDataset
        Returns:
        the date the dataset was last modified, null if unknown.
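`java.io.File.lastModified()` returns 0 when the file does not exist or an I/O error occurs, so the "null if unknown" convention needs an explicit mapping. A minimal sketch of that mapping follows. `LastModifiedSketch` and `lastModifiedOrNull` are hypothetical names, and the code is not taken from this class.

```java
import java.io.File;
import java.util.Date;

// Hypothetical illustration only; not the library's implementation.
final class LastModifiedSketch {

    /** Returns the last-modified date, or null when it cannot be determined. */
    static Date lastModifiedOrNull(File f) {
        long millis = f.lastModified(); // 0L means "does not exist or I/O error"
        return millis == 0L ? null : new Date(millis);
    }

    public static void main(String[] args) {
        System.out.println(lastModifiedOrNull(new File("no-such-file-for-sketch"))); // null
    }
}
```

One consequence of this mapping is that a file whose timestamp is exactly the epoch would also be reported as unknown.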