public class FileRetrievalSystem extends Object
Crawls external directory structures and downloads the files within them. This class is configured via a Java .properties file, which is read and parsed by Config.java. The .properties file should set the following properties:

    # list of sites to crawl
    protocol.external.sources=<path-to-xml-file>

    # protocol types
    protocolfactory.types=<list-of-protocols-separated-by-commas>  (e.g. ftp,http,https,sftp)

    # Protocol factories per type (there must be one for each protocol mentioned in
    # protocolfactory.types -- the property must be named as such:
    # protocolfactory.<name-of-protocol-type>)
    protocolfactory.ftp=<path-to-java-protocolfactory-class>  (e.g. org.apache.oodt.cas.protocol.ftp.FtpClientFactory)
    protocolfactory.http=<path-to-java-protocolfactory-class>
    protocolfactory.https=<path-to-java-protocolfactory-class>
    protocolfactory.sftp=<path-to-java-protocolfactory-class>

    # configuration to make java.net.URL accept unsupported protocols -- must exist exactly as shown
    java.protocol.handler.pkgs=org.apache.oodt.cas.url.handlers

To specify which external sites to crawl, you must create an XML file that lists each site along with the information needed to crawl it, such as a username and password. protocol.external.sources must contain the path to this file so the crawler knows where to find it. You can also train this class on how to crawl each given site. This training is specified in a second XML file, whose path is given in the first XML file (the one containing the username and password). The schema for the external-sites XML file is as follows:

    <sources>
       <source url="url-of-server">
          <username>username</username>
          <password>password</password>
          <dirstruct>path-to-xml-file</dirstruct>
          <crawl>yes-or-no</crawl>
       </source>
       ...
    </sources>

You may specify as many sources as you would like by adding multiple <source> tags. In the <source> tag, the attribute 'url' must be specified. This is the URL of the server you want the crawler to connect to, in the format <protocol>://<host> (e.g. sftp://remote.computer.gov). The <username> and <password> elements are optional and may be omitted if the site requires no credentials. <crawl> takes yes or no; it lets you keep a record of a site and its information in this XML file even after you decide you no longer need to crawl it (put <crawl>no</crawl> and the crawler will skip that site). <dirstruct> contains the path to another XML file, which is documented in the DirStruct.java javadoc; this element is also optional. If no <dirstruct> is given, every directory on the site will be crawled and every encountered file will be downloaded.
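As a concrete illustration, a minimal external-sites file matching the schema above might look like the following (the hosts, credentials, and file paths are hypothetical placeholders, not values from the OODT distribution):

```xml
<sources>
   <!-- A site crawled with credentials and a DirStruct training file -->
   <source url="sftp://remote.computer.gov">
      <username>crawler</username>
      <password>secret</password>
      <dirstruct>/etc/pushpull/dirstruct.xml</dirstruct>
      <crawl>yes</crawl>
   </source>
   <!-- A site kept on record but currently skipped; no credentials,
        so every directory would be crawled if <crawl> were yes -->
   <source url="http://data.example.org">
      <crawl>no</crawl>
   </source>
</sources>
```

Because the second source omits <dirstruct>, enabling it would cause the crawler to visit every directory on that site and download every file it encounters.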
Constructors:

FileRetrievalSystem(Config config, SiteInfo siteInfo)
    Creates a Crawler based on the Config and SiteInfo objects passed in.
public FileRetrievalSystem(Config config, SiteInfo siteInfo) throws InstantiationException
    Parameters:
        config - the Configuration object that is passed to this object's ProtocolHandler (parsed from the .properties file described above)
        siteInfo - the SiteInfo describing the remote sites to crawl
    Throws:
        InstantiationException
        DatabaseException
public void registerDownloadListener(DownloadListener dListener)
public void initialize() throws IOException
    Throws:
        IOException
public void clearErrorFlag()
public boolean isAlreadyInDatabase(RemoteFile rf) throws CatalogException
    Throws:
        CatalogException
public List<RemoteSiteFile> getNextPage(RemoteSiteFile dir, ProtocolFileFilter filter) throws RemoteConnectionException
    Throws:
        RemoteConnectionException
public void changeToRoot(RemoteSite remoteSite) throws ProtocolException, MalformedURLException
public void changeToHOME(RemoteSite remoteSite) throws ProtocolException, MalformedURLException
public void changeToDir(String dir, RemoteSite remoteSite) throws MalformedURLException, ProtocolException
public void changeToDir(RemoteSiteFile pFile) throws ProtocolException, MalformedURLException
public ProtocolFile getHomeDir(RemoteSite remoteSite) throws ProtocolException
    Throws:
        ProtocolException
public ProtocolFile getProtocolFile(RemoteSite remoteSite, String file, boolean isDir) throws ProtocolException
    Throws:
        ProtocolException
public ProtocolFile getCurrentFile(RemoteSite remoteSite) throws ProtocolFileException, ProtocolException, MalformedURLException
public boolean addToDownloadQueue(RemoteSite remoteSite, String file, String renamingString, File downloadToDir, String uniqueMetadataElement, boolean deleteAfterDownload) throws ToManyFailedDownloadsException, RemoteConnectionException, ProtocolFileException, ProtocolException, AlreadyInDatabaseException, UndefinedTypeException, CatalogException, IOException
public boolean validate(RemoteSite remoteSite)
public void waitUntilAllCurrentDownloadsAreComplete() throws ProtocolException
    Throws:
        ProtocolException
public boolean addToDownloadQueue(RemoteSiteFile file, String renamingString, File downloadToDir, String uniqueMetadataElement, boolean deleteAfterDownload) throws ToManyFailedDownloadsException, RemoteConnectionException, AlreadyInDatabaseException, UndefinedTypeException, CatalogException, IOException
public boolean isDownloading(ProtocolFile pFile)
public LinkedList<ProtocolFile> getCurrentlyDownloadingFiles()
public LinkedList<ProtocolFile> getListOfFailedDownloads()
public void clearFailedDownloadsList()
public void shutdown()
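Taken together, a typical crawl-and-download lifecycle can be sketched using only the methods documented on this page. The sketch below assumes that config, siteInfo, and remoteSite have been built elsewhere (e.g. from the .properties and sources XML files described above); the remote file path, download directory, and argument choices are hypothetical placeholders, and exception handling is elided:

```java
// Sketch: one crawl/download cycle against a single remote site.
FileRetrievalSystem frs = new FileRetrievalSystem(config, siteInfo);
frs.initialize();                                     // may throw IOException

if (frs.validate(remoteSite)) {
    // Queue a remote file for download. The nulls mean: no renaming
    // string and no unique metadata element (placeholder choices).
    frs.addToDownloadQueue(remoteSite,
                           "/pub/data/granule.dat",   // remote file (hypothetical)
                           null,                      // renamingString
                           new File("/tmp/downloads"),
                           null,                      // uniqueMetadataElement
                           false);                    // don't delete after download

    // Block until the download threads drain the queue.
    frs.waitUntilAllCurrentDownloadsAreComplete();

    if (!frs.getListOfFailedDownloads().isEmpty()) {
        frs.clearFailedDownloadsList();               // or re-queue and retry
    }
}
frs.shutdown();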
public boolean closeSessions() throws RemoteConnectionException
    Throws:
        RemoteConnectionException
Copyright © 1999-2014 Apache OODT. All Rights Reserved.