org.apache.any23
Class Any23

java.lang.Object
  extended by org.apache.any23.Any23

public class Any23
extends Object

A facade with convenience methods for typical Any23 extraction operations.

Author:
Richard Cyganiak (richard@cyganiak.de), Michele Mostarda (michele.mostarda@gmail.com)
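A facade of this kind usually offers short convenience overloads that fill in defaults and delegate to one general method. The minimal stdlib-only sketch below assumes that pattern. FacadeSketch, its String stand-ins for ExtractionParameters and DocumentSource, and the "null encoding means detect it" convention are all hypothetical and are not Any23's implementation.

```java
// Hypothetical sketch of the facade idea: short convenience overloads fill in
// defaults and delegate to one general method. Not Any23's actual source.
class FacadeSketch {

    // Stand-in for a default ExtractionParameters instance (assumption).
    static final String DEFAULT_PARAMETERS = "defaults";

    // Most general operation: the convenience overloads below all end up here.
    static String extract(String parameters, String source, String encoding) {
        // In this sketch a null encoding means "let the library detect it".
        String effectiveEncoding = (encoding == null) ? "<detected>" : encoding;
        return parameters + "|" + source + "|" + effectiveEncoding;
    }

    // Convenience overload: default parameters, explicit encoding.
    static String extract(String source, String encoding) {
        return extract(DEFAULT_PARAMETERS, source, encoding);
    }

    // Convenience overload: default parameters, detected encoding.
    static String extract(String source) {
        return extract(DEFAULT_PARAMETERS, source, null);
    }
}
```

For example, FacadeSketch.extract("doc") returns "defaults|doc|<detected>". Keeping the logic in a single general method means the convenience overloads cannot drift apart.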

Field Summary
static String DEFAULT_HTTP_CLIENT_USER_AGENT
          Default HTTP User-Agent defined in the default configuration.
protected static org.slf4j.Logger logger
           
static String VERSION
          Any23 core library version.
 
Constructor Summary
Any23()
          Constructor with default configuration.
Any23(Configuration configuration)
          Constructor accepting Configuration.
Any23(Configuration configuration, ExtractorGroup extractorGroup)
          Constructor that allows the specification of a custom configuration and of a list of extractors.
Any23(Configuration configuration, String... extractorNames)
          Constructor that allows the specification of a custom configuration and of a list of extractor names.
Any23(ExtractorGroup extractorGroup)
          Constructor that allows the specification of a list of extractors.
Any23(String... extractorNames)
          Constructor that allows the specification of a list of extractor names.
 
Method Summary
 DocumentSource createDocumentSource(String documentURI)
          Returns the most appropriate DocumentSource for the given documentURI.
 ExtractionReport extract(DocumentSource in, TripleHandler outputHandler)
          Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.
 ExtractionReport extract(DocumentSource in, TripleHandler outputHandler, String encoding)
          Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.
 ExtractionReport extract(ExtractionParameters eps, DocumentSource in, TripleHandler outputHandler)
          Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.
 ExtractionReport extract(ExtractionParameters eps, DocumentSource in, TripleHandler outputHandler, String encoding)
          Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.
 ExtractionReport extract(ExtractionParameters eps, String documentURI, TripleHandler outputHandler)
          Performs metadata extraction from the content of the given documentURI, sending the generated events to the specified outputHandler.
 ExtractionReport extract(File file, TripleHandler outputHandler)
          Performs metadata extraction from the content of the given file, sending the generated events to the specified outputHandler.
 ExtractionReport extract(String in, String documentURI, String contentType, String encoding, TripleHandler outputHandler)
          Performs metadata extraction on the in string associated with the documentURI URI, declaring contentType and encoding.
 ExtractionReport extract(String in, String documentURI, TripleHandler outputHandler)
          Performs metadata extraction on the in string associated with the documentURI URI, sending the generated events to the specified outputHandler.
 ExtractionReport extract(String documentURI, TripleHandler outputHandler)
          Performs metadata extraction from the content of the given documentURI, sending the generated events to the specified outputHandler.
 HTTPClient getHTTPClient()
          Returns the current HTTPClient implementation.
 String getHTTPUserAgent()
          Returns the HTTP User-Agent header value (see RFC 2616, section 14.43).
 void setCacheFactory(LocalCopyFactory cache)
          Sets the LocalCopyFactory instance.
 void setHTTPClient(HTTPClient httpClient)
          Sets the HTTPClient implementation used to retrieve content.
 void setHTTPUserAgent(String userAgent)
          Sets the HTTP User-Agent header value (see RFC 2616, section 14.43).
 void setMIMETypeDetector(MIMETypeDetector detector)
          Sets the MIMETypeDetector instance.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

VERSION

public static final String VERSION
Any23 core library version. NOTE: pom.xml also contains a version string; the two should match.


DEFAULT_HTTP_CLIENT_USER_AGENT

public static final String DEFAULT_HTTP_CLIENT_USER_AGENT
Default HTTP User-Agent defined in the default configuration.


logger

protected static final org.slf4j.Logger logger

Constructor Detail

Any23

public Any23(Configuration configuration,
             ExtractorGroup extractorGroup)
Constructor that allows the specification of a custom configuration and of a list of extractors.

Parameters:
configuration - configuration used to build the Any23 instance.
extractorGroup - the group of extractors to be applied.

Any23

public Any23(ExtractorGroup extractorGroup)
Constructor that allows the specification of a list of extractors.

Parameters:
extractorGroup - the group of extractors to be applied.

Any23

public Any23(Configuration configuration,
             String... extractorNames)
Constructor that allows the specification of a custom configuration and of a list of extractor names.

Parameters:
configuration - configuration used to build the Any23 instance.
extractorNames - list of extractor names.

Any23

public Any23(String... extractorNames)
Constructor that allows the specification of a list of extractor names.

Parameters:
extractorNames - list of extractor names.

Any23

public Any23(Configuration configuration)
Constructor accepting a custom Configuration.

Parameters:
configuration - configuration used to build the Any23 instance.

Any23

public Any23()
Constructor with default configuration.
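The six constructors can be pictured as a chain in which every shorter form supplies defaults and delegates to the most general one. The stdlib-only sketch below assumes that chaining. SketchConfig stands in for Configuration, a list of names stands in for ExtractorGroup, and "empty list means all extractors" is an assumption of the sketch, not documented Any23 behavior.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Any23's Configuration type.
class SketchConfig {
    final String name;
    SketchConfig(String name) { this.name = name; }
    static SketchConfig defaults() { return new SketchConfig("default"); }
}

// Hypothetical sketch of constructor chaining: every constructor delegates to the
// most general one. In this sketch an empty name list stands for "all extractors".
class ConstructorChainSketch {
    final SketchConfig configuration;
    final List<String> extractorNames;

    // Most general constructor.
    ConstructorChainSketch(SketchConfig configuration, String... extractorNames) {
        if (configuration == null) {
            throw new NullPointerException("configuration cannot be null");
        }
        this.configuration = configuration;
        this.extractorNames = Collections.unmodifiableList(Arrays.asList(extractorNames));
    }

    // Extractor names with the default configuration.
    ConstructorChainSketch(String... extractorNames) {
        this(SketchConfig.defaults(), extractorNames);
    }

    // Custom configuration, no explicit extractor selection.
    ConstructorChainSketch(SketchConfig configuration) {
        this(configuration, new String[0]);
    }

    // Default configuration, no explicit extractor selection.
    ConstructorChainSketch() {
        this(SketchConfig.defaults(), new String[0]);
    }
}
```

Using a dedicated configuration type, rather than a plain String, keeps the varargs constructors from becoming ambiguous when they are called.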

Method Detail

setHTTPUserAgent

public void setHTTPUserAgent(String userAgent)
Sets the HTTP User-Agent header value (see RFC 2616, section 14.43).

Parameters:
userAgent - text describing the user agent.

getHTTPUserAgent

public String getHTTPUserAgent()
Returns the HTTP User-Agent header value (see RFC 2616, section 14.43).

Returns:
text describing the user agent.
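The user agent string ends up as the User-Agent request header defined in RFC 2616, section 14.43. The sketch below uses the JDK 11+ java.net.http API to show what such a header looks like on a request. It is plain JDK code, not Any23's HTTPClient, and it only builds the request without sending anything. The agent string "example-crawler/1.0" is made up.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrates the User-Agent request header (RFC 2616, section 14.43) using the
// JDK 11+ java.net.http API. This is not Any23's HTTPClient; nothing is sent here.
class UserAgentSketch {
    static HttpRequest requestWithAgent(String uri, String userAgent) {
        return HttpRequest.newBuilder(URI.create(uri))
                .header("User-Agent", userAgent) // the kind of value setHTTPUserAgent configures
                .GET()
                .build();
    }
}
```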

setHTTPClient

public void setHTTPClient(HTTPClient httpClient)
Sets the HTTPClient implementation used to retrieve content. The default implementation is DefaultHTTPClient.

Parameters:
httpClient - a valid client instance.
Throws:
IllegalStateException - if invoked after the client has been initialized.

getHTTPClient

public HTTPClient getHTTPClient()
                         throws IOException
Returns the current HTTPClient implementation.

Returns:
instance of HTTPClient.
Throws:
IOException - if an error occurs while initializing the HTTP client.
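Taken together, setHTTPClient and getHTTPClient describe a client that is created on first use and cannot be replaced afterwards. The sketch below is a minimal stdlib-only model of that lazy-initialization guard, assuming this reading of the two methods. LazyClientHolder is hypothetical, and Object stands in for HTTPClient.

```java
import java.io.IOException;

// Hypothetical sketch of a lazily initialized client whose setter is rejected once
// the client is in use. Object stands in for an HTTPClient; not Any23's source.
class LazyClientHolder {
    private Object client;       // stand-in for an HTTPClient implementation
    private boolean initialized; // set on the first getClient() call

    void setClient(Object newClient) {
        if (initialized) {
            throw new IllegalStateException("Cannot replace the client after it has been initialized.");
        }
        client = newClient;
    }

    // Declared to mirror getHTTPClient(); this sketch never actually throws IOException,
    // but a real initialization step (network configuration, etc.) could fail.
    Object getClient() throws IOException {
        if (!initialized) {
            if (client == null) {
                client = new Object(); // default instance created on first use
            }
            initialized = true;
        }
        return client;
    }
}
```

In practice this means any custom client should be installed before the first call that needs network access.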

setCacheFactory

public void setCacheFactory(LocalCopyFactory cache)
Sets the LocalCopyFactory instance.

Parameters:
cache - a valid cache instance.

setMIMETypeDetector

public void setMIMETypeDetector(MIMETypeDetector detector)
Sets the MIMETypeDetector instance.

Parameters:
detector - a valid detector instance; if null, all the extractors will be applied.

createDocumentSource

public DocumentSource createDocumentSource(String documentURI)
                                    throws URISyntaxException,
                                           IOException
Returns the most appropriate DocumentSource for the given documentURI.

Parameters:
documentURI - the document URI.
Returns:
a new instance of DocumentSource.
Throws:
URISyntaxException - if an error occurs while parsing the documentURI as a URI.
IOException - if an error occurs while initializing the internal HTTPClient.
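"Most appropriate" suggests that the source is chosen from the URI. The stdlib-only sketch below assumes dispatch on the URI scheme. The http/https and file mapping, the returned labels, and the IllegalArgumentException for other schemes are assumptions for illustration. This page does not list the schemes Any23 actually supports.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical sketch: choose a kind of document source from the URI scheme.
// The scheme-to-source mapping is an assumption, not Any23's documented behavior.
class SourceDispatchSketch {
    static String sourceKindFor(String documentURI) throws URISyntaxException {
        if (documentURI == null) {
            throw new NullPointerException("documentURI cannot be null");
        }
        URI uri = new URI(documentURI); // malformed input raises URISyntaxException
        String scheme = (uri.getScheme() == null) ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        switch (scheme) {
            case "http":
            case "https":
                return "http-source"; // would be fetched through the HTTP client
            case "file":
                return "file-source"; // would be read from the local file system
            default:
                throw new IllegalArgumentException("Unsupported URI scheme: " + documentURI);
        }
    }
}
```

Under this model, the URISyntaxException in the throws clause comes from parsing, and an HTTP-backed source is where initializing the internal client (and its IOException) would come into play.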

extract

public ExtractionReport extract(ExtractionParameters eps,
                                DocumentSource in,
                                TripleHandler outputHandler,
                                String encoding)
                         throws IOException,
                                ExtractionException
Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.

Parameters:
eps - the extraction parameters to be applied.
in - the input document source.
outputHandler - handler responsible for collecting the extracted metadata.
encoding - explicit encoding to use; see available encodings.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException
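An explicit encoding matters because the same bytes decode to different text depending on the charset applied. The plain JDK illustration below shows this with java.nio.charset.Charset. It is not Any23's decoding logic. It only demonstrates why a wrong guess garbles non-ASCII content.

```java
import java.nio.charset.Charset;

// Plain JDK illustration (not Any23's decoding logic): the same bytes decode to
// different text depending on which character encoding is applied.
class EncodingSketch {
    static String decode(byte[] raw, String encoding) {
        // Charset.forName rejects unknown names with an unchecked exception.
        return new String(raw, Charset.forName(encoding));
    }
}
```

For example, decoding the UTF-8 bytes of "café" as ISO-8859-1 yields "cafÃ©". That is the kind of mismatch an explicit encoding avoids.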

extract

public ExtractionReport extract(String in,
                                String documentURI,
                                String contentType,
                                String encoding,
                                TripleHandler outputHandler)
                         throws IOException,
                                ExtractionException
Performs metadata extraction on the in string associated with the documentURI URI, declaring contentType and encoding. The generated events are sent to the specified outputHandler.

Parameters:
in - raw data to be analyzed.
documentURI - URI from which the raw data has been extracted.
contentType - declared data content type.
encoding - declared data encoding.
outputHandler - handler responsible for collecting the extracted metadata.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException
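This overload pairs an in-memory string with the metadata a caller declares for it. The sketch below is a hypothetical in-memory source holding the four values, assuming that consumers read bytes and that the declared encoding is used to produce them. InMemorySourceSketch is illustrative only and is not Any23's own string-backed DocumentSource.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical in-memory source: raw string plus the metadata a caller declares.
// Names and behavior are illustrative only, not an Any23 class.
class InMemorySourceSketch {
    final String content;
    final String documentURI;
    final String contentType; // declared, e.g. "text/html"
    final String encoding;    // declared, e.g. "UTF-8"

    InMemorySourceSketch(String content, String documentURI, String contentType, String encoding) {
        this.content = content;
        this.documentURI = documentURI;
        this.contentType = contentType;
        this.encoding = encoding;
    }

    // In this sketch consumers read bytes, so the string is encoded with the declared charset.
    InputStream openInputStream() {
        return new ByteArrayInputStream(content.getBytes(Charset.forName(encoding)));
    }
}
```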

extract

public ExtractionReport extract(String in,
                                String documentURI,
                                TripleHandler outputHandler)
                         throws IOException,
                                ExtractionException
Performs metadata extraction on the in string associated with the documentURI URI, sending the generated events to the specified outputHandler.

Parameters:
in - raw data to be analyzed.
documentURI - URI from which the raw data has been extracted.
outputHandler - handler responsible for collecting the extracted metadata.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException

extract

public ExtractionReport extract(File file,
                                TripleHandler outputHandler)
                         throws IOException,
                                ExtractionException
Performs metadata extraction from the content of the given file, sending the generated events to the specified outputHandler.

Parameters:
file - file containing raw data.
outputHandler - handler responsible for collecting the extracted metadata.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException
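Unlike the string and URI overloads, this one takes no documentURI, yet extracted metadata is normally tied to a document URI. A natural candidate for a local file is its absolute file: URI. The JDK sketch below derives one. Whether Any23 names file documents exactly this way is an assumption.

```java
import java.io.File;
import java.net.URI;

// Sketch: deriving a document URI for a local file with the JDK.
// Whether Any23 names file documents exactly this way is an assumption.
class FileUriSketch {
    static URI documentUriFor(File file) {
        // toURI() yields an absolute file: URI and percent-encodes unsafe characters.
        return file.getAbsoluteFile().toURI();
    }
}
```

A relative name such as "my page.html" becomes an absolute URI ending in "my%20page.html".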

extract

public ExtractionReport extract(ExtractionParameters eps,
                                String documentURI,
                                TripleHandler outputHandler)
                         throws IOException,
                                ExtractionException
Performs metadata extraction from the content of the given documentURI, sending the generated events to the specified outputHandler. If the request for the URI is answered with a redirect, the redirect is followed.

Parameters:
eps - the parameters to be applied to the extraction.
documentURI - the URI from which to retrieve the document.
outputHandler - handler responsible for collecting the extracted metadata.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException

extract

public ExtractionReport extract(String documentURI,
                                TripleHandler outputHandler)
                         throws IOException,
                                ExtractionException
Performs metadata extraction from the content of the given documentURI, sending the generated events to the specified outputHandler. If the request for the URI is answered with a redirect, the redirect is followed.

Parameters:
documentURI - the URI from which to retrieve the document.
outputHandler - handler responsible for collecting the extracted metadata.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException
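Both URI-based overloads are documented as following redirects. The stdlib-only model below resolves a redirect chain. A map from URI to Location target replaces real HTTP traffic, and the hop limit and loop detection are assumptions added for safety. It is not Any23's HTTP code.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of redirect resolution: each map entry sends a URI to the
// Location it redirects to. Not Any23's HTTP code; the hop limit is an assumption.
class RedirectSketch {
    static String resolve(String uri, Map<String, String> redirects, int maxHops) {
        Set<String> seen = new HashSet<>();
        String current = uri;
        for (int hop = 0; redirects.containsKey(current); hop++) {
            if (hop >= maxHops || !seen.add(current)) {
                throw new IllegalStateException("Too many redirects or a loop at " + current);
            }
            current = redirects.get(current); // follow the Location target
        }
        return current; // the final URI, whose content would be extracted
    }
}
```

With the JDK's own client, the comparable switch is HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).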

extract

public ExtractionReport extract(DocumentSource in,
                                TripleHandler outputHandler,
                                String encoding)
                         throws IOException,
                                ExtractionException
Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.

Parameters:
in - the input document source.
outputHandler - handler responsible for collecting the extracted metadata.
encoding - explicit encoding to use; see available encodings.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException

extract

public ExtractionReport extract(DocumentSource in,
                                TripleHandler outputHandler)
                         throws IOException,
                                ExtractionException
Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.

Parameters:
in - the input document source.
outputHandler - handler responsible for collecting the extracted metadata.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException
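Every extract variant pushes the generated events into the supplied outputHandler, and the handler decides what to do with them. The sketch below uses a heavily simplified, hypothetical event interface with a collecting implementation to show that push model. SimpleTripleSink and CollectingSink are illustrative names. Any23's real TripleHandler interface has richer signatures, and this sketch treats every object as a plain literal for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, heavily simplified stand-in for a triple handler: the extraction
// pushes events into it, and the handler decides what to do with them.
interface SimpleTripleSink {
    void startDocument(String documentURI);
    void receiveTriple(String subject, String predicate, String object);
    void endDocument(String documentURI);
}

// Collects triples in memory as N-Triples-style lines (objects treated as literals).
class CollectingSink implements SimpleTripleSink {
    private final List<String> lines = new ArrayList<>();
    private int openDocuments = 0;

    public void startDocument(String documentURI) { openDocuments++; }

    public void receiveTriple(String s, String p, String o) {
        if (openDocuments == 0) {
            throw new IllegalStateException("Triple received outside a document.");
        }
        lines.add("<" + s + "> <" + p + "> \"" + o + "\" .");
    }

    public void endDocument(String documentURI) { openDocuments--; }

    List<String> lines() { return Collections.unmodifiableList(lines); }
}
```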

extract

public ExtractionReport extract(ExtractionParameters eps,
                                DocumentSource in,
                                TripleHandler outputHandler)
                         throws IOException,
                                ExtractionException
Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.

Parameters:
eps - the parameters to be applied for the extraction phase.
in - the input document source.
outputHandler - handler responsible for collecting the extracted metadata.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException


Copyright © 2010-2012 The Apache Software Foundation. All Rights Reserved.