java.lang.Object org.apache.any23.Any23
public class Any23
A facade with convenience methods for typical Any23 extraction operations.
Field Summary | |
---|---|
static String | DEFAULT_HTTP_CLIENT_USER_AGENT: Default HTTP User Agent defined in default configuration. |
protected static org.slf4j.Logger | logger |
static String | VERSION: Any23 core library version. |
Constructor Summary | |
---|---|
Any23() | Constructor with default configuration. |
Any23(Configuration configuration) | Constructor accepting Configuration. |
Any23(Configuration configuration, ExtractorGroup extractorGroup) | Constructor that allows the specification of a custom configuration and of a list of extractors. |
Any23(Configuration configuration, String... extractorNames) | Constructor that allows the specification of a custom configuration and of a list of extractor names. |
Any23(ExtractorGroup extractorGroup) | Constructor that allows the specification of a list of extractors. |
Any23(String... extractorNames) | Constructor that allows the specification of a list of extractor names. |
Method Summary | |
---|---|
DocumentSource | createDocumentSource(String documentURI): Returns the most appropriate DocumentSource for the given documentURI. |
ExtractionReport | extract(DocumentSource in, TripleHandler outputHandler): Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler. |
ExtractionReport | extract(DocumentSource in, TripleHandler outputHandler, String encoding): Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler. |
ExtractionReport | extract(ExtractionParameters eps, DocumentSource in, TripleHandler outputHandler): Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler. |
ExtractionReport | extract(ExtractionParameters eps, DocumentSource in, TripleHandler outputHandler, String encoding): Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler. |
ExtractionReport | extract(ExtractionParameters eps, String documentURI, TripleHandler outputHandler): Performs metadata extraction from the content of the given documentURI, sending the generated events to the specified outputHandler. |
ExtractionReport | extract(File file, TripleHandler outputHandler): Performs metadata extraction from the content of the given file, sending the generated events to the specified outputHandler. |
ExtractionReport | extract(String in, String documentURI, String contentType, String encoding, TripleHandler outputHandler): Performs metadata extraction on the in string associated with the documentURI URI, declaring contentType and encoding. |
ExtractionReport | extract(String in, String documentURI, TripleHandler outputHandler): Performs metadata extraction on the in string associated with the documentURI URI, sending the generated events to the specified outputHandler. |
ExtractionReport | extract(String documentURI, TripleHandler outputHandler): Performs metadata extraction from the content of the given documentURI, sending the generated events to the specified outputHandler. |
HTTPClient | getHTTPClient(): Returns the current HTTPClient implementation. |
String | getHTTPUserAgent(): Returns the HTTP Header User Agent, see RFC 2616-14.43. |
void | setCacheFactory(LocalCopyFactory cache): Sets a LocalCopyFactory instance. |
void | setHTTPClient(HTTPClient httpClient): Sets the HTTPClient implementation used to retrieve contents. |
void | setHTTPUserAgent(String userAgent): Sets the HTTP Header User Agent, see RFC 2616-14.43. |
void | setMIMETypeDetector(MIMETypeDetector detector): Sets an instance of MIMETypeDetector. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
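The HTTP client accessors carry a lifecycle constraint that is easy to miss in the summary: the method detail for setHTTPClient states that it throws IllegalStateException if invoked after the client has been initialized. The following is a minimal sketch of that kind of guard, assuming the client is created lazily on first access. ClientHolder, setClient and getClient are hypothetical stand-ins written for this illustration; they are not Any23 classes and do not reproduce the library's code.

```java
import java.util.function.Supplier;

/** Hypothetical holder illustrating a "configure before first use" guard. */
final class ClientHolder<C> {
    private final Supplier<C> defaultFactory;
    private C client;            // explicit replacement, or the lazily created default
    private boolean initialized; // becomes true on the first getClient() call

    ClientHolder(Supplier<C> defaultFactory) {
        this.defaultFactory = defaultFactory;
    }

    /** Replaces the client; rejected once the client has been handed out. */
    void setClient(C replacement) {
        if (replacement == null) {
            throw new NullPointerException("replacement must not be null");
        }
        if (initialized) {
            throw new IllegalStateException("client already initialized");
        }
        this.client = replacement;
    }

    /** Returns the configured client, creating the default on first access. */
    C getClient() {
        if (!initialized) {
            if (client == null) {
                client = defaultFactory.get();
            }
            initialized = true;
        }
        return client;
    }

    public static void main(String[] args) {
        ClientHolder<String> holder = new ClientHolder<>(() -> "default-client");
        holder.setClient("custom-client");      // allowed: nothing initialized yet
        System.out.println(holder.getClient()); // prints "custom-client"
        try {
            holder.setClient("too-late");       // rejected: client was already handed out
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The practical consequence for callers is ordering: under this assumption, any custom client or user agent has to be set before the first call that needs the client.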
Field Detail |
---|
public static final String VERSION
public static final String DEFAULT_HTTP_CLIENT_USER_AGENT
protected static final org.slf4j.Logger logger
Constructor Detail |
---|
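public Any23(Configuration configuration, ExtractorGroup extractorGroup)
Constructor that allows the specification of a custom configuration and of a list of extractors.
Parameters:
configuration - configuration used to build the Any23 instance.
extractorGroup - the group of extractors to be applied.

public Any23(ExtractorGroup extractorGroup)
Constructor that allows the specification of a list of extractors.
Parameters:
extractorGroup - the group of extractors to be applied.

public Any23(Configuration configuration, String... extractorNames)
Constructor that allows the specification of a custom configuration and of a list of extractor names.
Parameters:
extractorNames - list of extractor names.

public Any23(String... extractorNames)
Constructor that allows the specification of a list of extractor names.
Parameters:
extractorNames - list of extractor names.

public Any23(Configuration configuration)
Constructor accepting Configuration.

public Any23()
Constructor with default configuration.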
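
The six constructors cover the combinations of an optional configuration and an optional extractor selection. One common way to organize such overloads is to have the shorter ones fill in defaults and delegate to the most specific one. The sketch below shows that pattern with hypothetical stand-in types (Config, Facade); it is an illustration of the overload shape only and makes no claim about how the Any23 class is implemented.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class FacadeConstructors {
    /** Hypothetical stand-in for a configuration object. */
    static final class Config {
        final String name;
        Config(String name) { this.name = name; }
        static Config defaults() { return new Config("default"); }
    }

    /** Hypothetical facade whose overloads all funnel into one constructor. */
    static final class Facade {
        final Config config;
        final List<String> extractorNames;

        Facade() { this(Config.defaults()); }

        Facade(Config config) { this(config, new String[0]); }

        Facade(String... extractorNames) { this(Config.defaults(), extractorNames); }

        // Most specific overload: the only place where fields are assigned.
        Facade(Config config, String... extractorNames) {
            this.config = config;
            this.extractorNames =
                    Collections.unmodifiableList(Arrays.asList(extractorNames.clone()));
        }
    }

    public static void main(String[] args) {
        Facade byNames = new Facade("extractor-a", "extractor-b"); // placeholder names
        System.out.println(byNames.config.name);           // prints "default"
        System.out.println(byNames.extractorNames.size()); // prints 2
    }
}
```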
Method Detail |
---|
public void setHTTPUserAgent(String userAgent)
Sets the HTTP Header User Agent, see RFC 2616-14.43.
Parameters:
userAgent - text describing the user agent.

public String getHTTPUserAgent()
Returns the HTTP Header User Agent, see RFC 2616-14.43.

public void setHTTPClient(HTTPClient httpClient)
Sets the HTTPClient implementation used to retrieve contents. The default instance is DefaultHTTPClient.
Parameters:
httpClient - a valid client instance.
Throws:
IllegalStateException - if invoked after the client has been initialized.

public HTTPClient getHTTPClient() throws IOException
Returns the current HTTPClient implementation.
Throws:
IOException - if the HTTP client has not been initialized.

public void setCacheFactory(LocalCopyFactory cache)
Sets a LocalCopyFactory instance.
Parameters:
cache - valid cache instance.

public void setMIMETypeDetector(MIMETypeDetector detector)
Sets an instance of MIMETypeDetector.
Parameters:
detector - a valid detector instance; if null, all the detectors will be used.

public DocumentSource createDocumentSource(String documentURI) throws URISyntaxException, IOException
Returns the most appropriate DocumentSource for the given documentURI.
Parameters:
documentURI - the document URI.
Throws:
URISyntaxException - if an error occurs while parsing the documentURI as a URI.
IOException - if an error occurs while initializing the internal HTTPClient.

public ExtractionReport extract(ExtractionParameters eps, DocumentSource in, TripleHandler outputHandler, String encoding) throws IOException, ExtractionException
Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.
Parameters:
eps - the extraction parameters to be applied.
in - the input document source.
outputHandler - handler responsible for collecting the extracted metadata.
encoding - explicit encoding; see available encodings.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException
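
Since the encoding parameter is a plain String, a misspelled name would only surface at extraction time. The sketch below validates a name up front with the standard java.nio.charset API. It assumes the "available encodings" referred to above are the JVM's charset names, which this page does not state explicitly; EncodingCheck and isUsableEncoding are hypothetical helper names, not part of Any23.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

final class EncodingCheck {
    /**
     * Returns true if the running JVM supports a charset with the given name.
     * A null or syntactically illegal name is reported as unusable instead of
     * propagating an exception (a choice made for this sketch only).
     */
    static boolean isUsableEncoding(String name) {
        if (name == null) {
            return false;
        }
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false; // the name contains characters not allowed in charset names
        }
    }

    public static void main(String[] args) {
        System.out.println(isUsableEncoding("UTF-8"));          // prints true
        System.out.println(isUsableEncoding("not a charset!")); // prints false
    }
}
```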

public ExtractionReport extract(String in, String documentURI, String contentType, String encoding, TripleHandler outputHandler) throws IOException, ExtractionException
Performs metadata extraction on the in string associated with the documentURI URI, declaring contentType and encoding. The generated events are sent to the specified outputHandler.
Parameters:
in - raw data to be analyzed.
documentURI - URI from which the raw data has been extracted.
contentType - declared data content type.
encoding - declared data encoding.
outputHandler - handler responsible for collecting the extracted metadata.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException

public ExtractionReport extract(String in, String documentURI, TripleHandler outputHandler) throws IOException, ExtractionException
Performs metadata extraction on the in string associated with the documentURI URI, sending the generated events to the specified outputHandler.
Parameters:
in - raw data to be analyzed.
documentURI - URI from which the raw data has been extracted.
outputHandler - handler responsible for collecting the extracted metadata.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException

public ExtractionReport extract(File file, TripleHandler outputHandler) throws IOException, ExtractionException
Performs metadata extraction from the content of the given file, sending the generated events to the specified outputHandler.
Parameters:
file - file containing raw data.
outputHandler - handler responsible for collecting the extracted metadata.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException

public ExtractionReport extract(ExtractionParameters eps, String documentURI, TripleHandler outputHandler) throws IOException, ExtractionException
Performs metadata extraction from the content of the given documentURI, sending the generated events to the specified outputHandler. If the URI responds with a redirect, the redirect is followed.
Parameters:
eps - the parameters to be applied to the extraction.
documentURI - the URI from which to retrieve the document.
outputHandler - handler responsible for collecting the extracted metadata.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException

public ExtractionReport extract(String documentURI, TripleHandler outputHandler) throws IOException, ExtractionException
Performs metadata extraction from the content of the given documentURI, sending the generated events to the specified outputHandler. If the URI responds with a redirect, the redirect is followed.
Parameters:
documentURI - the URI from which to retrieve the document.
outputHandler - handler responsible for collecting the extracted metadata.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException

public ExtractionReport extract(DocumentSource in, TripleHandler outputHandler, String encoding) throws IOException, ExtractionException
Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.
Parameters:
in - the input document source.
outputHandler - handler responsible for collecting the extracted metadata.
encoding - explicit encoding; see available encodings.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException

public ExtractionReport extract(DocumentSource in, TripleHandler outputHandler) throws IOException, ExtractionException
Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.
Parameters:
in - the input document source.
outputHandler - handler responsible for collecting the extracted metadata.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException

public ExtractionReport extract(ExtractionParameters eps, DocumentSource in, TripleHandler outputHandler) throws IOException, ExtractionException
Performs metadata extraction from the content of the given in document source, sending the generated events to the specified outputHandler.
Parameters:
eps - the parameters to be applied for the extraction phase.
in - the input document source.
outputHandler - handler responsible for collecting the extracted metadata.
Returns:
an ExtractionReport describing the outcome of the extraction.
Throws:
IOException
ExtractionException
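
All extract variants share one calling shape: the caller supplies an input and a handler, the generated events are pushed to the handler, and a report object is returned. The sketch below imitates that flow with deliberately tiny, hypothetical stand-ins (StatementSink, Report, and a toy "key: value" line parser) so that it runs without the library. None of these types are Any23 classes, and the real TripleHandler and ExtractionReport interfaces are richer than what is shown here.

```java
import java.util.ArrayList;
import java.util.List;

final class HandlerFlowSketch {
    /** Hypothetical, much-reduced handler: one callback per emitted statement. */
    interface StatementSink {
        void receive(String subject, String predicate, String object);
    }

    /** Hypothetical report: only records how many statements were emitted. */
    static final class Report {
        final int statementCount;
        Report(int statementCount) { this.statementCount = statementCount; }
    }

    /**
     * Toy extraction: every "key: value" line of the input becomes one
     * statement about documentUri; lines without that shape are skipped.
     */
    static Report extract(String in, String documentUri, StatementSink sink) {
        int count = 0;
        for (String line : in.split("\n")) {
            int sep = line.indexOf(':');
            if (sep <= 0) {
                continue; // no key before the separator, or no separator at all
            }
            String key = line.substring(0, sep).trim();
            String value = line.substring(sep + 1).trim();
            sink.receive(documentUri, key, value); // push the event to the handler
            count++;
        }
        return new Report(count);
    }

    public static void main(String[] args) {
        List<String> collected = new ArrayList<>();
        Report report = extract(
                "title: Hello\nno separator here\nauthor: Ada",
                "http://example.org/doc",
                (s, p, o) -> collected.add(s + " " + p + " " + o));
        System.out.println(report.statementCount); // prints 2
        System.out.println(collected.get(0));      // prints "http://example.org/doc title Hello"
    }
}
```

The point of the push-style design is that the caller decides what happens to the output (collect, serialize, filter) by choosing the handler, while the return value carries summary information about the run.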