SingleDocumentExtraction (Apache Any23 :: Core 0.7.0-incubating-SNAPSHOT API)


org.apache.any23.extractor
Class SingleDocumentExtraction

java.lang.Object
  org.apache.any23.extractor.SingleDocumentExtraction

public class SingleDocumentExtraction
extends Object

This class acts as a facade through which all the extractors are invoked on a single document.
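As a rough illustration of the facade idea, the following Java sketch runs several extractors over one document and routes all of their output to a single handler. It is a simplified model, not Any23 code: `MiniExtractor`, `CollectingHandler` and `MiniSingleDocumentExtraction` are hypothetical names, and triples are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the facade idea; none of these types are Any23 classes.
public class FacadeSketch {

    // Stand-in for an extractor: produces "triples" (plain strings here) from document content.
    interface MiniExtractor {
        List<String> extract(String content);
    }

    // Stand-in for a triple handler that simply collects whatever it receives.
    static class CollectingHandler {
        final List<String> triples = new ArrayList<>();

        void receive(String triple) {
            triples.add(triple);
        }
    }

    // The facade: one document, several extractors, one output handler.
    static class MiniSingleDocumentExtraction {
        private final String document;
        private final List<MiniExtractor> extractors;
        private final CollectingHandler output;

        MiniSingleDocumentExtraction(String document, List<MiniExtractor> extractors,
                                     CollectingHandler output) {
            this.document = document;
            this.extractors = extractors;
            this.output = output;
        }

        // Calls every extractor on the same document and forwards each triple to the same handler.
        // Returns the number of triples emitted.
        int run() {
            int count = 0;
            for (MiniExtractor extractor : extractors) {
                for (String triple : extractor.extract(document)) {
                    output.receive(triple);
                    count++;
                }
            }
            return count;
        }
    }

    public static void main(String[] args) {
        MiniExtractor title = content -> List.of("doc title " + content);
        MiniExtractor length = content -> List.of("doc length " + content.length());
        CollectingHandler handler = new CollectingHandler();
        int emitted = new MiniSingleDocumentExtraction("hello", List.of(title, length), handler).run();
        System.out.println(emitted + " " + handler.triples); // 2 [doc title hello, doc length 5]
    }
}
```

The caller deals with one object and one output sink, regardless of how many extractors end up running.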

Field Summary
`static String`	`EXTRACTION_CONTEXT_URI_PROPERTY`
`static String`	`METADATA_DOMAIN_PER_ENTITY_FLAG`
`static String`	`METADATA_NESTING_FLAG`
`static String`	`METADATA_TIMESIZE_FLAG`

Constructor Summary
`SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorFactory<?> factory, TripleHandler output)` Builds an extractor by the specification of document source, extractors factory and output triple handler.
`SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorGroup extractors, TripleHandler output)` Builds an extractor by the specification of document source, list of extractors and output triple handler.
`SingleDocumentExtraction(DocumentSource in, ExtractorFactory<?> factory, TripleHandler output)` Builds an extractor by the specification of document source, extractors factory and output triple handler, using the `DefaultConfiguration`.

Method Summary
`String`	`getDetectedMIMEType()` Returns the detected MIME type for the given `DocumentSource`.
`List<Extractor>`	`getMatchingExtractors()`
`String`	`getParserEncoding()`
`boolean`	`hasMatchingExtractors()` Checks whether the content of the given `DocumentSource` activates at least one extractor.
`SingleDocumentExtractionReport`	`run()` Triggers the execution of all the `Extractor` instances registered to this class using the default extraction parameters.
`SingleDocumentExtractionReport`	`run(ExtractionParameters extractionParameters)` Triggers the execution of all the `Extractor` instances registered to this class using the specified extraction parameters.
`void`	`setLocalCopyFactory(LocalCopyFactory copyFactory)` Sets the internal factory used to generate the local copy of the document; if `null`, the `MemCopyFactory` will be used.
`void`	`setMIMETypeDetector(MIMETypeDetector detector)` Sets the internal MIME type detector; if `null`, MIME type detection will be skipped and all extractors will be activated.
`void`	`setParserEncoding(String encoding)` Sets the document parser encoding.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

EXTRACTION_CONTEXT_URI_PROPERTY

public static final String EXTRACTION_CONTEXT_URI_PROPERTY

See Also:: Constant Field Values

METADATA_TIMESIZE_FLAG

public static final String METADATA_TIMESIZE_FLAG

See Also:: Constant Field Values

METADATA_NESTING_FLAG

public static final String METADATA_NESTING_FLAG

See Also:: Constant Field Values

METADATA_DOMAIN_PER_ENTITY_FLAG

public static final String METADATA_DOMAIN_PER_ENTITY_FLAG

See Also:: Constant Field Values

Constructor Detail

SingleDocumentExtraction

public SingleDocumentExtraction(Configuration configuration,
                                DocumentSource in,
                                ExtractorGroup extractors,
                                TripleHandler output)

Builds an extractor by the specification of document source, list of extractors and output triple handler.

Parameters:: configuration - configuration applied during extraction.; in - input document source.; extractors - list of extractors to be applied.; output - output triple handler.

SingleDocumentExtraction

public SingleDocumentExtraction(Configuration configuration,
                                DocumentSource in,
                                ExtractorFactory<?> factory,
                                TripleHandler output)

Builds an extractor by the specification of document source, extractors factory and output triple handler.

Parameters:: configuration - configuration applied during extraction.; in - input document source.; factory - the extractors factory.; output - output triple handler.

SingleDocumentExtraction

public SingleDocumentExtraction(DocumentSource in,
                                ExtractorFactory<?> factory,
                                TripleHandler output)

Builds an extractor by the specification of document source, extractors factory and output triple handler, using the DefaultConfiguration.

Parameters:: in - input document source.; factory - the extractors factory.; output - output triple handler.
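The three-argument constructor differs from the four-argument ones only in using `DefaultConfiguration`. A common way to express such overloads is constructor chaining, sketched below under assumed names (`MiniConfiguration`, `DEFAULT_CONFIGURATION`, the `example.property` key). It illustrates the pattern only and is not the Any23 implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of constructor overloads that fall back to a shared default configuration.
// MiniConfiguration and DEFAULT_CONFIGURATION are illustrative names, not Any23 API.
public class ConstructorSketch {

    // Stand-in for a configuration: a copy of property name -> value pairs.
    static class MiniConfiguration {
        private final Map<String, String> properties;

        MiniConfiguration(Map<String, String> properties) {
            this.properties = new HashMap<>(properties);
        }

        String get(String key) {
            return properties.get(key);
        }
    }

    // Plays the role that DefaultConfiguration plays for the three-argument constructor.
    static final MiniConfiguration DEFAULT_CONFIGURATION =
            new MiniConfiguration(Map.of("example.property", "default"));

    static class MiniExtraction {
        final MiniConfiguration configuration;
        final String source;

        // Full form: the caller supplies the configuration explicitly.
        MiniExtraction(MiniConfiguration configuration, String source) {
            this.configuration = configuration;
            this.source = source;
        }

        // Short form: delegates to the full form with the default configuration.
        MiniExtraction(String source) {
            this(DEFAULT_CONFIGURATION, source);
        }
    }

    public static void main(String[] args) {
        MiniExtraction implicit = new MiniExtraction("doc");
        MiniExtraction explicit = new MiniExtraction(
                new MiniConfiguration(Map.of("example.property", "custom")), "doc");
        System.out.println(implicit.configuration.get("example.property")); // default
        System.out.println(explicit.configuration.get("example.property")); // custom
    }
}
```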

Method Detail

setLocalCopyFactory

public void setLocalCopyFactory(LocalCopyFactory copyFactory)

Sets the internal factory used to generate the local copy of the document; if null, the MemCopyFactory will be used.

Parameters:: copyFactory - local copy factory.
See Also:: DocumentSource
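The fallback to an in-memory copy when `null` is passed can be modeled as follows. `CopyFactory` and `IN_MEMORY` are hypothetical stand-ins for `LocalCopyFactory` and `MemCopyFactory`. The idea that the local copy exists so the content can be re-read (for example once for MIME detection and once per extractor) is an assumption, not something this page states.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical model of a pluggable "local copy" factory with an in-memory fallback.
// CopyFactory and IN_MEMORY are illustrative names; they are not LocalCopyFactory/MemCopyFactory.
public class LocalCopySketch {

    // Stand-in for a local copy factory: drains a stream into bytes that can be re-read freely.
    interface CopyFactory {
        byte[] copy(InputStream in) throws IOException;
    }

    // In-memory default, analogous in spirit to MemCopyFactory (readAllBytes needs Java 9+).
    static final CopyFactory IN_MEMORY = InputStream::readAllBytes;

    static class Extraction {
        private CopyFactory copyFactory = IN_MEMORY;

        // Passing null restores the in-memory default, mirroring the documented behavior.
        void setLocalCopyFactory(CopyFactory factory) {
            this.copyFactory = (factory == null) ? IN_MEMORY : factory;
        }

        CopyFactory localCopyFactory() {
            return copyFactory;
        }
    }

    public static void main(String[] args) throws IOException {
        Extraction extraction = new Extraction();
        extraction.setLocalCopyFactory(null);
        byte[] local = extraction.localCopyFactory()
                .copy(new ByteArrayInputStream("<html></html>".getBytes(StandardCharsets.UTF_8)));
        // The same bytes can back any number of fresh streams (assumed reason for a local copy).
        InputStream first = new ByteArrayInputStream(local);
        InputStream second = new ByteArrayInputStream(local);
        System.out.println(first.available() + " " + second.available()); // 13 13
    }
}
```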

setMIMETypeDetector

public void setMIMETypeDetector(MIMETypeDetector detector)

Sets the internal MIME type detector; if null, MIME type detection will be skipped and all extractors will be activated.

Parameters:: detector - detector instance.
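The documented detector behavior (skip detection and activate every extractor when the detector is `null`, otherwise activate only extractors matching the detected type) can be sketched as a simple filter. `Candidate`, `Detector` and `activate` are hypothetical names, and matching on an exact MIME string is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of MIME-based extractor activation; Candidate and Detector are not Any23 API.
public class MimeFilterSketch {

    // An extractor described only by its name and the MIME types it declares it can handle.
    static class Candidate {
        final String name;
        final Set<String> mimeTypes;

        Candidate(String name, Set<String> mimeTypes) {
            this.name = name;
            this.mimeTypes = mimeTypes;
        }
    }

    // Stand-in for a MIME type detector.
    interface Detector {
        String detect(String content);
    }

    // With no detector, detection is skipped and every candidate is activated;
    // otherwise only candidates declaring the detected type are activated.
    static List<String> activate(List<Candidate> candidates, Detector detector, String content) {
        List<String> active = new ArrayList<>();
        String detected = (detector == null) ? null : detector.detect(content);
        for (Candidate candidate : candidates) {
            if (detector == null || (detected != null && candidate.mimeTypes.contains(detected))) {
                active.add(candidate.name);
            }
        }
        return active;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
                new Candidate("html-extractor", Set.of("text/html")),
                new Candidate("turtle-extractor", Set.of("text/turtle")));
        Detector naive = content -> content.startsWith("<") ? "text/html" : "text/turtle";
        System.out.println(activate(candidates, naive, "<html></html>")); // [html-extractor]
        System.out.println(activate(candidates, null, "<html></html>"));  // [html-extractor, turtle-extractor]
    }
}
```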

run

public SingleDocumentExtractionReport run(ExtractionParameters extractionParameters)
                                   throws ExtractionException,
                                          IOException

Triggers the execution of all the Extractor instances registered to this class using the specified extraction parameters.

Parameters:: extractionParameters - the parameters applied to the run execution.
Returns:: the report generated by the extraction.
Throws:: ExtractionException - if an error occurred during the data extraction.; IOException - if an error occurred during the data access.

run

public SingleDocumentExtractionReport run()
                                   throws IOException,
                                          ExtractionException

Triggers the execution of all the Extractor instances registered to this class using the default extraction parameters.

Returns:: the extraction report.
Throws:: IOException; ExtractionException
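The no-argument `run()` is described as using the default extraction parameters, which suggests the usual overload-delegation pattern. The sketch below shows that pattern with hypothetical `Params` and `Report` classes and an invented `validate` flag; it does not describe the actual fields of `ExtractionParameters` or `SingleDocumentExtractionReport`.

```java
// Hypothetical model of a no-argument run() that delegates to run(parameters) with defaults.
// Params, Report and the "validate" flag are illustrative only.
public class RunOverloadSketch {

    static class Params {
        final boolean validate;

        Params(boolean validate) {
            this.validate = validate;
        }

        // Hypothetical default parameter set.
        static Params defaults() {
            return new Params(false);
        }
    }

    static class Report {
        final boolean validated;

        Report(boolean validated) {
            this.validated = validated;
        }
    }

    static class Extraction {
        // Convenience overload: identical to run(parameters) with the default parameter set.
        Report run() {
            return run(Params.defaults());
        }

        // The real work would happen here; the report just echoes which parameters were applied.
        Report run(Params parameters) {
            return new Report(parameters.validate);
        }
    }

    public static void main(String[] args) {
        Extraction extraction = new Extraction();
        System.out.println(extraction.run().validated);                 // false
        System.out.println(extraction.run(new Params(true)).validated); // true
    }
}
```

Keeping a single code path in the parameterized overload means the default and explicit runs cannot drift apart.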

getDetectedMIMEType

public String getDetectedMIMEType()
                           throws IOException

Returns the detected MIME type for the given DocumentSource.

Returns:: string containing the detected MIME type.
Throws:: IOException - if an error occurred while accessing the data.

hasMatchingExtractors

public boolean hasMatchingExtractors()
                              throws IOException

Checks whether the content of the given DocumentSource activates at least one extractor.

Returns:: true if at least one extractor is activated, false otherwise.
Throws:: IOException

getMatchingExtractors

public List<Extractor> getMatchingExtractors()

Returns:: the list of all the activated extractors for the given DocumentSource.
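Only `hasMatchingExtractors()` declares `IOException`, while `getMatchingExtractors()` does not. One plausible arrangement, assumed here rather than confirmed by this page, is that the former performs detection and caches the matches while the latter just returns them. The sketch models that assumption with hypothetical names (`MimeSource`, the name-to-MIME map).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model relating hasMatchingExtractors() and getMatchingExtractors();
// MimeSource and the name -> MIME map are illustrative, not Any23 API.
public class MatchingSketch {

    // Stand-in for MIME detection, which may need to read the document and can therefore fail.
    interface MimeSource {
        String detect() throws IOException;
    }

    static class Extraction {
        private final MimeSource source;
        private final Map<String, String> acceptedMime; // extractor name -> MIME type it accepts
        private List<String> matching = new ArrayList<>();

        Extraction(MimeSource source, Map<String, String> acceptedMime) {
            this.source = source;
            this.acceptedMime = new LinkedHashMap<>(acceptedMime); // keeps the caller's order
        }

        // Performs detection (hence IOException), caches the matches and reports whether any exist.
        boolean hasMatchingExtractors() throws IOException {
            String mime = source.detect();
            List<String> found = new ArrayList<>();
            for (Map.Entry<String, String> entry : acceptedMime.entrySet()) {
                if (entry.getValue().equals(mime)) {
                    found.add(entry.getKey());
                }
            }
            matching = found;
            return !found.isEmpty();
        }

        // No I/O: returns a copy of the matches cached by the last detection (empty before any).
        List<String> getMatchingExtractors() {
            return new ArrayList<>(matching);
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> accepted = new LinkedHashMap<>();
        accepted.put("html-extractor", "text/html");
        accepted.put("turtle-extractor", "text/turtle");
        Extraction extraction = new Extraction(() -> "text/html", accepted);
        System.out.println(extraction.getMatchingExtractors()); // []
        System.out.println(extraction.hasMatchingExtractors()); // true
        System.out.println(extraction.getMatchingExtractors()); // [html-extractor]
    }
}
```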

getParserEncoding

public String getParserEncoding()

Returns:: the configured parser encoding.

setParserEncoding

public void setParserEncoding(String encoding)

Sets the document parser encoding.

Parameters:: encoding - parser encoding.
