java.lang.Object
org.apache.any23.extractor.SingleDocumentExtraction
public class SingleDocumentExtraction
This class acts as a facade through which all the extractors are invoked on a single document.
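To make the facade idea concrete, here is a minimal, self-contained sketch that does not depend on Any23. `MiniExtractor` and `MiniSingleDocumentExtraction` are hypothetical stand-ins for the real Extractor and SingleDocumentExtraction types, and a plain `Consumer<String>` plays the role of the TripleHandler. The real class also deals with MIME type detection, local copies and extraction reports, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for an Any23 extractor: reads one document and
// emits zero or more triples (here plain strings) to a sink.
interface MiniExtractor {
    void extract(String document, Consumer<String> tripleSink);
}

// Hypothetical facade loosely modelled on SingleDocumentExtraction: it binds
// one document, a group of extractors and one output handler together.
final class MiniSingleDocumentExtraction {
    private final String document;
    private final List<MiniExtractor> extractors;
    private final Consumer<String> output;

    MiniSingleDocumentExtraction(String document, List<MiniExtractor> extractors, Consumer<String> output) {
        this.document = document;
        this.extractors = new ArrayList<>(extractors);
        this.output = output;
    }

    // Calls every registered extractor on the same document, in registration
    // order, and returns how many extractors were run.
    int run() {
        for (MiniExtractor extractor : extractors) {
            extractor.extract(document, output);
        }
        return extractors.size();
    }
}
```

The caller only sees one object and one `run()` call; how the extractors are looped over and where their output goes stays hidden behind the facade.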
Field Summary | |
---|---|
static String | EXTRACTION_CONTEXT_URI_PROPERTY |
static String | METADATA_DOMAIN_PER_ENTITY_FLAG |
static String | METADATA_NESTING_FLAG |
static String | METADATA_TIMESIZE_FLAG |
Constructor Summary | |
---|---|
SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorFactory<?> factory, TripleHandler output) | Builds an extractor from the specified document source, extractor factory and output triple handler. |
SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorGroup extractors, TripleHandler output) | Builds an extractor from the specified document source, list of extractors and output triple handler. |
SingleDocumentExtraction(DocumentSource in, ExtractorFactory<?> factory, TripleHandler output) | Builds an extractor from the specified document source, extractor factory and output triple handler, using the DefaultConfiguration. |
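The three constructors differ only in how much the caller specifies. One plausible way to express such overloads is constructor chaining, where the shorter forms fill in a default and delegate to the most general one. The sketch below uses hypothetical stand-in types (`SketchConfiguration`, `SketchExtraction`) with extractors represented by name strings; it illustrates the pattern and is not the Any23 source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Configuration: just a bag of properties.
final class SketchConfiguration {
    static final SketchConfiguration DEFAULT = new SketchConfiguration(Map.of());

    final Map<String, String> properties;

    SketchConfiguration(Map<String, String> properties) {
        this.properties = properties;
    }
}

// Hypothetical extraction holder illustrating constructor chaining.
final class SketchExtraction {
    final SketchConfiguration configuration;
    final String source;
    final List<String> extractorNames;

    // Most general form: explicit configuration plus a group of extractors.
    SketchExtraction(SketchConfiguration configuration, String source, List<String> extractorNames) {
        this.configuration = configuration;
        this.source = source;
        this.extractorNames = Collections.unmodifiableList(new ArrayList<>(extractorNames));
    }

    // Single-extractor form: wraps the one extractor into a one-element group.
    SketchExtraction(SketchConfiguration configuration, String source, String extractorName) {
        this(configuration, source, Collections.singletonList(extractorName));
    }

    // No configuration supplied: fall back to the shared default.
    SketchExtraction(String source, String extractorName) {
        this(SketchConfiguration.DEFAULT, source, extractorName);
    }
}
```

Keeping all field assignments in one constructor means the defaulting rules live in exactly one place per overload.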
Method Summary | | |
---|---|---|
String | getDetectedMIMEType() | Returns the detected MIME type for the given DocumentSource. |
List<Extractor> | getMatchingExtractors() | |
String | getParserEncoding() | |
boolean | hasMatchingExtractors() | Checks whether the content of the given DocumentSource activates at least one extractor. |
SingleDocumentExtractionReport | run() | Triggers the execution of all the Extractor instances registered to this class using the default extraction parameters. |
SingleDocumentExtractionReport | run(ExtractionParameters extractionParameters) | Triggers the execution of all the Extractor instances registered to this class using the specified extraction parameters. |
void | setLocalCopyFactory(LocalCopyFactory copyFactory) | Sets the internal factory for generating the document local copy; if null, the MemCopyFactory will be used. |
void | setMIMETypeDetector(MIMETypeDetector detector) | Sets the internal MIME type detector; if null, MIME type detection will be skipped and all extractors will be activated. |
void | setParserEncoding(String encoding) | Sets the document parser encoding. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String EXTRACTION_CONTEXT_URI_PROPERTY
public static final String METADATA_TIMESIZE_FLAG
public static final String METADATA_NESTING_FLAG
public static final String METADATA_DOMAIN_PER_ENTITY_FLAG
Constructor Detail |
---|
public SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorGroup extractors, TripleHandler output)
Builds an extractor from the specified document source, list of extractors and output triple handler.
Parameters:
configuration - configuration applied during extraction.
in - input document source.
extractors - list of extractors to be applied.
output - output triple handler.

public SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorFactory<?> factory, TripleHandler output)
Builds an extractor from the specified document source, extractor factory and output triple handler.
Parameters:
configuration - configuration applied during extraction.
in - input document source.
factory - the extractor factory.
output - output triple handler.

public SingleDocumentExtraction(DocumentSource in, ExtractorFactory<?> factory, TripleHandler output)
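Builds an extractor from the specified document source, extractor factory and output triple handler, using the DefaultConfiguration.
Parameters:
in - input document source.
factory - the extractor factory.
output - output triple handler.

Method Detail |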
---|
public void setLocalCopyFactory(LocalCopyFactory copyFactory)
Sets the internal factory for generating the document local copy; if null, the MemCopyFactory will be used.
Parameters:
copyFactory - local copy factory.
See Also:
DocumentSource
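A local copy matters because several extractors may each need to read the same document, while a network stream can usually be consumed only once. The sketch below assumes an in-memory strategy in the spirit of MemCopyFactory: it drains the source once and hands every reader a fresh stream over the buffered bytes. `InMemoryCopy` is a hypothetical name, not the Any23 class, and holding the whole document in memory is only reasonable for modestly sized inputs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical in-memory local copy: the source stream is read exactly once,
// after which any number of independent readers can be opened.
final class InMemoryCopy {
    private final byte[] content;

    InMemoryCopy(InputStream source) throws IOException {
        // Drain and close the original stream; it is not needed afterwards.
        try (InputStream in = source) {
            this.content = in.readAllBytes();
        }
    }

    // Every call returns a new stream positioned at the start of the document.
    InputStream openInputStream() {
        return new ByteArrayInputStream(content);
    }

    // Number of buffered bytes.
    int size() {
        return content.length;
    }
}
```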
public void setMIMETypeDetector(MIMETypeDetector detector)
Sets the internal MIME type detector; if null, MIME type detection will be skipped and all extractors will be activated.
Parameters:
detector - detector instance.

public SingleDocumentExtractionReport run(ExtractionParameters extractionParameters) throws ExtractionException, IOException
Triggers the execution of all the Extractor instances registered to this class using the specified extraction parameters.
Parameters:
extractionParameters - the parameters applied to the run execution.
Throws:
ExtractionException - if an error occurred during the data extraction.
IOException - if an error occurred during the data access.

public SingleDocumentExtractionReport run() throws IOException, ExtractionException
Triggers the execution of all the Extractor instances registered to this class using the default extraction parameters.
Throws:
IOException
ExtractionException
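The two run overloads suggest a common pattern in which the no-argument form only supplies default parameters and delegates to the parameterized form. A minimal sketch of that pattern follows; `SketchParameters` (with a single illustrative `validate` switch) and `SketchRunner` are hypothetical, and a `String` stands in for the SingleDocumentExtractionReport.

```java
// Hypothetical parameter object; the real ExtractionParameters carries more settings.
final class SketchParameters {
    final boolean validate;

    SketchParameters(boolean validate) {
        this.validate = validate;
    }

    // Illustrative defaults: validation switched off.
    static SketchParameters defaults() {
        return new SketchParameters(false);
    }
}

final class SketchRunner {
    // The no-argument overload only supplies defaults and delegates.
    String run() {
        return run(SketchParameters.defaults());
    }

    // A String stands in for the SingleDocumentExtractionReport.
    String run(SketchParameters parameters) {
        return parameters.validate ? "validated run" : "plain run";
    }
}
```

Delegating this way keeps a single code path for the actual extraction, so the two overloads cannot drift apart.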
public String getDetectedMIMEType() throws IOException
Returns the detected MIME type for the given DocumentSource.
Throws:
IOException - if an error occurred while accessing the data.

public boolean hasMatchingExtractors() throws IOException
Checks whether the content of the given DocumentSource activates at least one extractor.
Returns:
true if at least one extractor is activated, false otherwise.
Throws:
IOException
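The matching rule described for setMIMETypeDetector and hasMatchingExtractors can be sketched as a filter: with no detector every extractor matches, otherwise only extractors declaring the detected MIME type do. `ExtractorSpec`, `MimeFilter` and the `Function<String, String>` detector are hypothetical, and the exact string comparison is a simplification; real MIME type matching may also have to handle wildcards and media type parameters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical extractor descriptor: a name plus the MIME types it supports.
final class ExtractorSpec {
    final String name;
    final Set<String> mimeTypes;

    ExtractorSpec(String name, Set<String> mimeTypes) {
        this.name = name;
        this.mimeTypes = mimeTypes;
    }
}

final class MimeFilter {
    // A null detector means detection is skipped and every extractor matches.
    static List<ExtractorSpec> matching(List<ExtractorSpec> all, Function<String, String> detector, String document) {
        if (detector == null) {
            return new ArrayList<>(all);
        }
        String detected = detector.apply(document);
        List<ExtractorSpec> result = new ArrayList<>();
        for (ExtractorSpec spec : all) {
            if (spec.mimeTypes.contains(detected)) {
                result.add(spec);
            }
        }
        return result;
    }

    // True when the document activates at least one extractor.
    static boolean hasMatching(List<ExtractorSpec> all, Function<String, String> detector, String document) {
        return !matching(all, detector, document).isEmpty();
    }
}
```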
public List<Extractor> getMatchingExtractors()
Returns:
the matching extractors for the given DocumentSource.

public String getParserEncoding()
public void setParserEncoding(String encoding)
Sets the document parser encoding.
Parameters:
encoding - parser encoding.
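The parser encoding matters because the same bytes decode to different text under different charsets. The helper below is a hypothetical illustration that uses only `java.nio.charset`; `EncodingDemo` is not an Any23 class, and the UTF-8 fallback for a null encoding is an assumption of this sketch, not documented Any23 behavior.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingDemo {
    // Decodes raw document bytes with the named encoding. Falling back to
    // UTF-8 when no encoding is set is an assumption of this sketch.
    static String decode(byte[] bytes, String encoding) {
        Charset charset = (encoding == null) ? StandardCharsets.UTF_8 : Charset.forName(encoding);
        return new String(bytes, charset);
    }
}
```

For example, the UTF-8 bytes of "café" decoded as ISO-8859-1 come out as "cafÃ©", because the two-byte UTF-8 sequence for "é" is read as two separate Latin-1 characters.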