org.apache.any23.extractor.microdata
Class MicrodataExtractor

java.lang.Object
  extended by org.apache.any23.extractor.microdata.MicrodataExtractor
All Implemented Interfaces:
Extractor<Document>, Extractor.TagSoupDOMExtractor

public class MicrodataExtractor
extends Object
implements Extractor.TagSoupDOMExtractor

Default implementation of the Microdata extractor, based on TagSoupDOMExtractor.

Author:
Michele Mostarda (mostarda@fbk.eu), Davide Palmisano (dpalmisano@gmail.com)

Nested Class Summary
 
Nested classes/interfaces inherited from interface org.apache.any23.extractor.Extractor
Extractor.BlindExtractor, Extractor.ContentExtractor, Extractor.TagSoupDOMExtractor
 
Field Summary
static ExtractorFactory<MicrodataExtractor> factory
           
 
Constructor Summary
MicrodataExtractor()
           
 
Method Summary
 ExtractorDescription getDescription()
          Returns an ExtractorDescription of this extractor.
 void run(ExtractionParameters extractionParameters, ExtractionContext extractionContext, Document in, ExtractionResult out)
          Performs the Microdata to RDF conversion algorithm.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

factory

public static final ExtractorFactory<MicrodataExtractor> factory
Constructor Detail

MicrodataExtractor

public MicrodataExtractor()
Method Detail

getDescription

public ExtractorDescription getDescription()
Description copied from interface: Extractor
Returns an ExtractorDescription of this extractor.

Specified by:
getDescription in interface Extractor<Document>
Returns:
the object representing the extractor description.

run

public void run(ExtractionParameters extractionParameters,
                ExtractionContext extractionContext,
                Document in,
                ExtractionResult out)
         throws IOException,
                ExtractionException
Performs the Microdata to RDF conversion algorithm. The implementation departs slightly from the specification algorithm: steps 5.2.1, 5.2.2, 5.2.3 and 5.2.4 are skipped if step 5.2.6 does not detect any Microdata.

Specified by:
run in interface Extractor<Document>
Parameters:
extractionParameters - the parameters to be applied during the extraction.
extractionContext - the document context.
in - the extractor input data.
out - the collector for the extracted data.
Throws:
IOException - on error while reading from the input stream.
ExtractionException - on other errors, such as parse errors.
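
The modification described for run can be pictured as a "detect first, then emit" control flow. The following is a minimal, self-contained sketch of that idea only. It does not use the Any23 API, and the names DetectFirstSketch, Page, items and documentFacts are hypothetical stand-ins introduced for illustration. What steps 5.2.1 to 5.2.4 actually produce is not modelled here; the "document" entries are placeholders for that work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Any23 code: shows only the ordering of the checks.
final class DetectFirstSketch {

    // Hypothetical stand-in for a parsed document.
    static final class Page {
        final List<String> items;          // detected Microdata items (role of step 5.2.6)
        final List<String> documentFacts;  // placeholder for document-level work (steps 5.2.1 to 5.2.4)

        Page(List<String> items, List<String> documentFacts) {
            this.items = items;
            this.documentFacts = documentFacts;
        }
    }

    // Returns the emitted statements; empty when no Microdata is detected.
    static List<String> extract(Page page) {
        List<String> out = new ArrayList<>();
        // Detection is evaluated before any document-level work.
        if (page.items.isEmpty()) {
            return out; // nothing detected, so the document-level steps are skipped
        }
        // Document-level steps run only once Microdata is known to be present.
        for (String fact : page.documentFacts) {
            out.add("document: " + fact);
        }
        // Item-level output.
        for (String item : page.items) {
            out.add("item: " + item);
        }
        return out;
    }

    public static void main(String[] args) {
        Page plain = new Page(Collections.<String>emptyList(), Arrays.asList("title"));
        Page annotated = new Page(Arrays.asList("Person"), Arrays.asList("title"));
        System.out.println(extract(plain));      // prints []
        System.out.println(extract(annotated));  // prints [document: title, item: Person]
    }
}
```

Under this ordering, a page without Microdata yields no output at all, rather than document-level statements with no items attached to them.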


Copyright © 2010-2012 The Apache Software Foundation. All Rights Reserved.