org.apache.any23.extractor.html
Class SpeciesExtractor

java.lang.Object
  extended by org.apache.any23.extractor.html.MicroformatExtractor
      extended by org.apache.any23.extractor.html.EntityBasedMicroformatExtractor
          extended by org.apache.any23.extractor.html.SpeciesExtractor
All Implemented Interfaces:
Extractor<Document>, Extractor.TagSoupDOMExtractor

public class SpeciesExtractor
extends EntityBasedMicroformatExtractor

Extractor for the Species Microformat. The extracted data are represented using the BBC Wildlife Ontology.

Author:
Davide Palmisano (dpalmisano@gmail.com)
See Also:
WO

Nested Class Summary
 
Nested classes/interfaces inherited from interface org.apache.any23.extractor.Extractor
Extractor.BlindExtractor, Extractor.ContentExtractor, Extractor.TagSoupDOMExtractor
 
Field Summary
static ExtractorFactory<SpeciesExtractor> factory
           
 
Fields inherited from class org.apache.any23.extractor.html.MicroformatExtractor
BEGIN_SCRIPT, END_SCRIPT, valueFactory
 
Constructor Summary
SpeciesExtractor()
           
 
Method Summary
protected  boolean extractEntity(Node node, ExtractionResult out)
          Extracts an entity from a DOM node.
protected  String getBaseClassName()
          Returns the base class name for the extractor.
 ExtractorDescription getDescription()
          Returns the description of this extractor.
protected  void resetExtractor()
          Resets the internal status of the extractor to prepare it for a new extraction session.
 
Methods inherited from class org.apache.any23.extractor.html.EntityBasedMicroformatExtractor
extract, getBlankNodeFor
 
Methods inherited from class org.apache.any23.extractor.html.MicroformatExtractor
addBNodeProperty, addBNodeProperty, addURIProperty, conditionallyAddLiteralProperty, conditionallyAddResourceProperty, conditionallyAddStringProperty, fixLink, fixLink, getCurrentExtractionResult, getDocumentURI, getExtractionContext, getHTMLDocument, includes, openSubResult, run
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

factory

public static final ExtractorFactory<SpeciesExtractor> factory
Constructor Detail

SpeciesExtractor

public SpeciesExtractor()
Method Detail

getDescription

public ExtractorDescription getDescription()
Returns the description of this extractor.

Specified by:
getDescription in interface Extractor<Document>
Specified by:
getDescription in class MicroformatExtractor
Returns:
a human-readable description.

getBaseClassName

protected String getBaseClassName()
Returns the base class name for the extractor.

Specified by:
getBaseClassName in class EntityBasedMicroformatExtractor
Returns:
a string containing the base class name of the extractor.

resetExtractor

protected void resetExtractor()
Resets the internal status of the extractor to prepare it for a new extraction session.

Specified by:
resetExtractor in class EntityBasedMicroformatExtractor

extractEntity

protected boolean extractEntity(Node node,
                                ExtractionResult out)
                         throws ExtractionException
Extracts an entity from a DOM node.

Specified by:
extractEntity in class EntityBasedMicroformatExtractor
Parameters:
node - the DOM node.
out - the extraction result collector.
Returns:
true if the extraction has produced something, false otherwise.
Throws:
ExtractionException
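
The three protected methods above (getBaseClassName, resetExtractor, extractEntity) read like hooks of a template-method design, where the inherited extract method drives them. The following is a minimal sketch of how such a contract could work, written against the JDK only. It is not Any23 code: EntityExtractionSketch, Collector, EntityBasedSketch, SpeciesSketch and run are hypothetical names, Collector merely stands in for ExtractionResult, and the class names "biota" and "vernacular" are assumptions taken from the Species microformat draft rather than from this page.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Simplified, hypothetical model of an entity-based extractor; not the Any23 implementation. */
class EntityExtractionSketch {

    /** Hypothetical stand-in for ExtractionResult: collects plain-text statements. */
    static class Collector {
        final List<String> statements = new ArrayList<>();
    }

    /** Hypothetical base class exposing the same three hooks as the page above. */
    abstract static class EntityBasedSketch {
        protected abstract String getBaseClassName();
        protected abstract void resetExtractor();
        protected abstract boolean extractEntity(Node node, Collector out);

        /** Resets state, then calls extractEntity on every element carrying the base class. */
        boolean extract(Document doc, Collector out) {
            resetExtractor();
            boolean foundAny = false;
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element element = (Element) all.item(i);
                if (hasClass(element, getBaseClassName())) {
                    foundAny |= extractEntity(element, out);
                }
            }
            return foundAny;
        }
    }

    /** Returns true if the element's class attribute lists the given name. */
    static boolean hasClass(Element element, String name) {
        return Arrays.asList(element.getAttribute("class").trim().split("\\s+")).contains(name);
    }

    /** Hypothetical species extractor; "biota" and "vernacular" are assumed class names. */
    static class SpeciesSketch extends EntityBasedSketch {
        private int entityCount;

        @Override protected String getBaseClassName() { return "biota"; }

        @Override protected void resetExtractor() { entityCount = 0; }

        @Override protected boolean extractEntity(Node node, Collector out) {
            entityCount++;
            boolean produced = false;
            NodeList descendants = ((Element) node).getElementsByTagName("*");
            for (int i = 0; i < descendants.getLength(); i++) {
                Element child = (Element) descendants.item(i);
                if (hasClass(child, "vernacular")) {
                    out.statements.add("entity-" + entityCount + " vernacular "
                            + child.getTextContent().trim());
                    produced = true;
                }
            }
            return produced;
        }
    }

    /** Parses well-formed XHTML and returns the statements collected by SpeciesSketch. */
    static List<String> run(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            Collector out = new Collector();
            new SpeciesSketch().extract(doc, out);
            return out.statements;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(
                "<div><span class=\"biota\"><span class=\"vernacular\">Blackbird</span></span></div>"));
    }
}
```

In this sketch the boolean returned by extractEntity mirrors the documented contract: it is true only when the entity node yielded at least one statement. The real extractor emits RDF statements through ExtractionResult rather than strings.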


Copyright © 2010-2012 The Apache Software Foundation. All Rights Reserved.