org.apache.any23.extractor.html
Class EntityBasedMicroformatExtractor
java.lang.Object
org.apache.any23.extractor.html.MicroformatExtractor
org.apache.any23.extractor.html.EntityBasedMicroformatExtractor
- All Implemented Interfaces:
- Extractor<Document>, Extractor.TagSoupDOMExtractor
- Direct Known Subclasses:
- AdrExtractor, GeoExtractor, HCardExtractor, HListingExtractor, HRecipeExtractor, HResumeExtractor, HReviewExtractor, SpeciesExtractor
public abstract class EntityBasedMicroformatExtractor
- extends MicroformatExtractor
Base class for microformat extractors based on entities.
- Author:
- Gabriele Renzi
Method Summary
boolean
extract()
Performs the extraction of the data and writes them to the model.
protected abstract boolean
extractEntity(Node node, ExtractionResult out)
Extracts an entity from a DOM node.
protected abstract String
getBaseClassName()
Returns the base class name for the extractor.
protected org.openrdf.model.BNode
getBlankNodeFor(Node node)
protected abstract void
resetExtractor()
Resets the internal status of the extractor to prepare it for a new extraction section.
Methods inherited from class org.apache.any23.extractor.html.MicroformatExtractor
addBNodeProperty, addBNodeProperty, addURIProperty, conditionallyAddLiteralProperty, conditionallyAddResourceProperty, conditionallyAddStringProperty, fixLink, fixLink, getCurrentExtractionResult, getDescription, getDocumentURI, getExtractionContext, getHTMLDocument, includes, openSubResult, run
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
EntityBasedMicroformatExtractor
public EntityBasedMicroformatExtractor()
getBaseClassName
protected abstract String getBaseClassName()
- Returns the base class name for the extractor.
- Returns:
- a string containing the base class name of the extractor.
resetExtractor
protected abstract void resetExtractor()
- Resets the internal status of the extractor to prepare it for a new extraction section.
extractEntity
protected abstract boolean extractEntity(Node node,
ExtractionResult out)
throws ExtractionException
- Extracts an entity from a DOM node.
- Parameters:
node
- the DOM node.
out
- the extraction result collector.
- Returns:
- true if the extraction has produced something, false otherwise.
- Throws:
ExtractionException
extract
public boolean extract()
throws ExtractionException
- Description copied from class:
MicroformatExtractor
- Performs the extraction of the data and writes them to the model.
The nodes generated in the model can have any name or implicit label
but if possible they SHOULD have names (either URIs or AnonId) that
are uniquely derivable from their position in the DOM tree, so that
multiple extractors can merge information.
- Specified by:
extract
in class MicroformatExtractor
- Throws:
ExtractionException
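The page does not spell out how extract() uses the three abstract methods. The sketch below is one plausible reading, stated as an assumption rather than the library's actual implementation: extract() visits every element whose class attribute includes the base class name, resets the extractor, and delegates to extractEntity. It uses only JDK DOM types. EntityExtractorSketch, GeoSketch, parse, and the List<String> collector standing in for ExtractionResult are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical stand-in for the base class; no Any23 types are used.
abstract class EntityExtractorSketch {
    protected abstract String getBaseClassName();
    protected abstract void resetExtractor();
    // The List<String> plays the role of the extraction result collector.
    protected abstract boolean extractEntity(Node node, List<String> out);

    // Assumed control flow: one extractEntity call per matching element.
    public boolean extract(Document doc, List<String> out) {
        boolean foundAny = false;
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            List<String> classes =
                Arrays.asList(el.getAttribute("class").trim().split("\\s+"));
            if (!classes.contains(getBaseClassName())) {
                continue;
            }
            resetExtractor(); // assumption: state is cleared before each entity
            foundAny |= extractEntity(el, out);
        }
        return foundAny;
    }
}

// Toy subclass: treats the text of every class="geo" element as one entity.
class GeoSketch extends EntityExtractorSketch {
    @Override protected String getBaseClassName() { return "geo"; }
    @Override protected void resetExtractor() { /* no per-entity state here */ }

    @Override protected boolean extractEntity(Node node, List<String> out) {
        String text = node.getTextContent().trim();
        if (text.isEmpty()) {
            return false;
        }
        out.add(text);
        return true;
    }

    // Parses well-formed markup; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<div><span class=\"geo\">37.4, -122.1</span><p>other</p></div>");
        List<String> out = new ArrayList<>();
        System.out.println(new GeoSketch().extract(doc, out) + " " + out);
    }
}
```

In this reading a concrete extractor only supplies the class name to look for and the per-node logic, while the traversal stays in the base class.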
getBlankNodeFor
protected org.openrdf.model.BNode getBlankNodeFor(Node node)
- Parameters:
node
- a DOM node representing a blank node
- Returns:
- an RDF blank node corresponding to that DOM node, identified by a
blank node ID of the form "MD5 of http://doc-uri/#xpath/to/node"
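The ID scheme described for getBlankNodeFor can be illustrated without any Any23 classes. The following is a minimal sketch, assuming the hashed key is the document URI joined to the node's XPath with "#" and that the digest is rendered as lowercase hex. The exact key format and encoding used by the library are not stated here, and BlankNodeIdSketch, md5Hex, and blankNodeId are hypothetical names.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not part of Any23.
class BlankNodeIdSketch {

    // Returns the MD5 digest of the input as 32 lowercase hex characters.
    static String md5Hex(String input) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(input.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Assumed key layout: <document URI>#<XPath of the node>.
    static String blankNodeId(String documentUri, String xpath) {
        return md5Hex(documentUri + "#" + xpath);
    }

    public static void main(String[] args) {
        String uri = "http://example.org/page.html";
        // The same document and position always yield the same ID.
        System.out.println(blankNodeId(uri, "/HTML[1]/BODY[1]/DIV[2]"));
        System.out.println(blankNodeId(uri, "/HTML[1]/BODY[1]/DIV[2]"));
    }
}
```

Because the ID depends only on the document URI and the node's position, two extractors that visit the same node would arrive at the same blank node, which is what lets their output be merged.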
Copyright © 2010-2012 The Apache Software Foundation. All Rights Reserved.