java.lang.Object
  org.apache.any23.extractor.rdfa.RDFaExtractor
public class RDFaExtractor
Extractor for RDFa in HTML, based on Fabien Gandon's XSLT transform. It works by first parsing the HTML with a TagSoup parser, then applying the XSLT to the resulting DOM tree, and finally parsing the RDF/XML that the transform produces.
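The three-stage pipeline described above (markup to DOM, XSLT over the DOM, RDF/XML out) can be sketched with the JDK's own JAXP APIs. This is a minimal sketch under stated assumptions, not Any23's implementation: it parses well-formed XHTML with the JDK `DocumentBuilder` instead of a TagSoup parser, it uses a tiny hypothetical stylesheet (`TOY_XSLT`) that only turns `about`/`property`/`content` attributes into reified RDF statements instead of the full RDFa transform, and it stops before the last stage (parsing the RDF/XML into statements), which needs an RDF parser.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Sketch of the markup -> DOM -> XSLT -> RDF/XML pipeline using only JDK APIs. */
public class RdfaPipelineSketch {

    // Hypothetical toy stylesheet (NOT the RDFa XSLT bundled with RDFaExtractor):
    // every element carrying @about and @property becomes one reified RDF statement
    // whose object is the literal found in @content.
    static final String TOY_XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'><rdf:RDF>"
        + "<xsl:for-each select='//*[@about and @property]'><rdf:Statement>"
        + "<rdf:subject rdf:resource='{@about}'/>"
        + "<rdf:predicate rdf:resource='{@property}'/>"
        + "<rdf:object><xsl:value-of select='@content'/></rdf:object>"
        + "</rdf:Statement></xsl:for-each>"
        + "</rdf:RDF></xsl:template>"
        + "</xsl:stylesheet>";

    /** Assumes well-formed XHTML; the real extractor relies on TagSoup for messy HTML. */
    static String transform(String xhtml) {
        try {
            // Stage 1: markup -> namespace-aware DOM tree.
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document dom = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));

            // Stage 2: apply the stylesheet to the DOM, producing RDF/XML text.
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(TOY_XSLT)));
            StringWriter rdfXml = new StringWriter();
            t.transform(new DOMSource(dom), new StreamResult(rdfXml));
            return rdfXml.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Pipeline sketch failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
            + "<span about='http://example.org/doc'"
            + " property='http://purl.org/dc/terms/title'"
            + " content='Hello'>Hello</span></body></html>";
        // Stage 3 (parsing this RDF/XML into statements) needs an RDF parser and is omitted.
        System.out.println(transform(page));
    }
}
```

Splitting the work this way keeps the RDFa logic entirely in the stylesheet, so the Java side only has to supply a DOM and consume RDF/XML.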
Nested Class Summary |
---|
Nested classes/interfaces inherited from interface org.apache.any23.extractor.Extractor |
Extractor.BlindExtractor, Extractor.ContentExtractor, Extractor.TagSoupDOMExtractor |
Field Summary | |
---|---|
static ExtractorFactory<RDFaExtractor> | factory |
static String | NAME |
static String | xsltFilename |
Constructor Summary | |
---|---|
RDFaExtractor() | Default constructor: data types are not verified, and the parser does not stop at the first error. |
RDFaExtractor(boolean verifyDataType, boolean stopAtFirstError) | Constructor that allows specifying the data-type validation and error-handling policies. |
Method Summary | | |
---|---|---|
ExtractorDescription | getDescription() | Returns an ExtractorDescription of this extractor. |
static XSLTStylesheet | getXSLT() | Returns an XSLTStylesheet able to distill RDFa from HTML pages. |
boolean | isStopAtFirstError() | |
boolean | isVerifyDataType() | |
void | run(ExtractionParameters extractionParameters, ExtractionContext extractionContext, Document in, ExtractionResult out) | Executes the extractor. |
void | setStopAtFirstError(boolean stopAtFirstError) | |
void | setVerifyDataType(boolean verifyDataType) | |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String NAME
public static final String xsltFilename
public static final ExtractorFactory<RDFaExtractor> factory
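The Field Summary pairs a `NAME` constant with a typed static `factory` field. The sketch below illustrates that general pattern (a registry looks an extractor up by name and asks its factory for an instance) using hypothetical types: `SimpleFactory`, `DemoExtractor`, `REGISTRY`, and the name `"demo-rdfa"` are illustrative only. It does not reproduce Any23's `ExtractorFactory` API, whose methods are not documented on this page, and creating a fresh instance per call is this sketch's own choice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical illustration of a static, typed factory field paired with a NAME constant. */
public class FactoryFieldSketch {

    /** Hypothetical minimal factory: a registration name plus a way to build instances. */
    static final class SimpleFactory<T> {
        private final String name;
        private final Supplier<T> creator;

        SimpleFactory(String name, Supplier<T> creator) {
            this.name = name;
            this.creator = creator;
        }

        String getName() { return name; }

        T create() { return creator.get(); }
    }

    /** Hypothetical extractor exposing the same kind of static members as the Field Summary. */
    static final class DemoExtractor {
        public static final String NAME = "demo-rdfa";
        public static final SimpleFactory<DemoExtractor> factory =
                new SimpleFactory<>(NAME, DemoExtractor::new);
    }

    /** Hypothetical registry: callers look extractors up by name, not by class. */
    static final Map<String, SimpleFactory<?>> REGISTRY = new HashMap<>();

    static {
        REGISTRY.put(DemoExtractor.factory.getName(), DemoExtractor.factory);
    }

    public static void main(String[] args) {
        Object first = REGISTRY.get("demo-rdfa").create();
        Object second = REGISTRY.get("demo-rdfa").create();
        System.out.println(first instanceof DemoExtractor); // true
        System.out.println(first != second);                 // true: a fresh instance per call
    }
}
```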
Constructor Detail |
---|
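public RDFaExtractor(boolean verifyDataType, boolean stopAtFirstError)
Constructor that allows specifying the data-type validation and error-handling policies.
Parameters:
verifyDataType - if true, the data types will be verified; if false, they will be ignored.
stopAtFirstError - if true, the parser will stop at the first parsing error; if false, it will ignore non-blocking errors.
public RDFaExtractor()
Default constructor: data types are not verified, and the parser does not stop at the first error.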
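The two constructors follow a common delegation pattern: the no-argument form behaves like the two-argument form with both policies switched off. The sketch below models only that flag handling, plus the matching bean-style getters and setters listed in the Method Summary, in a hypothetical class `PolicyHolder`. The `onNonBlockingError` hook is also hypothetical; it illustrates the documented meaning of `stopAtFirstError` (stop at the first error versus ignore non-blocking errors) and is not Any23 source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the two policies; models the flags only, not RDFaExtractor itself. */
public class PolicyHolder {

    private boolean verifyDataType;   // true: data types are verified; false: ignored
    private boolean stopAtFirstError; // true: stop at the first error; false: skip non-blocking errors

    /** Same defaults as the documented no-argument constructor: both policies off. */
    public PolicyHolder() {
        this(false, false);
    }

    public PolicyHolder(boolean verifyDataType, boolean stopAtFirstError) {
        this.verifyDataType = verifyDataType;
        this.stopAtFirstError = stopAtFirstError;
    }

    public boolean isVerifyDataType() { return verifyDataType; }
    public void setVerifyDataType(boolean verifyDataType) { this.verifyDataType = verifyDataType; }
    public boolean isStopAtFirstError() { return stopAtFirstError; }
    public void setStopAtFirstError(boolean stopAtFirstError) { this.stopAtFirstError = stopAtFirstError; }

    /** Hypothetical hook: abort on a non-blocking error, or record it and carry on. */
    public void onNonBlockingError(String message, List<String> ignored) {
        if (stopAtFirstError) {
            throw new IllegalStateException("Stopped at first error: " + message);
        }
        ignored.add(message);
    }

    public static void main(String[] args) {
        PolicyHolder lenient = new PolicyHolder();
        List<String> ignored = new ArrayList<>();
        lenient.onNonBlockingError("unknown datatype", ignored);
        System.out.println(ignored); // [unknown datatype]

        PolicyHolder strict = new PolicyHolder(true, true);
        try {
            strict.onNonBlockingError("unknown datatype", new ArrayList<>());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Stopped at first error: unknown datatype
        }
    }
}
```

Delegating from the no-argument constructor keeps the defaults in one place, so the two constructors cannot drift apart.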
Method Detail |
---|
public static XSLTStylesheet getXSLT()
Returns an XSLTStylesheet able to distill RDFa from HTML pages.
Returns:
a non-null XSLT instance.
public boolean isVerifyDataType()
public void setVerifyDataType(boolean verifyDataType)
public boolean isStopAtFirstError()
public void setStopAtFirstError(boolean stopAtFirstError)
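The `getXSLT()` method listed above is static, so every caller can share one stylesheet instead of re-parsing it. The sketch below shows one common way to get that effect with plain JAXP, under stated assumptions: the stylesheet is compiled once into a `javax.xml.transform.Templates` object (which is thread-safe), cached in a lazily initialized static field, and each use creates its own `Transformer` (which is not thread-safe). The class `CachedStylesheet` and its inline stylesheet are hypothetical; this does not claim that Any23's `XSLTStylesheet` works this way.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical lazily compiled, shared stylesheet. */
public class CachedStylesheet {

    // Hypothetical stand-in for a bundled stylesheet: outputs the root element's name.
    private static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:value-of select='name(/*)'/></xsl:template>"
        + "</xsl:stylesheet>";

    private static Templates templates; // compiled once, then shared by all callers

    /** Compiles the stylesheet on first use; synchronized so compilation happens only once. */
    public static synchronized Templates getTemplates() {
        if (templates == null) {
            try {
                templates = TransformerFactory.newInstance()
                        .newTemplates(new StreamSource(new StringReader(XSLT)));
            } catch (TransformerConfigurationException e) {
                throw new IllegalStateException("Could not compile stylesheet", e);
            }
        }
        return templates;
    }

    /** Transformers are not thread-safe, so each call gets its own from the shared Templates. */
    public static String apply(String xml) {
        try {
            Transformer t = getTemplates().newTransformer();
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(getTemplates() == getTemplates()); // true
        System.out.println(apply("<html><body/></html>"));     // html
    }
}
```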
public void run(ExtractionParameters extractionParameters, ExtractionContext extractionContext, Document in, ExtractionResult out) throws IOException, ExtractionException
Description copied from interface: Extractor
Executes the extractor.
Specified by:
run in interface Extractor<Document>
Parameters:
extractionParameters - the parameters to be applied during the extraction.
extractionContext - the document context.
in - the extractor input data.
out - the collector for the extracted data.
Throws:
IOException - on error while reading from the input stream.
ExtractionException - on other errors, such as parse errors.
public ExtractorDescription getDescription()
Description copied from interface: Extractor
Returns an ExtractorDescription of this extractor.
Specified by:
getDescription in interface Extractor<Document>
Returns:
an ExtractorDescription of this extractor.
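The `run` contract documented above takes a DOM `Document`, pushes results into a collector (`out`) instead of returning them, and separates I/O failures (`IOException`) from other failures such as parse errors (`ExtractionException`). The sketch below mirrors that shape with hypothetical, self-contained stand-ins: `TripleSink` plays the role of `ExtractionResult` and `SketchExtractionException` the role of `ExtractionException`. It does not use Any23's classes, and its toy "extraction" merely reads `about`/`property`/`content` attributes; it is not an RDFa processor.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RunContractSketch {

    /** Hypothetical collector, standing in for ExtractionResult. */
    interface TripleSink {
        void writeTriple(String subject, String predicate, String object);
    }

    /** Hypothetical checked exception, standing in for ExtractionException. */
    static class SketchExtractionException extends Exception {
        SketchExtractionException(String message) { super(message); }
    }

    /**
     * Toy extraction: each element with @about and @property yields one triple whose
     * object is @content. IOException is declared only to mirror the documented
     * signature; this sketch reads an already parsed DOM and never throws it.
     */
    static void run(Document in, TripleSink out) throws IOException, SketchExtractionException {
        NodeList elements = in.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            Element e = (Element) elements.item(i);
            if (!e.hasAttribute("property")) {
                continue;
            }
            if (!e.hasAttribute("about")) {
                // Treated as a parse-level error and reported through the checked exception.
                throw new SketchExtractionException("@property without @about on <" + e.getTagName() + ">");
            }
            out.writeTriple(e.getAttribute("about"), e.getAttribute("property"), e.getAttribute("content"));
        }
    }

    /** Parses well-formed markup into a DOM (an assumption; Any23 feeds a TagSoup DOM). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xml, e);
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> triples = new ArrayList<>();
        run(parse("<html><span about='http://example.org/doc'"
                + " property='http://purl.org/dc/terms/title' content='Hello'/></html>"),
            (s, p, o) -> triples.add(s + " " + p + " " + o));
        System.out.println(triples);
    }
}
```

Passing a collector rather than returning a list lets an extractor stream statements as it finds them, and lets the caller decide how to store them.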