RDFaExtractor (Apache Any23 :: Core 0.7.0-incubating-SNAPSHOT API)

org.apache.any23.extractor.rdfa
Class RDFaExtractor

java.lang.Object
  org.apache.any23.extractor.rdfa.RDFaExtractor

All Implemented Interfaces: Extractor<Document>, Extractor.TagSoupDOMExtractor

public class RDFaExtractor
extends Object
implements Extractor.TagSoupDOMExtractor

Extractor for RDFa in HTML, based on Fabien Gandon's XSLT transform. It works by first parsing the HTML with a tag-soup parser, then applying the XSLT to the resulting DOM tree, and finally parsing the RDF/XML that the transform produces.

Author: Gabriele Renzi, Richard Cyganiak (richard@cyganiak.de)
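The three-stage pipeline described above (parse to DOM, apply XSLT, parse the transform's output) can be sketched with the JDK's `javax.xml` APIs alone. This is a minimal sketch under stated assumptions, not RDFaExtractor's implementation: the JDK's DOM parser stands in for the tag-soup parser, so the input must be well-formed XHTML; `TOY_XSLT` is a hypothetical stylesheet that emits one `<triple>` element per element carrying a `property` attribute instead of RDF/XML; and `XsltPipelineSketch` and `extract` are made-up names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XsltPipelineSketch {

    // Hypothetical toy stylesheet, NOT the one RDFaExtractor uses: it emits one
    // <triple> per element carrying a `property` attribute instead of RDF/XML.
    static final String TOY_XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/'><triples>"
        + "<xsl:for-each select='//*[@property]'>"
        + "<triple predicate='{@property}' object='{normalize-space(.)}'/>"
        + "</xsl:for-each>"
        + "</triples></xsl:template>"
        + "</xsl:stylesheet>";

    static List<String> extract(String xhtml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);

            // Step 1: parse the page into a DOM tree. The JDK parser only accepts
            // well-formed XHTML; a tag-soup parser would also accept broken HTML.
            Document page = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // Step 2: apply the stylesheet to the DOM tree.
            StringWriter transformed = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(TOY_XSLT)))
                    .transform(new DOMSource(page), new StreamResult(transformed));

            // Step 3: parse the transform's output. The real extractor parses
            // RDF/XML with an RDF parser; here we only read the toy <triple> elements.
            Document result = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(transformed.toString())));
            NodeList nodes = result.getElementsByTagName("triple");
            List<String> triples = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element t = (Element) nodes.item(i);
                triples.add(t.getAttribute("predicate") + " -> " + t.getAttribute("object"));
            }
            return triples;
        } catch (Exception e) {
            throw new IllegalArgumentException("extraction failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><h1 property='dc:title'>Hello RDFa</h1>"
                + "<p>no property here</p><span property='dc:creator'>Alice</span></body></html>";
        System.out.println(extract(page)); // prints [dc:title -> Hello RDFa, dc:creator -> Alice]
    }
}
```

Swapping in a real tag-soup parser would change only step 1; steps 2 and 3 work on the DOM tree and on the transform's output, whatever produced them.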

Nested Class Summary

Nested classes/interfaces inherited from interface org.apache.any23.extractor.Extractor
`Extractor.BlindExtractor, Extractor.ContentExtractor, Extractor.TagSoupDOMExtractor`

Field Summary
`static ExtractorFactory<RDFaExtractor>`	`factory`
`static String`	`NAME`
`static String`	`xsltFilename`

Constructor Summary
`RDFaExtractor()` Default constructor: data types are not verified and parsing does not stop at the first error.
`RDFaExtractor(boolean verifyDataType, boolean stopAtFirstError)` Constructor that allows the validation and error-handling policies to be specified.

Method Summary
`ExtractorDescription`	`getDescription()` Returns an `ExtractorDescription` of this extractor.
`static XSLTStylesheet`	`getXSLT()` Returns an `XSLTStylesheet` able to distill RDFa from HTML pages.
`boolean`	`isStopAtFirstError()`
`boolean`	`isVerifyDataType()`
`void`	`run(ExtractionParameters extractionParameters, ExtractionContext extractionContext, Document in, ExtractionResult out)` Executes the extractor.
`void`	`setStopAtFirstError(boolean stopAtFirstError)`
`void`	`setVerifyDataType(boolean verifyDataType)`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

NAME

public static final String NAME

See Also: Constant Field Values

xsltFilename

public static final String xsltFilename

factory

public static final ExtractorFactory<RDFaExtractor> factory

Constructor Detail

RDFaExtractor

public RDFaExtractor(boolean verifyDataType,
                     boolean stopAtFirstError)

Constructor that allows the validation and error-handling policies to be specified.

Parameters: verifyDataType - if true, data types are verified, otherwise they are ignored.; stopAtFirstError - if true, the parser stops at the first parsing error, otherwise non-blocking errors are ignored.
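The two policy flags can be illustrated with a self-contained stand-in. This sketch assumes, for illustration only, that verifying data types means checking a literal's lexical form against its declared datatype, and that a non-blocking error is recorded while the offending literal is skipped. `ParsePolicySketch`, `Literal`, `accept`, and the xsd:integer-only check are hypothetical and do not reproduce Any23's internals.

```java
import java.util.ArrayList;
import java.util.List;

final class ParsePolicySketch {

    /** Hypothetical typed literal: a lexical form plus an XSD datatype local name. */
    static final class Literal {
        final String lexical;
        final String datatype;

        Literal(String lexical, String datatype) {
            this.lexical = lexical;
            this.datatype = datatype;
        }
    }

    private final boolean verifyDataType;
    private final boolean stopAtFirstError;
    final List<String> ignoredErrors = new ArrayList<>();

    ParsePolicySketch(boolean verifyDataType, boolean stopAtFirstError) {
        this.verifyDataType = verifyDataType;
        this.stopAtFirstError = stopAtFirstError;
    }

    /** Mirrors the documented defaults: no datatype checks, no stop at first error. */
    ParsePolicySketch() {
        this(false, false);
    }

    /** Returns the lexical forms of the literals accepted under this policy. */
    List<String> accept(List<Literal> literals) {
        List<String> accepted = new ArrayList<>();
        for (Literal lit : literals) {
            if (verifyDataType && !isValid(lit)) {
                String error = "invalid xsd:" + lit.datatype + " value '" + lit.lexical + "'";
                if (stopAtFirstError) {
                    throw new IllegalArgumentException(error); // fail fast
                }
                ignoredErrors.add(error); // non-blocking: record it and move on
                continue;
            }
            accepted.add(lit.lexical);
        }
        return accepted;
    }

    // Hypothetical check: only xsd:integer is verified; other datatypes pass.
    private static boolean isValid(Literal lit) {
        return !"integer".equals(lit.datatype) || lit.lexical.matches("[+-]?[0-9]+");
    }
}
```

Given the literals `"42"` and `"forty"`, both typed as integer, the default constructor accepts both; `new ParsePolicySketch(true, false)` accepts only `"42"` and records one ignored error; `new ParsePolicySketch(true, true)` throws on `"forty"`.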

RDFaExtractor

public RDFaExtractor()

Default constructor: data types are not verified and parsing does not stop at the first error.

Method Detail

getXSLT

public static XSLTStylesheet getXSLT()

Returns an XSLTStylesheet able to distill RDFa from HTML pages.

Returns: a non-null XSLT instance.
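One common way to honor a static, never-null accessor like `getXSLT()` is to compile the stylesheet once and share it. The sketch below assumes that design and is not Any23's code: it uses the JDK's `javax.xml.transform.Templates` (a thread-safe compiled stylesheet) in place of Any23's `XSLTStylesheet`, a hypothetical identity stylesheet in place of the resource named by `xsltFilename`, and the initialization-on-demand holder idiom for lazy, thread-safe loading. `StylesheetHolderSketch` is a made-up name.

```java
import java.io.StringReader;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

final class StylesheetHolderSketch {

    // Hypothetical placeholder (an identity transform), standing in for the
    // RDFa stylesheet that the real class loads via xsltFilename.
    private static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    private StylesheetHolderSketch() {}

    // Initialization-on-demand holder: the JVM runs compile() exactly once,
    // lazily and thread-safely, the first time getXSLT() touches Holder.
    private static final class Holder {
        static final Templates INSTANCE = compile();
    }

    private static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSLT)));
        } catch (TransformerConfigurationException e) {
            // Fail loudly rather than return null; the JVM wraps this
            // in an ExceptionInInitializerError.
            throw new IllegalStateException("cannot compile stylesheet", e);
        }
    }

    /** Returns the shared, never-null compiled stylesheet. */
    static Templates getXSLT() {
        return Holder.INSTANCE;
    }
}
```

Because a `Templates` object is thread-safe but a `Transformer` is not, each caller would create its own transformer with `getXSLT().newTransformer()`.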

isVerifyDataType

public boolean isVerifyDataType()

setVerifyDataType

public void setVerifyDataType(boolean verifyDataType)

isStopAtFirstError

public boolean isStopAtFirstError()

setStopAtFirstError

public void setStopAtFirstError(boolean stopAtFirstError)

run

public void run(ExtractionParameters extractionParameters,
                ExtractionContext extractionContext,
                Document in,
                ExtractionResult out)
         throws IOException,
                ExtractionException

Description copied from interface: Extractor

Executes the extractor. It is invoked only once; extractors are not reusable.

Specified by: run in interface Extractor<Document>

Parameters: extractionParameters - the parameters to apply during the extraction.; extractionContext - the document context.; in - the extractor input data.; out - the collector for the extracted data.
Throws: IOException - on an error while reading from the input stream.; ExtractionException - on any other error, such as a parse error.
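Because an extractor is run only once, callers need a fresh instance per document, which is one plausible role for a static factory such as `factory`. The sketch below assumes, purely for illustration, that a second `run` call should fail fast; `SingleUseSketch`, `OneShotExtractor`, and the `Supplier`-based `FACTORY` are hypothetical stand-ins rather than Any23's `ExtractorFactory` API, and the real class may not enforce the rule at all.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

final class SingleUseSketch {

    /** Hypothetical one-shot extractor: a second run() call is rejected. */
    static final class OneShotExtractor {
        private final AtomicBoolean used = new AtomicBoolean(false);

        String run(String input) {
            // compareAndSet flips false -> true exactly once, even across threads.
            if (!used.compareAndSet(false, true)) {
                throw new IllegalStateException("extractor already used; obtain a new instance");
            }
            return "extracted:" + input;
        }
    }

    // Hypothetical factory: hands out a new extractor for every document.
    static final Supplier<OneShotExtractor> FACTORY = OneShotExtractor::new;

    public static void main(String[] args) {
        for (String doc : new String[] {"a.html", "b.html"}) {
            System.out.println(FACTORY.get().run(doc)); // fresh instance per document
        }
    }
}
```

Keeping the factory stateless and the extractor single-use avoids sharing per-document state between extractions.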

getDescription

public ExtractorDescription getDescription()

Description copied from interface: Extractor

Returns an ExtractorDescription of this extractor.

Specified by: getDescription in interface Extractor<Document>

Returns: the ExtractorDescription of this extractor