Package org.apache.any23.extractor.html

Class Summary
AdrExtractor Extractor for the adr microformat.
DocumentReport Represents the validation report generated by the TagSoupParser when a document is retrieved and validated.
DomUtils This class provides utility methods for DOM manipulation.
EntityBasedMicroformatExtractor Base class for microformat extractors based on entities.
GeoExtractor Extractor for the Geo microformat.
HCalendarExtractor Extractor for the hCalendar microformat.
HCardExtractor Extractor for the hCard microformat.
HCardName An HCard name, consisting of various parts.
HeadLinkExtractor This Extractor.TagSoupDOMExtractor implementation retrieves the LINKs declared within the HTML/HEAD page header.
HListingExtractor Extractor for the hListing microformat.
HRecipeExtractor Extractor for the hRecipe microformat.
HResumeExtractor Extractor for the hResume microformat.
HReviewExtractor Extractor for the hReview microformat.
HTMLDocument A wrapper around the DOM representation of an HTML document.
HTMLDocument.TextField This class represents text extracted from the HTML DOM, together with the node from which that text was retrieved.
HTMLMetaExtractor This extractor represents the HTML META tag values according to the HTML4 specification.
ICBMExtractor Extractor for "ICBM coordinates" provided as META headers in the head of an HTML page.
LicenseExtractor Extractor for the rel-license microformat.
MicroformatExtractor The abstract base class for any Microformat specification extractor.
SpanCloserInputStream Extension of InputStream meant to detect and replace any occurrence of inline span:
SpeciesExtractor Extractor for the Species microformat.
TagSoupParser Parses an InputStream into an HTML DOM tree using a TagSoup parser.
TagSoupParser.ElementLocation Describes a DOM Element location.
TitleExtractor Extracts the value of the <title> element of an HTML or XHTML page.
TurtleHTMLExtractor Extractor for Turtle/N3 format embedded within HTML script tags.
XFNExtractor Extractor for the XFN microformat.
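
As an illustration of the kind of value the ICBMExtractor entry above refers to, the following is a minimal, self-contained sketch of parsing the content of an ICBM META header such as <meta name="ICBM" content="50.167958, -97.133185">. It is not the Any23 implementation and uses no Any23 API: the class name IcbmContentSketch and its parse method are hypothetical, and accepting a semicolon as well as a comma as the separator is an assumption made for leniency.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not part of org.apache.any23.
public class IcbmContentSketch {

    /**
     * Parses an ICBM content value of the form "latitude, longitude".
     * Returns {latitude, longitude}, or an empty Optional if the value is malformed.
     * Assumption: a semicolon is accepted as an alternative separator.
     */
    public static Optional<double[]> parse(String content) {
        if (content == null) {
            return Optional.empty();
        }
        // Limit -1 keeps trailing empty fields, so "1,2," is rejected rather than truncated.
        String[] parts = content.trim().split("\\s*[,;]\\s*", -1);
        if (parts.length != 2) {
            return Optional.empty();
        }
        try {
            double latitude = Double.parseDouble(parts[0]);
            double longitude = Double.parseDouble(parts[1]);
            return Optional.of(new double[] {latitude, longitude});
        } catch (NumberFormatException e) {
            // Non-numeric or empty component.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        parse("50.167958, -97.133185")
            .ifPresent(c -> System.out.println("lat=" + c[0] + " lon=" + c[1]));
    }
}
```

Returning an Optional rather than throwing keeps malformed headers from interrupting the processing of the rest of a page, which seems a reasonable default for an extractor-style component.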

Copyright © 2010-2012 The Apache Software Foundation. All Rights Reserved.