Package org.apache.jackrabbit.extractor

Interface Summary
DelegatingTextExtractor Interface for text extractors that need to delegate the extraction of parts of content documents to another text extractor.
TextExtractor Interface for extracting text content from binary streams.
 

Class Summary
AbstractTextExtractor Base class for text extractor implementations.
CompositeTextExtractor Composite text extractor.
DefaultTextExtractor Composite text extractor that by default contains the standard text extractors found in this package.
EmptyTextExtractor Dummy text extractor that always returns an empty reader for all documents.
HTMLParser Helper class for HTML parsing.
HTMLTextExtractor Text extractor for HyperText Markup Language (HTML).
MsExcelTextExtractor Text extractor for Microsoft Excel sheets.
MsOutlookTextExtractor Text extractor for Microsoft Outlook messages.
MsPowerPointTextExtractor Text extractor for Microsoft PowerPoint presentations.
MsWordTextExtractor Text extractor for Microsoft Word documents.
OpenOfficeTextExtractor Text extractor for OpenOffice documents.
PdfTextExtractor Text extractor for Portable Document Format (PDF).
PlainTextExtractor Text extractor for plain text.
PngTextExtractor Text extractor for PNG/APNG/MNG images.
RTFTextExtractor Text extractor for Rich Text Format (RTF).
XMLTextExtractor Text extractor for XML documents.
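
The summaries above describe a composite extractor that holds format-specific extractors, and an empty extractor that returns an empty reader. The following is a minimal, self-contained sketch of that composite idea, not the package's actual code. The names CompositeSketch, SketchExtractor, PlainSketchExtractor, and CompositeSketchExtractor are hypothetical stand-ins, and the method signatures are assumptions chosen for illustration; consult the Javadoc of TextExtractor and CompositeTextExtractor for the real API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CompositeSketch {

    /** Hypothetical stand-in interface; NOT the real TextExtractor. */
    public interface SketchExtractor {
        boolean supports(String contentType);
        Reader extractText(InputStream stream, String contentType) throws IOException;
    }

    /** Handles text/plain by decoding the stream as UTF-8 (an assumption of this sketch). */
    public static class PlainSketchExtractor implements SketchExtractor {
        public boolean supports(String contentType) {
            return "text/plain".equals(contentType);
        }
        public Reader extractText(InputStream stream, String contentType) {
            return new InputStreamReader(stream, StandardCharsets.UTF_8);
        }
    }

    /** Delegates to the first component that supports the content type. */
    public static class CompositeSketchExtractor implements SketchExtractor {
        private final List<SketchExtractor> extractors = new ArrayList<>();

        public void add(SketchExtractor extractor) {
            extractors.add(extractor);
        }
        public boolean supports(String contentType) {
            for (SketchExtractor e : extractors) {
                if (e.supports(contentType)) {
                    return true;
                }
            }
            return false;
        }
        public Reader extractText(InputStream stream, String contentType) throws IOException {
            for (SketchExtractor e : extractors) {
                if (e.supports(contentType)) {
                    return e.extractText(stream, contentType);
                }
            }
            // No component matched: fall back to an empty reader,
            // mirroring the behavior described for EmptyTextExtractor.
            return new StringReader("");
        }
    }

    /** Drains a reader into a string. */
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        CompositeSketchExtractor composite = new CompositeSketchExtractor();
        composite.add(new PlainSketchExtractor());

        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(composite.extractText(new ByteArrayInputStream(data), "text/plain")));
        System.out.println(readAll(composite.extractText(new ByteArrayInputStream(data), "image/gif")).isEmpty());
    }
}
```

In this sketch an unsupported content type yields an empty reader rather than an exception, so a caller can index whatever text is available without special-casing unknown formats.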
 



Copyright © 2004-2009 The Apache Software Foundation. All Rights Reserved.