org.apache.jackrabbit.extractor
Class XMLTextExtractor
java.lang.Object
org.apache.jackrabbit.extractor.AbstractTextExtractor
org.apache.jackrabbit.extractor.XMLTextExtractor
- All Implemented Interfaces:
- TextExtractor
public class XMLTextExtractor
- extends AbstractTextExtractor
Text extractor for XML documents. This class extracts the text content
and attribute values from XML documents.
This class can handle any XML-based format (application/xml+something),
not just the base XML content types reported by
AbstractTextExtractor.getContentTypes(). However, it often makes
sense to use more specialized extractors that better understand the
specific content type.
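To illustrate what "text content and attribute values" means in practice, here is a minimal sketch that uses only the JDK's SAX parser. The class name XmlTextSketch, the nested TextCollector handler, and the extract method are hypothetical names chosen for this example. This is not the Jackrabbit implementation, and parser hardening (for example, disabling external entities) is left out for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not part of Jackrabbit.
public class XmlTextSketch {

    /** Collects attribute values and character data in document order. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            // Each attribute value becomes its own whitespace-separated token.
            for (int i = 0; i < atts.getLength(); i++) {
                text.append(atts.getValue(i)).append(' ');
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver one text node in several chunks, so append raw.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Keep text from adjacent elements from running together.
            text.append(' ');
        }
    }

    /** Parses the stream and returns the collected text, whitespace-normalized. */
    public static String extract(InputStream stream) throws Exception {
        TextCollector collector = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(stream, collector);
        return collector.text.toString().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc title=\"Hello\"><p lang=\"en\">World</p></doc>";
        InputStream in =
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(extract(in)); // prints "Hello en World"
    }
}
```

A generic handler like this treats every attribute and text node the same way, which is why a more specialized extractor can give better results for a specific format.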
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
XMLTextExtractor
public XMLTextExtractor()
- Creates a new XMLTextExtractor instance.
extractText
public Reader extractText(InputStream stream,
String type,
String encoding)
throws IOException
- Returns a reader for the text content of the given XML document.
Returns an empty reader if the given encoding is not supported or
if the XML document could not be parsed.
- Parameters:
stream - XML document
type - XML content type
encoding - character encoding, or null
- Returns:
- reader for the text content of the given XML document,
or an empty reader if the given encoding is not supported or
the document could not be parsed
- Throws:
IOException - if the XML document stream cannot be closed
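Because extractText signals an unsupported encoding or a parse failure by returning an empty reader rather than throwing, a caller may want to check whether any text came back. The sketch below shows one way to do that with JDK classes only. The ExtractedTextReader class and its readAll helper are hypothetical, and the StringReader instances are stand-ins for the reader that extractText would return.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical caller-side helper, not part of Jackrabbit.
public class ExtractedTextReader {

    /** Reads the whole reader into a string and closes the reader. */
    public static String readAll(Reader reader) throws IOException {
        try (Reader r = reader) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // With Jackrabbit on the classpath, the reader would come from:
        //   new XMLTextExtractor().extractText(stream, "application/xml", "UTF-8")
        Reader parsed = new StringReader("Hello World"); // stand-in: successful extraction
        Reader failed = new StringReader("");            // stand-in: empty reader

        System.out.println(readAll(parsed).isEmpty() ? "no text" : "text extracted");
        System.out.println(readAll(failed).isEmpty() ? "no text" : "text extracted");
    }
}
```

Note that an empty result is ambiguous: it can mean an unsupported encoding, an unparsable document, or a well-formed document that contains no text or attribute values.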
Copyright © 2004-2009 The Apache Software Foundation. All Rights Reserved.