org.apache.jackrabbit.extractor
Class XMLTextExtractor

java.lang.Object
  extended by org.apache.jackrabbit.extractor.AbstractTextExtractor
      extended by org.apache.jackrabbit.extractor.XMLTextExtractor
All Implemented Interfaces:
TextExtractor

public class XMLTextExtractor
extends AbstractTextExtractor

Text extractor for XML documents. This class extracts the text content and attribute values from XML documents.
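A minimal sketch of this kind of extraction, assuming the JDK's built-in SAX parser. The page does not say which parser Jackrabbit uses, and the class `XmlTextSketch` is hypothetical. The handler appends each attribute value and each run of character data. It also inserts a space at every element end so that text from adjacent elements does not run together, which is an assumed formatting choice.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not the Jackrabbit implementation.
final class XmlTextSketch {

    /** Collects attribute values and character data in document order. */
    static final class CollectingHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Attribute values are treated as text, like element content.
            for (int i = 0; i < atts.getLength(); i++) {
                text.append(atts.getValue(i)).append(' ');
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Separate the text of adjacent elements.
            text.append(' ');
        }
    }

    /** Returns the extracted text with whitespace collapsed to single spaces. */
    static String extract(String xml) {
        CollectingHandler handler = new CollectingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.text.toString().trim().replaceAll("\\s+", " ");
    }
}
```

For example, `XmlTextSketch.extract("<doc lang=\"en\"><title>Hello</title><p>World</p></doc>")` returns `"en Hello World"`. The attribute value `en` is included alongside the element text.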

This class can handle any XML-based format (application/something+xml), not just the base XML content types reported by AbstractTextExtractor.getContentTypes(). However, a more specialized extractor that understands the specific content type will often produce better results.
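XML-based media types are conventionally marked with a `+xml` structured-syntax suffix (RFC 3023, since obsoleted by RFC 7303), as in `application/xhtml+xml` or `image/svg+xml`. A caller deciding whether a document can be routed to an XML extractor could use a check like the sketch below. The helper `isXmlBased` is hypothetical and is not part of the Jackrabbit API. The base types `text/xml` and `application/xml` are also assumed; the authoritative list is whatever getContentTypes() reports.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; not part of the Jackrabbit API.
final class XmlTypeCheck {

    // Assumed base types; the real list comes from getContentTypes().
    private static final Set<String> BASE_TYPES = Set.of("text/xml", "application/xml");

    /** True for the base XML types and for any "+xml" suffixed type. */
    static boolean isXmlBased(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=UTF-8" and normalize case.
        String type = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return BASE_TYPES.contains(type) || type.endsWith("+xml");
    }
}
```

Matching on the suffix rather than on a fixed list is what allows a single extractor to accept XML dialects it was never explicitly configured for.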


Constructor Summary
XMLTextExtractor()
          Creates a new XMLTextExtractor instance.
 
Method Summary
 Reader extractText(InputStream stream, String type, String encoding)
          Returns a reader for the text content of the given XML document.
 
Methods inherited from class org.apache.jackrabbit.extractor.AbstractTextExtractor
getContentTypes
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

XMLTextExtractor

public XMLTextExtractor()
Creates a new XMLTextExtractor instance.

Method Detail

extractText

public Reader extractText(InputStream stream,
                          String type,
                          String encoding)
                   throws IOException
Returns a reader for the text content of the given XML document. Returns an empty reader if the given encoding is not supported or if the XML document could not be parsed.

Parameters:
stream - XML document
type - XML content type
encoding - character encoding, or null
Returns:
reader for the text content of the given XML document, or an empty reader if the encoding is not supported or the document could not be parsed
Throws:
IOException - if the XML document stream cannot be closed
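The error-handling contract described for extractText (an empty reader for an unsupported encoding or an unparseable document, and IOException only when the stream cannot be closed) can be illustrated with a JDK-only sketch. This is not Jackrabbit's implementation. The class `ExtractTextSketch`, the use of `Charset.isSupported` for the encoding check, and treating read errors like parse failures are all assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.Charset;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of the documented contract; not the library code.
final class ExtractTextSketch {

    /** The type argument is accepted for signature parity but unused here. */
    static Reader extractText(InputStream stream, String type, String encoding)
            throws IOException {
        try {
            InputSource source = new InputSource(stream);
            if (encoding != null) {
                if (!isSupported(encoding)) {
                    return new StringReader(""); // unsupported encoding
                }
                source.setEncoding(encoding);
            }
            StringBuilder text = new StringBuilder();
            DefaultHandler handler = new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    for (int i = 0; i < atts.getLength(); i++) {
                        text.append(atts.getValue(i)).append(' '); // attribute values
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length); // element text
                }

                @Override
                public void endElement(String uri, String local, String qName) {
                    text.append(' ');
                }
            };
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
            return new StringReader(text.toString());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Assumption: read errors are handled like parse failures.
            return new StringReader("");
        } finally {
            // Only a failure to close escapes as IOException.
            stream.close();
        }
    }

    private static boolean isSupported(String encoding) {
        try {
            return Charset.isSupported(encoding);
        } catch (IllegalArgumentException e) { // includes IllegalCharsetNameException
            return false;
        }
    }
}
```

In this sketch the stream is always closed before the method returns, so callers should not try to reuse it afterwards.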


Copyright © 2004-2007 The Apache Software Foundation. All Rights Reserved.