org.apache.jackrabbit.extractor
Class RTFTextExtractor

java.lang.Object
  extended by org.apache.jackrabbit.extractor.AbstractTextExtractor
      extended by org.apache.jackrabbit.extractor.RTFTextExtractor
All Implemented Interfaces:
TextExtractor

public class RTFTextExtractor
extends AbstractTextExtractor

Text extractor for Rich Text Format (RTF) documents.


Constructor Summary
RTFTextExtractor()
          Creates a new RTFTextExtractor instance.
 
Method Summary
 Reader extractText(InputStream stream, String type, String encoding)
          Returns a reader for the text content of the given binary document.
 
Methods inherited from class org.apache.jackrabbit.extractor.AbstractTextExtractor
getContentTypes
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RTFTextExtractor

public RTFTextExtractor()
Creates a new RTFTextExtractor instance.

Method Detail

extractText

public Reader extractText(InputStream stream,
                          String type,
                          String encoding)
                   throws IOException
Returns a reader for the text content of the given binary document. The content type and character encoding (if available and applicable) are given as arguments. The given content type is guaranteed to be one of the types reported by TextExtractor.getContentTypes() unless the implementation explicitly permits other content types.

The implementation can choose either to read and parse the given document immediately or to return a reader that does so incrementally. The only constraint is that the implementation must close the given stream at the latest when the returned reader is closed. The caller, on the other hand, is responsible for closing the returned reader.
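
The stream-ownership rule can be illustrated with a small sketch. It does not use the Jackrabbit classes: CloseContractSketch, TrackingStream and the static extractText method are hypothetical stand-ins, and the "incremental" reader is simply an InputStreamReader, whose close() also closes the wrapped stream. The caller closes the reader (here with try-with-resources, which assumes Java 7 or later), and the stream is closed as a consequence.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class CloseContractSketch {

    /** Hypothetical stream that records whether close() was called. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /**
     * Stand-in for extractText (not the Jackrabbit implementation): returns
     * a reader that decodes incrementally and owns the given stream.
     */
    static Reader extractText(InputStream stream, String type, String encoding)
            throws IOException {
        // Assumption for this sketch: fall back to UTF-8 when no encoding is given.
        String charset = (encoding != null) ? encoding : "UTF-8";
        return new InputStreamReader(stream, charset);
    }

    /** Reads a document to the end and reports whether the stream got closed. */
    static boolean streamClosedAfterReaderClose() throws IOException {
        TrackingStream stream =
                new TrackingStream("hello".getBytes(StandardCharsets.UTF_8));
        // The caller is responsible for closing the returned reader.
        try (Reader reader = extractText(stream, "text/plain", null)) {
            while (reader.read() != -1) {
                // consume the extracted text
            }
        }
        // Closing the reader must have closed the underlying stream as well.
        return stream.closed;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(streamClosedAfterReaderClose());
    }
}
```

An implementation that parses the whole document up front can satisfy the same rule by closing the stream before it returns.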

The implementation should only throw an exception on transient errors, i.e. when it can expect to be able to successfully extract the text content of the same binary at another time. An effort should be made to recover from syntax errors and other similar problems.
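
One way to separate transient errors from syntax errors is sketched below. This is an illustration under stated assumptions, not the Jackrabbit source: LenientRtfSketch and readAll are hypothetical names, and the parsing relies on the JDK's javax.swing.text.rtf.RTFEditorKit being available at run time. The document is first buffered in memory, so an IOException while reading the stream (a transient problem) propagates to the caller, while any failure during parsing is treated as a syntax problem and yields an empty reader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

public class LenientRtfSketch {

    static Reader extractText(InputStream stream, String type, String encoding)
            throws IOException {
        byte[] data;
        try {
            // Phase 1: read the binary. An IOException here is considered
            // transient and is allowed to propagate.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            data = buffer.toByteArray();
        } finally {
            // The stream is fully consumed (or failed), so close it now.
            stream.close();
        }

        try {
            // Phase 2: parse from memory. Failures here come from the
            // document itself, so retrying later would not help.
            RTFEditorKit kit = new RTFEditorKit();
            DefaultStyledDocument document = new DefaultStyledDocument();
            kit.read(new ByteArrayInputStream(data), document, 0);
            return new StringReader(document.getText(0, document.getLength()));
        } catch (Exception e) {
            // Recover from malformed input by returning no text.
            return new StringReader("");
        }
    }

    /** Helper for the sketch: drains a reader into a string and closes it. */
    static String readAll(Reader reader) throws IOException {
        try {
            StringBuilder text = new StringBuilder();
            char[] chunk = new char[1024];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                text.append(chunk, 0, n);
            }
            return text.toString();
        } finally {
            reader.close();
        }
    }
}
```

Buffering the whole document trades memory for a clear boundary between the two kinds of failure. A production implementation would probably also log the parse failure instead of discarding it silently.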

This method should be thread-safe, i.e. it may be invoked simultaneously by different threads to extract the text content of different documents. The returned reader, on the other hand, does not need to be thread-safe.
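
The threading expectation can be sketched as follows. SharedExtractorSketch and StatelessExtractor are hypothetical stand-ins rather than Jackrabbit classes, and the pool size of 4 is arbitrary. One extractor instance is shared by several threads, which is safe here because it keeps no mutable state, while each returned reader is created, consumed, and closed by a single thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedExtractorSketch {

    /** Hypothetical extractor without instance fields, so calls cannot interfere. */
    static class StatelessExtractor {
        Reader extractText(InputStream stream, String type, String encoding)
                throws IOException {
            String charset = (encoding != null) ? encoding : "UTF-8";
            return new InputStreamReader(stream, charset);
        }
    }

    /** Drains a reader into a string; called by the thread that owns the reader. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] chunk = new char[1024];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            text.append(chunk, 0, n);
        }
        return text.toString();
    }

    /** Extracts every document on a worker thread using one shared extractor. */
    static List<String> extractConcurrently(List<String> documents) throws Exception {
        StatelessExtractor extractor = new StatelessExtractor(); // shared instance
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String document : documents) {
                futures.add(pool.submit(() -> {
                    InputStream in = new ByteArrayInputStream(
                            document.getBytes(StandardCharsets.UTF_8));
                    // The reader never leaves this task, so it needs no locking.
                    try (Reader reader = extractor.extractText(in, "text/plain", "UTF-8")) {
                        return readAll(reader);
                    }
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // results keep the submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

If an implementation needs a parser object that is not itself thread-safe, creating a fresh one inside each extractText call is one simple way to keep the method safe to share.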

Parameters:
stream - binary document from which to extract text
type - MIME type of the given document, lower case
encoding - the character encoding of the binary data, or null if not available
Returns:
reader for the extracted text content
Throws:
IOException - on transient errors


Copyright © 2004-2008 The Apache Software Foundation. All Rights Reserved.