org.apache.jackrabbit.extractor
Class CompositeTextExtractor

java.lang.Object
  extended by org.apache.jackrabbit.extractor.CompositeTextExtractor
All Implemented Interfaces:
TextExtractor
Direct Known Subclasses:
DefaultTextExtractor

public class CompositeTextExtractor
extends Object
implements TextExtractor

Composite text extractor. This class presents a unified interface for a set of TextExtractor instances. The composite extractor supports all the content types supported by the component extractors, and delegates text extraction calls to the appropriate components.
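
The delegation described above can be sketched as follows. This is a minimal illustration, not the Jackrabbit implementation. `SketchExtractor`, `SketchPlainTextExtractor`, `SketchCompositeExtractor` and `CompositeSketchDemo` are hypothetical stand-ins so the example compiles without the Jackrabbit jar. Only the three method signatures are taken from this page. The UTF-8 default and the rule that a later registration replaces an earlier one for the same type are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for TextExtractor, reduced to the two methods
// documented on this page.
interface SketchExtractor {
    String[] getContentTypes();

    Reader extractText(InputStream stream, String type, String encoding)
            throws IOException;
}

// Hypothetical component that decodes plain text; not a Jackrabbit class.
class SketchPlainTextExtractor implements SketchExtractor {
    public String[] getContentTypes() {
        return new String[] { "text/plain" };
    }

    public Reader extractText(InputStream stream, String type, String encoding)
            throws IOException {
        // Fall back to UTF-8 when no encoding is given (an assumption).
        return new InputStreamReader(stream, encoding != null ? encoding : "UTF-8");
    }
}

// Hypothetical composite: one lookup table from content type to component.
class SketchCompositeExtractor implements SketchExtractor {
    private final Map<String, SketchExtractor> byType =
            new LinkedHashMap<String, SketchExtractor>();

    public void addTextExtractor(SketchExtractor extractor) {
        // Register the component under every type it claims to support.
        // A later registration replaces an earlier one (an assumption).
        for (String type : extractor.getContentTypes()) {
            byType.put(type, extractor);
        }
    }

    public String[] getContentTypes() {
        return byType.keySet().toArray(new String[0]);
    }

    public Reader extractText(InputStream stream, String type, String encoding)
            throws IOException {
        SketchExtractor component = byType.get(type);
        if (component == null) {
            // No component for this type: close the stream, return no text.
            stream.close();
            return new StringReader("");
        }
        return component.extractText(stream, type, encoding);
    }
}

class CompositeSketchDemo {
    // Extracts text from an in-memory document and returns it as a String.
    static String extract(String body, String type) {
        SketchCompositeExtractor composite = new SketchCompositeExtractor();
        composite.addTextExtractor(new SketchPlainTextExtractor());
        try {
            InputStream stream = new ByteArrayInputStream(body.getBytes("UTF-8"));
            Reader reader = composite.extractText(stream, type, "UTF-8");
            StringBuilder text = new StringBuilder();
            for (int c = reader.read(); c != -1; c = reader.read()) {
                text.append((char) c);
            }
            reader.close();
            return text.toString();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extract("hello", "text/plain"));
        System.out.println(extract("hello", "image/png").isEmpty());
    }
}
```

With the real class, the same calls would be made on a `CompositeTextExtractor` instance after adding actual `TextExtractor` components through `addTextExtractor`.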


Constructor Summary
CompositeTextExtractor()
           
 
Method Summary
 void addTextExtractor(TextExtractor extractor)
          Adds a component text extractor.
 Reader extractText(InputStream stream, String type, String encoding)
          Extracts text content using one of the component extractors.
 String[] getContentTypes()
          Returns all the content types supported by the component extractors.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CompositeTextExtractor

public CompositeTextExtractor()
Method Detail

addTextExtractor

public void addTextExtractor(TextExtractor extractor)
Adds a component text extractor. The given extractor is registered to process all the content types it claims to support.

Parameters:
extractor - component extractor

getContentTypes

public String[] getContentTypes()
Returns all the content types supported by the component extractors.

Specified by:
getContentTypes in interface TextExtractor
Returns:
supported content types
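
A caller may want to check whether a content type is supported before handing over a stream. The sketch below works on the array returned by `getContentTypes()`. `ContentTypeCheck` and `supports` are hypothetical names, and the exact, case-sensitive comparison is an assumption, since this page does not say how content types are matched.

```java
import java.util.Arrays;

class ContentTypeCheck {
    // Returns true when the given type appears in the supported list.
    // Exact, case-sensitive comparison is an assumption of this sketch.
    static boolean supports(String[] supportedTypes, String type) {
        return Arrays.asList(supportedTypes).contains(type);
    }

    public static void main(String[] args) {
        // Stand-in for the array returned by getContentTypes().
        String[] supported = { "text/plain", "text/xml" };
        System.out.println(supports(supported, "text/plain"));
        System.out.println(supports(supported, "image/png"));
    }
}
```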

extractText

public Reader extractText(InputStream stream,
                          String type,
                          String encoding)
                   throws IOException
Extracts text content using one of the component extractors. If no component extractor is registered for the given content type, the binary stream is closed and an empty reader is returned.

Specified by:
extractText in interface TextExtractor
Parameters:
stream - binary stream
type - content type
encoding - optional character encoding
Returns:
reader for the text content of the binary stream
Throws:
IOException - if the binary stream cannot be read
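
Because an unsupported content type yields an empty reader rather than an exception, a caller that needs to know whether any text was produced can drain the reader and check the result. The sketch below assumes a `Reader` such as the one returned by `extractText` and uses a `StringReader` as a stand-in. `ReaderDrain`, `readAll` and `readAllUnchecked` are hypothetical helper names, not part of the Jackrabbit API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReaderDrain {
    // Reads the reader to the end, closes it, and returns the text.
    static String readAll(Reader reader) throws IOException {
        try {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        } finally {
            reader.close();
        }
    }

    // Same as readAll, but rethrows IOException as an unchecked exception.
    static String readAllUnchecked(Reader reader) {
        try {
            return readAll(reader);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-ins for readers returned by extractText.
        String text = readAllUnchecked(new StringReader("extracted text"));
        String none = readAllUnchecked(new StringReader(""));
        System.out.println(text.length() > 0);
        System.out.println(none.isEmpty());
    }
}
```

An empty result does not by itself distinguish an unsupported type from a document with no text. Checking `getContentTypes()` beforehand is one way to tell the two apart.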


Copyright © 2004-2007 The Apache Software Foundation. All Rights Reserved.