org.apache.jackrabbit.core.query.lucene
Class JackrabbitTextExtractor

java.lang.Object
  extended by org.apache.jackrabbit.core.query.lucene.JackrabbitTextExtractor
All Implemented Interfaces:
TextExtractor

public class JackrabbitTextExtractor
extends Object
implements TextExtractor

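Backwards-compatible Jackrabbit text extractor component. This class delegates text extraction to the configured TextExtractor components by content type, falls back to configured TextFilter components through an adapter, and uses an empty text extractor for content types that no component supports.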


Constructor Summary
JackrabbitTextExtractor(String classes)
          Creates a Jackrabbit text extractor containing the configured component classes.
 
Method Summary
 Reader extractText(InputStream stream, String type, String encoding)
          Extracts the text content from the given binary stream.
 String[] getContentTypes()
          Returns the content types that the component extractors are known to support.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

JackrabbitTextExtractor

public JackrabbitTextExtractor(String classes)
Creates a Jackrabbit text extractor containing the configured component classes.

Parameters:
classes - configured TextExtractor (and TextFilter) class names (space- or comma-separated)
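
The classes argument is a single string of class names separated by spaces or commas. The following is a minimal sketch of how such a string could be split into individual names. ClassNameListSketch and its parse method are hypothetical helpers written for illustration; they are not part of Jackrabbit, and the class names in main are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jackrabbit.
class ClassNameListSketch {

    // Splits on any run of whitespace and/or commas and drops empty tokens.
    static List<String> parse(String classes) {
        List<String> names = new ArrayList<>();
        for (String name : classes.split("[\\s,]+")) {
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up class names, mixing both separator styles.
        System.out.println(parse("com.example.PlainTextExtractor, com.example.XmlExtractor com.example.OldFilter"));
    }
}
```

Treating commas and whitespace as one separator class lets a configuration value mix both styles without producing empty entries.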
Method Detail

getContentTypes

public String[] getContentTypes()
Returns the content types that the component extractors are known to support.

Specified by:
getContentTypes in interface TextExtractor
Returns:
supported content types

extractText

public Reader extractText(InputStream stream,
                          String type,
                          String encoding)
                   throws IOException
Extracts the text content from the given binary stream. The given content type is used to look up a configured text extractor to which to delegate the request.

If a matching extractor is not found, then the configured text filters are searched for an instance that claims to support the given content type. A text extractor adapter is created for that filter and saved in the extractor map for future use before delegating the request to the adapter.

If not even a text filter is found for the given content type, a warning is logged and an empty text extractor is created for that content type and saved in the extractor map for future use before delegating the request to the empty extractor.

Specified by:
extractText in interface TextExtractor
Parameters:
stream - binary stream
type - content type
encoding - character encoding, or null
Returns:
reader for the text content of the binary stream
Throws:
IOException - if the binary stream cannot be read
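
The lookup order described for extractText (extractor map first, then text filters wrapped in an adapter, then an empty extractor) can be sketched as follows. This is an illustration under stated assumptions, not the actual Jackrabbit implementation: the Extractor and Filter interfaces are simplified, hypothetical stand-ins for TextExtractor and TextFilter, the warning goes to standard error instead of a logger, and the demo method and the content types it registers exist only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DelegationSketch {

    // Hypothetical stand-in for TextExtractor (only the method used here).
    interface Extractor {
        Reader extractText(InputStream stream, String type, String encoding) throws IOException;
    }

    // Hypothetical, simplified stand-in for a legacy text filter.
    interface Filter {
        boolean supports(String type);
        Reader filter(InputStream stream, String encoding) throws IOException;
    }

    private final Map<String, Extractor> extractors = new HashMap<>();
    private final List<Filter> filters = new ArrayList<>();

    void addExtractor(String type, Extractor extractor) { extractors.put(type, extractor); }

    void addFilter(Filter filter) { filters.add(filter); }

    Reader extractText(InputStream stream, String type, String encoding) throws IOException {
        Extractor extractor = extractors.get(type);
        if (extractor == null) {
            // No direct match: look for a filter that claims the type.
            extractor = adaptFilterFor(type);
            if (extractor == null) {
                // Nothing supports the type: warn and fall back to empty text.
                System.err.println("WARN: no text extractor or filter for " + type);
                extractor = (s, t, e) -> new StringReader("");
            }
            // Save the adapter or empty extractor for future requests.
            extractors.put(type, extractor);
        }
        return extractor.extractText(stream, type, encoding);
    }

    // Wraps the first filter that supports the type, or returns null.
    private Extractor adaptFilterFor(String type) {
        for (Filter f : filters) {
            if (f.supports(type)) {
                return (s, t, e) -> f.filter(s, e);
            }
        }
        return null;
    }

    // Demo helper: one extractor for text/plain, one filter for text/x-legacy.
    static String demo(String type, String content) {
        DelegationSketch sketch = new DelegationSketch();
        sketch.addExtractor("text/plain",
                (s, t, e) -> new InputStreamReader(s, StandardCharsets.UTF_8));
        sketch.addFilter(new Filter() {
            public boolean supports(String t) { return "text/x-legacy".equals(t); }
            public Reader filter(InputStream s, String e) {
                return new InputStreamReader(s, StandardCharsets.UTF_8);
            }
        });
        InputStream in = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
        try (Reader reader = sketch.extractText(in, type, null)) {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Saving the adapter or the empty extractor in the map means the filter search and the warning happen only on the first request for a given content type; later requests for that type hit the map directly.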


Copyright © 2004-2009 The Apache Software Foundation. All Rights Reserved.