JackrabbitTextExtractor (Apache Jackrabbit 1.6.0 API)

org.apache.jackrabbit.core.query.lucene
Class JackrabbitTextExtractor

java.lang.Object
  org.apache.jackrabbit.core.query.lucene.JackrabbitTextExtractor

All Implemented Interfaces: TextExtractor

public class JackrabbitTextExtractor
extends Object
implements TextExtractor

Backwards-compatible Jackrabbit text extractor component. This class implements the following functionality:

Parses the configured TextExtractor and TextFilter class names and instantiates the configured classes.
Acts as the delegate extractor for any configured DelegatingTextExtractor instances.
Maintains a CompositeTextExtractor instance that contains all the configured extractors and to which all text extraction calls are delegated.
Creates a TextFilterExtractor adapter for a configured TextFilter instance when it is first used and adds that adapter to the composite extractor for use in text extraction.
Logs a warning and creates a dummy EmptyTextExtractor instance for any unsupported content types when first detected. The dummy extractor is added to the composite extractor to prevent future warnings about the same content type.

Constructor Summary
`JackrabbitTextExtractor(String classes)` Creates a Jackrabbit text extractor containing the configured component classes.

Method Summary
`Reader`	`extractText(InputStream stream, String type, String encoding)` Extracts the text content from the given binary stream.
`String[]`	`getContentTypes()` Returns the content types that the component extractors are known to support.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

JackrabbitTextExtractor

public JackrabbitTextExtractor(String classes)

Creates a Jackrabbit text extractor containing the configured component classes.

Parameters: classes - configured TextExtractor (and TextFilter) class names (space- or comma-separated)
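
As a rough illustration of how a space- or comma-separated class list could be turned into component instances, the sketch below splits the configuration string and instantiates each name reflectively. The class `ClassNameListSketch` and its methods are hypothetical and not part of the Jackrabbit API; the exact delimiter set and the choice to return null on failure are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper for illustration only; not a Jackrabbit class.
final class ClassNameListSketch {

    private ClassNameListSketch() {
    }

    // Splits a configuration value such as "a.B, c.D e.F" into class names.
    // Commas and whitespace all act as separators (assumed delimiter set),
    // and empty tokens are skipped by StringTokenizer.
    static List<String> splitClassNames(String classes) {
        List<String> names = new ArrayList<String>();
        StringTokenizer tokenizer = new StringTokenizer(classes, ", \t\n\r\f");
        while (tokenizer.hasMoreTokens()) {
            names.add(tokenizer.nextToken());
        }
        return names;
    }

    // Instantiates one named class through its no-argument constructor.
    // Returning null on failure is an assumption of this sketch.
    static Object instantiate(String name) {
        try {
            return Class.forName(name).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            return null;
        }
    }
}
```

A caller would typically loop over `splitClassNames(...)`, call `instantiate(...)` on each name, and sort the resulting objects into extractors and filters by type.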

Method Detail

getContentTypes

public String[] getContentTypes()

Returns the content types that the component extractors are known to support.

Specified by: getContentTypes in interface TextExtractor

Returns: supported content types
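
One plausible way to report the content types of several component extractors is to take the union of their individual lists. The sketch below shows that idea in isolation; `ContentTypeUnionSketch` and `mergeContentTypes` are hypothetical names, and keeping first-seen order is an assumption of the example rather than documented behavior.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not a Jackrabbit class.
final class ContentTypeUnionSketch {

    private ContentTypeUnionSketch() {
    }

    // Merges the per-extractor content type arrays into one array,
    // dropping duplicates and keeping first-seen order.
    static String[] mergeContentTypes(List<String[]> perExtractorTypes) {
        Set<String> types = new LinkedHashSet<String>();
        for (String[] supported : perExtractorTypes) {
            for (String type : supported) {
                types.add(type);
            }
        }
        return types.toArray(new String[0]);
    }
}
```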

extractText

public Reader extractText(InputStream stream,
                          String type,
                          String encoding)
                   throws IOException

Extracts the text content from the given binary stream. The given content type is used to look up a configured text extractor to which to delegate the request.

If a matching extractor is not found, then the configured text filters are searched for an instance that claims to support the given content type. A text extractor adapter is created for that filter and saved in the extractor map for future use before delegating the request to the adapter.

If not even a text filter is found for the given content type, a warning is logged and an empty text extractor is created for that content type and saved in the extractor map for future use before delegating the request to the empty extractor.
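
The lookup order described above (configured extractor, then filter adapter, then empty extractor) can be sketched as follows. All types here are hypothetical stand-ins rather than the Jackrabbit classes, plain strings replace the stream and reader for brevity, and a counter stands in for the logged warning.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a text extractor; not the Jackrabbit interface.
interface SketchExtractor {
    String extract(String content);
}

// Hypothetical stand-in for a text filter; not the Jackrabbit interface.
interface SketchFilter {
    boolean canFilter(String contentType);

    String filter(String content);
}

final class FallbackLookupSketch {
    private final Map<String, SketchExtractor> extractors =
            new HashMap<String, SketchExtractor>();
    private final List<SketchFilter> filters = new ArrayList<SketchFilter>();
    private int warnings = 0;

    void addExtractor(String type, SketchExtractor extractor) {
        extractors.put(type, extractor);
    }

    void addFilter(SketchFilter filter) {
        filters.add(filter);
    }

    int warningCount() {
        return warnings;
    }

    synchronized String extractText(String content, String type) {
        SketchExtractor extractor = extractors.get(type);
        if (extractor == null) {
            // Step 2: look for a filter that claims the type and adapt it.
            extractor = adaptFilter(type);
            if (extractor == null) {
                // Step 3: warn once, then fall back to an empty extractor.
                warnings++;
                extractor = c -> "";
            }
            // Cache the result so later calls for this type skip the search.
            extractors.put(type, extractor);
        }
        return extractor.extract(content);
    }

    private SketchExtractor adaptFilter(String type) {
        for (SketchFilter filter : filters) {
            if (filter.canFilter(type)) {
                return filter::filter;
            }
        }
        return null;
    }
}
```

Because the fallback extractor is stored in the map, a second request for the same unsupported type finds it directly and does not raise another warning.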

Specified by: extractText in interface TextExtractor

Parameters: stream - binary stream; type - content type; encoding - character encoding, or null
Returns: reader for the text content of the binary stream
Throws: IOException - if the binary stream cannot be read