PooledTextExtractor (Apache Jackrabbit 1.5.0 API)

org.apache.jackrabbit.core.query.lucene
Class PooledTextExtractor

java.lang.Object
  org.apache.jackrabbit.core.query.lucene.PooledTextExtractor

All Implemented Interfaces: TextExtractor

public class PooledTextExtractor
extends Object
implements TextExtractor

PooledTextExtractor implements a text extractor that extracts the text using a pool of background threads.

Constructor Summary
`PooledTextExtractor(TextExtractor extractor, int poolSize, int backLog, long timeout)` Creates a pooled text extractor based on `extractor`.

Method Summary
`Reader`	`extractText(InputStream stream, String type, String encoding)` Returns a reader for the text content of the given binary document.
`String[]`	`getContentTypes()` Returns the MIME types supported by this extractor.
`void`	`shutdown()` Shuts down this pooled text extractor.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

PooledTextExtractor

public PooledTextExtractor(TextExtractor extractor,
                           int poolSize,
                           int backLog,
                           long timeout)

Creates a pooled text extractor based on extractor.

Parameters:
    extractor - the actual text extractor.
    poolSize - the pool size.
    backLog - size of the back log queue.
    timeout - the time in milliseconds to wait for text extraction to finish. After that, the extraction is put into the indexing queue and the fulltext index for the node is updated later, when the text extractor has finished its work.
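
The interplay of poolSize, backLog and timeout can be pictured with plain java.util.concurrent classes. The following is a minimal sketch, not the Jackrabbit implementation: the class TimeoutPoolSketch and its methods are hypothetical, it assumes Java 8 or later, and the choice to run a task in the calling thread when the back log is full is an assumption made for the illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; not the Jackrabbit implementation.
class TimeoutPoolSketch {

    private final ThreadPoolExecutor pool;
    private final long timeoutMillis;

    TimeoutPoolSketch(int poolSize, int backLog, long timeoutMillis) {
        // Fixed number of worker threads plus a bounded queue (the "back log").
        // Assumption: when the queue is full, the submitting thread runs the task.
        this.pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(backLog),
                new ThreadPoolExecutor.CallerRunsPolicy());
        this.timeoutMillis = timeoutMillis;
    }

    /** Returns the extracted text, or null if it is not ready within the timeout. */
    String extractWithTimeout(Callable<String> extraction) {
        Future<String> future = pool.submit(extraction);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The task keeps running; a real indexer would pick up the text later.
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }

    static String demo() {
        TimeoutPoolSketch sketch = new TimeoutPoolSketch(2, 4, 500L);
        try {
            String fast = sketch.extractWithTimeout(() -> "hello");
            String slow = sketch.extractWithTimeout(() -> {
                Thread.sleep(5000L); // simulates a slow extraction
                return "late";
            });
            return fast + "/" + slow;
        } finally {
            sketch.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "hello/null"
    }
}
```

In this sketch a null result stands for "text not available yet"; the documented class instead defers the index update for the node until the extractor has finished.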

Method Detail

getContentTypes

public String[] getContentTypes()

Returns the MIME types supported by this extractor. The returned strings must be in lower case, and the returned array must not be empty.

The returned array must not be modified.

Specified by: getContentTypes in interface TextExtractor

Returns: supported MIME types, lower case

extractText

public Reader extractText(InputStream stream,
                          String type,
                          String encoding)
                   throws IOException

Returns a reader for the text content of the given binary document. The content type and character encoding (if available and applicable) are given as arguments. The given content type is guaranteed to be one of the types reported by TextExtractor.getContentTypes() unless the implementation explicitly permits other content types.

The implementation can choose either to read and parse the given document immediately or to return a reader that does it incrementally. The only constraint is that the implementation must close the given stream at the latest when the returned reader is closed. The caller, on the other hand, is responsible for closing the returned reader.

The implementation should only throw an exception on transient errors, i.e. when it can expect to be able to successfully extract the text content of the same binary at another time. An effort should be made to recover from syntax errors and other similar problems.

This method should be thread-safe, i.e. it is possible that this method is invoked simultaneously by different threads to extract the text content of different documents. On the other hand, the returned reader does not need to be thread-safe.

This implementation returns an instance of TextExtractorReader.

Specified by: extractText in interface TextExtractor

Parameters:
    stream - binary document from which to extract text
    type - MIME type of the given document, lower case
    encoding - the character encoding of the binary data, or null if not available
Returns: reader for the extracted text content
Throws: IOException - on transient errors
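
The contract described for extractText (close the stream when the reader is closed, throw only on transient errors, report lower-case MIME types) might be honoured by a delegate extractor roughly as follows. This is a sketch under stated assumptions: SketchTextExtractor is a local stand-in that only mirrors the two documented TextExtractor method signatures so that the example compiles without Jackrabbit, and PlainTextSketch is a hypothetical class, not part of the library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Local stand-in mirroring the two documented TextExtractor methods,
// so that this sketch compiles without Jackrabbit on the classpath.
interface SketchTextExtractor {
    String[] getContentTypes();
    Reader extractText(InputStream stream, String type, String encoding) throws IOException;
}

// Hypothetical plain-text extractor illustrating the contract.
class PlainTextSketch implements SketchTextExtractor {

    public String[] getContentTypes() {
        // Lower case and never empty; a fresh array per call protects internal state.
        return new String[] {"text/plain"};
    }

    public Reader extractText(InputStream stream, String type, String encoding)
            throws IOException {
        String charset = (encoding != null) ? encoding : "UTF-8";
        try {
            // Incremental variant: closing the InputStreamReader also closes the stream.
            return new InputStreamReader(stream, charset);
        } catch (UnsupportedEncodingException e) {
            // Not a transient error, so recover instead of throwing.
            stream.close();
            return new StringReader("");
        }
    }

    static String demo() {
        byte[] data = "pooled text".getBytes(StandardCharsets.UTF_8);
        StringBuilder text = new StringBuilder();
        // The caller is responsible for closing the reader; try-with-resources does that.
        try (Reader reader = new PlainTextSketch()
                .extractText(new ByteArrayInputStream(data), "text/plain", "UTF-8")) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "pooled text"
    }
}
```

An extractor of this shape would be the kind of object passed as the extractor argument of the constructor; the pooled class itself returns a TextExtractorReader rather than a plain InputStreamReader.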

shutdown

public void shutdown()

Shuts down this pooled text extractor. This method stops all currently running text extractor tasks and cleans up the pending queue (back log).
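
What "stops running tasks and cleans up the pending queue" means can be illustrated with a standard ExecutorService. This is only an analogy under the assumption that a shutdown of this kind behaves like ExecutorService.shutdownNow(); the class ShutdownSketch is hypothetical and does not show Jackrabbit's own code.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration of shutdown semantics with a one-thread pool.
class ShutdownSketch {

    /** Returns how many queued tasks were discarded without ever starting. */
    static int demo() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        CountDownLatch started = new CountDownLatch(1);

        Runnable slowTask = () -> {
            started.countDown();
            try {
                Thread.sleep(10_000L); // simulates a long-running extraction
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stopped by shutdownNow()
            }
        };

        // One task runs on the single worker, two wait in the queue.
        for (int i = 0; i < 3; i++) {
            pool.execute(slowTask);
        }

        try {
            started.await(); // make sure the first task is actually running
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        // Interrupts the running task and drains the pending queue.
        List<Runnable> pending = pool.shutdownNow();
        return pending.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2
    }
}
```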