java.lang.Object
  org.apache.jackrabbit.core.query.lucene.PooledTextExtractor

public class PooledTextExtractor

PooledTextExtractor implements a text extractor that extracts the text using a pool of background threads.
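The idea of extracting text on a pool of background threads with a bounded back log and a timeout can be illustrated with the standard java.util.concurrent classes. The following is only a minimal sketch under stated assumptions, not the Jackrabbit implementation: the class PooledExtractionSketch, its extract method, the caller-runs policy for a full back log, and the use of a null return value to signal "not finished within the timeout" are all hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of pooled extraction; not the Jackrabbit implementation. */
final class PooledExtractionSketch {

    private final ThreadPoolExecutor pool;
    private final long timeoutMillis;

    PooledExtractionSketch(int poolSize, int backLog, long timeoutMillis) {
        // A fixed number of worker threads plus a bounded queue of waiting jobs.
        // Assumption: when the queue is full, the submitting thread runs the job itself.
        this.pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(backLog),
                new ThreadPoolExecutor.CallerRunsPolicy());
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Runs the extraction job in the pool and waits up to the timeout.
     * Returns null if the job has not finished in time; the job keeps
     * running in the background so its result could be picked up later.
     */
    String extract(Callable<String> job) throws ExecutionException, InterruptedException {
        Future<String> future = pool.submit(job);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null; // stand-in for "update the index later"
        }
    }

    /** Stops accepting new jobs; jobs already submitted still complete. */
    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        PooledExtractionSketch sketch = new PooledExtractionSketch(2, 10, 100L);
        try {
            // Finishes well within the timeout.
            System.out.println(sketch.extract(() -> "fast text"));
            // Takes longer than the timeout, so null is returned.
            System.out.println(sketch.extract(() -> {
                Thread.sleep(500L);
                return "slow text";
            }));
        } finally {
            sketch.shutdown();
        }
    }
}
```

In this sketch a caller that receives null would have to arrange for the result to be consumed later; how PooledTextExtractor itself hands unfinished work to the indexing queue is not shown here.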
Constructor Summary | |
---|---|
PooledTextExtractor(TextExtractor extractor, int poolSize, int backLog, long timeout) | Returns a pooled text extractor based on extractor. |
Method Summary | |
---|---|
Reader | extractText(InputStream stream, String type, String encoding): Returns a reader for the text content of the given binary document. This implementation returns an instance of TextExtractorReader. |
String[] | getContentTypes(): Returns the MIME types supported by this extractor. |
void | shutdown(): Shuts down this pooled text extractor. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public PooledTextExtractor(TextExtractor extractor, int poolSize, int backLog, long timeout)
Returns a pooled text extractor based on extractor.
Parameters:
extractor - the actual text extractor.
poolSize - the pool size.
backLog - size of the back log queue.
timeout - the timeout in milliseconds after which the text extraction is put into the indexing queue; the fulltext index for the node is then updated later, once the text extractor has finished its work.

Method Detail |
---|
public String[] getContentTypes()
Returns the MIME types supported by this extractor. The returned array must not be modified.
Specified by: getContentTypes in interface TextExtractor
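
public Reader extractText(InputStream stream, String type, String encoding) throws IOException
Returns a reader for the text content of the given binary document. The given content type is one of the types reported by TextExtractor.getContentTypes() unless the implementation explicitly permits other content types.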
The implementation can choose either to read and parse the given document immediately or to return a reader that does so incrementally. The only constraint is that the implementation must close the given stream, at the latest, when the returned reader is closed. The caller, in turn, is responsible for closing the returned reader.
The implementation should only throw an exception on transient errors, i.e. when it can expect to extract the text content of the same binary successfully at another time. An effort should be made to recover from syntax errors and other similar problems.
This method should be thread-safe, i.e. it may be invoked simultaneously by different threads to extract the text content of different documents. The returned reader, however, does not need to be thread-safe.
This implementation returns an instance of TextExtractorReader.
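The contract described above (close the stream no later than the reader, recover instead of throwing on non-transient problems, keep the method thread-safe) can be illustrated with a small plain-text extractor. This is a sketch only: PlainTextExtractorSketch is a hypothetical class that mirrors the documented method signatures without implementing the Jackrabbit interface, and the fallback to UTF-8 for a missing or unknown encoding is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical extractor mirroring the documented method shapes. */
final class PlainTextExtractorSketch {

    private static final String[] TYPES = { "text/plain" };

    /** Callers must not modify the returned array. */
    public String[] getContentTypes() {
        return TYPES;
    }

    /**
     * Returns a reader that decodes the stream incrementally.
     * Closing the returned InputStreamReader also closes the given stream.
     * The method keeps no mutable state, so concurrent calls are safe.
     */
    public Reader extractText(InputStream stream, String type, String encoding)
            throws IOException {
        Charset charset = StandardCharsets.UTF_8; // assumed default
        if (encoding != null) {
            try {
                charset = Charset.forName(encoding);
            } catch (IllegalArgumentException e) {
                // Unknown or malformed charset name: recover rather than fail.
            }
        }
        return new InputStreamReader(stream, charset);
    }

    /** Helper for the demo: drains a reader into a string. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        PlainTextExtractorSketch extractor = new PlainTextExtractorSketch();
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        // The caller closes the reader; that in turn closes the stream.
        try (Reader reader = extractor.extractText(in, "text/plain", "UTF-8")) {
            System.out.println(readAll(reader));
        }
    }
}
```

Using try-with-resources on the returned reader is one way for a caller to meet its side of the contract.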
Specified by: extractText in interface TextExtractor
Parameters:
stream - binary document from which to extract text
type - MIME type of the given document, lower case
encoding - the character encoding of the binary data, or null if not available
Throws:
IOException - on transient errors

public void shutdown()
Shuts down this pooled text extractor.