org.apache.jackrabbit.core.query
Class MsWordTextFilter

java.lang.Object
  extended by org.apache.jackrabbit.core.query.MsWordTextFilter
All Implemented Interfaces:
org.apache.jackrabbit.core.query.TextFilter

public class MsWordTextFilter
extends Object
implements org.apache.jackrabbit.core.query.TextFilter

Extracts text from MS Word document binary data. Taken from Jakarta Slide class org.apache.slide.extractor.MSWordExtractor


Constructor Summary
MsWordTextFilter()
           
 
Method Summary
 boolean canFilter(String mimeType)
           
 Map doFilter(org.apache.jackrabbit.core.state.PropertyState data, String encoding)
          Returns a map with a single entry for field FieldNames.FULLTEXT.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MsWordTextFilter

public MsWordTextFilter()

Method Detail

canFilter

public boolean canFilter(String mimeType)
Specified by:
canFilter in interface org.apache.jackrabbit.core.query.TextFilter
Returns:
true for application/vnd.ms-word or application/msword, false otherwise.
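
The MIME type check described above can be sketched as follows. This is a minimal illustration, not the class's actual source: the class name MsWordMimeCheckSketch is hypothetical, and exact (case-sensitive) matching and returning false for null are assumptions that this page does not confirm.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in that mirrors the documented return values of canFilter.
public class MsWordMimeCheckSketch {

    // The two MIME types the documentation lists as supported.
    private static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(
            "application/vnd.ms-word",
            "application/msword"));

    // Assumption: exact string match; null is treated as unsupported.
    public static boolean canFilter(String mimeType) {
        return mimeType != null && SUPPORTED.contains(mimeType);
    }

    public static void main(String[] args) {
        System.out.println(canFilter("application/msword"));
        System.out.println(canFilter("text/plain"));
    }
}
```

A caller would typically invoke canFilter first and only pass the property to doFilter when it returns true.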

doFilter

public Map doFilter(org.apache.jackrabbit.core.state.PropertyState data,
                    String encoding)
             throws RepositoryException
Returns a map with a single entry for field FieldNames.FULLTEXT.

Specified by:
doFilter in interface org.apache.jackrabbit.core.query.TextFilter
Parameters:
data - object containing MS Word document data.
encoding - ignored; the text encoding is specified in the document data itself.
Returns:
a map with a single Reader value for field FieldNames.FULLTEXT.
Throws:
RepositoryException - if data is a multi-value property or does not contain a valid MS Word document.
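
To illustrate the shape of the return value, the sketch below builds a map with a single Reader entry and drains it into a String, roughly as an indexing caller might. It does not use Jackrabbit classes: FulltextMapSketch, fakeFilterResult and readFulltext are hypothetical names, and the string "fulltext" is an assumed placeholder for FieldNames.FULLTEXT, whose actual value is not given on this page.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of consuming a single-entry field map.
public class FulltextMapSketch {

    // Assumed placeholder for FieldNames.FULLTEXT (real value not stated here).
    static final String FULLTEXT = "fulltext";

    // Stand-in for a filter result: exactly one entry holding a Reader.
    static Map<String, Reader> fakeFilterResult(String extractedText) {
        Map<String, Reader> result = new HashMap<>();
        result.put(FULLTEXT, new StringReader(extractedText));
        return result;
    }

    // Reads the full-text Reader to the end and closes it.
    static String readFulltext(Map<String, Reader> result) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = result.get(FULLTEXT)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Reader> result = fakeFilterResult("Hello from a Word document");
        System.out.println(result.size());
        System.out.println(readFulltext(result));
    }
}
```

Because the value is a Reader rather than a String, it can be consumed only once; a caller that needs the text more than once would have to buffer it as shown.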


Copyright © -2006 The Apache Software Foundation. All Rights Reserved.