org.apache.jackrabbit.core.query.lucene
Class SearchIndex

java.lang.Object
  extended by org.apache.jackrabbit.core.query.AbstractQueryHandler
      extended by org.apache.jackrabbit.core.query.lucene.SearchIndex
All Implemented Interfaces:
QueryHandler

public class SearchIndex
extends AbstractQueryHandler

Implements a QueryHandler using Lucene.


Nested Class Summary
protected static class SearchIndex.CombinedIndexReader
          Combines multiple CachingMultiReader into a MultiReader with HierarchyResolver support.
 
Field Summary
static int DEFAULT_EXTRACTOR_BACK_LOG
          The default value for property extractorBackLog.
static int DEFAULT_EXTRACTOR_POOL_SIZE
          The default value for property extractorPoolSize.
static long DEFAULT_EXTRACTOR_TIMEOUT
          The default timeout in milliseconds which is granted to the text extraction process until fulltext indexing is deferred to a background thread.
static int DEFAULT_MAX_FIELD_LENGTH
          The default value for property maxFieldLength.
static int DEFAULT_MAX_MERGE_DOCS
          The default value for property maxMergeDocs.
static int DEFAULT_MERGE_FACTOR
          The default value for property mergeFactor.
static int DEFAULT_MIN_MERGE_DOCS
          The default value for property minMergeDocs.
 
Constructor Summary
SearchIndex()
          Default constructor.
 
Method Summary
 void addNode(NodeState node)
          Adds the node to the search index.
 void close()
          Closes this QueryHandler and frees resources attached to this handler.
protected  org.apache.lucene.document.Document createDocument(NodeState node, NamespaceMappings nsMappings)
          Creates a lucene Document for a node state using the namespace mappings nsMappings.
 ExcerptProvider createExcerptProvider(org.apache.lucene.search.Query query)
          Creates an excerpt provider for the given query.
 ExecutableQuery createExecutableQuery(SessionImpl session, ItemManager itemMgr, String statement, String language)
          Creates a new query by specifying the query statement itself and the language in which the query is stated.
protected  org.apache.lucene.search.SortField[] createSortFields(QName[] orderProps, boolean[] orderSpecs)
          Creates the SortFields for the order properties.
protected  TextExtractor createTextExtractor()
          Factory method to create the TextExtractor instance.
 void deleteNode(NodeId id)
          Removes the node with the given id from the search index.
protected  void doInit()
          Initializes this QueryHandler.
 QueryHits executeQuery(QueryImpl queryImpl, org.apache.lucene.search.Query query, QName[] orderProps, boolean[] orderSpecs)
          Executes the query on the search index.
 String getAnalyzer()
          Returns the class name of the analyzer that is currently in use.
 boolean getAutoRepair()
           
 int getBufferSize()
          Returns the current value for the buffer size.
 int getCacheSize()
           
 String getExcerptProviderClass()
           
 int getExtractorBackLogSize()
           
 int getExtractorPoolSize()
           
 long getExtractorTimeout()
           
 boolean getForceConsistencyCheck()
           
protected  MultiIndex getIndex()
          Returns the actual index.
 org.apache.lucene.index.IndexReader getIndexReader()
          Returns an index reader for this search index.
 int getMaxFieldLength()
           
 int getMaxMergeDocs()
          Returns the current value for maxMergeDocs.
 int getMergeFactor()
          Returns the current value for the merge factor.
 int getMinMergeDocs()
          Returns the current value for minMergeDocs.
 NamespaceMappings getNamespaceMappings()
          Returns the namespace mappings for the internal representation.
 String getPath()
          Returns the location of the search index.
 boolean getRespectDocumentOrder()
           
 int getResultFetchSize()
           
 boolean getSupportHighlighting()
           
 org.apache.lucene.analysis.Analyzer getTextAnalyzer()
          Returns the analyzer in use for indexing.
 TextExtractor getTextExtractor()
          Returns the text extractor in use for indexing.
 String getTextFilterClasses()
          Returns the fully qualified class names of the text filter instances currently in use.
 boolean getUseCompoundFile()
          Returns the current value for useCompoundFile.
 int getVolatileIdleTime()
          Returns the current value for volatileIdleTime.
 void setAnalyzer(String analyzerClassName)
          Sets the analyzer in use for indexing.
 void setAutoRepair(boolean b)
           
 void setBufferSize(int size)
           
 void setCacheSize(int size)
           
 void setExcerptProviderClass(String className)
          Sets the class name for the ExcerptProvider that should be used for the rep:excerpt pseudo property in a query.
 void setExtractorBackLogSize(int backLog)
          The number of extractor jobs that are queued until a new job is executed with the current thread instead of using the thread pool.
 void setExtractorPoolSize(int numThreads)
          The number of background threads for the extractor pool.
 void setExtractorTimeout(long timeout)
          The timeout in milliseconds which is granted to the text extraction process until fulltext indexing is deferred to a background thread.
 void setForceConsistencyCheck(boolean b)
           
 void setMaxFieldLength(int length)
           
 void setMaxMergeDocs(int maxMergeDocs)
          Sets the Lucene index writer property maxMergeDocs.
 void setMergeFactor(int mergeFactor)
          Sets the Lucene index writer property mergeFactor.
 void setMinMergeDocs(int minMergeDocs)
          Sets the Lucene index writer property minMergeDocs.
 void setPath(String path)
          Sets the location of the search index.
 void setRespectDocumentOrder(boolean docOrder)
           
 void setResultFetchSize(int size)
          Tells the query handler how many results should be fetched initially when a query is executed.
 void setSupportHighlighting(boolean b)
          If set to true additional information is stored in the index to support highlighting using the rep:excerpt pseudo property.
 void setTextFilterClasses(String filterClasses)
          Sets the list of text extractors (and text filters) to use for extracting text content from binary properties.
 void setUseCompoundFile(boolean b)
          Sets the Lucene index writer property useCompoundFile.
 void setVolatileIdleTime(int volatileIdleTime)
          Sets the property: volatileIdleTime
 void updateNodes(NodeIdIterator remove, NodeStateIterator add)
          This implementation forwards the call to MultiIndex.update(java.util.Iterator, java.util.Iterator) and transforms the two iterators to the required types.
 
Methods inherited from class org.apache.jackrabbit.core.query.AbstractQueryHandler
getContext, init
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_MIN_MERGE_DOCS

public static final int DEFAULT_MIN_MERGE_DOCS
The default value for property minMergeDocs.

See Also:
Constant Field Values

DEFAULT_MAX_MERGE_DOCS

public static final int DEFAULT_MAX_MERGE_DOCS
The default value for property maxMergeDocs.

See Also:
Constant Field Values

DEFAULT_MERGE_FACTOR

public static final int DEFAULT_MERGE_FACTOR
The default value for property mergeFactor.

See Also:
Constant Field Values

DEFAULT_MAX_FIELD_LENGTH

public static final int DEFAULT_MAX_FIELD_LENGTH
The default value for property maxFieldLength.

See Also:
Constant Field Values

DEFAULT_EXTRACTOR_POOL_SIZE

public static final int DEFAULT_EXTRACTOR_POOL_SIZE
The default value for property extractorPoolSize.

See Also:
Constant Field Values

DEFAULT_EXTRACTOR_BACK_LOG

public static final int DEFAULT_EXTRACTOR_BACK_LOG
The default value for property extractorBackLog.

See Also:
Constant Field Values

DEFAULT_EXTRACTOR_TIMEOUT

public static final long DEFAULT_EXTRACTOR_TIMEOUT
The default timeout in milliseconds which is granted to the text extraction process until fulltext indexing is deferred to a background thread.

See Also:
Constant Field Values

Constructor Detail

SearchIndex

public SearchIndex()
Default constructor.

Method Detail

doInit

protected void doInit()
               throws IOException
Initializes this QueryHandler. This implementation requires that a path parameter is set in the configuration. If this condition is not met, an IOException is thrown.

Specified by:
doInit in class AbstractQueryHandler
Throws:
IOException - if an error occurs while initializing this handler.
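
The following is a minimal sketch of how a configuration mechanism might map named parameters onto bean setters before initialization, and how an initializer can reject a missing path. It is not Jackrabbit code: DemoHandler, applyParams and convert are hypothetical names, and the reflection-based mapping is an assumption about how such a mechanism could work.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.util.Map;

class HandlerConfigSketch {

    /** Hypothetical stand-in for a query handler; not a Jackrabbit class. */
    public static class DemoHandler {
        private String path;
        private int mergeFactor = 10; // arbitrary demo value, not a documented default

        public void setPath(String path) { this.path = path; }
        public String getPath() { return path; }
        public void setMergeFactor(int mergeFactor) { this.mergeFactor = mergeFactor; }
        public int getMergeFactor() { return mergeFactor; }

        /** Mirrors the documented contract: fail if no path was configured. */
        public void doInit() throws IOException {
            if (path == null) {
                throw new IOException("path parameter is not set");
            }
        }
    }

    /** Calls setXxx(value) on the bean for every parameter named "xxx". */
    static void applyParams(Object bean, Map<String, String> params) {
        for (Map.Entry<String, String> e : params.entrySet()) {
            String name = e.getKey();
            String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            for (Method m : bean.getClass().getMethods()) {
                if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                    try {
                        m.invoke(bean, convert(e.getValue(), m.getParameterTypes()[0]));
                    } catch (ReflectiveOperationException ex) {
                        throw new IllegalArgumentException("cannot set " + name, ex);
                    }
                }
            }
        }
    }

    /** Converts the configured string to the setter's parameter type. */
    static Object convert(String value, Class<?> type) {
        if (type == int.class) return Integer.valueOf(value);
        if (type == long.class) return Long.valueOf(value);
        if (type == boolean.class) return Boolean.valueOf(value);
        return value;
    }
}
```

With this sketch, calling doInit() on a fresh DemoHandler throws an IOException, while applying a map that contains a "path" entry first lets doInit() complete.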

addNode

public void addNode(NodeState node)
             throws RepositoryException,
                    IOException
Adds the node to the search index.

Parameters:
node - the node to add.
Throws:
RepositoryException - if an error occurs while indexing the node.
IOException - if an error occurs while adding the node to the index.

deleteNode

public void deleteNode(NodeId id)
                throws IOException
Removes the node with the given id from the search index.

Parameters:
id - the id of the node to remove from the index.
Throws:
IOException - if an error occurs while removing the node from the index.

updateNodes

public void updateNodes(NodeIdIterator remove,
                        NodeStateIterator add)
                 throws RepositoryException,
                        IOException
This implementation forwards the call to MultiIndex.update(java.util.Iterator, java.util.Iterator) and transforms the two iterators to the required types.

Specified by:
updateNodes in interface QueryHandler
Overrides:
updateNodes in class AbstractQueryHandler
Parameters:
remove - ids of nodes to remove.
add - NodeStates to add. Calls to next() on this iterator may return null, to indicate that a node could not be indexed successfully.
Throws:
RepositoryException - if an error occurs while indexing a node.
IOException - if an error occurs while updating the index.

createExecutableQuery

public ExecutableQuery createExecutableQuery(SessionImpl session,
                                             ItemManager itemMgr,
                                             String statement,
                                             String language)
                                      throws InvalidQueryException
Creates a new query by specifying the query statement itself and the language in which the query is stated. If the query statement is syntactically invalid, given the language specified, an InvalidQueryException is thrown. language must specify a query language string from among those returned by QueryManager.getSupportedQueryLanguages(); if it is not then an InvalidQueryException is thrown.

Parameters:
session - the session of the current user creating the query object.
itemMgr - the item manager of the current user.
statement - the query statement.
language - the syntax of the query statement.
Returns:
A Query object.
Throws:
InvalidQueryException - if statement is invalid or language is unsupported.

close

public void close()
Closes this QueryHandler and frees resources attached to this handler.


executeQuery

public QueryHits executeQuery(QueryImpl queryImpl,
                              org.apache.lucene.search.Query query,
                              QName[] orderProps,
                              boolean[] orderSpecs)
                       throws IOException
Executes the query on the search index.

Parameters:
queryImpl - the query impl.
query - the lucene query.
orderProps - name of the properties for sort order.
orderSpecs - the order specs for the sort order properties. true indicates ascending order, false indicates descending.
Returns:
the query hits.
Throws:
IOException - if an error occurs while searching the index.

createExcerptProvider

public ExcerptProvider createExcerptProvider(org.apache.lucene.search.Query query)
                                      throws IOException
Creates an excerpt provider for the given query.

Parameters:
query - the query.
Returns:
an excerpt provider for the given query.
Throws:
IOException - if the provider cannot be created.

getTextAnalyzer

public org.apache.lucene.analysis.Analyzer getTextAnalyzer()
Returns the analyzer in use for indexing.

Returns:
the analyzer in use for indexing.

getTextExtractor

public TextExtractor getTextExtractor()
Returns the text extractor in use for indexing.

Returns:
the text extractor in use for indexing.

getNamespaceMappings

public NamespaceMappings getNamespaceMappings()
Returns the namespace mappings for the internal representation.

Returns:
the namespace mappings for the internal representation.

getIndexReader

public org.apache.lucene.index.IndexReader getIndexReader()
                                                   throws IOException
Returns an index reader for this search index. The caller of this method is responsible for closing the index reader when finished using it.

Returns:
an index reader for this search index.
Throws:
IOException - the index reader cannot be obtained.

createSortFields

protected org.apache.lucene.search.SortField[] createSortFields(QName[] orderProps,
                                                                boolean[] orderSpecs)
Creates the SortFields for the order properties.

Parameters:
orderProps - the order properties.
orderSpecs - the order specs for the properties.
Returns:
an array of sort fields

createDocument

protected org.apache.lucene.document.Document createDocument(NodeState node,
                                                             NamespaceMappings nsMappings)
                                                      throws RepositoryException
Creates a lucene Document for a node state using the namespace mappings nsMappings.

Parameters:
node - the node state to index.
nsMappings - the namespace mappings of the search index.
Returns:
a lucene Document that contains all properties of node.
Throws:
RepositoryException - if an error occurs while indexing the node.

getIndex

protected MultiIndex getIndex()
Returns the actual index.

Returns:
the actual index.

createTextExtractor

protected TextExtractor createTextExtractor()
Factory method to create the TextExtractor instance.

Returns:
the TextExtractor instance this index should use.

setAnalyzer

public void setAnalyzer(String analyzerClassName)
Sets the analyzer in use for indexing. The given analyzer class name must satisfy the following conditions: the class must exist in the class path, the class must have a public default constructor, and the class must be a Lucene Analyzer.

If the above conditions are met, then a new instance of the class is set as the analyzer. Otherwise a warning is logged and the current analyzer is not changed.

This property setter method is normally invoked by the Jackrabbit configuration mechanism if the "analyzer" parameter is set in the search configuration.

Parameters:
analyzerClassName - the analyzer class name
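
The "keep the current analyzer on failure" behavior can be sketched with plain reflection. This is an illustration only, not the Jackrabbit implementation: the nested Analyzer class is a hypothetical stand-in for org.apache.lucene.analysis.Analyzer, and the checks shown (class can be loaded, is an Analyzer subtype, has a public default constructor) are assumptions about what the conditions are.

```java
import java.util.logging.Logger;

class AnalyzerSetterSketch {

    private static final Logger LOG = Logger.getLogger("AnalyzerSetterSketch");

    /** Hypothetical stand-in for org.apache.lucene.analysis.Analyzer. */
    public abstract static class Analyzer { }

    /** Two trivial demo analyzers with public default constructors. */
    public static class SimpleDemoAnalyzer extends Analyzer { }
    public static class OtherDemoAnalyzer extends Analyzer { }

    private Analyzer analyzer = new SimpleDemoAnalyzer();

    /** Replaces the analyzer only if the named class passes all checks. */
    public void setAnalyzer(String analyzerClassName) {
        try {
            Class<?> clazz = Class.forName(analyzerClassName);
            // asSubclass throws ClassCastException if clazz is not an Analyzer;
            // getConstructor throws if there is no public default constructor.
            analyzer = clazz.asSubclass(Analyzer.class).getConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Assumed behavior: log a warning and leave the current analyzer in place.
            LOG.warning("Invalid analyzer class " + analyzerClassName + ": " + e);
        }
    }

    /** Returns the class name of the analyzer currently in use. */
    public String getAnalyzer() {
        return analyzer.getClass().getName();
    }
}
```

Passing an unknown class name, or the name of a class that is not an Analyzer, leaves getAnalyzer() unchanged in this sketch.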

getAnalyzer

public String getAnalyzer()
Returns the class name of the analyzer that is currently in use.

Returns:
class name of analyzer in use.

setPath

public void setPath(String path)
Sets the location of the search index.

Parameters:
path - the location of the search index.

getPath

public String getPath()
Returns the location of the search index. Returns null if not set.

Returns:
the location of the search index.

setUseCompoundFile

public void setUseCompoundFile(boolean b)
Sets the Lucene index writer property useCompoundFile.


getUseCompoundFile

public boolean getUseCompoundFile()
Returns the current value for useCompoundFile.

Returns:
the current value for useCompoundFile.

setMinMergeDocs

public void setMinMergeDocs(int minMergeDocs)
Sets the Lucene index writer property minMergeDocs.


getMinMergeDocs

public int getMinMergeDocs()
Returns the current value for minMergeDocs.

Returns:
the current value for minMergeDocs.

setVolatileIdleTime

public void setVolatileIdleTime(int volatileIdleTime)
Sets the property: volatileIdleTime

Parameters:
volatileIdleTime - idle time in seconds

getVolatileIdleTime

public int getVolatileIdleTime()
Returns the current value for volatileIdleTime.

Returns:
the current value for volatileIdleTime.

setMaxMergeDocs

public void setMaxMergeDocs(int maxMergeDocs)
Sets the Lucene index writer property maxMergeDocs.


getMaxMergeDocs

public int getMaxMergeDocs()
Returns the current value for maxMergeDocs.

Returns:
the current value for maxMergeDocs.

setMergeFactor

public void setMergeFactor(int mergeFactor)
Sets the Lucene index writer property mergeFactor.


getMergeFactor

public int getMergeFactor()
Returns the current value for the merge factor.

Returns:
the current value for the merge factor.

setBufferSize

public void setBufferSize(int size)
See Also:
VolatileIndex#setBufferSize(int)

getBufferSize

public int getBufferSize()
Returns the current value for the buffer size.

Returns:
the current value for the buffer size.

setRespectDocumentOrder

public void setRespectDocumentOrder(boolean docOrder)

getRespectDocumentOrder

public boolean getRespectDocumentOrder()

setForceConsistencyCheck

public void setForceConsistencyCheck(boolean b)

getForceConsistencyCheck

public boolean getForceConsistencyCheck()

setAutoRepair

public void setAutoRepair(boolean b)

getAutoRepair

public boolean getAutoRepair()

setCacheSize

public void setCacheSize(int size)

getCacheSize

public int getCacheSize()

setMaxFieldLength

public void setMaxFieldLength(int length)

getMaxFieldLength

public int getMaxFieldLength()

setTextFilterClasses

public void setTextFilterClasses(String filterClasses)
Sets the list of text extractors (and text filters) to use for extracting text content from binary properties. The list must be comma (or whitespace) separated, and contain fully qualified class names of the TextExtractor (and TextFilter) classes to be used. The configured classes must all have a public default constructor.

Parameters:
filterClasses - comma separated list of class names
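
As an illustration of the accepted list format, the sketch below splits a comma or whitespace separated string into class names and joins them back into the comma separated form a getter might report. The class and method names are hypothetical, and the exact tokenization Jackrabbit applies is not shown in this page, so the regular expression is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class FilterClassListSketch {

    /** Splits a comma and/or whitespace separated list into class names. */
    static List<String> parseClassNames(String filterClasses) {
        List<String> names = new ArrayList<>();
        for (String token : filterClasses.split("[,\\s]+")) {
            // A leading separator produces one empty token; skip it.
            if (!token.isEmpty()) {
                names.add(token);
            }
        }
        return names;
    }

    /** Joins the names into a single comma separated string. */
    static String join(List<String> names) {
        return String.join(",", names);
    }
}
```

Each parsed name would then be loaded and instantiated through its public default constructor, which is why the configured classes must provide one.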

getTextFilterClasses

public String getTextFilterClasses()
Returns the fully qualified class names of the text filter instances currently in use. The names are comma separated.

Returns:
class names of the text filters in use.

setResultFetchSize

public void setResultFetchSize(int size)
Tells the query handler how many results should be fetched initially when a query is executed.

Parameters:
size - the number of results to fetch initially.

getResultFetchSize

public int getResultFetchSize()
Returns:
the number of results the query handler will fetch initially when a query is executed.

setExtractorPoolSize

public void setExtractorPoolSize(int numThreads)
The number of background threads for the extractor pool.

Parameters:
numThreads - the number of threads.

getExtractorPoolSize

public int getExtractorPoolSize()
Returns:
the size of the thread pool which is used to run the text extractors when binary content is indexed.

setExtractorBackLogSize

public void setExtractorBackLogSize(int backLog)
The number of extractor jobs that are queued until a new job is executed with the current thread instead of using the thread pool.

Parameters:
backLog - size of the extractor job queue.
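
The interplay of pool size and back log can be sketched with java.util.concurrent: a fixed number of threads, a bounded queue, and a rejection policy that runs overflow jobs on the submitting thread. This is one way to obtain the described behavior and is not necessarily how SearchIndex implements it; ExtractorPoolSketch, newPool and demo are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ExtractorPoolSketch {

    /** Fixed-size pool with a bounded back log (backLog must be at least 1). */
    static ThreadPoolExecutor newPool(int numThreads, int backLog) {
        return new ThreadPoolExecutor(
                numThreads, numThreads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(backLog),
                // When the queue is full, the submitting thread runs the job itself.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Runs three jobs on a 1-thread pool with back log 1; maps job index to "ran on caller thread". */
    static Map<Integer, Boolean> demo() {
        ThreadPoolExecutor pool = newPool(1, 1);
        Thread caller = Thread.currentThread();
        CountDownLatch release = new CountDownLatch(1);
        Map<Integer, Boolean> ranInCaller = new ConcurrentHashMap<>();

        // Job 0 occupies the only pool thread until released.
        pool.execute(() -> {
            ranInCaller.put(0, Thread.currentThread() == caller);
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Job 1 fills the back log.
        pool.execute(() -> ranInCaller.put(1, Thread.currentThread() == caller));
        // Job 2 finds the back log full and runs on the calling thread.
        pool.execute(() -> ranInCaller.put(2, Thread.currentThread() == caller));

        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ranInCaller;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch a larger back log lets more jobs wait for a pool thread, while a smaller one pushes work back onto the indexing thread sooner.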

getExtractorBackLogSize

public int getExtractorBackLogSize()
Returns:
the size of the extractor queue back log.

setExtractorTimeout

public void setExtractorTimeout(long timeout)
The timeout in milliseconds which is granted to the text extraction process until fulltext indexing is deferred to a background thread.

Parameters:
timeout - the timeout in milliseconds.
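
A timeout of this kind can be sketched with Future.get(long, TimeUnit): the caller waits for the extraction up to the timeout and, if it has not finished, carries on while the job continues on a pool thread. The names ExtractorTimeoutSketch and extractWithTimeout are hypothetical, and what SearchIndex does with a deferred result is not covered here.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ExtractorTimeoutSketch {

    /**
     * Waits up to timeoutMillis for the extraction to finish. Returns the
     * extracted text if it completed in time, or an empty Optional if it is
     * still running, in which case the job keeps running in the background.
     * Assumes the extraction never returns null.
     */
    static Optional<String> extractWithTimeout(ExecutorService pool,
                                               Callable<String> extraction,
                                               long timeoutMillis) {
        Future<String> future = pool.submit(extraction);
        try {
            return Optional.of(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Not finished in time: the caller proceeds without the text for now.
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("extraction failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }
}
```

A short timeout keeps the indexing thread responsive at the cost of fulltext content appearing in the index later; a long timeout does the opposite.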

getExtractorTimeout

public long getExtractorTimeout()
Returns:
the extractor timeout in milliseconds.

setSupportHighlighting

public void setSupportHighlighting(boolean b)
If set to true additional information is stored in the index to support highlighting using the rep:excerpt pseudo property.

Parameters:
b - true to enable highlighting support.

getSupportHighlighting

public boolean getSupportHighlighting()
Returns:
true if highlighting support is enabled.

setExcerptProviderClass

public void setExcerptProviderClass(String className)
Sets the class name for the ExcerptProvider that should be used for the rep:excerpt pseudo property in a query.

Parameters:
className - the name of a class that implements ExcerptProvider.

getExcerptProviderClass

public String getExcerptProviderClass()
Returns:
the class name of the excerpt provider implementation.


Copyright © 2004-2007 The Apache Software Foundation. All Rights Reserved.