org.apache.jackrabbit.core.query.lucene
Class NodeIndexer

java.lang.Object
  extended by org.apache.jackrabbit.core.query.lucene.NodeIndexer

public class NodeIndexer
extends Object

Creates a lucene Document object from a Node.


Field Summary
protected static float DEFAULT_BOOST
          The default boost for a lucene field: 1.0f.
protected  List<org.apache.lucene.document.Fieldable> doNotUseInExcerpt
          List of FieldNames.FULLTEXT fields which should not be used in an excerpt.
protected  IndexFormatVersion indexFormatVersion
          Indicates index format for this node indexer.
protected  IndexingConfiguration indexingConfig
          The indexing configuration or null if none is available.
protected  NamespaceMappings mappings
          Namespace mappings to use for indexing.
protected  NodeState node
          The NodeState of the node to index.
protected  NamePathResolver resolver
          Name and Path resolver.
protected  ItemStateManager stateProvider
          The persistent item state provider.
protected  boolean supportHighlighting
          If set to true the fulltext field is stored and a term vector is created with offset information.
 
Constructor Summary
NodeIndexer(NodeState node, ItemStateManager stateProvider, NamespaceMappings mappings, Executor executor, org.apache.tika.parser.Parser parser)
          Creates a new node indexer.
 
Method Summary
protected  void addBinaryValue(org.apache.lucene.document.Document doc, String fieldName, InternalValue internalValue)
          Adds the binary value to the document as the named field.
protected  void addBooleanValue(org.apache.lucene.document.Document doc, String fieldName, Object internalValue)
          Adds the string representation of the boolean value to the document as the named field.
protected  void addCalendarValue(org.apache.lucene.document.Document doc, String fieldName, Object internalValue)
          Adds the calendar value to the document as the named field.
protected  void addDecimalValue(org.apache.lucene.document.Document doc, String fieldName, Object internalValue)
          Adds the decimal value to the document as the named field.
protected  void addDoubleValue(org.apache.lucene.document.Document doc, String fieldName, Object internalValue)
          Adds the double value to the document as the named field.
protected  void addLength(org.apache.lucene.document.Document doc, String propertyName, InternalValue value)
          Adds a FieldNames.PROPERTY_LENGTHS field to document with a named length value.
protected  void addLongValue(org.apache.lucene.document.Document doc, String fieldName, Object internalValue)
          Adds the long value to the document as the named field.
protected  void addMVPName(org.apache.lucene.document.Document doc, Name name)
          Adds a FieldNames.MVP field to doc with the resolved name using the internal search index namespace mapping.
protected  void addNameValue(org.apache.lucene.document.Document doc, String fieldName, Object internalValue)
          Adds the name value to the document as the named field.
protected  void addNodeName(org.apache.lucene.document.Document doc, String namespaceURI, String localName)
          Depending on the index format version adds one or two fields to the document for the node name.
protected  void addParentChildRelation(org.apache.lucene.document.Document doc, NodeId parentId)
          Adds a parent child relation to the given doc.
protected  void addPathValue(org.apache.lucene.document.Document doc, String fieldName, Object internalValue)
          Adds the path value to the document as the named field.
protected  void addPropertyName(org.apache.lucene.document.Document doc, Name name)
          Adds the property name to the lucene _:PROPERTIES_SET field.
protected  void addReferenceValue(org.apache.lucene.document.Document doc, String fieldName, Object internalValue, boolean weak)
          Adds the reference value to the document as the named field.
protected  void addStringValue(org.apache.lucene.document.Document doc, String fieldName, Object internalValue)
          Deprecated. Use addStringValue(Document, String, Object, boolean) instead.
protected  void addStringValue(org.apache.lucene.document.Document doc, String fieldName, Object internalValue, boolean tokenized)
          Adds the string value to the document both as the named field and optionally for full text indexing if tokenized is true.
protected  void addStringValue(org.apache.lucene.document.Document doc, String fieldName, Object internalValue, boolean tokenized, boolean includeInNodeIndex, float boost)
          Deprecated. Use addStringValue(Document, String, Object, boolean, boolean, float, boolean) instead.
protected  void addStringValue(org.apache.lucene.document.Document doc, String fieldName, Object internalValue, boolean tokenized, boolean includeInNodeIndex, float boost, boolean useInExcerpt)
          Adds the string value to the document both as the named field and optionally for full text indexing if tokenized is true.
protected  void addURIValue(org.apache.lucene.document.Document doc, String fieldName, Object internalValue)
          Adds the URI value to the document as the named field.
protected  void addValue(org.apache.lucene.document.Document doc, InternalValue value, Name name)
          Adds a value to the lucene Document.
 org.apache.lucene.document.Document createDoc()
          Creates a lucene Document.
protected  org.apache.lucene.document.Field createFieldWithoutNorms(String fieldName, String internalValue, int propertyType)
          Creates a field of name fieldName with the value of internalValue.
protected  org.apache.lucene.document.Fieldable createFulltextField(InternalValue value, org.apache.tika.metadata.Metadata metadata)
          Creates a fulltext field for the reader value.
protected  org.apache.lucene.document.Field createFulltextField(String value)
          Deprecated. Use createFulltextField(String, boolean, boolean) instead.
protected  org.apache.lucene.document.Field createFulltextField(String value, boolean store, boolean withOffsets)
          Creates a fulltext field for the string value.
 int getMaxExtractLength()
          Returns the maximum number of characters to extract from binaries.
protected  float getNodeBoost()
          Returns the boost value for this node state.
 NodeId getNodeId()
          Returns the NodeId of the indexed node.
protected  float getPropertyBoost(Name propertyName)
          Returns the boost value for the given property name.
protected  InternalValue getValue(Name name)
          Utility method that extracts the first value of the named property of the current node.
protected  boolean isIncludedInNodeIndex(Name propertyName)
          Returns true if the property with the given name should also be added to the node scope index.
protected  boolean isIndexed(Name propertyName)
          Returns true if the property with the given name should be indexed.
 void setIndexFormatVersion(IndexFormatVersion indexFormatVersion)
          Sets the index format version.
 void setIndexingConfiguration(IndexingConfiguration config)
          Sets the indexing configuration for this node indexer.
 void setMaxExtractLength(int length)
          Sets the maximum number of characters to extract from binaries.
 void setSupportHighlighting(boolean b)
          If set to true additional information is stored in the index to support highlighting using the rep:excerpt pseudo property.
protected  void throwRepositoryException(Exception e)
          Wraps the exception e into a RepositoryException and throws the created exception.
protected  boolean useInExcerpt(Name propertyName)
          Returns true if the content of the property with the given name should be used to create an excerpt.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_BOOST

protected static final float DEFAULT_BOOST
The default boost for a lucene field: 1.0f.

See Also:
Constant Field Values

node

protected final NodeState node
The NodeState of the node to index.


stateProvider

protected final ItemStateManager stateProvider
The persistent item state provider.


mappings

protected final NamespaceMappings mappings
Namespace mappings to use for indexing. This is the internal namespace mapping.


resolver

protected final NamePathResolver resolver
Name and Path resolver.


indexingConfig

protected IndexingConfiguration indexingConfig
The indexing configuration or null if none is available.


supportHighlighting

protected boolean supportHighlighting
If set to true the fulltext field is stored and a term vector is created with offset information.


indexFormatVersion

protected IndexFormatVersion indexFormatVersion
Indicates index format for this node indexer.


doNotUseInExcerpt

protected List<org.apache.lucene.document.Fieldable> doNotUseInExcerpt
List of FieldNames.FULLTEXT fields which should not be used in an excerpt.

Constructor Detail

NodeIndexer

public NodeIndexer(NodeState node,
                   ItemStateManager stateProvider,
                   NamespaceMappings mappings,
                   Executor executor,
                   org.apache.tika.parser.Parser parser)
Creates a new node indexer.

Parameters:
node - the node state to index.
stateProvider - the persistent item state manager to retrieve properties.
mappings - internal namespace mappings.
executor - background task executor for text extraction
parser - parser for binary properties
Method Detail

getNodeId

public NodeId getNodeId()
Returns the NodeId of the indexed node.

Returns:
the NodeId of the indexed node.

setSupportHighlighting

public void setSupportHighlighting(boolean b)
If set to true additional information is stored in the index to support highlighting using the rep:excerpt pseudo property.

Parameters:
b - true to enable highlighting support.

setIndexFormatVersion

public void setIndexFormatVersion(IndexFormatVersion indexFormatVersion)
Sets the index format version.

Parameters:
indexFormatVersion - the index format version

setIndexingConfiguration

public void setIndexingConfiguration(IndexingConfiguration config)
Sets the indexing configuration for this node indexer.

Parameters:
config - the indexing configuration.

getMaxExtractLength

public int getMaxExtractLength()
Returns the maximum number of characters to extract from binaries.

Returns:
maximum extraction length

setMaxExtractLength

public void setMaxExtractLength(int length)
Sets the maximum number of characters to extract from binaries.

Parameters:
length - maximum extraction length
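
The limit caps how much extracted text is taken from a single binary. How NodeIndexer enforces the limit internally is not described here; the sketch below only illustrates the general idea of reading at most a fixed number of characters from a text source. BoundedExtract and readAtMost are hypothetical names, not Jackrabbit API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Jackrabbit: caps the amount of text read from a Reader.
final class BoundedExtract {

    private BoundedExtract() {
    }

    /** Reads at most maxLength characters from reader and returns them as a string. */
    static String readAtMost(Reader reader, int maxLength) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        try {
            while (text.length() < maxLength) {
                // Never request more than the remaining budget.
                int wanted = Math.min(buffer.length, maxLength - text.length());
                int read = reader.read(buffer, 0, wanted);
                if (read < 0) {
                    break; // end of input reached before the limit
                }
                text.append(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(readAtMost(new StringReader("extracted text of a large binary"), 9));
    }
}
```

A limit of zero or less yields an empty string in this sketch; whether NodeIndexer treats such values the same way is not specified above.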

createDoc

public org.apache.lucene.document.Document createDoc()
                                              throws RepositoryException
Creates a lucene Document.

Returns:
the lucene Document with the index layout.
Throws:
RepositoryException - if an error occurs while reading property values from the ItemStateManager.

throwRepositoryException

protected void throwRepositoryException(Exception e)
                                 throws RepositoryException
Wraps the exception e into a RepositoryException and throws the created exception.

Parameters:
e - the base exception.
Throws:
RepositoryException

addMVPName

protected void addMVPName(org.apache.lucene.document.Document doc,
                          Name name)
Adds a FieldNames.MVP field to doc with the resolved name using the internal search index namespace mapping.

Parameters:
doc - the lucene document.
name - the name of the multi-value property.

addValue

protected void addValue(org.apache.lucene.document.Document doc,
                        InternalValue value,
                        Name name)
                 throws RepositoryException
Adds a value to the lucene Document.

Parameters:
doc - the document.
value - the internal jackrabbit value.
name - the name of the property.
Throws:
RepositoryException

addPropertyName

protected void addPropertyName(org.apache.lucene.document.Document doc,
                               Name name)
Adds the property name to the lucene _:PROPERTIES_SET field.

Parameters:
doc - the document.
name - the name of the property.

addBinaryValue

protected void addBinaryValue(org.apache.lucene.document.Document doc,
                              String fieldName,
                              InternalValue internalValue)
Adds the binary value to the document as the named field.

This implementation checks if this node is of type nt:resource and, if that is the case, tries to extract text from the binary property using the parser supplied to the constructor.

Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.

getValue

protected InternalValue getValue(Name name)
                          throws ItemStateException
Utility method that extracts the first value of the named property of the current node. Returns null if the property does not exist or contains no values.

Parameters:
name - property name
Returns:
value of the named property, or null
Throws:
ItemStateException - if the property can not be accessed
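
The contract above (first value, or null when the property is missing or holds no values) can be sketched with plain collections. This is an illustration only: the Map stands in for the node's properties, and FirstValue, firstValueOrNull and sampleProperties are hypothetical names, not Jackrabbit API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the "first value or null" lookup described above.
final class FirstValue {

    private FirstValue() {
    }

    /** Returns the first value stored under name, or null if absent or empty. */
    static String firstValueOrNull(Map<String, List<String>> properties, String name) {
        List<String> values = properties.get(name);
        if (values == null || values.isEmpty()) {
            return null; // property does not exist or contains no values
        }
        return values.get(0);
    }

    /** Builds a small property map used for demonstration. */
    static Map<String, List<String>> sampleProperties() {
        Map<String, List<String>> properties = new HashMap<>();
        properties.put("jcr:title", Arrays.asList("first", "second"));
        properties.put("empty", Collections.<String>emptyList());
        return properties;
    }

    public static void main(String[] args) {
        Map<String, List<String>> properties = sampleProperties();
        System.out.println(firstValueOrNull(properties, "jcr:title"));
        System.out.println(firstValueOrNull(properties, "missing"));
    }
}
```

Callers of a method with this contract need a null check before using the result.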

addBooleanValue

protected void addBooleanValue(org.apache.lucene.document.Document doc,
                               String fieldName,
                               Object internalValue)
Adds the string representation of the boolean value to the document as the named field.

Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.

createFieldWithoutNorms

protected org.apache.lucene.document.Field createFieldWithoutNorms(String fieldName,
                                                                   String internalValue,
                                                                   int propertyType)
Creates a field of name fieldName with the value of internalValue. The created field is indexed without norms.

Parameters:
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
propertyType - the property type.

addCalendarValue

protected void addCalendarValue(org.apache.lucene.document.Document doc,
                                String fieldName,
                                Object internalValue)
Adds the calendar value to the document as the named field. The calendar value is converted to an indexable string value using the DateField class.

Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.

addDoubleValue

protected void addDoubleValue(org.apache.lucene.document.Document doc,
                              String fieldName,
                              Object internalValue)
Adds the double value to the document as the named field. The double value is converted to an indexable string value using the DoubleField class.

Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.

addLongValue

protected void addLongValue(org.apache.lucene.document.Document doc,
                            String fieldName,
                            Object internalValue)
Adds the long value to the document as the named field. The long value is converted to an indexable string value using the LongField class.

Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
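
Lucene compares terms as strings, so a numeric value has to be encoded in a form whose string order matches its numeric order. The following is a minimal sketch of that idea, not the actual LongField implementation: the class SortableLong, its method names, and the fixed-width hexadecimal format are assumptions made for illustration.

```java
// Hypothetical helper, not part of Jackrabbit: illustrates an order-preserving
// string encoding for long values.
final class SortableLong {

    private SortableLong() {
    }

    /** Encodes a long as 16 hex digits whose string order matches numeric order. */
    static String encode(long value) {
        // Flipping the sign bit maps the signed range onto the unsigned range,
        // so negative values sort before positive ones.
        long shifted = value ^ Long.MIN_VALUE;
        // %016x prints the 64 bits as zero-padded, unsigned, lower-case hex.
        return String.format("%016x", shifted);
    }

    /** Reverses encode(long). */
    static long decode(String encoded) {
        return Long.parseUnsignedLong(encoded, 16) ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(encode(-1L));
        System.out.println(encode(0L));
        System.out.println(encode(1L));
    }
}
```

Because every encoded string has the same length, plain string comparison of two encoded values gives the same result as comparing the original numbers.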

addDecimalValue

protected void addDecimalValue(org.apache.lucene.document.Document doc,
                               String fieldName,
                               Object internalValue)
Adds the decimal value to the document as the named field. The decimal value is converted to an indexable string value using the DecimalField class.

Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.

addReferenceValue

protected void addReferenceValue(org.apache.lucene.document.Document doc,
                                 String fieldName,
                                 Object internalValue,
                                 boolean weak)
Adds the reference value to the document as the named field. The value's string representation is added as the reference data. Additionally the reference data is stored in the index. As of Jackrabbit 2.0 this method also adds the reference UUID as a FieldNames.WEAK_REFS field to the index if it is a weak reference.

Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
weak - Flag indicating whether it's a WEAKREFERENCE (true) or a REFERENCE (false)

addPathValue

protected void addPathValue(org.apache.lucene.document.Document doc,
                            String fieldName,
                            Object internalValue)
Adds the path value to the document as the named field. The path value is converted to an indexable string value using the namespace mappings with which this class has been created.

Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.

addURIValue

protected void addURIValue(org.apache.lucene.document.Document doc,
                           String fieldName,
                           Object internalValue)
Adds the URI value to the document as the named field.

Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.

addStringValue

protected void addStringValue(org.apache.lucene.document.Document doc,
                              String fieldName,
                              Object internalValue)
Deprecated. Use addStringValue(Document, String, Object, boolean) instead.

Adds the string value to the document both as the named field and for full text indexing.

Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.

addStringValue

protected void addStringValue(org.apache.lucene.document.Document doc,
                              String fieldName,
                              Object internalValue,
                              boolean tokenized)
Adds the string value to the document both as the named field and optionally for full text indexing if tokenized is true.

Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
tokenized - If true the string is also tokenized and fulltext indexed.

addStringValue

protected void addStringValue(org.apache.lucene.document.Document doc,
                              String fieldName,
                              Object internalValue,
                              boolean tokenized,
                              boolean includeInNodeIndex,
                              float boost)
Deprecated. Use addStringValue(Document, String, Object, boolean, boolean, float, boolean) instead.

Adds the string value to the document both as the named field and optionally for full text indexing if tokenized is true.

Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
tokenized - If true the string is also tokenized and fulltext indexed.
includeInNodeIndex - If true the string is also tokenized and added to the node scope fulltext index.
boost - the boost value for this string field.

addStringValue

protected void addStringValue(org.apache.lucene.document.Document doc,
                              String fieldName,
                              Object internalValue,
                              boolean tokenized,
                              boolean includeInNodeIndex,
                              float boost,
                              boolean useInExcerpt)
Adds the string value to the document both as the named field and optionally for full text indexing if tokenized is true.

Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.
tokenized - If true the string is also tokenized and fulltext indexed.
includeInNodeIndex - If true the string is also tokenized and added to the node scope fulltext index.
boost - the boost value for this string field.
useInExcerpt - If true the string may show up in an excerpt.

addNameValue

protected void addNameValue(org.apache.lucene.document.Document doc,
                            String fieldName,
                            Object internalValue)
Adds the name value to the document as the named field. The name value is converted to an indexable string by treating the internal value as a Name and mapping its namespace using the namespace mappings with which this class has been created.

Parameters:
doc - The document to which to add the field
fieldName - The name of the field to add
internalValue - The value for the field to add to the document.

createFulltextField

protected org.apache.lucene.document.Field createFulltextField(String value)
Deprecated. Use createFulltextField(String, boolean, boolean) instead.

Creates a fulltext field for the string value.

Parameters:
value - the string value.
Returns:
a lucene field.

createFulltextField

protected org.apache.lucene.document.Field createFulltextField(String value,
                                                               boolean store,
                                                               boolean withOffsets)
Creates a fulltext field for the string value.

Parameters:
value - the string value.
store - if the value of the field should be stored.
withOffsets - if a term vector with offsets should be stored.
Returns:
a lucene field.

createFulltextField

protected org.apache.lucene.document.Fieldable createFulltextField(InternalValue value,
                                                                   org.apache.tika.metadata.Metadata metadata)
Creates a fulltext field for the reader value.

Parameters:
value - the binary value
metadata - document metadata
Returns:
a lucene field.

isIndexed

protected boolean isIndexed(Name propertyName)
Returns true if the property with the given name should be indexed.

Parameters:
propertyName - name of a property.
Returns:
true if the property should be fulltext indexed; false otherwise.

isIncludedInNodeIndex

protected boolean isIncludedInNodeIndex(Name propertyName)
Returns true if the property with the given name should also be added to the node scope index.

Parameters:
propertyName - the name of a property.
Returns:
true if it should be added to the node scope index; false otherwise.

useInExcerpt

protected boolean useInExcerpt(Name propertyName)
Returns true if the content of the property with the given name should be used to create an excerpt.

Parameters:
propertyName - the name of a property.
Returns:
true if it should be used to create an excerpt; false otherwise.

getPropertyBoost

protected float getPropertyBoost(Name propertyName)
Returns the boost value for the given property name.

Parameters:
propertyName - the name of a property.
Returns:
the boost value for the given property name.

getNodeBoost

protected float getNodeBoost()
Returns the boost value for this node state.

Returns:
the boost value for this node state.

addLength

protected void addLength(org.apache.lucene.document.Document doc,
                         String propertyName,
                         InternalValue value)
Adds a FieldNames.PROPERTY_LENGTHS field to document with a named length value.

Parameters:
doc - the lucene document.
propertyName - the property name.
value - the internal value.

addNodeName

protected void addNodeName(org.apache.lucene.document.Document doc,
                           String namespaceURI,
                           String localName)
                    throws NamespaceException
Depending on the index format version adds one or two fields to the document for the node name.

Parameters:
doc - the lucene document.
namespaceURI - the namespace URI of the node name.
localName - the local name of the node.
Throws:
NamespaceException

addParentChildRelation

protected void addParentChildRelation(org.apache.lucene.document.Document doc,
                                      NodeId parentId)
                               throws ItemStateException,
                                      RepositoryException
Adds a parent child relation to the given doc.

Parameters:
doc - the document.
parentId - the id of the parent node.
Throws:
ItemStateException - if the parent node cannot be read.
RepositoryException - if the parent node does not have a child node entry for the current node.


Copyright © 2004-2010 The Apache Software Foundation. All Rights Reserved.