To explain the algorithm, let's use the following sample text (to be highlighted) and user query:
Sample Text | Lucene is a search engine library.
User Query  | Lucene^2 OR "search library"~1
The user query is a BooleanQuery that consists of TermQuery("Lucene") with a boost of 2 and PhraseQuery("search library") with a slop of 1.
For your convenience, here are the offsets and positions of the sample text.
+--------+-----------------------------------+
|        |          1111111111222222222233333|
|  offset|01234567890123456789012345678901234|
+--------+-----------------------------------+
|document|Lucene is a search engine library. |
+--------*-----------------------------------+
|position|0      1  2 3      4      5        |
+--------*-----------------------------------+
In Step 1, Fast Vector Highlighter generates {@link org.apache.lucene.search.vectorhighlight.FieldQuery.QueryPhraseMap} from the user query.
public class QueryPhraseMap {
  boolean terminal;
  int slop;     // valid if terminal == true and phraseHighlight == true
  float boost;  // valid if terminal == true
  Map<String, QueryPhraseMap> subMap;
}
From the sample user query, the following
   QueryPhraseMap
+--------+-+  +-------+-+
|"Lucene"|o+->|boost=2|*|  * : terminal
+--------+-+  +-------+-+

+--------+-+  +---------+-+  +-------+------+-+
|"search"|o+->|"library"|o+->|boost=1|slop=1|*|
+--------+-+  +---------+-+  +-------+------+-+
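The nested-map structure in the diagram can be sketched in a few lines of Java. This is only an illustration: PhraseMapSketch and its add and sample methods are hypothetical names that mirror the fields listed above, not the actual FieldQuery.QueryPhraseMap code, and the sample query is entered by hand rather than parsed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for FieldQuery.QueryPhraseMap; not the Lucene class itself.
public class PhraseMapSketch {
    boolean terminal;
    int slop;     // meaningful only on terminal nodes
    float boost;  // meaningful only on terminal nodes
    final Map<String, PhraseMapSketch> subMap = new HashMap<>();

    // Walk (creating nodes as needed) one level per term, then mark the last node terminal.
    void add(List<String> terms, int slop, float boost) {
        PhraseMapSketch node = this;
        for (String term : terms) {
            node = node.subMap.computeIfAbsent(term, t -> new PhraseMapSketch());
        }
        node.terminal = true;
        node.slop = slop;
        node.boost = boost;
    }

    // Hand-built map for the sample query: Lucene^2 OR "search library"~1
    static PhraseMapSketch sample() {
        PhraseMapSketch root = new PhraseMapSketch();
        root.add(List.of("Lucene"), 0, 2.0f);
        root.add(List.of("search", "library"), 1, 1.0f);
        return root;
    }

    public static void main(String[] args) {
        PhraseMapSketch root = sample();
        PhraseMapSketch library = root.subMap.get("search").subMap.get("library");
        System.out.println("Lucene boost = " + root.subMap.get("Lucene").boost);
        System.out.println("library terminal = " + library.terminal + ", slop = " + library.slop);
    }
}
```

Note that the intermediate "search" node is not terminal, so "search" on its own does not count as a match in this sketch.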
In Step 2, Fast Vector Highlighter generates {@link org.apache.lucene.search.vectorhighlight.FieldTermStack}. Fast Vector Highlighter uses {@link org.apache.lucene.index.TermFreqVector} data
(must be stored {@link org.apache.lucene.document.Field.TermVector#WITH_POSITIONS_OFFSETS})
to generate it.
   FieldTermStack
+------------------+
|"Lucene"(0,6,0)   |
+------------------+
|"search"(12,18,3) |
+------------------+
|"library"(26,33,5)|
+------------------+
where : "termText"(startOffset,endOffset,position)
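To make the entries above concrete, the following sketch derives the same (startOffset,endOffset,position) triples from the sample text. It assumes a plain regular-expression tokenizer in place of the stored term vector data, so TermStackSketch, TermInfo, and build are hypothetical names and the tokenization rule is an assumption, not the highlighter's own logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TermStackSketch {
    // Hypothetical stand-in for one FieldTermStack entry.
    static final class TermInfo {
        final String text;
        final int startOffset, endOffset, position;
        TermInfo(String text, int startOffset, int endOffset, int position) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.position = position;
        }
    }

    // Assumption: tokens are runs of ASCII letters; every token advances the position,
    // but only tokens that appear in the query are kept.
    static List<TermInfo> build(String text, Set<String> queryTerms) {
        List<TermInfo> stack = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Za-z]+").matcher(text);
        int position = 0;
        while (m.find()) {
            if (queryTerms.contains(m.group())) {
                stack.add(new TermInfo(m.group(), m.start(), m.end(), position));
            }
            position++;
        }
        return stack;
    }

    public static void main(String[] args) {
        List<TermInfo> stack =
            build("Lucene is a search engine library.", Set.of("Lucene", "search", "library"));
        for (TermInfo t : stack) {
            System.out.println("\"" + t.text + "\"(" + t.startOffset + ","
                + t.endOffset + "," + t.position + ")");
        }
    }
}
```

The end offset is exclusive, which is why "library" ends at 33 and the trailing period at offset 33 is not part of the term.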
In Step 3, Fast Vector Highlighter generates {@link org.apache.lucene.search.vectorhighlight.FieldPhraseList}
by reference to
   FieldPhraseList
+----------------+-----------------+---+
|"Lucene"        |[(0,6)]          |w=2|
+----------------+-----------------+---+
|"search library"|[(12,18),(26,33)]|w=1|
+----------------+-----------------+---+
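One way to picture how the entries above could be derived is to walk the term stack through the phrase map and accept a candidate when the position gaps stay within the slop. The sketch below is a simplified assumption rather than the actual FieldPhraseList logic: it stops at the first terminal node, does not backtrack, and all class and method names (PhraseListSketch, Node, Term, Phrase, match) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PhraseListSketch {
    // Hypothetical, simplified stand-ins for the structures in Steps 1 to 3.
    static final class Node {
        boolean terminal;
        int slop;
        float boost;
        final Map<String, Node> subMap = new HashMap<>();
    }

    static final class Term {
        final String text;
        final int start, end, position;
        Term(String text, int start, int end, int position) {
            this.text = text; this.start = start; this.end = end; this.position = position;
        }
    }

    static final class Phrase {
        final String text;
        final List<int[]> offsets = new ArrayList<>();
        final float weight;
        Phrase(List<Term> terms, float weight) {
            StringBuilder sb = new StringBuilder();
            for (Term t : terms) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(t.text);
                offsets.add(new int[] {t.start, t.end});
            }
            this.text = sb.toString();
            this.weight = weight;
        }
    }

    static void add(Node root, float boost, int slop, String... terms) {
        Node node = root;
        for (String term : terms) {
            node = node.subMap.computeIfAbsent(term, t -> new Node());
        }
        node.terminal = true;
        node.slop = slop;
        node.boost = boost;
    }

    // Consecutive matched terms may have at most `slop` positions between them.
    static boolean withinSlop(List<Term> terms, int slop) {
        for (int k = 1; k < terms.size(); k++) {
            int gap = terms.get(k).position - terms.get(k - 1).position - 1;
            if (gap < 0 || gap > slop) return false;
        }
        return true;
    }

    // Greedy walk: follow the map from each start term until a terminal node is reached.
    static List<Phrase> match(Node root, List<Term> stack) {
        List<Phrase> phrases = new ArrayList<>();
        int i = 0;
        while (i < stack.size()) {
            Node node = root.subMap.get(stack.get(i).text);
            if (node == null) { i++; continue; }
            List<Term> matched = new ArrayList<>();
            matched.add(stack.get(i));
            int j = i + 1;
            while (!node.terminal && j < stack.size()) {
                Node next = node.subMap.get(stack.get(j).text);
                if (next == null) break;
                matched.add(stack.get(j));
                node = next;
                j++;
            }
            if (node.terminal && withinSlop(matched, node.slop)) {
                phrases.add(new Phrase(matched, node.boost));
                i = j;   // consume the matched terms
            } else {
                i++;     // no phrase starts here; try the next term
            }
        }
        return phrases;
    }

    static List<Phrase> sample() {
        Node root = new Node();
        add(root, 2.0f, 0, "Lucene");
        add(root, 1.0f, 1, "search", "library");
        List<Term> stack = List.of(
            new Term("Lucene", 0, 6, 0),
            new Term("search", 12, 18, 3),
            new Term("library", 26, 33, 5));
        return match(root, stack);
    }

    public static void main(String[] args) {
        for (Phrase p : sample()) {
            System.out.println(p.text + " w=" + p.weight + " spans=" + p.offsets.size());
        }
    }
}
```

For the sample data, "search" (position 3) and "library" (position 5) have one position between them, which a slop of 1 allows, so the phrase is accepted with both offset pairs.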
The type of each entry is
In Step 4, Fast Vector Highlighter creates
   FieldFragList
+---------------------------------+
|"Lucene"[(0,6)]                  |
|"search library"[(12,18),(26,33)]|
|totalBoost=3                     |
+---------------------------------+
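The totalBoost of 3 in the diagram is the sum of the weights of the phrases that share a fragment (2 + 1). The sketch below illustrates that bookkeeping under a deliberately simple assumption: a phrase joins the current fragment if it ends within fragCharSize characters of the fragment's first phrase. The grouping rule and the names FragListSketch, Phrase, Frag, and build are hypothetical and are not taken from the highlighter's source.

```java
import java.util.ArrayList;
import java.util.List;

public class FragListSketch {
    // Hypothetical phrase entry: text, overall start/end offsets, and weight.
    static final class Phrase {
        final String text;
        final int start, end;
        final float weight;
        Phrase(String text, int start, int end, float weight) {
            this.text = text; this.start = start; this.end = end; this.weight = weight;
        }
    }

    // Hypothetical fragment: the phrases it contains and the sum of their weights.
    static final class Frag {
        final List<Phrase> phrases = new ArrayList<>();
        float totalBoost;
    }

    // Assumed rule: start a new fragment when the phrase would end beyond
    // fragCharSize characters from the start of the current fragment.
    static List<Frag> build(List<Phrase> phrases, int fragCharSize) {
        List<Frag> frags = new ArrayList<>();
        Frag current = null;
        int fragStart = 0;
        for (Phrase p : phrases) {
            if (current == null || p.end - fragStart > fragCharSize) {
                current = new Frag();
                frags.add(current);
                fragStart = p.start;
            }
            current.phrases.add(p);
            current.totalBoost += p.weight;
        }
        return frags;
    }

    static List<Phrase> samplePhrases() {
        return List.of(
            new Phrase("Lucene", 0, 6, 2.0f),
            new Phrase("search library", 12, 33, 1.0f));
    }

    public static void main(String[] args) {
        for (Frag f : build(samplePhrases(), 100)) {
            System.out.println(f.phrases.size() + " phrases, totalBoost=" + f.totalBoost);
        }
    }
}
```

With a window of 100 characters both sample phrases land in one fragment. A much smaller window, such as 20, would split them into two fragments with boosts of 2 and 1.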
In Step 5, by using