Lucene contrib change Log

For more information on past and future Lucene versions, please see:
http://s.apache.org/luceneversions

======================= Lucene 3.6.0 =======================

Changes in backwards compatibility policy

 * LUCENE-3626: The internal implementation classes in PKIndexSplitter
   and MultiPassIndexSplitter were made private as they now work
   per segment.  (Uwe Schindler)
   
 * LUCENE-3807: Cleaned up Suggest / Lookup API. Term weights (freqs) are now
   64bit signed integers instead of 32bit floats. Sorting of terms is now a 
   disk based merge sort instead of an in-memory sort. The Lookup API now 
   accepts and returns CharSequence instead of String; convert it to a String
   before using it in a data structure that relies on hashCode / equals.
   (Simon Willnauer)
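
   Note on LUCENE-3807: the sketch below shows why a CharSequence should be
   converted to a String before it is used as a hash key. It uses only JDK
   classes; StringBuilder stands in for an arbitrary CharSequence, and the
   class name CharSequenceKeyDemo and the helper toKey() are hypothetical,
   not part of the Lookup API.

```java
import java.util.HashMap;
import java.util.Map;

class CharSequenceKeyDemo {

    /** Converts any CharSequence to a String so it can serve as a hash key. */
    static String toKey(CharSequence cs) {
        return cs.toString();
    }

    public static void main(String[] args) {
        // StringBuilder does not override equals/hashCode, so two builders
        // with the same content are treated as different keys.
        Map<CharSequence, Long> byIdentity = new HashMap<>();
        byIdentity.put(new StringBuilder("lucene"), 42L);
        System.out.println(byIdentity.containsKey(new StringBuilder("lucene"))); // false

        // After conversion, lookups compare by content.
        Map<String, Long> byContent = new HashMap<>();
        byContent.put(toKey(new StringBuilder("lucene")), 42L);
        System.out.println(byContent.containsKey(toKey(new StringBuilder("lucene")))); // true
    }
}
```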
  
Changes in Runtime Behavior

 * LUCENE-3698: FastVectorHighlighter no longer adds a multi value separator
   to the end of the highlighted text. (Shay Banon via Koji Sekiguchi)
   
 * LUCENE-3867, LUCENE-3886: Use RAMUsageEstimator for memory estimations
   in MemoryIndex. Because of more precise calculations, results may differ.
   (Uwe Schindler)

New Features

 * LUCENE-3596: DirectoryTaxonomyWriter extensions can override createIndexWriterConfig() 
   and modify how its internal index writer is opened. (Doron Cohen)

 * SOLR-2982: Added phonetic encoders to contrib/analyzers/phonetic:
   Metaphone, Soundex, Caverphone, Beider-Morse, etc.  (Robert Muir)

 * LUCENE-2906: Added CJKBigramFilter that forms bigrams from StandardTokenizer or
   ICUTokenizer CJK tokens, and CJKWidthFilter that normalizes halfwidth/fullwidth. 
   This filter supports unicode supplementary characters and you can toggle whether 
   bigrams are formed for each of Han/Hiragana/Katakana/Hangul independently. Deprecates
   CJKTokenizer.  (Tom Burton-West, Robert Muir)

 * LUCENE-3634: IndexReader's static main method was moved to a new
   tool, CompoundFileExtractor, in contrib/misc.  (Mike McCandless)

 * SOLR-3020: Add KeywordAttribute support to HunspellStemFilter. Terms marked as
   keywords are not modified by the stemmer. (Simon Willnauer, Helge Jenssen) 

 * LUCENE-3305: Added Kuromoji morphological analyzer for Japanese.
   (Christian Moen, Masaru Hasegawa, Simon Willnauer, Uwe Schindler, Mike McCandless, Robert Muir)

 * LUCENE-3730: Refine Kuromoji search mode (Mode.SEARCH) decompounding
   heuristics.  (Christian Moen via Robert Muir)

 * LUCENE-3767: Kuromoji tokenizer/analyzer produces both compound words 
   and the segmentation of that compound in Mode.SEARCH. (Robert Muir, Mike McCandless via Christian Moen)

 * LUCENE-3901: Added katakana stem filter to normalize common spelling variants
   with/without trailing long vowel marks. The filter is used in both KuromojiAnalyzer
   and the "text_ja" field type in schema.xml. (Christian Moen)

 * LUCENE-3915: Add Japanese filter to replace a term attribute with its reading.
   (Koji Sekiguchi, Robert Muir, Christian Moen)

 * LUCENE-3685: Add ToChildBlockJoinQuery and renamed previous
   BlockJoinQuery to ToParentBlockJoinQuery, so that you can now do
   joins in both parent to child and child to parent directions.
   (Mike McCandless)
  
 * LUCENE-1812: Added static index pruning contrib module.
   (Andrzej Bialecki, Doron Cohen)

 * LUCENE-3602: Added query time joining under the join contrib. (Martijn van Groningen, Michael McCandless)
  
 * LUCENE-3714: Add WFSTCompletionLookup suggester that supports more fine-grained
   ranking for suggestions.  (Mike McCandless, Dawid Weiss, Robert Muir)

 * LUCENE-3883: Add Analyzer for Irish. (Jim Regan via Robert Muir)
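
   Note on LUCENE-2906: the idea of forming overlapping bigrams while
   respecting supplementary characters can be sketched with plain JDK code.
   This is an illustration only, not CJKBigramFilter's implementation; the
   class CodePointBigrams and its bigrams() method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointBigrams {

    /**
     * Returns overlapping bigrams of the input, counted in Unicode code
     * points rather than UTF-16 chars. Inputs shorter than two code points
     * yield an empty list in this sketch.
     */
    static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < cps.length; i++) {
            // String(int[], offset, count) encodes surrogate pairs as needed.
            out.add(new String(cps, i, 2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Three Han characters (U+65E5 U+672C U+8A9E) give two bigrams.
        System.out.println(bigrams("\u65E5\u672C\u8A9E").size()); // 2
        // U+20000 is a supplementary character (two chars in UTF-16),
        // yet it still counts as one unit of the bigram.
        System.out.println(bigrams("\uD840\uDC00\u672C").size()); // 1
    }
}
```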

API Changes

 * LUCENE-3596: DirectoryTaxonomyWriter.openIndexWriter() now takes an
   IndexWriterConfig parameter rather than just an open-mode. (Doron Cohen)
  
 * LUCENE-3606: FieldNormModifier was deprecated, because IndexReader's
   setNorm() was deprecated. Furthermore, this class is broken, as it does
   not take position overlaps into account while recalculating norms.
   (Uwe Schindler, Robert Muir)

Changes in runtime behavior

 * LUCENE-3626: PKIndexSplitter and MultiPassIndexSplitter now work
   per segment.  (Uwe Schindler)
   
 * SOLR-3105: When passed LUCENE_36 or greater as version, GermanAnalyzer,
   SpanishAnalyzer, FrenchAnalyzer, ItalianAnalyzer, and PortugueseAnalyzer
   use a lighter stemming approach, CatalanAnalyzer uses ElisionFilter 
   with a set of contractions, HindiAnalyzer uses StandardTokenizer, and
   ThaiAnalyzer uses thai stopwords. Add GermanNormalizationFilter which applies
   the Snowball German2 algorithm to ae/oe/ue and case-folds ß. Add 
   GalicianMinimalStemFilter for plural removal only. (Robert Muir)

 * LUCENE-3748: EnglishPossessiveFilter did not work with Unicode right 
   single quotation mark (U+2019).  (David Croley via Robert Muir)
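
   Note on LUCENE-3748: a minimal sketch of possessive stripping that
   accepts both the ASCII apostrophe and the Unicode right single quotation
   mark (U+2019). The class PossessiveStripper is hypothetical and only
   illustrates the check; it is not the filter's actual code.

```java
class PossessiveStripper {

    /** Removes a trailing 's (or 'S) written with either apostrophe form. */
    static String stripPossessive(String term) {
        int n = term.length();
        if (n >= 2) {
            char mark = term.charAt(n - 2);
            char last = term.charAt(n - 1);
            // Accept the ASCII apostrophe and U+2019.
            boolean isApostrophe = mark == '\'' || mark == '\u2019';
            if (isApostrophe && (last == 's' || last == 'S')) {
                return term.substring(0, n - 2);
            }
        }
        return term;
    }

    public static void main(String[] args) {
        System.out.println(stripPossessive("Lucene's"));      // Lucene
        System.out.println(stripPossessive("Lucene\u2019s")); // Lucene
        System.out.println(stripPossessive("indexes"));       // indexes
    }
}
```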

Optimizations

 * SOLR-2888: FSTSuggester refactoring: internal storage is now UTF-8,
   external sorting (on disk) prevents OOMs even with large data sets
   (the bottleneck is now FST construction), code cleanups and API cleanups.
   You should use FSTCompletionLookup (the old FSTLookup impl is deprecated).
   (Dawid Weiss, Robert Muir)

Bug Fixes

 * LUCENE-3600: BlockJoinQuery now supports parent docs that have no
   children (such docs will never match, but BJQ was tripping an
   assert if such a parent doc was the first doc in the segment).
   (Shay Banon, Mike McCandless)
   
 * LUCENE-3609: Fix regression in BooleanFilter, introduced in Lucene 3.5,
   to correctly handle minShouldMatch behaviour of previous versions.
   (Shay Banon, Uwe Schindler)

 * LUCENE-3668: For a multi-token synonym mapping to a single token,
   SynonymFilter will now set the start offset of the synonym token to
   the start offset of the first matched token, and the end offset of
   the synonym token to the end offset of the last matched token.
   This way if the synonym token is used for highlighting, it will
   cover all tokens it had matched.  (Koji Sekiguchi, Robert Muir,
   Mike McCandless)

 * LUCENE-3742: When SynonymFilter has an output extending beyond the
   input tokens, it now sets the start and end offset to the same
   values for the last token (not 0, 0).  (Robert Muir, Mike
   McCandless)

 * LUCENE-3686: CategoryEnhancement must override Object.equals(Object).
   (Sivan Yogev via Shai Erera)
 
 * LUCENE-3697: SimpleBoundaryScanner does not work well when highlighting
   at the beginning of the text. (Shay Banon via Koji Sekiguchi)

 * LUCENE-3703: Calling DirectoryTaxonomyReader.refresh() could mess up 
   reference counting (e.g. if application called incRef/decRef). Also, 
   getRefCount() no longer checks if the taxonomy reader is already closed.
   (Doron Cohen, Shai Erera)
 
 * LUCENE-3719: FVH: slow performance on very large queries.
   (Igor Motov via Koji Sekiguchi)
   
 * LUCENE-3746: Spell checker's sort could fail when JVM free heap memory
   was low, even though the max-memory settings allowed more to be allocated.
   (Doron Cohen)

 * LUCENE-3765: As of Version.LUCENE_36, DutchAnalyzer's two ctors
   that take stopwords and stem exclusion tables also initialize
   the default stem overrides (e.g. kind/kinder, fiets).  (Robert Muir)

 * SOLR-3076: ToParent/ChildBlockJoinQuery was not handling
   deleted docs correctly (Mikhail Khludnev via Mike
   McCandless).

 * LUCENE-3794: DirectoryTaxonomyWriter could lose the INDEX_CREATE_TIME 
   property if multiple commits with userData were done. It now always records 
   the creation time in the taxonomy index commitData, and reads it from the 
   index in the constructor. (Shai Erera)
   
 * LUCENE-3831: Avoid NPE if the SpanQuery has a null field (e.g. a
   SpanOrQuery with no clauses added).  (Alan Woodward via Mike
   McCandless).

 * LUCENE-3894: ICUTokenizer, NGramTokenizer and EdgeNGramTokenizer
   could stop early if the Reader only partially fills the provided
   buffer. (Mike McCandless) 
   
 * LUCENE-3937: Work around a Xerces-J bug in the benchmark module.
   (Uwe Schindler, Robert Muir, Mike McCandless)

 * LUCENE-3934: Residual IDF calculation in the pruning package is wrong
   (Andrzej Bialecki)
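
   Note on LUCENE-3894: Reader.read(char[], int, int) may return fewer
   chars than requested without having reached the end of the stream, so a
   consumer has to loop. The sketch below shows that pattern with JDK
   classes only; ReadFully, fill() and trickle() are hypothetical names and
   this is not the tokenizers' actual code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadFully {

    /**
     * Keeps reading until the buffer is full or the reader reports end of
     * stream. Returns the number of chars placed in the buffer.
     */
    static int fill(Reader in, char[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // only -1 means end of stream
            }
            total += n;
        }
        return total;
    }

    /** A reader that returns at most one char per call, to simulate partial fills. */
    static Reader trickle(String s) {
        final StringReader delegate = new StringReader(s);
        return new Reader() {
            @Override
            public int read(char[] cbuf, int off, int len) throws IOException {
                return delegate.read(cbuf, off, Math.min(len, 1));
            }

            @Override
            public void close() {
                delegate.close();
            }
        };
    }

    public static void main(String[] args) throws IOException {
        char[] buf = new char[8];
        int n = fill(trickle("abcde"), buf);
        System.out.println(new String(buf, 0, n)); // abcde
    }
}
```

   A single read() call against the trickling reader would have returned
   one char; treating that short read as end of input is the kind of early
   stop described in the entry.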
  
Documentation

 * LUCENE-3599: Javadocs for DistanceUtils.haversine() were incorrectly 
   stating the expected order of the arguments (David Smiley via hossman)
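
   Note on LUCENE-3599: argument order matters for a haversine distance
   because latitude and longitude are not interchangeable. The sketch below
   is a standalone JDK-only version with explicitly named parameters in
   degrees; the class Haversine, the method distanceKm() and the mean Earth
   radius of 6371 km are assumptions for illustration and do not reflect
   the signature of DistanceUtils.haversine().

```java
class Haversine {

    /** Mean Earth radius in kilometers (an assumed constant for this sketch). */
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance in kilometers between two points given in degrees. */
    static double distanceKm(double lat1Deg, double lon1Deg,
                             double lat2Deg, double lon2Deg) {
        double lat1 = Math.toRadians(lat1Deg);
        double lat2 = Math.toRadians(lat2Deg);
        double dLat = lat2 - lat1;
        double dLon = Math.toRadians(lon2Deg - lon1Deg);
        double sinLat = Math.sin(dLat / 2);
        double sinLon = Math.sin(dLon / 2);
        double a = sinLat * sinLat
                 + Math.cos(lat1) * Math.cos(lat2) * sinLon * sinLon;
        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    public static void main(String[] args) {
        // Same numbers, different roles: swapping latitude and longitude
        // changes the result.
        System.out.println(distanceKm(60, 0, 60, 90));
        System.out.println(distanceKm(0, 60, 90, 60));
    }
}
```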

======================= Lucene 3.5.0 =======================

Changes in backwards compatibility policy

 * LUCENE-3446: Removed BooleanFilter.finalResult() due to change to
   FixedBitSet.  (Uwe Schindler)

 * LUCENE-3508: Changed some method signatures in decompounding TokenFilters
   to make them no longer use the Token class.  (Uwe Schindler)
   
 * LUCENE-3557: The various SpellChecker.indexDictionary methods were removed,
   and consolidated to one:

   indexDictionary(Dictionary dict, IndexWriterConfig config, boolean optimize)
   
   Previously, there was no way to specify an IndexWriterConfig, and some
   of these methods would sneakily pass 'true' to optimize.  (Robert Muir)
   
 * LUCENE-3558: Moved NRTManager & NRTManagerReopenThread into lucene core 
   o.a.l.search. (Simon Willnauer)
   
 * LUCENE-2564: WordListLoader is now flagged as @lucene.internal. All methods in
   WordListLoader now return CharArraySet/Map and expect Reader instances for 
   efficiency. Utilities to open Readers from Files, InputStreams or Java 
   resources were added to IOUtils. (Simon Willnauer, Robert Muir)

 * LUCENE-3552: Renamed LuceneTaxonomyReader/Writer to DirectoryTR/TW. (Shai Erera)

 * LUCENE-3556: DirectoryTaxonomyWriter's indexWriter is now private and 
   openIndexWriter() now returns an IndexWriter. (Shai Erera)

New Features

 * LUCENE-1824: Add BoundaryScanner interface and its implementation classes,
   SimpleBoundaryScanner and BreakIteratorBoundaryScanner, so that FVH's FragmentsBuilder
   can find a "natural" boundary to make snippets. (Robert Muir, Koji Sekiguchi)

 * LUCENE-1889: Add MultiTermQuery support for FVH. (Mike Sokolov via Koji Sekiguchi)

 * LUCENE-3458: Change BooleanFilter to have only a single clauses ArrayList
   (so toString() works in order). It now behaves more like BooleanQuery,
   implements Iterable<FilterClause>, and allows adding Filters without
   creating FilterClause.  (Uwe Schindler)

 * LUCENE-3414: Added HunspellStemFilter which uses a provided pure Java implementation of the 
   Hunspell algorithm. (Chris Male)

 * LUCENE-3445: Added SearcherManager, to manage sharing and reopening
   IndexSearchers across multiple search threads.  IndexReader's
   refCount is used to safely close the reader only once all threads are done
   using it.  (Michael McCandless)

 * LUCENE-3486: Add SearcherLifetimeManager, to manage retrieving the
   same searcher used in a previous search to ensure follow-on actions
   (next page, drill down, etc.) use the same searcher as before (Mike
   McCandless)
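
   Note on LUCENE-3445: the reference-counting idea (release the resource
   only once every user is done with it) can be sketched with JDK classes
   alone. RefCounted and its methods are hypothetical and are not
   SearcherManager's or IndexReader's implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

class RefCounted {

    // Starts at 1: the reference held by the owner.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed;

    /** Takes a reference, or returns false if the resource was already released. */
    boolean tryIncRef() {
        int c;
        while ((c = refCount.get()) > 0) {
            if (refCount.compareAndSet(c, c + 1)) {
                return true;
            }
        }
        return false;
    }

    /** Drops a reference; the last one out releases the resource. */
    void decRef() {
        int c = refCount.decrementAndGet();
        if (c == 0) {
            closed = true; // a real implementation would close the reader here
        } else if (c < 0) {
            throw new IllegalStateException("too many decRef calls");
        }
    }

    boolean isClosed() {
        return closed;
    }

    public static void main(String[] args) {
        RefCounted r = new RefCounted();
        r.tryIncRef();                      // a search thread acquires
        r.decRef();                         // the owner lets go
        System.out.println(r.isClosed());   // false
        r.decRef();                         // the search thread finishes
        System.out.println(r.isClosed());   // true
    }
}
```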

API Changes

 * LUCENE-3431: Deprecated QueryAutoStopWordAnalyzer.addStopWords* since they
   prevent reuse.  Stopwords are now to be computed when the Analyzer is instantiated.
   If new stopwords are needed, a new Analyzer instance should be created. (Chris Male)

 * LUCENE-3434: Deprecated ShingleAnalyzerWrapper.set* since they prevent reuse.  The
   Analyzer should be configured at instantiation.  Deprecated PerFieldAnalyzerWrapper.addAnalyzer
   since it also prevents reuse.  Analyzers per field should be configured at instantiation.
   (Chris Male)
   
 * LUCENE-3436: Add SuggestMode to the spellchecker, so you can specify the strategy
   for suggesting related terms.  (James Dyer via Robert Muir)

 * LUCENE-3513: Add SimpleFragListBuilder constructor with margin parameter.
   (Kelsey Francis via Koji Sekiguchi)
   
 * LUCENE-3579: DirectoryTaxonomyWriter throws AlreadyClosedException if it was
   closed, but any of its API methods are called. (Shai Erera)
   
 * LUCENE-3573: TaxonomyReader.refresh() signature was modified from void to 
   boolean, now returning an indication if any change was detected. It 
   throws a new InconsistentTaxonomyException if the taxonomy was recreated
   since TaxonomyReader was last opened or refreshed. (Doron Cohen)     

Bug Fixes

 * LUCENE-3417: DictionaryCompoundWordFilter did not properly add tokens from the
   end of the compound word. (Njal Karevoll via Robert Muir)

 * LUCENE-3019: Fix unexpected color tags for FastVectorHighlighter. (Koji Sekiguchi)

 * LUCENE-3446: Fix NPE in BooleanFilter when DocIdSet/DocIdSetIterator is null.
   Converted code to FixedBitSet and simplified.  (Uwe Schindler, Shuji Umino)
   
 * LUCENE-3484: Fix NPE in TaxonomyWriter: parents array creation was not thread safe.
   (Doron Cohen)
   
 * LUCENE-3485: Fix a bug in LuceneTaxonomyReader, where calling decRef() might
   close the inner IndexReader, leaving the taxonomy reader in limbo.
   (Gilad Barkai via Shai Erera)
   
 * LUCENE-3495: Fix BlockJoinQuery to properly implement getBoost()/setBoost().
   (Robert Muir)

 * LUCENE-3519: BlockJoinCollector always returned null when you tried
   to retrieve top groups for any BlockJoinQuery after the first (Mark
   Harwood, Mike McCandless)

 * LUCENE-3301: Added a workaround for buggy BreakIterator implementations in
   Java that crash on certain inputs containing supplementary characters.
   (Robert Muir)
   
 * LUCENE-3501: RandomSample was not random.
   Replaced with RandomSampler. For previous behavior use RepeatableSampler.
   (Gilad Barkai, Shai Erera, Doron Cohen)

 * LUCENE-3508: Decompounders based on CompoundWordTokenFilterBase can now be
   used with custom attributes. All those attributes are preserved and set on all
   added decompounded tokens.  (Spyros Kapnissis, Uwe Schindler)
   
 * LUCENE-3542: Group expanded query terms to preserve parent boolean operator
   in StandardQueryParser. (Simon Willnauer)

 * LUCENE-3573: TaxonomyReader.refresh() was broken in case that the taxonomy was 
   recreated since the taxonomy reader was last refreshed or opened. TR.refresh()
   now detects this situation and throws an InconsistentTaxonomyException. 
   When this exception is thrown, the application should open a new taxonomy
   reader and close the old one once it is no longer in use.  (Doron Cohen)

======================= Lucene 3.4.0 =======================

New Features

 * LUCENE-3234: provide a limit on phrase analysis in FastVectorHighlighter for
   highlighting speed up. Use FastVectorHighlighter.setPhraseLimit() to set limit
   (e.g. 5000). (Mike Sokolov via Koji Sekiguchi)

 * LUCENE-3079: a new facet module which provides faceted indexing & search
   capabilities. It allows managing a taxonomy of categories and indexing them
   with documents. It also provides search API for aggregating (e.g. count)
   the weights of the categories that are relevant to the search results. 
   (Shai Erera)
   
 * LUCENE-3171: Added BlockJoinQuery and BlockJoinCollector, under the
   new contrib/join module, to enable searches that require joining
   between parent and child documents.  Joined (children + parent)
   documents must be indexed as a document block, using
   IndexWriter.add/UpdateDocuments (Mark Harwood, Mike McCandless)

 * LUCENE-3233, LUCENE-3375: Added SynonymFilter for applying multi-word synonyms
   during indexing or querying (with parsers for wordnet and solr formats).
   Removed contrib/wordnet.  (Simon Rosenthal, Robert Muir, Mike McCandless)

 * LUCENE-1768: added support for numeric ranges in contrib query parser;
   added support for simple numeric queries, such as <age:4>, in contrib
   query parser (Vinicius Barros via Uwe Schindler)

Changes in runtime behavior

 * LUCENE-1768: StandardQueryConfigHandler now uses NumericFieldConfigListener
   to set a NumericConfig to its corresponding FieldConfig;
   StandardQueryTreeBuilder now uses DummyQueryNodeBuilder for
   NumericQueryNodes and uses NumericRangeQueryNodeBuilder for
   NumericRangeQueryNodes; StandardQueryNodeProcessorPipeline now executes
   NumericQueryNodeProcessor followed by NumericRangeQueryNodeProcessor
   right after LowercaseExpandedTermsQueryNodeProcessor
   (Vinicius Barros via Uwe Schindler)

API Changes

 * LUCENE-3296: PKIndexSplitter & MultiPassIndexSplitter now have version
   constructors. PKIndexSplitter accepts an IndexWriterConfig for each of
   the target indexes. (Simon Willnauer, Jason Rutherglen)
   
 * LUCENE-2979: queryparser configuration API located under 
   org.apache.lucene.queryParser.core.config has been simplified and
   Attribute objects no longer should be used to configure query parsers. Now
   any configuration should be done through AbstractQueryConfig's set and get 
   methods. The old API, which uses Attributes objects, is still in place, however
   it has been deprecated and will be removed soon. 
   (Phillipe Ramalho via Adriano Crestani)

 * LUCENE-3400: Deprecated DutchAnalyzer.setStemDictionary since it prevents
   TokenStream reuse (Chris Male)

 * LUCENE-1768: setNumericConfigMap and getNumericConfigMap were added
   to StandardQueryParser; ParametricRangeQueryNode and
   oal.queryParser.standard.nodes.RangeQueryNode now implement
   oal.queryParser.core.nodes.RangeQueryNode;
   oal.queryParser.core.nodes.RangeQueryNode was deprecated and now extends
   TermRangeQueryNode, which extends AbstractRangeQueryNode;
   ParametricQueryNode was deprecated; FieldQueryNode now implements the
   new FieldValueQueryNode<CharSequence>, which in turn implements
   FieldableQueryNode and the new ValueQueryNode
   (Vinicius Barros via Uwe Schindler)
   
 * LUCENE-3488: Factored out SearcherManager from NRTManager. NRTManager
   now manages SearcherManager instances instead of IndexSearcher directly.
   Acquiring a SearcherManager is non-blocking unless the caller explicitly
   requests a specific SearcherManager generation. (Simon Willnauer)

Optimizations

 * LUCENE-3306: Disabled indexing of positions for spellchecker n-gram
   fields: they are not needed because the spellchecker does not
   use positional queries.  (Robert Muir)
      
Bug Fixes

 * LUCENE-3326: Fixed a bug where MoreLikeThis.like(Reader) would try to
   re-analyze the same Reader multiple times, passing different field names
   to the analyzer. Additionally, MoreLikeThisQuery used to encode/decode
   your string using the default charset; it now uses a StringReader.
   Finally, MoreLikeThis's methods that take File, URL, or InputStream
   are deprecated; please create the Reader yourself. (Trejkaz, Robert Muir)
   
 * LUCENE-3347: XML query parser did not always incorporate boosts from
   UserQuery elements.  (Moogie, Uwe Schindler)
   
 * LUCENE-3382: Fixed a bug where NRTCachingDirectory's listAll() would wrongly
   throw NoSuchDirectoryException when all files written so far have been
   cached to RAM and the directory still has not yet been created on the
   filesystem.  (Robert Muir)

======================= Lucene 3.3.0 =======================

New Features

 * LUCENE-152: Add KStem (light stemmer for English).
   (Yonik Seeley via Robert Muir)

 * LUCENE-3135: Add suggesters (autocomplete) to contrib/spellchecker,
   with three implementations: Jaspell, Ternary Trie, and Finite State.
   (Andrzej Bialecki, Dawid Weiss, Mike McCandless, Robert Muir)
 
 * LUCENE-3129: Added BlockGroupingCollector, a single pass
   grouping collector which is faster than the two-pass approach, and
   also computes the total group count, but requires that every
   document sharing the same group was indexed as a doc block
   (IndexWriter.add/updateDocuments).  (Mike McCandless)

 * LUCENE-2955: Added NRTManager and NRTManagerReopenThread, to
   simplify handling NRT reopen with multiple search threads, and to
   allow an app to control which indexing changes must be visible to
   which search requests.  (Mike McCandless)

 * LUCENE-3191: Added SearchGroup.merge and TopGroups.merge, to
   facilitate doing grouping in a distributed environment (Uwe
   Schindler, Mike McCandless)

 * LUCENE-2919: Added PKIndexSplitter, that splits an index according
   to a middle term in a specified field.  (Jason Rutherglen via Mike
   McCandless, Uwe Schindler)

API Changes

 * LUCENE-3141: add getter method to access fragInfos in FieldFragList.
   (Sujit Pal via Koji Sekiguchi)

 * LUCENE-3099: Allow subclasses to determine the group value for
   First/SecondPassGroupingCollector.  (Martijn van Groningen, Mike
   McCandless)

Bug Fixes

 * LUCENE-3185: Fix bug in NRTCachingDirectory.deleteFile that would
   always throw exception and sometimes fail to actually delete the
   file.  (Mike McCandless)

 * LUCENE-3188: contrib/misc IndexSplitter creates indexes with incorrect
   SegmentInfos.counter; added CheckIndex check & fix for this problem.
   (Ivan Dimitrov Vasilev via Steve Rowe)

Build

 * LUCENE-3149: Upgrade contrib/icu's ICU jar file to ICU 4.8. 
   (Robert Muir)

======================= Lucene 3.2.0 =======================

Changes in backwards compatibility policy

 * LUCENE-2981: Removed the following contribs: ant, db, lucli, swing. (Robert Muir)

Changes in runtime behavior

 * LUCENE-3086: ItalianAnalyzer now uses ElisionFilter with a set of Italian
   contractions by default.  (Robert Muir)

Bug Fixes

 * LUCENE-3045: fixed QueryNodeImpl.containsTag(String key) that was
   not lowercasing the key before checking for the tag (Adriano Crestani)

 * LUCENE-3026: SmartChineseAnalyzer's WordTokenFilter threw NullPointerException
   on sentences longer than 32,767 characters.  (wangzhenghang via Robert Muir)
   
 * LUCENE-2939: Highlighter should try and use maxDocCharsToAnalyze in 
   WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as 
   when using CachingTokenStream. This can be a significant performance bug for
   large documents. (Mark Miller)

 * LUCENE-3043: GermanStemmer threw IndexOutOfBoundsException if it encountered
   a zero-length token.  (Robert Muir)
   
 * LUCENE-3044: ThaiWordFilter didn't reset its cached state correctly. This only
   caused a problem if you consumed a tokenstream, then reused it, added different
   attributes to it, and consumed it again.  (Robert Muir, Uwe Schindler)

 * LUCENE-3113: Fixed some minor analysis bugs: double-reset() in ReusableAnalyzerBase
   and ShingleAnalyzerWrapper, missing end() implementations in PrefixAwareTokenFilter
   and PrefixAndSuffixAwareTokenFilter, invocations of incrementToken() after it
   already returned false in CommonGramsQueryFilter, HyphenatedWordsFilter,
   ShingleFilter, and SynonymsFilter.  (Robert Muir, Steven Rowe, Uwe Schindler)

New Features

 * LUCENE-3016: Add analyzer for Latvian.  (Robert Muir)

 * LUCENE-1421: create new grouping contrib module, enabling search
   results to be grouped by a single-valued indexed field.  This
   module was factored out of Solr's grouping implementation, but
   it cannot group by function queries nor arbitrary queries.  (Mike
   McCandless)

 * LUCENE-3098: add AllGroupsCollector, to collect all unique groups
   (but in unspecified order).  (Martijn van Groningen via Mike
   McCandless)

 * LUCENE-3092: Added NRTCachingDirectory in contrib/misc, which
   caches small segments in RAM.  This is useful, in the near-real-time
   case where the indexing rate is lowish but the reopen rate is
   highish, to take load off the IO system.  (Mike McCandless)

Optimizations

 * LUCENE-3040: Switch all analysis consumers (highlighter, morelikethis, memory, ...)
   over to reusableTokenStream().  (Robert Muir)

======================= Lucene 3.1.0 =======================

Changes in backwards compatibility policy

 * LUCENE-2100: All Analyzers in Lucene-contrib have been marked as final.
   Analyzers should only act as a composition of TokenStreams; users should
   compose their own analyzers instead of subclassing existing ones.
   (Simon Willnauer)

 * LUCENE-2194, LUCENE-2201: Snowball APIs were upgraded to snowball revision
   502 (with some local modifications for improved performance).
   Index backwards compatibility and binary backwards compatibility is
   preserved, but some protected/public member variables changed type. This
   does NOT affect java code/class files produced by the snowball compiler,
   but technically is a backwards compatibility break.  (Robert Muir)

 * LUCENE-2226: Moved contrib/snowball functionality into contrib/analyzers.
   Be sure to remove any old obsolete lucene-snowball jar files from your
   classpath!  (Robert Muir)

 * LUCENE-2323: Moved contrib/wikipedia functionality into contrib/analyzers.
   Additionally the package was changed from org.apache.lucene.wikipedia.analysis
   to org.apache.lucene.analysis.wikipedia.  (Robert Muir)

 * LUCENE-2581: Added new methods to FragmentsBuilder interface. These methods
   are used to set pre/post tags and Encoder. (Koji Sekiguchi)

 * LUCENE-2391: Improved spellchecker (re)build time/ram usage by omitting
   frequencies/positions/norms for single-valued fields, modifying the default
   ramBufferMBSize to match IndexWriterConfig (16MB), making index optimization
   an optional boolean parameter, and modifying the incremental update logic
   to work well with unoptimized spellcheck indexes. The indexDictionary() methods
   were made final to ensure a hard backwards break in case you were subclassing
   Spellchecker. In general, subclassing Spellchecker is not recommended.  (Robert Muir)

Changes in runtime behavior

 * LUCENE-2117: SnowballAnalyzer uses TurkishLowerCaseFilter instead of
   LowerCaseFilter to correctly handle the unique Turkish casing behavior if
   used with Version > 3.0 and the TurkishStemmer.
   (Robert Muir via Simon Willnauer)

 * LUCENE-2055: GermanAnalyzer now uses the Snowball German2 algorithm and
   stopwords list by default for Version > 3.0.
   (Robert Muir, Uwe Schindler, Simon Willnauer)
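
   Note on LUCENE-2117: the "unique Turkish casing behavior" concerns the
   dotted and dotless forms of the letter i. The JDK-only sketch below
   shows the difference between locale-neutral and Turkish lowercasing; the
   class TurkishCasing is hypothetical and does not show how
   TurkishLowerCaseFilter is implemented.

```java
import java.util.Locale;

class TurkishCasing {

    /** Locale-neutral lowercasing: capital I becomes a dotted i. */
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    /** Turkish lowercasing: capital I becomes the dotless i (U+0131). */
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    public static void main(String[] args) {
        String word = "ISPARTA";
        // The two results differ in their first letter, so a term lowercased
        // one way will not match the same term lowercased the other way.
        System.out.println(lowerRoot(word).equals(lowerTurkish(word))); // false
        System.out.println((int) lowerTurkish(word).charAt(0));         // 305
    }
}
```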

Bug fixes

 * LUCENE-2855: contrib queryparser was using CharSequence as key in some internal
   Map instances, which was leading to incorrect behavior, since some CharSequence
   implementors do not override hashCode and equals methods. Now the internal Maps
   are using String instead. (Adriano Crestani)

 * LUCENE-2068: Fixed ReverseStringFilter which was not aware of supplementary
   characters. During reverse the filter created unpaired surrogates, which
   will be replaced by U+FFFD by the indexer, but not at query time. The filter
   now reverses supplementary characters correctly if used with Version > 3.0.
   (Simon Willnauer, Robert Muir)

 * LUCENE-2035: TokenSources.getTokenStream() does not assign positionIncrement.
   (Christopher Morris via Mark Miller)

 * LUCENE-2055: Deprecated RussianTokenizer, RussianStemmer, RussianStemFilter,
   FrenchStemmer, FrenchStemFilter, DutchStemmer, and DutchStemFilter. For
   these Analyzers, SnowballFilter is used instead (for Version > 3.0), as
   the previous code did not always implement the Snowball algorithm correctly.
   Additionally, for Version > 3.0, the Snowball stopword lists are used by
   default.  (Robert Muir, Uwe Schindler, Simon Willnauer)

 * LUCENE-2184: Fixed bug with handling best fit value when the proper best fit value is
   not an indexed field.  Note, this change affects the APIs. (Grant Ingersoll)

 * LUCENE-2359: Fix bug in CartesianPolyFilterBuilder related to handling of behavior around
   the 180th meridian (Grant Ingersoll)

 * LUCENE-2404: Fix bugs with position increment and empty tokens in ThaiWordFilter.
   For matchVersion >= 3.1 the filter also no longer lowercases. ThaiAnalyzer
   will use a separate LowerCaseFilter instead. (Uwe Schindler, Robert Muir)

 * LUCENE-2615: Fix DirectIOLinuxDirectory to not assign bogus
   permissions to newly created files, and to not silently hardwire
   buffer size to 1 MB.  (Mark Miller, Robert Muir, Mike McCandless)

 * LUCENE-2629: Fix gennorm2 task for generating ICUFoldingFilter's .nrm file. This allows
   you to customize its normalization/folding, by editing the source data files in src/data
   and regenerating a new .nrm with 'ant gennorm2'.  (David Bowen via Robert Muir)

 * LUCENE-2653: ThaiWordFilter depends on the JRE having a Thai dictionary, which is not
   always the case. If the dictionary is unavailable, the filter will now throw
   UnsupportedOperationException in the constructor.  (Robert Muir)

 * LUCENE-589: Fix contrib/demo for international documents.
   (Curtis d'Entremont via Robert Muir)

 * LUCENE-2246: Fix contrib/demo for Turkish html documents.
   (Selim Nadi via Robert Muir)

 * LUCENE-590: Demo HTML parser gives incorrect summaries when title is repeated as a heading
   (Curtis d'Entremont via Robert Muir)

 * LUCENE-591: The demo indexer now indexes meta keywords.
   (Curtis d'Entremont via Robert Muir)

 * LUCENE-2874: Highlighting overlapping tokens outputted doubled words.
   (Pierre Gossé via Robert Muir)

 * LUCENE-2943: Fix thread-safety issues with ICUCollationKeyFilter.
   (Robert Muir)

 * LUCENE-3087: Highlighter: fix case that was preventing highlighting
   of exact phrase when tokens overlap. (Pierre Gossé via Mike
   McCandless)
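
   Note on LUCENE-2068: reversing a string one UTF-16 char at a time swaps
   the two halves of each surrogate pair and leaves unpaired surrogates
   behind. The JDK-only sketch below contrasts that with a reversal by code
   point. CodePointReverse and its methods are hypothetical; this is not
   ReverseStringFilter's code.

```java
class CodePointReverse {

    /** Naive reversal by UTF-16 code unit; breaks surrogate pairs. */
    static String reverseChars(String s) {
        char[] in = s.toCharArray();
        char[] out = new char[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[in.length - 1 - i];
        }
        return new String(out);
    }

    /** Reversal by code point; each surrogate pair keeps its internal order. */
    static String reverseCodePoints(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length(); i > 0; ) {
            int cp = s.codePointBefore(i);
            sb.appendCodePoint(cp);
            i -= Character.charCount(cp);
        }
        return sb.toString();
    }

    /** True if the string contains a surrogate that is not part of a valid pair. */
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    public static void main(String[] args) {
        // 'a', U+20000 (a supplementary character), 'b'
        String s = "a\uD840\uDC00b";
        System.out.println(hasUnpairedSurrogate(reverseChars(s)));      // true
        System.out.println(hasUnpairedSurrogate(reverseCodePoints(s))); // false
    }
}
```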

API Changes

 * LUCENE-2867: Some contrib queryparser methods that receive CharSequence as
   identifier, such as QueryNode#unsetTag(CharSequence), were deprecated and
   will be removed in version 4. (Adriano Crestani)

 * LUCENE-2147: Spatial GeoHashUtils now always decode GeoHash strings
   with full precision. GeoHash#decode_exactly(String) was merged into
   GeoHash#decode(String). (Chris Male, Simon Willnauer)

 * LUCENE-2204: Change some package private classes/members to publicly accessible to implement
   custom FragmentsBuilders. (Koji Sekiguchi)

 * LUCENE-2055: Integrate snowball into contrib/analyzers. SnowballAnalyzer is
   now deprecated in favor of language-specific analyzers which contain things
   such as stopword lists and any language-specific processing in addition to
   stemming. Add Turkish and Romanian stopwords lists to support this.
   (Robert Muir, Uwe Schindler, Simon Willnauer)

 * LUCENE-2603: Add setMultiValuedSeparator(char) method to set an arbitrary
   char that is used when concatenating multiValued data. Default is a space
   (' '). It is applied on ANALYZED field only. (Koji Sekiguchi)

 * LUCENE-2626: FastVectorHighlighter: enable FragListBuilder and FragmentsBuilder
   to be set per-field override. (Koji Sekiguchi)

 * LUCENE-2712: FieldBoostMapAttribute in contrib/queryparser was changed from
   a Map<CharSequence,Float> to a Map<String,Float>. Per the CharSequence javadoc,
   CharSequence is inappropriate as a map key. (Robert Muir)

 * LUCENE-1937: Add more methods to manipulate QueryNodeProcessorPipeline elements.
   QueryNodeProcessorPipeline now implements the List interface, this is useful
   if you want to extend or modify an existing pipeline. (Adriano Crestani via Robert Muir)

 * LUCENE-2754, LUCENE-2757: Deprecated SpanRegexQuery. Use
   new SpanMultiTermQueryWrapper<RegexQuery>(new RegexQuery()) instead.
   (Robert Muir, Uwe Schindler)

 * LUCENE-2747: Deprecated ArabicLetterTokenizer. StandardTokenizer now tokenizes
   most languages correctly including Arabic.  (Steven Rowe, Robert Muir)

 * LUCENE-2830: Use StringBuilder instead of StringBuffer across Benchmark, and
   remove the StringBuffer HtmlParser.parse() variant. (Shai Erera)

 * LUCENE-2920: Deprecated ShingleMatrixFilter as it is unmaintained and does
   not work with custom Attributes or custom payload encoders.  (Uwe Schindler)

New features

 * LUCENE-2500: Added DirectIOLinuxDirectory, a Linux-specific
   Directory impl that uses the O_DIRECT flag to bypass the buffer
   cache.  This is useful to prevent segment merging from evicting
   pages from the buffer cache, since fadvise/madvise do not seem to help.
   (Michael McCandless)

 * LUCENE-2306: Add NumericRangeFilter and NumericRangeQuery support to XMLQueryParser.
   (Jingkei Ly, via Mark Harwood)

 * LUCENE-2102: Add a Turkish LowerCase Filter. TurkishLowerCaseFilter handles
   Turkish and Azeri unique casing behavior correctly.
   (Ahmet Arslan, Robert Muir via Simon Willnauer)
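
   The casing behavior in question can be illustrated with the JDK's
   locale-sensitive String.toLowerCase(Locale). This is only a sketch of
   the Turkish dotted/dotless i rules, not the code of
   TurkishLowerCaseFilter; the class name below is hypothetical.

```java
import java.util.Locale;

final class TurkishCasingDemo {

    private static final Locale TURKISH = Locale.forLanguageTag("tr");

    /** Locale-neutral lowercasing: 'I' becomes the dotted 'i'. */
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    /** Turkish lowercasing: 'I' becomes dotless U+0131, U+0130 becomes 'i'. */
    static String lowerTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }

    public static void main(String[] args) {
        // Print code points so the difference is visible on any console.
        System.out.println(Integer.toHexString(lowerRoot("I").codePointAt(0)));
        System.out.println(Integer.toHexString(lowerTurkish("I").codePointAt(0)));
        System.out.println(Integer.toHexString(lowerTurkish("\u0130").codePointAt(0)));
    }
}
```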

 * LUCENE-2039: Add an extensible query parser to contrib/misc.
   ExtendableQueryParser enables arbitrary parser extensions based on a
   customizable field naming scheme.
   (Simon Willnauer)

 * LUCENE-2067: Add a Czech light stemmer. CzechAnalyzer will now stem words
   when Version is set to 3.1 or higher.  (Robert Muir)

 * LUCENE-2062: Add a Bulgarian analyzer.  (Robert Muir, Simon Willnauer)

 * LUCENE-2206: Add Snowball's stopword lists for Danish, Dutch, English,
   Finnish, French, German, Hungarian, Italian, Norwegian, Russian, Spanish,
   and Swedish. These can be loaded with WordListLoader.getSnowballWordSet.
   (Robert Muir, Simon Willnauer)

 * LUCENE-2243: Add DisjunctionMaxQuery support for FastVectorHighlighter.
   (Koji Sekiguchi)

 * LUCENE-2218: ShingleFilter supports minimum shingle size, and the separator
   character is now configurable. It's also up to 20% faster.
   (Steven Rowe via Robert Muir)
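
   As a rough illustration of what a minimum shingle size and a
   configurable separator mean, the sketch below builds shingles from a
   list of tokens. It is not ShingleFilter's implementation; the method
   name, parameters and output order are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ShingleSketch {

    /** Joins runs of minSize..maxSize consecutive tokens with the separator. */
    static List<String> shingles(List<String> tokens, int minSize, int maxSize,
                                 String separator) {
        List<String> out = new ArrayList<String>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder shingle = new StringBuilder();
            for (int size = 1; size <= maxSize && start + size <= tokens.size(); size++) {
                if (size > 1) {
                    shingle.append(separator);
                }
                shingle.append(tokens.get(start + size - 1));
                // Runs shorter than minSize are built but never emitted.
                if (size >= minSize) {
                    out.add(shingle.toString());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles(Arrays.asList("please", "divide", "this"), 2, 3, "_"));
    }
}
```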

 * LUCENE-2234: Add a Hindi analyzer.  (Robert Muir)

 * LUCENE-2055: Add analyzers/misc/StemmerOverrideFilter. This filter provides
   the ability to override any stemmer with a custom dictionary map.
   (Robert Muir, Uwe Schindler, Simon Willnauer)
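
   The idea of overriding a stemmer with a dictionary can be sketched as
   a map lookup that takes precedence over the algorithmic stemmer. The
   code below is not StemmerOverrideFilter; naiveStem() is a hypothetical
   stand-in for a real stemmer and all names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

final class StemmerOverrideSketch {

    /** Hypothetical stand-in for an algorithmic stemmer: strips one trailing "s". */
    static String naiveStem(String term) {
        if (term.length() > 1 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    /** Returns the dictionary entry if one exists, otherwise the stemmer's result. */
    static String stem(String term, Map<String, String> overrides) {
        String override = overrides.get(term);
        return override != null ? override : naiveStem(term);
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new HashMap<String, String>();
        overrides.put("news", "news"); // protect a word the stemmer would damage
        System.out.println(stem("news", overrides));
        System.out.println(stem("cats", overrides));
    }
}
```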

 * LUCENE-2399: Add ICUNormalizer2Filter, which normalizes tokens with ICU's
   Normalizer2. This allows for efficient combinations of normalization and custom
   mappings in addition to standard normalization, and normalization combined
   with unicode case folding.  (Robert Muir)

 * LUCENE-1343: Add ICUFoldingFilter, a replacement for ASCIIFoldingFilter that
   does a more thorough job of normalizing unicode text for search.
   (Robert Haschart, Robert Muir)

 * LUCENE-2409: Add ICUTransformFilter, which transforms text in a context
   sensitive way, either from ICU built-in rules (such as Traditional-Simplified),
   or from rules you write yourself.  (Robert Muir)

 * LUCENE-2414: Add ICUTokenizer, a tailorable tokenizer that implements Unicode
   Text Segmentation. This tokenizer is useful for documents or collections with
   multiple languages.  The default configuration includes special support for
   Thai, Lao, Myanmar, and Khmer.  (Robert Muir, Uwe Schindler)

 * LUCENE-2298: Add analyzers/stempel, an algorithmic stemmer with support for
   the Polish language.  (Andrzej Bialecki via Robert Muir)

 * LUCENE-2400: ShingleFilter was changed to no longer output all-filler shingles and
   unigrams, and uses a more performant algorithm to build grams using a linked list
   of AttributeSource.cloneAttributes() instances and the new copyTo() method.
   (Steven Rowe via Uwe Schindler)

 * LUCENE-2437: Add an Analyzer for Indonesian.  (Robert Muir)

 * LUCENE-2393: The HighFreqTerms tool (in misc) can now optionally
   also include the total termFreq.  (Tom Burton-West via Mike McCandless)

 * LUCENE-2463: Add a Greek inflectional stemmer. GreekAnalyzer will now stem words
   when Version is set to 3.1 or higher.  (Robert Muir)

 * LUCENE-1287: Allow usage of HyphenationCompoundWordTokenFilter without dictionary.
   (Thomas Peuss via Robert Muir)

 * LUCENE-2464: FastVectorHighlighter: add SingleFragListBuilder to return
   entire field contents. (Koji Sekiguchi)

 * LUCENE-2503: Added lighter stemming alternatives for European languages.
   (Robert Muir)

 * LUCENE-2581: FastVectorHighlighter: add Encoder to FragmentsBuilder.
   (Koji Sekiguchi)

 * LUCENE-2624: Add Analyzers for Armenian, Basque, and Catalan, from snowball.
   (Robert Muir)

 * LUCENE-1938: PrecedenceQueryParser is now implemented with the flexible QP framework.
   This means that you can also add this functionality to your own QP pipeline by using
   BooleanModifiersQueryNodeProcessor, for example instead of GroupQueryNodeProcessor.
   (Adriano Crestani via Robert Muir)

 * LUCENE-2791: Added WindowsDirectory, a Windows-specific Directory impl
   that doesn't synchronize on the file handle. This can be useful to
   avoid the performance problems of SimpleFSDirectory and NIOFSDirectory.
   (Robert Muir, Simon Willnauer, Uwe Schindler, Michael McCandless)

 * LUCENE-2842: Add analyzer for Galician. Also adds the RSLP (Orengo) stemmer
   for Portuguese.  (Robert Muir)

 * SOLR-1057: Add PathHierarchyTokenizer that represents file path hierarchies as synonyms of
   /something, /something/something, /something/something/else. (Ryan McKinley, Koji Sekiguchi)
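
   The path expansion described above can be sketched with plain string
   handling. The code only computes the prefix strings; it does not
   reproduce how PathHierarchyTokenizer sets positions or offsets, and the
   class and method names are hypothetical. Trailing delimiters are not
   treated specially.

```java
import java.util.ArrayList;
import java.util.List;

final class PathHierarchySketch {

    /** Returns every ancestor prefix of the path, followed by the path itself. */
    static List<String> hierarchy(String path, char delimiter) {
        List<String> tokens = new ArrayList<String>();
        // Start at 1 so a leading delimiter does not produce an empty token.
        for (int i = 1; i < path.length(); i++) {
            if (path.charAt(i) == delimiter) {
                tokens.add(path.substring(0, i));
            }
        }
        if (!path.isEmpty()) {
            tokens.add(path);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(hierarchy("/something/something/else", '/'));
    }
}
```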

Build

 * LUCENE-2124: Moved the JDK-based collation support from contrib/collation
   into core, and moved the ICU-based collation support into contrib/icu.
   (Steven Rowe, Robert Muir)

 * LUCENE-2323: Moved contrib/regex into contrib/queries. Moved the
   queryparsers under contrib/misc and contrib/surround into contrib/queryparser.
   Moved contrib/fast-vector-highlighter into contrib/highlighter.
   Moved ChainedFilter from contrib/misc to contrib/queries. contrib/spatial now
   depends on contrib/queries instead of contrib/misc.  (Robert Muir)

 * LUCENE-2333: Fix failures during contrib builds, when classes in
   core were changed without ant clean. This fix also optimizes the
   dependency management between contribs by a new ANT macro.
   (Uwe Schindler, Shai Erera)

 * LUCENE-2797: Upgrade contrib/icu's ICU jar file to ICU 4.6
   (Robert Muir)

 * LUCENE-2833: Upgrade contrib/ant's jtidy jar file to r938 (Robert Muir)

 * LUCENE-2413: Moved the demo out of lucene core and into contrib/demo.
   (Robert Muir)

Optimizations

 * LUCENE-2157: DelimitedPayloadTokenFilter no longer copies the buffer
   over itself. Instead it sets only the length. This patch also optimizes
   the logic of the filter and uses NIO for IdentityEncoder. (Uwe Schindler)

 * LUCENE-2084: Change IndexableBinaryStringTools to work on byte[] and char[]
   directly, instead of Byte/CharBuffers, and modify ICUCollationKeyFilter to
   take advantage of this for faster performance.
   (Steven Rowe, Uwe Schindler, Robert Muir)

 * LUCENE-2194, LUCENE-2201, LUCENE-2288: Snowball stemmers in contrib/analyzers
   have been optimized to work on char[] and remove unnecessary object creation.
   (Shai Erera, Robert Muir)

 * LUCENE-2404: Improve performance of ThaiWordFilter by using a char[]-backed
   CharacterIterator (currently from javax.swing).  (Uwe Schindler, Robert Muir)

Test Cases

 * LUCENE-2115: Cutover contrib tests to use Java5 generics.  (Kay Kay
   via Mike McCandless)

Other

 * LUCENE-1845: Updated bdb-je jar from version 3.3.69 to 3.3.93.
   (Simon Willnauer via Mike McCandless)

 * LUCENE-2415: Use reflection instead of a shim class to access Jakarta
   Regex prefix.  (Uwe Schindler)

================== Release 2.9.4 / 3.0.3 ====================

Bug Fixes

 * LUCENE-2277: QueryNodeImpl threw ConcurrentModificationException on 
   add(List<QueryNode>). (Frank Wesemann via Robert Muir)

 * LUCENE-2284: MatchAllDocsQueryNode toString() created an invalid XML tag.
   (Frank Wesemann via Robert Muir)

 * LUCENE-2278: FastVectorHighlighter: Highlighted term is out of alignment
   in multi-valued NOT_ANALYZED field. (Koji Sekiguchi)

 * LUCENE-2524: FastVectorHighlighter: use mod for getting colored tag.
   (Koji Sekiguchi)
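
   A minimal sketch of the modulo idea, assuming the goal is to cycle
   through a fixed array of colored tags when there are more highlighted
   terms than tags. The names below are hypothetical and are not taken
   from FastVectorHighlighter.

```java
final class ColoredTagSketch {

    /** Picks a tag for the given term index, wrapping around the tag array. */
    static String tagFor(String[] tags, int termIndex) {
        return tags[termIndex % tags.length];
    }

    public static void main(String[] args) {
        String[] tags = { "<b class=\"c0\">", "<b class=\"c1\">", "<b class=\"c2\">" };
        for (int i = 0; i < 5; i++) {
            System.out.println(i + " -> " + tagFor(tags, i));
        }
    }
}
```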

 * LUCENE-2616: FastVectorHighlighter: out of alignment when the first value is
   empty in multiValued field (Koji Sekiguchi)
   
 * LUCENE-2731, LUCENE-2732: Fix (charset) problems in XML loading in
   HyphenationCompoundWordTokenFilter (partial bugfix only in 2.9 and 3.0;
   the full fix will follow in 3.1).
   (Uwe Schindler)

Documentation

 * LUCENE-2055: Add documentation noting that the Dutch and French stemmers
   in contrib/analyzers do not implement the Snowball algorithm correctly,
   and recommending the equivalents in contrib/snowball where possible.
   (Robert Muir, Uwe Schindler, Simon Willnauer)

 * LUCENE-2653: Add documentation noting that ThaiWordFilter will not work
   as expected on all JREs. For example, on an IBM JRE, it does nothing.
   (Robert Muir)

================== Release 2.9.3 / 3.0.2 ====================

No changes.

================== Release 2.9.2 / 3.0.1 ====================

New features

 * LUCENE-2108: Spellchecker now safely supports concurrent modifications to
   the spell-index. Threads can safely obtain term suggestions while the
   spell-index is rebuilt, cleared or reset. Internal IndexSearcher instances remain
   open until the last thread accessing them releases the reference.
   (Simon Willnauer)

Bug Fixes

 * LUCENE-2144: Fix InstantiatedIndex to handle termDocs(null)
   correctly (enumerate all non-deleted docs).  (Karl Wettin via Mike
   McCandless)

 * LUCENE-2199: ShingleFilter skipped over tri-gram shingles if outputUnigram
   was set to false. (Simon Willnauer)
  
 * LUCENE-2211: Fix missing clearAttributes() calls:
   ShingleMatrix, PrefixAware, compounds, NGramTokenFilter,
   EdgeNGramTokenFilter, Highlighter, and MemoryIndex.
   (Uwe Schindler, Robert Muir)

 * LUCENE-2207, LUCENE-2219: Fix incorrect offset calculations in end() for 
   CJKTokenizer, ChineseTokenizer, SmartChinese SentenceTokenizer, 
   and WikipediaTokenizer.  (Koji Sekiguchi, Robert Muir)
   
 * LUCENE-2266: Fixed offset calculations in NGramTokenFilter and 
   EdgeNGramTokenFilter.  (Joe Calderon, Robert Muir via Uwe Schindler)
   
API Changes

 * LUCENE-2108: Add SpellChecker.close, to close the underlying
   reader.  (Eirik Bjørsnøs via Mike McCandless)

 * LUCENE-2165: Add a constructor to SnowballAnalyzer that takes a Set of 
   stopwords, and deprecate the String[] one.  (Nick Burch via Robert Muir)
   
======================= Release 3.0.0 =======================

Changes in backwards compatibility policy

 * LUCENE-1257: Change some occurrences of StringBuffer in public/protected
   APIs of contrib/surround to StringBuilder.
   (Paul Elschot via Uwe Schindler)

Changes in runtime behavior

 * LUCENE-1966: Modified and cleaned the default Arabic stopwords list used
   by ArabicAnalyzer. You'll need to fully re-index any previously created 
   indexes.  (Basem Narmok via Robert Muir)

API Changes

 * LUCENE-1936: Deprecated RussianLowerCaseFilter, because it transforms
   text exactly the same as LowerCaseFilter. Please use LowerCaseFilter
   instead, which has the same functionality.  (Robert Muir)
   
 * LUCENE-2051: Contrib Analyzer setters were deprecated and replaced
   with ctor arguments / Version number.  Also stop word lists
   were unified.  (Simon Willnauer)

Bug fixes

 * LUCENE-1781: Fixed various issues with the lat/lng bounding box
   distance filter created for radius search in contrib/spatial.
   (Bill Bell via Mike McCandless)

 * LUCENE-1939: IndexOutOfBoundsException at ShingleMatrixFilter's
   Iterator#hasNext method on exhausted streams.
   (Patrick Jungermann via Karl Wettin)

 * LUCENE-1359: French analyzer did not support null field names.
   (Andrew Lynch via Robert Muir)
   
New features

 * LUCENE-1924: Added BalancedSegmentMergePolicy to contrib/misc,
   which is a merge policy that tries to avoid doing very large
   segment merges to give better search performance in a mixed
   indexing/searching environment.  (John Wang via Mike McCandless)

 * LUCENE-1959: Add index splitting tools. The IndexSplitter tool works
   on multi-segment (non optimized) indexes and it can copy specific
   segments out of the index into a new index.  It can also list the
   segments in the index, and delete specified segments.  (Jason Rutherglen via
   Mike McCandless). MultiPassIndexSplitter can split any index into
   any number of output parts, at the cost of doing multiple passes over
   the input index. (Andrzej Bialecki)

 * LUCENE-1993: Add maxDocFreq setting to MoreLikeThis, to exclude
   from consideration terms that match more than the specified number
   of documents.  (Christian Steinert via Mike McCandless)

Optimizations

 * LUCENE-1965, LUCENE-1962: Arabic-, Persian- and SmartChineseAnalyzer
   now load their default stopwords only once, on first access.
   Previous versions were loading the stopword files each time a new
   instance was created. This might improve performance for applications
   creating lots of instances of these Analyzers. (Simon Willnauer) 

Documentation

 * LUCENE-1916: Translated documentation in the smartcn hhmm package.
   (Patricia Peng via Robert Muir)

Build

 * LUCENE-1904: Moved wordnet-based synonym support from contrib/memory
   into contrib/wordnet.  (Robert Muir)
   
 * LUCENE-2031: Moved PatternAnalyzer from contrib/memory into
   contrib/analyzers/common, under miscellaneous.  (Robert Muir)
   
======================= Release 2.9.1 =======================

Changes in backwards compatibility policy

 * LUCENE-2002: Add required Version matchVersion argument when
   constructing ComplexPhraseQueryParser and default (as of 2.9)
   enablePositionIncrements to true to match StandardAnalyzer's
   default.  Also added required matchVersion to most of the analyzers
   (Uwe Schindler, Mike McCandless)

Changes in runtime behavior

 * LUCENE-1963: ArabicAnalyzer now lowercases before checking the stopword
   list. This has no effect on Arabic text, but if you are using a custom
   stopword list that contains some non-Arabic words, you'll need to fully
   reindex.  (DM Smith via Robert Muir)

Bug fixes

 * LUCENE-1953: FastVectorHighlighter: small fragCharSize can cause
   StringIndexOutOfBoundsException. (Koji Sekiguchi)
   
 * LUCENE-1929: Highlighter throws exception on NumericRangeQuery and does not
   support deprecated RangeQuery.  (Mark Miller)
   
 * LUCENE-2001: Wordnet Syns2Index incorrectly parses synonyms that
   contain a single quote. (Parag H. Dave via Robert Muir)
   
 * LUCENE-2003: Highlighter doesn't respect position increments other than 1 with 
   PhraseQuerys. (Uwe Schindler, Mark Miller)

 * LUCENE-1954: InstantiatedIndexWriter: Fixed ClassCastException with
   NumericField because of incorrect unchecked cast: Document.getFields()
   returns List<Fieldable>.  (Bernd Fondermann via Uwe Schindler)
   
 * LUCENE-2014: SmartChineseAnalyzer did not properly clear attributes
   in WordTokenFilter. If enablePositionIncrements is set for StopFilter,
   then this could create invalid position increments, causing IndexWriter
   to crash.  (Robert Muir, Uwe Schindler)
   
 * LUCENE-2013: SpanRegexQuery does not work with QueryScorer.
   (Benjamin Keil via Mark Miller)

======================= Release 2.9.0 =======================

Changes in runtime behavior

 * LUCENE-1505: Local lucene now uses org.apache.lucene.util.NumericUtils for all
    number conversion.  You'll need to fully re-index any previously created indexes.
    This isn't a break in back-compatibility because local Lucene has not yet
    been released.  (Mike McCandless)
 
 * LUCENE-1758: ArabicAnalyzer now uses the light10 algorithm, has a refined
    default stopword list, and lowercases non-Arabic text.  
    You'll need to fully re-index any previously created indexes. This isn't a 
    break in back-compatibility because ArabicAnalyzer has not yet been 
    released.  (Robert Muir)


API Changes

 * LUCENE-1695: Update the Highlighter to use the new TokenStream API. This issue breaks backwards
    compatibility with some public classes. If you have implemented custom Fragmenters or Scorers, 
    you will need to adjust them to work with the new TokenStream API. Rather than getting passed a 
    Token at a time, you will be given a TokenStream to init your impl with - store the Attributes 
    you are interested in locally and access them on each call to the method that used to pass a new 
    Token. Look at the included updated impls for examples.  (Mark Miller)

 * LUCENE-1460: Change contrib TokenStreams/Filters to use the new
    TokenStream API. (Robert Muir, Michael Busch)

 * LUCENE-1775, LUCENE-1903: Change remaining TokenFilters (shingle, prefix-suffix)
    to use the new TokenStream API. ShingleFilter is now much more efficient:
    it clones much less often and computes most tokens on the fly.
    Also added more tests. (Robert Muir, Michael Busch, Uwe Schindler, Chris Harris)
    
 * LUCENE-1685: The position aware SpanScorer has become the default scorer
    for Highlighting. The SpanScorer implementation has replaced QueryScorer
    and the old term highlighting QueryScorer has been renamed to 
    QueryTermScorer. Multi-term queries are also now expanded by default. If
    you were previously rewriting the query for multi-term query highlighting,
    you should no longer do that (unless you switch to using QueryTermScorer).
    The SpanScorer API (now QueryScorer) has also been improved to more closely
    match the API of the previous QueryScorer implementation.  (Mark Miller)  

 * LUCENE-1793: Deprecate the custom encoding support in the Greek and Russian
    Analyzers. If you need to index text in these encodings, please use Java's
    character set conversion facilities (InputStreamReader, etc) during I/O, 
    so that Lucene can analyze this text as Unicode instead.  (Robert Muir)  
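
    A minimal sketch of the recommended approach, using only the JDK:
    bytes in a legacy encoding are decoded to Unicode by InputStreamReader
    before any analysis happens. KOI8-R is used here as an example
    encoding; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

final class LegacyEncodingDemo {

    /** Decodes bytes in the named charset into a Unicode String. */
    static String decode(byte[] bytes, String charsetName) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), Charset.forName(charsetName))) {
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String word = "\u043c\u0438\u0440"; // a Russian word, as Unicode
        byte[] koi8 = word.getBytes(Charset.forName("KOI8-R"));
        // The decoded text is ordinary Unicode and can be handed to any Analyzer.
        System.out.println(decode(koi8, "KOI8-R").equals(word));
    }
}
```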

Bug fixes

 * LUCENE-1423: InstantiatedTermEnum#skipTo(Term) throws ArrayIndexOutOfBounds on empty index.
    (Karl Wettin) 

 * LUCENE-1462: InstantiatedIndexWriter did not reset pre analyzed TokenStreams the
    same way IndexWriter does. Parts of InstantiatedIndex were not Serializable.
    (Karl Wettin)

 * LUCENE-1510: InstantiatedIndexReader#norms methods throw NullPointerException on empty index.
    (Karl Wettin, Robert Newson)

 * LUCENE-1514: ShingleMatrixFilter#next(Token) easily throws a StackOverflowError
    due to recursive invocation. (Karl Wettin)

 * LUCENE-1548: Fix distance normalization in LevenshteinDistance to
    not produce negative distances (Thomas Morton via Mike McCandless)
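
    To see how a normalized score can go negative, consider the sketch
    below. It assumes the score is computed as 1 - editDistance / length.
    Dividing by the longer string's length keeps the result in [0, 1],
    because the edit distance never exceeds that length; dividing by the
    shorter length does not. This is an illustration only and is not
    claimed to be the code changed in LUCENE-1548.

```java
final class NormalizedLevenshteinSketch {

    /** Classic two-row dynamic-programming edit distance. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1], normalized by the longer input. */
    static float similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0f;
        }
        return 1.0f - ((float) editDistance(a, b) / longest);
    }

    public static void main(String[] args) {
        System.out.println(similarity("a", "bcd"));
        System.out.println(similarity("kitten", "sitting"));
    }
}
```

    With inputs "a" and "bcd" the edit distance is 3, so normalizing by the
    shorter length (1) would give 1 - 3 = -2.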

 * LUCENE-1490: Fix latin1 conversion of HALFWIDTH_AND_FULLWIDTH_FORMS
    characters to only apply to the correct subset (Daniel Cheng via
    Mike McCandless)

 * LUCENE-1576: Fix BrazilianAnalyzer to downcase tokens after
    StandardTokenizer so that stop words with mixed case are filtered
    out.  (Rafael Cunha de Almeida, Douglas Campos via Mike McCandless)

 * LUCENE-1491: EdgeNGramTokenFilter no longer stops on tokens shorter than minimum n-gram size.
    (Todd Teak via Otis Gospodnetic)
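
    The fixed behavior can be pictured with the sketch below, which
    assumes that a token shorter than the minimum n-gram size is simply
    skipped while later tokens are still processed. The entry above does
    not spell out the exact handling, so treat this as an assumption; the
    names are hypothetical and this is not EdgeNGramTokenFilter's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EdgeNGramSketch {

    /** Emits front-edge n-grams of length minGram..maxGram for each token. */
    static List<String> frontEdgeNGrams(List<String> tokens, int minGram, int maxGram) {
        List<String> out = new ArrayList<String>();
        for (String token : tokens) {
            if (token.length() < minGram) {
                continue; // too short: skip it, but keep consuming the stream
            }
            int upper = Math.min(maxGram, token.length());
            for (int n = minGram; n <= upper; n++) {
                out.add(token.substring(0, n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(frontEdgeNGrams(Arrays.asList("ab", "abcde"), 3, 4));
    }
}
```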

 * LUCENE-1683: Fixed JavaUtilRegexCapabilities (an impl used by
    RegexQuery) to use Matcher.matches() not Matcher.lookingAt() so
    that the regexp must match the entire string, not just a prefix.
    (Trejkaz via Mike McCandless)
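
    The difference between the two java.util.regex.Matcher methods can be
    shown directly with the JDK. The class and method names below are
    hypothetical; JavaUtilRegexCapabilities itself is not used.

```java
import java.util.regex.Pattern;

final class RegexMatchDemo {

    /** True if the regex matches a prefix of the term (lookingAt semantics). */
    static boolean prefixMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    /** True only if the regex matches the entire term (matches semantics). */
    static boolean fullMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(prefixMatch("abc", "abcdef"));
        System.out.println(fullMatch("abc", "abcdef"));
        System.out.println(fullMatch("abc.*", "abcdef"));
    }
}
```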

 * LUCENE-1792: Fix new query parser to set rewrite method for
    multi-term queries. (Luis Alves, Mike McCandless via Michael Busch)

 * LUCENE-1828: Fix memory index to call TokenStream.reset() and
    TokenStream.end(). (Tim Smith via Michael Busch)

 * LUCENE-1912: Fix fast-vector-highlighter issue when two or more
   terms are concatenated (Koji Sekiguchi via Mike McCandless)

New features

 * LUCENE-1531: Added support for BoostingTermQuery to XML query parser. (Karl Wettin)

 * LUCENE-1435: Added contrib/collation, a CollationKeyFilter
    allowing you to convert tokens into CollationKeys encoded using
    IndexableBinaryStringTools.  This allows for faster RangeQuery when
    a field needs to use a custom Collator.  (Steven Rowe via Mike
    McCandless)

 * LUCENE-1591: EnWikiDocMaker, LineDocMaker, WriteLineDoc can now
    read/write bz2 using Apache commons compress library.  This means
    you can download the .bz2 export from http://wikipedia.org and
    immediately index it.  (Shai Erera via Mike McCandless)

 * LUCENE-1629: Add SmartChineseAnalyzer to contrib/analyzers.  It
    improves on CJKAnalyzer and ChineseAnalyzer by handling Chinese
    sentences properly.  SmartChineseAnalyzer uses a Hidden Markov
    Model to tokenize Chinese words in a more intelligent way.
    (Xiaoping Gao via Mike McCandless)

 * LUCENE-1676: Added DelimitedPayloadTokenFilter class for automatically
    adding payloads "in-stream". (Grant Ingersoll)
 
 * LUCENE-1578: Support for loading unoptimized readers to the
    constructor of InstantiatedIndex. (Karl Wettin)

 * LUCENE-1704: Allow specifying the Tidy configuration file when
    parsing HTML docs with contrib/ant.  (Keith Sprochi via Mike
    McCandless)

 * LUCENE-1522: Added contrib/fast-vector-highlighter, a new alternative
    highlighter.  (Koji Sekiguchi via Mike McCandless)

 * LUCENE-1740: Added "analyzer" command to Lucli, enabling changing
    the analyzer from the default StandardAnalyzer.  (Bernd Fondermann
    via Mike McCandless)

 * LUCENE-1272: Add get/setBoost to MoreLikeThis. (Jonathan
    Leibiusky via Mike McCandless)
 
 * LUCENE-1745: Added constructors to JakartaRegexpCapabilities and
    JavaUtilRegexCapabilities as well as static flags to support
    configuring a RegexCapabilities implementation with the
    implementation-specific modifier flags. Allows for callers to
    customize the RegexQuery using the implementation-specific options
    and fine tune how regular expressions are compiled and
    matched. (Marc Zampetti zampettim@aim.com via Mike McCandless)
 
 * LUCENE-1567: Added a new QueryParser framework, that allows 
    implementing a new query syntax in a flexible and efficient way.
    This new QueryParser will be moved to Lucene's core in release
    3.0 and will then replace the current core QueryParser, which
    has been deprecated with this patch.
    (Luis Alves and Adriano Campos via Michael Busch)
    
 * LUCENE-1486: Added ComplexPhraseQueryParser, an extension of QueryParser 
    that allows a subset of the Lucene query language to be embedded in
    PhraseQuerys. Wildcard, Range, and Fuzzy queries, as well as limited 
    boolean logic, can be used within quote operators with this parser, e.g.:
    "(jo* -john) smyth~". (Mark Harwood via Mark Miller)
    
 * Added web-based demo of functionality in contrib's XML Query Parser
    packaged as War file (Mark Harwood)

 * LUCENE-1406: Added Arabic analyzer.  (Robert Muir via Grant Ingersoll)

 * LUCENE-1628: Added Persian analyzer.  (Robert Muir)

 * LUCENE-1813: Add option to ReverseStringFilter to mark reversed tokens.
    (Andrzej Bialecki via Robert Muir)

Optimizations

 * LUCENE-1643: Re-use the collation key (RawCollationKey) for
     better performance, in ICUCollationKeyFilter.  (Robert Muir via
     Mike McCandless)

 * LUCENE-1794: Implement TokenStream reuse for contrib Analyzers, 
     and implement reset() for TokenStreams to support reuse.  (Robert Muir)

Documentation

 * LUCENE-1876: Added missing package level documentation for numerous
     contrib packages.
     (Steven Rowe & Robert Muir)

Build

 * LUCENE-1728: Split contrib/analyzers into common and smartcn modules. 
   Contrib/analyzers now builds an additional lucene-smartcn JAR file. The
   smartcn classes are no longer included in the lucene-analyzers JAR file.
   (Robert Muir via Simon Willnauer)
 
 * LUCENE-1829: Fix contrib query parser to properly create javacc files.
   (Jan-Pascal and Luis Alves via Michael Busch)      

Test Cases


======================= Release 2.4.0 =======================

Changes in runtime behavior

 (None)

API Changes

 (None)

Bug fixes

 1. LUCENE-1312: Added full support for InstantiatedIndexReader#getFieldNames()
    and tests that assert that deleted documents behave as they should (they did).
    (Jason Rutherglen, Karl Wettin)

 2. LUCENE-1318: InstantiatedIndexReader.norms(String, byte[], int) didn't treat
    the array offset right. (Jason Rutherglen via Karl Wettin)

New features

 1. LUCENE-1320: ShingleMatrixFilter, multidimensional shingle token filter. (Karl Wettin)

 2. LUCENE-1142: Updated Snowball package, org.tartarus distribution revision 500.
    Introducing Hungarian, Turkish and Romanian support, updated older stemmers
    and optimized (reflectionless) SnowballFilter.
    IMPORTANT NOTICE ON BACKWARDS COMPATIBILITY: an index created using the 2.3.2 (or older)
    might not be compatible with these updated classes as some algorithms have changed.
    (Karl Wettin)

 3. LUCENE-1016: TermVectorAccessor, transparent vector space access via stored vectors
    or by resolving the inverted index. (Karl Wettin) 

Documentation

 (None)

Build

 (None)

Test Cases

 (None)