Apache Solr Release Notes

Introduction
------------
Solr is the popular, blazing fast open source enterprise search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search, dynamic clustering, database
integration, and rich document (e.g., Word, PDF) handling. Solr is highly
scalable, providing distributed search and index replication, and it powers the
search and navigation features of many of the world's largest internet sites.

Solr is written in Java and runs as a standalone full-text search server within
a servlet container such as Tomcat. Solr uses the Lucene Java search library at
its core for full-text indexing and search, and has REST-like HTTP/XML and JSON
APIs that make it easy to use from virtually any programming language. Solr's
powerful external configuration allows it to be tailored to almost any type of
application without Java coding, and it has an extensive plugin architecture
when more advanced customization is required.

See README.txt and http://lucene.apache.org/solr for more information on how to
get started.

================== 3.6.1 ==================
More information about this release, including any errata related to the
release notes, upgrade instructions, or other changes may be found online at:
   https://wiki.apache.org/solr/Solr3.6.1

Bug Fixes:

* LUCENE-3969: Throw IAE on bad arguments that could cause confusing errors in
  PatternTokenizer. CommonGrams populates PositionLengthAttribute correctly.
  (Uwe Schindler, Mike McCandless, Robert Muir)

* SOLR-3361: ReplicationHandler "maxNumberOfBackups" doesn't work if backups
  are triggered on commit (James Dyer, Tomas Fernandez Lobbe)

* SOLR-3375: Fix charset problems with HttpSolrServer
  (Roger Håkansson, yonik, siren)

* SOLR-3436: Group count incorrect when not all shards are queried in the
  second pass.
  (Francois Perron, Martijn van Groningen)

* SOLR-3454: Exception when using result grouping with main=true and using
  wt=javabin. (Ludovic Boutros, Martijn van Groningen)

* SOLR-3489: Config file replication less error prone
  (Jochen Just via janhoy)

* SOLR-3477: SOLR does not start up when no cores are defined
  (Tomás Fernández Löbbe via tommaso)

================== 3.6.0 ==================
More information about this release, including any errata related to the
release notes, upgrade instructions, or other changes may be found online at:
   https://wiki.apache.org/solr/Solr3.6

Upgrading from Solr 3.5
----------------------

* SOLR-2983: As a consequence of moving the code which sets a MergePolicy from
  SolrIndexWriter to SolrIndexConfig, (custom) MergePolicies should now have an
  empty constructor; thus an IndexWriter should not be passed as constructor
  parameter but instead set using the setIndexWriter() method.

* As the doGet() methods in SimplePostTool were changed to static, client
  applications of this class need to be recompiled.

* In Solr version 3.5 and earlier, HTMLStripCharFilter had known bugs in the
  character offsets it provided, triggering e.g. exceptions in highlighting.
  HTMLStripCharFilter has been re-implemented, addressing this and other
  issues. See the entry for LUCENE-3690 in the Bug Fixes section below for a
  detailed list of changes. For people who depend on the behavior of
  HTMLStripCharFilter in Solr version 3.5 and earlier: the old implementation
  (bugs and all) is preserved as LegacyHTMLStripCharFilter.

* As of Solr 3.6, the <indexDefaults> and <mainIndex> sections of
  solrconfig.xml are deprecated and replaced with a new <indexConfig> section.
  Read more in SOLR-1052 below.

* SOLR-3040: The DIH's admin UI (dataimport.jsp) now requires DIH request
  handlers to start with a '/'. (dsmiley)

* SOLR-3161: <requestDispatcher handleSelect="false"> is now the default. An
  existing config will probably work as-is because handleSelect was explicitly
  enabled in default configs. HandleSelect makes /select work as well as
  enables the 'qt' parameter.
  Instead, consider explicitly configuring /select as is done in the example
  solrconfig.xml, and register your other search handlers with a leading '/',
  which is a recommended practice. (David Smiley, Erik Hatcher)

* SOLR-3161: Don't use the 'qt' parameter with a leading '/'. It probably
  won't work in 4.0 and it's now limited in 3.6 to SearchHandler subclasses
  that aren't lazy-loaded.

* SOLR-2724: Specifying <defaultSearchField> and
  <solrQueryParser defaultOperator="..."/> in schema.xml is now considered
  deprecated. Instead you are encouraged to specify these via the "df" and
  "q.op" parameters in your request handler definition. (David Smiley)

* Bugs were found and fixed in the SignatureUpdateProcessor that previously
  caused some documents to produce the same signature even when the configured
  fields contained distinct (non-String) values. Users of
  SignatureUpdateProcessor are strongly advised to re-index, as document
  signatures may have now changed. (see SOLR-3200 & SOLR-3226 for details)

New Features
----------------------

* SOLR-2020: Add Java client that uses Apache Http Components http client
  (4.x). (Chantal Ackermann, Ryan McKinley, Yonik Seeley, siren)

* SOLR-2854: Now load URL content stream data (via stream.url) when called for
  during request handling, rather than loading URL content streams
  automatically regardless of use.
  (David Smiley and Ryan McKinley via ehatcher)

* SOLR-2904: BinaryUpdateRequestHandler should be able to accept multiple
  update requests from a stream (shalin)

* SOLR-1565: StreamingUpdateSolrServer supports RequestWriter API and
  therefore, javabin update format (shalin)

* SOLR-2438: Added MultiTermAwareComponent to the various classes to allow
  automatic lowercasing for multiterm queries (wildcards, regex, prefix,
  range, etc).
  You can now optionally specify a "multiterm" analyzer in your schema.xml,
  but Solr should "do the right thing" if you don't specify
  (Pete Sturge, Erick Erickson, Mentoring from Seeley and Muir)

* SOLR-2919: Added support for localized range queries when the analysis chain
  uses CollationKeyFilter or ICUCollationKeyFilter. (Michael Sokolov, rmuir)

* SOLR-2982: Added BeiderMorseFilterFactory for Beider-Morse (BMPM) phonetic
  encoder. Upgrades commons-codec to version 1.6
  (Brooke Schreier Ganz, rmuir)

* SOLR-1843: A new "rootName" attribute is now available when configuring
  <jmx /> in solrconfig.xml. If this attribute is set, Solr will use it as the
  root name for all MBeans Solr exposes via JMX. The default root name is
  "solr" followed by the core name. (Constantijn Visinescu, hossman)

* SOLR-2906: Added LFU cache options to Solr.
  (Shawn Heisey via Erick Erickson)

* SOLR-3036: Ability to specify overwrite=false on the URL for XML updates.
  (Sami Siren via yonik)

* SOLR-2603: Add the encoding function for alternate fields in highlighting.
  (Massimo Schiavon, koji)

* SOLR-1729: Evaluation of NOW for date math is done only once per request for
  consistency, and is also propagated to shards in distributed search.
  Adding a parameter NOW=<time_in_ms> to the request will override the
  current time. (Peter Sturge, yonik, Simon Willnauer)

* SOLR-1709: Distributed support for Date and Numeric Range Faceting
  (Peter Sturge, David Smiley, hossman, Simon Willnauer)

* SOLR-3054, LUCENE-3671: Add TypeTokenFilterFactory that creates
  TypeTokenFilter that filters tokens based on their TypeAttribute.
  (Tommaso Teofili via Uwe Schindler)

* LUCENE-3305, SOLR-3056: Added Kuromoji morphological analyzer for Japanese.
  See the 'text_ja' fieldtype in the example to get started.
  (Christian Moen, Masaru Hasegawa via Robert Muir)

* SOLR-1860: StopFilterFactory, CommonGramsFilterFactory, and
  CommonGramsQueryFilterFactory can optionally read stopwords in Snowball
  format (specify format="snowball").
  (Robert Muir)

* SOLR-3105: ElisionFilterFactory optionally allows the parameter ignoreCase
  (default=false). (Robert Muir)

* LUCENE-3714: Add WFSTLookupFactory, a suggester that uses a weighted FST for
  more fine-grained suggestions.
  (Mike McCandless, Dawid Weiss, Robert Muir)

* SOLR-3143: Add SuggestQueryConverter, a QueryConverter intended for
  auto-suggesters. (Robert Muir)

* SOLR-3033: ReplicationHandler's backup command now supports a
  'maxNumberOfBackups' init param that can be used to delete all but the most
  recent N backups. (Torsten Krah, James Dyer)

* SOLR-2202: Currency FieldType, with support for currencies and exchange
  rates (Greg Fodor & Andrew Morrison via janhoy, rmuir, Uwe Schindler)

* SOLR-3026: eDismax: Locking down which fields can be explicitly queried
  (user fields aka uf) (janhoy, hossmann, Tomás Fernández Löbbe)

* SOLR-2826: URLClassify Update Processor (janhoy)

* SOLR-2764: Create a NorwegianLightStemmer and NorwegianMinimalStemmer
  (janhoy)

* SOLR-3221: Added the ability to directly configure aspects of the
  concurrency and thread-pooling used within distributed search in solr. This
  allows for finer-grained control and can be tuned by end users to target
  their own specific requirements. This builds on the work of the
  HttpCommComponent and uses the same configuration block to configure the
  thread pool. The default configuration has the same behaviour as solr 3.5,
  favouring throughput over latency. More information can be found on the
  wiki (http://wiki.apache.org/solr/SolrConfigXml) (Greg Bowyer)

* SOLR-2001: The query component will substitute an empty query that matches
  no documents if the query parser returns null. This also prevents an
  exception from being thrown by the default parser if "q" is missing. (yonik)
  SOLR-435: if q is "" then it's also acceptable. (dsmiley, hoss)

Optimizations
----------------------

* SOLR-1931: Speedup for LukeRequestHandler and admin/schema browser. New
  parameter reportDocCount defaults to 'false'.
  Old behavior still possible by specifying this as 'true' (Erick Erickson)

* SOLR-3012: Move System.getProperty("type") in postData() to main() and add
  type argument so that the client applications of SimplePostTool can set
  content type via method argument. (koji)

* SOLR-2888: FSTSuggester refactoring: internal storage is now UTF-8,
  external sorting (on disk) prevents OOMs even with large data sets
  (the bottleneck is now FST construction), code cleanups and API cleanups.
  (Dawid Weiss, Robert Muir)

Bug Fixes
----------------------

* SOLR-3187: SystemInfoHandler leaks filehandles (siren)

* LUCENE-3820: Fixed invalid position indexes by reimplementing
  PatternReplaceCharFilter. This change also drops real support for boundary
  characters -- all input is prebuffered for pattern matching. (Dawid Weiss)

* SOLR-3068: Fixed NPE in ThreadDumpHandler (siren)

* SOLR-2912: Fixed File descriptor leak in ShowFileRequestHandler
  (Michael Ryan, shalin)

* SOLR-2819: Improved speed of parsing hex entities in HTMLStripCharFilter
  (Bernhard Berger, hossman)

* SOLR-2509: StringIndexOutOfBoundsException in the spellchecker collate when
  the term contains a hyphen. (Thomas Gambier caught the bug, Steffen
  Godskesen did the patch, via Erick Erickson)

* SOLR-2955: Fixed IllegalStateException when querying with
  group.sort=score desc in sharded environment.
  (Steffen Elberg Godskesen, Martijn van Groningen)

* SOLR-2956: Fixed inconsistencies in the flags (and flag key) reported by
  the LukeRequestHandler (hossman)

* SOLR-1730: Made it clearer when a core failed to load as well as better
  logging when the QueryElevationComponent fails to properly initialize
  (gsingers)

* SOLR-1520: QueryElevationComponent now supports non-string ids (gsingers)

* SOLR-3024: Fixed JSONTestUtil.matchObj, in previous releases it was not
  respecting the 'delta' arg (David Smiley via hossman)

* SOLR-2542: Fixed DIH Context variables which were broken for all scopes
  other than SCOPE_ENTITY (Linbin Chen & Frank Wesemann via hossman)

* SOLR-3042: Fixed Maven Jetty plugin configuration.
  (David Smiley via Steve Rowe)

* SOLR-2970: CSV ResponseWriter returns fields defined as stored=false in
  schema (janhoy)

* LUCENE-3690, LUCENE-2208, SOLR-882, SOLR-42: Re-implemented
  HTMLStripCharFilter as a JFlex-generated scanner and moved it to
  lucene/contrib/analyzers/common/. See below for a list of bug fixes and
  other changes. To get the same behavior as HTMLStripCharFilter in Solr
  version 3.5 and earlier (including the bugs), use
  LegacyHTMLStripCharFilter, which is the previous implementation.

  Behavior changes from the previous version:

  - Known offset bugs are fixed.
  - The "Mark invalid" exceptions reported in SOLR-1283 are no longer
    triggered (the bug is still present in LegacyHTMLStripCharFilter).
  - The character entity "&apos;" is now always properly decoded.
  - More cases of