Apache Solr Release Notes

Introduction
------------
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.

See README.txt and http://lucene.apache.org/solr for more information on how to get started.

================== 3.3.0 ==================

Upgrading from Solr 3.2.0
----------------------

* SolrCore's CloseHook API has been changed in a backward-incompatible way. It has been changed from an interface to an abstract class. Any custom components which use the SolrCore.addCloseHook method will need to be modified accordingly. To migrate, put your old CloseHook#close impl into CloseHook#preClose.

New Features
----------------------

* SOLR-2378: A new, automaton-based, implementation of suggest (autocomplete) component, offering an order of magnitude smaller memory consumption compared to ternary trees and jaspell and very fast lookups at runtime. (Dawid Weiss)

* SOLR-2400: Field- and DocumentAnalysisRequestHandler now provide a position history for each token, so you can follow the token through all analysis stages.
  The output contains a separate int[] attribute containing all positions from previous Tokenizers/TokenFilters (called "positionHistory"). (Uwe Schindler)

* SOLR-2524: (SOLR-236, SOLR-237, SOLR-1773, SOLR-1311) Grouping / Field collapsing using the Lucene grouping contrib. The search result can be grouped by field and query. (Martijn van Groningen, Emmanuel Keller, Shalin Shekhar Mangar, Koji Sekiguchi, Iván de Prado, Ryan McKinley, Marc Sturlese, Peter Karich, Bojan Smid, Charles Hornberger, Dieter Grad, Dmitry Lihachev, Doug Steigerwald, Karsten Sperling, Michael Gundlach, Oleg Gnatovskiy, Thomas Traeger, Harish Agarwal, yonik, Michael McCandless, Bill Bell)

* SOLR-1331: Added a srcCore parameter to CoreAdminHandler's mergeindexes action to merge one or more cores' indexes to a target core (shalin)

* SOLR-2610: Add an option to delete index through CoreAdmin UNLOAD action (shalin)

Optimizations
----------------------

* SOLR-2567: Solr now defaults to TieredMergePolicy. See http://s.apache.org/merging for more information. (rmuir)

Bug Fixes
----------------------

* SOLR-2519: Improve text_* fieldTypes in example schema.xml: improve cross-language defaults for text_general; break out separate English-specific fieldTypes (Jan Høydahl, hossman, Robert Muir, yonik, Mike McCandless)

* SOLR-2462: Fix extremely high memory usage problems with spellcheck.collate. Separately, an additional spellcheck.maxCollationEvaluations (default=10000) parameter is added to avoid excessive CPU time in extreme cases (e.g. long queries with many misspelled words). (James Dyer via rmuir)

Other Changes
----------------------

* SOLR-2620: Removed unnecessary log4j jar from clustering contrib (Dawid Weiss).

* SOLR-2571: Add a commented out example of the spellchecker's thresholdTokenFrequency parameter to the example solrconfig.xml, and also add a unit test for this feature.
  (James Dyer via rmuir)

* SOLR-2576: Deprecate SpellingResult.add(Token token, int docFreq), please use SpellingResult.addFrequency(Token token, int docFreq) instead. (James Dyer via rmuir)

* SOLR-2574: Upgrade slf4j to v1.6.1 (shalin)

* LUCENE-3204: The maven-ant-tasks jar is now included in the source tree; users of the generate-maven-artifacts target no longer have to manually place this jar in the Ant classpath. NOTE: when Ant looks for the maven-ant-tasks jar, it looks first in its pre-existing classpath, so any copies it finds will be used instead of the copy included in the Lucene/Solr source tree. For this reason, it is recommended to remove any copies of the maven-ant-tasks jar in the Ant classpath, e.g. under ~/.ant/lib/ or under the Ant installation's lib/ directory. (Steve Rowe)

* SOLR-2611: Fix typos in the example configuration (Eric Pugh via rmuir)

================== 3.2.0 ==================

Versions of Major Components
---------------------
Apache Tika 0.8
Carrot2 3.5.0

Upgrading from Solr 3.1
----------------------

* The updateRequestProcessorChain for a RequestHandler is now defined with update.chain rather than update.processor. The latter still works, but has been deprecated.

Detailed Change List
----------------------

New Features
----------------------

* SOLR-2496: Add ability to specify overwrite and commitWithin as request parameters (e.g. specified in the URL) when using the JSON update format, and added a simplified format for specifying multiple documents. Example: [{"id":"doc1"},{"id":"doc2"}] (yonik)

* SOLR-2113: Add TermQParserPlugin, registered as "term". This is useful when generating filter queries from terms returned from field faceting or the terms component.
  Example: fq={!term f=weight}1.5 (hossman, yonik)

* SOLR-1915: DebugComponent now supports using a NamedList to model Explanation objects in its responses instead of Explanation.toString (hossman)

Optimizations
----------------------

Bug Fixes
----------------------

* SOLR-2445: Change the default qt to blank in form.jsp, because there is no "standard" request handler unless you have it in your solrconfig.xml explicitly. (koji)

* SOLR-2455: Prevent double submit of forms in admin interface. (Jeffrey Chang via uschindler)

* SOLR-2464: Fix potential slowness in QueryValueSource (the query() function) when the query is very sparse and may not match any documents in a segment. (yonik)

* SOLR-2469: When using java replication with replicateAfter=startup, the first commit point on server startup is never removed. (yonik)

* SOLR-2466: SolrJ's CommonsHttpSolrServer would retry requests on failure, regardless of the configured maxRetries, due to HttpClient having its own retry mechanism by default. The retryCount of HttpClient is now set to 0, and SolrJ does the retry. (yonik)

* SOLR-2409: edismax parser - treat the text of a fielded query as a literal if the fieldname does not exist. For example Mission: Impossible should not search on the "Mission" field unless it's a valid field in the schema. (Ryan McKinley, yonik)

* SOLR-2403: facet.sort=index reported incorrect results for distributed search in a number of scenarios when facet.mincount>0. This patch also adds some performance/algorithmic improvements when (facet.sort=count && facet.mincount=1 && facet.limit=-1) and when (facet.sort=index && facet.mincount>0) (yonik)

* SOLR-2333: The "rename" core admin action does not persist the new name to solr.xml (Rasmus Hahn, Paul R. Brown via Mark Miller)

* SOLR-2390: Performance of usePhraseHighlighter is terrible on very large Documents, regardless of hl.maxDocCharsToAnalyze.
  (Mark Miller)

* SOLR-2474: The helper TokenStreams in analysis.jsp and AnalysisRequestHandlerBase did not clear all attributes so they displayed incorrect attribute values for tokens in later filter stages. (uschindler, rmuir, yonik)

* SOLR-2467: Fix initialization so any errors are logged properly. (hossman)

* SOLR-2493: SolrQueryParser was fixed to not parse the SolrConfig DOM tree on each instantiation which is a huge slowdown. (Stephane Bailliez via uschindler)

* SOLR-2495: The JSON parser could hang on corrupted input and could fail to detect numbers that were too large to fit in a long. (yonik)

* SOLR-2520: Make JSON response format escape \u2029 as well as \u2028 in strings since those characters are not valid in javascript strings (although they are valid in JSON strings). (yonik)

* SOLR-2536: Add ReloadCacheRequestHandler to fix ExternalFileField bug (if reopenReaders set to true and no index segments have been changed, commit cannot trigger reload external file). (koji)

* SOLR-2539: VectorValueSource.floatVal incorrectly used byteVal on sub-sources. (Tom Liu via yonik)

* SOLR-2554: RandomSortField didn't work when used in a function query. (yonik)

Other Changes
----------------------

* SOLR-2061: Pull base tests out into a new Solr Test Framework module, and publish binary, javadoc, and source test-framework jars. (Drew Farris, Robert Muir, Steve Rowe)

* SOLR-2105: Rename RequestHandler param 'update.processor' to 'update.chain'. (Jan Høydahl via Mark Miller)

* SOLR-2485: Deprecate BaseResponseWriter, GenericBinaryResponseWriter, and GenericTextResponseWriter. These classes will be removed in 4.0. (ryan)

* SOLR-2451: Enhance assertJQ to allow individual tests to specify the tolerance delta used in numeric equalities. This allows for slight variance in asserting score comparisons in unit tests. (David Smiley, Chris Hostetter)

* SOLR-2528: Remove default="true" from HtmlEncoder in example solrconfig.xml, because html encoding confuses non-ascii users.
  (koji)

Build
----------------------

* LUCENE-3006: Building javadocs will fail on warnings by default. Override with -Dfailonjavadocwarning=false (sarowe, gsingers)

Documentation
----------------------

================== 3.1.0 ==================

Versions of Major Components
---------------------
Apache Lucene 3.1.0
Apache Tika 0.8
Carrot2 3.4.2
Velocity 1.6.1 and Velocity Tools 2.0-beta3
Apache UIMA 2.3.1-SNAPSHOT

Upgrading from Solr 1.4
----------------------

* The Lucene index format has changed and as a result, once you upgrade, previous versions of Solr will no longer be able to read your indices. In a master/slave configuration, all searchers/slaves should be upgraded before the master. If the master were to be updated first, the older searchers would not be able to read the new index format.

* The Solr JavaBin format has changed as of Solr 3.1. If you are using the JavaBin format, you will need to upgrade your SolrJ client. (SOLR-2034)

* The experimental ALIAS command has been removed (SOLR-1637)

* Using solr.xml is recommended for single cores also (SOLR-1621)

* Old syntax of configuration in solrconfig.xml is deprecated (SOLR-1696)

* The deprecated HTMLStripReader, HTMLStripWhitespaceTokenizerFactory and HTMLStripStandardTokenizerFactory were removed. To strip HTML tags, HTMLStripCharFilter should be used instead, and it works with any Tokenizer of your choice. (SOLR-1657)

* Field compression is no longer supported. Fields that were formerly compressed will be uncompressed as index segments are merged. For shorter fields, this may actually be an improvement, as the compression used was not very good for short text. Some indexes may get larger though.

* SOLR-1845: The TermsComponent response format was changed so that the "terms" container is a map instead of a named list. This affects response formats like JSON, but not XML. (yonik)

* SOLR-1876: All Analyzers and TokenStreams are now final to enforce the decorator pattern.
  (rmuir, uschindler)

* LUCENE-2608: Added the ability to specify the accuracy on a per request basis. It is recommended that implementations of SolrSpellChecker should change over to the new SolrSpellChecker methods using the new SpellingOptions class, but are not required to. While this change is backward compatible, the trunk version of Solr has already dropped support for all but the SpellingOptions method. (gsingers)

* readercycle script was removed. (SOLR-2046)

* In previous releases, sorting or evaluating function queries on fields that were "multiValued" (either by explicit declaration in schema.xml or by implicit behavior because the "version" attribute on the schema was less than 1.2) did not generally work, but it would sometimes silently act as if it succeeded and order the docs arbitrarily. Solr will now fail on any attempt to sort, or apply a function to, multi-valued fields.

* The DataImportHandler jars are no longer included in the solr WAR and should be added in Solr's lib directory, or referenced via the <lib> directive in solrconfig.xml.

Detailed Change List
----------------------

New Features
----------------------

* SOLR-1302: Added several new distance based functions, including Great Circle (haversine), Manhattan, Euclidean and String (using the StringDistance methods in the Lucene spellchecker). Also added geohash(), deg() and rad() convenience functions. See http://wiki.apache.org/solr/FunctionQuery. (gsingers)

* SOLR-1553: New dismax parser implementation (accessible as "edismax") that supports full lucene syntax, improved reserved char escaping, fielded queries, improved proximity boosting, and improved stopword handling. Note: status is experimental for now. (yonik)

* SOLR-1574: Add many new functions from java Math (e.g.
  sin, cos) (yonik)

* SOLR-1569: Allow functions to take in literal strings by modifying the FunctionQParser and adding LiteralValueSource (gsingers)

* SOLR-1571: Added unicode collation support through Lucene's CollationKeyFilter (Robert Muir via shalin)

* SOLR-785: Distributed Search support for SpellCheckComponent (Matthew Woytowitz, shalin)

* SOLR-1625: Add regexp support for TermsComponent (Uri Boness via noble)

* SOLR-1297: Add sort by Function capability (gsingers, yonik)

* SOLR-1139: Add TermsComponent Query and Response Support in SolrJ (Matt Weber via shalin)

* SOLR-1177: Distributed Search support for TermsComponent (Matt Weber via shalin)

* SOLR-1621, SOLR-1722: Allow current single core deployments to be specified by solr.xml (Mark Miller, noble)

* SOLR-1532: Allow StreamingUpdateSolrServer to use a provided HttpClient (Gabriele Renzi via shalin)

* SOLR-1653: Add PatternReplaceCharFilter (koji)

* SOLR-1131: FieldTypes can now output multiple Fields per Type and still be searched. This can be handy for hiding the details of a particular implementation such as in the spatial case. (Chris Mattmann, shalin, noble, gsingers, yonik)

* SOLR-1586: Add support for Geohash and Spatial Tile FieldType (Chris Mattmann, gsingers)

* SOLR-1697: PluginInfo should load plugins w/o class attribute also (noble)

* SOLR-1268: Incorporate FastVectorHighlighter (koji)

* SOLR-1750: SolrInfoMBeanHandler added for simpler programmatic access to info currently available from registry.jsp and stats.jsp (ehatcher, hossman)

* SOLR-1815: SolrJ now preserves the order of facet queries. (yonik)

* SOLR-1677: Add support for choosing the Lucene Version for Lucene components within Solr. (Uwe Schindler, Mark Miller)

* SOLR-1379: Add RAMDirectoryFactory for non-persistent in memory index storage. (Alex Baranov via yonik)

* SOLR-1857: Synced Solr analysis with Lucene 3.1. Added KeywordMarkerFilterFactory and StemmerOverrideFilterFactory, which can be used to tune stemming algorithms.
  Added factories for Bulgarian, Czech, Hindi, Turkish, and Wikipedia analysis. Improved the performance of SnowballPorterFilterFactory. (rmuir)

* SOLR-1657: Converted remaining TokenStreams to the Attributes-based API. All Solr TokenFilters now support custom Attributes, and some have improved performance: especially WordDelimiterFilter and CommonGramsFilter. (rmuir, cmale, uschindler)

* SOLR-1740: ShingleFilterFactory supports the "minShingleSize" and "tokenSeparator" parameters for controlling the minimum shingle size produced by the filter, and the separator string that it uses, respectively. (Steven Rowe via rmuir)

* SOLR-744: ShingleFilterFactory supports the "outputUnigramsIfNoShingles" parameter, to output unigrams if the number of input tokens is fewer than minShingleSize, and no shingles can be generated. (Chris Harris via Steven Rowe)

* SOLR-1923: PhoneticFilterFactory now has support for the Caverphone algorithm. (rmuir)

* SOLR-1957: The VelocityResponseWriter contrib moved to core. Example search UI now available at http://localhost:8983/solr/browse (ehatcher)

* SOLR-1974: Add LimitTokenCountFilterFactory. (koji)

* SOLR-1966: QueryElevationComponent can now return just the included results in the elevation file (gsingers, yonik)

* SOLR-1556: TermVectorComponent now supports per field overrides. Also, it now throws an error if passed in fields do not exist and warnings if fields that do not have term vector options (termVectors, offsets, positions) that align with the schema declaration. It also will now return warnings about (gsingers)

* SOLR-1985: FastVectorHighlighter: add wrapper class for Lucene's SingleFragListBuilder (koji)

* SOLR-1984: Add HyphenationCompoundWordTokenFilterFactory. (PB via rmuir)

* SOLR-397: Date Faceting now supports a "facet.date.include" param for specifying when the upper & lower end points of computed date ranges should be included in the range. Legal values are: "all", "lower", "upper", "edge", and "outer".
  For backwards compatibility the default value is the set: [lower,upper,edge], so that all ranges between start and end are inclusive of their endpoints, but the "before" and "after" ranges are not.

* SOLR-945: JSON update handler that accepts add, delete, commit commands in JSON format. (Ryan McKinley, yonik)

* SOLR-2015: Add a boolean attribute autoGeneratePhraseQueries to TextField. autoGeneratePhraseQueries="true" (the default) causes the query parser to generate phrase queries if multiple tokens are generated from a single non-quoted analysis string. For example WordDelimiterFilter splitting text:pdp-11 will cause the parser to generate text:"pdp 11" rather than (text:PDP OR text:11). Note that autoGeneratePhraseQueries="true" tends to not work well for non whitespace delimited languages. (yonik)

* SOLR-1925: Add CSVResponseWriter (use wt=csv) that returns the list of documents in CSV format. (Chris Mattmann, yonik)

* SOLR-1240: "Range Faceting" has been added. This is a generalization of the existing "Date Faceting" logic so that it now supports all stock numeric field types that support range queries in addition to dates. facet.date is now deprecated in favor of this generalized mechanism. (Gijs Kunze, hossman)

* SOLR-2021: Add SolrEncoder plugin to Highlighter. (koji)

* SOLR-2030: Make FastVectorHighlighter use of SolrEncoder. (koji)

* SOLR-2053: Add support for custom comparators in Solr spellchecker, per LUCENE-2479 (gsingers)

* SOLR-2049: Add hl.multiValuedSeparatorChar for FastVectorHighlighter, per LUCENE-2603. (koji)

* SOLR-2059: Add "types" attribute to WordDelimiterFilterFactory, which allows you to customize how WordDelimiterFilter tokenizes text with a configuration file. (Peter Karich, rmuir)

* SOLR-2099: Add ability to throttle rsync based replication using rsync option --bwlimit. (Brandon Evans via koji)

* SOLR-1316: Create autosuggest component.
  (Ankul Garg, Jason Rutherglen, Shalin Shekhar Mangar, Grant Ingersoll, Robert Muir, ab)

* SOLR-1568: Added "native" filtering support for PointType, GeohashField. Added LatLonType with filtering support too. See http://wiki.apache.org/solr/SpatialSearch and the example. Refactored some items in Lucene spatial. Removed SpatialTileField as the underlying CartesianTier is broken beyond repair and is going to be moved. (gsingers)

* SOLR-2128: Full parameter substitution for function queries. Example: q=add($v1,$v2)&v1=mul(popularity,5)&v2=20.0 (yonik)

* SOLR-2133: Function query parser can now parse multiple comma separated value sources. It also now fails if there is extra unexpected text after parsing the functions, instead of silently ignoring it. This allows expressions like q=dist(2,vector(1,2),$pt)&pt=3,4 (yonik)

* SOLR-2157: Suggester should return alpha-sorted results when onlyMorePopular=false (ab)

* SOLR-2010: Added ability to verify that spell checking collations have actual results in the index. (James Dyer via gsingers)

* SOLR-2188: Added "maxTokenLength" argument to the factories for ClassicTokenizer, StandardTokenizer, and UAX29URLEmailTokenizer. (Steven Rowe)

* SOLR-2129: Added a Solr module for dynamic metadata extraction/indexing with Apache UIMA. See contrib/uima/README.txt for more information. (Tommaso Teofili via rmuir)

* SOLR-2325: Allow tagging and exclusion of main query for faceting. (yonik)

* SOLR-2263: Add ability for RawResponseWriter to stream binary files as well as text files. (Eric Pugh via yonik)

* SOLR-860: Add debug output for MoreLikeThis. (koji)

* SOLR-1057: Add PathHierarchyTokenizerFactory. (ryan, koji)

Optimizations
----------------------

* SOLR-1679: Don't build up string messages in SolrCore.execute unless they are necessary for the current log level. (Fuad Efendi and hossman)

* SOLR-1874: Optimize PatternReplaceFilter for better performance.
  (rmuir, uschindler)

* SOLR-1968: speed up initial filter cache population for facet.method=enum and also big terms for multi-valued facet.method=fc. The resulting speedup for the first facet request is anywhere from 30% to 32x, depending on how many terms are in the field and how many documents match per term. (yonik)

* SOLR-2089: Speed up UnInvertedField faceting (facet.method=fc for multi-valued fields) when facet.limit is both high, and a high enough percentage of the number of unique terms in the field. Extreme cases yield speedups over 3x. (yonik)

* SOLR-2046: add common functions to scripts-util. (koji)

Bug Fixes
----------------------

* SOLR-1769: Solr 1.4 Replication - Repeater throwing NullPointerException (Jörgen Rydenius via noble)

* SOLR-1432: Make the new ValueSource.getValues(context,reader) delegate to the original ValueSource.getValues(reader) so custom sources will work. (yonik)

* SOLR-1572: FastLRUCache correctly implemented the LRU policy only for the first 2B accesses. (yonik)

* SOLR-1582: copyField was ignored for BinaryField types (gsingers)

* SOLR-1563: Binary fields, including trie-based numeric fields, caused null pointer exceptions in the luke request handler. (yonik)

* SOLR-1577: The example solrconfig.xml defaulted to a solr data dir relative to the current working directory, even if a different solr home was being used. The new behavior changes the default to a zero length string, which is treated the same as if no dataDir had been specified, hence the "data" directory under the solr home will be used. (yonik)

* SOLR-1584: SolrJ - SolrQuery.setIncludeScore() incorrectly added fl=score to the parameter list instead of appending score to the existing field list. (yonik)

* SOLR-1580: Solr Configuration ignores 'mergeFactor' parameter, always uses Lucene default. (Lance Norskog via Mark Miller)

* SOLR-1593: ReverseWildcardFilter didn't work for surrogate pairs (i.e. code points outside of the BMP), resulting in incorrect matching.
  This change requires reindexing for any content with such characters. (Robert Muir, yonik)

* SOLR-1596: A rollback operation followed by the shutdown of Solr or the close of a core resulted in a warning: "SEVERE: SolrIndexWriter was not closed prior to finalize()" although there were no other consequences. (yonik)

* SOLR-1595: StreamingUpdateSolrServer used the platform default character set when streaming updates, rather than using UTF-8 as the HTTP headers indicated, leading to an encoding mismatch. (hossman, yonik)

* SOLR-1587: A distributed search request with fl=score, didn't match the behavior of a non-distributed request since it only returned the id,score fields instead of all fields in addition to score. (yonik)

* SOLR-1601: Schema browser does not indicate presence of charFilter. (koji)

* SOLR-1615: Backslash escaping did not work in quoted strings for local param arguments. (Wojtek Piaseczny, yonik)

* SOLR-1628: log contains incorrect number of adds and deletes. (Thijs Vonk via yonik)

* SOLR-343: Date faceting now respects facet.mincount limiting (Uri Boness, Raiko Eckstein via hossman)

* SOLR-1624: Highlighter only highlights values from the first field value in a multivalued field when term positions (term vectors) are stored. (Chris Harris via yonik)

* SOLR-1635: Fixed error message when numeric values can't be parsed by DOMUtils - notably for plugin init params in solrconfig.xml. (hossman)

* SOLR-1651: Fixed Incorrect dataimport handler package name in SolrResourceLoader (Akshay Ukey via shalin)

* SOLR-1660: CapitalizationFilter crashes if you use the maxWordCountOption (Robert Muir via shalin)

* SOLR-1667: PatternTokenizer does not reset attributes such as positionIncrementGap (Robert Muir via shalin)

* SOLR-1711: SolrJ - StreamingUpdateSolrServer had a race condition that could halt the streaming of documents. The original patch to fix this (never officially released) introduced another hanging bug due to connections not being released.
  (Attila Babo, Erik Hetzner, Johannes Tuchscherer via yonik)

* SOLR-1748, SOLR-1747, SOLR-1746, SOLR-1745, SOLR-1744: Streams and Readers retrieved from ContentStreams are not closed in various places, resulting in file descriptor leaks. (Christoff Brill, Mark Miller)

* SOLR-1753: StatsComponent throws NPE when getting statistics for facets in distributed search (Janne Majaranta via koji)

* SOLR-1736: In the slave, if 'mov'ing a file does not succeed, copy the file (noble)

* SOLR-1579: Fixes to XML escaping in stats.jsp (David Bowen and hossman)

* SOLR-1777: fieldTypes with sortMissingLast=true or sortMissingFirst=true can result in incorrectly sorted results. (yonik)

* SOLR-1798: Small memory leak (~100 bytes) in fastLRUCache for every commit. (yonik)

* SOLR-1823: Fixed XMLResponseWriter (via XMLWriter) so it no longer throws a ClassCastException when a Map containing a non-String key is used. (Frank Wesemann, hossman)

* SOLR-1797: fix ConcurrentModificationException and potential memory leaks in ResourceLoader. (yonik)

* SOLR-1850: change KeepWordFilter so a new word set is not created for each instance (John Wang via yonik)

* SOLR-1706: fixed WordDelimiterFilter for certain combinations of options where it would output incorrect tokens. (Robert Muir, Chris Male)

* SOLR-1936: The JSON response format needed to escape unicode code point U+2028 - 'LINE SEPARATOR' (Robert Hofstra, yonik)

* SOLR-1914: Change the JSON response format to output float/double values of NaN,Infinity,-Infinity as strings. (yonik)

* SOLR-1948: PatternTokenizerFactory should use parent's args (koji)

* SOLR-1870: Indexing documents using the 'javabin' format no longer fails with a ClassCastException when SolrInputDocuments contain field values which are Collections or other classes that implement Iterable.
  (noble, hossman)

* SOLR-1981: Solr will now fail correctly if solr.xml attempts to specify multiple cores that have the same name (hossman)

* SOLR-1791: Fix messed up core names on admin gui (yonik via koji)

* SOLR-1995: Change date format from "hour in am/pm" to "hour in day" in CoreContainer and SnapShooter. (Hayato Ito, koji)

* SOLR-2008: avoid possible RejectedExecutionException w/autoCommit by making SolrCore close the UpdateHandler before closing the SearchExecutor. (NarasimhaRaju, hossman)

* SOLR-2036: Avoid expensive fieldCache ram estimation for the admin stats page. (yonik)

* SOLR-2047: ReplicationHandler should accept bool type for enable flag. (koji)

* SOLR-1630: Fix spell checking collation issue related to token positions (rmuir, gsingers)

* SOLR-2100: The replication handler backup command didn't save the commit point and hence could fail when a newer commit caused the older commit point to be removed before it was finished being copied. This did not affect normal master/slave replication. (Peter Sturge via yonik)

* SOLR-2114: Fixed parsing error in hsin function. The function signature has changed slightly. (gsingers)

* SOLR-2083: SpellCheckComponent misreports suggestions when distributed (James Dyer via gsingers)

* SOLR-2111: Change exception handling in distributed faceting to work more like non-distributed faceting, change facet_counts/exception from a String to a List to enable listing all exceptions that happened, and prevent an exception in one facet command from affecting another facet command. (yonik)

* SOLR-2110: Remove the restriction on names for local params substitution/dereferencing. Properly encode local params in distributed faceting. (yonik)

* SOLR-2135: Fix behavior of ConcurrentLRUCache when asking for getLatestAccessedItems(0) or getOldestAccessedItems(0). (David Smiley via hossman)

* SOLR-2148: Highlighter doesn't support q.alt.
  (koji)

* SOLR-2180: It was possible for EmbeddedSolrServer to leave searchers open if a request threw an exception. (yonik)

* SOLR-2173: Suggester should always rebuild Lookup data if Lookup.load fails. (ab)

* SOLR-2081: BaseResponseWriter.isStreamingDocs causes SingleResponseWriter.end to be called 2x (Chris A. Mattmann via hossman)

* SOLR-2219: The init() method of every SolrRequestHandler was being called twice. (ambikeshwar singh and hossman)

* SOLR-2285: duplicate SolrEventListeners no longer created (hossman)

* SOLR-1993: fix String cast assumption in JavaBinCodec - specifically addresses "commitWithin" option on Update requests. (noble, hossman, and Maxim Valyanskiy)

* SOLR-2261: fix velocity template layout.vm that referred to an older version of jquery. (Eric Pugh via rmuir)

* SOLR-2307: fix bug in PHPSerializedResponseWriter (wt=phps) when dealing with SolrDocumentList objects -- ie: sharded queries. (Antonio Verni via hossman)

* SOLR-2127: Fixed serialization of default core and indentation of solr.xml when serializing. (Ephraim Ofir, Mark Miller)

* SOLR-2320: Fixed ReplicationHandler detail reporting for masters (hossman)

* SOLR-482: Provide more exception handling in CSVLoader (gsingers)

* SOLR-1283: HTMLStripCharFilter sometimes threw a "Mark Invalid" exception. (Julien Coloos, hossman, yonik)

* SOLR-2085: Improve SolrJ behavior when FacetComponent comes before QueryComponent (Tomas Salfischberger via hossman)

* SOLR-1940: Fix SolrDispatchFilter behavior when Content-Type is unknown (Lance Norskog and hossman)

* SOLR-1983: snappuller fails when modifiedConfFiles is not empty and full copy of index is needed. (Alexander Kanarsky via yonik)

* SOLR-2156: SnapPuller fails to clean Old Index Directories on Full Copy (Jayendra Patil via yonik)

* SOLR-96: Fix XML parsing in XMLUpdateRequestHandler and DocumentAnalysisRequestHandler to respect charset from XML file and only use HTTP header's "Content-Type" as a "hint".
  (uschindler)

* SOLR-2339: Fix sorting to explicitly generate an error if you attempt to sort on a multiValued field. (hossman)

* SOLR-2348: Fix field types to explicitly generate an error if you attempt to get a ValueSource for a multiValued field. (hossman)

* SOLR-2380: Distributed faceting could miss values when facet.sort=index and when facet.offset was greater than 0. (yonik)

* SOLR-1656: XIncludes and other HREFs in XML files loaded by ResourceLoader are fixed to be resolved using the URI standard (RFC 2396). The system identifier is no longer a plain filename with path, it gets initialized using a custom URI scheme "solrres:". This scheme is resolved using an EntityResolver that utilizes ResourceLoader (org.apache.solr.common.util.SystemIdResolver). This makes all relative paths in Solr's config files behave as expected. This change introduces some backwards breaks in the API: Some config classes (Config, SolrConfig, IndexSchema) were changed to take org.xml.sax.InputSource instead of InputStream. There may also be some backwards breaks in existing config files, it is recommended to check your config files / XSLTs and replace all XIncludes/HREFs that were hacked to use absolute paths to use relative ones. (uschindler)

* SOLR-309: Fix FieldType so setting an analyzer on a FieldType that doesn't expect it will generate an error. Practically speaking this means that Solr will now correctly generate an error on initialization if the schema.xml contains an analyzer configuration for a fieldType that does not use TextField. (hossman)

* SOLR-2192: StreamingUpdateSolrServer.blockUntilFinished was not thread safe and could throw an exception. (yonik)

Other Changes
----------------------

* SOLR-1602: Refactor SOLR package structure to include o.a.solr.response and move QueryResponseWriters in there (Chris A. Mattmann, ryan, hoss)

* SOLR-1516: Addition of an abstract BaseResponseWriter class to simplify the development of QueryResponseWriter implementations. (Chris A.
Mattmann via noble)

* SOLR-1592: Refactor XMLWriter startTag to allow arbitrary attributes to be written (Chris A. Mattmann via noble)

* SOLR-1561: Added Lucene 2.9.1 spatial contrib jar to lib. (gsingers)

* SOLR-1570: Log warnings if uniqueKey is multi-valued or not stored (hossman, shalin)

* SOLR-1558: QueryElevationComponent only works if the uniqueKey field is implemented using StrField. In previous versions of Solr no warning or error would be generated if you attempted to use QueryElevationComponent; it would just fail in unexpected ways. This has been changed so that it will fail with a clear error message on initialization. (hossman)

* SOLR-1611: Added Lucene 2.9.1 collation contrib jar to lib (shalin)

* SOLR-1608: Extract base class from TestDistributedSearch to make it easy to write test cases for other distributed components. (shalin)

* Upgraded to Lucene 2.9-dev r888785 (shalin)

* SOLR-1610: Generify SolrCache (Jason Rutherglen via shalin)

* SOLR-1637: Remove ALIAS command

* SOLR-1662: Added Javadocs in BufferedTokenStream and fixed incorrect cloning in TestBufferedTokenStream (Robert Muir, Uwe Schindler via shalin)

* SOLR-1674: Improve analysis tests and cut over to new TokenStream API. (Robert Muir via Mark Miller)

* SOLR-1661: Remove adminCore from CoreContainer. Removed deprecated methods setAdminCore(), getAdminCore() (noble)

* SOLR-1704: Google collections moved from clustering to core (noble)

* SOLR-1268: Add Lucene 2.9-dev r888785 FastVectorHighlighter contrib jar to lib. (koji)

* SOLR-1538: Reordering of object allocations in ConcurrentLRUCache to eliminate (an extremely small) potential for deadlock. (gabriele renzi via hossman)

* SOLR-1588: Removed some very old dead code. (Chris A.
Mattmann via hossman)

* SOLR-1696: Deprecate old syntax and move configuration to HighlightComponent (noble)

* SOLR-1727: SolrEventListener should extend NamedListInitializedPlugin (noble)

* SOLR-1771: Improved error message when StringIndex cannot be initialized for a function query (hossman)

* SOLR-1695: Improved error messages when adding a document that does not contain exactly one value for the uniqueKey field (hossman)

* SOLR-1776: DismaxQParser and ExtendedDismaxQParser now use the schema.xml "defaultSearchField" as the default value for the "qf" param instead of failing with an error when "qf" is not specified. (hossman)

* SOLR-1851: luceneAutoCommit no longer has any effect - it has been removed (Mark Miller)

* SOLR-1865: SolrResourceLoader.getLines ignores Byte Order Markers (BOMs) at the beginning of input files; these are often created by editors such as Windows Notepad. (rmuir, hossman)

* SOLR-1938: ElisionFilterFactory will use a default set of French contractions if you do not supply a custom articles file. (rmuir)

* SOLR-2003: SolrResourceLoader will report any encoding errors, rather than silently using replacement characters for invalid inputs (blargy via rmuir)

* SOLR-1804: Google collections updated to Google Guava (which is a superset of collections and contains bug fixes) (gsingers)

* SOLR-2034: Switch to JavaBin codec version 2. Strings are now serialized as the number of UTF-8 bytes, followed by the bytes in UTF-8. Previously Strings were serialized as the number of UTF-16 chars, followed by the bytes in Modified UTF-8. (hossman, yonik, rmuir)

* SOLR-2013: Add mapping-FoldToASCII.txt to example conf directory. (Steven Rowe via koji)

* SOLR-2213: Upgrade to jQuery 1.4.3 (Erick Erickson via ryan)

* SOLR-1826: Add unit tests for highlighting with termOffsets=true and overlapping tokens. (Stefan Oestreicher via rmuir)

* SOLR-2340: Add version info to message in JavaBinCodec when throwing exception.
(koji)

* SOLR-2350: Since Solr no longer requires XML files to be in UTF-8 (see SOLR-96) SimplePostTool (aka: post.jar) has been improved to work with files of any mime-type or charset. (hossman)

* SOLR-2365: Move DIH jars out of solr.war (David Smiley via yonik)

* SOLR-2381: Include a patched version of Jetty (6.1.26 + JETTY-1340) to fix problematic UTF-8 handling for supplementary characters. (Bernd Fehling, uschindler, yonik, rmuir)

* SOLR-2391: The preferred Content-Type for XML was changed to application/xml. XMLResponseWriter now only delivers using this type; updating documents and analyzing documents is still supported using text/xml as Content-Type, too. If you have clients that are hardcoded to expect text/xml as Content-Type, you have to change them. (uschindler, rmuir)

* SOLR-2414: All ResponseWriters now use only ServletOutputStreams and wrap their own Writer around it when serializing. This fixes the bug in PHPSerializedResponseWriter that produced wrong string length if the servlet container had a broken UTF-8 encoding that was in fact CESU-8 (see SOLR-1091). The system property to enable the CESU-8 byte counting in PHPSerializedResponseWriter for broken servlet containers was therefore removed and is now ignored if set. Output is always UTF-8. (uschindler, yonik, rmuir)

Build
----------------------

* SOLR-1522: Automated release signing process. (gsingers)

* SOLR-1891: Make lucene-jars-to-solr fail if copying any of the jars fails, and update clean to remove the jars in that directory (Mark Miller)

* LUCENE-2466: Commons-Codec was upgraded from 1.3 to 1.4. (rmuir)

* SOLR-2042: Fixed some Maven deps (Drew Farris via gsingers)

* LUCENE-2657: Switch from using Maven POM templates to full POMs when generating Maven artifacts (Steven Rowe)

Documentation
----------------------

* SOLR-1590: Javadoc for XMLWriter#startTag (Chris A.
Mattmann via hossman)

* SOLR-1792: Documented peculiar behavior of TestHarness.LocalRequestFactory (hossman)

================== Release 1.4.1 ==================
Release Date: See http://lucene.apache.org/solr for the official release date.

Upgrading from Solr 1.4
-----------------------

This is a bug fix release - no changes are required when upgrading from Solr 1.4. However, a reindex is needed for some of the analysis fixes to take effect.

Versions of Major Components
----------------------------
Apache Lucene 2.9.3
Apache Tika 0.4
Carrot2 3.1.0

Lucene Information
----------------

Since Solr is built on top of Lucene, many people add customizations to Solr that are dependent on Lucene. Please see http://lucene.apache.org/java/2_9_3/, especially http://lucene.apache.org/java/2_9_3/changes/Changes.html for more information on the version of Lucene used in Solr.

Bug Fixes
----------------------

* SOLR-1934: Upgrade to Apache Lucene 2.9.3 to obtain several bug fixes from the previous 2.9.1. See the Lucene 2.9.3 release notes for details. (hossman, Mark Miller)

* SOLR-1432: Make the new ValueSource.getValues(context,reader) delegate to the original ValueSource.getValues(reader) so custom sources will work. (yonik)

* SOLR-1572: FastLRUCache correctly implemented the LRU policy only for the first 2B accesses. (yonik)

* SOLR-1595: StreamingUpdateSolrServer used the platform default character set when streaming updates, rather than using UTF-8 as the HTTP headers indicated, leading to an encoding mismatch. (hossman, yonik)

* SOLR-1660: CapitalizationFilter crashes if you use the maxWordCountOption (Robert Muir via shalin)

* SOLR-1662: Added Javadocs in BufferedTokenStream and fixed incorrect cloning in TestBufferedTokenStream (Robert Muir, Uwe Schindler via shalin)

* SOLR-1711: SolrJ - StreamingUpdateSolrServer had a race condition that could halt the streaming of documents.
The original patch to fix this (never officially released) introduced another hanging bug due to connections not being released. (Attila Babo, Erik Hetzner via yonik)

* SOLR-1748, SOLR-1747, SOLR-1746, SOLR-1745, SOLR-1744: Streams and Readers retrieved from ContentStreams are not closed in various places, resulting in file descriptor leaks. (Christoff Brill, Mark Miller)

* SOLR-1580: Solr Configuration ignores 'mergeFactor' parameter, always uses Lucene default. (Lance Norskog via Mark Miller)

* SOLR-1777: fieldTypes with sortMissingLast=true or sortMissingFirst=true can result in incorrectly sorted results. (yonik)

* SOLR-1797: Fix ConcurrentModificationException and potential memory leaks in ResourceLoader. (yonik)

* SOLR-1798: Small memory leak (~100 bytes) in FastLRUCache for every commit. (yonik)

* SOLR-1522: Show proper message if