                     ----------------
                     Apache Tika 2.1.0
                     ----------------

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements.  See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License.  You may obtain a copy of the License at
~~
~~     http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

Apache Tika 2.1.0

  The most notable changes in Tika 2.1.0 over the previous release are:

    * Improved packaging for tika-parsers-extended. Use the
      tika-parser-scientific-package and tika-parser-sqlite3-package
      artifacts if you want fat jars with dependencies
      ({{{http://issues.apache.org/jira/browse/TIKA-3510}TIKA-3510}}).

    * Tika app writes UTF-8 when an encoding is not specified; the legacy
      behavior was UTF-8 on Mac OS, but System default on other OSs
      ({{{http://issues.apache.org/jira/browse/TIKA-3515}TIKA-3515}}).

    * Change the default rendering strategy for PDFs from NO_TEXT to ALL
      ({{{http://issues.apache.org/jira/browse/TIKA-3520}TIKA-3520}}).

  Other changes:

    * Fixed bug that pointed to the wrong tessdata directory if the user
      specified a tesseract path but not also a tessdata path
      ({{{http://issues.apache.org/jira/browse/TIKA-3518}TIKA-3518}}).

    * Fixed bug in Icu4j's encoding detector where it would return
      non-standard names for charsets, e.g. IBM424_rtl is now returned
      as IBM424
      ({{{http://issues.apache.org/jira/browse/TIKA-3516}TIKA-3516}}).

    * Add a simple UrlFetcher in tika-core as a basic alternative to
      tika-fetcher-http
      ({{{http://issues.apache.org/jira/browse/TIKA-3527}TIKA-3527}}).

    * Add tika-pipes support for Google Cloud Storage
      ({{{http://issues.apache.org/jira/browse/TIKA-3524}TIKA-3524}}).

    * Fix markup ordering errors in xhtml output for ODT files
      ({{{http://issues.apache.org/jira/browse/TIKA-2242}TIKA-2242}}).

    * Fix serialization of embedded docs in OpenSearch emitter and fix
      embedded documents not being indexed in some use-cases in the
      Solr emitter
      ({{{http://issues.apache.org/jira/browse/TIKA-3490}TIKA-3490}}).

    * Add pipesClientId system property to PipesServer so that each
      forked process can log to its own logger
      ({{{http://issues.apache.org/jira/browse/TIKA-3480}TIKA-3480}}).

    * Add DateNormalizingMetadataFilter to let users ensure that all
      dates emitted to Solr/OpenSearch are in UTC. Users can configure
      which timezone they'd like to use in cases where the file format
      does not store a timezone
      ({{{http://issues.apache.org/jira/browse/TIKA-3496}TIKA-3496}}).

  The following people have contributed to Tika 2.1.0 by submitting or
  commenting on the issues resolved in this release:

    * Aashish Chaudhary

    * Abha

    * Albert L.

    * Alessandro De Angelis

    * Ann Burgess

    * Bin Hawking

    * Chaitra Rajappa

    * Chris A. Mattmann

    * Chris Bryant

    * Daniel Bonniot de Ruisselet

    * Dave Meikle

    * David Eric Pugh

    * frank

    * Graham Charters

    * jefferyyuan

    * Jukka Zitting

    * Julian Reschke

    * Kenneth William Krugler

    * Konstantin Gribov

    * Lewis John McGibbney

    * Luís Filipe Nassif

    * Łukasz Ozimek

    * Madhav Sharan

    * Markus Jelsma

    * Michael McCandless

    * Nick Burch

    * Paul Ramirez

    * Peter Kronenberg

    * RameshKalidindi

    * Ravi

    * Ray Gauss II

    * Reinhard Pötz

    * Roberto Benedetti

    * Rupert Westenthaler

    * Sam H

    * Sebastian Nagel

    * Sergey Beryozkin

    * Shubhangi Raut

    * Thomas Mortagne

    * Tilman Hausherr

    * Tim Allison

    * Tyler Bui-Palsulich

    * Uwe Schindler

    * Yaniv Kunda

  See {{https://s.apache.org/h8ik6}} for more details on these contributions.