                     ----------------
                     Apache Tika 2.5.0
                     ----------------

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements.  See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License.  You may obtain a copy of the License at
~~
~~     http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

Apache Tika 2.5.0

   The most notable changes in Tika 2.5.0 over the previous release are:

   * Improved extraction of PDF subset info for PDF/UA, PDF/VT, and PDF/X.
     NOTE: we no longer append PDF/A information, e.g. 'version="A-1b"', to
     the 'dc:format'. Users must now get that information from the
     'pdfa:PDFVersion' key or from 'pdfaid:conformance' and 'pdfaid:part'
     ({{{http://issues.apache.org/jira/browse/TIKA-3844}TIKA-3844}}).

   * Avoid infinite loop in bookmark extraction from PDFs
     ({{{http://issues.apache.org/jira/browse/TIKA-3832}TIKA-3832}}).

   * Update to slf4j 2.0.1
     ({{{http://issues.apache.org/jira/browse/TIKA-3842}TIKA-3842}}).

   * Added upsert option for the OpenSearch emitter
     ({{{http://issues.apache.org/jira/browse/TIKA-3855}TIKA-3855}}).

   * Extract PDF signature information at the document level into the metadata
     ({{{http://issues.apache.org/jira/browse/TIKA-3852}TIKA-3852}}).

   * Enable configuration of digests via AutoDetectParserConfig
     ({{{http://issues.apache.org/jira/browse/TIKA-3853}TIKA-3853}}).

   * Use commons-io byte array streams via PJ Fanning
     ({{{http://issues.apache.org/jira/browse/TIKA-3843}TIKA-3843}}).

   * Upgrade to PDFBox 2.0.27
     ({{{http://issues.apache.org/jira/browse/TIKA-3866}TIKA-3866}}).

   * Upgrade to JempBox 1.8.17
     ({{{http://issues.apache.org/jira/browse/TIKA-3856}TIKA-3856}}).

   * Add extraction of ODF version from ODF files
     ({{{http://issues.apache.org/jira/browse/TIKA-3840}TIKA-3840}}).

   * tika-parser-html-commons (BoilerPipeHandler) is no longer a dependency
     of tika-parser-html-module. tika-app and tika-server-standard have added
     a dependency on tika-parser-html-commons. However, users who manage
     custom dependencies and who want the BoilerPipeHandler will now have to
     include the tika-parser-html-commons dependency
     ({{{http://issues.apache.org/jira/browse/TIKA-1484}TIKA-1484}}).

   * Add unrar as an optional parser
     ({{{http://issues.apache.org/jira/browse/TIKA-3800}TIKA-3800}}).

   * Refactor FuzzingCLI to use PipesParser
     ({{{http://issues.apache.org/jira/browse/TIKA-3799}TIKA-3799}}).

   * ServiceLoader's loadServiceProviders() now guarantees unique classes
     ({{{http://issues.apache.org/jira/browse/TIKA-3797}TIKA-3797}}).

   * Fix bug that prevented setting of includeHeadersAndFooters for xls, xlsx,
     doc and docx via tika-config
     ({{{http://issues.apache.org/jira/browse/TIKA-3796}TIKA-3796}}).

   * Fix bug that prevented specification of rendered image type via http
     header in the PDFParser
     ({{{http://issues.apache.org/jira/browse/TIKA-3794}TIKA-3794}}).

   * Fix bug causing some Exif dates to be decoded incorrectly in time zones
     other than UTC
     ({{{http://issues.apache.org/jira/browse/TIKA-3815}TIKA-3815}}).

   * Numerous dependency upgrades
     ({{{http://issues.apache.org/jira/browse/TIKA-3795}TIKA-3795}}).
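
   The PDF/A note for TIKA-3844 in the list above means that code which used to
   parse 'version="A-1b"' out of 'dc:format' needs to read the dedicated keys
   instead. The following is a minimal sketch of such a lookup, not Tika's own
   implementation. It uses a plain Map as a stand-in for Tika's Metadata object
   so that it runs without Tika on the classpath. The class name
   PdfaVersionLookup and the method pdfaVersion are hypothetical, and the
   fallback that builds a label from 'pdfaid:part' and 'pdfaid:conformance' is
   an assumption modelled on the "A-1b" example in the note.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: a plain Map stands in for Tika's Metadata object.
final class PdfaVersionLookup {

    // Key names as given in the release note.
    static final String PDFA_VERSION = "pdfa:PDFVersion";
    static final String PDFAID_PART = "pdfaid:part";
    static final String PDFAID_CONFORMANCE = "pdfaid:conformance";

    // Returns the PDF/A label if the metadata carries one, else empty.
    static Optional<String> pdfaVersion(Map<String, String> metadata) {
        // Prefer the dedicated key when it is present.
        String version = metadata.get(PDFA_VERSION);
        if (version != null && !version.isEmpty()) {
            return Optional.of(version);
        }
        // Otherwise fall back to the part and conformance keys.
        String part = metadata.get(PDFAID_PART);
        String conformance = metadata.get(PDFAID_CONFORMANCE);
        if (part == null || conformance == null) {
            return Optional.empty();
        }
        // Assumption: the label has the shape "A-<part><conformance>",
        // with the conformance level lowercased, as in "A-1b".
        return Optional.of("A-" + part + conformance.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        // As of 2.5.0, dc:format no longer carries the PDF/A suffix.
        metadata.put("dc:format", "application/pdf");
        metadata.put(PDFAID_PART, "1");
        metadata.put(PDFAID_CONFORMANCE, "B");
        System.out.println(pdfaVersion(metadata).orElse("no PDF/A information"));
    }
}
```

   With a real Tika Metadata instance, the same lookups would go through
   metadata.get(String) using the key names from the note.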

   The following people have contributed to Tika 2.5.0 by submitting or
   commenting on the issues resolved in this release:

   * Aurélien Marocco

   * Ben Gilbert

   * Eduardas Kazakas

   * Eugen Caruntu

   * Giorgiana Ciobanu

   * Lakatos Gyula

   * Luís Filipe Nassif

   * Nicholas DiPiazza

   * PJ Fanning

   * Robin Schimpf

   * Tilman Hausherr

   * Tim Allison

   * Yurii

   See {{https://s.apache.org/j2sms}} for more details on these contributions.