                       ----------------
                       Apache Tika 1.19
                       ----------------

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements.  See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License.  You may obtain a copy of the License at
~~
~~     http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

Apache Tika 1.19

   The most notable changes in Tika 1.19 over the previous release are:

     * Require Java 8 ({{{http://issues.apache.org/jira/browse/TIKA-2679}TIKA-2679}}).

     * Enable building with Java 11 ({{{http://issues.apache.org/jira/browse/TIKA-2668}TIKA-2668}}).

     * Add an option to make tika-server robust against infinite loops, OOMs,
       and memory leaks ({{{http://issues.apache.org/jira/browse/TIKA-2725}TIKA-2725}}).

     * Allow configuration of the Tesseract parser via the standard
       tika-config.xml options ({{{http://issues.apache.org/jira/browse/TIKA-2705}TIKA-2705}}).

     * Improve handling of empty cells across table-based formats
       ({{{http://issues.apache.org/jira/browse/TIKA-2479}TIKA-2479}}).

     * Add a standards-compliant HTML encoding detector, contributed by
       Gerard Bouchar ({{{http://issues.apache.org/jira/browse/TIKA-2673}TIKA-2673}}).

     * Improve XML parsing: entity expansions are now limited to 20 by default.
       To raise this limit, add -Djdk.xml.entityExpansionLimit=XXX to your
       command line.

     * Mime magic improvements for Olympus RAW
       ({{{http://issues.apache.org/jira/browse/TIKA-2658}TIKA-2658}}),
       interpreted server-side languages via HTTP
       ({{{http://issues.apache.org/jira/browse/TIKA-2648}TIKA-2648}}), and
       MHTML ({{{http://issues.apache.org/jira/browse/TIKA-2723}TIKA-2723}}).

     * Add absolute timeout to ForkParser rather than testing for active
       ({{{http://issues.apache.org/jira/browse/TIKA-2656}TIKA-2656}}).

     * Make the RecursiveParserWrapper work with the ForkParser
       ({{{http://issues.apache.org/jira/browse/TIKA-2655}TIKA-2655}}).

     * Allow the ForkParser to specify a directory containing tika-app.jar
       for use by the ForkServer. This lets users keep most of the parser
       dependencies out of their own code, and it makes it easy to add
       optional jars for Parser dependencies, such as the xerial sqlite jar
       ({{{http://issues.apache.org/jira/browse/TIKA-2653}TIKA-2653}}).

     * Use a pool for SAXParsers and DOMBuilders rather than creating a new
       parser/builder for every parse. For better performance, call
       XMLReaderUtils.setPoolSize() with the number of threads you're using
       with Tika ({{{http://issues.apache.org/jira/browse/TIKA-2645}TIKA-2645}}).

     * Add the RecursiveParserWrapperHandler to improve the
       RecursiveParserWrapper API slightly
       ({{{http://issues.apache.org/jira/browse/TIKA-2644}TIKA-2644}}).

     * Upgraded to Commons-Compress 1.18
       ({{{http://issues.apache.org/jira/browse/TIKA-2707}TIKA-2707}}).

     * Upgraded to Apache POI 4.0.0
       ({{{http://issues.apache.org/jira/browse/TIKA-2552}TIKA-2552}}).

     * Upgraded to Apache PDFBox 2.0.11
       ({{{http://issues.apache.org/jira/browse/TIKA-2681}TIKA-2681}}).

     * Upgraded to deeplearning4j 1.0.0-beta2
       ({{{http://issues.apache.org/jira/browse/TIKA-2672}TIKA-2672}}).

     * Upgraded jmatio to 1.4
       ({{{http://issues.apache.org/jira/browse/TIKA-2667}TIKA-2667}}).

     * Upgraded Apache Lucene to 7.4.0 in tika-eval and tika-examples
       ({{{http://issues.apache.org/jira/browse/TIKA-2695}TIKA-2695}}).

     * Upgraded junrar to 1.0.1
       ({{{http://issues.apache.org/jira/browse/TIKA-2664}TIKA-2664}}).

     * Numerous other upgrades
       ({{{http://issues.apache.org/jira/browse/TIKA-2692}TIKA-2692}}).

     * Excluded Spring as a transitive dependency (TIKA-2721).

   The following people have contributed to Tika 1.19 by submitting or
   commenting on the issues resolved in this release:

     * Abhijit Rajwade

     * Adam Rauch

     * Andreas Meier

     * Annie Didier

     * Celpan Valeria

     * Chris A. Mattmann

     * Gerard Bouchar

     * Hans Brende

     * Karanjeet Singh

     * Karl Wright

     * Ken Krugler

     * Konstantin Gribov

     * Lewis John McGibbney

     * Sebastian Nagel

     * Slava G

     * Thorsten Schäfer

     * Tim Allison

     * Vincent van Donselaar

     * Yuriy Koval

     * Yury Kats

   See {{https://s.apache.org/dG8B}} for more details on these contributions.
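
   The entity expansion limit mentioned among the notable changes is governed
   by the JDK system property jdk.xml.entityExpansionLimit. The following is a
   minimal sketch of how that property behaves, using only the JDK's built-in
   JAXP parser rather than Tika itself. The class EntityLimitDemo and its
   helper methods are hypothetical names introduced for illustration, and the
   sketch assumes that SAXParserFactory.newInstance() resolves to the JDK's
   own implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Tika.
public class EntityLimitDemo {

    // Builds a small document that references one internal entity `refs` times.
    static String xmlWithEntityRefs(int refs) {
        StringBuilder sb = new StringBuilder("<!DOCTYPE r [<!ENTITY a \"x\">]><r>");
        for (int i = 0; i < refs; i++) {
            sb.append("&a;");
        }
        return sb.append("</r>").toString();
    }

    // Returns true if a document with `refs` entity references parses while
    // jdk.xml.entityExpansionLimit is set to `limit`. Setting the property in
    // code stands in for passing -Djdk.xml.entityExpansionLimit=<limit>.
    static boolean parsesWithLimit(String limit, int refs) {
        String key = "jdk.xml.entityExpansionLimit";
        String previous = System.getProperty(key);
        System.setProperty(key, limit);
        try {
            // Create the parser after setting the property so it is picked up.
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            byte[] bytes = xmlWithEntityRefs(refs).getBytes(StandardCharsets.UTF_8);
            parser.parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // The JDK parser reports an exceeded limit as a fatal parse error.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        } finally {
            // Restore the previous value so other code is unaffected.
            if (previous == null) {
                System.clearProperty(key);
            } else {
                System.setProperty(key, previous);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("limit 20, 5 refs:   " + parsesWithLimit("20", 5));
        System.out.println("limit 20, 50 refs:  " + parsesWithLimit("20", 50));
        System.out.println("limit 100, 50 refs: " + parsesWithLimit("100", 50));
    }
}
```

   In a deployment the limit would normally be raised on the command line, as
   described above, rather than in code. The sketch sets the property
   programmatically only to keep the example self-contained.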