                       ----------------
                       Apache Tika 1.23
                       ----------------

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements.  See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License.  You may obtain a copy of the License at
~~
~~     http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

Apache Tika 1.23

   The most notable changes in Tika 1.23 over the previous release are:

     * NOTE: The PDFParser now relies on OCRDPI to render page images
       when users configure OCR on rendered page images. This will have
       the effect of increasing rendered image size
       ({{{http://issues.apache.org/jira/browse/TIKA-2624}TIKA-2624}}).

     * NOTE: tika-server no longer returns 415 for file types
       for which there is no parser.

     * NOTE: tika-server's /rmeta endpoint now returns 200 if there is
       a parse exception to align its behavior with tika-app in batch mode.
       The stacktrace is stored as a metadata value.

     * Fix bug in AUTO OCR strategy in the PDFParser
       ({{{http://issues.apache.org/jira/browse/TIKA-3002}TIKA-3002}}).

     * Fix incorrect height and width metadata extraction from JPEG images
       ({{{http://issues.apache.org/jira/browse/TIKA-2630}TIKA-2630}}).

     * Upgrade to POI 4.1.1
       ({{{http://issues.apache.org/jira/browse/TIKA-2851}TIKA-2851}}).

     * Upgrade to PDFBox 2.0.17
       ({{{http://issues.apache.org/jira/browse/TIKA-2951}TIKA-2951}}).

     * Ensure that the PDFParser respects custom configuration of Tesseract
       from tika-config.xml via Eric Pugh
       ({{{http://issues.apache.org/jira/browse/TIKA-2970}TIKA-2970}}).

     * Add parser for XLIFF v1.2 files
       ({{{http://issues.apache.org/jira/browse/TIKA-2975}TIKA-2975}}).

     * Add mime type detection support for WebAssembly
       ({{{http://issues.apache.org/jira/browse/TIKA-2894}TIKA-2894}}),
       HEIF / HEIC images
       ({{{http://issues.apache.org/jira/browse/TIKA-2942}TIKA-2942}}),
       Digilite FDF
       ({{{http://issues.apache.org/jira/browse/TIKA-2988}TIKA-2988}});
       and xml-root detection for XFDF
       ({{{http://issues.apache.org/jira/browse/TIKA-2990}TIKA-2990}})
       and XDP
       ({{{http://issues.apache.org/jira/browse/TIKA-2989}TIKA-2989}}).

     * Add an XLZ Parser
       ({{{http://issues.apache.org/jira/browse/TIKA-2976}TIKA-2976}}).

   The following people have contributed to Tika 1.23 by submitting or
   commenting on the issues resolved in this release:

     * Christian Ribeaud

     * Chris Z

     * Dan Becker

     * Dave Meikle

     * David Eric Pugh

     * Ewan Mellor

     * Felix Sonntag

     * Feng Jiao Jiang

     * Fredrik Söderström

     * Kim Ju Young

     * Kyle DuPont

     * Luís Filipe Nassif

     * Luke Butters

     * Pascal Essiembre

     * Peng Cheng

     * Roman Ivanov

     * Sergey Beryozkin

     * Tilman Hausherr

     * Tim Allison

     * Yahav Amsalem

   See {{https://s.apache.org/asrx3}} for more details on these contributions.