                     ----------------
                     Apache Tika 2.3.0
                     ----------------

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements.  See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License.  You may obtain a copy of the License at
~~
~~     http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

Apache Tika 2.3.0

   The most notable changes in Tika 2.3.0 over the previous release are:

     * Upgrade to Apache POI 5.2.0. This is the first upgrade to POI 5.x and
       represents a major refactoring. Users will experience significantly
       more logging from the POI parsers
       ({{{http://issues.apache.org/jira/browse/TIKA-3164}TIKA-3164}}).

     * Upgrade to log4j2 2.17.1
       ({{{http://issues.apache.org/jira/browse/TIKA-3638}TIKA-3638}}).

     * Improve consistency in reporting package-entry divs across all parsers
       for embedded files
       ({{{http://issues.apache.org/jira/browse/TIKA-3644}TIKA-3644}}). This
       leads to some more text (embedded file names) in files with many
       embedded attachments.

     * Improve configuration of maps as params for parsers in TikaConfig
       ({{{http://issues.apache.org/jira/browse/TIKA-3645}TIKA-3645}}).

     * Improve identification of iWorks 13 files and add parsing for
       thumbnails, some metadata and attachments
       ({{{http://issues.apache.org/jira/browse/TIKA-3634}TIKA-3634}}). Skip
       handling of .iwa files, which are not yet supported.

     * Limit the default in-memory processing (maxMainMemoryBytes) in the
       PDFParser to 512MB, as in the 1.x branch
       ({{{http://issues.apache.org/jira/browse/TIKA-3642}TIKA-3642}}).

     * Add the IDML parser from the 1.x series to the 2.x series
       ({{{http://issues.apache.org/jira/browse/TIKA-3188}TIKA-3188}}).

     * Extract annotation types and subtypes for PDFs into metadata
       ({{{http://issues.apache.org/jira/browse/TIKA-3653}TIKA-3653}}).

     * Add metadata value for PDFs that contain 3D annotations
       ({{{http://issues.apache.org/jira/browse/TIKA-3653}TIKA-3653}}).

     * Add parser for Translation Memory eXchange (TMX) files
       ({{{http://issues.apache.org/jira/browse/TIKA-3660}TIKA-3660}}).

     * Add Bill of Materials (Maven BOM) for centralized module version
       management
       ({{{http://issues.apache.org/jira/browse/TIKA-3667}TIKA-3667}}).

   The following people have contributed to Tika 2.3.0 by submitting or
   commenting on the issues resolved in this release:

     * Bernhard Geisberger

     * Carina Antunes

     * Aman Mishra

     * Aravinth

     * Dave Meikle

     * Dmitrii Kriukov

     * Josh Burchard

     * Kaka Lee

     * Lewis John McGibbney

     * Sergen Bağ

     * Subhajit Das

     * Tim Allison

   See {{https://s.apache.org/syxl5}} for more details on these
   contributions.