                     ----------------
                     Apache Tika 1.11
                     ----------------

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements.  See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License.  You may obtain a copy of the License at
~~
~~     http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

Apache Tika 1.11

   The most notable changes in Tika 1.11 over the previous release are:

   * Java 7 API support allowing java.nio.file.Path as a method argument
     was added to Tika, ParsingReader, TikaFileTypeDetector, and Tika Config
     ({{{http://issues.apache.org/jira/browse/TIKA-1745}TIKA-1745}},
     {{{http://issues.apache.org/jira/browse/TIKA-1746}TIKA-1746}},
     {{{http://issues.apache.org/jira/browse/TIKA-1751}TIKA-1751}}).

   * MIME support was added for WebVTT (Web Video Text Tracks Format) files
     ({{{http://issues.apache.org/jira/browse/TIKA-1772}TIKA-1772}}).

   * MIME magic was improved to ensure emails are detected as message/rfc822
     ({{{http://issues.apache.org/jira/browse/TIKA-1771}TIKA-1771}}).

   * Upgraded to Jackcess Encrypt 2.1.1 to avoid binary incompatibility
     with Bouncy Castle
     ({{{http://issues.apache.org/jira/browse/TIKA-1736}TIKA-1736}}).

   * Made div and other markup more consistent between PPT and PPTX
     ({{{http://issues.apache.org/jira/browse/TIKA-1755}TIKA-1755}}).

   * Multiple authors are now parsed from MSOffice's semicolon-delimited
     author field
     ({{{http://issues.apache.org/jira/browse/TIKA-1765}TIKA-1765}}).

   * CTAKESConfig.properties is now included within tika-parsers resources
     by default
     ({{{http://issues.apache.org/jira/browse/TIKA-1741}TIKA-1741}}).

   * Prevented infinite recursion when processing inline images in PDF files
     by limiting extraction of duplicate images within the same page
     ({{{http://issues.apache.org/jira/browse/TIKA-1742}TIKA-1742}}).

   * Upgraded to POI 3.13-final (via Andreas Beeker)
     ({{{http://issues.apache.org/jira/browse/TIKA-1707}TIKA-1707}}).

   * Upgraded tika-batch to use Path throughout
     ({{{http://issues.apache.org/jira/browse/TIKA-1747}TIKA-1747}} and
     {{{http://issues.apache.org/jira/browse/TIKA-1754}TIKA-1754}}).

   * Upgraded to Path in TikaInputStream (via Yaniv Kunda)
     ({{{http://issues.apache.org/jira/browse/TIKA-1744}TIKA-1744}}).

   * Changed the default content handler type for "/rmeta" in tika-server
     to "xml" to align with the "-J" option in tika-app. Clients can now
     specify handler types via PathParam
     ({{{http://issues.apache.org/jira/browse/TIKA-1716}TIKA-1716}}).

   * The fantastic GROBID (or Grobid), short for GeneRation Of BIbliographic
     Data, which uses machine learning to extract bibliographic data from
     PDF files, is now integrated as a Tika parser
     ({{{http://issues.apache.org/jira/browse/TIKA-1699}TIKA-1699}},
     {{{http://issues.apache.org/jira/browse/TIKA-1712}TIKA-1712}}).

   * The ability to specify the Tesseract Config Path was added to the
     OCR Parser
     ({{{http://issues.apache.org/jira/browse/TIKA-1703}TIKA-1703}}).

   * Upgraded to ASM 5.0.4
     ({{{http://issues.apache.org/jira/browse/TIKA-1705}TIKA-1705}}).

   * Corrected explicit loading of MimeTypes from a Tika Config XML
     detector definition
     ({{{http://issues.apache.org/jira/browse/TIKA-1708}TIKA-1708}}).

   * Tika Parsers, Batch, Server, App and Examples now use Apache Commons IO
     instead of inlined ex-Commons classes, and the Java 7 StandardCharsets
     definitions
     ({{{http://issues.apache.org/jira/browse/TIKA-1710}TIKA-1710}}).

   * Upgraded to Commons Compress 1.10, which enables support for
     zlib-compressed archives
     ({{{http://issues.apache.org/jira/browse/TIKA-1718}TIKA-1718}}).

   The following people have contributed to Tika 1.11 by submitting or
   commenting on the issues resolved in this release:

   * Alexander Widera

   * Bob Paulin

   * Chris A. Mattmann

   * Christian Wolfe

   * Jeremy B. Merrill

   * Jukka Zitting

   * Justin Palmer

   * Konstantin Gribov

   * Lewis John McGibbney

   * Nick Burch

   * Sujen Shah

   * Tim Allison

   * Yaniv Kunda

   See {{http://s.apache.org/fSj}} for more details on these contributions.
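
   As a rough illustration of the author-field change listed above
   (TIKA-1765), the sketch below splits a semicolon-delimited author string
   into individual names. It uses only the Java standard library; the class
   name AuthorFieldSketch and the method splitAuthors are hypothetical and
   are not taken from the Tika code base, whose actual implementation may
   differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Tika's implementation: shows one way a
// semicolon-delimited author field could be split into separate authors.
class AuthorFieldSketch {

    // Returns the trimmed, non-empty names found in the field.
    // A null field yields an empty list.
    static List<String> splitAuthors(String authorField) {
        List<String> authors = new ArrayList<>();
        if (authorField == null) {
            return authors;
        }
        for (String part : authorField.split(";")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                authors.add(name);
            }
        }
        return authors;
    }

    public static void main(String[] args) {
        // Prints: [Jane Doe, John Smith, A. N. Other]
        System.out.println(splitAuthors("Jane Doe; John Smith;  ; A. N. Other"));
    }
}
```

   Trimming each part and skipping empty ones keeps stray whitespace and
   trailing semicolons from producing blank author entries.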