----------------
                       Apache Tika 1.14
                       ----------------

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements.  See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License.  You may obtain a copy of the License at
~~
~~     http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

Apache Tika 1.14

        The most notable changes in Tika 1.14 over the previous release are:

	 * Extract all headers from MSG/RFC822 ({{{http://issues.apache.org/jira/browse/TIKA-2122}TIKA-2122}}).

	 * 9.1 ({{{http://issues.apache.org/jira/browse/TIKA-2113}TIKA-2113}}).

	 * Extract PDF DocInfo metadata into separate keys to prevent overwriting by XMP metadata ({{{http://issues.apache.org/jira/browse/TIKA-2057}TIKA-2057}}).

	 * Re-enable fileUrl for tika-server ({{{http://issues.apache.org/jira/browse/TIKA-2081}TIKA-2081}}). If you choose to use this feature, beware of the security vulnerabilities! See: {{https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-3271}}

	 * Add Tesseract's hOCR output format as an option, via Eric Pugh ({{{http://issues.apache.org/jira/browse/TIKA-2093}TIKA-2093}}).

	 * Extract macros from MSOffice files ({{{http://issues.apache.org/jira/browse/TIKA-2069}TIKA-2069}}).

	 * Maintain passed-in mime in TXTParser ({{{http://issues.apache.org/jira/browse/TIKA-2047}TIKA-2047}}).

	 * Upgrade to POI 3.15 ({{{http://issues.apache.org/jira/browse/TIKA-2013}TIKA-2013}}).

         * 0.3 ({{{http://issues.apache.org/jira/browse/TIKA-2051}TIKA-2051}}).

         * Fix hyperlinks with formatting in DOC and DOCX ({{{http://issues.apache.org/jira/browse/TIKA-1255}TIKA-1255}} and {{{http://issues.apache.org/jira/browse/TIKA-2078}TIKA-2078}}).

         * Tika is now integrated with Google's TensorFlow library and can use its Inception v3 image classification model to identify objects in images ({{{http://issues.apache.org/jira/browse/TIKA-1993}TIKA-1993}}).

         * Parser configuration is now type-safe, and parameters for parsers can have assigned types ({{{http://issues.apache.org/jira/browse/TIKA-1508}TIKA-1508}}, {{{http://issues.apache.org/jira/browse/TIKA-1986}TIKA-1986}}).

         * Prevent OOM/permanent hang on some corrupt CHM files ({{{http://issues.apache.org/jira/browse/TIKA-2040}TIKA-2040}}).

         * Upgrade ICU4J charset detection components to fix a multithreading bug ({{{http://issues.apache.org/jira/browse/TIKA-2041}TIKA-2041}}).

         * 1.4 ({{{http://issues.apache.org/jira/browse/TIKA-2039}TIKA-2039}}).

         * Maintain more significant digits in cells of "General" format in XLS and XLSX ({{{http://issues.apache.org/jira/browse/TIKA-2025}TIKA-2025}}).

	 * Avoid mark/reset issues when extracting or detecting embedded resources in RFC822 emails ({{{http://issues.apache.org/jira/browse/TIKA-2037}TIKA-2037}}).

	 * Improve the accuracy of Tesseract for better extraction of numeric and alphanumeric text from images ({{{http://issues.apache.org/jira/browse/TIKA-2021}TIKA-2021}}, {{{http://issues.apache.org/jira/browse/TIKA-2031}TIKA-2031}}).

         * Improve extraction of embedded documents from PPT, PPTX and XLSX ({{{http://issues.apache.org/jira/browse/TIKA-2026}TIKA-2026}}).

	 * Add parser for applefile (AppleSingle) ({{{http://issues.apache.org/jira/browse/TIKA-2022}TIKA-2022}}).

	 * Add mime types, mime magic and/or globs for:

	 ** Endnote Import File ({{{http://issues.apache.org/jira/browse/TIKA-2011}TIKA-2011}})

	 ** DJVU files ({{{http://issues.apache.org/jira/browse/TIKA-2009}TIKA-2009}})

	 ** MS Owner File ({{{http://issues.apache.org/jira/browse/TIKA-2008}TIKA-2008}})

	 ** Windows Media Metafile ({{{http://issues.apache.org/jira/browse/TIKA-2004}TIKA-2004}})

	 ** iCal and vCalendar ({{{http://issues.apache.org/jira/browse/TIKA-2006}TIKA-2006}})

	 ** MBOX ({{{http://issues.apache.org/jira/browse/TIKA-2042}TIKA-2042}})

	 ** Stata DTA ({{{http://issues.apache.org/jira/browse/TIKA-2064}TIKA-2064}})

	 * Add a configurable maximum threshold for the number of events extracted from the XMP Media Management Schema in JempboxExtractor ({{{http://issues.apache.org/jira/browse/TIKA-1999}TIKA-1999}}).

	 * Integrate TesseractOCR with full page image rendering for PDFs ({{{http://issues.apache.org/jira/browse/TIKA-1994}TIKA-1994}}).

	 * Add mime detection (via Nick C) and a parser for DBF files ({{{http://issues.apache.org/jira/browse/TIKA-1513}TIKA-1513}}).

	 * Add mime detection and parsers for MSOffice 2003 XML Word and Excel formats ({{{http://issues.apache.org/jira/browse/TIKA-1958}TIKA-1958}}).

	 * Extract hyperlinks from PPT, PPTX and XLSX ({{{http://issues.apache.org/jira/browse/TIKA-1454}TIKA-1454}}).


   The following people have contributed to Tika 1.14 by submitting or
   commenting on the issues resolved in this release:

        * Aeham Abushwashi
 
        * Alan Hunter

        * Alexander Kazakov

        * Chris A. Mattmann

        * Chris Knott

        * Egbert

        * Eli Trucco

        * Eric Pugh

        * Jean Coudon

        * Jeff Swindle

        * John Dougrez-Lewis

        * John Haynes

        * Joseph Naegele

        * Josh Cummings

        * Ken Krugler

        * Kukushkin Alexander

        * Lewis John McGibbney

        * Luis Filipe Nassif

        * Matthias Pigulla

        * Nam-Quang Tran

        * Nilay Chheda

        * Philipp Steinkrueger

        * Sara Miller

        * Sebastian Iturra

        * Thamme Gowda

        * Tilman Hausherr

        * Tim Allison

        * Tim Barrett

        * Vjeran Marcinko

        * Yahav Amsalem

        * Zarana Parekh

   See {{https://s.apache.org/TRWa}} for more details on these contributions.