----------------
                       Apache Tika 1.17
                       ----------------

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements.  See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License.  You may obtain a copy of the License at
~~
~~     http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

Apache Tika 1.17
          
         The most notable changes in Tika 1.17 over the previous release are:

         * This will be the last version that supports Java 7. The next version will require Java 8.

         * Fix thread-safety in ChmExtractor ({{{http://issues.apache.org/jira/browse/TIKA-2519}TIKA-2519}}).

         * Upgrade cxf to 3.0.16 ({{{http://issues.apache.org/jira/browse/TIKA-2516}TIKA-2516}}).

         * Allow users to configure maxMainMemoryBytes for PDFs via shrike (PR-213).

         * Extract underline and strikethrough in docx ({{{http://issues.apache.org/jira/browse/TIKA-2347}TIKA-2347}} and {{{http://issues.apache.org/jira/browse/TIKA-2512}TIKA-2512}}).

         * Cache TikaConfig in EmbeddedDocumentUtil for better performance in documents with a large number of attachments ({{{http://issues.apache.org/jira/browse/TIKA-2511}TIKA-2511}}).

         * Extract media files from ooxml ({{{http://issues.apache.org/jira/browse/TIKA-2510}TIKA-2510}}).

         * Standardize the way the image and video captioning Docker images and extraction work ({{{http://issues.apache.org/jira/browse/TIKA-2400}TIKA-2400}}, {{{http://github.com/apache/tika/pull/208}Github-208}}).

         * Upgrade to xmpcore 5.1.3 ({{{http://issues.apache.org/jira/browse/TIKA-2034}TIKA-2034}}).

         * Upgrade to metadata-extractor 2.10.1 ({{{http://issues.apache.org/jira/browse/TIKA-2486}TIKA-2486}}).

         * Upgrade to OpenNLP 1.8.3 ({{{http://issues.apache.org/jira/browse/TIKA-2502}TIKA-2502}}).

         * Upgrade to Jackson 2.9.2 ({{{http://issues.apache.org/jira/browse/TIKA-2501}TIKA-2501}}).

         * Catch potential NPE in getting InputStream for attachments in PST file ({{{http://issues.apache.org/jira/browse/TIKA-2488}TIKA-2488}}).

         * Upgrade to PDFBox 2.0.8 ({{{http://issues.apache.org/jira/browse/TIKA-2489}TIKA-2489}}).

         * Allow configuration of markLimit in EncodingDetectors via tika-config.xml ({{{http://issues.apache.org/jira/browse/TIKA-2485}TIKA-2485}}).

         * RFC822Parser now selects the best alternative for multipart/alternative body components.  This aligns with the behavior of the OutlookParser ({{{http://issues.apache.org/jira/browse/TIKA-2478}TIKA-2478}}).  Users can select legacy behavior via the "extractAllAlternatives" parameter in the RFC822 parser definition in tika-config.xml.

         * Narrow mime detection for ms-owner files and add detection for .nls files ({{{http://issues.apache.org/jira/browse/TIKA-2469}TIKA-2469}}).

         * Fix bug in CharsetDetector that led to different detected charsets depending on whether the user called setText with a byte[] or an InputStream, via Sean Story ({{{http://issues.apache.org/jira/browse/TIKA-2475}TIKA-2475}}).

         * Remove JAXB for easier use with Java 9, via Robert Munteanu ({{{http://issues.apache.org/jira/browse/TIKA-2466}TIKA-2466}}).

         * Upgrade to POI 3.17 ({{{http://issues.apache.org/jira/browse/TIKA-2429}TIKA-2429}}).

         * Enable extraction of standard references from text ({{{http://issues.apache.org/jira/browse/TIKA-2449}TIKA-2449}}).

         * Load external custom mimetypes XML from system property tika.custom-mimetypes ({{{http://issues.apache.org/jira/browse/TIKA-2460}TIKA-2460}}).

         * Extract number of tiffs in a multi-page tiff ({{{http://issues.apache.org/jira/browse/TIKA-2451}TIKA-2451}}).

         * Fix detection of emails extracted from mbox ({{{http://issues.apache.org/jira/browse/TIKA-2456}TIKA-2456}}).

         * Add OverrideDetector and allow PSTParser to specify body content type as text or html -- to avoid incorrect auto-detection of rfc/mbox, etc. ({{{http://issues.apache.org/jira/browse/TIKA-2454}TIKA-2454}}).

         * AutoDetectParser throws ZeroByteFileException for zero-byte files after detection on the file extension ({{{http://issues.apache.org/jira/browse/TIKA-2450}TIKA-2450}}).

         * Extract phonetic runs in docx with experimental SAX parser ({{{http://issues.apache.org/jira/browse/TIKA-2448}TIKA-2448}}).

         * Extract phonetic runs from xls and allow users to turn off extraction of phonetic runs in both xls and xlsx ({{{http://issues.apache.org/jira/browse/TIKA-2440}TIKA-2440}}).

         * OOXML locale should be set by POI's LocaleUtil not Locale.getDefault(). Fix unit tests to be robust against different locales in OOXML and ExcelParser ({{{http://issues.apache.org/jira/browse/TIKA-2438}TIKA-2438}}).

         * Tika now has support for automatic image captioning, which combines Computer Vision and Natural Language Processing to automatically generate a readable caption for an image ({{{http://issues.apache.org/jira/browse/TIKA-2262}TIKA-2262}}, {{{http://issues.apache.org/jira/browse/TIKA-2355}TIKA-2355}}, {{{http://issues.apache.org/jira/browse/TIKA-2402}TIKA-2402}}, Gh-198, Gh-196, Gh-189).

         * Add TestCorruptedFiles to allow devs to test parsers against corrupted input files ({{{http://issues.apache.org/jira/browse/TIKA-2430}TIKA-2430}}).

         * Correct the mime type definition for Windows batch files (CMD and BAT), which are the same format ({{{http://issues.apache.org/jira/browse/TIKA-2445}TIKA-2445}}).

         * PSDParser memory use improvements ({{{http://issues.apache.org/jira/browse/TIKA-2447}TIKA-2447}}).

         * Add underline extraction from Word documents (doc/docx) via Stuart Hendren, as well as strikethrough extraction in docx ({{{http://issues.apache.org/jira/browse/TIKA-2347}TIKA-2347}}, {{{http://github.com/apache/tika/pull/173}Github-173}}).


         The following people have contributed to Tika 1.17 by submitting or
            commenting on the issues resolved in this release:


         * Aashish Chaudhary

         * Abhijit Rajwade

         * Advokat

         * Albert L.
 
         * Alessandro De Angelis

         * Aman R Mathur

         * Ann Burgess

         * Bin Hawking

         * Bob Paulin
 
         * Chris A. Mattmann

         * Chris Bryant

         * Chris Wilson

         * Daniel Bonniot de Ruisselet

         * Dave Meikle

         * Dillon Welch

         * Dustin Spicuzza

         * Eamonn Saunders

         * frank

         * Giuseppe Totaro

         * Jan Burkhardt

         * jefferyyuan

         * Julian Reschke
  
         * Karl Buchta

         * Karl Richter

         * Ken Krugler

         * Konstantin Gribov

         * Lewis John McGibbney
 
         * Luis Filipe Nassif

         * Łukasz Ozimek

         * Madhav Sharan

         * Markus Jelsma

         * Matthew Caruana Galizia

         * Michael McCandless

         * Mike Cantrell

         * Nick Burch

         * Paul Ramirez
         
         * Peter Weiss

         * RameshKalidindi

         * Ravi

         * Ray Gauss II

         * Reinhard Schwab

         * Robert Letzler

         * Robert Munteanu

         * Roberto Benedetti

         * Rupert Westenthaler

         * Sam H

         * Sergey Beryozkin

         * Sergey Tsalkov

         * Stefano Fornari

         * Stuart Hendren

         * Takahiro Ochi

         * Thamme Gowda
 
         * Thejan Wijesinghe

         * Thomas Mortagne

         * Tilman Hausherr

         * Tim Allison

         * Tyler Palsulich

         * TzeKai Lee

         * Uwe Schindler

         * Yaniv Kunda

   See {{https://s.apache.org/bX5z}} for more details on these contributions.