                       ----------------
                       Apache Tika 1.15
                       ----------------

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements.  See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License.  You may obtain a copy of the License at
~~
~~     http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

Apache Tika 1.15

	 The most notable changes in Tika 1.15 over the previous release are:

	 * Tika now has a module for Deep Learning powered by the DL4J toolkit. The initial included model is for InceptionV3, so with this module Tika can use deep learning, natively in Java, for metadata/text extraction from images using the power of the Inception model ({{{http://github.com/apache/tika/pull/165}Github-165}}).

	 * A new parser for sentiment analysis using a categorical (multi-class: angry, sad, neutral, like, love) and a binary (positive/negative) model was added, leveraging the USC Data Science work ({{{http://issues.apache.org/jira/browse/TIKA-2016}TIKA-2016}}).

	 * Tika now has the ability to automatically detect objects in videos, using OpenCV and Tensorflow ({{{http://issues.apache.org/jira/browse/TIKA-2322}TIKA-2322}}).

	 * Change default behavior to parse embedded documents even if the user forgets to specify a Parser.class in the ParseContext ({{{http://issues.apache.org/jira/browse/TIKA-2096}TIKA-2096}}). Users who wish to parse only the container document should set an EmptyParser as the Parser.class in the ParseContext.

	 * Change default behavior of Office Parsers to _not_ extract macros.  To extract macros, users need to call setExtractMacros(true) ({{{http://issues.apache.org/jira/browse/TIKA-2302}TIKA-2302}}).

	 * Added tika-eval module ({{{http://issues.apache.org/jira/browse/TIKA-1332}TIKA-1332}}).

	 * Unified logging across Tika: SLF4J as the logging API, Apache Log4j as the implementation, with JCL and JUL bridges in standalone tools like tika-app, tika-batch and tika-server ({{{http://issues.apache.org/jira/browse/TIKA-2245}TIKA-2245}}).

	 * Add parser for XLSB files ({{{http://issues.apache.org/jira/browse/TIKA-1195}TIKA-1195}}).

	 * Add parsers for EMF/WMF files ({{{http://issues.apache.org/jira/browse/TIKA-2246}TIKA-2246}}/{{{http://issues.apache.org/jira/browse/TIKA-2247}TIKA-2247}}).

	 * Add parsers for WordPerfect and QuattroPro (.qpw) files. Contributed by Pascal Essiembre ({{{http://issues.apache.org/jira/browse/TIKA-1946}TIKA-1946}} and {{{http://issues.apache.org/jira/browse/TIKA-2228}TIKA-2228}}).

	 * Add experimental SAX parser for .pptx files. To select this parser, set useSAXPptxExtractor(true) on OfficeParserConfig ({{{http://issues.apache.org/jira/browse/TIKA-2210}TIKA-2210}}).

	 * Add experimental SAX parser for .docx files. To select this parser, set useSAXDocxExtractor(true) on OfficeParserConfig ({{{http://issues.apache.org/jira/browse/TIKA-1321}TIKA-1321}}, {{{http://issues.apache.org/jira/browse/TIKA-2191}TIKA-2191}}).

	 * Add mime detection and parser for Word 2006ML format ({{{http://issues.apache.org/jira/browse/TIKA-2179}TIKA-2179}}).

	 * Bug fix for WordPerfect via Pascal Essiembre ({{{http://issues.apache.org/jira/browse/TIKA-2352}TIKA-2352}}).

	 * Added "text-main" equivalent option to tika-server via /tika/main ({{{http://issues.apache.org/jira/browse/TIKA-2343}TIKA-2343}}).

	 * Enabled configuration of the EncodingDetector used by parsers that extend AbstractEncodingDetectorParser ({{{http://issues.apache.org/jira/browse/TIKA-2273}TIKA-2273}}).

	 * Prevent easily avoidable OOMs during both detection and parsing of some compression formats ({{{http://issues.apache.org/jira/browse/TIKA-2330}TIKA-2330}}).

	 * Extract images and thumbnails from ODT via Sam Bayer ({{{http://issues.apache.org/jira/browse/TIKA-2295}TIKA-2295}}).

	 * Fix potential NPE in FeedParser via Julien Nioche ({{{http://issues.apache.org/jira/browse/TIKA-2269}TIKA-2269}}).

	 * Official mime types for BMP, EMF and WMF have been registered with IANA, so switch to these (image/bmp, image/emf, image/wmf) ({{{http://issues.apache.org/jira/browse/TIKA-2250}TIKA-2250}}).

	 * Be more parsimonious with BufferedInputStreams via Josh Hight ({{{http://issues.apache.org/jira/browse/TIKA-2244}TIKA-2244}}).

	 * Enable handling of hyphenated language codes in TesseractOCRParser via Graham Russell ({{{http://issues.apache.org/jira/browse/TIKA-2231}TIKA-2231}}).

	 * Improve style tags in ODT ({{{http://issues.apache.org/jira/browse/TIKA-2242}TIKA-2242}}).

	 * Add container detection for embedded MSEquation files ({{{http://issues.apache.org/jira/browse/TIKA-2238}TIKA-2238}}).

	 * Add parsing of JBIG2 and extraction of JBIG2 from PDFs when required dependencies are added to class path by user. Contributed by Pascal Essiembre ({{{http://issues.apache.org/jira/browse/TIKA-2232}TIKA-2232}}).

	 * Add mime magic for the OneNote family (.one / .onetoc / .onepkg); no parser is included ({{{http://issues.apache.org/jira/browse/TIKA-2224}TIKA-2224}}).

	 * Add configurability of "preserve-interword-spacing" to TesseractOCRParser ({{{http://issues.apache.org/jira/browse/TIKA-2190}TIKA-2190}}).

	 * Upgrade PDFBox to 2.0.6 and JempBox 1.8.13 ({{{http://issues.apache.org/jira/browse/TIKA-2361}TIKA-2361}}).

	 * Refactor MockParser to consolidate service loading and mime types into tika-core/src/test ({{{http://issues.apache.org/jira/browse/TIKA-2195}TIKA-2195}}).

	 * Enabled extraction of embedded objects from headers, footers, footnotes, endnotes and comments in legacy .docx parser ({{{http://issues.apache.org/jira/browse/TIKA-2192}TIKA-2192}}).

	 * Allow extraction of PDActions (including JavaScript) from PDFs ({{{http://issues.apache.org/jira/browse/TIKA-2090}TIKA-2090}}).  This is turned off by default.  Users must call setExtractActions(true) on the PDFParserConfig.

	 * Change default behavior in experimental .docx parser to ignore deleted text to align with .doc ({{{http://issues.apache.org/jira/browse/TIKA-2187}TIKA-2187}}).

	 * Upgrade to Apache POI 3.16 ({{{http://issues.apache.org/jira/browse/TIKA-2116}TIKA-2116}}, {{{http://issues.apache.org/jira/browse/TIKA-2181}TIKA-2181}}, {{{http://issues.apache.org/jira/browse/TIKA-2329}TIKA-2329}}).

	 * Allow configuration of timeout for ForkParser ({{{http://issues.apache.org/jira/browse/TIKA-2170}TIKA-2170}}).

	 * Add extraction of .jpx inline images from PDFs when required dependencies are added by user to class path ({{{http://issues.apache.org/jira/browse/TIKA-2175}TIKA-2175}}).

	 * Add .jpx, .jp2, .ppm to formats handled by Tesseract ({{{http://issues.apache.org/jira/browse/TIKA-2174}TIKA-2174}}).

	 * Upgrade "provided" Sqlite to 3.16.1 ({{{http://issues.apache.org/jira/browse/TIKA-2334}TIKA-2334}}).

	 * Upgrade CXF version to 3.0.12 ({{{http://issues.apache.org/jira/browse/TIKA-2292}TIKA-2292}}).

	 * Add Lingo24 Language Detector ({{{http://issues.apache.org/jira/browse/TIKA-2297}TIKA-2297}}).

	 * Further mime magic for WebVTT ({{{http://issues.apache.org/jira/browse/TIKA-1772}TIKA-1772}}).

	 * Extend support for increased PSM options up to 13 for modern versions of Tesseract ({{{http://issues.apache.org/jira/browse/TIKA-2357}TIKA-2357}}).
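
   As an illustration of the /tika/main option listed above, the following is a
   minimal sketch of a client that sends a document to tika-server and prints
   the returned main-content text. It is not taken from the Tika code base. The
   class name TikaMainTextClient, the use of an HTTP PUT with a text/plain
   response, and the server address http://localhost:9998 are assumptions made
   for the example; check them against your tika-server deployment. The sketch
   needs Java 11 or later for java.net.http.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; the name is not part of Tika.
class TikaMainTextClient {

    // Assumption: tika-server is running locally on this address.
    static final String DEFAULT_SERVER = "http://localhost:9998";

    // Builds a PUT request that carries the raw document bytes to /tika/main.
    // Assumption: the endpoint accepts PUT and answers with plain text.
    public static HttpRequest buildRequest(String serverBase, byte[] document) {
        return HttpRequest.newBuilder()
                .uri(URI.create(serverBase + "/tika/main"))
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    // Sends the request and returns the response body, failing on non-200 status.
    public static String extractMainText(String serverBase, byte[] document)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(serverBase, document),
                        HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("tika-server returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java TikaMainTextClient <file>");
            return;
        }
        byte[] document = Files.readAllBytes(Path.of(args[0]));
        System.out.println(extractMainText(DEFAULT_SERVER, document));
    }
}
```

   With a tika-server instance already running, the sketch could be run as
   "java TikaMainTextClient.java page.html", where page.html stands for any
   local file.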


   The following people have contributed to Tika 1.15 by submitting or
   commenting on the issues resolved in this release:

        * Adam Carroll

        * Aeham Abushwashi

        * Anastasija Mensikova

        * Bipul Kumar

        * Chris A. Mattmann

        * Dave Meikle

        * David Pilato

        * Fabio

        * Frederic Ronny

        * Jan Van Raemdonck

        * Jasper Hafkenscheid

        * Jorge Spinsanti

        * Joshua Hight

        * Julian

        * Julien Nioche

        * Ken Krugler

        * Kevin Oberlag

        * Konstantin Gribov

        * Laszlo Marai

        * Lewis John McGibbney

        * Luis Filipe Nassif

        * Madhav Sharan

        * Matthew Caruana Galizia

        * Michal Hlavac

        * Mike Liu

        * Nick Burch

        * Nick C

        * Nino Skopac

        * Panagiotis Mpailis

        * Pascal Essiembre

        * Peter Weiss

        * Robin Schimpf

        * Sean Story

        * senthil

        * Sergey Beryozkin

        * Seva Alekseyev

        * Thamme Gowda

        * Thomas Galla

        * Tim Allison

        * Tim Kingsbury

   See {{https://s.apache.org/XowY}} for more details on these contributions.