----------------
                     Apache Tika 1.21
                     ----------------

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements.  See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License.  You may obtain a copy of the License at
~~
~~     http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.


Apache Tika 1.21

	 The most notable changes in Tika 1.21 over the previous release are:

	 * Add optional AUTO mode to OCR'ing of PDFs.  If tesseract is installed
	   and on the path, and this option is selected programmatically
	   or via TikaConfig(), the PDFParser will use heuristics to decide
	   whether or not to run OCR per page on PDFs ({{{http://issues.apache.org/jira/browse/TIKA-2749}TIKA-2749}}).

	 * The ZipContainerDetector's default behavior was changed to run
	   streaming detection up to its markLimit.  Users can get the
	   legacy behavior (spool-to-file/rely-on-underlying-file-in-TikaInputStream)
	   by setting markLimit to -1 ({{{http://issues.apache.org/jira/browse/TIKA-2849}TIKA-2849}}).

	 * The POIFSContainerDetector requires an underlying file, so it will try to
	   spool the file to disk.  If the file's length is greater than markLimit,
	   it will not attempt detection.  Set markLimit to -1 for legacy behavior
	   ({{{http://issues.apache.org/jira/browse/TIKA-2849}TIKA-2849}}).

	 * Upgrade PDFBox to 2.0.14 ({{{http://issues.apache.org/jira/browse/TIKA-2834}TIKA-2834}}).

	 * Add CSV detection and replace TXTParser with TextAndCSVParser;
	   users can turn off CSV detection by excluding the TextAndCSVParser
	   and adding back the TXTParser via tika-config ({{{http://issues.apache.org/jira/browse/TIKA-2833}TIKA-2833}}).

	 * Add a CSVParser.  CSV detection is currently based solely on filename
	   and/or information conveyed via Metadata ({{{http://issues.apache.org/jira/browse/TIKA-2826}TIKA-2826}}).

	 * General upgrades: asm, bouncycastle, commons-codec, commons-lang3, cxf,
	   guava, h2, httpcomponents, jackcess, junrar, Lucene, mime4j, opennlp,
	   parso, sqlite-jdbc (provided), zstd-jni (provided) ({{{http://issues.apache.org/jira/browse/TIKA-2824}TIKA-2824}}).

	 * Bundle xerces2 with tika-parsers ({{{http://issues.apache.org/jira/browse/TIKA-2802}TIKA-2802}}).

	 * Upgrade jaxb to 2.3.2 ({{{http://issues.apache.org/jira/browse/TIKA-2819}TIKA-2819}}).

	 * Upgrade jackson to 2.9.8 ({{{http://issues.apache.org/jira/browse/TIKA-2717}TIKA-2717}}).

	 * Update tika-eval's common tokens lists ({{{http://issues.apache.org/jira/browse/TIKA-2822}TIKA-2822}}).

	 * Handle bad tags in tika-eval more robustly ({{{http://issues.apache.org/jira/browse/TIKA-2810}TIKA-2810}}).

	 * Add reports for tags in tika-eval ({{{http://issues.apache.org/jira/browse/TIKA-2809}TIKA-2809}}).

	 * Extract text from SDT element within textboxes in .docx files ({{{http://issues.apache.org/jira/browse/TIKA-2807}TIKA-2807}}).
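
	 To illustrate what "streaming detection up to its markLimit" means for the
	 container detectors listed above, the following is a minimal sketch that
	 uses only the JDK.  It is not Tika's implementation: the class name
	 MarkLimitSketch, the helper Bounded, and the methods peekEntryNames and
	 sampleZip are hypothetical names chosen for this example.  The sketch marks
	 the stream, reads ZIP entry names without going past a byte budget, and then
	 rewinds the stream so a parser could still read it from the start.  It does
	 not model the legacy path selected with markLimit set to -1.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class MarkLimitSketch {

    /** Hypothetical helper: reports end-of-stream once the byte budget is used up. */
    private static final class Bounded extends FilterInputStream {
        private long remaining;

        Bounded(InputStream in, long limit) {
            super(in);
            this.remaining = limit;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) {
                return -1;
            }
            int b = super.read();
            if (b >= 0) {
                remaining--;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (remaining <= 0) {
                return -1;
            }
            int n = super.read(buf, off, (int) Math.min(len, remaining));
            if (n > 0) {
                remaining -= n;
            }
            return n;
        }
    }

    /** Lists the ZIP entry names visible within markLimit bytes, then rewinds the stream. */
    public static List<String> peekEntryNames(InputStream in, int markLimit) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        List<String> names = new ArrayList<>();
        in.mark(markLimit);
        try {
            // Deliberately not closed: closing would also close the caller's stream.
            ZipInputStream zip = new ZipInputStream(new Bounded(in, markLimit));
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException stop) {
            // Cut off by the budget, or not a ZIP at all: keep what was seen so far.
        }
        try {
            in.reset();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Builds a small in-memory ZIP for the demonstration; each entry holds its own name. */
    public static byte[] sampleZip(String... entryNames) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (String name : entryNames) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(name.getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] zip = sampleZip("a.txt", "b.txt");
        System.out.println(peekEntryNames(new ByteArrayInputStream(zip), zip.length));
        System.out.println(peekEntryNames(new ByteArrayInputStream(zip), 8));
    }
}
```

	 In this sketch a generous budget sees every entry, while a budget smaller
	 than the first local file header sees none.  That is the trade-off a mark
	 limit makes: bounded memory use in exchange for possibly less information
	 available for detection.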


     The following people have contributed to Tika 1.21 by submitting or
           commenting on the issues resolved in this release:

        * Anssi Törmä

        * Boris Petrov

        * Claudia Mickiewicz

        * Edans Sandes

        * Filip

        * Hans Brende

        * Karl Wright

        * Konstantin Gribov

        * Luis Filipe Nassif

        * Maxim Solodovnik

        * Pat cashman

        * Robert Munteanu

        * Serban Alexe

        * Tim Allison

        * chandra

   See {{https://s.apache.org/IThN}} for more details on these contributions.