                     ---------------
                     Apache Tika 1.9
                     ---------------

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements.  See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License.  You may obtain a copy of the License at
~~
~~     http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

Apache Tika 1.9

   The most notable changes in Tika 1.9 over the previous release are:

   * The cTAKES clinical text knowledge extraction system for biomedical
     data can now be used as a Tika parser
     ({{{http://issues.apache.org/jira/browse/TIKA-1645}TIKA-1645}},
     {{{http://issues.apache.org/jira/browse/TIKA-1642}TIKA-1642}}).

   * Tika-server allows a user to specify the Tika config from the command line
     ({{{http://issues.apache.org/jira/browse/TIKA-1652}TIKA-1652}},
     {{{http://issues.apache.org/jira/browse/TIKA-1426}TIKA-1426}}).

   * Matlab file detection has been improved
     ({{{http://issues.apache.org/jira/browse/TIKA-1634}TIKA-1634}}).

   * The EXIFTool was added as an External parser
     ({{{http://issues.apache.org/jira/browse/TIKA-1639}TIKA-1639}}).

   * If FFMPEG is installed and on the PATH, it can now be used as a
     Parser in Tika
     ({{{http://issues.apache.org/jira/browse/TIKA-1510}TIKA-1510}}).

   * Fixes have been applied to the ExternalParser to make it functional
     ({{{http://issues.apache.org/jira/browse/TIKA-1638}TIKA-1638}}).

   * Tika service loading can now be more verbose with the
     org.apache.tika.service.error.warn system property
     ({{{http://issues.apache.org/jira/browse/TIKA-1636}TIKA-1636}}).

   * Tika Server now allows metadata extraction from remote URLs and
     also outputs the detected language as a metadata field
     ({{{http://issues.apache.org/jira/browse/TIKA-1625}TIKA-1625}}).

   * A fix for OUTPUT_FILE_TOKEN not being replaced in ExternalParser
     was contributed by Pascal Essiembre
     ({{{http://issues.apache.org/jira/browse/TIKA-1620}TIKA-1620}}).

   * Tika REST server now supports language identification
     ({{{http://issues.apache.org/jira/browse/TIKA-1622}TIKA-1622}}).

   * All of the example code from the Tika in Action book has been
     donated to Tika and added to tika-examples
     ({{{http://issues.apache.org/jira/browse/TIKA-1562}TIKA-1562}}).

   * Tika server now logs errors determining ContentDisposition
     ({{{http://issues.apache.org/jira/browse/TIKA-1621}TIKA-1621}}).

   * An algorithm for using Byte Histogram frequencies to construct
     a Neural Network and to perform MIME detection was added
     ({{{http://issues.apache.org/jira/browse/TIKA-1582}TIKA-1582}}).

   * A Bayesian algorithm for MIME detection by probabilistic
     means was added
     ({{{http://issues.apache.org/jira/browse/TIKA-1517}TIKA-1517}}).

   * Tika now incorporates the Apache Spatial Information System
     capability of parsing Geographic ISO 19139 files
     ({{{http://issues.apache.org/jira/browse/TIKA-443}TIKA-443}}).
     It can also detect those files.

   * The MimeTypes code was updated to support inheritance
     ({{{http://issues.apache.org/jira/browse/TIKA-1535}TIKA-1535}}).

   * Global Change Master Directory Interchange Format (GCMD DIF)
     scientific data files can now be parsed and identified
     ({{{http://issues.apache.org/jira/browse/TIKA-1532}TIKA-1532}}).

   * Detection of CBOR files by extension has been improved
     ({{{http://issues.apache.org/jira/browse/TIKA-1610}TIKA-1610}}).

   * Change xerial.org's sqlite-jdbc jar to "provided"
     ({{{http://issues.apache.org/jira/browse/TIKA-1511}TIKA-1511}}).
     Users will now need to add sqlite-jdbc to their classpath for
     the Sqlite3Parser to work.

   * ExternalParser.check now catches (suppresses) SecurityException
     and returns false, so it's OK to run Tika with a security policy
     that does not allow execution of external processes
     ({{{http://issues.apache.org/jira/browse/TIKA-1628}TIKA-1628}}).

   The following people have contributed to Tika 1.9 by submitting or
   commenting on the issues resolved in this release:

   * Aakarsh Medleri Hire Math

   * Anya Yun Li

   * Arturo Beltran

   * Chris A. Mattmann

   * Gautham Gowrishankar

   * Giuseppe Totaro

   * Jan Kronquist

   * Ji-Hyun Oh

   * Konstantin Gribov

   * Lewis John McGibbney

   * Lorenz Leutgeb

   * Luke sh

   * Michael McCandless

   * Nick Burch

   * Pascal Essiembre

   * Pavel Micka

   * Selina Chu

   * Tim Allison

   * Tyler Palsulich

   See {{http://s.apache.org/4n1}} for more details on these contributions.
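
   As an illustration of the ExternalParser.check change listed above
   (TIKA-1628), here is a minimal Java sketch of the described behaviour.
   It is not Tika's actual source: the class name ExternalCheckSketch, the
   Launcher interface, and the exit code 127 used as an "error value" are
   assumptions made for this example. The only point it shows is that a
   SecurityException raised while starting the external process is caught
   and reported as "tool not available" rather than propagated.

```java
import java.io.IOException;

public class ExternalCheckSketch {

    /** Hypothetical hook that starts a command and returns its exit code. */
    public interface Launcher {
        int run(String[] command) throws IOException, InterruptedException;
    }

    /**
     * Launcher backed by ProcessBuilder. ProcessBuilder.start() can throw
     * SecurityException when a security manager forbids creating processes.
     */
    public static final Launcher PROCESS_LAUNCHER =
            command -> new ProcessBuilder(command).inheritIO().start().waitFor();

    /**
     * Returns true only if the command could be started and its exit code
     * is not one of the given error values (assumed convention).
     */
    public static boolean check(Launcher launcher, String[] command, int... errorValues) {
        try {
            int exitCode = launcher.run(command);
            for (int errorValue : errorValues) {
                if (exitCode == errorValue) {
                    return false;
                }
            }
            return true;
        } catch (IOException e) {
            // Command missing or not executable.
            return false;
        } catch (InterruptedException e) {
            // Restore the interrupt flag and treat the tool as unavailable.
            Thread.currentThread().interrupt();
            return false;
        } catch (SecurityException e) {
            // Policy forbids external processes: suppress and return false.
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-ins for a restrictive policy and for a successful run.
        Launcher denied = command -> { throw new SecurityException("exec not permitted"); };
        Launcher succeeds = command -> 0;
        String[] probe = {"ffmpeg", "-version"};

        System.out.println(check(denied, probe, 127));
        System.out.println(check(succeeds, probe, 127));
    }
}
```

   Passing the launcher in as a parameter is purely a choice made for this
   sketch, so the security-policy case can be exercised without installing
   a real policy; PROCESS_LAUNCHER shows how the same check would wrap an
   actual process start.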