 ---------------
 Apache Tika 1.7
 ---------------

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements. See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License. You may obtain a copy of the License at
~~
~~     http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

Apache Tika 1.7

  The most notable changes in Tika 1.7 over the previous release are:

  * Fixed a resource leak in OutlookPSTParser that caused a TikaException when it was invoked via AutoDetectParser on Windows ({{{http://issues.apache.org/jira/browse/TIKA-1506}TIKA-1506}}).

  * FeedParser now properly strips HTML tags from content ({{{http://issues.apache.org/jira/browse/TIKA-1500}TIKA-1500}}).

  * Tika Server support for selecting a single metadata key; wrapped MetadataEP into MetadataResource ({{{http://issues.apache.org/jira/browse/TIKA-1499}TIKA-1499}}).

  * Tika Server support for JSON and XMP views of metadata ({{{http://issues.apache.org/jira/browse/TIKA-1497}TIKA-1497}}).

  * Tika Parent now uses dependency management to keep dependencies shared by different modules at the same version ({{{http://issues.apache.org/jira/browse/TIKA-1384}TIKA-1384}}).

  * Upgraded slf4j to version 1.7.7 ({{{http://issues.apache.org/jira/browse/TIKA-1496}TIKA-1496}}).

  * Tika Server support for RecursiveParserWrapper's JSON output (endpoint=rmeta), equivalent to tika-app's -J option from {{{http://issues.apache.org/jira/browse/TIKA-1451}TIKA-1451}} ({{{http://issues.apache.org/jira/browse/TIKA-1498}TIKA-1498}}).

  * Tika Server support for providing the password for files on a per-request basis through the Password HTTP header ({{{http://issues.apache.org/jira/browse/TIKA-1494}TIKA-1494}}).

  * Simple support for the BPG (Better Portable Graphics) image format ({{{http://issues.apache.org/jira/browse/TIKA-1491}TIKA-1491}}, {{{http://issues.apache.org/jira/browse/TIKA-1495}TIKA-1495}}).

  * Prevent exceptions from being thrown for some malformed MP3 files ({{{http://issues.apache.org/jira/browse/TIKA-1218}TIKA-1218}}).

  * Reformat pom.xml files to use two spaces per indent ({{{http://issues.apache.org/jira/browse/TIKA-1475}TIKA-1475}}).

  * Fixed an slf4j logger warning on Tika Server startup ({{{http://issues.apache.org/jira/browse/TIKA-1472}TIKA-1472}}).

  * Tika CLI and GUI now have an option to view a JSON rendering of RecursiveParserWrapper's output ({{{http://issues.apache.org/jira/browse/TIKA-1451}TIKA-1451}}).

  * Tika now integrates the Geospatial Data Abstraction Library (GDAL) for parsing hundreds of geospatial formats ({{{http://issues.apache.org/jira/browse/TIKA-605}TIKA-605}}, {{{http://issues.apache.org/jira/browse/TIKA-1503}TIKA-1503}}).

  * ExternalParsers can now use regular expressions to specify dynamic keys ({{{http://issues.apache.org/jira/browse/TIKA-1441}TIKA-1441}}).

  * Thread safety issues in ImageMetadataExtractor were resolved ({{{http://issues.apache.org/jira/browse/TIKA-1369}TIKA-1369}}).

  * The ForkParser service is now registered in Activator ({{{http://issues.apache.org/jira/browse/TIKA-1354}TIKA-1354}}).

  * The Rome library was upgraded to version 1.5 ({{{http://issues.apache.org/jira/browse/TIKA-1435}TIKA-1435}}).

  * Add markup for files embedded in PDFs ({{{http://issues.apache.org/jira/browse/TIKA-1427}TIKA-1427}}).

  * Extract files embedded in annotations in PDFs ({{{http://issues.apache.org/jira/browse/TIKA-1433}TIKA-1433}}).

  * Upgrade to PDFBox 1.8.8 ({{{http://issues.apache.org/jira/browse/TIKA-1419}TIKA-1419}}, {{{http://issues.apache.org/jira/browse/TIKA-1442}TIKA-1442}}).

  * Add RecursiveParserWrapper (a.k.a. Jukka's and Nick's RecursiveMetadataParser) ({{{http://issues.apache.org/jira/browse/TIKA-1329}TIKA-1329}}).

  * Add an example of how to dump TikaConfig to XML ({{{http://issues.apache.org/jira/browse/TIKA-1418}TIKA-1418}}).

  * Allow users to specify a Tika config file for tika-app ({{{http://issues.apache.org/jira/browse/TIKA-1426}TIKA-1426}}).

  * PackageParser includes the last-modified date from the archive in the metadata when handling embedded entries ({{{http://issues.apache.org/jira/browse/TIKA-1246}TIKA-1246}}).

  * Created a new Tesseract OCR parser to extract text from images; Tesseract must be installed before use ({{{http://issues.apache.org/jira/browse/TIKA-93}TIKA-93}}).

  * Basic parser for older Excel formats, such as Excel 4, 5 and 95, which can extract simple text, plus metadata for Excel 5 and 95 ({{{http://issues.apache.org/jira/browse/TIKA-1490}TIKA-1490}}).

  The following people have contributed to Tika 1.7 by submitting or commenting on the issues resolved in this release:

  * Aimee Dev

  * Alexander Chow

  * Amit Gupta

  * Andreas

  * Andreas Hubold

  * Andrzej Bialecki

  * Ann Burgess

  * Avi

  * Boris Naguet

  * Chetan Laddha

  * Chris A. Mattmann

  * Chris Bamford

  * Christian Reuschling

  * Cservenak, Tamas

  * Damiano

  * Dave Meikle

  * Erik Hetzner

  * Fabian Lange

  * Hassan Akram

  * Hong-Thai Nguyen

  * James Baker

  * Jonathan Evans

  * Jukka Zitting

  * Kaijian Xu

  * Ken Krugler

  * Konstantin Gribov

  * Lewis John McGibbney

  * Luis Filipe Nassif

  * Marco Quaranta

  * Martin Kalcher

  * Matthias Krueger

  * Matthieu Neamar

  * Nick Burch

  * Nicolas Gavalda

  * Omid Pourhadi

  * Pradeep Singh

  * Ray Gauss II

  * Sasa Milenkovic

  * Sebastian Nagel

  * Sergey Beryozkin

  * Steffen

  * Steve R

  * Tadeu Alves

  * Tim Allison

  * Tran Nam Quang

  * Tyler Palsulich

  * Vladimir Glina

  * William Palmer

  []

  See {{http://s.apache.org/a8m}} for more details on these contributions.
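
  Several Tika Server items in this release are used together from a client: selecting a single metadata key (TIKA-1499), JSON and XMP views (TIKA-1497), the recursive rmeta endpoint (TIKA-1498), and the Password header (TIKA-1494). The sketch below is a hypothetical client-side helper, not part of Tika. It only builds PUT requests with the JDK's java.net.http API (Java 11+) and does not send them. It assumes a server at localhost:9998, the paths /meta, /meta/{field} and /rmeta, and the Accept types application/json and application/rdf+xml for the JSON and XMP views. Check all of these against your server version.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/**
 * Hypothetical helper (not a Tika API) that builds, but does not send,
 * requests for a Tika Server. The base URL, paths and Accept types are
 * assumptions; verify them against the server you run.
 */
final class TikaServerRequests {
    // Assumed base URL; 9998 is the port Tika Server commonly listens on.
    static final String BASE = "http://localhost:9998";

    private TikaServerRequests() {}

    /** Assumed PUT /meta/{field}: one metadata key (TIKA-1499) as JSON (TIKA-1497). */
    static HttpRequest singleKeyAsJson(byte[] document, String field) {
        return HttpRequest.newBuilder(URI.create(BASE + "/meta/" + field))
                .header("Accept", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    /** Assumed PUT /meta: all metadata as an XMP (RDF/XML) view (TIKA-1497). */
    static HttpRequest allMetadataAsXmp(byte[] document) {
        return HttpRequest.newBuilder(URI.create(BASE + "/meta"))
                .header("Accept", "application/rdf+xml")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    /**
     * Assumed PUT /rmeta: recursive JSON metadata for a document and its
     * embedded files (TIKA-1498), with a per-request password sent in the
     * Password header (TIKA-1494).
     */
    static HttpRequest recursiveWithPassword(byte[] document, String password) {
        return HttpRequest.newBuilder(URI.create(BASE + "/rmeta"))
                .header("Password", password)
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }
}
```

  Against a running server, a request built this way could be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). The requests are kept separate from sending so that the URL and header choices can be inspected or adapted on their own.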