                     ---------------
                     Apache Tika 1.4
                     ---------------

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements.  See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License.  You may obtain a copy of the License at
~~
~~     http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

Apache Tika 1.4

   The most notable changes in Tika 1.4 over the previous release are:

   * Removed a test HTML file that contained poorly chosen GPL text
     ({{{http://issues.apache.org/jira/browse/TIKA-1129}TIKA-1129}}).

   * tika-server can now produce text/html and text/xml content
     ({{{http://issues.apache.org/jira/browse/TIKA-1126}TIKA-1126}},
     {{{http://issues.apache.org/jira/browse/TIKA-1127}TIKA-1127}}).

   * The Compressor Parser now handles gzipped files that require the
     decompressConcatenated option to be set to true
     ({{{http://issues.apache.org/jira/browse/TIKA-1096}TIKA-1096}}).

   * Fixed a typographic error that was preventing detection of awk files
     ({{{http://issues.apache.org/jira/browse/TIKA-1081}TIKA-1081}}).

   * Added a new end-point to Tika's JAX-RS REST server that only detects
     the media type, based on a small portion of the submitted document
     ({{{http://issues.apache.org/jira/browse/TIKA-1047}TIKA-1047}}).

   * RTF: Ordered and unordered lists are now extracted
     ({{{http://issues.apache.org/jira/browse/TIKA-1062}TIKA-1062}}).

   * MP3: Audio duration is now extracted
     ({{{http://issues.apache.org/jira/browse/TIKA-991}TIKA-991}}).

   * Java .class files: Upgraded from ASM 3.1 to ASM 4.1 for parsing
     Java bytecode
     ({{{http://issues.apache.org/jira/browse/TIKA-1053}TIKA-1053}}).

   * Mime Types: Definitions extended to optionally include Link (URL)
     and UTI, along with details for several common formats
     ({{{http://issues.apache.org/jira/browse/TIKA-1012}TIKA-1012}} /
     {{{http://issues.apache.org/jira/browse/TIKA-1083}TIKA-1083}}).

   * Exceptions raised when parsing OLE10 embedded documents, when parsing
     summary information from Office documents, and when saving embedded
     documents in TikaCLI are now logged instead of aborting extraction
     ({{{http://issues.apache.org/jira/browse/TIKA-1074}TIKA-1074}}).

   * MS Word: The line tabular character is now replaced with a newline
     ({{{http://issues.apache.org/jira/browse/TIKA-1128}TIKA-1128}}).

   * XML: ElementMetadataHandlers can now optionally accept duplicate
     and empty values
     ({{{http://issues.apache.org/jira/browse/TIKA-1133}TIKA-1133}}).

   The following people have contributed to Tika 1.4 by submitting or
   commenting on the issues resolved in this release:

   * Axel Dörfler

   * Bernhard Berger

   * Chris A. Mattmann

   * Dave Meikle

   * David Morana

   * Giuseppe Totaro

   * Gregory Chanan

   * Jérémie Lesage

   * Jukka Zitting

   * Konstantin Privezentsev

   * Lee Graber

   * Lewis John McGibbney

   * Marco Quaranta

   * Markus Jelsma

   * Michael McCandless

   * Nick Burch

   * Raimund Merkert

   * Ray Gauss II

   * Ryan McKinley

   * T. Schmidt

   * Vincent Massol

   See {{http://s.apache.org/JPY}} for more details on these contributions.