                       ---------------
                       Apache Tika 1.1
                       ---------------

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements.  See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License.  You may obtain a copy of the License at
~~
~~     http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

Apache Tika 1.1


   The most notable changes in Tika 1.1 over the previous release are:

      * Link Extraction: The rel attribute is now extracted from links
        by the LinkContentHandler.
        ({{{http://issues.apache.org/jira/browse/TIKA-824}TIKA-824}})
        
      * MP3: Fixed handling of UTF-16 (two-byte) ID3v2 tags; previously
        the last character in a UTF-16 tag could be corrupted
        ({{{http://issues.apache.org/jira/browse/TIKA-793}TIKA-793}}).
        
      * Performance: Loading of the default media type registry is now 
        significantly faster. 
        ({{{http://issues.apache.org/jira/browse/TIKA-780}TIKA-780}})
        
      * PDF: Allow controlling whether overlapping duplicated text should
        be removed.  Disabling this (the default) can speed up text
        extraction considerably and may work around cases where
        non-duplicated characters were incorrectly removed
        ({{{http://issues.apache.org/jira/browse/TIKA-767}TIKA-767}}).
        Allow controlling whether text tokens should be sorted by their x/y 
        position before extracting text 
        ({{{http://issues.apache.org/jira/browse/TIKA-612}TIKA-612}}); 
        this is necessary for certain PDFs.  Fixed cases where too many
        </p> tags appeared in the XHTML output, causing a
        NullPointerException when opening some PDFs with the GUI
        ({{{http://issues.apache.org/jira/browse/TIKA-778}TIKA-778}}).
        
      * RTF: Fixed case where a font change would result in processing 
        bytes in the wrong font's charset, producing bogus text output 
        ({{{http://issues.apache.org/jira/browse/TIKA-777}TIKA-777}}).  
        Don't output whitespace in ignored group states, avoiding 
        excessive whitespace output 
        ({{{http://issues.apache.org/jira/browse/TIKA-781}TIKA-781}}).  
        Binary embedded content (using \bin control word) is now skipped 
        correctly; previously it could cause the parser to incorrectly 
        extract binary content as text
        ({{{http://issues.apache.org/jira/browse/TIKA-782}TIKA-782}}).
      
      * CLI: New TikaCLI option "--list-detectors", which displays the
        available mimetype detectors, similar to the existing
        "--list-parsers" option for parsers
        ({{{http://issues.apache.org/jira/browse/TIKA-785}TIKA-785}}).
        
      * Detectors: The order of detectors, as supplied via the service
        registry loader, is now controlled. User-supplied detectors are
        preferred, then Tika detectors (such as the container-aware ones),
        and finally the core Tika MimeTypes is used as a fallback. This
        allows specific, detailed detectors to take precedence over
        the default mime magic + filename detector.
        ({{{http://issues.apache.org/jira/browse/TIKA-786}TIKA-786}})
        
      * Microsoft Project (MPP): Filetype detection has been fixed, and 
        basic metadata (but no text) is now extracted. 
        ({{{http://issues.apache.org/jira/browse/TIKA-789}TIKA-789}})
        
      * Outlook: Fixed a NullPointerException in TikaGUI when messages with
        embedded RTF or HTML content were filtered
        ({{{http://issues.apache.org/jira/browse/TIKA-801}TIKA-801}}).
        
      * Ogg Vorbis and FLAC: Parsers added for Ogg Vorbis and FLAC audio
        files, which extract audio metadata and tags
        ({{{http://issues.apache.org/jira/browse/TIKA-747}TIKA-747}}).
        
      * MP4: Improved mime magic detection for MP4 based formats (including
        QuickTime, MP4 Video and Audio, and 3GPP) 
        ({{{http://issues.apache.org/jira/browse/TIKA-851}TIKA-851}}).
        
      * MP4: Added a basic metadata-extracting parser for MP4 files. It
        extracts limited audio and video metadata, along with the iTunes
        media metadata (such as Artist and Title)
        ({{{http://issues.apache.org/jira/browse/TIKA-852}TIKA-852}}).
        
      * Document Passwords: A new ParseContext object, PasswordProvider,
        has been added. This provides a way to supply the password for
        a document during processing. Currently, only password-protected
        PDFs and Microsoft OOXML files are supported
        ({{{http://issues.apache.org/jira/browse/TIKA-850}TIKA-850}}).
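
   The detector ordering described above (user-supplied detectors first,
   then Tika's own detectors, then the core MimeTypes fallback) can be
   pictured with the following minimal sketch. It is not Tika's
   implementation and does not use Tika's API: the class
   DetectorOrderSketch, the NamedDetector type, the filename-only
   detection, and the media type application/x-custom-project are all
   assumptions made up for illustration. It also simplifies the combining
   rule to "first answer other than application/octet-stream wins".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for illustration only; this is NOT Tika's Detector API.
class DetectorOrderSketch {

    // Generic type returned when a detector cannot identify the input.
    static final String FALLBACK = "application/octet-stream";

    // A named detector mapping a file name to a media type (assumed, simplified).
    static final class NamedDetector {
        final String name;
        final Function<String, String> detect;

        NamedDetector(String name, Function<String, String> detect) {
            this.name = name;
            this.detect = detect;
        }
    }

    // Builds the chain in priority order: user-supplied, then built-in, then core.
    static List<NamedDetector> ordered(List<NamedDetector> user,
                                       List<NamedDetector> builtIn,
                                       NamedDetector core) {
        List<NamedDetector> chain = new ArrayList<>(user);
        chain.addAll(builtIn);
        chain.add(core);
        return chain;
    }

    // Simplified rule: the first detector that returns a non-generic type wins.
    static String detect(List<NamedDetector> chain, String fileName) {
        for (NamedDetector d : chain) {
            String type = d.detect.apply(fileName);
            if (!FALLBACK.equals(type)) {
                return type;
            }
        }
        return FALLBACK;
    }

    // Example chain with one detector per tier (all behavior is made up).
    static List<NamedDetector> demoChain() {
        NamedDetector user = new NamedDetector("user",
            n -> n.endsWith(".mpp") ? "application/x-custom-project" : FALLBACK);
        NamedDetector container = new NamedDetector("container",
            n -> n.endsWith(".mpp") ? "application/vnd.ms-project" : FALLBACK);
        NamedDetector core = new NamedDetector("core",
            n -> n.endsWith(".txt") ? "text/plain" : FALLBACK);
        return ordered(Arrays.asList(user), Arrays.asList(container), core);
    }

    public static void main(String[] args) {
        List<NamedDetector> chain = demoChain();
        System.out.println(detect(chain, "plan.mpp"));
        System.out.println(detect(chain, "notes.txt"));
        System.out.println(detect(chain, "blob.bin"));
    }
}
```

   In this sketch the user-supplied detector answers for plan.mpp before
   the container-style detector is consulted, while notes.txt falls
   through to the core detector and blob.bin ends up with the generic
   type.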

   The following people have contributed to Tika 1.1 by submitting or
   commenting on the issues resolved in this release:

      * Alex Ott
      
      * Alexander Chow 
      
      * Ali Oral 
      
      * Andrzej Bialecki
      
      * Antoni Mylka
      
      * Arjohn Kampman
      
      * Bastian Mathes
      
      * Chris A. Mattmann
      
      * Craig Stires
      
      * David Tran
      
      * Etienne Jouvin
      
      * Fabian Lange
      
      * Geoff Jarrad
      
      * Jan Høydahl
      
      * Jerome Lacoste
      
      * John Mastarone
      
      * Jukka Zitting
      
      * Julien Nioche 
      
      * Ken Krugler
      
      * Lau Brino
      
      * Markus Jelsma 
      
      * Maxim Valyanskiy
      
      * Michael McCandless
      
      * Nick Burch
      
      * Pablo Queixalos 
      
      * Paul Hill
      
      * Paul Pearcy 
      
      * peter royal
      
      * PNS
      
      * Radek
      
      * Ray Gauss II 
      
      * Stephan Mühlstrasser
      
      * Swapna Vuppala
      
      * Torsten Krah 
      
      * William Seemann
      
      * Yegor Kozlov 

   See {{http://s.apache.org/Jn4}} for more details on these contributions.