Apache Tika 1.10
The most notable changes in Tika 1.10 over the previous release are:
- Tika Config XML can now be used to create composite detectors and to exclude detectors that DefaultDetector would otherwise have used. This brings detector configuration in line with parser configuration (TIKA-1702).
- Reverted to the legacy sort order of parsers, which was mistakenly reversed in Tika 1.9 (TIKA-1689).
- Upgrade to POI 3.13-beta1 (TIKA-1667).
- Upgrade to PDFBox 1.8.10 (TIKA-1588).
- MimeTypes now tries to find a registered type with and without parameters (TIKA-1692).
- Added more robust error handling for encoding detection of .MSG files (TIKA-1238).
- Fixed bug in Tika's use of the Jackcess parser that prevented reading of v97 Access files (TIKA-1681).
- Upgrade xerial.org's sqlite-jdbc to 3.8.10.1. NOTE: as of Tika 1.9, this jar is a "provided" dependency. Make sure to upgrade your provided jar! (TIKA-1687).
- Added header/footer extraction for XLS files (via Aeham Abushwashi) (TIKA-1400).
- Dropped the source file name from the embedded file path in RecursiveParserWrapper's "X-TIKA:embedded_resource_path" (TIKA-1673).
- Upgraded to Java 7 (TIKA-1536).
- Non-standards-compliant emails are now correctly detected as message/rfc822 (TIKA-1602).
- Added parser for MS Access files via Jackcess. Many thanks to Health Market Science, Brian O'Neill and James Ahlborn for relicensing Jackcess to Apache v2! (TIKA-1601).
- GDALParser now correctly sets "nitf" as a supported MediaType (TIKA-1664).
- Added DigestingParser to calculate digest hashes and record them in metadata. Integrated with tika-app and tika-server (TIKA-1663).
- Fixed ZipContainerDetector to detect all IPA files (TIKA-1659).
The following people have contributed to Tika 1.10 by submitting or commenting on the issues resolved in this release:
- Aashish Chaudhary
- Adam Estrada
- Albert L.
- Alessandro De Angelis
- Andrew Jackson
- Ann Burgess
- Bin Hawking
- Bob Paulin
- Chris A. Mattmann
- Chris Wilson
- Daniel Bonniot de Ruisselet
- David Warren
- Filip Bednárik
- Giuseppe Totaro
- Jeremy B. Merrill
- Johannes Mockenhaupt
- Joseph North
- Ken Krugler
- Lewis John McGibbney
- Markus Jelsma
- Michael McCandless
- Namrata Malarout
- Nick Burch
- Niels
- Paul Ramirez
- Paul Tunison
- Rami Shomali
- Ray Gauss II
- Sergey Beryozkin
- Tim Allison
- Tyler Palsulich
- jefferyyuan
See http://s.apache.org/EQ2 for more details on these contributions.