Apache Tika 0.8
The most notable changes in Tika 0.8 over the previous release are:
- Language identification is now dynamically configurable, managed via a config file loaded from the classpath. (TIKA-490)
- Tika now supports parsing feeds by wrapping the underlying ROME library. (TIKA-466)
- A quick-start guide for Tika parsing was contributed. (TIKA-464)
- A mechanism for passing XHTML attributes through was added. (TIKA-379)
- Media type hierarchy information is now taken into account when selecting the best parser for a given input document. (TIKA-298)
- Support for parsing common scientific data formats, including netCDF and HDF4/5, was added. (TIKA-400 and TIKA-399)
- Unit tests for Windows have been fixed, allowing TestParsers to complete. (TIKA-398)
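
As a rough illustration of the media type hierarchy change (TIKA-298), the sketch below picks the parser registered for the most specific type, walking up to supertypes when there is no exact match. It is not Tika's implementation: the class name, the `selectParser` method, the sample hierarchy, and the parser names are all hypothetical and chosen only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; none of these names are part of Tika's API.
class ParserSelectionSketch {
    // Child media type -> parent media type (assumed sample hierarchy).
    static final Map<String, String> SUPERTYPES = new HashMap<>();
    // Media type -> parser name (assumed sample registry).
    static final Map<String, String> PARSERS = new HashMap<>();

    static {
        SUPERTYPES.put("application/rss+xml", "application/xml");
        SUPERTYPES.put("image/svg+xml", "application/xml");
        SUPERTYPES.put("application/xml", "text/plain");
        SUPERTYPES.put("text/plain", "application/octet-stream");

        PARSERS.put("application/rss+xml", "FeedParser");
        PARSERS.put("application/xml", "XmlParser");
        PARSERS.put("text/plain", "TextParser");
    }

    // Returns the parser for the closest type in the supertype chain,
    // or null when neither the type nor any ancestor has a parser.
    static String selectParser(String mediaType) {
        String current = mediaType;
        // The step bound guards against an accidental cycle in SUPERTYPES.
        for (int i = 0; current != null && i <= SUPERTYPES.size(); i++) {
            String parser = PARSERS.get(current);
            if (parser != null) {
                return parser;
            }
            current = SUPERTYPES.get(current);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(selectParser("application/rss+xml")); // exact match
        System.out.println(selectParser("image/svg+xml"));       // falls back to the XML parser
        System.out.println(selectParser("video/mp4"));           // no parser in the chain
    }
}
```

In this sketch an exact match always wins, so a feed is handled by the feed parser rather than by the more general XML parser registered for its supertype.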
The following people have contributed to Tika 0.8 by submitting or commenting on the issues resolved in this release:
- Łukasz Wiktor
- Adam Wilmer
- Alex Baranau
- Alex Ott
- André Ricardo
- Andrey Barhatov
- Andrey Sidorenko
- Antoni Mylka
- Arturo Beltran
- Attila Király
- Brad Greenlee
- Bruno Dumon
- Chris A. Mattmann
- Chris Bamford
- Christophe Gourmelon
- Dave Meikle
- David Weekly
- Dmitry Kuzmenko
- Erik Hetzner
- Geoff Jarrad
- Gerd Bremer
- Grant Ingersoll
- Jan Høydahl
- Jean-Philippe Ricard
- Jeremias Maerki
- Joao Garcia
- Jukka Zitting
- Julien Nioche
- Ken Krugler
- Liam O'Boyle
- Mads Hansen
- Marcel May
- Markus Goldbach
- Martijn van Groningen
- Maxim Valyanskiy
- Mike Hays
- Miroslav Pokorny
- Nick Burch
- Otis Gospodnetic
- Peter van Raamsdonk
- Peter Wolanin
- Peter_Lenahan@ibi.com
- Piotr Bartosiewicz
- Radek
- Rajiv Kumar
- Reinhard Schwab
- rick cameron
- Robert Muir
- Sanjeev Rao
- Simon Tyler
- Sjoerd Smeets
- Slavomir Varchula
- Staffan Olsson
- Tom De Leu
- Uwe Schindler
- Victor Kazakov
See http://s.apache.org/ab0 for more details on these contributions.