Apache Tika 0.8

The most notable changes in Tika 0.8 over the previous release are:

  • Language identification is now dynamically configurable, managed via a config file loaded from the classpath. (TIKA-490)
  • Tika now supports parsing feeds by wrapping the underlying Rome library. (TIKA-466)
  • A quick-start guide for Tika parsing was contributed. (TIKA-464)
  • An approach for plumbing through XHTML attributes was added. (TIKA-379)
  • Media type hierarchy information is now taken into account when selecting the best parser for a given input document. (TIKA-298)
  • Support for parsing common scientific data formats, including netCDF and HDF4/5, was added. (TIKA-400 and TIKA-399)
  • Unit tests for Windows have been fixed, allowing TestParsers to complete. (TIKA-398)
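
The media type hierarchy change (TIKA-298) can be pictured as a walk up a chain of supertypes until a type with a registered parser is found. The following is only a minimal sketch of that idea, not Tika's actual implementation: the class name, the two lookup tables, the parser names, and the supertype relationships in them are all hypothetical and chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ParserSelectionSketch {

    // Hypothetical supertype table: child media type -> parent media type.
    static final Map<String, String> SUPERTYPES = new HashMap<>();

    // Hypothetical parser table: media type -> name of the parser registered for it.
    static final Map<String, String> PARSERS = new HashMap<>();

    static {
        SUPERTYPES.put("application/xhtml+xml", "application/xml");
        SUPERTYPES.put("application/xml", "text/plain");
        SUPERTYPES.put("text/plain", "application/octet-stream");

        PARSERS.put("application/xml", "XmlParserSketch");
        PARSERS.put("text/plain", "TextParserSketch");
    }

    // Returns the parser registered for the most specific type in the chain
    // starting at mediaType, or null if no type in the chain has a parser.
    static String selectParser(String mediaType) {
        String current = mediaType;
        int hops = 0;
        // The hop limit guards against an accidental cycle in the table.
        while (current != null && hops++ <= SUPERTYPES.size()) {
            String parser = PARSERS.get(current);
            if (parser != null) {
                return parser;
            }
            current = SUPERTYPES.get(current); // fall back to the supertype
        }
        return null;
    }

    public static void main(String[] args) {
        // No parser is registered for XHTML here, so its supertype's parser is used.
        System.out.println(selectParser("application/xhtml+xml"));
        // A type with its own parser is matched directly.
        System.out.println(selectParser("text/plain"));
        // An unknown type with no supertype entry yields null.
        System.out.println(selectParser("image/png"));
    }
}
```

In this sketch a document of a more specific type still gets a usable parser when none is registered for that exact type, which is the practical benefit of consulting the hierarchy.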

The following people have contributed to Tika 0.8 by submitting or commenting on the issues resolved in this release:

  • Łukasz Wiktor
  • Adam Wilmer
  • Alex Baranau
  • Alex Ott
  • André Ricardo
  • Andrey Barhatov
  • Andrey Sidorenko
  • Antoni Mylka
  • Arturo Beltran
  • Attila Király
  • Brad Greenlee
  • Bruno Dumon
  • Chris A. Mattmann
  • Chris Bamford
  • Christophe Gourmelon
  • Dave Meikle
  • David Weekly
  • Dmitry Kuzmenko
  • Erik Hetzner
  • Geoff Jarrad
  • Gerd Bremer
  • Grant Ingersoll
  • Jan Høydahl
  • Jean-Philippe Ricard
  • Jeremias Maerki
  • Joao Garcia
  • Jukka Zitting
  • Julien Nioche
  • Ken Krugler
  • Liam O'Boyle
  • Mads Hansen
  • Marcel May
  • Markus Goldbach
  • Martijn van Groningen
  • Maxim Valyanskiy
  • Mike Hays
  • Miroslav Pokorny
  • Nick Burch
  • Otis Gospodnetic
  • Peter van Raamsdonk
  • Peter Wolanin
  • Peter_Lenahan@ibi.com
  • Piotr Bartosiewicz
  • Radek
  • Rajiv Kumar
  • Reinhard Schwab
  • rick cameron
  • Robert Muir
  • Sanjeev Rao
  • Simon Tyler
  • Sjoerd Smeets
  • Slavomir Varchula
  • Staffan Olsson
  • Tom De Leu
  • Uwe Schindler
  • Victor Kazakov

See http://s.apache.org/ab0 for more details on these contributions.