Apache Tika 1.20

The most notable changes in Tika 1.20 over the previous release are:

  • Upgrade to Apache POI 4.0.1 (TIKA-2751).
  • Integrate/parameterize new angles handling in PDFBox (TIKA-2779).
  • Upgrade to PDFBox 2.0.13 (TIKA-2788).
  • Prevent content within <style/> and <script/> elements from being written by the ToTextContentHandler (TIKA-2550).
  • Switch child-to-parent communication to a shared memory-mapped file in tika-server's -spawnChild mode.
  • Fix bug in tika-server when run in legacy mode (not -spawnChild) that caused it to return 503 on documents submitted after it hit an OutOfMemoryError (TIKA-2776).
  • Upgrade jaxb-runtime and javax.activation (TIKA-2778).
  • tika-app in batch mode now requires an interrupt or kill signal to the parent process to stop the parent and the child processes (TIKA-2780).
  • Bulk upgrade of dependencies (TIKA-2775).
  • Improve language id efficiency in tika-eval (TIKA-2777).
  • 25.2 (TIKA-2773).
  • Remove duplication of notes in PPT slides (TIKA-2735).
  • Use -javaHome or $JAVA_HOME (if they exist) when spawning child in tika-server's -spawnChild mode.

The following people have contributed to Tika 1.20 by submitting or commenting on the issues resolved in this release:

  • Boris Petrov
  • Dave Meikle
  • feng ye
  • Hans Brende
  • Jeroen
  • Julien Massiera
  • Kristen Cheung
  • Lewis John McGibbney
  • Mario Bisonti
  • Markus Jelsma
  • Nick Sincaglia
  • Ronan O'Sullivan
  • Tim Allison

See https://s.apache.org/fScy for more details on these contributions.