Apache Tika 1.9
The most notable changes in Tika 1.9 over the previous release are:
- Tika now includes a parser that uses cTAKES, the clinical Text Analysis and Knowledge Extraction System, to extract knowledge from biomedical text (TIKA-1645, TIKA-1642).
- Tika-server allows a user to specify the Tika config from the command line (TIKA-1652, TIKA-1426).
- Matlab file detection has been improved (TIKA-1634).
- ExifTool has been added as an external parser (TIKA-1639).
- If FFmpeg is installed and on the PATH, Tika can now use it as a parser (TIKA-1510).
- Fixes have been applied to the ExternalParser to make it functional (TIKA-1638).
- Tika service loading can now be made more verbose by setting the org.apache.tika.service.error.warn system property (TIKA-1636).
- Tika Server now supports metadata extraction from remote URLs and also outputs the detected language as a metadata field (TIKA-1625).
- A fix for OUTPUT_FILE_TOKEN not being replaced in ExternalParser was contributed by Pascal Essiembre (TIKA-1620).
- Tika REST server now supports language identification (TIKA-1622).
- All of the example code from the Tika in Action book has been donated to Tika and added to tika-examples (TIKA-1562).
- Tika server now logs errors that occur while determining ContentDisposition (TIKA-1621).
- An algorithm that uses byte-histogram frequencies to construct a neural network for MIME detection was added (TIKA-1582).
- A Bayesian algorithm for probabilistic MIME detection was added (TIKA-1517).
- Tika now incorporates the Apache Spatial Information System (SIS) capability of parsing geographic ISO 19139 files, and can also detect those files (TIKA-443).
- The MimeTypes code was updated to support inheritance (TIKA-1535).
- Tika can now parse and identify Global Change Master Directory Interchange Format (GCMD DIF) scientific data files (TIKA-1532).
- Detection of CBOR files by extension has been improved (TIKA-1610).
- xerial.org's sqlite-jdbc jar is now a "provided" dependency (TIKA-1511), so users must add sqlite-jdbc to their classpath for the Sqlite3Parser to work.
- ExternalParser.check now catches (suppresses) SecurityException and returns false, so it's OK to run Tika with a security policy that does not allow execution of external processes (TIKA-1628).
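To make the byte-histogram idea behind TIKA-1582 concrete, the Java sketch below shows one way such an input feature vector could be computed: it counts each of the 256 possible byte values in the first maxBytes of a stream and scales the counts by the largest count, so every feature lies in [0, 1]. The class and method names, the read limit, and the max-count normalization are assumptions for illustration only; this is not Tika's detector code, and the neural network that would consume the vector is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ByteHistogram {

    /**
     * Hypothetical feature extractor: counts each byte value (0..255) in at most
     * maxBytes of the stream, then divides by the largest count (assumed normalization).
     */
    static double[] features(InputStream in, int maxBytes) throws IOException {
        long[] counts = new long[256];
        byte[] buf = new byte[8192];
        int remaining = maxBytes;
        int n;
        while (remaining > 0 && (n = in.read(buf, 0, Math.min(buf.length, remaining))) != -1) {
            for (int i = 0; i < n; i++) {
                counts[buf[i] & 0xFF]++; // mask so bytes are treated as unsigned 0..255
            }
            remaining -= n;
        }
        long max = 0;
        for (long c : counts) {
            max = Math.max(max, c);
        }
        double[] features = new double[256];
        if (max == 0) {
            return features; // empty input: all-zero vector
        }
        for (int b = 0; b < 256; b++) {
            features[b] = (double) counts[b] / max;
        }
        return features;
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        double[] f = features(new ByteArrayInputStream(sample), 1 << 20);
        // Each byte in the sample occurs once, so both values are 1.0.
        System.out.println(f['%'] + " " + f['P']);
    }
}
```

Capping the read at maxBytes keeps the cost bounded for large files; whether a real detector would look only at a file prefix, and how it would scale the counts, are design choices this sketch does not claim to reproduce.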
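The TIKA-1628 behaviour can be sketched as a standalone availability check: launch a probe command and treat a missing binary, a forbidden process launch (SecurityException), or an error exit code as "not available" rather than as a failure. The canRunCommand helper below is hypothetical and only mirrors the behaviour described for ExternalParser.check; it is not Tika's implementation. It assumes Java 9 or later for ProcessBuilder.Redirect.DISCARD.

```java
import java.io.IOException;
import java.util.Arrays;

public class CommandCheck {

    /**
     * Hypothetical helper: returns true only if the command can be launched and
     * exits with a code that is not listed in errorCodes.
     */
    static boolean canRunCommand(String[] command, int... errorCodes) {
        try {
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.redirectErrorStream(true);                       // merge stderr into stdout
            pb.redirectOutput(ProcessBuilder.Redirect.DISCARD); // never block on a full output pipe
            Process process = pb.start();
            int exitCode = process.waitFor();
            return Arrays.stream(errorCodes).noneMatch(code -> code == exitCode);
        } catch (IOException e) {
            // Binary not found on the PATH, or not executable.
            return false;
        } catch (SecurityException e) {
            // A security policy forbids starting external processes: report "unavailable".
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        // Prints true only if an ffmpeg binary on the PATH runs and does not exit with 127.
        System.out.println(canRunCommand(new String[] {"ffmpeg", "-version"}, 127));
    }
}
```

Swallowing SecurityException here is the point of the change: a parser whose tool cannot be launched is simply treated as unavailable, so the rest of the toolkit keeps working under a restrictive policy.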
The following people have contributed to Tika 1.9 by submitting or commenting on the issues resolved in this release:
- Aakarsh Medleri Hire Math
- Anya Yun Li
- Arturo Beltran
- Chris A. Mattmann
- Gautham Gowrishankar
- Giuseppe Totaro
- Jan Kronquist
- Ji-Hyun Oh
- Konstantin Gribov
- Lewis John McGibbney
- Lorenz Leutgeb
- Luke sh
- Michael McCandless
- Nick Burch
- Pascal Essiembre
- Pavel Micka
- Selina Chu
- Tim Allison
- Tyler Palsulich
See http://s.apache.org/4n1 for more details on these contributions.