Apache Tika

Apache Tika - a content analysis toolkit.

The Apache Tika toolkit detects the type of a document and extracts its metadata and structured text content. It handles many file formats, such as PDF, Microsoft Office, and HTML files, by delegating to existing parser libraries. You can find the latest release on the download page. See the Getting Started guide for instructions on using Tika.

Tika is a project of the Apache Software Foundation and was formerly a subproject of Apache Lucene.