                     ----------------
                     Apache Tika 2.4.0
                     ----------------

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements.  See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License.  You may obtain a copy of the License at
~~
~~     http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

Apache Tika 2.4.0

   The most notable changes in Tika 2.4.0 over the previous release are:

   * NOTE: To save on resources, we no longer include the deeplearning4j
     dependencies in the tika-dl jar. The dependencies for the tika-dl package
     must be provided by users. See
     https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-ml/tika-dl/pom.xml
     for the dependencies that must be provided at run-time
     ({{{http://issues.apache.org/jira/browse/TIKA-3676}TIKA-3676}}).

   * NOTE: Added prefix "dwg-custom:" to DWG custom metadata properties
     ({{{http://issues.apache.org/jira/browse/TIKA-3731}TIKA-3731}}).

   * Add initial, BETA-grade TLS encryption option for tika-server;
     configuration may change in future releases
     ({{{http://issues.apache.org/jira/browse/TIKA-3719}TIKA-3719}}).

   * Allow specification of fetcherName and fetchKey via query parameters
     in request URI in tika-server
     ({{{http://issues.apache.org/jira/browse/TIKA-3714}TIKA-3714}}).

   * Add basic parsers for WARC and WACZ in tika-parsers-standard
     ({{{http://issues.apache.org/jira/browse/TIKA-3697}TIKA-3697}}).

   * Add MetadataWriteFilter capability to improve memory profile in
     Metadata objects
     ({{{http://issues.apache.org/jira/browse/TIKA-3695}TIKA-3695}}).

   * Allow configurability of the ContentHandlerDecorator used by the
     AutoDetectParser
     ({{{http://issues.apache.org/jira/browse/TIKA-3723}TIKA-3723}}).

   * Allow configurability of the EmbeddedDocumentExtractor used by the
     AutoDetectParser
     ({{{http://issues.apache.org/jira/browse/TIKA-3711}TIKA-3711}}).

   * Add detection for Frictionless Data packages and WACZ
     ({{{http://issues.apache.org/jira/browse/TIKA-3696}TIKA-3696}}).

   * Add detection for DGN files with gratitude and credit to
     Steven Frew's tika-dgn-detector
     ({{{http://issues.apache.org/jira/browse/TIKA-3721}TIKA-3721}}).

   * Add parser for metadata from DGN 8 files via Dan Coldrick
     ({{{http://issues.apache.org/jira/browse/TIKA-3721}TIKA-3721}}).

   * Add a fetcher and emitter for Azure blob storage
     ({{{http://issues.apache.org/jira/browse/TIKA-3707}TIKA-3707}}).

   * Add detection for files encrypted by Microsoft's Rights Management
     Service ({{{http://issues.apache.org/jira/browse/TIKA-3666}TIKA-3666}}).

   * Fixed regression in 2.3.0 that led to more embedded filenames than
     appropriate being written to the content
     ({{{http://issues.apache.org/jira/browse/TIKA-3711}TIKA-3711}}).

   * tika-server now clones the forking process's environment variables
     into the forked process
     ({{{http://issues.apache.org/jira/browse/TIKA-3715}TIKA-3715}}).

   * Add an optional /eval endpoint for tika-eval profile or compare
     capabilities in tika-server
     ({{{http://issues.apache.org/jira/browse/TIKA-3689}TIKA-3689}}).

   * Add a Parsed-By-Full-Set metadata item to record all parsers that
     processed a file
     ({{{http://issues.apache.org/jira/browse/TIKA-3716}TIKA-3716}}).

   * Add metadata filters for Optimaize and OpenNLP language detectors
     ({{{http://issues.apache.org/jira/browse/TIKA-3717}TIKA-3717}}).

   * Upgrade to PDFBox 2.0.26
     ({{{http://issues.apache.org/jira/browse/TIKA-3726}TIKA-3726}}).

   * Upgrade deeplearning4j to 1.0.0-M2
     ({{{http://issues.apache.org/jira/browse/TIKA-3458}TIKA-3458}} and PR#527).

   * Various dependency upgrades, including POI, dl4j, gson, jackson,
     twelvemonkeys, log4j2 and others
     ({{{http://issues.apache.org/jira/browse/TIKA-3675}TIKA-3675}} and
     many PRs from dependabot).

   The following people have contributed to Tika 2.4.0 by submitting or
   commenting on the issues resolved in this release:

   * August Valera

   * beamliu

   * Dan Coldrick

   * Julien Massiera

   * Lewis John McGibbney

   * Nick Burch

   * PJ Fanning

   * Sam Stephens

   * Thierry Guérin

   * Tim Allison

   * Zac Jacobson

   See {{https://s.apache.org/59u4j}} for more details on these contributions.