                       ----------------
                           Security
                       ----------------

~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements. See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License. You may obtain a copy of the License at
~~
~~ https://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

Security

  The following is an incomplete list of known and fixed Common Vulnerabilities and Exposures (CVEs)
  and other vulnerabilities in Apache Tika or its dependencies. Please help us fill this in with more details.

*-------------*-------------*----------------*------------------*
|CVE or Vulnerability| Description | Reporter | Affected Versions|
*-------------*-------------*----------------*------------------*
| {{{https://nvd.nist.gov/vuln/detail/CVE-2023-42503} CVE-2023-42503}} | commons-compress uncontrolled resource consumption vulnerability while parsing tar files| ???
| ???->2.9.0 |
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread/wfno8mf5nlcvbs78z93q9thgrm30wwfh} CVE-2022-33879}} | Regex DoS in StandardsExtractingContentHandler; incomplete fix for CVE-2022-30973/CVE-2022-30216 and a new one | Tony Torralba, Jaroslav Lobačevski and Tim Allison |???-2.4.0 and ???-1.28.3|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread/gqvb5t4p7tmdpl0y5bdbf72pgxj04h7p} CVE-2022-30973}} | Regex DoS in StandardsExtractingContentHandler; missed fix in 1.28.2 | Cathy Hu, SUSE Software Solutions Germany GmbH |???-1.28.2|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread/t3tb51sf0k2pmbnzsrrrm23z9r1c10rk} CVE-2022-25169}} | BPGParser Memory Usage DoS | ??? |???-2.3.0 and ???-1.28.1|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread/dh3syg68nxogbmlg13srd6gjn3h2z6r4} CVE-2022-30216}} | Regex DoS in StandardsExtractingContentHandler | CodeQL team members Tony Torralba and Joseph Farebrother |???-2.3.0 and ???-1.28.1|
*-------------*-------------*----------------*------------------*
| {{{https://nvd.nist.gov/vuln/detail/CVE-2021-44832} CVE-2021-44832}} | Remote Code Execution via JDBC Appender in log4j2 | ??? |2.0.0-BETA-2.2.1|
*-------------*-------------*----------------*------------------*
| {{{https://nvd.nist.gov/vuln/detail/CVE-2021-44228} CVE-2021-44228}} | Critical Remote Code Execution in log4j2 | ???
|2.0.0-BETA-2.1.0|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/ra2ab0ce69ce8aaff0773b8c1036438387ce004c2afc6f066626e205e%40%3Cusers.pdfbox.apache.org%3E} CVE-2021-31812}} | Infinite loop when loading a crafted PDF in PDFBox before 2.0.24 | Chaoyuan Peng |?-1.26|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/re3bd16f0cc8f1fbda46b06a4b8241cd417f71402809baa81548fc20e%40%3Cusers.pdfbox.apache.org%3E} CVE-2021-31811}} | OutOfMemoryException when loading a crafted PDF in PDFBox before 2.0.24 | Chaoyuan Peng |?-1.26|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/r915add4aa52c60d1b5cf085039cfa73a98d7fae9673374dfd7744b5a%40%3Cdev.tika.apache.org%3E} CVE-2021-28657}} | Infinite loop in the MP3Parser.| Khaled Nassar |?-1.25|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/rf35026148ccc0e1af133501c0d003d052883fcc65107b3ff5d3b61cd%40%3Cusers.pdfbox.apache.org%3E}CVE-2021-27906}} | Out of memory error while loading a file in PDFBox before 2.0.23.| Fabian Meumertzheim |?-1.25|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/r4717f902f8bc36d47b3fa978552a25e4ed3ddc2fffb52b94fbc4ab36%40%3Cusers.pdfbox.apache.org%3E} CVE-2021-27807}} | Infinite loop while loading a file in PDFBox before 2.0.23.| Fabian Meumertzheim |?-1.25|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/r4d943777e36ca3aa6305a45da5acccc54ad894f2d5a07186cfa2442c%40%3Cdev.tika.apache.org%3E} CVE-2020-9489}} | System.exit vulnerability in Tika's OneNote Parser; out of memory errors and/or infinite loops in Tika's ICNSParser, MP3Parser, MP4Parser, SAS7BDATParser, OneNoteParser and ImageParser.| Tim Allison |1.0-1.24|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/r463b1a67817ae55fe022536edd6db34e8f9636971188430cbcf8a8dd%40%3Cdev.tika.apache.org%3E} CVE-2020-1950}} | Excessive memory usage (DoS) vulnerability in Apache Tika's PSDParser |Pierre Ernst |1.0-1.23|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/rd8c1b42bd0e31870d804890b3f00b13d837c528f7ebaf77031323172%40%3Cdev.tika.apache.org%3E} CVE-2020-1951}} | Infinite Loop (DoS) vulnerability in Apache Tika's PSDParser |Tim Allison |1.0-1.23|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/fe876a649d9d36525dd097fe87ff4dcb3b82bb0fbb3a3d71fb72ef61@%3Cdev.tika.apache.org%3E} CVE-2019-10094}} | StackOverflow from Crafted Package/Compressed Files in Apache Tika's RecursiveParserWrapper|Tim Allison; files contributed by Matthew Barber and Erling Ellingsen |1.7-1.21|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/a5a44eff1b9eda3bc69d22943a1030c43d376380c75d3ab04d0c1a21@%3Cdev.tika.apache.org%3E} CVE-2019-10093}} | Denial of Service in Apache Tika's 2003ml and 2006ml Parsers|Tim Allison|1.19-1.21|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/1c63555609b737c20d1bbfa4a3e73ec488e3408a84e2f5e47e1b7e08@%3Cdev.tika.apache.org%3E} CVE-2019-10088}} | OOM from a crafted Zip File in Apache Tika's RecursiveParserWrapper|RunningSnail|1.7-1.21|
*-------------*-------------*----------------*------------------*
| {{{https://issues.apache.org/jira/browse/PDFBOX-4550} PDFBOX-4550}} | OOM from corrupt ToUnicode stream in PDFs|Tilman Hausherr|?-1.21|
*-------------*-------------*----------------*------------------*
| {{{https://nvd.nist.gov/vuln/detail/CVE-2019-0228} CVE-2019-0228}} | XML External Entity (XXE) in xfdf loading in PDFBox (regular Tika
parsing would likely not be vulnerable) |Kurt Boberg|?-1.20|
*-------------*-------------*----------------*------------------*
| {{{https://nvd.nist.gov/vuln/detail/CVE-2018-20346} CVE-2018-20346}} | (Provided) SQLite before 3.25.3 allows remote attackers to execute arbitrary code|Pat Cashman (notified Tika team)|?-1.20|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/7c021a4ea2037e52e74628e17e8e0e2acab1f447160edc8be0eae6d3@%3Cdev.tika.apache.org%3E}CVE-2018-17197}} | Infinite Loop in Tika's SQLite3Parser |Tim Allison |1.8-1.19.1|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/88de8350cda9b184888ec294c813c5bd8a2081de8fd3666f8904bc05@%3Cdev.tika.apache.org%3E}CVE-2018-11796}} | XML Entity Expansion in Tika's SAXParsers after reset() |Slava Gorelik |?-1.19|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/b7eb142436d2620646d1da087ca004159241d3930a9463b476700a4d@%3Cdev.pdfbox.apache.org%3E}CVE-2018-11797}} | Very long loop parsing page tree in PDFBox |Shawn Rasheed and Jens Dietrich |?-1.19|
*-------------*-------------*----------------*------------------*
| {{{http://mail-archives.us.apache.org/mod_mbox/www-announce/201808.mbox/%3C87in4apjvv.fsf@v45346.1blu.de%3E}CVE-2018-11771}} | Infinite Loop in Commons-Compress ZipArchiveInputStream |Tobias Ospelt |?-1.18|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/72df7a3f0dda49a912143a1404b489837a11f374dfd1961061873a91@%3Cdev.tika.apache.org%3E}CVE-2018-8017}} | Infinite Loop in IptcAnpaParser|Rohan Padhye and Tobias Ospelt |1.2-1.18|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/9f62f742fd4fcd81654a9533b8a71349b064250840592bcd502dcfb6@%3Cusers.pdfbox.apache.org%3E}CVE-2018-8036}} | Infinite Loop leading to OOM in PDFBox's
AFMParser|Tobias Ospelt |?-1.18|
*-------------*-------------*----------------*------------------*
| {{{https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-12418}CVE-2018-12418}} | Infinite Loop in junrar|Tobias Ospelt |?-1.18|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/5553e10bba5604117967466618f219c0cae710075819c70cfb3fb421@%3Cdev.tika.apache.org%3E}CVE-2018-11761}} | XML Entity Expansion Vulnerability|Renfei (Brian) Wang |0.1-1.18|
*-------------*-------------*----------------*------------------*
| {{{https://lists.apache.org/thread.html/ab2e1af38975f5fc462ba89b517971ef892ec3d06bee12ea2258895b@%3Cdev.tika.apache.org%3E}CVE-2018-11762}} | Rare Zip Slip Vulnerability in tika-app|Tim Allison |0.9-1.18|
*-------------*-------------*----------------*------------------*
| {{{http://mail.openjdk.java.net/pipermail/sound-dev/2015-September/000349.html}RIFFReader}} | Infinite Loop in AudioParser in Java 8 and 9|Sergey Bylokhov and Tobias Ospelt |?-1.18|
*-------------*-------------*----------------*------------------*
| {{{https://issues.apache.org/jira/browse/TIKA-2446}TIKA-2446}} | OOM detecting OPCPackage files with corrupt ZIP|Thorsten Schäfer |?-1.18|
*-------------*-------------*----------------*------------------*
| {{{https://issues.apache.org/jira/browse/PDFBOX-4014}PDFBOX-4014}} | Infinite loop in JBig2 (versions less than 3.0.0) | Hanno Böck | (if user supplied) ?-1.17|
*-------------*-------------*----------------*------------------*
| {{{https://www.cvedetails.com/cve/CVE-2018-1339}CVE-2018-1339}} | Infinite loop in ChmParser|Tobias Ospelt |?-1.17|
*-------------*-------------*----------------*------------------*
|{{{https://www.cvedetails.com/cve/CVE-2018-1338}CVE-2018-1338}} | Infinite loop in BPGParser| Tobias Ospelt | ?-1.17|
*-------------*-------------*----------------*------------------*
|{{{http://mail-archives.apache.org/mod_mbox/www-announce/201804.mbox/%3CCAC1dCwVhrPRyFJMS5BbY02%2B495CUODrAzndqZkvKacJnXUSm%2Bw%40mail.gmail.com%3E}CVE-2018-1335}} | Command Execution in tika-server | Tim Allison | ?-1.17|
*-------------*-------------*----------------*------------------*
|{{{https://www.cvedetails.com/cve/CVE-2017-12626}CVE-2017-12626}} | Apache POI - Infinite loops in WMF, EMF, MSG and macros; OOMs in DOC, PPT and XLS | Tim Allison, Luís Filipe Nassif and Jerome Lacoste| ?-1.17|
*-------------*-------------*----------------*------------------*
|{{{https://nvd.nist.gov/vuln/detail/CVE-2018-1324}CVE-2018-1324}} and {{{https://issues.apache.org/jira/browse/COMPRESS-432}COMPRESS-432}} | Commons Compress - Infinite loop in ZipFile | Luís Filipe Nassif and Anton Abashkin | ?-1.17|
*-------------*-------------*----------------*------------------*
|{{{https://www.cvedetails.com/cve/CVE-2018-7489/}CVE-2018-7489}} and {{{https://issues.apache.org/jira/browse/TIKA-2634}TIKA-2634}} | Jackson - Deserialization vulnerability | Richard Cyganiak (notified Tika team) | ?-1.17|
*-------------*-------------*----------------*------------------*
|{{{https://issues.apache.org/jira/browse/PDFBOX-3919}PDFBOX-3919}} | Apache PDFBox - Infinite loop | Hanno Böck and Andreas Bogk | ?-1.16|
*-------------*-------------*----------------*------------------*
|{{{https://issues.apache.org/jira/browse/TIKA-2115}TIKA-2115}} | Apache POI - OOM parsing OLE object| Thomas Galla | ?-1.15|
*-------------*-------------*----------------*------------------*
|{{{https://issues.apache.org/jira/browse/COMPRESS-382}COMPRESS-382}} | Commons Compress - OOM detecting corrupt LZMA | Luís Filipe Nassif | ?-1.15|
*-------------*-------------*----------------*------------------*
|{{{https://issues.apache.org/jira/browse/COMPRESS-386}COMPRESS-386}} and {{{https://issues.apache.org/jira/browse/TIKA-1631}TIKA-1631}} | Commons Compress - OOM detecting corrupt x-compress | Pavel Micka | ?-1.15|
*-------------*-------------*----------------*------------------*
|{{{https://issues.apache.org/jira/browse/TIKA-2045}TIKA-2045}} and {{{https://issues.apache.org/jira/browse/PDFBOX-3442}PDFBOX-3442}} | Apache PDFBox - OOM in font caching | Egbert | ?-1.13|
*-------------*-------------*----------------*------------------*
|{{{https://issues.apache.org/jira/browse/TIKA-1866}TIKA-1866}} and {{{https://issues.apache.org/jira/browse/TIKA-954}TIKA-954}} | Apache POI - OOM in DOCX and PPTX because of bug in Piccolo parser| Rob Tulloh and Shawn Johnson | ?-1.13|
*-------------*-------------*----------------*------------------*
|{{{https://issues.apache.org/jira/browse/TIKA-2040}TIKA-2040}} | GC-Overload and OOM in CHMParser| Luís Filipe Nassif | ?-1.13|
*-------------*-------------*----------------*------------------*
|{{{https://www.cvedetails.com/cve/CVE-2016-6809}CVE-2016-6809}} | jmatio - Deserialization Vulnerability in MATLAB parser| Pierre Ernst | 1.6-1.13|
*-------------*-------------*----------------*------------------*
|{{{https://www.cvedetails.com/cve/CVE-2016-4434}CVE-2016-4434}} | XXE Vulnerability in several parsers | Arthur Khashaev, Seulgi Kim, Mesut Timur (and Tim Allison while remediating initial issue reported by Arthur et al.)| 0.10-1.12|
*-------------*-------------*----------------*------------------*
|{{{https://nvd.nist.gov/vuln/detail/CVE-2016-2175}CVE-2016-2175}} | XML External Entity (XXE) in PDFBox | ???| ?-1.12|
*-------------*-------------*----------------*------------------*
|{{{https://www.cvedetails.com/cve/CVE-2015-3271}CVE-2015-3271}} | Remote Access to host files via tika-server| Tim Allison | 1.9?-1.10|
*-------------*-------------*----------------*------------------*
|{{{https://issues.apache.org/jira/browse/PDFBOX-2811}PDFBOX-2811}} | Apache PDFBox - Infinite Loop| Andreas Lehmkühler | ?-1.10|
*-------------*-------------*----------------*------------------*
|{{{https://issues.apache.org/jira/browse/PDFBOX-2200}PDFBOX-2200}} | Apache
PDFBox - Slowly building memory leak because of static caching of fonts| Matthew Buckett | ?-1.6|
*-------------*-------------*----------------*------------------*
|{{{https://issues.apache.org/jira/browse/TIKA-1471}TIKA-1471}} | Apache PDFBox - OOM with corrupt PDF| Alan Burlison | ?-1.6|
*-------------*-------------*----------------*------------------*
|{{{https://issues.apache.org/jira/browse/TIKA-788}TIKA-788}} | Infinite Loop in DWG | Stas Shaposhnikov | ?-1.4?|
*-------------*-------------*----------------*------------------*
|{{{https://issues.apache.org/jira/browse/TIKA-1132}TIKA-1132}} | Apache POI - Nearly Infinite Loop in XLS| Ryan Krueger | ?-1.4|
*-------------*-------------*----------------*------------------*
|{{{https://issues.apache.org/jira/browse/TIKA-1179}TIKA-1179}} | Infinite Loop in corrupt MP3| Marius Dumitru Florea| ?-1.4|
*-------------*-------------*----------------*------------------*
|{{{https://issues.apache.org/jira/browse/TIKA-866}TIKA-866}} | OOM reading Tika config file| Stephan Mühlstrasser | ?-1.1|
*-------------*-------------*----------------*------------------*

  Third party vulnerabilities that may or may not be triggerable via regular use of Apache Tika.

*-------------*-------------*----------------*------------------*
|CVE or Vulnerability| Description | Reporter | Affected Versions|
*-------------*-------------*----------------*------------------*
| {{{https://nvd.nist.gov/vuln/detail//CVE-2018-10237} CVE-2018-10237}} | Unbounded memory allocation in Google Guava|Pat Cashman (notified Tika team)|?-1.20|
*-------------*-------------*----------------*------------------*
|{{{https://nvd.nist.gov/vuln/detail/CVE-2018-19362}CVE-2018-19362}} |FasterXML jackson-databind may allow attackers to have unspecified impact from polymorphic deserialization |Pat Cashman (notified Tika team)| ?-1.20|
*-------------*-------------*----------------*------------------*

Acronyms and Terms

  * Command Execution -- A malicious client could execute arbitrary commands on tika-server's command line.

  * Deserialization Vulnerability -- {{{https://www.owasp.org/index.php/Deserialization_Cheat_Sheet}OWASP's Cheat Sheet}}. A malicious actor could run arbitrary code on your computer.

  * OOM -- Out of Memory Error -- Parsers may allocate more memory than is available. This can sometimes be caused by parsers not performing sanity checks before allocation. See, for example: {{{https://issues.apache.org/jira/browse/TIKA-1631}TIKA-1631}}.

  * XXE -- {{{https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing} XML External Entity Processing}}. A malicious client could access data on your system.
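
  The OOM entry above mentions parsers that allocate without a sanity check. The following is a minimal sketch of such a check, not code taken from Tika or any of its dependencies: the class name <<<BoundedRead>>>, the method <<<readRecord>>>, the 4-byte big-endian length prefix and the <<<MAX_RECORD_LENGTH>>> cap are all hypothetical and chosen only for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical example: validate a length field read from untrusted input
// before using it to size an allocation.
public class BoundedRead {

    // Assumed upper bound for a single record; a real parser would pick a
    // limit that suits its file format.
    static final int MAX_RECORD_LENGTH = 1_000_000;

    // Reads a record laid out as a 4-byte big-endian length followed by
    // that many payload bytes (an assumed layout for this sketch).
    static byte[] readRecord(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int declared = data.readInt();

        // Sanity check: reject negative or oversized lengths so that a
        // corrupt or malicious header cannot trigger a huge allocation.
        if (declared < 0 || declared > MAX_RECORD_LENGTH) {
            throw new IOException("Declared record length out of bounds: " + declared);
        }

        byte[] buffer = new byte[declared];
        // readFully throws EOFException if the stream ends early.
        data.readFully(buffer);
        return buffer;
    }
}
```

  Without the bounds check, a four-byte header claiming roughly 2 GB would lead straight to <<<new byte[declared]>>> and, on most heaps, an OutOfMemoryError before any payload is read.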