Class SecureContentHandler

All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler

public class SecureContentHandler extends ContentHandlerDecorator
Content handler decorator that attempts to prevent denial of service attacks against Tika parsers.

Currently this class simply compares the number of output characters to the number of input bytes and keeps track of the XML nesting levels. An exception is thrown if the output seems excessive compared to the input document, which is a strong indication of a zip bomb.

Since:
Apache Tika 0.4
  • Constructor Details

    • SecureContentHandler

      public SecureContentHandler(ContentHandler handler, TikaInputStream stream)
      Decorates the given content handler with zip bomb prevention based on the count of bytes read from the given input stream. The resulting decorator can be passed to a Tika parser along with that same input stream.
      Parameters:
      handler - the content handler to be decorated
      stream - the input stream to be parsed
  • Method Details

    • getOutputThreshold

      public long getOutputThreshold()
      Returns the configured output threshold.
      Returns:
      output threshold
    • setOutputThreshold

      public void setOutputThreshold(long threshold)
      Sets the threshold for output characters before the zip bomb prevention is activated. This avoids false positives in cases where an otherwise normal document for some reason starts with a highly compressible sequence of bytes.
      Parameters:
      threshold - new output threshold
    • getMaximumCompressionRatio

      public long getMaximumCompressionRatio()
      Returns the maximum compression ratio.
      Returns:
      maximum compression ratio
    • setMaximumCompressionRatio

      public void setMaximumCompressionRatio(long ratio)
      Sets the maximum allowed ratio of output characters to input bytes. If this ratio is exceeded (after the output threshold has been reached), an exception is thrown.
      Parameters:
      ratio - new maximum compression ratio
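
How the output threshold and the maximum compression ratio work together can be sketched as a single predicate. This is a minimal sketch under stated assumptions, not Tika's actual code: the class `RatioCheck`, the method `looksLikeZipBomb`, and the use of strict `>` comparisons are all hypothetical.

```java
// Hypothetical helper for illustration; not part of Tika.
final class RatioCheck {
    private RatioCheck() {}

    /**
     * Returns true only when the output has passed the threshold AND
     * exceeds ratio times the number of input bytes read so far.
     * Strict comparisons are an assumption of this sketch.
     */
    static boolean looksLikeZipBomb(long outputChars, long inputBytes,
                                    long threshold, long ratio) {
        return outputChars > threshold && outputChars > inputBytes * ratio;
    }
}
```

With a threshold of 1,000,000 and a ratio of 100, a document that has produced 5,000 characters from 10 input bytes is not flagged, because the threshold has not been reached yet. The same 10 bytes producing 2,000,000 characters would be flagged.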
    • getMaximumDepth

      public int getMaximumDepth()
      Returns the maximum XML element nesting level.
      Returns:
      maximum XML element nesting level
    • setMaximumDepth

      public void setMaximumDepth(int depth)
      Sets the maximum XML element nesting level. If this depth level is exceeded then an exception gets thrown.
      Parameters:
      depth - maximum XML element nesting level
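
The nesting limit described above can be illustrated with a plain SAX handler that counts open elements. This is a rough sketch rather than Tika's implementation: `DepthLimitingHandler` is a hypothetical class, and whether the real check uses `>` or `>=` is an assumption.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; not part of Tika.
class DepthLimitingHandler extends DefaultHandler {
    private final int maximumDepth;
    private int currentDepth = 0;

    DepthLimitingHandler(int maximumDepth) {
        this.maximumDepth = maximumDepth;
    }

    @Override
    public void startElement(String uri, String localName, String name, Attributes atts)
            throws SAXException {
        // Each opened element goes one level deeper.
        currentDepth++;
        if (currentDepth > maximumDepth) {
            throw new SAXException("Element nesting deeper than " + maximumDepth);
        }
    }

    @Override
    public void endElement(String uri, String localName, String name) {
        // Closing an element returns to the parent level.
        currentDepth--;
    }

    int currentDepth() {
        return currentDepth;
    }
}
```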
    • getMaximumPackageEntryDepth

      public int getMaximumPackageEntryDepth()
      Returns the maximum package entry nesting level.
      Returns:
      maximum package entry nesting level
    • setMaximumPackageEntryDepth

      public void setMaximumPackageEntryDepth(int depth)
      Sets the maximum package entry nesting level. If this depth level is exceeded then an exception gets thrown.
      Parameters:
      depth - maximum package entry nesting level
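
A package entry limit differs from the element limit in that only elements marking an embedded entry count toward it. The sketch below assumes, without confirmation from this page, that an entry is marked by a `div` element whose `class` attribute is `package-entry`; `PackageEntryDepthHandler` and its members are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; not part of Tika.
class PackageEntryDepthHandler extends DefaultHandler {
    private final int maximumPackageEntryDepth;
    private int currentDepth = 0;
    // Element depths at which the currently open package entries started.
    private final Deque<Integer> openEntries = new ArrayDeque<>();

    PackageEntryDepthHandler(int maximumPackageEntryDepth) {
        this.maximumPackageEntryDepth = maximumPackageEntryDepth;
    }

    @Override
    public void startElement(String uri, String localName, String name, Attributes atts)
            throws SAXException {
        currentDepth++;
        // Assumed marker for an embedded entry: <div class="package-entry">.
        if ("div".equals(name) && "package-entry".equals(atts.getValue("class"))) {
            openEntries.push(currentDepth);
            if (openEntries.size() > maximumPackageEntryDepth) {
                throw new SAXException(
                        "Package entries nested deeper than " + maximumPackageEntryDepth);
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String name) {
        // Close the innermost entry only when its own element ends.
        if (!openEntries.isEmpty() && openEntries.peek() == currentDepth) {
            openEntries.pop();
        }
        currentDepth--;
    }

    int openEntryCount() {
        return openEntries.size();
    }
}
```

Remembering the element depth at which each entry opened lets ordinary elements inside an entry close without affecting the entry count.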
    • throwIfCauseOf

      public void throwIfCauseOf(SAXException e) throws TikaException
      Converts the given SAXException to a corresponding TikaException if it's caused by this instance detecting a zip bomb.
      Parameters:
      e - SAX exception
      Throws:
      TikaException - zip bomb exception
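
One way to tell whether a SAXException originated from a particular handler instance is to have that handler throw a private exception subtype that remembers its owner, then walk the cause chain. The following is a sketch of that pattern only: `Guard`, `GuardSAXException`, `newViolation`, and `ZipBombException` (a stand-in for TikaException so the example compiles on its own) are hypothetical.

```java
import org.xml.sax.SAXException;

// Stand-in for TikaException; hypothetical, used only in this sketch.
class ZipBombException extends Exception {
    ZipBombException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical guard for illustration; not part of Tika.
class Guard {
    // Inner exception type that remembers which Guard instance raised it.
    private final class GuardSAXException extends SAXException {
        GuardSAXException(String message) {
            super(message);
        }

        boolean isCausedBy(Guard guard) {
            return Guard.this == guard;
        }
    }

    // Creates the exception this guard would throw on a violation.
    SAXException newViolation(String message) {
        return new GuardSAXException(message);
    }

    // Rethrows as ZipBombException if this guard raised e or one of its causes.
    void throwIfCauseOf(SAXException e) throws ZipBombException {
        if (e instanceof GuardSAXException && ((GuardSAXException) e).isCausedBy(this)) {
            throw new ZipBombException("Zip bomb detected!", e);
        }
        if (e.getCause() instanceof SAXException) {
            throwIfCauseOf((SAXException) e.getCause());
        }
    }
}
```

A caller would typically invoke `throwIfCauseOf` in a `catch (SAXException e)` block around parsing and fall through to generic error handling when it returns normally.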
    • advance

      protected void advance(int length) throws SAXException
      Records the given number of output characters (or, more accurately, UTF-16 code units). Throws an exception if the recorded number of characters greatly exceeds the number of input bytes read.
      Parameters:
      length - number of new output characters produced
      Throws:
      SAXException - if a zip bomb is detected
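
The distinction between characters and UTF-16 code units matters because SAX reports text as `char[]` ranges, so a single supplementary code point counts as two. The sketch below shows one plausible way the text callbacks could feed a counter like `advance`. It is an assumption that `characters` and `ignorableWhitespace` both do so, and `OutputCountingHandler` with its `LongSupplier` byte source is hypothetical.

```java
import java.util.function.LongSupplier;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; not part of Tika.
class OutputCountingHandler extends DefaultHandler {
    private final LongSupplier inputBytesRead; // e.g. bytes consumed from the stream so far
    private final long threshold;
    private final long ratio;
    private long outputChars = 0;

    OutputCountingHandler(LongSupplier inputBytesRead, long threshold, long ratio) {
        this.inputBytesRead = inputBytesRead;
        this.threshold = threshold;
        this.ratio = ratio;
    }

    // Records new output and fails once it greatly exceeds the input read.
    protected void advance(int length) throws SAXException {
        outputChars += length; // length is measured in UTF-16 code units
        if (outputChars > threshold
                && outputChars > inputBytesRead.getAsLong() * ratio) {
            throw new SAXException("Suspected zip bomb: " + outputChars
                    + " output characters from " + inputBytesRead.getAsLong()
                    + " input bytes");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        advance(length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        advance(length);
    }

    long outputChars() {
        return outputChars;
    }
}
```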
    • startElement

      public void startElement(String uri, String localName, String name, Attributes atts) throws SAXException
      Specified by:
      startElement in interface ContentHandler
      Overrides:
      startElement in class ContentHandlerDecorator
      Throws:
      SAXException
    • endElement

      public void endElement(String uri, String localName, String name) throws SAXException
      Specified by:
      endElement in interface ContentHandler
      Overrides:
      endElement in class ContentHandlerDecorator
      Throws:
      SAXException
    • characters

      public void characters(char[] ch, int start, int length) throws SAXException
      Specified by:
      characters in interface ContentHandler
      Overrides:
      characters in class ContentHandlerDecorator
      Throws:
      SAXException
    • ignorableWhitespace

      public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
      Specified by:
      ignorableWhitespace in interface ContentHandler
      Overrides:
      ignorableWhitespace in class ContentHandlerDecorator
      Throws:
      SAXException