Class InputStreamDigester

java.lang.Object
org.apache.tika.parser.digest.InputStreamDigester
All Implemented Interfaces:
DigestingParser.Digester

public class InputStreamDigester extends Object implements DigestingParser.Digester
  • Constructor Details

    • InputStreamDigester

      public InputStreamDigester(int markLimit, String algorithm, DigestingParser.Encoder encoder)
    • InputStreamDigester

      public InputStreamDigester(int markLimit, String algorithm, String algorithmKeyName, DigestingParser.Encoder encoder)
      Parameters:
      markLimit - limit in bytes to allow for mark/reset. If the input stream is longer than this limit, the stream will be reset and then spooled to a temporary file. The constructor throws IllegalArgumentException if markLimit is negative.
      algorithm - name of the digest algorithm to retrieve from the Provider
      algorithmKeyName - name of the algorithm to store as part of the key in the metadata when digest(InputStream, Metadata, ParseContext) is called
      encoder - encoder to convert the byte array returned from the digester to a string
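
To illustrate the role of the encoder parameter, here is a minimal sketch that turns the raw bytes from a MessageDigest into a lowercase hexadecimal string. It uses only the JDK. The ByteEncoder interface, the HEX constant and the hexDigest helper are hypothetical names introduced for this example; ByteEncoder is only a stand-in for DigestingParser.Encoder, whose real definition is not reproduced here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HexEncoderSketch {

    // Hypothetical stand-in for DigestingParser.Encoder (assumption: the real
    // encoder also maps a byte array to a String).
    interface ByteEncoder {
        String encode(byte[] bytes);
    }

    // Lowercase hexadecimal encoding, two characters per byte.
    static final ByteEncoder HEX = bytes -> {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    };

    // Digests the UTF-8 bytes of text and returns the encoded result.
    static String hexDigest(String algorithm, String text) {
        try {
            byte[] raw = MessageDigest.getInstance(algorithm)
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            return HEX.encode(raw);
        } catch (NoSuchAlgorithmException e) {
            // Unknown algorithm names are reported as an illegal argument.
            throw new IllegalArgumentException(e);
        }
    }
}
```

Any other mapping from bytes to text (Base64, for instance) could be plugged in the same way; hexadecimal is used here only because it is easy to compare by eye.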
  • Method Details

    • getProvider

      protected Provider getProvider()
      When subclassing, make sure that the provider you return is thread-safe (which is unlikely) or return a new provider on each call.
      Returns:
      provider to use to get the MessageDigest from the algorithm name. Default is to return null.
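
The following sketch shows one way a null provider and an explicit provider could lead to different MessageDigest lookups. It relies only on the JDK's MessageDigest.getInstance overloads. The class and the newDigest helper are hypothetical, and the assumption that a null provider falls back to the JVM's default lookup is an illustration, not a statement about this class's internals.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.Provider;

class ProviderLookupSketch {

    // Hypothetical helper: resolves a MessageDigest by algorithm name.
    // Assumption: a null provider means "use the JVM's installed providers".
    static MessageDigest newDigest(String algorithm, Provider provider) {
        try {
            return provider == null
                    ? MessageDigest.getInstance(algorithm)
                    : MessageDigest.getInstance(algorithm, provider);
        } catch (NoSuchAlgorithmException e) {
            // Surface an unknown algorithm as an unchecked exception.
            throw new IllegalArgumentException(e);
        }
    }
}
```

A new MessageDigest is created on every call because MessageDigest instances hold state and are not safe to share between threads.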
    • digest

      public void digest(InputStream is, Metadata metadata, ParseContext parseContext) throws IOException
      Description copied from interface: DigestingParser.Digester
      Digests an InputStream and sets the appropriate value(s) in the metadata. The Digester is also responsible for marking and resetting the stream.

      The given stream is guaranteed to support the mark feature. The digester is expected to mark the stream before reading any bytes from it and to reset the stream before returning. The digester must not close the stream.

      Specified by:
      digest in interface DigestingParser.Digester
      Parameters:
      is - InputStream to digest. Best to use a TikaInputStream because of potential need to spool to disk. InputStream must support mark/reset.
      metadata - metadata in which to store the digest information
      parseContext - ParseContext -- not actually used yet, but there for future expansion
      Throws:
      IOException - on an I/O problem
      IllegalArgumentException - if the algorithm could not be found
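
The mark, read and reset contract described above can be sketched with the JDK alone. This is a simplified illustration, not the class's implementation: the digestAndReset helper is hypothetical, it assumes the whole stream fits within markLimit bytes (it digests at most that many), and it omits the spooling to a temporary file as well as the writing of the result into Metadata.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class MarkResetDigestSketch {

    // Hypothetical helper: digests up to markLimit bytes and then rewinds
    // the stream to the position it had on entry. The stream is not closed.
    static byte[] digestAndReset(InputStream is, String algorithm, int markLimit)
            throws IOException {
        if (markLimit < 0) {
            throw new IllegalArgumentException("markLimit must be >= 0");
        }
        if (!is.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException(e);
        }

        is.mark(markLimit); // remember the current position
        try {
            byte[] buffer = new byte[4096];
            int remaining = markLimit;
            int read;
            // Never read past markLimit, so the mark stays valid.
            while (remaining > 0
                    && (read = is.read(buffer, 0, Math.min(buffer.length, remaining))) != -1) {
                md.update(buffer, 0, read);
                remaining -= read;
            }
        } finally {
            is.reset(); // rewind even if reading failed
        }
        return md.digest();
    }
}
```

After the call returns, the caller can read the same bytes again from the start, which is what lets a later parsing step see the full content.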