Class PDFParserConfig.OCRStrategyAuto

java.lang.Object
org.apache.tika.parser.pdf.PDFParserConfig.OCRStrategyAuto
All Implemented Interfaces:
Serializable
Enclosing class:
PDFParserConfig

public static class PDFParserConfig.OCRStrategyAuto extends Object implements Serializable
Encapsulates the thresholds that control the OCR strategy when it is set to auto.

If the total number of characters on the page < this.totalCharsPerPage, or the number of unmapped Unicode characters on the page > this.unmappedUnicodeCharsPerPage, then we will perform OCR on the page.

If unmappedUnicodeCharsPerPage is a whole number greater than 0 (e.g. 10), we compare it against the absolute number of unmapped Unicode characters on the page. If it is a float less than 1 (e.g. 0.1), we treat it as a percentage expressed as a fraction and compare it against the ratio of unmapped Unicode characters to total characters on the page.
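The rule above can be sketched as a small decision function. This is a hypothetical stand-alone sketch, not Tika's implementation: the class `AutoOcrDecision`, the method `shouldOcr`, and its parameter names are illustrative only. The two thresholds mirror the values returned by `getTotalCharsPerPage()` and `getUnmappedUnicodeCharsPerPage()`, and the sketch assumes that a threshold below 1 is a fraction of the page's total characters while a threshold of 1 or more is an absolute count.

```java
/**
 * Hypothetical sketch of the "auto" OCR decision described above.
 * NOT Tika's implementation; names are illustrative.
 */
final class AutoOcrDecision {

    private AutoOcrDecision() {
    }

    /**
     * Returns true if the page should be OCR'd under the documented rule.
     *
     * @param totalChars                  total characters extracted from the page
     * @param unmappedChars               characters on the page with no Unicode mapping
     * @param minTotalCharsPerPage        mirrors getTotalCharsPerPage()
     * @param unmappedUnicodeCharsPerPage mirrors getUnmappedUnicodeCharsPerPage()
     */
    static boolean shouldOcr(int totalChars, int unmappedChars,
                             int minTotalCharsPerPage,
                             float unmappedUnicodeCharsPerPage) {
        // Too little extractable text on the page: OCR it.
        if (totalChars < minTotalCharsPerPage) {
            return true;
        }
        if (unmappedUnicodeCharsPerPage < 1f) {
            // Assumption: a threshold below 1 is a fraction of the page's total characters.
            if (totalChars == 0) {
                return false; // no text at all, so there is no ratio to compare
            }
            return (float) unmappedChars / totalChars > unmappedUnicodeCharsPerPage;
        }
        // Assumption: a threshold of 1 or more is an absolute count of unmapped characters.
        return unmappedChars > unmappedUnicodeCharsPerPage;
    }

    public static void main(String[] args) {
        // Fraction mode: 0.1 means "more than 10% of the characters are unmapped".
        System.out.println(shouldOcr(200, 30, 10, 0.1f)); // 30/200 = 0.15 > 0.1, prints true
        System.out.println(shouldOcr(200, 10, 10, 0.1f)); // 10/200 = 0.05, prints false
        // Absolute mode: more than 50 unmapped characters.
        System.out.println(shouldOcr(500, 40, 10, 50f));  // 40 <= 50, prints false
        // Too few characters on the page, regardless of unmapped count.
        System.out.println(shouldOcr(5, 0, 10, 50f));     // 5 < 10, prints true
    }
}
```

Checking the total-character threshold first keeps the ratio branch from ever dividing by zero on an empty page when that threshold is positive; the explicit `totalChars == 0` guard covers the remaining case where it is 0.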

  • Constructor Details

    • OCRStrategyAuto

      public OCRStrategyAuto(float unmappedUnicodeCharsPerPage, int totalCharsPerPage)
  • Method Details

    • getUnmappedUnicodeCharsPerPage

      public float getUnmappedUnicodeCharsPerPage()
    • getTotalCharsPerPage

      public int getTotalCharsPerPage()
    • toString

      public String toString()
      Overrides:
      toString in class Object