Class TextStatsFromTikaEval

java.lang.Object
org.apache.tika.example.TextStatsFromTikaEval

public class TextStatsFromTikaEval extends Object
These examples create a new CompositeTextStatsCalculator for each call. This is extremely inefficient because the language identification model and the common words lists have to be loaded again on every call.
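
The cost described above can usually be avoided by building the calculator once and reusing it. The following is a minimal sketch of that pattern, not tika-eval code: `ExpensiveCalculator` and `ReusingStats` are hypothetical stand-ins, and the sketch assumes the real calculator is safe to share between calls, which should be checked against the tika-eval documentation.

```java
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a calculator that is costly to construct,
// such as CompositeTextStatsCalculator. It is NOT the tika-eval class.
class ExpensiveCalculator {
    // Counts how many times the costly setup has run.
    static final AtomicInteger LOADS = new AtomicInteger();

    private final Set<String> commonWords;

    ExpensiveCalculator() {
        LOADS.incrementAndGet();
        // A real calculator would load models and word lists here.
        this.commonWords = Set.of("the", "and", "of");
    }

    boolean isCommon(String token) {
        return commonWords.contains(token.toLowerCase(Locale.ROOT));
    }
}

// Hypothetical wrapper that builds the calculator once and reuses it.
class ReusingStats {
    // Created a single time when this class is initialized.
    private static final ExpensiveCalculator CALCULATOR = new ExpensiveCalculator();

    static boolean isCommon(String token) {
        return CALCULATOR.isCommon(token);
    }
}
```

With this layout, repeated calls to `ReusingStats.isCommon` share one `ExpensiveCalculator`, so the setup cost is paid once rather than per call.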
  • Constructor Details

    • TextStatsFromTikaEval

      public TextStatsFromTikaEval()
  • Method Details

    • getOOV

      public double getOOV(String txt)
      Use the default language id models and the default common tokens lists in tika-eval to calculate the out-of-vocabulary percentage for a given string.
      Parameters:
      txt - the string to analyze
      Returns:
      the out-of-vocabulary percentage for txt
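
To make the idea of an out-of-vocabulary measure concrete, here is a simplified sketch. It is not how tika-eval computes the value: the class `OovSketch`, its tiny `COMMON` word list, the tokenization rule, and the handling of empty input are all assumptions made for illustration. The sketch returns a fraction between 0 and 1; this page does not state whether getOOV reports a fraction or a value out of 100.

```java
import java.util.Locale;
import java.util.Set;

// Simplified illustration only. tika-eval's tokenization and its
// per-language common token lists are more involved than this.
class OovSketch {
    // Hypothetical tiny word list standing in for a real common tokens list.
    private static final Set<String> COMMON = Set.of("the", "quick", "brown", "fox");

    // Returns the fraction of alphabetic tokens that are not in COMMON.
    static double oov(String txt) {
        int alphabetic = 0;
        int common = 0;
        // Split on any run of non-letter characters.
        for (String token : txt.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // skip the empty piece produced by leading separators
            }
            alphabetic++;
            if (COMMON.contains(token)) {
                common++;
            }
        }
        if (alphabetic == 0) {
            return 0.0; // assumed convention for input with no alphabetic tokens
        }
        return 1.0 - (double) common / alphabetic;
    }
}
```

For example, `OovSketch.oov("the quick zzz qqq")` finds two of four tokens in the list and so yields 0.5.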