Class BatchProcess

java.lang.Object
org.apache.tika.batch.BatchProcess
All Implemented Interfaces:
Callable<ParallelFileProcessingResult>

public class BatchProcess extends Object implements Callable<ParallelFileProcessingResult>
This is the main processor class for a single process. This class can only be run once.
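Because BatchProcess implements Callable, a caller can submit it to an ExecutorService or invoke call() directly, and a second invocation should be rejected. The following sketch shows one way to enforce such a run-once contract with an AtomicBoolean. RunOnceTask is a hypothetical stand-in that returns a String rather than a ParallelFileProcessingResult; it is not Tika's code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

public class RunOnceDemo {

    // Hypothetical stand-in for a Callable that, like BatchProcess, may only run once
    static class RunOnceTask implements Callable<String> {
        private final AtomicBoolean alreadyExecuted = new AtomicBoolean(false);

        @Override
        public String call() {
            // compareAndSet flips the flag atomically, so only the first caller proceeds
            if (!alreadyExecuted.compareAndSet(false, true)) {
                throw new IllegalStateException("Can only execute once.");
            }
            return "done"; // a real batch run would return its processing result here
        }
    }

    public static void main(String[] args) throws Exception {
        RunOnceTask task = new RunOnceTask();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = executor.submit(task);
            System.out.println(future.get()); // prints "done"
        } finally {
            executor.shutdown();
        }
        try {
            task.call(); // a second invocation is rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```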

It requires a FileResourceCrawler and FileResourceConsumers, and it can also support a StatusReporter and an Interrupter.
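Splitting the work between a crawler and consumers is a producer-consumer design. Here is a minimal sketch, assuming the crawler hands file names to consumers through a bounded BlockingQueue and signals completion with one sentinel value per consumer. CrawlerConsumerSketch, processAll, and END_OF_CRAWL are hypothetical names, and none of Tika's FileResourceCrawler or FileResourceConsumer API is used or implied.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CrawlerConsumerSketch {
    // Hypothetical sentinel that tells a consumer the crawl is finished
    private static final String END_OF_CRAWL = "__END_OF_CRAWL__";

    // One crawler thread feeds 'files' into a bounded queue; 'numConsumers'
    // threads drain it. Returns how many files the consumers processed.
    static int processAll(List<String> files, int numConsumers) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(numConsumers + 1);

        // Crawler: enqueue every file, then one sentinel per consumer
        pool.submit(() -> {
            for (String file : files) {
                queue.put(file); // blocks while the queue is full (back-pressure)
            }
            for (int i = 0; i < numConsumers; i++) {
                queue.put(END_OF_CRAWL);
            }
            return null;
        });

        // Consumers: take files until the sentinel arrives
        for (int i = 0; i < numConsumers; i++) {
            pool.submit(() -> {
                while (true) {
                    String file = queue.take();
                    if (END_OF_CRAWL.equals(file)) {
                        return null;
                    }
                    processed.incrementAndGet(); // stand-in for parsing 'file'
                }
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of("a.pdf", "b.docx", "c.txt"), 2)); // prints 3
    }
}
```

The bounded queue keeps the crawler from racing far ahead of slow consumers, since put() blocks once the queue is full.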

This class is designed to shut down if a parser times out or if an OutOfMemoryError occurs. Consider using BatchProcessDriverCLI as a daemon/watchdog that monitors this batch process and can restart it.
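The watchdog arrangement can be illustrated without Tika. A parent loop relaunches the child whenever it exits with a code that means "restart me" and stops on any other code. In this sketch, RESTART_EXIT_CODE and its value are hypothetical, and an IntSupplier stands in for launching the child and waiting for its exit code (a real driver might use ProcessBuilder.start() and Process.waitFor()). BatchProcessDriverCLI's actual protocol may differ.

```java
import java.util.function.IntSupplier;

public class WatchdogSketch {
    // Hypothetical exit code a child uses to ask for a restart
    // (for example after a parser timeout or an OutOfMemoryError)
    static final int RESTART_EXIT_CODE = 254;

    // Relaunches the child until it exits with something other than
    // RESTART_EXIT_CODE or until maxRestarts is used up; returns the number of launches.
    static int superviseChild(IntSupplier runChildAndWait, int maxRestarts) {
        int launches = 0;
        while (true) {
            int exitCode = runChildAndWait.getAsInt();
            launches++;
            if (exitCode != RESTART_EXIT_CODE || launches > maxRestarts) {
                return launches;
            }
            System.err.println("child requested restart; relaunching");
        }
    }

    public static void main(String[] args) {
        // Simulated child: asks for a restart twice, then finishes normally (exit 0)
        int[] codes = {RESTART_EXIT_CODE, RESTART_EXIT_CODE, 0};
        int[] next = {0};
        int launches = superviseChild(() -> codes[next[0]++], 5);
        System.out.println("launches: " + launches); // prints "launches: 3"
    }
}
```

Capping the number of restarts keeps a child that fails on every launch from looping forever.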

Note that this class redirects stderr to stdout so that it can communicate with the parent process over stderr without interference.
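The redirection amounts to keeping a private handle on the original System.err for parent communication while pointing System.err at System.out for everything else. Below is a minimal sketch of that technique (it needs Java 10 or later for the Charset-based constructors). redirectStderrToStdout and demo are hypothetical helpers, not Tika methods, and in-memory streams replace the real stdout/stderr so the effect is visible.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class StreamRedirectSketch {

    // Points System.err at System.out and returns a stream bound to the
    // original stderr, reserved for messages to the parent process.
    static PrintStream redirectStderrToStdout() {
        PrintStream originalErr = System.err;
        System.setErr(System.out);
        return new PrintStream(originalErr, true, StandardCharsets.UTF_8);
    }

    // Runs the redirection against in-memory stand-ins for stdout/stderr and
    // returns {text seen on stdout, text seen on stderr}.
    static String[] demo() {
        PrintStream savedOut = System.out;
        PrintStream savedErr = System.err;
        ByteArrayOutputStream fakeStdout = new ByteArrayOutputStream();
        ByteArrayOutputStream fakeStderr = new ByteArrayOutputStream();
        System.setOut(new PrintStream(fakeStdout, true, StandardCharsets.UTF_8));
        System.setErr(new PrintStream(fakeStderr, true, StandardCharsets.UTF_8));
        try {
            PrintStream toParent = redirectStderrToStdout();
            System.err.println("parser warning"); // now lands on stdout
            toParent.println("status message");   // still goes to the original stderr
        } finally {
            System.setOut(savedOut); // restore the real streams
            System.setErr(savedErr);
        }
        return new String[] {
            fakeStdout.toString(StandardCharsets.UTF_8).trim(),
            fakeStderr.toString(StandardCharsets.UTF_8).trim()
        };
    }

    public static void main(String[] args) {
        String[] seen = demo();
        System.out.println("stdout: " + seen[0]); // prints "stdout: parser warning"
        System.out.println("stderr: " + seen[1]); // prints "stderr: status message"
    }
}
```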

  • Constructor Details

  • Method Details

    • call

      public ParallelFileProcessingResult call() throws InterruptedException
      Runs the main execution loop.

      Redirects stderr to stdout so that communication with the parent process over the original stderr stays clean.

      Specified by:
      call in interface Callable<ParallelFileProcessingResult>
      Returns:
      result of the processing
      Throws:
      InterruptedException
    • setPauseOnEarlyTerminationMillis

      public void setPauseOnEarlyTerminationMillis(long pauseOnEarlyTerminationMillis)
      If processing terminates early (because of an interrupt, too many timed-out consumers, or a consumer or other Runnable throwing a Throwable), pause this long before interrupting the consumers and other threads.

      Typically, this should be equal to or slightly larger than timeoutThresholdMillis.

      Parameters:
      pauseOnEarlyTerminationMillis - how long to pause if there is an early termination
    • setTimeoutThresholdMillis

      public void setTimeoutThresholdMillis(long timeoutThresholdMillis)
      The amount of time, in milliseconds, a consumer may run before it is declared timed out.
      Parameters:
      timeoutThresholdMillis - threshold in milliseconds before declaring a consumer timed out
    • setTimeoutCheckPulseMillis

      public void setTimeoutCheckPulseMillis(long timeoutCheckPulseMillis)
      How often, in milliseconds, to check whether any consumer has timed out.
    • setMaxAliveTimeSeconds

      public void setMaxAliveTimeSeconds(int maxAliveTimeSeconds)
      The maximum amount of time that this process can be alive. To limit the effects of memory leaks, it is sometimes beneficial to shut down (and restart) the process periodically.

      If the value is < 0, the process will run until completion, interruption or exception.

      Parameters:
      maxAliveTimeSeconds - maximum amount of time in seconds to remain alive
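The four settings above can be pictured as parameters of a monitoring loop. The sketch below is one plausible arrangement under stated assumptions, not Tika's implementation. Each consumer records when it started its current file, and the monitor wakes once per TIMEOUT_CHECK_PULSE_MILLIS. On a timeout, or once MAX_ALIVE_TIME_SECONDS has elapsed, it pauses for PAUSE_ON_EARLY_TERMINATION_MILLIS before interrupting the consumers via shutdownNow(). Treating the max-alive limit as another kind of early termination is an assumption of this sketch. The class name, constant names, and values are hypothetical, and sleeping threads stand in for parsers.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLongArray;

public class TimeoutMonitorSketch {
    // Hypothetical values; the names echo the BatchProcess setters
    static final long TIMEOUT_THRESHOLD_MILLIS = 200;
    static final long TIMEOUT_CHECK_PULSE_MILLIS = 50;
    static final long PAUSE_ON_EARLY_TERMINATION_MILLIS = 250; // slightly > threshold
    static final int MAX_ALIVE_TIME_SECONDS = 5; // a value < 0 would disable the limit

    // Runs one simulated consumer per entry (each "parses" for the given number
    // of ms) and returns why monitoring stopped: COMPLETED, TIMEOUT, MAX_ALIVE,
    // or INTERRUPTED.
    static String runAndMonitor(long[] parseMillis) {
        int n = parseMillis.length;
        AtomicLongArray startedAt = new AtomicLongArray(n); // -1 = idle/finished
        ExecutorService consumers = Executors.newFixedThreadPool(n);
        long processStart = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            final int id = i;
            startedAt.set(id, System.currentTimeMillis());
            consumers.submit(() -> {
                try {
                    Thread.sleep(parseMillis[id]); // stand-in for parsing one file
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // interrupted by shutdownNow()
                } finally {
                    startedAt.set(id, -1);
                }
            });
        }
        consumers.shutdown(); // accept no new work so awaitTermination can succeed

        String reason = "COMPLETED";
        try {
            // Wake up once per pulse until every consumer has finished
            while (!consumers.awaitTermination(TIMEOUT_CHECK_PULSE_MILLIS, TimeUnit.MILLISECONDS)) {
                long now = System.currentTimeMillis();
                if (anyTimedOut(startedAt, now)) {
                    reason = "TIMEOUT";
                } else if (MAX_ALIVE_TIME_SECONDS >= 0
                        && now - processStart > MAX_ALIVE_TIME_SECONDS * 1000L) {
                    reason = "MAX_ALIVE";
                }
                if (!reason.equals("COMPLETED")) {
                    // Early termination: wait a grace period before interrupting
                    Thread.sleep(PAUSE_ON_EARLY_TERMINATION_MILLIS);
                    break;
                }
            }
        } catch (InterruptedException e) {
            reason = "INTERRUPTED"; // this sketch skips the pause in this case
            Thread.currentThread().interrupt();
        }
        consumers.shutdownNow(); // interrupts any consumer that is still running
        return reason;
    }

    // True if any busy consumer has exceeded TIMEOUT_THRESHOLD_MILLIS
    static boolean anyTimedOut(AtomicLongArray startedAt, long now) {
        for (int i = 0; i < startedAt.length(); i++) {
            long start = startedAt.get(i);
            if (start >= 0 && now - start > TIMEOUT_THRESHOLD_MILLIS) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(runAndMonitor(new long[] {10, 20}));     // prints COMPLETED
        System.out.println(runAndMonitor(new long[] {10, 10_000})); // prints TIMEOUT
    }
}
```

A shorter pulse detects timeouts sooner at the cost of more frequent wake-ups; the pause gives a consumer that is about to finish a chance to do so before it is interrupted.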