public interface CollectionProcessingManager
A CollectionProcessingManager (CPM) manages the application of an AnalysisEngine to a collection of artifacts. For text analysis applications, this will be a collection of documents. The analysis results will then be delivered to one or more CasConsumers.
The CPM is configured with an Analysis Engine and CAS Consumers by calling its setAnalysisEngine(AnalysisEngine) and addCasConsumer(CasConsumer) methods. Collection processing is then initiated by calling the process(CollectionReader) or process(CollectionReader,int) method.
The process methods take a CollectionReader object as an argument. The Collection Reader retrieves each artifact from the collection as a CAS object.
Listeners can register with the CPM by calling the addStatusCallbackListener(StatusCallbackListener) method. These listeners receive status callbacks during processing. At any time, performance and progress reports are available from the getPerformanceReport() and getProgress() methods.
A CPM implementation may choose to implement parallelization of the processing, but this is not a requirement of the architecture.
Note that a CPM only supports processing one collection at a time. Attempting to reconfigure a CPM or start a new processing job while a previous processing job is still running will result in a UIMA_IllegalStateException. To process multiple collections simultaneously, instantiate and configure multiple instances of the CPM.
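The one-collection-at-a-time rule above can be pictured as a simple state guard. The following is a minimal plain-Java sketch, not UIMA code: ToyCpm, ToyCpmDemo, and all of their members are hypothetical names introduced for illustration, strings stand in for the AnalysisEngine and the collection, and java.lang.IllegalStateException stands in for UIMA_IllegalStateException.

```java
// Hypothetical plain-Java model of the rule above; ToyCpm is not a UIMA class.
final class ToyCpm {
    private String engineName;          // stand-in for the configured AnalysisEngine
    private boolean processing = false; // true while a job is running

    // Reconfiguring is rejected while a job is running.
    void setEngineName(String name) {
        requireIdle();
        engineName = name;
    }

    String getEngineName() {
        return engineName;
    }

    // Starting a job is rejected while another job is running.
    void process(String collectionName) {
        requireIdle();
        processing = true;
    }

    // Stand-in for the job completing; afterwards this instance is free again.
    void finish() {
        processing = false;
    }

    boolean isProcessing() {
        return processing;
    }

    private void requireIdle() {
        if (processing) {
            // java.lang.IllegalStateException stands in for UIMA_IllegalStateException.
            throw new IllegalStateException("CPM is currently processing");
        }
    }
}

final class ToyCpmDemo {
    public static void main(String[] args) {
        ToyCpm first = new ToyCpm();
        ToyCpm second = new ToyCpm();
        first.process("collection-1");
        second.process("collection-2"); // a separate instance runs its own job
        try {
            first.process("collection-3"); // same instance, job still running
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model, the second collection is handled by a second instance rather than by queuing work on the first, which mirrors the advice above to create one CPM per concurrently processed collection.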
A CollectionProcessingManager instance can be obtained by calling UIMAFramework.newCollectionProcessingManager().
Modifier and Type | Method and Description |
---|---|
void | addCasConsumer(CasConsumer aCasConsumer): Adds a CasConsumer to this CPM. |
void | addStatusCallbackListener(StatusCallbackListener aListener): Registers a listener to receive status callbacks. |
AnalysisEngine | getAnalysisEngine(): Gets the AnalysisEngine that is assigned to this CPM. |
CasConsumer[] | getCasConsumers(): Gets the CasConsumers assigned to this CPM. |
ProcessTrace | getPerformanceReport(): Gets a performance report for the processing that is currently occurring or has just completed. |
Progress[] | getProgress(): Gets a progress report for the processing that is currently occurring or has just completed. |
boolean | isPaused(): Determines whether this CPM's processing is currently paused. |
boolean | isPauseOnException(): Gets whether this CPM will automatically pause processing if an exception occurs. |
boolean | isProcessing(): Determines whether this CPM is currently processing. |
boolean | isSerialProcessingRequired(): Gets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization). |
void | pause(): Pauses processing. |
void | process(CollectionReader aCollectionReader): Initiates processing of a collection. |
void | process(CollectionReader aCollectionReader, int aBatchSize): Initiates processing of a collection. |
void | removeCasConsumer(CasConsumer aCasConsumer): Removes a CasConsumer from this CPM. |
void | removeStatusCallbackListener(StatusCallbackListener aListener): Unregisters a status callback listener. |
void | resume(): Resumes processing that has been paused. |
void | resume(boolean aRetryFailed): Resumes processing that has been paused. |
void | setAnalysisEngine(AnalysisEngine aAnalysisEngine): Sets the AnalysisEngine that is assigned to this CPM. |
void | setPauseOnException(boolean aPause): Sets whether this CPM will automatically pause processing if an exception occurs. |
void | setSerialProcessingRequired(boolean aRequired): Sets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization). |
void | stop(): Stops processing. |
AnalysisEngine getAnalysisEngine()
Gets the AnalysisEngine that is assigned to this CPM.
Returns:
the AnalysisEngine that this CPM will use to analyze each CAS in the collection.

void setAnalysisEngine(AnalysisEngine aAnalysisEngine) throws ResourceConfigurationException
Sets the AnalysisEngine that is assigned to this CPM.
Parameters:
aAnalysisEngine - the AnalysisEngine that this CPM will use to analyze each CAS in the collection.
Throws:
ResourceConfigurationException - if this CPM is currently processing

CasConsumer[] getCasConsumers()
Gets the CasConsumers assigned to this CPM.
Returns:
an array of CasConsumers

void addCasConsumer(CasConsumer aCasConsumer) throws ResourceConfigurationException
Adds a CasConsumer to this CPM.
Parameters:
aCasConsumer - a CasConsumer to add
Throws:
ResourceConfigurationException - if this CPM is currently processing

void removeCasConsumer(CasConsumer aCasConsumer)
Removes a CasConsumer from this CPM.
Parameters:
aCasConsumer - the CasConsumer to remove
Throws:
UIMA_IllegalStateException - if this CPM is currently processing

boolean isSerialProcessingRequired()
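Gets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization). Note that a value of false does not guarantee that parallelization is performed; this is left up to the CPM implementation.

void setSerialProcessingRequired(boolean aRequired)
Sets whether this CPM is required to process the collection's elements serially (as opposed to performing parallelization). If this method is not called, the default is false.
Note that a value of false does not guarantee that parallelization is performed; this is left up to the CPM implementation.
Parameters:
aRequired - true if and only if serial processing is required
Throws:
UIMA_IllegalStateException - if this CPM is currently processing

boolean isPauseOnException()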
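Gets whether this CPM will automatically pause processing if an exception occurs. If processing is paused, it can be resumed by calling the resume(boolean) method.

void setPauseOnException(boolean aPause)
Sets whether this CPM will automatically pause processing if an exception occurs. If processing is paused, it can be resumed by calling the resume(boolean) method.
Parameters:
aPause - true if and only if this CPM should pause on exception
Throws:
UIMA_IllegalStateException - if this CPM is currently processing

void addStatusCallbackListener(StatusCallbackListener aListener)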
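Registers a listener to receive status callbacks.
Parameters:
aListener - the listener to add

void removeStatusCallbackListener(StatusCallbackListener aListener)
Unregisters a status callback listener.
Parameters:
aListener - the listener to remove

void process(CollectionReader aCollectionReader) throws ResourceInitializationException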
Initiates processing of a collection. The status of the processing can be obtained by registering a listener with the addStatusCallbackListener(StatusCallbackListener) method.
A CPM can only process one collection at a time. If this method is called while a previous processing request has not yet completed, a UIMA_IllegalStateException will result. To find out whether a CPM is free to begin another processing request, call the isProcessing() method.
Parameters:
aCollectionReader - the CollectionReader from which to obtain the entities to be processed
Throws:
ResourceInitializationException - if an error occurs during initialization
UIMA_IllegalStateException - if this CPM is currently processing

void process(CollectionReader aCollectionReader, int aBatchSize) throws ResourceInitializationException
Initiates processing of a collection. This method is the same as process(CollectionReader), but it breaks the processing up into batches of a size determined by the aBatchSize parameter. Each CasConsumer will be notified at the end of each batch.
Parameters:
aCollectionReader - the CollectionReader from which to obtain the entities to be processed
aBatchSize - the size of the batch
Throws:
ResourceInitializationException - if an error occurs during initialization
UIMA_IllegalStateException - if this CPM is currently processing

boolean isProcessing()
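Determines whether this CPM is currently processing, that is, whether a processing request has been started and has not yet completed or been stopped (see stop()). If processing is paused, this method will still return true.

void pause()
Pauses processing. Processing can be resumed by calling the resume(boolean) method.
Throws:
UIMA_IllegalStateException - if no processing is currently occurring

boolean isPaused()
Determines whether this CPM's processing is currently paused.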
void resume(boolean aRetryFailed)
Resumes processing that has been paused.
Parameters:
aRetryFailed - if processing was paused because an exception occurred (see setPauseOnException(boolean)), setting a value of true for this parameter will cause the failed entity to be retried. A value of false (the default) will cause processing to continue with the next entity after the failure.
Throws:
UIMA_IllegalStateException - if processing is not currently paused

void resume()
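Resumes processing that has been paused.
Throws:
UIMA_IllegalStateException - if processing is not currently paused

void stop()
Stops processing.
Throws:
UIMA_IllegalStateException - if no processing is currently occurring

ProcessTrace getPerformanceReport()
Gets a performance report for the processing that is currently occurring or has just completed.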
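The pause, resume, and stop rules documented above can be summarized as a small state model. The following is a plain-Java sketch under stated assumptions, not UIMA code: ToyJob, ToyJobDemo, failAt, and nextIndex are hypothetical names, java.lang.IllegalStateException stands in for UIMA_IllegalStateException, and resume() is assumed to behave like resume(false) because the parameter description calls false the default.

```java
// Hypothetical plain-Java model of the pause/resume/stop rules; not a UIMA class.
final class ToyJob {
    private boolean processing = false;
    private boolean paused = false;
    private int failedIndex = -1; // entity that caused the pause, or -1 if none
    private int nextIndex = 0;    // entity that will be processed next

    void start() {
        processing = true;
        paused = false;
        failedIndex = -1;
        nextIndex = 0;
    }

    void pause() {
        if (!processing) {
            throw new IllegalStateException("no processing is currently occurring");
        }
        paused = true;
    }

    // Models pause-on-exception: the job pauses at the entity that failed.
    void failAt(int index) {
        pause();
        failedIndex = index;
    }

    void resume(boolean retryFailed) {
        if (!paused) {
            throw new IllegalStateException("processing is not currently paused");
        }
        if (failedIndex >= 0) {
            // true: retry the failed entity; false: continue with the one after it.
            nextIndex = retryFailed ? failedIndex : failedIndex + 1;
            failedIndex = -1;
        }
        paused = false;
    }

    // Assumption: the no-argument form behaves like resume(false).
    void resume() {
        resume(false);
    }

    void stop() {
        if (!processing) {
            throw new IllegalStateException("no processing is currently occurring");
        }
        processing = false;
        paused = false;
    }

    boolean isProcessing() { return processing; }
    boolean isPaused() { return paused; }
    int nextIndex() { return nextIndex; }
}

final class ToyJobDemo {
    public static void main(String[] args) {
        ToyJob job = new ToyJob();
        job.start();
        job.failAt(3);
        // A paused job still counts as processing.
        System.out.println(job.isPaused() + " " + job.isProcessing());
        job.resume(true);
        System.out.println(job.nextIndex());
        job.stop();
    }
}
```

Note that a paused job in this model still reports isProcessing() as true, matching the description of isProcessing(); only stop() clears that flag.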
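Progress[] getProgress()
Gets a progress report for the processing that is currently occurring or has just completed.
Returns:
an array of Progress objects, each of which represents the progress in a different set of units (for example, number of entities or bytes)

Copyright © 2006–2017 The Apache Software Foundation. All rights reserved.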