A Flow Controller is a component that plugs into an Aggregate Analysis Engine. When a CAS is input to the Aggregate, the Flow Controller determines the order in which the components of that aggregate are invoked on that CAS. The ability to provide your own Flow Controller implementation is new as of release 2.0 of UIMA.
Flow Controllers may decide the flow dynamically, based on the contents of the CAS. For example, you could develop a Flow Controller that first sends each CAS to a Language Identification Annotator and then, based on the output of the Language Identification Annotator, routes that CAS to an Annotator that is specialized for that particular language.
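As a rough illustration of such a content-based decision, the sketch below models the language example without the UIMA libraries. It assumes a delegate keyed LangId and two language-specific delegates, EnglishAnnotator and GermanAnnotator. These keys, the LanguageRoutingFlow class, and the simplified Step, SimpleStep, and FinalStep classes are all hypothetical stand-ins, and the detected language is passed in directly instead of being read from the CAS.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the framework's step classes; NOT the UIMA API.
abstract class Step {}

class SimpleStep extends Step {
    final String aeKey; // key of the delegate that should receive the CAS next
    SimpleStep(String aeKey) { this.aeKey = aeKey; }
}

class FinalStep extends Step {}

// Routes one CAS: language identifier first, then a language-specific annotator.
class LanguageRoutingFlow {
    // Hypothetical delegate keys, one per supported language.
    private static final Map<String, String> ANNOTATOR_FOR_LANGUAGE = new HashMap<>();
    static {
        ANNOTATOR_FOR_LANGUAGE.put("en", "EnglishAnnotator");
        ANNOTATOR_FOR_LANGUAGE.put("de", "GermanAnnotator");
    }

    private int stage = 0;   // 0: send to LangId, 1: route by language, 2: done
    private String language; // stands in for the language recorded in the CAS

    // A real Flow would read this from the CAS; the sketch sets it directly.
    void setDetectedLanguage(String language) { this.language = language; }

    Step next() {
        if (stage == 0) {
            stage = 1;
            return new SimpleStep("LangId");
        }
        if (stage == 1) {
            stage = 2;
            String key = ANNOTATOR_FOR_LANGUAGE.get(language);
            // No annotator for this language (or none detected): finish the flow.
            return key == null ? new FinalStep() : new SimpleStep(key);
        }
        return new FinalStep();
    }
}
```

Ending the flow for an unrecognized language is just one design choice; a real Flow Controller might instead route such a CAS to a fallback annotator.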
Flow Controller implementations should extend either the JCasFlowController_ImplBase or the CasFlowController_ImplBase class, depending on which CAS interface they prefer to use. As with other types of components, the Flow Controller ImplBase classes define optional initialize, destroy, and reconfigure methods. They also define the required method computeFlow.
The computeFlow method is called by the framework whenever a new CAS enters the Aggregate Analysis Engine. It is given the CAS as an argument and must return an object that implements the Flow interface (the Flow object). The Flow Controller developer must define this object. It is the object that is responsible for routing this particular CAS through the components of the Aggregate Analysis Engine. For convenience, the framework provides basic implementations of Flow objects in the classes CasFlow_ImplBase and JCasFlow_ImplBase; use the JCas one if you are using the JCas interface to the CAS.
The framework then calls the Flow object's next() method, which returns a Step object (implemented by the UIMA Framework) that indicates what to do next with this CAS. There are two common types of steps: SimpleStep, which indicates the Analysis Engine that should receive the CAS next, and FinalStep, which indicates that the flow is complete.
Additional types of steps may be added in future versions, for example to invoke multiple Analysis Engines in parallel on the same CAS.
After executing the step, the framework will call the Flow object's next() method again to determine the next destination, and this will be repeated until the Flow object indicates that processing is complete by returning a FinalStep.
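The sketch below models this loop from the framework's side, without the UIMA libraries. Step, SimpleStep, FinalStep, and Flow are simplified hypothetical stand-ins for the framework types, FixedSequenceFlow and FlowDriver are invented names, and "invoking" a delegate just records its key instead of processing a CAS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins; NOT the UIMA API.
abstract class Step {}

class SimpleStep extends Step {
    final String aeKey;
    SimpleStep(String aeKey) { this.aeKey = aeKey; }
}

class FinalStep extends Step {}

interface Flow {
    Step next();
}

// A trivial Flow that visits a fixed list of delegate keys, then finishes.
class FixedSequenceFlow implements Flow {
    private final Iterator<String> keys;
    FixedSequenceFlow(List<String> aeKeys) { this.keys = aeKeys.iterator(); }

    public Step next() {
        return keys.hasNext() ? new SimpleStep(keys.next()) : new FinalStep();
    }
}

class FlowDriver {
    // Models the framework side: ask the Flow for a step, carry it out, and repeat
    // until a FinalStep arrives. Returns the delegate keys in invocation order.
    static List<String> run(Flow flow) {
        List<String> invoked = new ArrayList<>();
        while (true) {
            Step step = flow.next();
            if (step instanceof FinalStep) {
                return invoked;
            }
            if (step instanceof SimpleStep) {
                // The real framework would run the delegate on the CAS here.
                invoked.add(((SimpleStep) step).aeKey);
            } else {
                throw new IllegalStateException("Unsupported step type: " + step);
            }
        }
    }
}
```

The Flow never calls a delegate itself; it only answers "what next?", which is what lets the framework own the actual invocation.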
The Flow Controller has access to a FlowControllerContext, which is a subtype of UimaContext. In addition to the configuration parameter and resource access provided by a UimaContext, the FlowControllerContext also gives access to the metadata for all of the Analysis Engines that the Flow Controller can route CASes to. Most Flow Controllers will need to use this information to make routing decisions. You can get a handle to the FlowControllerContext by calling the getContext() method defined in JCasFlowController_ImplBase and CasFlowController_ImplBase. Then, the FlowControllerContext.getAnalysisEngineMetaDataMap method can be called to get a map containing an entry for each of the Analysis Engines in the Aggregate. The keys in this map are the same as the delegate analysis engine keys specified in the aggregate descriptor, and the values are the corresponding AnalysisEngineMetaData objects.
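For instance, a Flow Controller might look up which delegates declare a given output type. The following is a minimal sketch of that lookup, assuming a map shaped like the one described above (delegate key to metadata) but with the metadata reduced to a set of output type names. AeMetaData and delegatesProducing are hypothetical; a real implementation would inspect the capabilities of each AnalysisEngineMetaData value instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for AnalysisEngineMetaData; keeps only the output
// type names declared in the delegate's capabilities. NOT the UIMA API.
class AeMetaData {
    final Set<String> outputTypes;
    AeMetaData(String... outputTypes) {
        this.outputTypes = new HashSet<>(Arrays.asList(outputTypes));
    }
}

class MetaDataLookup {
    // Returns the keys of all delegates whose metadata declares the given
    // output type, in the map's iteration order.
    static List<String> delegatesProducing(Map<String, AeMetaData> metaDataMap, String typeName) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, AeMetaData> entry : metaDataMap.entrySet()) {
            if (entry.getValue().outputTypes.contains(typeName)) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }
}
```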
This section walks through the source code of an example Flow Controller that implements a simple version of the "Whiteboard" flow model. At each step of the flow, the Flow Controller looks at all of the available Analysis Engines that have not yet run on this CAS, and picks one whose input requirements are satisfied.
The Java class for the example is com.ibm.uima.examples.flow.WhiteboardFlowController and the source code is included in the UIMA SDK under the docs/examples/src directory.
public class WhiteboardFlowController extends CasFlowController_ImplBase
{
  public Flow computeFlow(CAS aCAS) throws AnalysisEngineProcessException
  {
    WhiteboardFlow flow = new WhiteboardFlow();
    flow.setCas(aCAS);
    return flow;
  }

  class WhiteboardFlow extends CasFlow_ImplBase
  {
    // Discussed later
  }
}
The WhiteboardFlowController extends CasFlowController_ImplBase and implements the computeFlow method. The implementation of the computeFlow method is very simple; it just constructs a new WhiteboardFlow object that will be responsible for routing this CAS, and calls the WhiteboardFlow.setCas method to give it a handle to that CAS, which it will later use to make its routing decisions. The setCas method is provided by the ..._ImplBase classes for Flows.
Note that there is one instance of WhiteboardFlow per CAS, so if multiple CASes are being processed simultaneously, each has its own routing state and there is no confusion between them.
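A simplified sketch of why per-CAS Flow objects keep routing state separate: the controller holds only shared configuration, and each call to its computeFlow-like factory returns a fresh Flow with its own record of the delegates it has already used. PerCasFlow and SketchFlowController are hypothetical, and next() returns null where a real Flow would return a FinalStep.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins; NOT the UIMA API. next() returns a delegate key,
// or null where a real Flow would return a FinalStep.
class PerCasFlow {
    private final List<String> delegates;
    // Routing state for ONE CAS; never shared with other Flow instances.
    private final Set<String> alreadyCalled = new HashSet<>();

    PerCasFlow(List<String> delegates) { this.delegates = delegates; }

    String next() {
        for (String key : delegates) {
            if (alreadyCalled.add(key)) { // add() returns false if already present
                return key;
            }
        }
        return null;
    }
}

class SketchFlowController {
    private final List<String> delegates; // shared, read-only configuration
    SketchFlowController(List<String> delegates) { this.delegates = delegates; }

    // Like computeFlow: every CAS gets a fresh Flow with fresh state.
    PerCasFlow computeFlow() { return new PerCasFlow(delegates); }
}
```

Interleaving next() calls on two such flows leaves each flow's progress unaffected by the other.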
class WhiteboardFlow extends CasFlow_ImplBase
{
  private Set mAlreadyCalled = new HashSet();

  public Step next() throws AnalysisEngineProcessException
  {
    // Get the CAS that this Flow object is responsible for routing.
    // Each Flow instance is responsible for a single CAS.
    CAS cas = getCas();

    // iterate over available AEs
    Iterator aeIter = getContext().getAnalysisEngineMetaDataMap()
        .entrySet().iterator();
    while (aeIter.hasNext())
    {
      Map.Entry entry = (Map.Entry) aeIter.next();
      // skip AEs that were already called on this CAS
      String aeKey = (String) entry.getKey();
      if (!mAlreadyCalled.contains(aeKey))
      {
        // check for satisfied input capabilities
        // (i.e. the CAS contains at least one instance
        // of each required input)
        AnalysisEngineMetaData md = (AnalysisEngineMetaData) entry.getValue();
        Capability[] caps = md.getCapabilities();
        boolean satisfied = true;
        for (int i = 0; i < caps.length; i++)
        {
          satisfied = inputsSatisfied(caps[i].getInputs(), cas);
          if (satisfied)
            break;
        }
        if (satisfied)
        {
          mAlreadyCalled.add(aeKey);
          return new SimpleStep(aeKey);
        }
      }
    }
    // no appropriate AEs to call - end of flow
    return new FinalStep();
  }

  private boolean inputsSatisfied(TypeOrFeature[] aInputs, CAS aCAS)
  {
    // implementation detail; see the actual source code
  }
}
Each instance of WhiteboardFlow is responsible for routing a single CAS. A handle to the CAS instance is available by calling the getCas() method, which is a standard method defined on the CasFlow_ImplBase superclass.
Each time the next method is called, the Flow object iterates over the metadata of all of the available Analysis Engines (obtained via the call to getContext().getAnalysisEngineMetaDataMap()) and checks whether the input types declared in an AnalysisEngineMetaData object are satisfied by the CAS (that is, whether the CAS contains at least one instance of each declared input type). An Analysis Engine qualifies if the inputs of at least one of its capabilities are satisfied. The exact details of checking for instances of types in the CAS are not discussed here; see the WhiteboardFlowController.java file for the complete source.
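The real check operates on the CAS and its type system, and the TypeOrFeature inputs may also name features. The sketch below abstracts all of that away to show only the decision logic: a hypothetical map from type name to instance count stands in for the CAS, and each capability is reduced to a list of input type names. InputCheck and its methods are invented names; eligible() mirrors how the next() method above treats multiple capabilities.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model; NOT the UIMA API. The CAS is reduced to a count of
// instances per type name, and each capability to its list of input type names.
class InputCheck {
    // True if the "CAS" holds at least one instance of every input type.
    static boolean inputsSatisfied(List<String> inputTypes, Map<String, Integer> instanceCounts) {
        for (String type : inputTypes) {
            Integer count = instanceCounts.get(type);
            if (count == null || count == 0) {
                return false;
            }
        }
        return true;
    }

    // Mirrors the loop in next(): with no capabilities the delegate counts as
    // satisfied; otherwise it is eligible if ANY capability's inputs are satisfied.
    static boolean eligible(List<List<String>> capabilities, Map<String, Integer> instanceCounts) {
        if (capabilities.isEmpty()) {
            return true;
        }
        for (List<String> inputs : capabilities) {
            if (inputsSatisfied(inputs, instanceCounts)) {
                return true;
            }
        }
        return false;
    }
}
```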
When the Flow object decides which AnalysisEngine should be called next, it indicates this by creating a SimpleStep object with the key for that AnalysisEngine and returning it:
return new SimpleStep(aeKey);
The Flow object records which Analysis Engines it has already invoked in the mAlreadyCalled set, and never invokes the same Analysis Engine twice. Note that this is not a hard requirement. It is acceptable to design a Flow Controller that invokes the same Analysis Engine more than once; however, if you do this, you must make sure that the flow eventually terminates.
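One way to guarantee termination when an Analysis Engine may run repeatedly is to combine a content-based stopping condition with a hard limit on the number of passes. The following sketch assumes a hypothetical delegate key Refiner and a markConverged() hook that stands in for inspecting the CAS; BoundedRepeatFlow is an invented name, and next() returns null where a real Flow would return a FinalStep.

```java
// Hypothetical sketch; NOT the UIMA API. next() returns a delegate key, or
// null where a real Flow would return a FinalStep.
class BoundedRepeatFlow {
    private final String aeKey;
    private final int maxPasses;
    private int passes = 0;
    private boolean converged = false;

    BoundedRepeatFlow(String aeKey, int maxPasses) {
        this.aeKey = aeKey;
        this.maxPasses = maxPasses;
    }

    // Stands in for inspecting the CAS after a pass and deciding no more work is needed.
    void markConverged() { converged = true; }

    String next() {
        // The pass limit guarantees termination even if convergence never happens.
        if (converged || passes >= maxPasses) {
            return null;
        }
        passes++;
        return aeKey;
    }
}
```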
If there are no Analysis Engines left whose input requirements are satisfied, the Flow object signals the end of the flow by returning a FinalStep object:
return new FinalStep();
To create a Flow Controller Descriptor in the Component Descriptor Editor (CDE), use File -> New -> Other -> UIMA -> Flow Controller Descriptor File.
This brings up the Overview page for the Flow Controller Descriptor.
Type in the Java class name that implements the Flow Controller, or use the "Browse" button to select it. You must select a Java class that implements the FlowController interface.
Flow Controller Descriptors are very similar to Primitive Analysis Engine Descriptors; for example, you can specify configuration parameters and external resources if you wish.
If you wish to edit a Flow Controller Descriptor by hand, see section Flow Controller Descriptors for the syntax.
To use a Flow Controller you must add it to an Aggregate Analysis Engine. An Aggregate Analysis Engine can have at most one Flow Controller. In the Component Descriptor Editor, the Flow Controller is specified on the Aggregate page as one of the choices for the flow control kind: pick User-defined Flow. When you do, the Browse and Search buttons underneath become active and let you specify an existing Flow Controller Descriptor; the descriptor you select is imported into the aggregate descriptor.
The key name is created automatically from the name element in the Flow Controller Descriptor being imported. If you need to change this name, you can do so by switching to the "Source" view using the bottom tabs, and editing the name in the XML source.
If you edit your Aggregate Analysis Engine Descriptor by hand, the syntax for adding a Flow Controller is:
<delegateAnalysisEngineSpecifiers>
  ...
</delegateAnalysisEngineSpecifiers>
<flowController key="[String]">
  <import .../>
</flowController>
As usual, you can use either import by location or import by name; see Imports.
The key that you assign to the FlowController can be used elsewhere in the Aggregate Analysis Engine Descriptor: in parameter overrides, resource bindings, and Sofa mappings.
Flow Controllers cannot be added directly to Collection Processing Engines. To use a Flow Controller in a CPE you first need to wrap the part of your CPE that requires complex flow control into an Aggregate Analysis Engine, and then add the Aggregate Analysis Engine to your CPE. The CPE's deployment and error handling options can then only be configured for the entire Aggregate Analysis Engine as a unit.
If you want your Flow Controller to work inside an Aggregate Analysis Engine that contains a CAS Multiplier (see CAS Multiplier Developer's Guide), there are additional things you must consider.
When your Flow Controller routes a CAS to a CAS Multiplier, the CAS Multiplier may produce new CASes that then will also need to be routed by the Flow Controller. When a new output CAS is produced, the framework will call the method:
protected Flow newCasProduced(AbstractCas newOutputCas, String producedBy)
on the Flow object that was managing the flow of the parent CAS (the one that was input to the CAS Multiplier). The newCasProduced method must create a new Flow object that will be responsible for routing the new output CAS.
In the CasFlow_ImplBase and JCasFlow_ImplBase classes, the newCasProduced method is defined to throw an exception indicating that the Flow Controller does not handle CAS Multipliers. If you want your Flow Controller to properly deal with CAS Multipliers, you must override this method.
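A minimal sketch of such an override, without the UIMA libraries: each parent flow knows, per CAS Multiplier key, which route new CASes should take, and newCasProduced returns a separate child flow for each new CAS. The Flow interface here is a hypothetical simplification (String CAS ids instead of AbstractCas, delegate keys or null instead of Step objects), and SequenceFlow and the keys Segmenter, Tokenizer, and Tagger are invented.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; NOT the UIMA API. next() returns a delegate key,
// or null where a real Flow would return a FinalStep.
interface Flow {
    String next();
    Flow newCasProduced(String newCasId, String producedBy);
}

class SequenceFlow implements Flow {
    private final Iterator<String> remaining;
    // Route for new CASes, chosen by the key of the CAS Multiplier that made them.
    private final Map<String, List<String>> childRoutes;

    SequenceFlow(List<String> route, Map<String, List<String>> childRoutes) {
        this.remaining = route.iterator();
        this.childRoutes = childRoutes;
    }

    public String next() {
        return remaining.hasNext() ? remaining.next() : null;
    }

    // Called once per new CAS produced while this flow's CAS was in a CAS Multiplier;
    // returns a separate Flow that routes only that new CAS.
    public Flow newCasProduced(String newCasId, String producedBy) {
        List<String> route = childRoutes.get(producedBy);
        if (route == null) {
            // Like the ImplBase default, refuse CASes we do not know how to route.
            throw new IllegalStateException("No route for CASes produced by " + producedBy);
        }
        // Child flows here never reach a CAS Multiplier, so they get no child routes.
        return new SequenceFlow(route, Collections.<String, List<String>>emptyMap());
    }
}
```

Keeping the parent and child flows as separate objects matches the one-Flow-per-CAS design: the parent continues its own route independently of how many new CASes were produced.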
Also, there is a variant of FinalStep that can only be specified for output CASes produced by CAS Multipliers within the Aggregate Analysis Engine containing the Flow Controller. This version of FinalStep is produced by calling the constructor with a true argument, and it causes the CAS to be released back to the pool immediately. No further processing will be done on it, and it will not be output from the aggregate. This is how you can build an Aggregate Analysis Engine that outputs some new CASes but not others. Note that if you never want any new CASes to be output from the Aggregate Analysis Engine, you don't need to use this; instead, just declare <outputsNewCASes>false</outputsNewCASes> in your Aggregate Analysis Engine Descriptor as described in section Aggregate.
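For example, a Flow for new CASes could drop segments that carry no text and let the rest be annotated and output. In this sketch, Step, SimpleStep, and FinalStep are hypothetical stand-ins, with a FinalStep(boolean) constructor modeled on the variant described above; SegmentFlow, the Annotator key, and the use of the segment's text as the deciding condition are all assumptions.

```java
// Hypothetical stand-ins; NOT the UIMA API.
abstract class Step {}

class SimpleStep extends Step {
    final String aeKey;
    SimpleStep(String aeKey) { this.aeKey = aeKey; }
}

class FinalStep extends Step {
    final boolean forceCasToBeDropped;
    FinalStep() { this(false); }
    FinalStep(boolean forceCasToBeDropped) { this.forceCasToBeDropped = forceCasToBeDropped; }
}

// Flow for a CAS created by a CAS Multiplier inside the aggregate:
// empty segments are dropped at once, others are annotated and then output.
class SegmentFlow {
    private final String segmentText; // stands in for the new CAS's document text
    private boolean annotated = false;

    SegmentFlow(String segmentText) { this.segmentText = segmentText; }

    Step next() {
        if (segmentText.trim().isEmpty()) {
            // Release the CAS without outputting it from the aggregate.
            return new FinalStep(true);
        }
        if (!annotated) {
            annotated = true;
            return new SimpleStep("Annotator"); // hypothetical delegate key
        }
        return new FinalStep(); // normal end of flow: the CAS is output
    }
}
```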