Flow Controller Developer's Guide

A Flow Controller is a component that plugs into an Aggregate Analysis Engine. When a CAS is input to the Aggregate, the Flow Controller determines the order in which the components of that aggregate are invoked on that CAS. The ability to provide your own Flow Controller implementation is new as of release 2.0 of UIMA.

Flow Controllers may decide the flow dynamically, based on the contents of the CAS. So, as just one example, you could develop a Flow Controller that first sends each CAS to a Language Identification Annotator and then, based on the output of the Language Identification Annotator, routes that CAS to an Annotator that is specialized for that particular language.

Flow Controller Interface Overview

Flow Controller implementations should extend from the JCasFlowController_ImplBase or CasFlowController_ImplBase classes, depending on which CAS interface they prefer to use. As with other types of components, the Flow Controller ImplBase classes define optional initialize, destroy, and reconfigure methods. They also define the required method computeFlow.

The computeFlow method is called by the framework whenever a new CAS enters the Aggregate Analysis Engine. It is given the CAS as an argument and must return an object that implements the Flow interface (the Flow object). The Flow Controller developer must define this object. It is the object that is responsible for routing this particular CAS through the components of the Aggregate Analysis Engine. For convenience, the framework provides basic implementations of Flow objects in the classes CasFlow_ImplBase and JCasFlow_ImplBase; use the JCas one if you are using the JCas interface to the CAS.

The framework then calls the Flow object's next() method, which returns a Step object (implemented by the UIMA Framework) that indicates what to do next with this CAS. There are two common types of steps:

SimpleStep, which indicates the Analysis Engine that should receive the CAS next.
FinalStep, which indicates that the flow is completed.

Additional types of steps may be added in future versions, for example to invoke multiple Analysis Engines in parallel on the same CAS.

After executing the step, the framework will call the Flow object's next() method again to determine the next destination, and this will be repeated until the Flow Object indicates that processing is complete by returning a FinalStep.
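The call sequence described above can be sketched in plain Java. The block below is a simplified model and not UIMA code: Step, SimpleStep, FinalStep and Flow are minimal stand-ins declared locally so that the example runs without the framework, the CAS is reduced to a map of string properties, and the delegate keys "LanguageId" and "Annotator_<language>" are hypothetical. It uses the language-routing scenario from the introduction as the example flow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of the next() protocol. These types are stand-ins, not the UIMA classes. */
final class FlowLoopSketch {
  interface Step {}

  /** Stand-in for SimpleStep: names the delegate that should receive the CAS next. */
  static final class SimpleStep implements Step {
    final String aeKey;
    SimpleStep(String aeKey) { this.aeKey = aeKey; }
  }

  /** Stand-in for FinalStep: the flow is complete. */
  static final class FinalStep implements Step {}

  interface Flow { Step next(); }

  /** Routes to a language identifier first, then to a language-specific annotator. */
  static final class LanguageRoutingFlow implements Flow {
    private final Map<String, String> cas; // toy CAS: property name -> value
    private int stepsTaken = 0;

    LanguageRoutingFlow(Map<String, String> cas) { this.cas = cas; }

    @Override
    public Step next() {
      stepsTaken++;
      if (stepsTaken == 1) {
        return new SimpleStep("LanguageId");
      }
      if (stepsTaken == 2 && cas.containsKey("language")) {
        // The decision depends on what the previous delegate wrote into the CAS.
        return new SimpleStep("Annotator_" + cas.get("language"));
      }
      return new FinalStep();
    }
  }

  /** Plays the role of the framework: call next(), run the delegate, repeat until FinalStep. */
  static List<String> run(String textLanguage) {
    Map<String, String> cas = new HashMap<>();
    Flow flow = new LanguageRoutingFlow(cas);
    List<String> visited = new ArrayList<>();
    Step step = flow.next();
    while (step instanceof SimpleStep) {
      String key = ((SimpleStep) step).aeKey;
      visited.add(key);
      if (key.equals("LanguageId")) {
        cas.put("language", textLanguage); // pretend the annotator detected this language
      }
      step = flow.next();
    }
    return visited;
  }

  public static void main(String[] args) {
    System.out.println(run("de")); // prints [LanguageId, Annotator_de]
  }
}
```

The Flow object holds all per-CAS state (here, the step counter), so the driver loop needs no knowledge of the routing policy.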

The Flow Controller has access to a FlowControllerContext, which is a subtype of UimaContext. In addition to the configuration parameter and resource access provided by a UimaContext, the FlowControllerContext also gives access to the metadata for all of the Analysis Engines that the Flow Controller can route CASes to. Most Flow Controllers will need to use this information to make routing decisions. You can get a handle to the FlowControllerContext by calling the getContext() method defined in JCasFlowController_ImplBase and CasFlowController_ImplBase. You can then call FlowControllerContext.getAnalysisEngineMetaDataMap to get a map containing an entry for each of the Analysis Engines in the Aggregate. The keys in this map are the same as the delegate analysis engine keys specified in the aggregate descriptor, and the values are the corresponding Analysis Engine MetaData objects.

Example Code

This section walks through the source code of an example Flow Controller that implements a simple version of the "Whiteboard" flow model. At each step of the flow, the Flow Controller looks at all of the available Analysis Engines that have not yet run on this CAS, and picks one whose input requirements are satisfied.

The Java class for the example is com.ibm.uima.examples.flow.WhiteboardFlowController and the source code is included in the UIMA SDK under the docs/examples/src directory.

The WhiteboardFlowController Class

    public class WhiteboardFlowController extends CasFlowController_ImplBase {
      public Flow computeFlow(CAS aCAS) throws AnalysisEngineProcessException {
        WhiteboardFlow flow = new WhiteboardFlow();
        flow.setCas(aCAS);
        return flow;
      }

      class WhiteboardFlow extends CasFlow_ImplBase {
        // Discussed Later
      }
    }

The WhiteboardFlowController extends from CasFlowController_ImplBase and implements the computeFlow method. The implementation of the computeFlow method is very simple; it just constructs a new WhiteboardFlow object that will be responsible for routing this CAS, and calls the WhiteboardFlow.setCas method to give it a handle to that CAS, which it will later use to make its routing decisions. The setCas method is a method provided by the ..._ImplBase classes for Flows.

Note that we will have one instance of WhiteboardFlow per CAS, so if there are multiple CASes being simultaneously processed there will not be any confusion.

The WhiteboardFlow Class

    class WhiteboardFlow extends CasFlow_ImplBase {
      private Set mAlreadyCalled = new HashSet();

      public Step next() throws AnalysisEngineProcessException {
        // Get the CAS that this Flow object is responsible for routing.
        // Each Flow instance is responsible for a single CAS.
        CAS cas = getCas();

        // iterate over available AEs
        Iterator aeIter = getContext().getAnalysisEngineMetaDataMap()
            .entrySet().iterator();
        while (aeIter.hasNext()) {
          Map.Entry entry = (Map.Entry) aeIter.next();
          // skip AEs that were already called on this CAS
          String aeKey = (String) entry.getKey();
          if (!mAlreadyCalled.contains(aeKey)) {
            // check for satisfied input capabilities
            // (i.e. the CAS contains at least one instance
            // of each required input)
            AnalysisEngineMetaData md =
                (AnalysisEngineMetaData) entry.getValue();
            Capability[] caps = md.getCapabilities();
            boolean satisfied = true;
            for (int i = 0; i < caps.length; i++) {
              satisfied = inputsSatisfied(caps[i].getInputs(), cas);
              if (satisfied)
                break;
            }
            if (satisfied) {
              mAlreadyCalled.add(aeKey);
              return new SimpleStep(aeKey);
            }
          }
        }

        // no appropriate AEs to call - end of flow
        return new FinalStep();
      }

      private boolean inputsSatisfied(TypeOrFeature[] aInputs, CAS aCAS) {
        // implementation detail; see the actual source code
      }
    }

Each instance of the WhiteboardFlow class is responsible for routing a single CAS. A handle to the CAS instance is available by calling the getCas() method, which is a standard method defined on the CasFlow_ImplBase superclass.

Each time the next method is called, the Flow object iterates over the metadata of all of the available Analysis Engines (obtained via the call to getContext().getAnalysisEngineMetaDataMap) and checks whether the input types declared in an AnalysisEngineMetaData object are satisfied by the CAS (that is, the CAS contains at least one instance of each declared input type). The exact details of checking for instances of types in the CAS are not discussed here – see the WhiteboardFlowController.java file for the complete source.
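The input check amounts to an "every declared input has at least one instance" test. The sketch below models that idea without UIMA and is not the implementation found in WhiteboardFlowController.java, which works against the real CAS. Here the CAS is represented by a hypothetical map from type name to instance count, capability sets are plain lists of type names, and the type names "Token", "Sentence" and "Person" are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the input check; the map-based "CAS" is an assumption for illustration. */
final class InputCheckSketch {
  /** Builds a toy CAS by counting how often each type name occurs. */
  static Map<String, Integer> toyCas(String... annotationTypes) {
    Map<String, Integer> counts = new HashMap<>();
    for (String type : annotationTypes) {
      counts.merge(type, 1, Integer::sum);
    }
    return counts;
  }

  /** True if the toy CAS holds at least one instance of every required type. */
  static boolean inputsSatisfied(List<String> requiredTypes, Map<String, Integer> cas) {
    for (String type : requiredTypes) {
      if (cas.getOrDefault(type, 0) == 0) {
        return false;
      }
    }
    return true;
  }

  /**
   * Mirrors the loop in the listing above: an engine is eligible if any one of its
   * capability sets is satisfied, and also if it declares no capability sets at all.
   */
  static boolean anyCapabilitySatisfied(List<List<String>> capabilitySets,
                                        Map<String, Integer> cas) {
    boolean satisfied = true;
    for (List<String> inputs : capabilitySets) {
      satisfied = inputsSatisfied(inputs, cas);
      if (satisfied) {
        break;
      }
    }
    return satisfied;
  }

  public static void main(String[] args) {
    Map<String, Integer> cas = toyCas("Token", "Token", "Sentence");
    System.out.println(inputsSatisfied(Arrays.asList("Token", "Sentence"), cas)); // true
    System.out.println(inputsSatisfied(Arrays.asList("Token", "Person"), cas));   // false
  }
}
```

One consequence of this logic is that an engine declaring no inputs is always eligible, so it can serve as the entry point of the flow.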

When the Flow object decides which AnalysisEngine should be called next, it indicates this by creating a SimpleStep object with the key for that AnalysisEngine and returning it:

return new SimpleStep(aeKey);

The Flow object records which Analysis Engines it has invoked in the mAlreadyCalled set, and never invokes the same Analysis Engine twice. Note that this is not a hard requirement. It is acceptable to design a FlowController that invokes the same Analysis Engine more than once; however, if you do this you must make sure that the flow will eventually terminate.
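One simple way to guarantee termination when a flow revisits an Analysis Engine is to cap the number of passes. The following sketch illustrates that idea with local stand-in types rather than the UIMA classes; the delegate key "Refiner", the maxPasses limit and the completion test are all hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Toy model of a flow that repeats one delegate but is guaranteed to stop. */
final class BoundedRepeatSketch {
  interface Step {}

  /** Stand-in for SimpleStep. */
  static final class SimpleStep implements Step {
    final String aeKey;
    SimpleStep(String aeKey) { this.aeKey = aeKey; }
  }

  /** Stand-in for FinalStep. */
  static final class FinalStep implements Step {}

  static final class RepeatingFlow {
    private final String aeKey;
    private final int maxPasses;
    private final BooleanSupplier done; // hypothetical "nothing left to do" test on the CAS
    private int passes = 0;

    RepeatingFlow(String aeKey, int maxPasses, BooleanSupplier done) {
      this.aeKey = aeKey;
      this.maxPasses = maxPasses;
      this.done = done;
    }

    Step next() {
      // The pass counter bounds the loop even if the completion test never becomes true.
      if (passes >= maxPasses || done.getAsBoolean()) {
        return new FinalStep();
      }
      passes++;
      return new SimpleStep(aeKey);
    }
  }

  /** Drives the flow and returns how many times the delegate was invoked. */
  static int countPasses(int maxPasses, int passesNeeded) {
    final int[] remaining = {passesNeeded};
    RepeatingFlow flow = new RepeatingFlow("Refiner", maxPasses, () -> remaining[0] <= 0);
    int invoked = 0;
    while (flow.next() instanceof SimpleStep) {
      remaining[0]--; // pretend the delegate made progress on the CAS
      invoked++;
    }
    return invoked;
  }

  public static void main(String[] args) {
    System.out.println(countPasses(5, 3));   // prints 3: finished before the cap
    System.out.println(countPasses(5, 100)); // prints 5: stopped by the cap
  }
}
```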

If there are no Analysis Engines left whose input requirements are satisfied, the Flow object signals the end of the flow by returning a FinalStep object:

return new FinalStep();

To create a Flow Controller Descriptor in the Component Descriptor Editor (CDE), use File -> New -> Other -> UIMA -> Flow Controller Descriptor File.

This will bring up the Overview page for the Flow Controller Descriptor.

Type in the Java class name that implements the Flow Controller, or use the "Browse" button to select it. You must select a Java class that implements the FlowController interface.

Flow Controller Descriptors are very similar to Primitive Analysis Engine Descriptors – for example you can specify configuration parameters and external resources if you wish.

If you wish to edit a Flow Controller Descriptor by hand, see section Flow Controller Descriptors for the syntax.

To use a Flow Controller you must add it to an Aggregate Analysis Engine. You can only have one Flow Controller per Aggregate Analysis Engine. In the Component Descriptor Editor, the Flow Controller is specified on the Aggregate page: for the flow control kind, pick User-defined Flow. The Browse and Search buttons underneath then become active and let you specify an existing Flow Controller Descriptor. When you select a descriptor, it is imported into the aggregate descriptor.

The key name is created automatically from the name element in the Flow Controller Descriptor being imported. If you need to change this name, you can do so by switching to the "Source" view using the bottom tabs, and editing the name in the XML source.

If you edit your Aggregate Analysis Engine Descriptor by hand, the syntax for adding a Flow Controller is:

    <flowController key="[String]">
      <import .../>
    </flowController>

As usual, you can use either an import by location or an import by name (see Imports).

The key that you assign to the FlowController can be used elsewhere in the Aggregate Analysis Engine Descriptor – in parameter overrides, resource bindings, and Sofa mappings.

Flow Controllers cannot be added directly to Collection Processing Engines. To use a Flow Controller in a CPE you first need to wrap the part of your CPE that requires complex flow control into an Aggregate Analysis Engine, and then add the Aggregate Analysis Engine to your CPE. The CPE's deployment and error handling options can then only be configured for the entire Aggregate Analysis Engine as a unit.

If you want your Flow Controller to work inside an Aggregate Analysis Engine that contains a CAS Multiplier (see CAS Multiplier Developer's Guide), there are additional things you must consider.

When your Flow Controller routes a CAS to a CAS Multiplier, the CAS Multiplier may produce new CASes that then will also need to be routed by the Flow Controller. When a new output CAS is produced, the framework will call the method:

protected Flow newCasProduced(AbstractCas newOutputCas, String producedBy)

on the Flow object that was managing the flow of the parent CAS (the one that was input to the CAS Multiplier). The newCasProduced method must create a new Flow object that will be responsible for routing the new output CAS.

In the CasFlow_ImplBase and JCasFlow_ImplBase classes, the newCasProduced method is defined to throw an exception indicating that the Flow Controller does not handle CAS Multipliers. If you want your Flow Controller to properly deal with CAS Multipliers you must override this method.

Also, there is a variant of FinalStep which can only be specified for output CASes produced by CAS Multipliers within the Aggregate Analysis Engine containing the Flow Controller. This version of FinalStep is produced by calling the constructor with a true argument, and it causes the CAS to be immediately released back to the pool. No further processing will be done on it and it will not be output from the aggregate. This is how you can build an Aggregate Analysis Engine that outputs some new CASes but not others. Note that if you never want any new CASes to be output from the Aggregate Analysis Engine, you don't need to use this; instead just declare <outputsNewCASes>false</outputsNewCASes> in your Aggregate Analysis Engine Descriptor as described in section Aggregate.
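The interplay between newCasProduced and the dropping variant of FinalStep can be sketched as follows. This is a toy model with local stand-in types, not UIMA code: each CAS is reduced to a String, the stand-in newCasProduced takes a String instead of an AbstractCas, the default implementation throws UnsupportedOperationException as a placeholder for the framework's exception, and the delegate keys "Segmenter" and "Annotator" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of routing CASes produced by a CAS Multiplier. Stand-ins, not UIMA classes. */
final class MultiplierFlowSketch {
  interface Step {}

  static final class SimpleStep implements Step {
    final String aeKey;
    SimpleStep(String aeKey) { this.aeKey = aeKey; }
  }

  /** Stand-in for FinalStep; dropCas models the constructor's true argument. */
  static final class FinalStep implements Step {
    final boolean dropCas;
    FinalStep() { this(false); }
    FinalStep(boolean dropCas) { this.dropCas = dropCas; }
  }

  abstract static class Flow {
    abstract Step next();

    /** Default refuses new CASes, like the ImplBase behaviour described above. */
    Flow newCasProduced(String newCas, String producedBy) {
      throw new UnsupportedOperationException("this flow does not handle CAS Multipliers");
    }
  }

  /** Flow for the parent CAS: send it to the multiplier once, then finish. */
  static final class ParentFlow extends Flow {
    private boolean sentToSegmenter = false;

    @Override
    Step next() {
      if (!sentToSegmenter) {
        sentToSegmenter = true;
        return new SimpleStep("Segmenter");
      }
      return new FinalStep();
    }

    @Override
    Flow newCasProduced(String newCas, String producedBy) {
      return new SegmentFlow(newCas); // one new Flow object per output CAS
    }
  }

  /** Flow for each output CAS: drop blank segments, annotate and output the rest. */
  static final class SegmentFlow extends Flow {
    private final String segment;
    private boolean annotated = false;

    SegmentFlow(String segment) { this.segment = segment; }

    @Override
    Step next() {
      if (segment.trim().isEmpty()) {
        return new FinalStep(true); // released immediately, never output
      }
      if (!annotated) {
        annotated = true;
        return new SimpleStep("Annotator");
      }
      return new FinalStep();
    }
  }

  /** Toy driver: splits on ';' in place of a real multiplier and collects the output. */
  static List<String> outputSegments(String document) {
    Flow parent = new ParentFlow();
    List<String> output = new ArrayList<>();
    Step step = parent.next();
    while (step instanceof SimpleStep) {
      String producedBy = ((SimpleStep) step).aeKey;
      for (String segment : document.split(";", -1)) {
        Flow child = parent.newCasProduced(segment, producedBy);
        Step childStep = child.next();
        while (childStep instanceof SimpleStep) {
          childStep = child.next();
        }
        if (!((FinalStep) childStep).dropCas) {
          output.add(segment);
        }
      }
      step = parent.next();
    }
    return output;
  }

  public static void main(String[] args) {
    System.out.println(outputSegments("a; ;b")); // prints [a, b]
  }
}
```

In this model the blank middle segment is dropped by its own Flow object, while the other two segments are output.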