Collection Processing Engine Configurator User's Guide

A Collection Processing Engine (CPE) processes collections of artifacts (documents) through the combination of the following components: a Collection Reader, an optional CAS Initializer, Analysis Engines, and CAS Consumers.

The Collection Processing Engine Configurator (CPE Configurator) is a graphical tool that allows you to assemble and run CPEs.

For an introduction to Collection Processing Engine concepts, including developing the components that make up a CPE, read Chapter 5, Collection Processing Engine Developer's Guide. This chapter is a user's guide for using the CPE Configurator tool, and does not describe UIMA's Collection Processing Architecture itself.

The CPE Configurator only supports basic CPE configurations. Its limitations are:

It only supports "Integrated" deployments (although it will connect to remote services if particular CAS Processors are specified with remote service descriptors).
It doesn't support configuration of error handling.
It doesn't support Sofa Mappings; it assumes all Single-View components are operating with the _InitialView Sofa. Multi-View components will not have their names mapped.
It sets up a fixed-size CAS Pool.

To run arbitrary CPE descriptors, or to run with a configuration other than the default supplied by the CPE Configurator, you can write your own application or use the runCPE script, which invokes an example application, SimpleRunCPE.

The CPE Configurator tool can be run using the cpeGui shell script, which is located in the bin directory of the UIMA SDK. If you've installed the example Eclipse project (see Chapter 3, UIMA SDK Setup for Eclipse), you can also run it using the "UIMA CPE GUI" run configuration provided in that project.

Note that if you are planning to build a CPE using components other than the examples included in the UIMA SDK, you will first need to update your CLASSPATH environment variable to include the classes needed by these components.
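One way to confirm that the CLASSPATH change took effect is to ask the JVM whether it can locate a component's implementation class. The following is a minimal sketch using only the Java standard library; ClasspathCheck is a hypothetical helper and is not part of the UIMA SDK, and the class name you pass in is whatever implementation class your own component descriptor names.

```java
// Hypothetical helper (not part of the UIMA SDK): checks whether a class
// is visible to the class loader that loaded this helper.
final class ClasspathCheck {

    /** Returns true if the named class can be located on the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class itself or one of its dependencies is missing
            return false;
        }
    }
}
```

Calling ClasspathCheck.isOnClasspath with the fully qualified name of your annotator or CAS Consumer class, from a JVM started with the same CLASSPATH as the cpeGui script, indicates whether the CPE Configurator is likely to find it.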

When you first start the CPE Configurator, you will see the main window shown here:

The CPE Configurator's main window is divided into four sections, one for each type of component that constitutes a CPE: Collection Reader, CAS Initializer, Analysis Engines, and CAS Consumers. Each CPE has exactly one Collection Reader, an optional CAS Initializer, at least one Analysis Engine, and at least one CAS Consumer.

In each section of the CPE Configurator, you can select the component(s) you want to use by browsing to (or typing the location of) their XML descriptors. You must select a Collection Reader, at least one Analysis Engine, and at least one CAS Consumer. You may or may not need to select a CAS Initializer; this depends on the particular Collection Reader that you are using.
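The selection rules above can be summarized in a few lines of code. This is only an illustrative sketch: CpeSelection is a hypothetical class invented for this example, not a UIMA SDK type, and it does not reflect how the CPE Configurator is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the descriptors chosen in the GUI (not a UIMA SDK class).
final class CpeSelection {
    String collectionReader;   // required: path to exactly one descriptor
    String casInitializer;     // optional: may remain null
    final List<String> analysisEngines = new ArrayList<>(); // at least one required
    final List<String> casConsumers = new ArrayList<>();    // at least one required

    /** Returns the list of problems; an empty list means the selection is complete. */
    List<String> validate() {
        List<String> problems = new ArrayList<>();
        if (collectionReader == null || collectionReader.isEmpty()) {
            problems.add("No Collection Reader selected");
        }
        if (analysisEngines.isEmpty()) {
            problems.add("At least one Analysis Engine is required");
        }
        if (casConsumers.isEmpty()) {
            problems.add("At least one CAS Consumer is required");
        }
        // casInitializer is deliberately not checked: whether it is needed
        // depends on the Collection Reader being used.
        return problems;
    }
}
```

A selection with no CAS Initializer still validates, which mirrors the rule that the CAS Initializer is optional.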

When you select a descriptor, the configuration parameters that are defined in that descriptor will then be displayed in the GUI; these can be modified to override the values present in the descriptor.

For example, the screen shot below shows the CPE Configurator after the following components have been chosen:

docs/examples/descriptors/collectionReader/FileSystemCollectionReader.xml
docs/examples/descriptors/analysis_engine/NamesAndPersonTitles_TAE.xml
docs/examples/descriptors/cas_consumer/XCasWriterCasConsumer.xml

After selecting each of the components and providing configuration settings, click the play (forward arrow) button at the bottom of the screen to begin processing. A progress bar should be displayed in the lower left corner. (Note that the progress bar will not begin to move until all components have completed their initialization, which may take several seconds.) Once processing has begun, the pause and stop buttons become enabled.

If an error occurs, you will be informed by an error dialog. If processing completes successfully, you will be presented with a performance report.

The CPE Configurator's File Menu has the following options:

Open CPE Descriptor
Save CPE Descriptor
Refresh Descriptors from File System
Clear All
Exit

Open CPE Descriptor will allow you to select a CPE Descriptor file from disk, and will read in that CPE Descriptor and configure the GUI appropriately.

Save CPE Descriptor will create a CPE Descriptor file that defines the CPE you have constructed. This CPE Descriptor will identify the components that constitute the CPE, as well as the configuration settings you have specified for each of these components. Later, you can use "Open CPE Descriptor" to restore the CPE Configurator to the saved state. CPE Descriptors can also be used to run a CPE from a Java program; see Chapter 6, Application Developer’s Guide.

CPE Descriptors also allow you to specify operational parameters, such as error handling options, that are not currently available for configuration through the CPE Configurator. For more information on manually creating a CPE Descriptor, see Chapter 24, Collection Processing Engine Descriptor Reference.
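Because a CPE Descriptor is an XML file, it can be inspected with ordinary XML tooling before you edit it by hand. The sketch below uses only the Java standard library to list the CAS Processors named in a descriptor. It assumes that each CAS Processor appears as a casProcessor element with name and deployment attributes; check these names against the descriptor reference for your UIMA version. CpeDescriptorPeek is a hypothetical helper, not a UIMA SDK class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of the UIMA SDK).
final class CpeDescriptorPeek {

    /**
     * Returns one "name [deployment]" entry per casProcessor element.
     * The element and attribute names are assumptions about the descriptor format.
     */
    static List<String> casProcessorSummaries(String descriptorXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(descriptorXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable XML document", e);
        }
        List<String> summaries = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("casProcessor");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element processor = (Element) nodes.item(i);
            summaries.add(processor.getAttribute("name")
                    + " [" + processor.getAttribute("deployment") + "]");
        }
        return summaries;
    }
}
```

Passing the text of a descriptor saved by the CPE Configurator to casProcessorSummaries gives a quick overview of which components it references. The helper only reads the file; it does not validate the descriptor or run the CPE.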

Refresh Descriptors from File System will reload all descriptors from disk. This is useful if you have changed a descriptor outside of the CPE Configurator and want to refresh the display.

Clear All will reset the CPE Configurator to its initial state, with no components selected.

Exit will close the CPE Configurator. If you have unsaved changes, you will be prompted as to whether you would like to save them to a CPE Descriptor file. If you do not save them, they will be lost.

When you restart the CPE Configurator, it will automatically reload the last CPE descriptor file that you were working with.

The CPE Configurator's Help menu provides "About" information and some very simple instructions on how to use the tool.