External Resources

An analysis engine often uses some data model. This may be as simple as word frequency counts or as complex as the model of a parser. Often these models can become quite large. If an analysis engine is deployed multiple times in the same pipeline or runs on multiple CPU cores, memory can be saved by using a shared instance of the data model. UIMA supports such a scenario by so-called external resources. The following sections illustrates how external resources can be used with uimaFIT. First create a class for the shared data model. Usually this class would load its data from some URI and then expose it via its methods. An example would be to load word frequency counts and to provide a `getFrequency()` method. In our simple example we do not load anything from the provided URI - we just offer a method to get the URI from which data be loaded. {{{ // Simple model that only stores the URI it was loaded from. Normally data // would be loaded from the URI instead and made accessible through methods // in this class. This simple example only allows to access the URI. public static final class SharedModel implements SharedResourceObject { private String uri; public void load(DataResource aData) throws ResourceInitializationException { uri = aData.getUri().toString(); } public String getUri() { return uri; } } }}} == Resource injection == === Regular UIMA components === When an external resource is used in a regular UIMA component, it is usually fetched from the context, cast and copied to a class member variable. {{{ class MyAnalysisEngine extends CasAnnotator_ImplBase { final static String MODEL_KEY = "Model"; private SharedModel model; public void initialize(UimaContext context) throws ResourceInitializationException { configuredResource = (SharedModel) getContext().getResourceObject(MODEL_KEY); } } }}} uimaFIT can be used to inject external resources into such traditional components using the `createDependencyAndBind()` method. To show that this works with any off-the-shelf UIMA component, the following example uses uimaFIT to configure the OpenNLP Tokenizer: {{{ // Create descriptor AnalysisEngineDescription tokenizer = createPrimitiveDescription(Tokenizer.class, UimaUtil.TOKEN_TYPE_PARAMETER, Token.class.getName(), UimaUtil.SENTENCE_TYPE_PARAMETER, Sentence.class.getName()); // Create the external resource dependency for the model and bind it createDependencyAndBind(tokenizer, UimaUtil.MODEL_PARAMETER, TokenizerModelResourceImpl.class, "http://opennlp.sourceforge.net/models-1.5/en-token.bin"); }}} === uimaFIT-aware components === uimaFIT provides the `@ExternalResource` annotation to inject external resources directly into class member variables. *`@ExternalResource`* || *Parameter* || *Description* || *Default* || || _key_ || Resource key  || field type || || _api_ || Used when the external resource type is different from the field type, e.g. when using an ExternalResourceLocator || field type || || _mandatory_ || Whether a value must be specified || true || {{{ // Example annotator that uses the SharedModel. In the process() we only test // if the model was properly initialized by uimaFIT public static class Annotator extends org.uimafit.component.JCasAnnotator_ImplBase { final static String MODEL_KEY = "Model"; @ExternalResource(key = MODEL_KEY) private SharedModel model; public void process(JCas aJCas) throws AnalysisEngineProcessException { assertTrue(model.getUri().endsWith("gene_model_v02.bin")); // Prints the instance ID to the console - this proves the same instance // of the SharedModel is used in both Annotator instances. System.out.println(model); } }}} Note, that it is no longer necessary to implement the initialize() method. uimaFIT takes care of locating the external resource "Model" in the UIMA context and assigns it to the field `model`. If a mandatory resource is not present in the context, an exception is thrown. The resource injection mechanism is implemented in the `ExternalResourceInitializer` class. uimaFIT provides several base classes that already come with an `initialize()` method using the initializer: * `CasAnnotator_ImplBase` * `CasCollectionReader_ImplBase` * `CasConsumer_ImplBase` * `CasFlowController_ImplBase` * `CasMultiplier_ImplBase` * `JCasAnnotator_ImplBase` * `JCasCollectionReader_ImplBase` * `JCasConsumer_ImplBase` * `JCasFlowController_ImplBase` * `JCasMultiplier_ImplBase` * `Resource_ImplBase` When building a pipeline, external resources can be set of a component just like configuration parameters. External resources and configuration parameters can be mixed and appear in any order when creating a component description. Note that in the following example, we create only one external resource description and use it to configure two different analysis engines. Because we only use a single description, also only a single instance of the external resource is created and shared between the two engines. {{{ ExternalResourceDescription extDesc = createExternalResourceDescription( SharedModel.class, new File("somemodel.bin")); // Binding external resource to each Annotator individually AnalysisEngineDescription aed1 = createPrimitiveDescription(Annotator.class, Annotator.MODEL_KEY, extDesc); AnalysisEngineDescription aed2 = createPrimitiveDescription(Annotator.class, Annotator.MODEL_KEY, extDesc); // Check the external resource was injected AnalysisEngineDescription aaed = createAggregateDescription(aed1, aed2); AnalysisEngine ae = createAggregate(aaed); ae.process(ae.newJCas()); }}} [http://code.google.com/p/uimafit/source/browse/trunk/uimaFIT-examples/src/main/java/org/uimafit/examples/resource/ExternalResourceExample.java This example] is given as a full JUnit-based example in the the uimaFIT-examples project. === Resources extending `Resource_ImplBase` === One kind of resources extend `Resource_ImplBase`. These are the easiest to handle, because uimaFIT's version of `Resource_ImplBase` already implements the necessary logic. Just be sure to call `super.initialize()` when overriding `initialize()`. Also mind that external resources are not available yet when `initialize()` is called. For any initialization logic that requires resources, override and implement `afterResourcesInitialized()`. Other than that, injection of external resources works as usual. {{{ public static class ChainableResource extends Resource_ImplBase { public final static String PARAM_CHAINED_RESOURCE = "chainedResource"; @ExternalResource(key = PARAM_CHAINED_RESOURCE) private ChainableResource chainedResource; public void afterResourcesInitialized() { // init logic that requires external resources } } }}} === Resources implementing `SharedResourceObject` === The other kind of resources implement `SharedResourceObject`. Since this is an interface, uimaFIT cannot provide the initialization logic, so you have to implement a couple of things in the resource: * implement `ExternalResourceAware` * declare a configuration parameter `ExternalResourceFactory.PARAM_RESOURCE_NAME` and return its value in `getResourceName()` * invoke `ConfigurationParameterInitializer.initialize()` in the `load()` method. Again, mind that external resource not properly initialized until uimaFIT invokes `afterResourcesInitialized()`. {{{ public class TestSharedResourceObject implements SharedResourceObject, ExternalResourceAware { @ConfigurationParameter(name=ExternalResourceFactory.PARAM_RESOURCE_NAME) private String resourceName; public final static String PARAM_CHAINED_RESOURCE = "chainedResource"; @ExternalResource(key = PARAM_CHAINED_RESOURCE) private ChainableResource chainedResource; public String getResourceName() { return resourceName; } public void load(DataResource aData) throws ResourceInitializationException { ConfigurationParameterInitializer.initialize(this, aData); // rest of the init logic that does not require external resources } public void afterResourcesInitialized() { // init logic that requires external resources } } }}} === Note on injecting resources into resources === Nested resources are only initialized if they are used in a pipeline which contains at least one component that calls `ConfigurationParameterInitializer.initialize()`. Any component extending uimaFIT's component base classes qualifies. If you use nested resources in a pipeline without any uimaFIT-aware components, you can just add uimaFIT's `NoopAnnotator` to the pipeline. == Resource locators == Normally, in UIMA an external resource needs to implement either `SharedResourceObject` or `Resource`. In order to inject arbitrary objects, uimaFIT has the concept of `ExternalResourceLocator`. When a resource implements this interface, not the resource itself is injected, but the method getResource() is called on the resource and the result is injected. The following example illustrates how to inject an object from JNDI into a UIMA component: {{{ class MyAnalysisEngine2 extends JCasAnnotator_ImplBase { static final String RES_DICTIONARY = "dictionary"; @ExternalResource(key = RES_DICTIONARY) Dictionary dictionary; } AnalysisEngineDescription desc = createPrimitiveDescription(MyAnalysisEngine2.class); bindResource(desc, MyAnalysisEngine2.RES_DICTIONARY, JndiResourceLocator.class, JndiResourceLocator.PARAM_NAME, "dictionaries/german"); }}}