This chapter is the reference guide for the UIMA SDK's Component Descriptor XML schema. A Component Descriptor (also sometimes called a Resource Specifier in the code) is an XML file that either (a) completely describes a component, including all information needed to construct the component and interact with it, or (b) specifies how to connect to and interact with an existing component that has been published as a remote service. Component (also called Resource) is a general term for modules produced by UIMA developers and used by UIMA applications. The types of Components are: Analysis Engines, Collection Readers, CAS Initializers, CAS Consumers, and Collection Processing Engines. However, Collection Processing Engine Descriptors are significantly different in format and are covered in a separate chapter, UIMA Collection Processing Engine Descriptor Reference.
Section 23.1 describes the notation used in this chapter.
Section 23.2 describes the UIMA SDK’s import syntax, which lets XML descriptors import information from other XML files so that it can be shared between several descriptors.
Section 23.4 describes the XML format for Analysis Engine Descriptors. These are descriptors that completely describe Analysis Engines, including all information needed to construct and interact with them.
Section 23.6 describes the XML format for Collection Processing Component Descriptors. This includes Collection Reader, CAS Initializer, and CAS Consumer Descriptors.
Section 23.7 describes the XML format for Service Client Descriptors, which specify how to connect to and interact with resources deployed as remote services.
This chapter uses an informal notation to specify the syntax of Component Descriptors. The formal syntax is defined by an XML schema definition, which is contained in two files, resourceSpecifierSchema.xsd and TaeSpecifierSchema.xsd, both of which are in the uima_core.jar file.
The notation used in this chapter is:
An ellipsis (...) inside an element body indicates that the element's substructure is omitted at that point. For example:
<analysisEngineMetaData>
...
</analysisEngineMetaData>
An ellipsis immediately after an element indicates that the element may be repeated. For example:
<parameter>[String]</parameter> <parameter>[String]</parameter> ...
indicates that there may be arbitrarily many parameter elements in this context.
Bracketed expressions (e.g. [String]) indicate the type of value that may be used at that location.
A vertical bar, as in true|false, indicates alternatives. This can be applied to literal values, bracketed type names, and elements.
The UIMA SDK defines a particular syntax for XML descriptors to import information from other XML files. When one of the following appears in an XML descriptor:
<import location="[URL]" /> or
<import name="[Name]" />
it indicates that information from a separate XML file is being imported. Note that imports are allowed only in certain places in the descriptor. In the remainder of this chapter, it will be indicated at which points imports are allowed.
If an import specifies a location attribute, the value of that attribute specifies the URL at which the XML file to import will be found. This can be a relative URL, which will be resolved relative to the descriptor containing the import element, or an absolute URL. Relative URLs can be written without a protocol/scheme (e.g., "file:") and without a host machine name. In this case the relative URL might look something like com/ibm/myproj/MyTypeSystem.xml.
An absolute URL is written with a protocol prefix such as file:, followed by a path such as com/ibm/myproj/MyTypeSystem.xml.
For more information about URLs, please read the javadoc information for the Java class "URL".
If an import specifies a name attribute, the value of that attribute should take the form of a Java-style dotted name (e.g. com.ibm.myproj.MyTypeSystem). An .xml file with this name will be searched for in the classpath or datapath (described below). As in Java, the dots in the name will be converted to file path separators. So an import specifying the example name in this paragraph will result in a search for com/ibm/myproj/MyTypeSystem.xml in the classpath or datapath.
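The two import forms can be placed side by side to compare how they are resolved. The fragment below is a minimal sketch: the file name types/MyTypeSystem.xml is a hypothetical relative location, and the dotted name is the example used in this section.

```xml
<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <imports>
    <!-- Import by location: a relative URL, resolved against the
         descriptor that contains this import (hypothetical path) -->
    <import location="types/MyTypeSystem.xml"/>

    <!-- Import by name: searched for as com/ibm/myproj/MyTypeSystem.xml
         in the classpath or datapath -->
    <import name="com.ibm.myproj.MyTypeSystem"/>
  </imports>
</typeSystemDescription>
```

In practice a descriptor would normally use one form or the other for a given file; both appear here only for comparison.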
The datapath works similarly to the classpath but can be set programmatically through the resource manager API. Application developers can specify a datapath during initialization, using the following code:
ResourceManager resMgr = UIMAFramework.newDefaultResourceManager();
resMgr.setDataPath(yourPathString);
AnalysisEngine ae = UIMAFramework.produceAE(desc, resMgr, null);
The default datapath for the entire JVM can be set via the uima.datapath Java system property, but this feature should only be used for standalone applications that don't need to run in the same JVM as other code that may need a different datapath.
The UIMA SDK also supports XInclude, a W3C candidate recommendation, to include XML files within other XML files. However, it is recommended that the import syntax be used instead, as it is more flexible and better supports tool developers.
To use XInclude, you first must include the XInclude namespace in your document’s root element, e.g.:
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier" xmlns:xi="http://www.w3.org/2001/XInclude">
Then, you can include a file using the syntax <xi:include href="[URL]"/>, where [URL] can be any relative or absolute URL referring to another XML document. The referred-to document must be a well-formed XML document, meaning that it must consist of exactly one root element and must define all of the namespace prefixes that it uses. The default namespace (generally http://uima.apache.org/resourceSpecifier) will be inherited from the parent document. When UIMA parses the XML document, it will automatically replace the <xi:include> element with the entire XML document referred to by the href. For more information on XInclude see http://www.w3.org/TR/xinclude/.
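As an illustration of the XInclude mechanism described above, the following sketch pulls a shared type system into an Analysis Engine descriptor. It rests on assumptions: MySharedTypeSystem.xml is a hypothetical file whose root element is assumed to be a typeSystemDescription, and the rest of the descriptor is left out.

```xml
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier"
                           xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- other elements of the descriptor omitted for brevity -->
  <analysisEngineMetaData>
    <name>Example Annotator</name>
    <!-- Replaced at parse time by the root element of the referenced
         document (assumed to be a typeSystemDescription element) -->
    <xi:include href="MySharedTypeSystem.xml"/>
    <!-- remaining metadata elements omitted -->
  </analysisEngineMetaData>
</analysisEngineDescription>
```

The import syntax remains the recommended way to share a type system; this fragment only shows where an xi:include element could sit.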
A Type System Descriptor is used to define the types and features that can be represented in the CAS. A Type System Descriptor can be imported into an Analysis Engine or Collection Processing Component Descriptor.
The basic structure of a Type System Descriptor is as follows:
<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <name>[String]</name>
  <description>[String]</description>
  <version>[String]</version>
  <vendor>[String]</vendor>
  <imports>
    <import ...>
    ...
  </imports>
  <types>
    <typeDescription> ... </typeDescription>
    ...
  </types>
</typeSystemDescription>
All of the subelements are optional.
The imports section allows this descriptor to import types from other type system descriptors. The import syntax is described in section 23.2 of this chapter. A type system may import any number of other type systems and then define additional types which refer to imported types. Circular imports are allowed.
The types element contains zero or more typeDescription elements. Each typeDescription has the form:
<typeDescription>
  <name>[TypeName]</name>
  <description>[String]</description>
  <supertypeName>[TypeName]</supertypeName>
  <features> ... </features>
</typeDescription>
The name element contains the name of the type. A [TypeName] is a dot-separated list of names, where each name consists of a letter followed by any number of letters, digits, or underscores. TypeNames are case sensitive. Letter and digit are as defined by Java; therefore, any Unicode letter or digit may be used (subject to the character encoding defined by the descriptor file's XML header). The name following the final dot is considered to be the "short name" of the type; the preceding portion is the namespace (analogous to the package.class syntax used in Java). Namespaces beginning with uima are reserved and should not be used. Examples of valid type names are:
These would all be considered distinct types since they have different namespaces. Best practice here is to follow the normal Java naming conventions of having namespaces be all lowercase, with the short type names having an initial capital, but this is not mandated, so ABC.mYtyPE is an allowed type name. While type names without namespaces (e.g. TokenAnnotation alone) are allowed, the JCas does not support them and so their use is strongly discouraged.
The description element contains a textual description of the type. The supertypeName element contains the name of the type from which it inherits (this can be set to the name of another user-defined type, or it may be set to any built-in type which may be subclassed, such as "uima.tcas.Annotation" for a new annotation type or "uima.cas.TOP" for a new type that is not an annotation). All three of these elements (name, description, and supertypeName) are required.
The features element of a typeDescription is required only if the type we are specifying introduces new features. If the features element is present, it contains zero or more featureDescription elements, each of which has the form:
<featureDescription>
  <name>[Name]</name>
  <description>[String]</description>
  <rangeTypeName>[Name]</rangeTypeName>
  <elementType>[Name]</elementType>
  <multipleReferencesAllowed>true|false</multipleReferencesAllowed>
</featureDescription>
A feature’s name follows the same rules as a type short name – a letter followed by any number of letters, digits, or underscores. Feature names are case sensitive.
The feature’s rangeTypeName specifies the type of value that the feature can take. This may be the name of any type defined in your type system, or one of the predefined types. All of the predefined types have names that are prefixed with uima.cas or uima.tcas, for example:
uima.cas.TOP, uima.cas.String, uima.cas.Boolean, uima.cas.Byte, uima.cas.Short, uima.cas.Integer, uima.cas.Long, uima.cas.Float, uima.cas.Double, uima.cas.FSArray, uima.cas.StringArray, uima.cas.BooleanArray, uima.cas.ByteArray, uima.cas.ShortArray, uima.cas.IntegerArray, uima.cas.LongArray, uima.cas.FloatArray, uima.cas.DoubleArray, uima.cas.FSList, uima.cas.StringList, uima.cas.IntegerList, uima.cas.FloatList, uima.tcas.Annotation
For a complete list of predefined types, see the CAS API documentation.
The elementType of a feature is optional, and applies only when the rangeTypeName is uima.cas.FSArray or uima.cas.FSList. The elementType specifies what type of value can be assigned as an element of the array or list. This must be the name of a non-primitive type. If omitted, it defaults to uima.cas.TOP, meaning that any FeatureStructure can be assigned as an element of the array or list. Note: depending on the CAS Interface that you use in your code, this constraint may or may not be enforced.
The multipleReferencesAllowed element is optional, and applies only when the rangeTypeName is an array or list type (it applies to arrays and lists of primitive as well as non-primitive types). Setting this to false (the default) indicates that this feature has exclusive ownership of the array or list, so changes to the array or list are localized. Setting this to true indicates that the array or list may be shared, so changes to it may affect other objects in the CAS. Note: there is currently no guarantee that the framework will enforce this restriction. However, this setting may affect how the CAS is serialized.
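Putting the pieces of this section together, a complete Type System Descriptor might look like the sketch below. All names in it (ExampleTypeSystem, org.myorg.Token, org.myorg.Sentence, lemma, tokens) are hypothetical and chosen only for illustration.

```xml
<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <name>ExampleTypeSystem</name>
  <description>Hypothetical type system used only for illustration.</description>
  <version>1.0</version>
  <vendor>Example Org</vendor>
  <types>
    <!-- An annotation type with one String-valued feature -->
    <typeDescription>
      <name>org.myorg.Token</name>
      <description>A single token.</description>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <featureDescription>
          <name>lemma</name>
          <description>Base form of the token.</description>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
    <!-- An annotation type whose feature is an array of Token instances -->
    <typeDescription>
      <name>org.myorg.Sentence</name>
      <description>A sentence and the tokens it contains.</description>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <featureDescription>
          <name>tokens</name>
          <description>Tokens covered by this sentence.</description>
          <rangeTypeName>uima.cas.FSArray</rangeTypeName>
          <!-- elementType applies because the range is FSArray -->
          <elementType>org.myorg.Token</elementType>
          <!-- false: this feature owns the array exclusively -->
          <multipleReferencesAllowed>false</multipleReferencesAllowed>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>
```

The elementType and multipleReferencesAllowed elements appear only on the tokens feature, since they apply only to array and list ranges.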
There is one other special type that you can declare: a subset of the String type that specifies a restricted set of allowed values. This is useful for features that can have only certain String values, such as parts of speech. Here is an example of how to declare such a type:
<typeDescription>
  <name>PartOfSpeech</name>
  <description>A part of speech.</description>
  <supertypeName>uima.cas.String</supertypeName>
  <allowedValues>
    <value>
      <string>NN</string>
      <description>Noun, singular or mass.</description>
    </value>
    <value>
      <string>NNS</string>
      <description>Noun, plural.</description>
    </value>
    <value>
      <string>VB</string>
      <description>Verb, base form.</description>
    </value>
    ...
  </allowedValues>
</typeDescription>
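A restricted String type such as PartOfSpeech is then used like any other range type. The sketch below assumes a hypothetical type org.myorg.TaggedToken with a hypothetical feature pos; only PartOfSpeech comes from the example above.

```xml
<typeDescription>
  <name>org.myorg.TaggedToken</name>
  <description>A token carrying a part-of-speech tag (hypothetical type).</description>
  <supertypeName>uima.tcas.Annotation</supertypeName>
  <features>
    <featureDescription>
      <name>pos</name>
      <description>Tag restricted to the values allowed by PartOfSpeech.</description>
      <!-- The range is the restricted String subtype, not uima.cas.String -->
      <rangeTypeName>PartOfSpeech</rangeTypeName>
    </featureDescription>
  </features>
</typeDescription>
```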
Analysis Engine (AE) descriptors completely describe Analysis Engines. There are two basic types of Analysis Engines: Primitive and Aggregate. A Primitive Analysis Engine is a container for a single annotator, whereas an Aggregate Analysis Engine is composed of a collection of other Analysis Engines. (For more information on this and other terminology, see Chapter 2, UIMA Conceptual Overview.)
Both Primitive and Aggregate Analysis Engines have descriptors, and the two types of descriptors have some similarities and some differences. Primitive Analysis Engine descriptors are discussed first, in Section 23.4.1. Section 23.4.2 then describes how Aggregate Analysis Engine descriptors are different.
<?xml version="1.0" encoding="UTF-8" ?>
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>com.ibm.uima.java</frameworkImplementation>
  <primitive>true</primitive>
  <annotatorImplementationName>[String]</annotatorImplementationName>
  <analysisEngineMetaData> ... </analysisEngineMetaData>
  <externalResourceDependencies> ... </externalResourceDependencies>
  <resourceManagerConfiguration> ... </resourceManagerConfiguration>
</analysisEngineDescription>
The document begins with a standard XML header. The recommended root tag is <analysisEngineDescription>, although <taeDescription> is also allowed for backwards compatibility.
Within the root element we declare that we are using the XML namespace http://uima.apache.org/resourceSpecifier. This namespace is required; without it, the descriptor cannot be validated for errors.
The first subelement, <frameworkImplementation>, currently must have the value com.ibm.uima.java or com.ibm.uima.cpp. In future versions, there may be other framework implementations, or perhaps implementations produced by other vendors.
The second subelement, <primitive>, contains the Boolean value true, indicating that this XML document describes a Primitive Analysis Engine.
The next subelement, <annotatorImplementationName>, is how the UIMA framework determines which annotator class to use. This should contain a fully-qualified Java class name for Java implementations, or the name of a .dll or .so file for C++ implementations.
The <analysisEngineMetaData> element contains descriptive information about the analysis engine and what it does. It is described in the section Analysis Engine Metadata.
The <externalResourceDependencies> and <resourceManagerConfiguration> elements declare the external resource files that the analysis engine relies upon. They are optional and are described in the section External Resource Dependencies and Resource Manager Configuration.
<analysisEngineMetaData>
  <name>[String]</name>
  <description>[String]</description>
  <version>[String]</version>
  <vendor>[String]</vendor>
  <configurationParameters> ... </configurationParameters>
  <configurationParameterSettings> ... </configurationParameterSettings>
  <typeSystemDescription> ... </typeSystemDescription>
  <typePriorities> ... </typePriorities>
  <fsIndexCollection> ... </fsIndexCollection>
  <capabilities> ... </capabilities>
  <operationalProperties> ... </operationalProperties>
</analysisEngineMetaData>
The analysisEngineMetaData element contains four simple string fields: name, description, version, and vendor. Only the name field is required, but providing values for the other fields is recommended. The name field is just a descriptive name meant to be read by users; it does not need to be unique across all Analysis Engines.
The other sub-elements (configurationParameters, configurationParameterSettings, typeSystemDescription, typePriorities, fsIndexCollection, capabilities, and operationalProperties) are described in the following sections. The only one of these that is required is capabilities; the others are all technically optional but generally necessary for an analysis engine of any complexity.
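To show how the required pieces fit together, here is a sketch of a small Primitive Analysis Engine descriptor that supplies only the elements described as required, plus an imported type system. The class name org.myorg.ExampleAnnotator, the file MySharedTypeSystem.xml, and the type org.myorg.Token are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>com.ibm.uima.java</frameworkImplementation>
  <primitive>true</primitive>
  <!-- Fully-qualified Java class of the annotator (hypothetical) -->
  <annotatorImplementationName>org.myorg.ExampleAnnotator</annotatorImplementationName>
  <analysisEngineMetaData>
    <!-- name is the only required string field -->
    <name>Example Annotator</name>
    <description>Creates Token annotations (illustrative only).</description>
    <typeSystemDescription>
      <imports>
        <import location="MySharedTypeSystem.xml"/>
      </imports>
    </typeSystemDescription>
    <!-- capabilities is the only required sub-element besides name -->
    <capabilities>
      <capability>
        <inputs/>
        <outputs>
          <type allAnnotatorFeatures="true">org.myorg.Token</type>
        </outputs>
      </capability>
    </capabilities>
  </analysisEngineMetaData>
</analysisEngineDescription>
```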
Configuration Parameters are made available to annotator implementations and applications by the following interfaces: AnnotatorContext (passed as an argument to the initialize() method of an annotator), ConfigurableResource (every Analysis Engine implements this interface), and the UimaContext (you can get this from any resource, including Analysis Engines, using the method getUimaContext()).
Use AnnotatorContext within annotators and UimaContext outside of annotators (for instance, in CasConsumers, or the containing application) to access configuration parameters.
Configuration parameters are set from the corresponding elements in the XML descriptor for the application. If you need to programmatically change parameter settings within an application, you can use methods in ConfigurableResource; if you do this, you need to call reconfigure() afterwards to have the UIMA framework notify all the contained analysis components that the parameter configuration has changed (the analysis engine's reinitialize() methods will be called). Note that in the current implementation, only integrated deployment components have configuration parameters passed to them; remote components obtain their parameters from their remote startup environment. This will likely change in the future.
There are two ways to specify the <configurationParameters> section: as a list of configuration parameters or as a list of groups. A list of parameters, which are not part of any group, looks like this:
<configurationParameters>
  <configurationParameter>
    <name>[String]</name>
    <description>[String]</description>
    <type>String|Integer|Float|Boolean</type>
    <multiValued>true|false</multiValued>
    <mandatory>true|false</mandatory>
    <overrides>
      <parameter>[String]</parameter>
      <parameter>[String]</parameter>
      ...
    </overrides>
  </configurationParameter>
  <configurationParameter> ... </configurationParameter>
  ...
</configurationParameters>
For each configuration parameter, the following are specified:
type: String, Integer, Float, or Boolean (required).
multiValued: true if the parameter can take multiple values (an array), false if the parameter takes only a single value (optional, defaults to false).
mandatory: true if a value must be provided for the parameter (optional, defaults to false).
A list of groups looks like this:
<configurationParameters defaultGroup="[String]" searchStrategy="none|default_fallback|language_fallback" >
<commonParameters> [zero or more parameters] </commonParameters>
<configurationGroup names="name1 name2 name3 ..."> [zero or more parameters] </configurationGroup>
<configurationGroup names="name4 name5 ..."> [zero or more parameters] </configurationGroup>
...
</configurationParameters>
Both the <commonParameters> and <configurationGroup> elements contain zero or more <configurationParameter> elements, with the same syntax described above.
The <commonParameters> element declares parameters that exist in all groups. Each <configurationGroup> element has a names attribute, which contains a list of group names separated by whitespace (space or tab characters). Names consist of any number of non-whitespace characters; however, the Component Descriptor Editor tool restricts this to be normal Java identifiers, including the period (.) and the dash (-). One configuration group will be created for each name, and all of the groups will contain the same set of parameters.
The defaultGroup attribute specifies the name of the group to be used in the case where an annotator does a lookup for a configuration parameter without specifying a group name. It may also be used as a fallback if the annotator specifies a group that does not exist (see below).
The searchStrategy attribute determines the action to be taken when the context is queried for the value of a parameter belonging to a particular configuration group, if that group does not exist or does not contain a value for the requested parameter. There are currently three possible values:
none: there is no fallback; return null if there is no value in the exact group specified by the user.
default_fallback: if there is no value found in the specified group, look in the default group (as defined by the defaultGroup attribute).
language_fallback: this setting allows for a specific use of configuration parameter groups where the group names correspond to ISO language and country codes (for an example, see below). The fallback sequence is: <lang>_<country>_<region> -> <lang>_<country> -> <lang> -> <default>.
<configurationParameters defaultGroup="en" searchStrategy="language_fallback">
  <commonParameters>
    <configurationParameter>
      <name>DictionaryFile</name>
      <description>Location of dictionary for this language</description>
      <type>String</type>
      <multiValued>false</multiValued>
      <mandatory>false</mandatory>
    </configurationParameter>
  </commonParameters>
  <configurationGroup names="en de en-US"/>
  <configurationGroup names="zh">
    <configurationParameter>
      <name>DBC_Strategy</name>
      <description>Strategy for dealing with double-byte characters.</description>
      <type>String</type>
      <multiValued>false</multiValued>
      <mandatory>false</mandatory>
    </configurationParameter>
  </configurationGroup>
</configurationParameters>
In this example, we are declaring a DictionaryFile parameter that can have a different value for each of the languages that our TAE supports: English (general), German, U.S. English, and Chinese. For Chinese only, we also declare a DBC_Strategy parameter.
We are using the language_fallback search strategy, so if an annotator requests the dictionary file for the en-GB (British English) group, we will fall back to the more general en group.
Since we have defined en as the default group, this value will be returned if the context is queried for the DictionaryFile parameter without specifying any group name, or if a nonexistent group name is specified.
If no configuration groups were declared, the <configurationParameterSettings> element looks like this:
<configurationParameterSettings>
  <nameValuePair>
    <name>[String]</name>
    <value>
      <string>[String]</string> |
      <integer>[Integer]</integer> |
      <float>[Float]</float> |
      <boolean>true|false</boolean> |
      <array> ... </array>
    </value>
  </nameValuePair>
  <nameValuePair> ... </nameValuePair>
  ...
</configurationParameterSettings>
There are zero or more nameValuePair elements. Each nameValuePair contains a name (which refers to one of the configuration parameters) and a value for that parameter.
The value element contains an element that matches the type of the parameter. For single-valued parameters, this is either <string>, <integer>, <float>, or <boolean>. For multi-valued parameters, this is an <array> element, which then contains zero or more instances of the appropriate type of primitive value, e.g.:
<array><string>One</string><string>Two</string></array>
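The relationship between a parameter declaration and its setting can be seen by placing the two sections next to each other. In this sketch the parameter names StopWords and MaxTokenLength and their values are hypothetical; both fragments would sit inside analysisEngineMetaData.

```xml
<!-- Declarations: one multi-valued String and one single-valued Integer -->
<configurationParameters>
  <configurationParameter>
    <name>StopWords</name>
    <description>Words to ignore (hypothetical parameter).</description>
    <type>String</type>
    <multiValued>true</multiValued>
    <mandatory>false</mandatory>
  </configurationParameter>
  <configurationParameter>
    <name>MaxTokenLength</name>
    <description>Longest token to accept (hypothetical parameter).</description>
    <type>Integer</type>
    <multiValued>false</multiValued>
    <mandatory>true</mandatory>
  </configurationParameter>
</configurationParameters>

<!-- Settings: each value element matches the declared type -->
<configurationParameterSettings>
  <nameValuePair>
    <name>StopWords</name>
    <!-- multiValued is true, so the value is an array of strings -->
    <value><array><string>the</string><string>a</string></array></value>
  </nameValuePair>
  <nameValuePair>
    <name>MaxTokenLength</name>
    <value><integer>64</integer></value>
  </nameValuePair>
</configurationParameterSettings>
```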
If configuration groups were declared, then the <configurationParameterSettings> element looks like this:
<configurationParameterSettings>
<settingsForGroup name="[String]"> [one or more <nameValuePair> elements] </settingsForGroup>
<settingsForGroup name="[String]"> [one or more <nameValuePair> elements] </settingsForGroup>
...
</configurationParameterSettings>
where each <settingsForGroup> element has a name that matches one of the configuration groups declared under the <configurationParameters> element and contains the parameter settings for that group.
Here are the settings that correspond to the parameter declarations in the previous example:
<configurationParameterSettings>
  <settingsForGroup name="en">
    <nameValuePair>
      <name>DictionaryFile</name>
      <value><string>resourcesEnglishdictionary.dat</string></value>
    </nameValuePair>
  </settingsForGroup>
  <settingsForGroup name="en-US">
    <nameValuePair>
      <name>DictionaryFile</name>
      <value><string>resourcesEnglish_USdictionary.dat</string></value>
    </nameValuePair>
  </settingsForGroup>
  <settingsForGroup name="de">
    <nameValuePair>
      <name>DictionaryFile</name>
      <value><string>resourcesDeutschdictionary.dat</string></value>
    </nameValuePair>
  </settingsForGroup>
  <settingsForGroup name="zh">
    <nameValuePair>
      <name>DictionaryFile</name>
      <value><string>resourcesChinesedictionary.dat</string></value>
    </nameValuePair>
    <nameValuePair>
      <name>DBC_Strategy</name>
      <value><string>default</string></value>
    </nameValuePair>
  </settingsForGroup>
</configurationParameterSettings>
<typeSystemDescription>
<name> [String] </name> <description>[String]</description> <version>[String]</version> <vendor>[String]</vendor>
<imports> <import ...> ... </imports>
<types> <typeDescription> ... </typeDescription>
...
</types>
</typeSystemDescription>
A typeSystemDescription element defines a type system for an Analysis Engine. The syntax for the element is described in section 23.3 of this chapter.
The recommended usage is to import an external type system, using the import syntax described in section 23.2 of this chapter. For example:
<typeSystemDescription>
  <imports>
    <import location="MySharedTypeSystem.xml"/>
  </imports>
</typeSystemDescription>
This allows several AEs to share a single type system definition. The file MySharedTypeSystem.xml would then contain the full type system information, including the name, description, vendor, version, and types.
<typePriorities> <name> [String] </name> <description>[String]</description> <version>[String]</version> <vendor>[String]</vendor>
<imports> <import ...> ... </imports>
<priorityLists> <priorityList> <type>[TypeName]</type> <type>[TypeName]</type> ... </priorityList>
...
</priorityLists> </typePriorities>
The <typePriorities> element contains zero or more <priorityList> elements; each <priorityList> contains zero or more types. Like a type system, a type priorities definition may also declare a name, description, version, and vendor, and may import other type priorities. The import syntax is described in section 23.2 of this chapter.
Type priority is used when iterating over feature structures in the CAS. For example, if the CAS contains a Sentence annotation and a Paragraph annotation with the same span of text (i.e. a one-sentence paragraph), which annotation should be returned first by an iterator? Probably the Paragraph, since it is conceptually "bigger," but the framework does not know that and must be explicitly told that the Paragraph annotation has priority over the Sentence annotation, like this:
<typePriorities>
  <priorityList>
    <type>org.myorg.Paragraph</type>
    <type>org.myorg.Sentence</type>
  </priorityList>
</typePriorities>
All of the <priorityList>
elements defined in the descriptor
(and in all component descriptors of an aggregate analysis engine descriptor)
are merged to produce a single priority list.
Subtypes of types specified here are also ordered, unless overridden by another user-specified type ordering. For example, if you specify type A comes before type B, then subtypes of A will come before subtypes of B, unless there is an overriding specification which declares some subtype of B comes before some subtype of A.
If there are inconsistencies between priority lists (type A declared before type B in one priority list, and type B declared before type A in another), the framework will throw an exception.
User defined indexes may declare if they wish to use the type priority or not; see the next section.
<fsIndexCollection>
<name>[String]</name> <description>[String]</description> <version>[String]</version> <vendor>[String]</vendor>
<imports> <import ...> ... </imports>
<fsIndexes>
<fsIndexDescription> ... </fsIndexDescription>
<fsIndexDescription> ... </fsIndexDescription>
</fsIndexes>
</fsIndexCollection>
The fsIndexCollection element declares Feature Structure Indexes, each of which defines an index that holds feature structures of a given type. Information in the CAS is always accessed through an index. There is a built-in default annotation index declared which can be used to access instances of type Annotation (or its subtypes), but if there is a need for a specialized index it must be declared in this element. See Chapter 26, CAS Reference for details on FS indexes.
Like type systems and type priorities, an fsIndexCollection can declare a name, description, vendor, and version, and may import other fsIndexCollections. The import syntax is described in section 23.2 of this chapter.
An fsIndexCollection may also define zero or more fsIndexDescription elements, each of which defines a single index. Each fsIndexDescription has the form:
<fsIndexDescription>
  <label>[String]</label>
  <typeName>[TypeName]</typeName>
  <kind>sorted|bag|set</kind>
  <keys>
    <fsIndexKey>
      <featureName>[Name]</featureName>
      <comparator>standard|reverse</comparator>
    </fsIndexKey>
    <fsIndexKey>
      <typePriority/>
    </fsIndexKey>
    ...
  </keys>
</fsIndexDescription>
The label element defines the name by which applications and annotators refer to this index. The typeName element contains the name of the type that will be contained in this index. This must match one of the type names defined in the <typeSystemDescription>.
There are three possible values for the <kind> of index. Sorted indexes enforce an ordering of feature structures, and may contain duplicates. Bag indexes do not enforce ordering, and also may contain duplicates. Set indexes do not enforce ordering and may not contain duplicates. If the <kind> element is omitted, it will default to sorted, which is the most common type of index.
An index may define one or more keys. These keys determine the sort order of the feature structures within a sorted index, and determine equality for set indexes. Bag indexes do not use keys. Keys are ordered by precedence – the first key is evaluated first, and subsequent keys are evaluated only if necessary.
Each key is represented by an fsIndexKey element. Most fsIndexKeys contain a featureName and a comparator. The featureName must match the name of one of the features for the type specified in the <typeName> element for this index. The comparator defines how the features will be compared: a value of standard means that features will be compared using the standard comparison for their data type (e.g. for numerical types, smaller values precede larger values, and for string types, Unicode string comparison is performed). A value of reverse means that features will be compared using the reverse of the standard comparison (e.g. for numerical types, larger values precede smaller values, etc.). For Set indexes, the comparator direction is ignored; the keys are used only for equality testing.
Each key used in comparisons must refer to a feature whose range type is String, Float, or Integer.
There is a second type of key, one which contains only the <typePriority/> element. When this key is used, it indicates that Feature Structures will be compared using the type priorities declared in the <typePriorities> section of the descriptor.
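A concrete index declaration may make the roles of the two kinds of key clearer. The sketch below assumes a hypothetical type org.myorg.Token with a String-valued feature lemma; the label TokensByLemma is likewise made up.

```xml
<fsIndexCollection>
  <fsIndexes>
    <fsIndexDescription>
      <!-- Name by which annotators and applications refer to the index -->
      <label>TokensByLemma</label>
      <typeName>org.myorg.Token</typeName>
      <kind>sorted</kind>
      <keys>
        <!-- First key: order by the String feature lemma, ascending -->
        <fsIndexKey>
          <featureName>lemma</featureName>
          <comparator>standard</comparator>
        </fsIndexKey>
        <!-- Second key: evaluated only if the first key compares equal -->
        <fsIndexKey>
          <typePriority/>
        </fsIndexKey>
      </keys>
    </fsIndexDescription>
  </fsIndexes>
</fsIndexCollection>
```

Because keys are evaluated in order of precedence, the type priority key here only breaks ties between feature structures with equal lemma values.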
<capabilities>
  <capability>
    <inputs>
      <type allAnnotatorFeatures="true|false">[TypeName]</type>
      ...
      <feature>[TypeName]:[Name]</feature>
      ...
    </inputs>
    <outputs>
      <type allAnnotatorFeatures="true|false">[TypeName]</type>
      ...
      <feature>[TypeName]:[Name]</feature>
      ...
    </outputs>
    <languagesSupported>
      <language>[ISO Language ID]</language>
      ...
    </languagesSupported>
    <inputSofas>
      <sofaName>[name]</sofaName>
      ...
    </inputSofas>
    <outputSofas>
      <sofaName>[name]</sofaName>
      ...
    </outputSofas>
  </capability>
  <capability> ... </capability>
  ...
</capabilities>
The capabilities definition is used by the UIMA Framework in several ways, including setting up the Results Specification for process calls, routing control for aggregates based on language, and as part of the Sofa mapping function.
The capabilities element contains one or more capability elements. Because you can declare multiple capability sets, you can model a component that produces a particular set of outputs for each given set of inputs.
Each capability contains inputs, outputs, languagesSupported, inputSofas, and outputSofas. The inputs and outputs elements are required (though they may be empty); <languagesSupported>, <inputSofas>, and <outputSofas> are optional and are used only for TAEs.
Both inputs and outputs may contain a mixture of type and feature elements.
<type...> elements contain the name of one of the types defined in the type system or one of the built-in types. Declaring a type as an input means that this component expects instances of this type to be in the CAS when it receives it to process. Declaring a type as an output means that this component creates new instances of this type in the CAS.
There is an optional attribute allAnnotatorFeatures, which defaults to false if omitted. The Component Descriptor Editor tool defaults this to true when a new type is added to the list of inputs and/or outputs. When this attribute is true, it specifies that all of the type’s features are also declared as input or output. Otherwise, the features that are required as inputs or populated as outputs must be explicitly specified in feature elements.
<feature...> elements contain the "fully-qualified" feature name, which is the type name followed by a colon, followed by the feature name, e.g. org.myorg.tae.TokenAnnotation:lemma. <feature...> elements in the <inputs> section must also have a corresponding type declared as an input. In output sections, this is not required. If the type is not specified as an output, but a feature for that type is, this means that existing instances of the type have the values of the specified features updated. Any type mentioned in a <feature> element must be specified as an input, an output, or both.
language elements contain one of the ISO language identifiers, such as en for English, or en-US for the United States dialect of English.
The list of language codes can be found here: http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt and the country codes here: http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html
<inputSofas> and <outputSofas> declare sofa names used by this component. All Sofa names must be unique within a particular capability set. A Sofa name must be an input or an output, and cannot be both. It is an error to have a Sofa name declared as an input in one capability set, and also have it declared as an output in another capability set.
A <sofaName> is written as a simple Java-style identifier, without any periods in the name, except that it may be written to end in .* . If written in this manner, it specifies a set of Sofa names, all of which start with the base name (the part before the .*) followed by a period and then an arbitrary Java identifier (without periods). This form is used to specify in the descriptor that the component could generate an arbitrary number of Sofas, the exact names and numbers of which are unknown before the component is run.
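As an illustration of the rules above, the following is a minimal sketch of a single capability set. The type name org.myorg.tae.SentenceAnnotation and the Sofa names SourceDocument and Translation are hypothetical, chosen only for this example.

```xml
<capabilities>
  <capability>
    <inputs>
      <!-- Instances of this type are expected in the CAS on entry -->
      <type>org.myorg.tae.TokenAnnotation</type>
    </inputs>
    <outputs>
      <!-- New instances are created; all features are declared as output -->
      <type allAnnotatorFeatures="true">org.myorg.tae.SentenceAnnotation</type>
      <!-- Existing TokenAnnotation instances get their lemma feature updated -->
      <feature>org.myorg.tae.TokenAnnotation:lemma</feature>
    </outputs>
    <languagesSupported>
      <language>en</language>
      <language>en-US</language>
    </languagesSupported>
    <inputSofas>
      <sofaName>SourceDocument</sofaName>
    </inputSofas>
    <outputSofas>
      <!-- A set of Sofas such as Translation.de, Translation.fr -->
      <sofaName>Translation.*</sofaName>
    </outputSofas>
  </capability>
</capabilities>
```

Note that TokenAnnotation appears as an input type, which satisfies the requirement that any type named in a <feature> element is declared as an input, an output, or both.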
Components can specify specific operational properties that can be useful in deployment. The following are available:
<operationalProperties>
  <modifiesCas>true|false</modifiesCas>
  <multipleDeploymentAllowed>true|false</multipleDeploymentAllowed>
  <outputsNewCASes>true|false</outputsNewCASes>
</operationalProperties>
modifiesCas, if false, indicates that this component does not modify the CAS. If it is not specified, the default value is true except for CAS Consumer components.
multipleDeploymentAllowed, if true, allows the component to be deployed multiple times to increase performance through scale-out techniques. If it is not specified, the default value is true, except for CAS Consumer and Collection Reader components.
outputsNewCASes, if true, allows the component to create new CASes during processing, for example to break a large artifact into smaller pieces. See the CAS Multiplier Developer's Guide for details.
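For instance, a component that only reads the CAS and is safe to run in several parallel instances might declare the following. This is a sketch for illustration, not a recommended default.

```xml
<operationalProperties>
  <!-- The component never changes the CAS it receives -->
  <modifiesCas>false</modifiesCas>
  <!-- Several instances may be deployed for scale-out -->
  <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
  <!-- The component does not create new CASes -->
  <outputsNewCASes>false</outputsNewCASes>
</operationalProperties>
```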
<externalResourceDependencies>
  <externalResourceDependency>
    <key>[String]</key>
    <description>[String]</description>
    <interfaceName>[String]</interfaceName>
    <optional>true|false</optional>
  </externalResourceDependency>
  <externalResourceDependency> ... </externalResourceDependency>
  ...
</externalResourceDependencies>
A primitive annotator may declare zero or more <externalResourceDependency> elements. Each dependency has the following elements:
key – the string by which the annotator code will attempt to access the resource. Must be unique within this annotator.
description – a textual description of the dependency.
interfaceName – the fully-qualified name of the Java interface through which the annotator will access the data. This is optional. If not specified, the annotator can only get an InputStream to the data.
optional – whether the resource is optional. If false, an exception will be thrown if no resource is assigned to satisfy this dependency. Defaults to false.
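A dependency declaration using all four elements could look like the sketch below. The key AcronymTable and the interface org.myorg.tae.StringMapResource are hypothetical names used only for illustration.

```xml
<externalResourceDependencies>
  <externalResourceDependency>
    <!-- The annotator code looks the resource up under this key -->
    <key>AcronymTable</key>
    <description>Table of acronyms and their expanded forms.</description>
    <!-- Omit interfaceName to receive the data as an InputStream only -->
    <interfaceName>org.myorg.tae.StringMapResource</interfaceName>
    <!-- false: an exception is thrown if nothing is bound to this key -->
    <optional>false</optional>
  </externalResourceDependency>
</externalResourceDependencies>
```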
<resourceManagerConfiguration>
  <name>[String]</name>
  <description>[String]</description>
  <version>[String]</version>
  <vendor>[String]</vendor>
  <imports> <import ...> ... </imports>
  <externalResources>
    <externalResource>
      <name>[String]</name>
      <description>[String]</description>
      <fileResourceSpecifier>
        <fileUrl>[URL]</fileUrl>
      </fileResourceSpecifier>
      <implementationName>[String]</implementationName>
    </externalResource>
    ...
  </externalResources>
  <externalResourceBindings>
    <externalResourceBinding>
      <key>[String]</key>
      <resourceName>[String]</resourceName>
    </externalResourceBinding>
    ...
  </externalResourceBindings>
</resourceManagerConfiguration>
This element declares external resources and binds them to annotators’ external resource dependencies.
The resourceManagerConfiguration element may optionally contain an import, which allows resource definitions to be stored in a separate (shareable) file. See section 23.2 for details.
The externalResources element contains zero or more externalResource elements, each of which consists of:
name – the name of the resource. This name is referred to in the bindings (see below). Resource names need to be unique within any Aggregate Analysis Engine or Collection Processing Engine, so the Java-like org.myorg.mycomponent.MyResource syntax is recommended.
description – English description of the resource.
resource specifier – declares the location of the resource. There are different possibilities for how this is done (see below).
implementationName – the fully-qualified name of the Java class that will be instantiated from the resource data. This is optional; if not specified, the resource will be accessible as an input stream to the raw data. If specified, the Java class must implement the interfaceName that is specified in the External Resource Dependency to which it is bound.
One possibility for the resource specifier is a <fileResourceSpecifier>, as shown above. This simply declares a URL to the resource data. This support is built on the Java class URL and its method URL.openStream(); it supports the protocols "file", "http" and "jar" (for referring to files in jars) by default, and you can plug in handlers for other protocols. The URL has to start with file: (or some other protocol). It is relative to either the classpath or the "data path". The data path works like the classpath but can be set programmatically via ResourceManager.setDataPath(). Setting the Java System property uima.datapath also works.
file:com/ibm.d.txt is a relative path; relative paths for resources are resolved using the classpath and/or the data path. For the file protocol, URLs starting with file:/ or file:/// are absolute. Note that file://com/ibm/d.txt is NOT an absolute path starting with com. The '//' indicates that what follows is a host name, so if you try to use this URL, the framework will complain that it cannot connect to the host "com".
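The following sketch shows an external resource declared with a relative file URL. The resource name, file path, and implementation class are hypothetical and stand in for your own.

```xml
<externalResources>
  <externalResource>
    <name>org.myorg.mycomponent.AcronymTableFile</name>
    <description>Acronym table, located via the classpath or data path.</description>
    <fileResourceSpecifier>
      <!-- Relative URL: resolved using the classpath and/or the data path -->
      <fileUrl>file:org/myorg/mycomponent/acronyms.dat</fileUrl>
    </fileResourceSpecifier>
    <!-- Optional: omit to access the raw data as an input stream -->
    <implementationName>org.myorg.mycomponent.StringMapResource_impl</implementationName>
  </externalResource>
</externalResources>
```

Writing the URL as file:/org/myorg/mycomponent/acronyms.dat instead would make it an absolute path on the local file system.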
Another option is a <fileLanguageResourceSpecifier>, which is intended to support resources, such as dictionaries, that depend on the language of the document being processed. Instead of a single URL, a prefix and suffix are specified, like this:
<fileLanguageResourceSpecifier>
  <fileUrlPrefix>file:FileLanguageResource_implTest_data_</fileUrlPrefix>
  <fileUrlSuffix>.dat</fileUrlSuffix>
</fileLanguageResourceSpecifier>
The URL of the actual resource is then formed by concatenating the prefix, the language of the document (as an ISO language code, e.g. en or en-US – see Capabilities for more information), and the suffix.
The externalResourceBindings element declares which resources are bound to which dependencies. Each externalResourceBinding consists of:
key – identifies the dependency. For a binding declared in a primitive analysis engine descriptor, this must match the value of the key element of one of the externalResourceDependency elements. Bindings may also be specified in aggregate analysis engine descriptors, in which case a compound key is used – see section External Resource Bindings.
resourceName – the name of the resource satisfying the dependency. This must match the value of the name element of one of the externalResource declarations.
A given resource dependency may only be bound to one external resource; one external resource may be bound to many dependencies – to allow resource sharing.
In several places throughout the descriptor, it is possible to reference environment variables. In Java, these are actually references to Java system properties. To reference system environment variables from a Java analysis engine, you must pass the environment variables into the Java virtual machine by using the -D option on the java command line.
The syntax for environment variable references is <envVarRef>[VariableName]</envVarRef>, where [VariableName] is any valid Java system property name. Environment variable references are valid in the following places:
the <annotatorImplementationName> element of a primitive TAE descriptor
the <name> element within <analysisEngineMetaData>
within a <fileResourceSpecifier> or <fileLanguageResourceSpecifier>
For example, if the value of a configuration parameter were specified as: <string><envVarRef>TEMP_DIR</envVarRef>/temp.dat</string>, and the value of the TEMP_DIR Java System property were c:/temp, then the configuration parameter's value would evaluate to c:/temp/temp.dat.
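In context, such a reference might appear inside a configuration parameter setting, as sketched below. The parameter name OutputFile is hypothetical, and the example assumes the JVM was started with -DTEMP_DIR=c:/temp on the java command line.

```xml
<configurationParameterSettings>
  <nameValuePair>
    <name>OutputFile</name>
    <value>
      <!-- Evaluates to c:/temp/temp.dat when TEMP_DIR is c:/temp -->
      <string><envVarRef>TEMP_DIR</envVarRef>/temp.dat</string>
    </value>
  </nameValuePair>
</configurationParameterSettings>
```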
Aggregate Analysis Engines do not contain an annotator, but instead contain one or more component (also called delegate) analysis engines.
Aggregate Analysis Engine Descriptors maintain most of the same structure as Primitive Analysis Engine Descriptors. The differences are:
An Aggregate Analysis Engine Descriptor contains the element <primitive>false</primitive> rather than <primitive>true</primitive>.
An Aggregate Analysis Engine Descriptor must not include an <annotatorImplementationName> element.
In place of the <annotatorImplementationName>, an Aggregate Analysis Engine Descriptor must have a <delegateAnalysisEngineSpecifiers> element. See Delegate Analysis Engine Specifiers.
An Aggregate Analysis Engine Descriptor may provide a <flowController> element immediately following the <delegateAnalysisEngineSpecifiers>. See Flow Controller.
An Aggregate Analysis Engine Descriptor may specify <flowConstraints>. See FlowConstraints. Typically only one of <flowController> and <flowConstraints> is specified. If both are specified, the <flowController> takes precedence, and the flow controller implementation can use the information specified in the <flowConstraints> as part of its configuration input.
An Aggregate Analysis Engine Descriptor must not contain a <typeSystemDescription> element. The Type System of the Aggregate Analysis Engine is derived by merging the Type Systems of the Analysis Engines that the aggregate contains.
Within Aggregate Analysis Engine Descriptors, <configurationParameter> elements may define <overrides>. See Configuration Parameter Overrides.
An additional optional element, <sofaMappings>, may be included.
<delegateAnalysisEngineSpecifiers>
<delegateAnalysisEngine key="[String]">
<analysisEngineDescription>...</analysisEngineDescription> | <import .../>
</delegateAnalysisEngine>
<delegateAnalysisEngine key="[String]">
...
</delegateAnalysisEngine>
...
</delegateAnalysisEngineSpecifiers>
The delegateAnalysisEngineSpecifiers element contains one or more delegateAnalysisEngine elements. Each of these must have a unique key, and must contain either:
an analysisEngineDescription element describing the delegate analysis engine, OR
an import element giving the name or location of the XML descriptor for the delegate analysis engine (see section 23.2).
The latter is the much more common usage, and is the only form supported by the Component Descriptor Editor tool.
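A sketch of the import form follows, showing one import by location and one by name as covered in the import syntax section. The keys, file name, and descriptor name are hypothetical.

```xml
<delegateAnalysisEngineSpecifiers>
  <delegateAnalysisEngine key="annotator1">
    <!-- Import by location of the delegate's XML descriptor -->
    <import location="Annotator1.xml"/>
  </delegateAnalysisEngine>
  <delegateAnalysisEngine key="annotator2">
    <!-- Import by name of the delegate's XML descriptor -->
    <import name="org.myorg.mycomponent.Annotator2"/>
  </delegateAnalysisEngine>
</delegateAnalysisEngineSpecifiers>
```

The keys annotator1 and annotator2 are the identifiers that flow constraints, parameter overrides, resource bindings, and Sofa mappings use to refer to these delegates.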
<flowController key="[String]">
  <flowControllerDescription>...</flowControllerDescription> | <import .../>
</flowController>
The optional flowController element identifies the descriptor of the FlowController component that will be used to determine the order in which delegate Analysis Engines are called.
The key attribute is optional, but recommended; it assigns the FlowController an identifier that can be used for configuration parameter overrides, Sofa mappings, or external resource bindings. The key must not be the same as any of the delegate analysis engine keys.
As with the delegateAnalysisEngine element, the flowController element may contain either a complete flowControllerDescription or an import, but the import is recommended. The Component Descriptor Editor tool only supports imports here.
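With the recommended import form, a flow controller declaration can be as short as the following sketch. The key value and the descriptor file name MyFlowController.xml are hypothetical.

```xml
<!-- The key must differ from every delegate analysis engine key -->
<flowController key="myFlowController">
  <import location="MyFlowController.xml"/>
</flowController>
```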
If a <flowController> is not specified, the order in which delegate Analysis Engines are called within the aggregate Analysis Engine is specified using the <flowConstraints> element, which must occur immediately following the configurationParameterSettings element. If a <flowController> is specified, then the <flowConstraints> are optional. They can be used to pass an ordering of delegate keys to the <flowController>.
There are two options for flow constraints: <fixedFlow> or <capabilityLanguageFlow>. Each is discussed in a separate section below.
<flowConstraints>
<fixedFlow> <node>[String]</node> <node>[String]</node> ... </fixedFlow>
</flowConstraints>
The flowConstraints element must be included immediately following the configurationParameterSettings element.
Currently the flowConstraints element must contain a fixedFlow element. Eventually, other types of flow constraints may be possible.
The fixedFlow element contains one or more node elements, each of which contains an identifier which must match the key of a delegate analysis engine specified in the delegateAnalysisEngineSpecifiers element.
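For example, assuming an aggregate whose delegates were declared with the keys annotator1 and annotator2, a fixed flow that runs them in that order could be sketched as:

```xml
<flowConstraints>
  <fixedFlow>
    <!-- Each node names the key of a declared delegate analysis engine -->
    <node>annotator1</node>
    <node>annotator2</node>
  </fixedFlow>
</flowConstraints>
```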
<flowConstraints>
  <capabilityLanguageFlow>
    <node>[String]</node>
    <node>[String]</node>
    ...
  </capabilityLanguageFlow>
</flowConstraints>
If you use <capabilityLanguageFlow>, the delegate Analysis Engines named by the <node> elements are called in the given order, except that a delegate Analysis Engine is skipped if any of the following are true (according to that Analysis Engine's declared output capabilities):
For example, if two annotators produce org.myorg.TokenAnnotation feature structures for the same language, these feature structures will only be produced by the first annotator in the list.
In an aggregate Analysis Engine Descriptor, each <configurationParameter> element should contain an <overrides> element, with the following syntax:
<overrides>
<parameter> [delegateAnalysisEngineKey]/[parameterName] </parameter>
<parameter> [delegateAnalysisEngineKey]/[parameterName] </parameter> ...
</overrides>
Since aggregate Analysis Engines have no code associated with them, the only way in which their configuration parameters can affect their processing is by overriding the parameter values of one or more delegate analysis engines. The <overrides> element determines which parameters, in which delegate Analysis Engines, are overridden by this configuration parameter.
For example, consider an aggregate Analysis Engine Descriptor that contains delegate Analysis Engines with keys annotator1 and annotator2 (as declared in the <delegateAnalysisEngine> element – see Delegate Analysis Engine Specifiers) and also declares a configuration parameter as follows:
<configurationParameter>
  <name>AggregateParam</name>
  <type>String</type>
  <overrides>
    <parameter>annotator1/param1</parameter>
    <parameter>annotator2/param2</parameter>
  </overrides>
</configurationParameter>
The value of the AggregateParam parameter (whether assigned in the aggregate descriptor or at runtime by an application) will override the value of parameter param1 in annotator1 and also override the value of parameter param2 in annotator2. No other parameters will be affected.
For historical reasons only, if an aggregate Analysis Engine descriptor declares a configuration parameter with no explicit overrides, that parameter will override any parameters having the same name within any delegate analysis engine. This usage is strongly discouraged. The UIMA SDK currently supports this usage but logs a warning message to the log file. This support may be dropped in future versions.
Aggregate analysis engine descriptors can declare resource bindings that bind resources to dependencies declared in any of the delegate analysis engines (or their subcomponents, recursively) within that aggregate. This allows resource sharing. Any binding at this level overrides (supersedes) any binding specified by a contained component or their subcomponents, recursively.
For example, consider an aggregate Analysis Engine Descriptor that contains delegate Analysis Engines with keys annotator1 and annotator2 (as declared in the <delegateAnalysisEngine> element – see Delegate Analysis Engine Specifiers), where annotator1 declares a resource dependency with key myResource and annotator2 declares a resource dependency with key someResource.
Within that aggregate Analysis Engine Descriptor, the following resourceManagerConfiguration would bind both of those dependencies to a single external resource file.
<resourceManagerConfiguration>
  <externalResources>
    <externalResource>
      <name>ExampleResource</name>
      <fileResourceSpecifier>
        <fileUrl>file:MyResourceFile.dat</fileUrl>
      </fileResourceSpecifier>
    </externalResource>
  </externalResources>
  <externalResourceBindings>
    <externalResourceBinding>
      <key>annotator1/myResource</key>
      <resourceName>ExampleResource</resourceName>
    </externalResourceBinding>
    <externalResourceBinding>
      <key>annotator2/someResource</key>
      <resourceName>ExampleResource</resourceName>
    </externalResourceBinding>
  </externalResourceBindings>
</resourceManagerConfiguration>
The syntax for the externalResources declaration is exactly the same as described previously. In the resource bindings, note the use of compound keys, e.g. annotator1/myResource. This identifies the resource dependency key myResource within the annotator with key annotator1. Compound keys can be multiple levels deep to handle nested aggregate analysis engines.
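As a sketch of a multi-level compound key, suppose the outer aggregate contains a delegate with the hypothetical key innerAggregate, which is itself an aggregate containing annotator1. A binding declared in the outer aggregate could then be written as:

```xml
<externalResourceBindings>
  <externalResourceBinding>
    <!-- [outer delegate key]/[inner delegate key]/[dependency key] -->
    <key>innerAggregate/annotator1/myResource</key>
    <resourceName>ExampleResource</resourceName>
  </externalResourceBinding>
</externalResourceBindings>
```

Because a binding at the aggregate level supersedes bindings declared by contained components, this binding would take precedence over any binding for myResource declared inside innerAggregate.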
Sofa mappings are specified between Sofa names declared in this aggregate descriptor as part of the <capability> section, and the Sofa names declared in the delegate components. For purposes of the mapping, all the declarations of Sofas in any of the capability sets contained within the <capabilities> element are considered together.
<sofaMappings>
  <sofaMapping>
    <componentKey>[keyName]</componentKey>
    <componentSofaName>[sofaName]</componentSofaName>
    <aggregateSofaName>[sofaName]</aggregateSofaName>
  </sofaMapping>
  ...
</sofaMappings>
The <componentSofaName> may be omitted in the case where the component is not aware of Multiple Views or Sofas. In this case, the UIMA framework will arrange for the specified <aggregateSofaName> to be the one visible to the delegate component.
The <componentKey> is the key name for the component as specified in the list of delegate components for this aggregate.
The sofaNames used must be declared as input or output sofas in some capability set.
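The following sketch maps one aggregate Sofa to two delegates. The Sofa names SourceDocument and InputText are hypothetical, and the example assumes annotator2 is a component that is not aware of Sofas.

```xml
<sofaMappings>
  <sofaMapping>
    <!-- annotator1 sees the aggregate's SourceDocument under its own name InputText -->
    <componentKey>annotator1</componentKey>
    <componentSofaName>InputText</componentSofaName>
    <aggregateSofaName>SourceDocument</aggregateSofaName>
  </sofaMapping>
  <sofaMapping>
    <!-- componentSofaName omitted: SourceDocument is the Sofa visible to annotator2 -->
    <componentKey>annotator2</componentKey>
    <aggregateSofaName>SourceDocument</aggregateSofaName>
  </sofaMapping>
</sofaMappings>
```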
The basic structure of a Flow Controller Descriptor is as follows:
<?xml version="1.0" encoding="UTF-8" ?>
<flowControllerDescription xmlns="http://uima.apache.org/resourceSpecifier">
<frameworkImplementation>com.ibm.uima.java</frameworkImplementation>
<implementationName>[ClassName]</implementationName>
<processingResourceMetaData> ... </processingResourceMetaData>
<externalResourceDependencies> ... </externalResourceDependencies>
<resourceManagerConfiguration> ... </resourceManagerConfiguration>
</flowControllerDescription>
The frameworkImplementation element must always be set to the value com.ibm.uima.java.
The implementationName element must contain the fully-qualified class name of the Flow Controller implementation. This must name a class that implements the FlowController interface.
The processingResourceMetaData element contains essentially the same information as a Primitive Analysis Engine Descriptor's analysisEngineMetaData element, described in Analysis Engine Metadata.
The externalResourceDependencies and resourceManagerConfiguration elements are exactly the same as in Primitive Analysis Engine Descriptors (see External Resource Dependencies and Resource Manager Configuration).
There are three types of Collection Processing Components – Collection Readers, CAS Initializers, and CAS Consumers. Each type of component has a corresponding descriptor. The structure of these descriptors is very similar to that of primitive Analysis Engine Descriptors.
The basic structure of a Collection Reader descriptor is as follows:
<?xml version="1.0" encoding="UTF-8" ?>
<collectionReaderDescription xmlns="http://uima.apache.org/resourceSpecifier">
<frameworkImplementation>com.ibm.uima.java</frameworkImplementation>
<implementationName>[ClassName]</implementationName>
<processingResourceMetaData> ... </processingResourceMetaData>
<externalResourceDependencies> ... </externalResourceDependencies>
<resourceManagerConfiguration> ... </resourceManagerConfiguration>
</collectionReaderDescription>
The frameworkImplementation element must always be set to the value com.ibm.uima.java.
The implementationName element contains the fully-qualified class name of the Collection Reader implementation. This must name a class that implements the CollectionReader interface.
The processingResourceMetaData element contains essentially the same information as a Primitive Analysis Engine Descriptor's analysisEngineMetaData element:
<processingResourceMetaData>
<name> [String] </name> <description>[String]</description> <version>[String]</version> <vendor>[String]</vendor>
<configurationParameters> ... </configurationParameters>
<configurationParameterSettings> ... </configurationParameterSettings>
<typeSystemDescription> ... </typeSystemDescription>
<typePriorities> ... </typePriorities>
<fsIndexes> ... </fsIndexes>
<capabilities> ... </capabilities>
</processingResourceMetaData>
The contents of these elements are the same as described in Analysis Engine Metadata, with the exception that the capabilities section should not declare any inputs (because the Collection Reader is always the first component to receive the CAS).
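A Collection Reader's capabilities section might therefore look like the sketch below, with an empty inputs element. The output type name org.myorg.SourceDocumentInformation is hypothetical.

```xml
<capabilities>
  <capability>
    <!-- inputs is required but left empty for a Collection Reader -->
    <inputs/>
    <outputs>
      <type allAnnotatorFeatures="true">org.myorg.SourceDocumentInformation</type>
    </outputs>
  </capability>
</capabilities>
```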
The externalResourceDependencies and resourceManagerConfiguration elements are exactly the same as in the Primitive Analysis Engine Descriptors (see External Resource Dependencies and Resource Manager Configuration).
The basic structure of a CAS Initializer Descriptor is as follows:
<?xml version="1.0" encoding="UTF-8" ?>
<casInitializerDescription xmlns="http://uima.apache.org/resourceSpecifier">
<frameworkImplementation>com.ibm.uima.java</frameworkImplementation>
<implementationName>[ClassName]</implementationName>
<processingResourceMetaData> ... </processingResourceMetaData>
<externalResourceDependencies> ... </externalResourceDependencies>
<resourceManagerConfiguration> ... </resourceManagerConfiguration>
</casInitializerDescription>
The frameworkImplementation element must always be set to the value com.ibm.uima.java.
The implementationName element contains the fully-qualified class name of the CAS Initializer implementation. This must name a class that implements the CasInitializer interface.
The processingResourceMetaData element contains essentially the same information as a Primitive Analysis Engine Descriptor's analysisEngineMetaData element, as described in Analysis Engine Metadata, with the exception of some changes to the capabilities section. A CAS Initializer's capabilities element looks like this:
<capabilities>
  <capability>
    <outputs>
      <type allAnnotatorFeatures="true|false">[String]</type>
      <type>[TypeName]</type>
      ...
      <feature>[TypeName]:[Name]</feature>
      ...
    </outputs>
    <outputSofas>
      <sofaName>[name]</sofaName>
      ...
    </outputSofas>
    <mimeTypesSupported>
      <mimeType>[MIME Type]</mimeType>
      ...
    </mimeTypesSupported>
  </capability>
  <capability> ... </capability>
  ...
</capabilities>
A CAS Initializer's capabilities declaration differs from a TAE's in three ways. The CAS Initializer does not declare any input CAS types and features or input Sofas (because it is always the first to operate on a CAS). It does not have a language specifier. Finally, it may declare a set of MIME types that it supports for its input documents. Examples include: text/plain, text/html, and application/pdf. For a list of MIME types see http://www.iana.org/assignments/media-types/. This declaration is currently for users' information only; the framework does not use it for anything. This may change in future versions.
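As an illustration, a CAS Initializer that accepts HTML or plain-text documents might declare the capability sketched below. The type name org.myorg.ParagraphAnnotation and the Sofa name DetaggedText are hypothetical.

```xml
<capabilities>
  <capability>
    <outputs>
      <type allAnnotatorFeatures="true">org.myorg.ParagraphAnnotation</type>
    </outputs>
    <outputSofas>
      <sofaName>DetaggedText</sofaName>
    </outputSofas>
    <!-- Informational only: MIME types of the input documents -->
    <mimeTypesSupported>
      <mimeType>text/html</mimeType>
      <mimeType>text/plain</mimeType>
    </mimeTypesSupported>
  </capability>
</capabilities>
```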
The externalResourceDependencies and resourceManagerConfiguration elements are exactly the same as in the Primitive Analysis Engine Descriptors (see External Resource Dependencies and Resource Manager Configuration).
The basic structure of a CAS Consumer Descriptor is as follows:
<?xml version="1.0" encoding="UTF-8" ?>
<casConsumerDescription xmlns="http://uima.apache.org/resourceSpecifier">
<frameworkImplementation>com.ibm.uima.java</frameworkImplementation>
<implementationName>[ClassName] </implementationName>
<processingResourceMetaData> ... </processingResourceMetaData>
<externalResourceDependencies> ... </externalResourceDependencies>
<resourceManagerConfiguration> ... </resourceManagerConfiguration>
</casConsumerDescription>
The frameworkImplementation element must always be set to the value com.ibm.uima.java.
The implementationName element must contain the fully-qualified class name of the CAS Consumer implementation. This must name a class that implements the CasConsumer interface.
The processingResourceMetaData element contains essentially the same information as a Primitive Analysis Engine Descriptor's analysisEngineMetaData element, described in Analysis Engine Metadata, except that the CAS Consumer Descriptor's capabilities element should not declare outputs or outputSofas (since CAS Consumers do not modify the CAS).
The externalResourceDependencies and resourceManagerConfiguration elements are exactly the same as in Primitive Analysis Engine Descriptors (see External Resource Dependencies and Resource Manager Configuration).
Service Client Descriptors specify only a location of a remote service. They are therefore much simpler in structure. In the UIMA SDK, a Service Client Descriptor that refers to a valid Analysis Engine or CAS Consumer service can be used in place of the actual Analysis Engine or CAS Consumer Descriptor. The UIMA SDK will handle the details of calling the remote service. (For details on deploying an Analysis Engine or CAS Consumer as a service, see Chapter 24, Collection Processing Engine Descriptor Reference).
The UIMA SDK is extensible to support different types of remote services. In future versions, there may be different variations of service client descriptors that cater to different types of services. For now, the only type of service client descriptor is the uriSpecifier, which supports the SOAP and Vinci protocols.
<?xml version="1.0" encoding="UTF-8" ?>
<uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
  <resourceType>AnalysisEngine | CasConsumer</resourceType>
  <uri>[URI]</uri>
  <protocol>SOAP | SOAPwithAttachments | Vinci</protocol>
  <timeout>[Integer]</timeout>
  <parameters>
    <parameter name="VNS_HOST" value="some.internet.ip.name-or-address"/>
    <parameter name="VNS_PORT" value="9000"/>
  </parameters>
</uriSpecifier>
The resourceType element is required for new descriptors, but is currently allowed to be omitted for backward compatibility. It specifies the type of component (Analysis Engine or CAS Consumer) that is implemented by the service endpoint described by this descriptor.
The uri element contains the URI for the web service. (Note that in the case of Vinci, this will be the service name, which is looked up in the Vinci Naming Service.)
The protocol element may be set to SOAP, SOAPwithAttachments, or Vinci; other protocols may be added later. These specify the particular data transport format that will be used.
The timeout element is optional. If present, it specifies the number of milliseconds to wait for a request to be processed before an exception is thrown. A value of zero or less means the client waits forever. If no timeout is specified, a default value (currently 60 seconds) will be used.
The parameters element is optional. If present, it specifies the Vinci naming service host and/or port number. If not present, the values used for these come from parameters passed on the Java command line using the -DVNS_HOST=<host> and/or -DVNS_PORT=<port> system arguments. If neither the element nor a system argument is present, the values default to localhost for the VNS_HOST and 9000 for the VNS_PORT.
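For comparison with the Vinci-oriented template above, a descriptor for a SOAP service might look like the following sketch. The URI is a hypothetical endpoint, and the timeout of 30000 milliseconds is an arbitrary illustrative value.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<uriSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
  <resourceType>AnalysisEngine</resourceType>
  <!-- Hypothetical endpoint of the deployed SOAP service -->
  <uri>http://localhost:8080/services/MyAnalysisEngine</uri>
  <protocol>SOAP</protocol>
  <!-- Wait up to 30 seconds before an exception is thrown -->
  <timeout>30000</timeout>
</uriSpecifier>
```

The parameters element is left out here because, as described above, it carries the Vinci naming service host and port.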
For details on how to deploy and call Analysis Engine and CAS Consumer services, see Section 6.6, Working with Analysis Engine and CAS Consumer Services.