PEAR Reference

A PEAR (Processing Engine ARchive) file is a standard package for UIMA (Unstructured Information Management Architecture) components. This chapter describes the PEAR 1.0 structure and specification.

The PEAR package can be used for distribution and reuse by other components or applications. It also allows applications and tools to manage UIMA components automatically for verification, deployment, invocation, testing, etc.

Currently, the PEAR Eclipse Plugin is available as a tool to create PEAR files for standard UIMA components. Please refer to Chapter 14, PEAR Packager User's Guide for more information about this tool.

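To explain how a PEAR file is created and how it is structured internally, this section walks through the steps used to package a UIMA component as a valid PEAR file. The PEAR packaging process consists of the following steps:

  • Creating the PEAR structure
  • Populating the PEAR structure
  • Creating the installation descriptor
  • Packaging the PEAR structure into one file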

Creating the PEAR structure

The first step in the PEAR creation process is to create a PEAR structure. The PEAR structure is a structured tree of folders and files, including the following elements:

  • Required Elements:
    • The metadata folder which contains the PEAR installation descriptor and properties files.
    • The installation descriptor (metadata/install.xml)
    • A UIMA analysis engine descriptor and its required code, delegates (if any), and resources
  • Optional Elements:
    • The desc folder to contain descriptor files of analysis engines, delegate analysis engines (all levels), and other components (Collection Readers, CAS Consumers, etc.).
    • The src folder to contain the source code
    • The bin folder to contain executables, scripts, class files, dlls, shared libraries, etc.
    • The lib folder to contain jar files.
    • The doc folder to contain documentation materials, preferably accessible through an index.html.
    • The data folder to contain data files (e.g. for testing).
    • The conf folder to contain configuration files.
    • The resources folder to contain other resources and dependencies.
    • Other user-defined folders or files are allowed, but should be avoided.
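The folder layout above can be sketched as a small Python helper. This is only an illustration, not part of any PEAR tooling: the function name create_pear_skeleton and the choice to create every optional folder by default are assumptions made for the example.

```python
from pathlib import Path
import tempfile

# Folder names taken from the list above; only "metadata" is required.
REQUIRED_DIRS = ["metadata"]
OPTIONAL_DIRS = ["desc", "src", "bin", "lib", "doc", "data", "conf", "resources"]


def create_pear_skeleton(root, optional_dirs=OPTIONAL_DIRS):
    """Create an empty PEAR folder tree under `root` and return the root path.

    Hypothetical helper: it only creates folders and an empty install.xml.
    """
    root = Path(root)
    for name in REQUIRED_DIRS + list(optional_dirs):
        (root / name).mkdir(parents=True, exist_ok=True)
    # Placeholder for the installation descriptor; its content is written later.
    (root / "metadata" / "install.xml").touch()
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skeleton = create_pear_skeleton(Path(tmp) / "my_component")
        print(sorted(p.name for p in skeleton.iterdir()))
```

The analysis engine descriptor, code, and resources still have to be copied into these folders by hand or by a build script.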

The PEAR Structure

Populating the PEAR structure

After creating the PEAR structure, the component’s descriptor files, code files, resource files, and any other files and folders are copied into the corresponding folders of the PEAR structure. The developer should make sure that the code works with this layout of files and folders, and that there are no broken links. Although it is strongly discouraged, the optional elements of the PEAR structure can be replaced by other user-defined files and folders, if required for the component to work properly.

  • The PEAR structure must be self-contained. This means, for example, that the component must run properly regardless of where the PEAR root folder is located. If the developer needs to use an absolute path in configuration or descriptor files, they should put these files in the "conf" or "desc" folder and replace the path of the PEAR root folder with the string "$main_root". The tools that deploy and use PEAR files should localize the files in the "conf" and "desc" folders by replacing the string "$main_root" with the local absolute path of the PEAR root folder. The "$main_root" macro can also be used in the installation descriptor (install.xml).
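The localization step described above can be sketched as follows. This is a minimal illustration, not the actual deployment tool: the function name localize_pear, the use of forward slashes in the substituted path, and the decision to skip files that are not valid UTF-8 are all assumptions of the example.

```python
from pathlib import Path

MAIN_ROOT_MACRO = "$main_root"


def localize_pear(pear_root, folders=("conf", "desc")):
    """Replace $main_root in every text file under the given folders.

    Returns the list of files that were rewritten. Hypothetical helper.
    """
    pear_root = Path(pear_root).resolve()
    # Assumption: forward slashes are used so the result is uniform across platforms.
    replacement = pear_root.as_posix()
    changed = []
    for folder in folders:
        base = pear_root / folder
        if not base.is_dir():
            continue  # optional folders may be absent
        for path in sorted(base.rglob("*")):
            if not path.is_file():
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except UnicodeDecodeError:
                continue  # assumption: non-UTF-8 files are left untouched
            if MAIN_ROOT_MACRO in text:
                path.write_text(text.replace(MAIN_ROOT_MACRO, replacement),
                                encoding="utf-8")
                changed.append(path)
    return changed
```

Because the substitution is done in place, a sketch like this would normally run on the installed copy of the PEAR, never on the packaged original.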

Currently there are three types of component packages depending on their deployment:

Standard type

A component package with the standard type must be a valid Analysis Engine, and all the required files to deploy it locally must be included in the PEAR package.

Service type

A component package with the service type must be deployable locally as a supported UIMA service (e.g. Vinci). In this case, all the required files to deploy it locally must be included in the PEAR package.

Network Type

A component package with the network type is not deployed locally but rather in a "remote" environment. It’s accessed as a network AE (e.g. Vinci Service). The component owner is responsible for starting the service and making sure it’s up and running before it’s used by others (like a webmaster who makes sure a web site is up and running). In this case, the PEAR package does not have to contain files required for deployment, but it must contain the network AE descriptor (see 4.1.4, Creating the XML Descriptor), and the <DESC> tag in the installation descriptor must point to the network AE descriptor. For more information about Network Analysis Engines, please refer to Section 6.6, Working with Analysis Engine and CAS Consumer Services.

Creating the installation descriptor

The installation descriptor is an XML file called install.xml under the metadata folder of the PEAR structure. It’s also called InsD. The InsD XML file should be created in the UTF-8 file encoding. The InsD should contain the following sections:

  • <OS>: This section is used to specify supported operating systems.
  • <TOOLKITS>: This section is used to specify toolkits, such as JDK, needed by the component.
  • <SUBMITTED_COMPONENT>: This is the most important section in the InsD. It’s used to specify required information about the component. See section 2.3.2 for detailed information about this section.
  • <INSTALLATION>: This section is explained in section 29.1.5.

Documented template for the installation descriptor:

The following is a documented template for the content of the installation descriptor install.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Installation Descriptor Template -->
<COMPONENT_INSTALLATION_DESCRIPTOR>
  <!-- Specifications of OS names, including version, etc. -->
  <OS>
    <NAME>OS_Name_1</NAME>
    <NAME>OS_Name_2</NAME>
  </OS>
  <!-- Specifications of required standard toolkits -->
  <TOOLKITS>
    <JDK_VERSION>JDK_Version</JDK_VERSION>
  </TOOLKITS>

  <!-- There are 2 types of variables that are used in the InsD:
       a) $main_root, which will be substituted with the real path to
          the main component root directory after installing the
          main (submitted) component
       b) $component_id$root, which will be substituted with the real
          path to the root directory of a given delegate component
          after installing the given delegate component -->

  <!-- Specification of submitted component (TAE) -->
  <!-- Note: submitted_component_id is assigned by developer; -->
  <!-- XML descriptor file name is set by developer. -->
  <!-- Important: ID element should be the first in the -->
  <!-- SUBMITTED_COMPONENT section. -->
  <!-- Submitted component may include optional specification -->
  <!-- of Collection Reader that can be used for testing the -->
  <!-- submitted component. -->
  <!-- Submitted component may include optional specification -->
  <!-- of CAS Consumer that can be used for testing the -->
  <!-- submitted component. -->
  <SUBMITTED_COMPONENT>
    <ID>submitted_component_id</ID>
    <NAME>Submitted component name</NAME>
    <DESC>$main_root/desc/ComponentDescriptor.xml</DESC>

    <!-- deployment options: -->
    <!-- a) "standard" is deploying AE locally -->
    <!-- b) "service" is deploying AE locally as a service, -->
    <!--    using specified command (script) -->
    <!-- c) "network" is deploying a pure network AE, which -->
    <!--    is running somewhere on the network -->
    <DEPLOYMENT>standard | service | network</DEPLOYMENT>

    <!-- Specifications for "service" deployment option only -->
    <SERVICE_COMMAND>$main_root/bin/startService.bat</SERVICE_COMMAND>
    <SERVICE_WORKING_DIR>$main_root</SERVICE_WORKING_DIR>
    <SERVICE_COMMAND_ARGS>
      <ARGUMENT>
        <VALUE>1st_parameter_value</VALUE>
        <COMMENTS>1st parameter description</COMMENTS>
      </ARGUMENT>
      <ARGUMENT>
        <VALUE>2nd_parameter_value</VALUE>
        <COMMENTS>2nd parameter description</COMMENTS>
      </ARGUMENT>
    </SERVICE_COMMAND_ARGS>

    <!-- Specifications for "network" deployment option only -->
    <NETWORK_PARAMETERS>
      <VNS_SPECS VNS_HOST="vns_host_IP" VNS_PORT="vns_port_No" />
    </NETWORK_PARAMETERS>

    <!-- General specifications -->
    <COMMENTS>Main component description</COMMENTS>
    <COLLECTION_READER>
      <COLLECTION_ITERATOR_DESC>$main_root/desc/CollIterDescriptor.xml</COLLECTION_ITERATOR_DESC>
      <CAS_INITIALIZER_DESC>$main_root/desc/CASInitializerDescriptor.xml</CAS_INITIALIZER_DESC>
    </COLLECTION_READER>
    <CAS_CONSUMER>
      <DESC>$main_root/desc/CASConsumerDescriptor.xml</DESC>
    </CAS_CONSUMER>
  </SUBMITTED_COMPONENT>

  <!-- Specifications of the component installation process -->
  <INSTALLATION>
    <!-- List of delegate components that should be installed together -->
    <!-- with the main submitted component (for aggregate components) -->
    <!-- Important: ID element should be the first in each -->
    <!-- DELEGATE_COMPONENT section. -->
    <DELEGATE_COMPONENT>
      <ID>first_delegate_component_id</ID>
      <NAME>Name of first required separate component</NAME>
    </DELEGATE_COMPONENT>
    <DELEGATE_COMPONENT>
      <ID>second_delegate_component_id</ID>
      <NAME>Name of second required separate component</NAME>
    </DELEGATE_COMPONENT>

    <!-- Specifications of local path names that should be replaced -->
    <!-- with real path names after the main component as well as -->
    <!-- all required delegate (library) components are installed. -->
    <!-- <FILE> and <REPLACE_WITH> values may use the $main_root or -->
    <!-- one of the $component_id$root variables. -->
    <!-- Important: ACTION element should be the first in each -->
    <!-- PROCESS section. -->
    <PROCESS>
      <ACTION>find_and_replace_path</ACTION>
      <PARAMETERS>
        <FILE>$main_root/desc/ComponentDescriptor.xml</FILE>
        <FIND_STRING>../resources/dict/</FIND_STRING>
        <REPLACE_WITH>$main_root/resources/dict/</REPLACE_WITH>
        <COMMENTS>Specify actual dictionary location in XML component descriptor</COMMENTS>
      </PARAMETERS>
    </PROCESS>
    <PROCESS>
      <ACTION>find_and_replace_path</ACTION>
      <PARAMETERS>
        <FILE>$main_root/desc/DelegateComponentDescriptor.xml</FILE>
        <FIND_STRING>local_root_directory_for_1st_delegate_component/resources/dict/</FIND_STRING>
        <REPLACE_WITH>$first_delegate_component_id$root/resources/dict/</REPLACE_WITH>
        <COMMENTS>Specify actual dictionary location in the descriptor of the 1st delegate component</COMMENTS>
      </PARAMETERS>
    </PROCESS>

    <!-- Specifications of environment variables that should be set
         prior to running the main component and all other reused
         components. <VAR_VALUE> values may use the $main_root or one
         of the $component_id$root variables. -->
    <PROCESS>
      <ACTION>set_env_variable</ACTION>
      <PARAMETERS>
        <VAR_NAME>env_variable_name</VAR_NAME>
        <VAR_VALUE>env_variable_value</VAR_VALUE>
        <COMMENTS>Set environment variable value</COMMENTS>
      </PARAMETERS>
    </PROCESS>
  </INSTALLATION>
</COMPONENT_INSTALLATION_DESCRIPTOR>
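To make the shape of the template concrete, the following Python sketch reads an installation descriptor and checks the points the template calls out: the four top-level sections and the rule that ID comes first in SUBMITTED_COMPONENT. The function name read_submitted_component is hypothetical, and treating all four sections as mandatory is a choice of this sketch (the text above only says the InsD "should" contain them).

```python
import xml.etree.ElementTree as ET

REQUIRED_SECTIONS = ("OS", "TOOLKITS", "SUBMITTED_COMPONENT", "INSTALLATION")


def read_submitted_component(xml_text):
    """Return ID, NAME, DESC and DEPLOYMENT of the submitted component.

    Raises ValueError when a top-level section is missing or when ID is
    not the first element of SUBMITTED_COMPONENT. Hypothetical helper.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "COMPONENT_INSTALLATION_DESCRIPTOR":
        raise ValueError("unexpected root element: " + root.tag)
    missing = [name for name in REQUIRED_SECTIONS if root.find(name) is None]
    if missing:
        raise ValueError("missing sections: " + ", ".join(missing))
    component = root.find("SUBMITTED_COMPONENT")
    children = list(component)  # XML comments are dropped by the default parser
    if not children or children[0].tag != "ID":
        raise ValueError("ID must be the first element of SUBMITTED_COMPONENT")
    return {
        tag.lower(): (component.findtext(tag) or "").strip()
        for tag in ("ID", "NAME", "DESC", "DEPLOYMENT")
    }
```

The macros in the returned values (for example $main_root in DESC) are left unexpanded; expanding them is a separate step.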

The SUBMITTED_COMPONENT section

The SUBMITTED_COMPONENT section of the installation descriptor (install.xml) is the most important. It's used to specify required information about the UIMA component. Before explaining the details, let's clarify the concept of component ID and "macros" used in the installation descriptor. The component ID element should be the first element in the SUBMITTED_COMPONENT section.

The component ID is a string that uniquely identifies the component. It should follow the Java package naming convention (e.g. ibm.uima.mycomponent).

Macros are variables such as $main_root, used to represent a string such as the full path of a certain directory.

These macros should be defined in the PEAR.properties file using the local values. The tools and applications that use and deploy PEAR files should replace these macros with the corresponding values in the local environment as part of the deployment process in the files included in the conf and desc folders.

Currently, there are two types of macros:

  • $main_root, which represents the local absolute path of the main component root directory after deployment.
  • $component_id$root, which represents the local absolute path to the root directory of the component which has component_id as component ID. This component could be, for instance, a delegate component.

For example, if some part of a descriptor needs a path to the data subdirectory of the PEAR, you would write $main_root/data. If your PEAR refers to a delegate component having the ID "my.comp.Dictionary", and you need to specify a path to one of this component's subdirectories, say resources/dict, you would write $my.comp.Dictionary$root/resources/dict.
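A minimal sketch of how the two macro types could be expanded in a string is shown below. The function name expand_macros, the regular expression used to recognize component IDs, and the decision to leave unknown $component_id$root macros untouched are assumptions of this example, not documented installer behavior.

```python
import re

# Matches $<component_id>$root, where the ID is a dotted, Java-style name.
_DELEGATE_MACRO = re.compile(r"\$([A-Za-z_][\w.]*)\$root")


def expand_macros(text, main_root, delegate_roots=None):
    """Expand $main_root and $component_id$root macros in `text`.

    `delegate_roots` maps component IDs to their installed root paths.
    Hypothetical helper; unknown component IDs are left unchanged.
    """
    delegate_roots = delegate_roots or {}

    def replace_delegate(match):
        return delegate_roots.get(match.group(1), match.group(0))

    # Delegate macros first, then the main component root.
    text = _DELEGATE_MACRO.sub(replace_delegate, text)
    return text.replace("$main_root", main_root)


if __name__ == "__main__":
    roots = {"my.comp.Dictionary": "/opt/pears/dictionary"}
    print(expand_macros("$main_root/data", "/opt/pears/main", roots))
    print(expand_macros("$my.comp.Dictionary$root/resources/dict",
                        "/opt/pears/main", roots))
```

The paths used in the usage lines are made-up install locations chosen only for the demonstration.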

The ID, NAME, and DESC tags

These tags specify the component ID, name, and descriptor path, as follows:

<SUBMITTED_COMPONENT>
  <ID>submitted_component_id</ID>
  <NAME>Submitted component name</NAME>
  <DESC>$main_root/desc/ComponentDescriptor.xml</DESC>

Tags related to deployment types

As mentioned before, there are currently three types of PEAR packages, depending on the following deployment types:

Standard type

A component package with the standard type must be a valid UIMA Analysis Engine, and all the required files to deploy it must be included in the PEAR package. This deployment type should be specified as follows:

<DEPLOYMENT>standard</DEPLOYMENT>

Service type

A component package with the service type must be deployable locally as a supported UIMA service (e.g. Vinci). The installation descriptor must include the path for the executable or script to start the service including its arguments, and the working directory from where to launch it, following this template:

<DEPLOYMENT>service</DEPLOYMENT>
<SERVICE_COMMAND>$main_root/bin/startService.bat</SERVICE_COMMAND>
<SERVICE_WORKING_DIR>$main_root</SERVICE_WORKING_DIR>
<SERVICE_COMMAND_ARGS>
  <ARGUMENT>
    <VALUE>1st_parameter_value</VALUE>
    <COMMENTS>1st parameter description</COMMENTS>
  </ARGUMENT>
  <ARGUMENT>
    <VALUE>2nd_parameter_value</VALUE>
    <COMMENTS>2nd parameter description</COMMENTS>
  </ARGUMENT>
</SERVICE_COMMAND_ARGS>
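As an illustration of how a tool might turn the service tags into something launchable, the sketch below builds an argument list and a working directory from a SUBMITTED_COMPONENT element. The function name build_service_command is hypothetical, and only the $main_root macro is expanded here.

```python
import xml.etree.ElementTree as ET


def build_service_command(component_xml, main_root):
    """Return (argv, working_dir) for a "service" SUBMITTED_COMPONENT.

    Hypothetical helper: raises ValueError when SERVICE_COMMAND or
    SERVICE_WORKING_DIR is missing, since the text requires both.
    """
    component = ET.fromstring(component_xml)

    def expand(value):
        return (value or "").strip().replace("$main_root", main_root)

    command = expand(component.findtext("SERVICE_COMMAND"))
    working_dir = expand(component.findtext("SERVICE_WORKING_DIR"))
    if not command or not working_dir:
        raise ValueError("SERVICE_COMMAND and SERVICE_WORKING_DIR are required")
    # Arguments are taken in document order; COMMENTS are ignored.
    args = [expand(argument.findtext("VALUE"))
            for argument in component.findall("SERVICE_COMMAND_ARGS/ARGUMENT")]
    return [command] + args, working_dir
```

The result could then be handed to something like subprocess.Popen(argv, cwd=working_dir); whether a real deployment tool launches the service this way is not specified here.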

Network Type

A component package with the network type is not deployed locally, but rather in a "remote" environment. It’s accessed as a network AE (e.g. Vinci Service). In this case, the PEAR package does not have to contain files required for deployment, but must contain the network AE descriptor. The <DESC> tag in the installation descriptor (see section 2.3.2.1) must point to the network AE descriptor. Here is a template for Vinci services:

<DEPLOYMENT>network</DEPLOYMENT>
<NETWORK_PARAMETERS>
  <VNS_SPECS VNS_HOST="vns_host_IP" VNS_PORT="vns_port_No" />
</NETWORK_PARAMETERS>
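For the network case, a client tool mainly needs the VNS host and port from the template above. The following sketch extracts them; the function name read_vns_specs is hypothetical, and the values are returned as strings because the template does not say how they should be typed.

```python
import xml.etree.ElementTree as ET


def read_vns_specs(component_xml):
    """Return (VNS_HOST, VNS_PORT) for a "network" component, else None.

    Hypothetical helper; both values are returned as plain strings.
    """
    component = ET.fromstring(component_xml)
    if (component.findtext("DEPLOYMENT") or "").strip() != "network":
        return None
    specs = component.find("NETWORK_PARAMETERS/VNS_SPECS")
    if specs is None:
        return None
    return specs.get("VNS_HOST"), specs.get("VNS_PORT")
```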

The Collection Reader and CAS Consumer tags

These sections of the installation descriptor specify any Collection Reader or CAS Consumer that is to be used with the packaged analysis engine. See the template in section 2.3.1.

The INSTALLATION section

The <INSTALLATION> section specifies the external dependencies of the component and the operations that should be performed during the PEAR package installation.

The component dependencies are specified in the <DELEGATE_COMPONENT> sub-sections, as shown in the installation descriptor template above.

Important: The ID element should be the first element in each <DELEGATE_COMPONENT> sub-section.

The <INSTALLATION> section may specify the following operations:

  • Setting environment variables that are required to run the installed component.
    • Note that you can use "macros", like $main_root or $component_id$root, in the VAR_VALUE element of the <PARAMETERS> sub-section.
  • Finding and replacing string expressions in files.
    • Note that you can use the "macros" in the FILE and REPLACE_WITH elements of the <PARAMETERS> sub-section.

Important: the ACTION element should always be the first element in each <PROCESS> sub-section.
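The two operations can be sketched as a small interpreter over the <PROCESS> sub-sections. This is an illustration under stated assumptions, not the PEAR Installer's implementation: the function name run_installation is hypothetical, only $main_root is expanded (delegate macros are omitted for brevity), environment variables are collected in a dictionary rather than set on the process, and target files are assumed to be UTF-8.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def run_installation(installation_xml, main_root, env=None):
    """Apply the PROCESS actions of an INSTALLATION section.

    Returns a dict of the environment variables that were requested.
    Hypothetical helper; unknown actions raise ValueError.
    """
    env = {} if env is None else env

    def expand(value):
        return (value or "").strip().replace("$main_root", main_root)

    for process in ET.fromstring(installation_xml).findall("PROCESS"):
        children = list(process)
        if not children or children[0].tag != "ACTION":
            raise ValueError("ACTION must be the first element of PROCESS")
        action = (children[0].text or "").strip()
        params = process.find("PARAMETERS")
        if params is None:
            raise ValueError("PROCESS has no PARAMETERS sub-section")
        if action == "set_env_variable":
            name = (params.findtext("VAR_NAME") or "").strip()
            env[name] = expand(params.findtext("VAR_VALUE"))
        elif action == "find_and_replace_path":
            target = Path(expand(params.findtext("FILE")))
            find = (params.findtext("FIND_STRING") or "").strip()
            replace_with = expand(params.findtext("REPLACE_WITH"))
            text = target.read_text(encoding="utf-8")
            target.write_text(text.replace(find, replace_with), encoding="utf-8")
        else:
            raise ValueError("unsupported action: " + action)
    return env
```

Collecting variables in a dictionary keeps the sketch free of side effects; a caller could pass the result as the environment of the process that runs the component.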

By default, the PEAR Installer will try to process every file in the desc and conf directories of the PEAR package in order to find the "macros" and replace them with actual path expressions. In addition to this, the installer will process the files specified in the <INSTALLATION> section.

Important: all XML files which are going to be processed should be created using UTF-8 or UTF-16 file encoding. All other text files which are going to be processed should be created using the ASCII file encoding.

Packaging the PEAR structure into one file

The last step of the PEAR packaging process is to zip the content of the PEAR root folder (not including the root folder itself). The PEAR file must have a ".pear" extension.
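A minimal sketch of this step using Python's standard zipfile module follows. The function name package_pear is hypothetical. Entry names are made relative to the PEAR root so that the root folder itself is not part of the archive; note that, as written, empty folders are not stored.

```python
import zipfile
from pathlib import Path


def package_pear(pear_root, output_file):
    """Zip the content of `pear_root` (without the root folder) into a .pear file.

    Hypothetical helper; returns the path of the archive that was written.
    """
    pear_root = Path(pear_root)
    output_file = Path(output_file)
    if output_file.suffix != ".pear":
        raise ValueError('the PEAR file must have a ".pear" extension')
    with zipfile.ZipFile(output_file, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(pear_root.rglob("*")):
            if not path.is_file():
                continue
            if path.resolve() == output_file.resolve():
                continue  # do not pack the archive into itself
            # Paths inside the archive are relative to the PEAR root folder.
            archive.write(path, path.relative_to(pear_root).as_posix())
    return output_file
```

For example, package_pear("my_component", "my_component.pear") would produce an archive whose top-level entries are metadata/, desc/, and so on, rather than my_component/.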

Installing a PEAR file

For information about the installation of a PEAR file and the PEAR Installer tool, please refer to the "PEAR Installer" Chapter.