A PEAR (Processing Engine ARchive) file is a standard package for UIMA (Unstructured Information Management Architecture) components. This chapter describes the PEAR 1.0 structure and specification.
The PEAR package can be used for distribution and reuse by other components or applications. It also allows applications and tools to manage UIMA components automatically for verification, deployment, invocation, testing, etc.
Currently, the PEAR Eclipse Plugin is available as a tool to create PEAR files for standard UIMA components. Please refer to Chapter 14, PEAR Packager User's Guide for more information about this tool.
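To describe how a PEAR file is created and how it is structured internally, this section walks through the steps used to package a UIMA component as a valid PEAR file. The PEAR packaging process consists of the following steps: creating the PEAR structure, populating it with the component's files, creating the installation descriptor, and packaging the structure into a single PEAR file.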
The first step in the PEAR creation process is to create a PEAR structure. The PEAR structure is a structured tree of folders and files, including the following elements:
After creating the PEAR structure, the component’s descriptor files, code files, resource files, and any other files and folders are copied into the corresponding folders of the PEAR structure. The developer should make sure that the code works with this layout of files and folders, and that there are no broken links. Although it is strongly discouraged, the optional elements of the PEAR structure can be replaced by other user-defined files and folders, if required for the component to work properly.
Currently there are three types of component packages, depending on how they are deployed: standard, service, and network.
A component package with the standard type must be a valid Analysis Engine, and all the required files to deploy it locally must be included in the PEAR package.
A component package with the service type must be deployable locally as a supported UIMA service (e.g. Vinci). In this case, all the required files to deploy it locally must be included in the PEAR package.
A component package with the network type is not deployed locally but rather in a "remote" environment. It’s accessed as a network AE (e.g. Vinci Service). The component owner is responsible for starting the service and making sure it’s up and running before others use it (much as a webmaster makes sure a web site is up and running). In this case, the PEAR package does not have to contain the files required for deployment, but it must contain the network AE descriptor (see 4.1.4, Creating the XML Descriptor), and the <DESC> tag in the installation descriptor must point to the network AE descriptor. For more information about Network Analysis Engines, please refer to Section 6.6, Working with Analysis Engine and CAS Consumer Services.
The installation descriptor (InsD) is an XML file called install.xml in the metadata folder of the PEAR structure. The InsD XML file should be created in the UTF-8 file encoding. The InsD should contain the following sections: <OS>, <TOOLKITS>, <SUBMITTED_COMPONENT>, and <INSTALLATION>.
The following is a documented template for the content of the installation descriptor install.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Installation Descriptor Template -->
<COMPONENT_INSTALLATION_DESCRIPTOR>
  <!-- Specifications of OS names, including version, etc. -->
  <OS>
    <NAME>OS_Name_1</NAME>
    <NAME>OS_Name_2</NAME>
  </OS>
  <!-- Specifications of required standard toolkits -->
  <TOOLKITS>
    <JDK_VERSION>JDK_Version</JDK_VERSION>
  </TOOLKITS>
  <!-- There are 2 types of variables that are used in the InsD:
       a) $main_root, which will be substituted with the real path to the
          main component root directory after installing the main
          (submitted) component
       b) $component_id$root, which will be substituted with the real path
          to the root directory of a given delegate component after
          installing the given delegate component -->
  <!-- Specification of submitted component (TAE) -->
  <!-- Note: submitted_component_id is assigned by developer; -->
  <!-- XML descriptor file name is set by developer. -->
  <!-- Important: ID element should be the first in the -->
  <!-- SUBMITTED_COMPONENT section. -->
  <!-- Submitted component may include optional specification -->
  <!-- of Collection Reader that can be used for testing the -->
  <!-- submitted component. -->
  <!-- Submitted component may include optional specification -->
  <!-- of CAS Consumer that can be used for testing the -->
  <!-- submitted component. -->
  <SUBMITTED_COMPONENT>
    <ID>submitted_component_id</ID>
    <NAME>Submitted component name</NAME>
    <DESC>$main_root/desc/ComponentDescriptor.xml</DESC>
    <!-- deployment options: -->
    <!-- a) 'standard' is deploying AE locally -->
    <!-- b) 'service' is deploying AE locally as a service, -->
    <!--    using specified command (script) -->
    <!-- c) 'network' is deploying a pure network AE, which -->
    <!--    is running somewhere on the network -->
    <DEPLOYMENT>standard | service | network</DEPLOYMENT>
    <!-- Specifications for 'service' deployment option only -->
    <SERVICE_COMMAND>$main_root/bin/startService.bat</SERVICE_COMMAND>
    <SERVICE_WORKING_DIR>$main_root</SERVICE_WORKING_DIR>
    <SERVICE_COMMAND_ARGS>
      <ARGUMENT>
        <VALUE>1st_parameter_value</VALUE>
        <COMMENTS>1st parameter description</COMMENTS>
      </ARGUMENT>
      <ARGUMENT>
        <VALUE>2nd_parameter_value</VALUE>
        <COMMENTS>2nd parameter description</COMMENTS>
      </ARGUMENT>
    </SERVICE_COMMAND_ARGS>
    <!-- Specifications for 'network' deployment option only -->
    <NETWORK_PARAMETERS>
      <VNS_SPECS VNS_HOST="vns_host_IP" VNS_PORT="vns_port_No" />
    </NETWORK_PARAMETERS>
    <!-- General specifications -->
    <COMMENTS>Main component description</COMMENTS>
    <COLLECTION_READER>
      <COLLECTION_ITERATOR_DESC>
        $main_root/desc/CollIterDescriptor.xml
      </COLLECTION_ITERATOR_DESC>
      <CAS_INITIALIZER_DESC>
        $main_root/desc/CASInitializerDescriptor.xml
      </CAS_INITIALIZER_DESC>
    </COLLECTION_READER>
    <CAS_CONSUMER>
      <DESC>$main_root/desc/CASConsumerDescriptor.xml</DESC>
    </CAS_CONSUMER>
  </SUBMITTED_COMPONENT>
  <!-- Specifications of the component installation process -->
  <INSTALLATION>
    <!-- List of delegate components that should be installed together -->
    <!-- with the main submitted component (for aggregate components) -->
    <!-- Important: ID element should be the first in each -->
    <!-- DELEGATE_COMPONENT section. -->
    <DELEGATE_COMPONENT>
      <ID>first_delegate_component_id</ID>
      <NAME>Name of first required separate component</NAME>
    </DELEGATE_COMPONENT>
    <DELEGATE_COMPONENT>
      <ID>second_delegate_component_id</ID>
      <NAME>Name of second required separate component</NAME>
    </DELEGATE_COMPONENT>
    <!-- Specifications of local path names that should be replaced -->
    <!-- with real path names after the main component as well as -->
    <!-- all required delegate (library) components are installed. -->
    <!-- <FILE> and <REPLACE_WITH> values may use the $main_root or -->
    <!-- one of the $component_id$root variables. -->
    <!-- Important: ACTION element should be the first in each -->
    <!-- PROCESS section. -->
    <PROCESS>
      <ACTION>find_and_replace_path</ACTION>
      <PARAMETERS>
        <FILE>$main_root/desc/ComponentDescriptor.xml</FILE>
        <FIND_STRING>../resources/dict/</FIND_STRING>
        <REPLACE_WITH>$main_root/resources/dict/</REPLACE_WITH>
        <COMMENTS>Specify actual dictionary location in XML component
          descriptor
        </COMMENTS>
      </PARAMETERS>
    </PROCESS>
    <PROCESS>
      <ACTION>find_and_replace_path</ACTION>
      <PARAMETERS>
        <FILE>$main_root/desc/DelegateComponentDescriptor.xml</FILE>
        <FIND_STRING>
          local_root_directory_for_1st_delegate_component/resources/dict/
        </FIND_STRING>
        <REPLACE_WITH>
          $first_delegate_component_id$root/resources/dict/
        </REPLACE_WITH>
        <COMMENTS>
          Specify actual dictionary location in the descriptor of the 1st
          delegate component
        </COMMENTS>
      </PARAMETERS>
    </PROCESS>
    <!-- Specifications of environment variables that should be set prior
         to running the main component and all other reused components.
         <VAR_VALUE> values may use the $main_root or one of the
         $component_id$root variables. -->
    <PROCESS>
      <ACTION>set_env_variable</ACTION>
      <PARAMETERS>
        <VAR_NAME>env_variable_name</VAR_NAME>
        <VAR_VALUE>env_variable_value</VAR_VALUE>
        <COMMENTS>Set environment variable value</COMMENTS>
      </PARAMETERS>
    </PROCESS>
  </INSTALLATION>
</COMPONENT_INSTALLATION_DESCRIPTOR>
The SUBMITTED_COMPONENT section of the installation descriptor (install.xml) is the most important. It's used to specify required information about the UIMA component. Before explaining the details, let's clarify the concept of component ID and "macros" used in the installation descriptor. The component ID element should be the first element in the SUBMITTED_COMPONENT section.
The component ID is a string that uniquely identifies the component. It should follow the Java naming convention (e.g. ibm.uima.mycomponent).
Macros are variables such as $main_root, used to represent a string such as the full path of a certain directory.
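These macros should be defined in the PEAR.properties file using the local values. As part of the deployment process, the tools and applications that use and deploy PEAR files should replace these macros, in the files included in the conf and desc folders, with the corresponding values in the local environment.
Currently, there are two types of macros: $main_root, which stands for the real path of the main component root directory after installation, and $component_id$root, which stands for the real path of the root directory of the delegate component with the given ID after that component is installed.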
For example, if some part of a descriptor needed a path to the data subdirectory of the PEAR, you would write $main_root/data. If your PEAR refers to a delegate component having the ID "my.comp.Dictionary", and you need to specify a path to one of this component's subdirectories, say resources/dict, you would write $my.comp.Dictionary$root/resources/dict.
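The macro substitution described above can be sketched in Python. This is only an illustration, not the PEAR Installer's actual implementation: the function name expand_macros, the delegate_roots mapping, and the example installation paths are all hypothetical.

```python
def expand_macros(text, main_root, delegate_roots=None):
    """Replace $main_root and $<component_id>$root macros with real paths.

    delegate_roots maps a delegate component ID to its installed root
    directory (a hypothetical structure used only for this sketch).
    """
    # Delegate macros have the form $<component_id>$root.
    for component_id, root in (delegate_roots or {}).items():
        text = text.replace("$" + component_id + "$root", root)
    # The main component macro is the literal string $main_root.
    return text.replace("$main_root", main_root)


example = "$main_root/data and $my.comp.Dictionary$root/resources/dict"
print(expand_macros(example, "/opt/pears/main",
                    {"my.comp.Dictionary": "/opt/pears/dict"}))
```

Plain string replacement is used here because the macros are literal tokens; a real tool may need additional care, for example with component IDs that are prefixes of one another.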
The <ID>, <NAME>, and <DESC> tags specify the component ID, name, and descriptor path, as follows:
<SUBMITTED_COMPONENT>
<ID>submitted_component_id</ID>
<NAME>Submitted component name</NAME>
<DESC>$main_root/desc/ComponentDescriptor.xml</DESC>
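As an illustration of how a tool might read these three tags, the following Python sketch parses a small installation descriptor with the standard library. The helper name read_submitted_component and the inline SAMPLE document are assumptions made for this example; they are not part of the PEAR tooling.

```python
import xml.etree.ElementTree as ET

# A minimal descriptor used only for this sketch.
SAMPLE = """<COMPONENT_INSTALLATION_DESCRIPTOR>
  <SUBMITTED_COMPONENT>
    <ID>submitted_component_id</ID>
    <NAME>Submitted component name</NAME>
    <DESC>$main_root/desc/ComponentDescriptor.xml</DESC>
    <DEPLOYMENT>standard</DEPLOYMENT>
  </SUBMITTED_COMPONENT>
</COMPONENT_INSTALLATION_DESCRIPTOR>"""


def read_submitted_component(xml_text):
    """Return the ID, NAME and DESC values of the submitted component."""
    root = ET.fromstring(xml_text)
    component = root.find("SUBMITTED_COMPONENT")
    if component is None:
        raise ValueError("missing SUBMITTED_COMPONENT section")
    # findtext returns None for a missing tag; normalise that to "".
    return {tag: (component.findtext(tag) or "").strip()
            for tag in ("ID", "NAME", "DESC")}


info = read_submitted_component(SAMPLE)
print(info)
```

The <DESC> value is returned with its $main_root macro intact; resolving it to a real path is a separate step performed at installation time.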
As mentioned before, there are currently three types of PEAR packages, depending on the following deployment types:
Standard type
A component package with the standard type must be a valid UIMA Analysis Engine, and all the required files to deploy it must be included in the PEAR package. This deployment type should be specified as follows:
<DEPLOYMENT>standard</DEPLOYMENT>
Service type
A component package with the service type must be deployable locally as a supported UIMA service (e.g. Vinci). The installation descriptor must include the path for the executable or script to start the service including its arguments, and the working directory from where to launch it, following this template:
<DEPLOYMENT>service</DEPLOYMENT>
<SERVICE_COMMAND>$main_root/bin/startService.bat</SERVICE_COMMAND>
<SERVICE_WORKING_DIR>$main_root</SERVICE_WORKING_DIR>
<SERVICE_COMMAND_ARGS>
<ARGUMENT>
<VALUE>1st_parameter_value</VALUE>
<COMMENTS>1st parameter description</COMMENTS>
</ARGUMENT>
<ARGUMENT>
<VALUE>2nd_parameter_value</VALUE>
<COMMENTS>2nd parameter description</COMMENTS>
</ARGUMENT>
</SERVICE_COMMAND_ARGS>
Network type
A component package with the network type is not deployed locally, but rather in a "remote" environment. It’s accessed as a network AE (e.g. Vinci Service). In this case, the PEAR package does not have to contain files required for deployment, but must contain the network AE descriptor. The <DESC> tag in the installation descriptor (see section 2.3.2.1) must point to the network AE descriptor. Here is a template in the case of Vinci services:
<DEPLOYMENT>network</DEPLOYMENT>
<NETWORK_PARAMETERS>
<VNS_SPECS VNS_HOST="vns_host_IP" VNS_PORT="vns_port_No" />
</NETWORK_PARAMETERS>
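The requirements of the three deployment types can be checked mechanically. The Python sketch below is a hypothetical validator, not part of UIMA: the name missing_deployment_elements is invented for this example, and treating <NETWORK_PARAMETERS> as expected for the network type is an assumption based on the Vinci template above.

```python
import xml.etree.ElementTree as ET

# Elements expected for each deployment type. The "network" entry is an
# assumption based on the Vinci template; other services may differ.
EXPECTED_ELEMENTS = {
    "standard": [],
    "service": ["SERVICE_COMMAND", "SERVICE_WORKING_DIR"],
    "network": ["NETWORK_PARAMETERS"],
}


def missing_deployment_elements(component):
    """Return expected child tags absent from a SUBMITTED_COMPONENT element."""
    deployment = (component.findtext("DEPLOYMENT") or "").strip()
    if deployment not in EXPECTED_ELEMENTS:
        raise ValueError(f"unknown deployment type: {deployment!r}")
    return [tag for tag in EXPECTED_ELEMENTS[deployment]
            if component.find(tag) is None]


component = ET.fromstring(
    "<SUBMITTED_COMPONENT>"
    "<ID>submitted_component_id</ID>"
    "<DEPLOYMENT>service</DEPLOYMENT>"
    "<SERVICE_COMMAND>$main_root/bin/startService.bat</SERVICE_COMMAND>"
    "</SUBMITTED_COMPONENT>"
)
print(missing_deployment_elements(component))
```

In this example the service descriptor lacks a working directory, so the check reports SERVICE_WORKING_DIR as missing.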
The <COLLECTION_READER> and <CAS_CONSUMER> sections of the installation descriptor specify any Collection Reader or CAS Consumer intended for use with the packaged analysis engine. See the template in section 2.3.1.
The <INSTALLATION> section specifies the external dependencies of the component and the operations that should be performed during the PEAR package installation.
The component dependencies are specified in the <DELEGATE_COMPONENT> sub-sections, as shown in the installation descriptor template above.
Important: The ID element should be the first element in each <DELEGATE_COMPONENT> sub-section.
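The <INSTALLATION> section may specify the following operations: find_and_replace_path, which replaces a path string in a given file with a real path, and set_env_variable, which sets an environment variable before the component is run.
Important: the ACTION element should always be the first element in each <PROCESS> sub-section.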
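The element-ordering rules stated in this chapter (ID first in <SUBMITTED_COMPONENT> and <DELEGATE_COMPONENT>, ACTION first in <PROCESS>) lend themselves to a simple automated check. The following Python sketch is one possible way to do this; the function name ordering_violations and the rule table are hypothetical and are not taken from the PEAR Installer.

```python
import xml.etree.ElementTree as ET

# Parent tag -> tag that should be its first child element.
FIRST_CHILD_RULES = {
    "SUBMITTED_COMPONENT": "ID",
    "DELEGATE_COMPONENT": "ID",
    "PROCESS": "ACTION",
}


def ordering_violations(root):
    """Return the tags of sections whose first child element is wrong."""
    violations = []
    for element in root.iter():
        expected = FIRST_CHILD_RULES.get(element.tag)
        if expected is None:
            continue
        # XML comments are dropped by the default parser, so only
        # child elements are considered here.
        children = list(element)
        if not children or children[0].tag != expected:
            violations.append(element.tag)
    return violations


installation = ET.fromstring(
    "<INSTALLATION>"
    "<DELEGATE_COMPONENT><ID>a</ID><NAME>A</NAME></DELEGATE_COMPONENT>"
    "<PROCESS><PARAMETERS/><ACTION>set_env_variable</ACTION></PROCESS>"
    "</INSTALLATION>"
)
print(ordering_violations(installation))
```

Here the <PROCESS> section places <PARAMETERS> before <ACTION>, so it is the only section reported.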
By default, the PEAR Installer will try to process every file in the desc and conf directories of the PEAR package in order to find the "macros" and replace them with actual path expressions. In addition to this, the installer will process the files specified in the <INSTALLATION> section.
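The default pass over the desc and conf directories might look roughly like the Python sketch below. It is a simplified illustration under several assumptions: only the $main_root macro is handled, files are assumed to be UTF-8 text (anything else is skipped), subdirectories are visited recursively, and the function name replace_main_root is hypothetical. The actual installer's behavior may differ.

```python
import tempfile
from pathlib import Path


def replace_main_root(pear_root):
    """Replace $main_root in every UTF-8 text file under desc/ and conf/.

    Returns the relative paths of the files that were modified.
    """
    pear_root = Path(pear_root).resolve()
    changed = []
    for folder in ("desc", "conf"):
        base = pear_root / folder
        if not base.is_dir():
            continue
        for path in sorted(base.rglob("*")):
            if not path.is_file():
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except UnicodeDecodeError:
                continue  # not UTF-8 text; this sketch leaves it alone
            if "$main_root" in text:
                path.write_text(
                    text.replace("$main_root", pear_root.as_posix()),
                    encoding="utf-8")
                changed.append(path.relative_to(pear_root).as_posix())
    return changed


with tempfile.TemporaryDirectory() as tmp:
    desc = Path(tmp) / "desc"
    desc.mkdir()
    (desc / "ComponentDescriptor.xml").write_text(
        "<path>$main_root/resources/dict/</path>", encoding="utf-8")
    print(replace_main_root(tmp))
```

The example runs against a throwaway directory so that it does not touch a real installation.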
Important: all XML files which are going to be processed should be created using UTF-8 or UTF-16 file encoding. All other text files which are going to be processed should be created using the ASCII file encoding.
The last step of the PEAR packaging process is to zip the content of the PEAR root folder (not including the root folder itself). The PEAR file must have a ".pear" extension.
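The packaging step can be sketched with Python's standard zipfile module. This is an illustrative helper, not an official tool: the name build_pear is hypothetical, and the check for metadata/install.xml is a convenience added for this sketch. Entry names are written relative to the PEAR root so that the root folder itself is not part of the archive.

```python
import tempfile
import zipfile
from pathlib import Path


def build_pear(pear_root, output_file):
    """Zip the content of pear_root (not the folder itself) into a .pear file."""
    pear_root = Path(pear_root)
    output_file = Path(output_file)
    if output_file.suffix != ".pear":
        raise ValueError("the PEAR file must have a .pear extension")
    if not (pear_root / "metadata" / "install.xml").is_file():
        raise FileNotFoundError("metadata/install.xml not found")
    with zipfile.ZipFile(output_file, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(pear_root.rglob("*")):
            # Skip directories and, if it lives inside the root, the
            # archive being written.
            if not path.is_file() or path.resolve() == output_file.resolve():
                continue
            archive.write(path, path.relative_to(pear_root).as_posix())
    return output_file


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "mycomponent"
    (root / "metadata").mkdir(parents=True)
    (root / "metadata" / "install.xml").write_text(
        "<COMPONENT_INSTALLATION_DESCRIPTOR/>", encoding="utf-8")
    pear = build_pear(root, Path(tmp) / "mycomponent.pear")
    with zipfile.ZipFile(pear) as archive:
        print(archive.namelist())
```

Note that this sketch stores files only, so empty folders in the PEAR structure are not recorded in the archive.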
For information about the installation of a PEAR file and the PEAR Installer tool, please refer to the "PEAR Installer" Chapter.