PEAR Merger User's Guide

The PEAR Merger utility takes two or more PEAR files and merges their contents, creating a new PEAR which has, in turn, a new Aggregate analysis engine whose delegates are the components from the original files being merged. It does this by (1) copying the contents of the input components into the output component, placing each component into a separate subdirectory, (2) generating a UIMA descriptor for the output Aggregate text analysis engine and (3) creating an output PEAR file that encapsulates the output Aggregate.

The merge logic is quite simple, and is intended to work for simple cases. More complex merging needs to be done by hand. Please see the Restrictions and Limitations section, below.

This is a command-line utility; there are shell scripts (.bat for Windows, and .sh for Unix) to run it.

runPearMerger 1st_input_pear_file ... nth_input_pear_file
-n output_analysis_engine_name [ -f output_pear_file ]

The first group of parameters lists the input PEAR files; duplicates are not allowed. The -n parameter is the name of the generated Aggregate Analysis Engine. The optional -f parameter specifies the name of the output file. If it is omitted, the output is written to output_analysis_engine_name.pear in the current working directory.
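The argument rules above can be sketched in a few lines of Java. This is only an illustration, not the utility's source: the class and method names (MergerArgs, checkNoDuplicates, resolveOutputFile) are hypothetical, and comparing inputs by normalized absolute path is an assumption about what counts as a duplicate.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; it only mirrors the documented command-line rules.
final class MergerArgs {

    // Throws if the same input PEAR file is listed more than once.
    // Assumption: two arguments are duplicates if they normalize to the same path.
    static void checkNoDuplicates(List<String> inputPears) {
        Set<Path> seen = new HashSet<>();
        for (String pear : inputPears) {
            Path p = Paths.get(pear).toAbsolutePath().normalize();
            if (!seen.add(p)) {
                throw new IllegalArgumentException("Duplicate input PEAR: " + pear);
            }
        }
    }

    // Returns the -f value if one was given; otherwise <engineName>.pear
    // in the current working directory.
    static Path resolveOutputFile(String engineName, String fOption) {
        if (fOption != null) {
            return Paths.get(fOption);
        }
        return Paths.get(System.getProperty("user.dir"), engineName + ".pear");
    }
}
```

For example, with the hypothetical inputs a.pear and b.pear, running runPearMerger a.pear b.pear -n MyAggregate with no -f option corresponds to resolveOutputFile("MyAggregate", null), which yields MyAggregate.pear in the current working directory.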

During the running of this tool, work files are written to a temporary directory created in the user's home directory.

The PEARs are merged using the following steps:

  1. A temporary working directory is created for the output aggregate component.
  2. Each input PEAR file is extracted into a separate 'input_component_name' folder under the working directory.
  3. The extracted files are processed to adjust the '$main_root' macros. This operation differs from the PEAR installation operation, because it does not replace the macros with absolute paths.
  4. The output PEAR directory structure (the 'metadata' and 'desc' folders) is created under the working directory.
  5. The UIMA TAE descriptor for the output aggregate component is built in the 'desc' folder. This aggregate descriptor refers to the input delegate components, specifying 'fixed flow' based on the original order of the input components on the command line. The aggregate descriptor's 'capabilities' and 'operational properties' sections are built based on the input components' specifications.
  6. A new PEAR installation descriptor is created in the 'metadata' folder, referencing the new output aggregate descriptor built in the previous step.
  7. The content of the temporary output working directory is zipped to create the output PEAR, and then the temporary working directory is deleted.
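The 'fixed flow' part of step 5 can be illustrated with a small sketch. Assuming each delegate is keyed by its component name (an assumption; the key naming the utility uses is not stated here), the Java snippet below emits the flowConstraints fragment of an aggregate descriptor with one node per input, in command-line order. FixedFlowSketch is a hypothetical name, and a real aggregate descriptor contains many more elements than this fragment.

```java
import java.util.List;

// Hypothetical illustration of step 5; not the merger's actual code.
final class FixedFlowSketch {

    // Builds the fixed-flow fragment, preserving the order of delegateKeys.
    static String fixedFlowXml(List<String> delegateKeys) {
        StringBuilder xml = new StringBuilder();
        xml.append("<flowConstraints>\n");
        xml.append("  <fixedFlow>\n");
        for (String key : delegateKeys) {
            xml.append("    <node>").append(escape(key)).append("</node>\n");
        }
        xml.append("  </fixedFlow>\n");
        xml.append("</flowConstraints>\n");
        return xml.toString();
    }

    // Escapes the characters that are special in XML text content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Because the node order follows the input list, changing the order of the PEAR files on the command line changes the order in which the delegates run.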

The PEAR merger utility logs all the operations both to standard console output and to a log file, pm.log, which is created in the current working directory.

The output PEAR file can be installed and tested using the PEAR Installer. The output aggregate component can also be tested by using the CVD or DocAnalyzer tools.

The PEAR Installer creates Eclipse project files (.classpath and .project) in the root directory of the installed PEAR, so the installed component can be imported into the Eclipse IDE as an external project. Once the component is in the Eclipse IDE, developers may use the Component Descriptor Editor and the PEAR Packager to modify the output aggregate descriptor and re-package the component.

The PEAR Merger utility only does basic merging operations, and is limited as follows. You can overcome these by editing the resulting PEAR file or the resulting Aggregate Descriptor.

  1. The Merge operation specifies Fixed Flow sequencing for the Aggregate.
  2. The merged aggregate does not define any parameters, so the delegate parameters cannot be overridden.
  3. No External Resource definitions are generated for the aggregate.
  4. No Sofa Mappings are generated for the aggregate.
  5. No check is made for name collisions. Collisions could occur in the fully-qualified class names of the implementing Java classes, the names of JAR files, the names of descriptor files, and the names of resource bindings or resource file paths.
  6. The input and output capabilities are generated by merging the capabilities from the components and removing duplicates. Multiple capability sets are not supported: only the first set of each component is used, and only one set is created for the generated Aggregate. There is no support for merging Sofa specifications.
  7. No Indexes or Type Priorities are created for the generated Aggregate. No checking is done to see if the Indexes or Type Priorities of the components conflict or are inconsistent.
  8. You can only merge Analysis Engines and CAS Consumers.
  9. Although the installation descriptors of the PEAR files being merged can contain XML elements describing Collection Reader and CAS Consumer descriptors, these elements are ignored during the merge: the installation descriptor created by the merge does not set them. The output PEAR's new aggregate references only each merged component's main descriptor, as identified by the PEAR element:
    <SUBMITTED_COMPONENT>
    <DESC>the_component.xml</DESC>...
    </SUBMITTED_COMPONENT>
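Because name collisions are not checked (limitation 5), it can help to compare the input PEARs before merging. The following Java sketch is one possible pre-check and is not part of the PEAR Merger. Since a PEAR is a zipped directory, it lists each archive's entries with java.util.zip.ZipFile and reports JAR file names that occur in more than one input. PearCollisionCheck and its methods are hypothetical names, and the check covers JAR file names only, not class names, descriptor names, or resource bindings.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical pre-merge check; not part of the PEAR Merger utility.
final class PearCollisionCheck {

    // Lists the entry names of one PEAR (ZIP) file.
    static List<String> entryNames(String pearPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(pearPath)) {
            for (ZipEntry e : Collections.list(zip.entries())) {
                names.add(e.getName());
            }
        }
        return names;
    }

    // Maps each JAR file name to the PEARs that contain it, keeping only
    // names that appear in more than one PEAR.
    static Map<String, List<String>> jarCollisions(Map<String, List<String>> entriesByPear) {
        Map<String, List<String>> owners = new TreeMap<>();
        for (Map.Entry<String, List<String>> pear : entriesByPear.entrySet()) {
            for (String entry : pear.getValue()) {
                if (!entry.toLowerCase(Locale.ROOT).endsWith(".jar")) {
                    continue;
                }
                // Compare by simple file name, ignoring the folder inside the PEAR.
                String jar = entry.substring(entry.lastIndexOf('/') + 1);
                List<String> pears = owners.computeIfAbsent(jar, k -> new ArrayList<>());
                if (!pears.contains(pear.getKey())) {
                    pears.add(pear.getKey());
                }
            }
        }
        owners.values().removeIf(pears -> pears.size() < 2);
        return owners;
    }

    // Usage: java PearCollisionCheck first.pear second.pear ...
    public static void main(String[] args) throws IOException {
        Map<String, List<String>> entriesByPear = new LinkedHashMap<>();
        for (String pear : args) {
            entriesByPear.put(pear, entryNames(pear));
        }
        jarCollisions(entriesByPear).forEach((jar, pears) ->
            System.out.println(jar + " appears in " + pears));
    }
}
```

A reported name is not necessarily a problem, since each component is copied into its own subdirectory; it only marks a place worth inspecting by hand before or after the merge.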