The PEAR Merger utility takes two or more PEAR files and
merges their contents, creating a new PEAR which has, in turn, a new Aggregate
analysis engine whose delegates are the components from the original files
being merged. It does this by (1)
copying the contents of the input components into the output component, placing
each component into a separate subdirectory, (2) generating a UIMA descriptor for
the output Aggregate text analysis engine and (3) creating an output PEAR file that encapsulates the
output Aggregate.
The merge logic is quite simple, and is intended to work
for simple cases. More complex merging
needs to be done by hand. Please see the
Restrictions and Limitations section, below.
This is a command-line utility; there are shell scripts
(.bat for Windows, and .sh for Unix) to run it.
runPearMerger 1st_input_pear_file
... nth_input_pear_file
-n
output_analysis_engine_name [ -f output_pear_file ]
The first group of parameters are the input PEAR files. No duplicates are allowed here. The -n
parameter
is the name of the generated Aggregate Analysis Engine. The optional -f
parameter specifies the name of the output file. If it is omitted, the output is written to output_tae_name.pear
in the current working directory.
During the running of this tool, work files are written to
a temporary directory created in the user's home directory.
The PEARs are merged using the following steps:
- A temporary working directory, is created for the output aggregate
component.
- Each input PEAR file is extracted into a separate 'input_component_name'
folder under the working directory.
- The extracted files are processed to adjust the '$main_root' macros.
This operation differs from the PEAR installation operation, because it does
not replace the macros with absolute paths.
- The output PEAR directory structure, 'metadata' and 'desc' folders under the working
directory, are created.
- The UIMA TAE descriptor for the output aggregate component is built in
the 'desc' folder. This aggregate descriptor refers to the input delegate
components, specifying 'fixed flow' based on the original order of the input
components in the command line. The aggregate descriptor's 'capabilities' and
'operational properties' sections are built based on the input components'
specifications.
- A new PEAR installation descriptor is created in the 'metadata' folder,
referencing the new output aggregate descriptor built in the previous step.
- The content of the temporary output working directory is zipped to
created the output PEAR, and then the temporary working directory is deleted.
The PEAR merger utility logs all the operations both to
standard console output and to a log file, pm.log, which is created in the
current working directory.
The output PEAR file can be installed and tested using the
PEAR Installer. The output aggregate component can also be tested by using the
CVD or DocAnalyzer tools.
The PEAR Installer creates Eclipse project files
(.classpath and .project) in the root directory of the installer PEAR, so the
installed component can be imported into the Eclipse IDE as an external
project. Once the component is in the Eclipse IDE, developers may use the
Component Descriptor Editor and the PEAR Packager to modify the output
aggregate descriptor and re-package the component.
The PEAR Merger utility only does basic merging
operations, and is limited as follows. You can overcome these by editing the resulting PEAR file or the
resulting Aggregate Descriptor.
- The Merge operation specifies Fixed Flow sequencing for the Aggregate.
- The merged aggregate does not define any parameters, so the delegate
parameters cannot be overridden.
- No External Resource definitions are generated for the aggregate.
- No Sofa Mappings are generated for the aggregate.
- Name collisions are not checked for. Possible name collisions could occur in the fully-qualified class names
of the implementing Java classes, the names of JAR files, the names of
descriptor files, and the names of resource bindings or resource file paths.
- The input and output capabilities are generated based on merging the
capabilities from the components (removing duplicates). Capability sets are ignored - only the first
of the set is used in this process, and only one set is created for the
generated Aggregate. There is no support
for merging Sofa specifications.
- No Indexes or Type Priorities are created for the generated
Aggregate. No checking is done to see if
the Indexes or Type Priorities of the components conflict or are inconsistent.
- You can only merge Analysis Engines and CAS Consumers.
- Although PEAR file installation descriptors that are being merged can
have specific XML elements describing Collection Reader and CAS Consumer
descriptors, these elements are ignored during the merge, in the sense that the
installation descriptor that is created by the merge does not set these
elements. The merge process does not use
these elements; the output PEAR's new aggregate
only references the merged components' main PEAR descriptor element, as
identified by the PEAR element:
<SUBMITTED_COMPONENT>
<DESC>the_component.xml</DESC>...
</SUBMITTED_COMPONENT>
.