Apache UIMA (Unstructured Information Management Architecture) v2.3.1 SDK ------------------------------------------------------------------------- Building from the Source Distribution ------------------------------------- We use Maven 3.0 for building; download this if needed, and set the environment variable MAVEN_OPTS to -Xmx800m -XX:MaxPerSize-256m. Then build everything by going into the .../uimaj directory, and issuing the command mvn clean install. Look for the result in the two artifacts: ../uimaj/target/uimaj-[version]-source-release.zip and ../uimaj-distr/target/uimaj-[version]-bin.zip (or ...tar.gz) For more details, please see http://uima.apache.org/building-uima.html What's New in 2.3.1 ------------------- CAS Editor ---------- The Cas Editor received a few important enhancements aimed at making it the preferred editor and viewer for CAS files inside a UIMA project. It is now possible to open CAS files from any location within an eclipse workspace. Based on feedback from our users the Cas Editor received a number of usability enhancements and bug fixes. A new view to show the annotation colors and visualization style was added. The view interacts with the editor and can be used to define which kind of annotations are displayed. There is now a new annotation style to display features values in between the text lines. This is especially useful to show Part-Of-Speech tags, entity categories or a stemming. There have been a couple of minor improvements: the editor context menu now shows keyboard shortcuts, and it is possible to choose the encoding and resulting CAS format when importing a text file. Build ----- The build process was redone to align it with normal Maven build procedures, where possible. There is a new top-level directory in our SVN called "build" which holds the build tooling. The new build tooling is described in general on the website http://uima.apache.org/maven-design.html Result Specification -------------------- The implementation handling the result specification was redone to correct several corner case issues. Performance ----------- The Component Descriptor Editor (CDE) (an Eclipse tool) had potentially poor performance when editing aggregate descriptors that referenced many remote components. The performance in this case has been greatly improved, using a caching technique and eliminating repetitive remote accesses. A complete list of fixes is in issuesFixed/jira-report.hmtl. Supported Platforms -------------------- Apache UIMA requires Java level 1.5; it has been tested with Sun Java SDK v5.0 and v6.0, and IBM Java 6.0. Running the Eclipse plugin tooling for UIMA requires you start Eclipse using a Java 1.5 or later, as well. The supported platforms are: Windows, Linux, Solaris, AIX and Mac OS X. Other platforms and Java (1.5+) implementations should work, but have not been significantly tested. Many of the scripts in the /bin directory invoke Java. They use the value of the environment variable, JAVA_HOME, to locate the Java to use; if it is not set, they invoke "java" expecting to find an appropriate Java in your PATH. Environment Variables ---------------------- After you have unpacked the Apache UIMA distribution from the package of your choice (e.g. .zip or .gz), perform the steps below to set up UIMA so that it will function properly. * Set JAVA_HOME to the directory of your JRE installation you would like to use for UIMA. * Set UIMA_HOME to the apache-uima directory of your unpacked Apache UIMA distribution * Append UIMA_HOME/bin to your PATH * Please run the script UIMA_HOME/bin/adjustExamplePaths.bat (or .sh), to update paths in the examples based on the actual UIMA_HOME directory path. This script runs a Java program; you must either have java in your PATH or set the environment variable JAVA_HOME to a suitable JRE. Note: The Mac OS X operating system procedures for setting up global environment variables are described here: see http://developer.apple.com/qa/qa2001/qa1067.html. Verifying Your Installation ---------------------------- To test the installation, run the documentAnalyzer.bat (or .sh) file located in the bin subdirectory. This should pop up a "Document Analyzer" window. Set the values displayed in this GUI to as follows: * Input Directory: UIMA_HOME/examples/data * Output Directory: UIMA_HOME/examples/data/processed * Location of Analysis Engine XML Descriptor: UIMA_HOME/examples/descriptors/analysis_engine/PersonTitleAnnotator.xml Replace UIMA_HOME above with the path of your Apache UIMA installation. Next, click the "Run" button, which should, after a brief pause, pop up an "Analyzed Results" window. Double-click on one of the documents to display the analysis results for that document. Getting Started ---------------- For an introduction to Apache UIMA and how to use it, please read the documentation located in the docs subdirectory. A good place to start is the overview_and_setup book's first chapter, which has a brief guide to the documentation.