Apache OpenNLP ${pom.version} Release Notes

Contents

What is the Similarity Component of Apache OpenNLP?
This Release
How to Get Involved
How to Report Issues
List of JIRA Issues Fixed in this Release

What is the Similarity Component of Apache OpenNLP?

This component performs text relevance assessment. It takes two portions of text (phrases, sentences, or paragraphs) and returns a similarity score. The Similarity component can be used on top of search to improve relevance by computing a similarity score between a question and each search result (snippet). It is also useful for web mining of images, videos, forums, blogs, and other media that carry textual descriptions. The sample applications of this component include content generation and filtering out meaningless speech recognition results.

Relevance assessment is based on machine learning of syntactic parse trees (constituency trees, http://en.wikipedia.org/wiki/Parse_tree). The similarity score is calculated as the size of all maximal common sub-trees for sentences from a pair of texts (www.aaai.org/ocs/index.php/WS/AAAIW11/paper/download/3971/4187, www.aaai.org/ocs/index.php/FLAIRS/FLAIRS11/paper/download/2573/3018, www.aaai.org/ocs/index.php/SSS/SSS10/paper/download/1146/1448). The objective of the Similarity component is to give an application engineer a tool for text relevance that can be used as a black box, with no need to understand computational linguistics or machine learning.
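To make the idea of scoring by common sub-trees more concrete, here is a minimal Java sketch. It is a toy illustration under stated assumptions, not the Similarity component's API or its exact algorithm: the Node class, the method names, and the scoring rule are all hypothetical. The assumed rule is: for each pair of sentences, find the node count of the largest common top-down subtree (node labels must match, and children are aligned in order with an LCS-style dynamic program), then sum these sizes over all sentence pairs of the two texts. The trees are built by hand here; in practice they would come from a constituency parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TreeSimilaritySketch {

    // Hypothetical minimal constituency-tree node: a label (phrase tag, POS tag, or word)
    // plus an ordered list of children.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Node count of the largest common top-down subtree rooted at a and b.
    // Labels must match; children are aligned in order with an LCS-style DP
    // whose "match weight" is the recursive common size of a child pair.
    static int commonRooted(Node a, Node b) {
        if (!a.label.equals(b.label)) {
            return 0;
        }
        int n = a.children.size();
        int m = b.children.size();
        int[][] dp = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int pair = dp[i - 1][j - 1] + commonRooted(a.children.get(i - 1), b.children.get(j - 1));
                dp[i][j] = Math.max(pair, Math.max(dp[i - 1][j], dp[i][j - 1]));
            }
        }
        return 1 + dp[n][m];
    }

    // Collects every node of a tree in pre-order.
    static void collect(Node node, List<Node> out) {
        out.add(node);
        for (Node child : node.children) {
            collect(child, out);
        }
    }

    // Largest common subtree between two sentence trees, trying every pair of subtree roots.
    static int maxCommonSubtree(Node s1, Node s2) {
        List<Node> nodes1 = new ArrayList<>();
        List<Node> nodes2 = new ArrayList<>();
        collect(s1, nodes1);
        collect(s2, nodes2);
        int best = 0;
        for (Node x : nodes1) {
            for (Node y : nodes2) {
                best = Math.max(best, commonRooted(x, y));
            }
        }
        return best;
    }

    // Text-level score (assumed rule): sum of largest common subtree sizes over all sentence pairs.
    static int similarity(List<Node> text1, List<Node> text2) {
        int score = 0;
        for (Node s1 : text1) {
            for (Node s2 : text2) {
                score += maxCommonSubtree(s1, s2);
            }
        }
        return score;
    }

    public static void main(String[] args) {
        // (S (NP (DT the) (NN camera)) (VP (VBZ takes) (NP (NNS photos))))
        Node t1 = new Node("S",
            new Node("NP", new Node("DT", new Node("the")), new Node("NN", new Node("camera"))),
            new Node("VP", new Node("VBZ", new Node("takes")), new Node("NP", new Node("NNS", new Node("photos")))));
        // (S (NP (DT the) (NN phone)) (VP (VBZ takes) (NP (NNS videos))))
        Node t2 = new Node("S",
            new Node("NP", new Node("DT", new Node("the")), new Node("NN", new Node("phone"))),
            new Node("VP", new Node("VBZ", new Node("takes")), new Node("NP", new Node("NNS", new Node("videos")))));
        System.out.println(similarity(List.of(t1), List.of(t2)));
    }
}
```

Running main prints 10: each tree has 12 nodes and only the two differing leaf words ("camera"/"phone", "photos"/"videos") fall outside the common subtree. Scores are raw node counts, so in a search setting one would compare a question against each snippet and rank snippets by this value; normalizing by text length would be a further design choice not covered here.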

This Release

Please see the README for this information.

How to Get Involved

The Apache OpenNLP project really needs and appreciates any contributions, including documentation help, source code and feedback. If you are interested in contributing, please visit http://opennlp.apache.org/

How to Report Issues

The Apache OpenNLP project uses JIRA for issue tracking. Please report any issues you find at http://issues.apache.org/jira/browse/opennlp

List of JIRA Issues Fixed in this Release

See issuesFixed/jira-report.html for the list of issues fixed in this release.