HOWTO


Introduction

We've tried to make it fairly easy to build and use maxent models, but you need two things to start with:

  1. An understanding of feature selection for maxent modeling.
  2. Java skills or the ability to read some example Java code and turn it into what you need.

I'll give a very basic summary of what goes on with feature selection.  For more details, refer to the papers mentioned here.

Features in maxent are functions from outcomes (classes) and contexts to true or false.  To take an example from Adwait Ratnaparkhi's part of speech tagger, a useful feature might be:

    feature(outcome, context) = { 1   if outcome = DETERMINER
                                {        && currentword(context) = "that"
                                { 0   otherwise

Your job, as a person creating a model of a classification task, is to select the features that will be useful in making decisions.  One thing to keep in mind, especially if you are reading any papers on maxent, is that the theoretical representation of these features is not the same as how they are represented in the implementation.  (Actually, you really don't need to know the theoretical side to start selecting features with opennlp.maxent.)

If you are familiar with feature selection for Adwait Ratnaparkhi's maxent implementation, you should have no problems since our implementation uses features in the same manner as his.  Basically, features like the example above are reduced, for your purposes, to the contextual predicate portion of the feature, i.e. currentword(context)="that" (in the implementation this will further reduce to "current=that" or even just "that").

From this point on, I'll forget theory and discuss features from the perspective of the implementation, but for correctness I'll point out that whenever I say feature, I am actually talking about a contextual predicate which will expand into several features (however, this is entirely hidden from the user, so don't worry if you don't understand).

Using a Model

So, say you want to implement a program which uses maxent to find names in a text, such as:

He succeeds Terrence D. Daniels, formerly a W.R. Grace vice chairman, who resigned.

If you are currently looking at the word Terrence and are trying to decide if it is a name or not, examples of the kinds of features you might use are "previous=succeeds", "current=Terrence", "next=D.", and "currentWordIsCapitalized".  You might even add a feature that says that "Terrence" was seen as a name before.

Here's how this information translates into the implementation.  Let's assume that you already have a trained model for name finding available, that you have created an instance of the MaxentModel interface using that model, and that you are currently looking at Terrence in the example sentence above.  To ask the model whether it believes that Terrence is a name or not, you send a String[] with all of the features (such as those discussed above) to the model by calling the method:

public double[] eval(String[] context);

The double[] which you get back will contain the probabilities of the various outcomes which the model has assigned based on the features which you sent it.  The indexes of the double[] are actually paired with outcomes.  For example, the outcomes associated with the probabilities might be "TRUE" for index 0 and "FALSE" for index 1.  To find the String name of a particular index outcome, call the method:

public String getOutcome(int i);

Also, if you have gotten back double[] after calling eval and are interested in only the outcome which the model assigns the highest probability, you can call the method:

public String getBestOutcome(double[] outcomes);

This will return the String name of the most likely outcome.
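To make the pairing of indexes and outcomes concrete, here is a small self-contained sketch of the argmax step that getBestOutcome performs.  The probabilities and outcome names are made-up values for illustration, not output from a real model:

```java
public class OutcomeDemo {
    // Outcome names paired with indexes, as getOutcome(int) would report them.
    static final String[] OUTCOMES = {"TRUE", "FALSE"};

    // Returns the index of the highest probability, mirroring what
    // getBestOutcome(double[]) does for you.
    static int bestIndex(double[] probs) {
        int best = 0;
        for (int i = 1; i < probs.length; i++) {
            if (probs[i] > probs[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        // Pretend these came back from eval(context).
        double[] probs = {0.85, 0.15};
        System.out.println(OUTCOMES[bestIndex(probs)]);  // prints "TRUE"
    }
}
```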

You can find many examples of these methods being used to make predictions for natural language processing tasks in the OpenNLP Tools project.

Training a Model

In order to train a model, you need some way to produce a set of events which serve as examples for your model; this is typically done using data that someone has annotated with the outcomes your model is trying to predict. Events are supplied to the trainer through an EventStream object, which is just an iterator over a set of events. An event consists of an outcome and a context. For the example above, an event might look like:

outcome: T
context: previous=succeeds current=Terrence next=D. currentWordIsCapitalized
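As a sketch of how such a context might be assembled from a tokenized sentence (the feature names here are just this HOWTO's examples; the library places no constraints on the strings you choose):

```java
import java.util.ArrayList;
import java.util.List;

public class FeatureBuilder {
    // Builds the contextual predicates for the word at position i,
    // using the example feature names from this HOWTO.
    static String[] buildContext(String[] words, int i) {
        List<String> feats = new ArrayList<>();
        if (i > 0) feats.add("previous=" + words[i - 1]);
        feats.add("current=" + words[i]);
        if (i < words.length - 1) feats.add("next=" + words[i + 1]);
        if (Character.isUpperCase(words[i].charAt(0))) {
            feats.add("currentWordIsCapitalized");
        }
        return feats.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] sentence = {"He", "succeeds", "Terrence", "D.", "Daniels"};
        // prints "previous=succeeds current=Terrence next=D. currentWordIsCapitalized"
        System.out.println(String.join(" ", buildContext(sentence, 2)));
    }
}
```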

Once you have both your EventStream implementation as well as your training data in hand, you can train up a model.  opennlp.maxent has an implementation of Generalized Iterative Scaling (opennlp.maxent.GIS) which you can use for this purpose.  Write some code somewhere to make a call to the method GIS.trainModel.

public static MaxentModel trainModel(DataIndexer di, int iterations) {  ...  }

The iterations parameter is the number of times the training procedure should iterate when finding the model's parameters. You shouldn't need more than 100 iterations, and when you are first trying to create your model, you'll probably want to use fewer so that you can iron out problems without waiting each time for all those iterations, which can take quite a while depending on the task.

The DataIndexer is an abstract object that pulls in all those events that your EventStream has gathered and then manipulates them into a format that is much more efficient for the training procedure to work with.  There is nothing complicated here --- you just need to create an instance of a DataIndexer, typically the OnePassDataIndexer, with the events and an integer that is the cutoff for the number of times a feature must have been seen in order to be considered in the model.

public OnePassDataIndexer(EventStream es, int cutoff){ ... }

You can also call the constructor OnePassDataIndexer(EventStream events), which assumes a cutoff of 0. 
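Putting these pieces together, a training run might look like the following sketch.  It assumes the opennlp.maxent classes are on your classpath; MyEventStream is a hypothetical name standing in for your own EventStream implementation, and the cutoff and iteration counts are just illustrative values:

import opennlp.maxent.*;

// MyEventStream is a hypothetical stand-in for your own EventStream
// implementation over your annotated training data.
EventStream events = new MyEventStream(trainingData);

// Index the events, ignoring features seen fewer than 2 times.
DataIndexer di = new OnePassDataIndexer(events, 2);

// Train for 100 iterations of Generalized Iterative Scaling.
MaxentModel model = GIS.trainModel(di, 100);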

Once the model is returned you can write it to disk using the following code:

File outputFile = new File(modelFileName+".bin.gz");
GISModelWriter writer = new SuffixSensitiveGISModelWriter(model, outputFile);
writer.persist();

This will save your model in a compressed binary format (using the BinaryGISModelWriter class) based on the file extension.

Likewise you can load your model from disk using:

GISModel m = new SuffixSensitiveGISModelReader(new File(modelFileName)).getModel();

A more detailed example is available in the "samples/sports" section of the distribution, which comes with training data, code to build a model, data to test the model on, and code to make predictions and evaluate the model against the test data.

That's it! Hopefully, with this little HOWTO and the example implementations available in opennlp.grok.preprocess, you'll be able to get maxent models up and running without too much difficulty.  Please let me know if any parts of this HOWTO are particularly confusing and I'll try to make things more clear.  I would also welcome "patches" to this document if you feel like making changes yourself. 

If you have any questions, do not hesitate to post them on the help forum.


Email: tsmorton@users.sourceforge.net