```java
correlations = ...;
ItemSimilarity itemSimilarity = new GenericItemSimilarity(correlations);
```
Then we can finish as before to produce recommendations:
```java
Recommender recommender =
    new GenericItemBasedRecommender(model, itemSimilarity);
Recommender cachingRecommender = new CachingRecommender(recommender);
...
List<RecommendedItem> recommendations =
    cachingRecommender.recommend(1234, 10);
```
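Where might the precomputed correlations come from? Often from an offline job that computes a similarity measure, such as Pearson correlation, over each pair of items' co-rated preference values. The following is a purely illustrative, self-contained sketch of that computation (plain Java, no Mahout types; the class name and data are hypothetical, and the resulting values would then be wrapped into the similarity objects GenericItemSimilarity expects):

```java
public class PearsonExample {

    // Pearson correlation of two equal-length arrays of preference values:
    // ratings given by the same users to two different items.
    public static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXX = 0, sumYY = 0, sumXY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXX += x[i] * x[i];
            sumYY += y[i] * y[i];
            sumXY += x[i] * y[i];
        }
        double num = sumXY - sumX * sumY / n;
        double den = Math.sqrt((sumXX - sumX * sumX / n)
                             * (sumYY - sumY * sumY / n));
        // Treat zero variance as "no correlation" rather than dividing by zero.
        return den == 0.0 ? 0.0 : num / den;
    }

    public static void main(String[] args) {
        // Hypothetical ratings for items A and B from the same three users.
        double[] itemA = {3.0, 4.0, 5.0};
        double[] itemB = {2.0, 4.0, 5.0};
        System.out.println(pearson(itemA, itemB));
    }
}
```

Values near 1.0 indicate items that tend to be rated alike; near -1.0, items rated oppositely.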
### Slope-One Recommender
This is a simple yet effective Recommender and we present another example
to round out the list:
```java
DataModel model = new FileDataModel(new File("data.txt"));
// Make a weighted slope-one recommender
Recommender recommender = new SlopeOneRecommender(model);
Recommender cachingRecommender = new CachingRecommender(recommender);
```
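For intuition, slope-one predicts a user's preference for a target item from the user's known preferences plus the average per-pair rating differences observed across all users; the weighted variant weights each difference by how many users co-rated the pair. A self-contained sketch of that prediction formula (hypothetical numbers, plain Java, no Mahout types):

```java
public class SlopeOneSketch {

    // Weighted slope-one prediction: combine the user's rating of each other
    // item with the average difference (target - item i) over co-raters,
    // weighted by the number of users who rated both items.
    public static double predict(double[] userRatings,   // user's ratings of items 0..n-1
                                 double[] avgDiff,       // avg(target - item i) over co-raters
                                 int[] coRatingCounts) { // users who rated both
        double num = 0, den = 0;
        for (int i = 0; i < userRatings.length; i++) {
            if (coRatingCounts[i] == 0) {
                continue; // no co-raters: this pair contributes nothing
            }
            num += (userRatings[i] + avgDiff[i]) * coRatingCounts[i];
            den += coRatingCounts[i];
        }
        return num / den;
    }

    public static void main(String[] args) {
        // Hypothetical: the user rated two other items 4.0 and 3.0; the target
        // item averages +0.5 above the first (10 co-raters) and +1.0 above the
        // second (30 co-raters).
        System.out.println(predict(new double[]{4.0, 3.0},
                                   new double[]{0.5, 1.0},
                                   new int[]{10, 30})); // prints 4.125
    }
}
```

The heavier weight on the second pair pulls the prediction toward 4.0, i.e. (3.0 + 1.0).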
## Integration with your application
### Direct
You can create a Recommender, as shown above, wherever you like in your
Java application, and use it. This includes simple Java applications or GUI
applications, server applications, and J2EE web applications.
### Standalone server
A Mahout recommender can also be run as an external server, which may be
the only option for non-Java applications. It can be exposed as a web
application via org.apache.mahout.cf.taste.web.RecommenderServlet, and your
application can then access recommendations via simple HTTP requests and
responses. See above, and see the javadoc for details.
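The servlet's exact request URL layout and response format depend on the version you deploy, so treat the following as a sketch under assumptions: suppose the server returns one recommendation per line as a tab-separated item ID and estimated preference. Client-side parsing of such a body might look like this (pure Java, no Mahout dependency; the class name and wire format are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

public class ServletResponseSketch {

    // Parse a hypothetical plain-text response body, one recommendation per
    // line formatted as "itemID<TAB>estimatedPreference", into item IDs.
    public static List<Long> parseItemIds(String body) {
        List<Long> ids = new ArrayList<>();
        for (String line : body.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            ids.add(Long.parseLong(line.split("\t")[0]));
        }
        return ids;
    }

    public static void main(String[] args) {
        // In practice this body would come from an HTTP GET to the servlet.
        String body = "301\t4.5\n302\t4.1\n";
        System.out.println(parseItemIds(body)); // prints [301, 302]
    }
}
```

Check the RecommenderServlet javadoc for the actual URL parameters and response format before relying on any particular layout.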
## Performance
### Runtime Performance
The more data you give, the better the results. Though Mahout is designed
for performance, you will undoubtedly run into performance issues at some
point. For best results, consider passing the following command-line flags
to your JVM:
* -server: Enables the server VM, which is generally appropriate for
long-running, computation-intensive applications.
* -Xms1024m -Xmx1024m: Make the heap as big as possible; a gigabyte
doesn't hurt when dealing with tens of millions of preferences. Mahout
recommenders will generally use as much memory as you give them for caching,
which helps performance. Set the initial and maximum size to the same value
to avoid wasting time growing the heap, and to avoid minor collections
triggered by heap resizing, which can clear cached values.
* -da -dsa: Disable all assertions.
* -XX:NewRatio=9: Increase the share of the heap allocated to 'old'
objects, which most objects are in this framework.
* -XX:+UseParallelGC -XX:+UseParallelOldGC (multi-processor machines only):
Use a GC algorithm designed to take advantage of multiple processors, and
designed for throughput. This is a default in J2SE 5.0.
* -XX:-DisableExplicitGC: Disable calls to System.gc(). These calls can
only hurt in the presence of modern GC algorithms; they may force Mahout to
remove cached data needlessly. This flag isn't needed if you're sure your
code and third-party code you use doesn't call this method.
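Put together, a launch command using all of the flags above might look like this (MyRecommenderApp is a placeholder for your own main class):

```shell
java -server -Xms1024m -Xmx1024m \
     -da -dsa \
     -XX:NewRatio=9 \
     -XX:+UseParallelGC -XX:+UseParallelOldGC \
     -XX:-DisableExplicitGC \
     MyRecommenderApp
```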
Also consider the following tips:
* Use CachingRecommender on top of your custom Recommender implementation.
* When using JDBCDataModel, make sure you've taken basic steps to optimize
the table storing preference data. Create a primary key on the user ID and
item ID columns, and an index on them. Set them to be non-null. And so on.
Tune your database for lots of concurrent reads! When using JDBC, the
database is almost always the bottleneck. Plenty of memory and caching are
even more important.
* Also, pooling database connections is essential to performance. If using
a J2EE container, it probably provides a way to configure connection pools.
If you are creating your own DataSource directly, try wrapping it in
org.apache.mahout.cf.taste.impl.model.jdbc.ConnectionPoolDataSource.
* See MySQL-specific notes on performance in the javadoc for
MySQLJDBCDataModel.
### Algorithm Performance: Which One Is Best?
There is no right answer; it depends on your data, your application,
environment, and performance needs. Mahout provides the building blocks
from which you can construct the best Recommender for your application. The
links below provide research on this topic. You will probably need a bit of
trial-and-error to find a setup that works best. The code sample above
provides a good starting point.
Fortunately, Mahout provides a way to evaluate the accuracy of your
Recommender on your own data, in org.apache.mahout.cf.taste.eval:
```java
DataModel myModel = ...;
RecommenderBuilder builder = new RecommenderBuilder() {
  public Recommender buildRecommender(DataModel model) {
    // build and return the Recommender to evaluate here
  }
};
RecommenderEvaluator evaluator =
    new AverageAbsoluteDifferenceRecommenderEvaluator();
double evaluation = evaluator.evaluate(builder, myModel, 0.9, 1.0);
```
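To make concrete what AverageAbsoluteDifferenceRecommenderEvaluator reports: it is the mean absolute difference between estimated and actual preference values over held-out test data, so lower is better. A self-contained sketch of that arithmetic, with hypothetical numbers (plain Java, no Mahout types):

```java
public class MaeSketch {

    // Average absolute difference between estimated and actual preferences:
    // the same quantity AverageAbsoluteDifferenceRecommenderEvaluator reports.
    public static double averageAbsoluteDifference(double[] estimated,
                                                   double[] actual) {
        double total = 0;
        for (int i = 0; i < estimated.length; i++) {
            total += Math.abs(estimated[i] - actual[i]);
        }
        return total / estimated.length;
    }

    public static void main(String[] args) {
        // Hypothetical held-out preferences vs. the recommender's estimates.
        System.out.println(averageAbsoluteDifference(
            new double[]{3.5, 4.0, 2.0},
            new double[]{4.0, 4.0, 3.0})); // prints 0.5
    }
}
```

A score of 0.5 here means the estimates are off by half a rating point on average.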
For "boolean" data model situations, where there is no notion of a
preference value, the above evaluation based on estimated preferences does
not make sense. In this case, try this kind of evaluation, which reports
traditional information-retrieval measures such as precision and recall;
these are more meaningful here:
```java
...
RecommenderIRStatsEvaluator evaluator =
    new GenericRecommenderIRStatsEvaluator();
IRStatistics stats =
    evaluator.evaluate(builder, null, myModel, null, 3,
                       RecommenderIRStatsEvaluator.CHOOSE_THRESHOLD,
                       1.0);
```
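Precision and recall here carry their usual information-retrieval meanings: precision is the fraction of recommended items that are relevant, recall the fraction of relevant items that were recommended. A self-contained illustration with hypothetical item IDs (plain Java, no Mahout types):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class IRStatsSketch {

    // Precision: fraction of recommended items that are relevant.
    public static double precision(Set<Long> recommended, Set<Long> relevant) {
        Set<Long> hits = new HashSet<>(recommended);
        hits.retainAll(relevant);
        return (double) hits.size() / recommended.size();
    }

    // Recall: fraction of relevant items that were recommended.
    public static double recall(Set<Long> recommended, Set<Long> relevant) {
        Set<Long> hits = new HashSet<>(recommended);
        hits.retainAll(relevant);
        return (double) hits.size() / relevant.size();
    }

    public static void main(String[] args) {
        // Hypothetical top-3 recommendation list vs. 4 known-relevant items.
        Set<Long> recommended = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> relevant = new HashSet<>(Arrays.asList(2L, 3L, 4L, 5L));
        System.out.println(precision(recommended, relevant));
        System.out.println(recall(recommended, relevant));
    }
}
```

Here two of the three recommendations are relevant (precision 2/3), and two of the four relevant items were surfaced (recall 1/2).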
## Useful Links
You'll want to look at these packages too, which offer more algorithms and
approaches that you may find useful:
* [Cofi](http://www.nongnu.org/cofi/): A Java-Based Collaborative Filtering Library
* [CoFE](http://eecs.oregonstate.edu/iis/CoFE/)
Here's a handful of research papers that I've read and found particularly
useful:
J.S. Breese, D. Heckerman, and C. Kadie, "[Empirical Analysis of Predictive Algorithms for Collaborative Filtering](http://research.microsoft.com/research/pubs/view.aspx?tr_id=166)," in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI 1998), 1998.

B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "[Item-based collaborative filtering recommendation algorithms](http://www10.org/cdrom/papers/519/)," in Proceedings of the Tenth International Conference on the World Wide Web (WWW 10), pp. 285-295, 2001.

P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, "[GroupLens: an open architecture for collaborative filtering of netnews](http://doi.acm.org/10.1145/192844.192905)," in Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work (CSCW 1994), pp. 175-186, 1994.

J.L. Herlocker, J.A. Konstan, A. Borchers, and J. Riedl, "[An algorithmic framework for performing collaborative filtering](http://www.grouplens.org/papers/pdf/algs.pdf)," in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 99), pp. 230-237, 1999.

Clifford Lyon, "[Movie Recommender](http://materialobjects.com/cf/MovieRecommender.pdf)," CSCI E-280 final project, Harvard University, 2004.

Daniel Lemire and Anna Maclachlan, "[Slope One Predictors for Online Rating-Based Collaborative Filtering](http://www.daniel-lemire.com/fr/abstracts/SDM2005.html)," in Proceedings of SIAM Data Mining (SDM '05), 2005.

Michelle Anderson, Marcel Ball, Harold Boley, Stephen Greene, Nancy Howse, Daniel Lemire, and Sean McGrath, "[RACOFI: A Rule-Applying Collaborative Filtering System](http://www.daniel-lemire.com/fr/documents/publications/racofi_nrc.pdf)," in Proceedings of COLA '03, 2003.
These links will take you to all the collaborative filtering reading you
could ever want!
* [Paul Perry's notes](http://www.paulperry.net/notes/cf.asp)
* [James Thornton's collaborative filtering resources](http://jamesthornton.com/cf/)
* [Daniel Lemire's blog](http://www.daniel-lemire.com/blog/), which frequently covers collaborative filtering topics