Apache Mahout: Scalable machine learning and data mining

Apache Mahout is a new Apache top-level project (TLP) that creates scalable machine learning algorithms under the Apache license.

{toc:style=disc|minlevel=2}

General

Overview – Mahout? What’s that supposed to be?

Quickstart – learn how to quickly set up Apache Mahout for your project.

FAQ – Frequent questions encountered on the mailing lists.

Developer Resources – overview of the Mahout development infrastructure.

How To Contribute – get involved with the Mahout community.

How To Become A Committer – become a member of the Mahout development community.

Hadoop – several of our implementations depend on Hadoop.

Machine Learning Open Source Software – other projects implementing Open Source Machine Learning libraries.

Mahout – The name, history and its pronunciation

Community

Who we are – who are the developers behind Apache Mahout?

Books, Tutorials, Talks, Articles, News, Background Reading, etc. on Mahout

Issue Tracker – see what features people are working on, submit patches and file bugs.

Source Code (SVN) – [Fisheye|http://fisheye6.atlassian.com/browse/mahout] – download the Mahout source code from SVN.

Mailing lists and IRC – links to our mailing lists, IRC channel, and archived design and algorithm discussions. Your question may already have been answered there.

Version Control – where we track our code.

Professional Support – who is offering professional support for Mahout?

Mahout and Google Summer of Code – All you need to know about Mahout and GSoC.

Glossary of commonly used terms and abbreviations

Installation/Setup

System Requirements – what do you need to run Mahout?

Quickstart – get started with Mahout, run the examples and get pointers to further resources.

Downloads – a list of Mahout releases.

Download and installation – build Mahout from the sources.

Mahout on Amazon’s EC2 Service – run Mahout on Amazon’s EC2.

Mahout on Amazon’s EMR – run Mahout on Amazon’s Elastic MapReduce.

Integrating Mahout into an Application – integrate Mahout’s capabilities in your application.

Examples

ASF Email Examples – Examples of recommenders, clustering and classification all using a public domain collection of 7 million emails.

Implementation Background

Requirements and Design

Matrix and Vector Needs – requirements for Mahout vectors.

Collection (De-)Serialization

Collections and Algorithms

Learn more about mahout-collections, containers for efficient storage of primitive-type data and open hash tables.

Learn more about the Algorithms discussed and employed by Mahout.

Learn more about the Mahout recommender implementation.

Utilities

This section describes tools that might be useful for working with Mahout.

Converting Content – Mahout has some utilities for converting content such as logs to formats more amenable to consumption by Mahout.

Creating Vectors – Mahout’s algorithms operate on vectors. Learn more about how to generate these from raw data.

Viewing Result – how to visualize the results of your trained algorithms.

Data

Collections – To try out and test Mahout’s algorithms you need training data. We are always looking for new training data collections.

Benchmarks

Mahout Benchmarks

Committer’s Resources

Testing – Information on test plans and ideas for testing

Project Resources

Additional Resources

Apache Machine Status – check whether SVN and other resources are available.
Committer’s FAQ
Apache Dev

How To Edit This Wiki

This Wiki is a collaborative site; anyone can contribute and share:

Create an account by clicking the “Login” link at the top of any page, and picking a username and password.
Edit any page by pressing “Edit” at the top of the page.

There are some conventions used on the Mahout wiki:

* {noformat}+*TODO:*+{noformat} (+*TODO:*+) denotes sections that definitely need to be cleaned up.
* {noformat}+*Mahout_(version)*+{noformat} (+*Mahout_0.2*+) marks the version of Mahout in which a feature was (or will be) added.

Twitter

Apache Software Foundation

Related Projects