Home

Apache Mahout is a new Apache top-level project (TLP) for creating scalable machine learning algorithms under the Apache license. It is related to other Apache Lucene projects and integrates well with Solr.

General
Community
Installation/Setup
Implementation Background

Requirements and Design
Collections and Algorithms
Utilities
Data
Benchmarks

Committer's Resources

Project Resources
Additional Resources

How To Edit This Wiki

General

Overview – Mahout? What's that supposed to be?

QuickStart – learn how to quickly set up Apache Mahout for your project.

FAQ – frequently asked questions from the mailing lists.

DeveloperResources – overview of the Mahout development infrastructure.

HowToContribute – get involved with the Mahout community.

HowToBecomeACommitter – become a member of the Mahout development community.

Hadoop – several of our implementations depend on Hadoop.

Machine Learning Open Source Software – other projects implementing Open Source Machine Learning libraries.

TODO

Community

Who we are – who are the developers behind Apache Mahout?

Books, Tutorials, Talks, Articles, News, etc. on Mahout

IssueTracker – see what features people are working on, submit patches and file bugs.

Source Code (SVN) – Fisheye – download the Mahout source code from SVN.

Mailing lists – links to our mailing lists and archived design and algorithm discussions. Maybe your question has already been answered there.

VersionControl – where we track our code.

PoweredBy – who is using Mahout in production?

Mahout and Google Summer of Code – all you need to know about Mahout and GSoC.

Machine Learning Resources – books, tutorials, talks, papers on machine learning problems.

Glossary of commonly used terms

Installation/Setup

System Requirements – what do you need to run Mahout?

QuickStart – get started with Mahout, run the examples and get pointers to further resources.

Releases – a list of Mahout releases.

Download and installation – build Mahout from the sources.

Mahout on Amazon's EC2 Service – run Mahout on Amazon's EC2.

Integrating Mahout into an Application – integrate Mahout's capabilities into your application.

Implementation Background

Requirements and Design

Matrix and Vector Needs – requirements for Mahout vectors.

Collection (De-)Serialization

Collections and Algorithms

Learn more about mahout-collections, containers for efficient storage of primitive-type data and open hash tables.

Learn more about the Algorithms discussed and employed by Mahout.

Learn more about the Mahout recommender implementation.

Utilities

This section describes tools that might be useful for working with Mahout.

Creating Vectors – Mahout's algorithms operate on vectors. Learn how to generate them from raw data.
Viewing Result – how to visualize the results of your trained algorithms.

Data

Collections – to try out and test Mahout's algorithms, you need training data. We are always looking for new training data collections.

Benchmarks

MahoutBenchmarks

Committer's Resources

Project Resources

Additional Resources

Apache Machine Status – check whether SVN and other resources are available.
Committer's FAQ
Apache Dev

How To Edit This Wiki

This Wiki is a collaborative site; anyone can contribute and share:

Create an account by clicking the "Login" link at the top of any page, and picking a username and password.
Edit any page by pressing "Edit" at the top of the page.

There are some conventions used on the Mahout wiki:

```
+*TODO:*+
```
(TODO: ) marks sections that definitely need to be cleaned up.
```
+*Mahout_(version)*+
```
(Mahout_0.2) draws attention to the version of Mahout in which a feature was (or will be) added.