Apache Mahout is a new Apache top-level project (TLP) to create scalable
machine learning algorithms under the Apache license.
{toc:style=disc|minlevel=2}
General
Overview
– Mahout? What’s that supposed to be?
Quickstart
– learn how to quickly set up Apache Mahout for your project.
FAQ
– frequently asked questions from the mailing lists.
Developer Resources
– overview of the Mahout development infrastructure.
How To Contribute
– get involved with the Mahout community.
How To Become A Committer
– become a member of the Mahout development community.
Hadoop
– several of our implementations depend on Hadoop.
Machine Learning Open Source Software
– other projects implementing Open Source Machine Learning libraries.
Mahout – The name, history and its pronunciation
Who we are
– who are the developers behind Apache Mahout?
Books, Tutorials, Talks, Articles, News, Background Reading, etc. on Mahout
Issue Tracker
– see what features people are working on, submit patches and file bugs.
Source Code (SVN)
– [Fisheye|http://fisheye6.atlassian.com/browse/mahout]
– download the Mahout source code from svn.
Mailing lists and IRC
– links to our mailing lists, IRC channel, and archived design and
algorithm discussions. Your question may already have been answered there.
Version Control
– where we track our code.
Powered By Mahout
– who is using Mahout in production?
Professional Support
– who is offering professional support for Mahout?
Mahout and Google Summer of Code
– All you need to know about Mahout and GSoC.
Glossary of commonly used terms and abbreviations
Installation/Setup
System Requirements
– what do you need to run Mahout?
Quickstart
– get started with Mahout, run the examples and get pointers to further
resources.
Downloads
– a list of Mahout releases.
Download and installation
– build Mahout from the sources.
Mahout on Amazon’s EC2 Service
– run Mahout on Amazon’s EC2.
Mahout on Amazon’s EMR
– run Mahout on Amazon’s Elastic MapReduce.
Integrating Mahout into an Application
– integrate Mahout’s capabilities in your application.
Examples
- ASF Email Examples
– examples of recommenders, clustering, and classification, all using a
public-domain collection of 7 million emails.
Implementation Background
Requirements and Design
Matrix and Vector Needs
– requirements for Mahout vectors.
Collection (De-)Serialization
Collections and Algorithms
Learn more about mahout-collections, containers for efficient storage of
primitive-type data and open hash tables.
Learn more about the Algorithms discussed and employed by Mahout.
Learn more about the Mahout recommender implementation.
Utilities
This section describes tools that might be useful for working with Mahout.
Converting Content
– Mahout has some utilities for converting content such as logs to
formats more amenable to consumption by Mahout.
Creating Vectors
– Mahout’s algorithms operate on vectors. Learn how to generate
these from raw data.
Viewing Result
– how to visualize the results of your trained algorithms.
Data
Collections
– To try out and test Mahout’s algorithms you need training data. We are
always looking for new training data collections.
Benchmarks
Mahout Benchmarks
Committer’s Resources
- Testing
– Information on test plans and ideas for testing
Project Resources
Additional Resources
How To Edit This Wiki
This wiki is a collaborative site; anyone can contribute and share:
- Create an account by clicking the “Login” link at the top of any page
and picking a username and password.
- Edit any page by pressing Edit at the top of the page.
There are some conventions used on the Mahout wiki:
* {noformat}+*TODO:*+{noformat} (+*TODO:*+) is used to denote sections that definitely need to be cleaned up.
* {noformat}+*Mahout_(version)*+{noformat} (+*Mahout_0.2*+) is used to indicate the version of Mahout in which a feature was (or will be) added.