=== Mahout Status Report: May 2010 ===

(This is the first report from Mahout as a top-level Apache project;
previously it was a subproject of Apache Lucene. Mahout recently reported
status with Lucene's special April report. We take the opportunity to
summarize Mahout state and restate recent activity.)

ISSUES

There are no issues requiring board attention at this time.

OVERVIEW

Mahout's goal is to build scalable implementations of machine learning and
data mining algorithms. "Scalable" means designed with exceptional scale in
mind, for efficiency and low memory consumption, and in many cases means
providing Hadoop-based implementations. The "machine learning" implemented
to date has been primarily in the broad areas of:

- Collaborative filtering / recommender engines
- Clustering
- Classification
- Frequent item set mining
- Evolutionary algorithms

CURRENT ACTIVITY

Mahout has created a release approximately every six months, most recently
releasing version 0.3 in March 2010. The project remains in a state of rapid
change and evolution, and looks to release 0.4 in September 2010.

Recent activity in the project can be viewed here:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?pid=12310751&fixfor=12314396&resolution=1

This month, Mahout will complete migration of website, mailing lists, SVN,
and other information to reflect its status as a top-level project.

GOOGLE SUMMER OF CODE

Mahout will mentor five projects as part of Google's Summer of Code program.
The projects will add or enhance capability in the specific areas of:

- Boltzmann Machines
- Support Vector Machines
- Singular Value Decomposition for recommendations
- Neural network with back propagation learning
- Eigencuts spectral clustering

MAHOUT IN ACTION

The book "Mahout in Action", published by Manning, continues to be written
and is approximately half complete. It has received some favorable feedback
via Manning's early access program.