Apache SAMOA is a distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms.

Apache SAMOA enables development of new ML algorithms without directly dealing with the complexity of the underlying distributed stream processing engines (DSPEs, such as Apache Storm, Apache Flink, and Apache Samza). Apache SAMOA users can develop distributed streaming ML algorithms once and execute them on multiple DSPEs.

Build Apache SAMOA

Build Apache SAMOA for Apache Storm, Apache Flink, Apache Samza, or Local mode.

Getting started!

Hands-on with Apache SAMOA: Getting Started in 5 minutes!

Documentation

Learn the different ways to use Apache SAMOA.

Wiki

Roadmap and instructions for contributors.

News

Apache SAMOA was presented at Apache Big Data North America, 2016

Apache SAMOA was presented at Apache Big Data Europe, 2015

Slides

G. De Francisci Morales, A. Bifet. "SAMOA: Scalable Advanced Massive Online Analysis." Journal of Machine Learning Research, 16(Jan):149−153, 2015.

API Javadoc Reference

http://samoa.incubator.apache.org/documentation/api/current

Mailing list

Development mailing list dev@samoa.incubator.apache.org
[ subscribe | unsubscribe | archives ]

Contributors

List of contributors to the SAMOA project.

Build Apache SAMOA

Apache Storm

Go to the folder where you want to store the project and clone the repository:

~$git clone http://git.apache.org/incubator-samoa.git

~$cd incubator-samoa

~$mvn -Pstorm package

The deployable jar for Apache SAMOA will be in target/SAMOA-Storm-0.4.0-SNAPSHOT.jar.

Apache S4 (now decommissioned after 0.4.0)

If you want to compile Apache SAMOA for S4, you will need to install the S4 dependencies manually as explained in Executing Apache SAMOA with Apache S4.

~$git clone http://git.apache.org/incubator-samoa.git

~$cd incubator-samoa

~$mvn -Ps4 package

The deployable jar for Apache SAMOA will be in target/SAMOA-S4-0.3.0-SNAPSHOT.jar.

Apache Samza

Go to the folder where you want to store the project and clone the repository:

~$git clone http://git.apache.org/incubator-samoa.git

~$cd incubator-samoa

~$mvn -Psamza package

The deployable jar for Apache SAMOA will be in target/SAMOA-Samza-0.4.0-SNAPSHOT.jar.

Local Test Mode

If you want to test Apache SAMOA in a local environment, clone the repository and build Apache SAMOA without a profile flag.

~$git clone http://git.apache.org/incubator-samoa.git

~$cd incubator-samoa

~$mvn package

The deployable jar for Apache SAMOA will be in target/SAMOA-Local-0.4.0-SNAPSHOT.jar.
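
The build variants above differ only in the Maven profile flag and in the name of the resulting jar. As a rough summary, the sketch below maps a target name to the command and jar path listed on this page. The function names `samoa_build_cmd` and `samoa_jar_path` are hypothetical helpers, not part of Apache SAMOA, and Apache Flink is left out because this page does not show its build command.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with SAMOA): they only echo the
# commands and jar paths listed in the build instructions above.

# Print the Maven command for a target: storm, samza, s4, or local.
samoa_build_cmd() {
  case "$1" in
    storm|samza|s4) echo "mvn -P$1 package" ;;
    local)          echo "mvn package" ;;
    *) echo "unknown target: $1" >&2; return 1 ;;
  esac
}

# Print the jar path this page lists for the same target.
samoa_jar_path() {
  case "$1" in
    storm) echo "target/SAMOA-Storm-0.4.0-SNAPSHOT.jar" ;;
    samza) echo "target/SAMOA-Samza-0.4.0-SNAPSHOT.jar" ;;
    s4)    echo "target/SAMOA-S4-0.3.0-SNAPSHOT.jar" ;;
    local) echo "target/SAMOA-Local-0.4.0-SNAPSHOT.jar" ;;
    *) echo "unknown target: $1" >&2; return 1 ;;
  esac
}

# Show the command and expected jar for each target without running Maven.
for target in storm samza s4 local; do
  echo "$target: $(samoa_build_cmd "$target") -> $(samoa_jar_path "$target")"
done
```

The helpers only print text, so they can be used to check which jar to expect before starting a build inside the cloned incubator-samoa folder.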

Getting Started

Download Apache SAMOA

~$git clone http://git.apache.org/incubator-samoa.git

~$cd incubator-samoa

~$mvn package

Download the Forest CoverType dataset

~$wget "http://downloads.sourceforge.net/project/moa-datastream/Datasets/Classification/covtypeNorm.arff.zip"

~$unzip covtypeNorm.arff.zip

The Forest CoverType dataset contains the forest cover type for 30 x 30 meter cells obtained from the US Forest Service (USFS) Region 2 Resource Information System (RIS) data. It contains 581,012 instances and 54 attributes, and it has been used in several articles on data stream classification.
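
To sanity-check a download, the number of instances in an ARFF file can be counted from the shell. The sketch below is a minimal, hedged example: `count_arff_instances` is a hypothetical helper, and it assumes the usual ARFF layout in which instances follow an `@data` line, one per line, with `%` starting a comment. It is demonstrated on a toy file rather than on covtypeNorm.arff.

```shell
#!/bin/sh
# Hypothetical helper: count data rows in an ARFF file.
# Assumes one instance per line after the @data marker; blank lines and
# lines starting with % (comments) are skipped.
count_arff_instances() {
  awk '
    { sub(/\r$/, "") }                      # tolerate Windows line endings
    in_data && $0 !~ /^[ \t]*(%|$)/ { n++ } # count non-blank, non-comment rows
    tolower($1) == "@data" { in_data = 1 }  # rows after this line are instances
    END { print n + 0 }
  ' "$1"
}

# Toy ARFF file for illustration only (not the CoverType data).
sample=$(mktemp)
cat > "$sample" <<'EOF'
% toy file for illustration only
@relation toy
@attribute x numeric
@attribute class {a,b}
@data
0.1,a
0.2,b

% trailing comment
0.3,a
EOF

count_arff_instances "$sample"   # prints 3
rm -f "$sample"
```

Running `count_arff_instances covtypeNorm.arff` after unzipping should report the instance count of the downloaded file, provided the file follows the layout assumed above.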

Run an example

Classifying the CoverType dataset with the bagging algorithm

~$bin/samoa local target/SAMOA-Local-0.4.0-SNAPSHOT.jar "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (ArffFileStream -f covtypeNorm.arff) -f 100000"

The output will be a list of evaluation results, reported every 100,000 instances.
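
The task string passed to bin/samoa can be assembled from its parts, which makes it easier to try a different learner, input file, or reporting interval. The sketch below is a hedged illustration: `samoa_task` is a hypothetical helper, and the reading of the flags (`-l` for the learner, `-s` for the stream, the inner `-f` for the ARFF file, the outer `-f` for the reporting interval) is inferred from the example command above rather than taken from the SAMOA option reference.

```shell
#!/bin/sh
# Hypothetical helper: build the PrequentialEvaluation task string used in
# the example above. Arguments: learner class, ARFF file, reporting interval.
samoa_task() {
  learner=$1
  arff=$2
  freq=$3
  printf 'PrequentialEvaluation -l %s -s (ArffFileStream -f %s) -f %s\n' \
    "$learner" "$arff" "$freq"
}

# Reproduce the task string from the example command.
task=$(samoa_task classifiers.ensemble.Bagging covtypeNorm.arff 100000)
echo "$task"
```

The resulting string can then be passed to the launcher in the same way as before, for example `bin/samoa local target/SAMOA-Local-0.4.0-SNAPSHOT.jar "$task"`.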

Apache SAMOA is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

Apache and the Apache feather logo are trademarks of The Apache Software Foundation.

Apache SAMOA

Scalable Advanced Massive Online Analysis

Apache SAMOA is currently undergoing incubation at the Apache Software Foundation.
Latest source release: 0.4.0-incubating
View on GitHub.

Video

NoSQL matters Conference, Barcelona 2013.

Apache SAMOA Developer's Guide

License

The use and distribution terms for this software are covered by the Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0.html).
