Hama - a Bulk Synchronous Parallel computing framework on top of Hadoop

Apache Hama

Apache Hama is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms.

Why Hama and BSP?

Many practical data processing applications today need a more flexible programming abstraction that still runs on highly scalable, massive data systems (e.g., HDFS, HBase). A message-passing paradigm beyond the MapReduce framework would give such applications more flexibility in how they communicate. The Bulk Synchronous Parallel (BSP) model fits the bill. Some of its significant advantages over MapReduce and MPI are:

Supports a message-passing style of application development
Provides a small, flexible, and easy-to-use API
Can perform better than MPI for communication-intensive applications
Guarantees that the communication mechanisms are free of deadlocks and collisions
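
To make the BSP model concrete, here is a minimal single-process sketch of its superstep cycle: local computation, message passing, then a barrier. It is a toy simulation written for illustration only, not Hama's API; the names run_bsp, make_max_compute, and the send callback are hypothetical. Each peer sees only the messages sent during the previous superstep, and sends are buffered rather than blocking, which is the property behind the deadlock-free claim above.

```python
from collections import defaultdict


def run_bsp(num_peers, compute, max_supersteps=10):
    """Toy BSP driver (hypothetical, not Hama's API).

    Repeats: local compute -> buffered sends -> barrier, until no peer
    asks to continue or max_supersteps is reached.
    """
    inboxes = {p: [] for p in range(num_peers)}
    state = {p: None for p in range(num_peers)}

    for step in range(max_supersteps):
        outboxes = defaultdict(list)
        active = False

        for peer in range(num_peers):
            # Sends only append to a buffer; they never block on the receiver.
            def send(dest, msg, _out=outboxes):
                _out[dest].append(msg)

            # A peer reads only messages delivered at the previous barrier.
            state[peer], keep_going = compute(
                peer, step, state[peer], inboxes[peer], send
            )
            active = active or keep_going

        # Barrier: all buffered messages become visible at the same time.
        inboxes = {p: outboxes.get(p, []) for p in range(num_peers)}

        if not active:
            break

    return state


def make_max_compute(values, num_peers):
    """Build a compute function that finds the global maximum in two supersteps."""
    def compute(peer, step, state, inbox, send):
        if step == 0:
            # Superstep 0: broadcast the local value to every peer.
            for dest in range(num_peers):
                send(dest, values[peer])
            return values[peer], True
        # Superstep 1: every peer now holds all values; take the maximum.
        return max(inbox), False

    return compute


values = [3, 17, 8, 5]
result = run_bsp(len(values), make_max_compute(values, len(values)))
print(result)
```

In a real framework the peers would run on separate machines and the barrier would be a distributed synchronization step; the sketch keeps everything in one loop so the ordering of compute, send, and barrier is easy to follow.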

Getting Started

Start by installing Hama on a Hadoop cluster.

Download Hama from the release page.
Getting Started with Hama.
Launch a Hama cluster in the cloud.
Hama BSP Tutorial.
Hama Graph Tutorial.
Learn about Hama and BSP by reading the documentation.

Getting Involved

Hama is an open source volunteer project under the Apache Software Foundation. We encourage you to learn about the project and contribute your expertise. Here are some starter links:

See our How to Contribute to Hama page
Jira usage guidelines
