Apache Hama

Apache Hama is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms.
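To show how a graph algorithm maps onto BSP, here is a small sketch that labels connected components by propagating the smallest vertex id. It is plain Python, not Hama code. The function name `connected_components_bsp` is hypothetical. The supersteps run sequentially in one process, and the barrier is modeled by delivering each superstep's messages only after every vertex has finished sending.

```python
from collections import defaultdict


def connected_components_bsp(edges, num_vertices):
    """Label every vertex with the smallest vertex id in its component.

    Hypothetical sketch: supersteps are simulated sequentially. Messages
    produced in one superstep are read only in the next, which stands in
    for the BSP barrier.
    """
    # Undirected adjacency list.
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    labels = list(range(num_vertices))
    active = set(range(num_vertices))  # vertices whose label changed last superstep
    supersteps = 0
    while active:
        # Computation + communication: each active vertex sends its label to its neighbors.
        outbox = defaultdict(list)
        for v in active:
            for n in neighbors[v]:
                outbox[n].append(labels[v])
        # After the barrier: each vertex adopts the smallest label it received.
        active = set()
        for v, msgs in outbox.items():
            smallest = min(msgs)
            if smallest < labels[v]:
                labels[v] = smallest
                active.add(v)
        supersteps += 1
    return labels, supersteps
```

With edges (0, 1), (1, 2), (3, 4) on six vertices, the function returns labels [0, 0, 0, 3, 3, 5] after three supersteps. Vertex 5 is isolated and keeps its own id. The computation stops when a superstep produces no label changes.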

Recent News

  • June 2012: release 0.5.0 available
  • May 17, 2012: Apache Hama graduated as a Top Level Project!
  • February 5, 2012: release 0.4.0 available
  • July 28, 2011: release 0.3.0 available
  • June 2, 2011: release 0.2.0 available
  • April 30, 2010: Introduced in BSP Worldwide

Why Hama and BSP?

Today, many practical data processing applications require a more flexible programming abstraction model that can run on highly scalable, massive data systems (e.g., HDFS and HBase). A message-passing paradigm that goes beyond the MapReduce framework would make communication between tasks more flexible. The Bulk Synchronous Parallel (BSP) model fits this need well. Some of its significant advantages over MapReduce and MPI are:

  • Supports a message-passing style of application development
  • Provides a small, flexible, simple, and easy-to-use API
  • Can perform better than MPI for communication-intensive applications
  • Guarantees that deadlocks and collisions cannot occur in the communication mechanism
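The core BSP cycle consists of local computation, then message sending, then a barrier after which messages become visible. It can be sketched with Python threads. All names here (`Runtime`, `Peer`, `run_bsp`, `distributed_sum`) are hypothetical. This is not Hama's Java API, only a toy model of the send-then-sync pattern. Each sync waits on the barrier twice so that one superstep's messages never mix with the next.

```python
import threading


class Runtime:
    """Shared state for one hypothetical BSP job (not Hama's API)."""

    def __init__(self, num_peers):
        self.lock = threading.Lock()
        self.barrier = threading.Barrier(num_peers)
        # pending[i] holds messages addressed to peer i for the next superstep.
        self.pending = [[] for _ in range(num_peers)]


class Peer:
    """A minimal, hypothetical stand-in for a BSP peer: send(), sync(), messages()."""

    def __init__(self, index, runtime):
        self.index = index
        self._runtime = runtime
        self._inbox = []

    def send(self, dest, msg):
        # Buffer the message; the receiver only sees it after the next sync().
        with self._runtime.lock:
            self._runtime.pending[dest].append(msg)

    def sync(self):
        # Barrier 1: every peer has finished computing and sending.
        self._runtime.barrier.wait()
        # Collect the messages addressed to this peer.
        with self._runtime.lock:
            self._inbox = self._runtime.pending[self.index]
            self._runtime.pending[self.index] = []
        # Barrier 2: no peer starts the next superstep (and sends again)
        # until every peer has collected its messages.
        self._runtime.barrier.wait()

    def messages(self):
        return list(self._inbox)


def run_bsp(num_peers, program):
    """Run program(peer) on num_peers threads and return their results."""
    runtime = Runtime(num_peers)
    results = [None] * num_peers

    def worker(i):
        results[i] = program(Peer(i, runtime))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_peers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def distributed_sum(data, num_peers):
    def program(peer):
        # Superstep 1: sum this peer's slice locally, then send it to peer 0.
        partial = sum(data[peer.index::num_peers])
        peer.send(0, partial)
        peer.sync()
        # Superstep 2: peer 0 combines the partial sums it received.
        if peer.index == 0:
            return sum(peer.messages())
        return None

    return run_bsp(num_peers, program)[0]
```

For example, `distributed_sum(list(range(1, 101)), 4)` returns 5050. In superstep 1, each of the four peers sums every fourth element. In superstep 2, peer 0 adds the four partial sums. Because all communication happens at the barrier, no peer blocks waiting on an individual message from another peer.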

Getting Started

Start by installing Hama on a Hadoop cluster.

Getting Involved

Hama is an open source volunteer project under the Apache Software Foundation. We encourage you to learn about the project and contribute your expertise. Here are some starter links: