Introduction

Apache Ambari is a web-based tool for installing, managing, and monitoring Apache Hadoop clusters. The set of Hadoop components that are currently supported by Ambari includes:

Ambari's primary audience is system administrators responsible for managing Hadoop clusters.

Ambari allows them to:

  • Easily Install a Hadoop Cluster
    • Ambari provides an easy-to-use, step-by-step wizard for installing Hadoop services across any number of nodes.
    • Ambari leverages Puppet to perform installation and configuration of Hadoop services for the cluster.
  • Manage a Hadoop Cluster
    • Ambari provides central management for starting, stopping, and reconfiguring Hadoop services across the entire cluster.
  • Monitor a Hadoop Cluster
    • Ambari provides a dashboard for monitoring the health and status of the Hadoop cluster. Ambari leverages Ganglia to collect system metrics.
    • Ambari sends email alerts when your attention is needed (e.g., a node goes down or remaining disk space is low). Ambari leverages Nagios to monitor the cluster and trigger alerts.

In the near future, Ambari will allow third-party tool developers to integrate Hadoop cluster management and monitoring capabilities via its RESTful interface.

Roadmap

  • Support for Hadoop Security
  • Support for various operating systems
    • Ambari currently supports 64-bit RHEL/CentOS 5.* and 6.*
    • Support for other operating systems is being worked on (SLES 11.* support will be coming soon)
  • RESTful API for third-party integration
    • Ambari will expose a unified, RESTful API to enable third-party applications to integrate Hadoop cluster management and monitoring capabilities. This is an area of active development. We will publish the API docs soon.
  • Granular configurations
    • Ambari currently applies configurations at the cluster level. To allow for more flexibility, Ambari will support applying configurations in a more granular manner (e.g., applying a set of configurations to a specific group of nodes).
  • Security
    • Easy installation of secure Hadoop clusters (Kerberos-based)
    • Role-based user authentication, authorization, and auditing
    • Support for LDAP and Active Directory
  • Visualization
    • Interactive visualization of current and historical states of the cluster for a number of key metrics
    • Interactive visualization of Pig, Hive, and MapReduce jobs
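
As a rough illustration of the "Granular configurations" roadmap item above, the sketch below layers per-group overrides on top of cluster-level settings. It does not reflect Ambari's actual design or API: the group name, the dictionaries, and the effective_config helper are all hypothetical, and the two HDFS property names are used only as sample keys.

```python
from collections import ChainMap

# Cluster-level settings applied to every node (sample values only).
cluster_config = {
    "dfs.replication": "3",
    "dfs.datanode.du.reserved": "1073741824",  # 1 GiB
}

# Hypothetical overrides for a named group of nodes.
group_overrides = {
    "large-disk-nodes": {"dfs.datanode.du.reserved": "10737418240"},  # 10 GiB
}


def effective_config(group=None):
    """Return the settings for a node; group overrides win over cluster values."""
    overrides = group_overrides.get(group, {})
    # ChainMap looks keys up left to right, so overrides shadow cluster values.
    return dict(ChainMap(overrides, cluster_config))


if __name__ == "__main__":
    print(effective_config("large-disk-nodes"))
    print(effective_config())
```

A node that belongs to no group, or to a group with no overrides, simply receives the cluster-level values unchanged.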