Welcome to Apache Hadoop!

What Is Hadoop?
Who Uses Hadoop?
News

What Is Hadoop?

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. Hadoop includes these subprojects:

Hadoop Common: The common utilities that support the other Hadoop subprojects.
Avro: A data serialization system that provides dynamic integration with scripting languages.
Chukwa: A data collection system for managing large distributed systems.
HBase: A scalable, distributed database that supports structured data storage for large tables.
HDFS: A distributed file system that provides high-throughput access to application data.
Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
MapReduce: A software framework for distributed processing of large data sets on compute clusters.
Pig: A high-level data-flow language and execution framework for parallel computation.
ZooKeeper: A high-performance coordination service for distributed applications.
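
As a rough illustration of the programming model behind the MapReduce subproject listed above, the following sketch counts words in a single Python process. It does not use Hadoop or any Hadoop API. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are assumptions made for this example, and the real framework would distribute each step across a compute cluster.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key. In a real cluster the framework performs this
    step between map and reduce; here it is simulated in memory."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

Because each map call looks at one record and each reduce call looks at one key, both steps can in principle run in parallel on different machines, which is the property the framework relies on for large data sets.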

Who Uses Hadoop?

A wide variety of companies and organizations use Hadoop for both research and production. Users are encouraged to add themselves to the Hadoop PoweredBy wiki page.

News

July 2009 - New Hadoop Subprojects

Hadoop is getting bigger!

Hadoop Core has been renamed Hadoop Common.
MapReduce and the Hadoop Distributed File System (HDFS) are now separate subprojects.
Avro and Chukwa are new Hadoop subprojects.

See the summary descriptions for all subprojects above. Visit the individual sites for more detailed information.

March 2009 - ApacheCon EU

In case you missed it: ApacheCon Europe 2009.

November 2008 - ApacheCon US

In case you missed it: ApacheCon US 2008.

July 2008 - Hadoop Wins Terabyte Sort Benchmark

One of Yahoo's Hadoop clusters sorted 1 terabyte of data in 209 seconds, beating the previous record of 297 seconds in the annual general-purpose (Daytona) terabyte sort benchmark. This is the first time that either a Java program or an open-source program has won.