Welcome to Hadoop!

Hadoop is a software platform that lets you easily write and run applications that process vast amounts of data.

Here's what makes Hadoop especially useful:

Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS) (see the figure below). MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data on the nodes where it is stored.

Hadoop has been demonstrated on clusters of 2,000 nodes. The current design target is 10,000-node clusters.

Hadoop is a Lucene sub-project containing the distributed computing platform that was formerly part of Nutch.

For more information about Hadoop, please see the Hadoop wiki.

[Figure: Hadoop architecture]
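
To make the MapReduce model above concrete, here is a minimal sketch of the canonical word-count job in Java. It is written against the org.apache.hadoop.mapreduce API; exact class and method names vary between Hadoop releases, so treat it as an outline rather than a definitive program.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map step: runs on the nodes holding the input blocks and
      // emits a (word, 1) pair for every word it sees.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce step: sums the counts collected for each word
      // across all map outputs.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // pre-aggregate map output locally
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Packaged into a jar, such a job is submitted with the bin/hadoop jar command; the framework then schedules each map task on a node holding a replica of its input block, which is the data-local processing described above.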

Getting Started

The Hadoop project plans to scale Hadoop up to thousands of computers. However, to begin with, you can install it on a single machine or a very small cluster.

  1. Learn about Hadoop by reading the documentation.
  2. Download Hadoop from the release page.
  3. Follow the Hadoop Quickstart to run it on a single machine.
  4. Follow the Hadoop Cluster Setup guide to run it on multiple machines.
  5. Discuss it on the mailing list.

Getting Involved

Hadoop is an open source volunteer project under the Apache Software Foundation. We encourage you to learn about the project and contribute your expertise. Here are some starter links:

  1. See our How to Contribute to Hadoop page.
  2. Give us feedback: What can we do better?
  3. Join the mailing list: Meet the community.