ApacheCon US 2008 Session

Introduction to Hadoop

Hadoop is an Apache project that provides a framework for running applications on large clusters of commodity hardware. The framework transparently provides applications with both reliability and data motion: work is automatically re-executed when nodes fail, and computation is scheduled close to the nodes where the data resides. Hadoop implements a distributed file system (HDFS), modeled on the Google File System (GFS), together with the MapReduce programming model. This session covers the motivation and approach behind Hadoop, its components and architecture, and the tools and applications built on top of it.
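
To give a flavor of the MapReduce programming model described above, here is a minimal sketch of the canonical word-count job written against the classic org.apache.hadoop.mapred API of the 2008-era Hadoop releases. The class name WordCount and the inner classes Map and Reduce are illustrative choices, not part of the session material; the mapper emits a (word, 1) pair for each token, and the reducer sums the counts per word.

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class WordCount {

      // Mapper: emits (word, 1) for every token in each input line.
      public static class Map extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, one);
          }
        }
      }

      // Reducer: sums the per-word counts emitted by the mappers.
      public static class Reduce extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
          int sum = 0;
          while (values.hasNext()) {
            sum += values.next().get();
          }
          output.collect(key, new IntWritable(sum));
        }
      }

      // Driver: configures the job and submits it to the cluster.
      public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
    }

The framework handles the rest: it splits the input files stored in HDFS across mappers, shuffles and sorts the intermediate pairs so that all counts for a given word reach the same reducer, and re-runs any task whose node fails.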