ApacheCon US 2008 Session

Programming Hadoop Map-Reduce

Apache Hadoop is a software framework for running applications on large clusters built of commodity hardware. Hadoop provides a distributed file system and a parallel processing framework based on the Map-Reduce programming paradigm. Hadoop Map-Reduce is a framework that simplifies writing efficient data-intensive applications (processing hundreds of terabytes) running on large clusters (from one to thousands of computers). This talk will describe how to use Hadoop Map-Reduce to write (and debug!) efficient, scalable applications that can process large amounts of data. It will include discussions of the Java, C++, and Unix text filter interfaces to Hadoop Map-Reduce. It will also present a brief discussion of how to use Hadoop Map-Reduce via a simple query language called Pig (http://incubator.apache.org/pig/).
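To give a flavor of the Unix text filter interface mentioned above (Hadoop Streaming, where the mapper and reducer are ordinary programs reading stdin and writing tab-separated key/value lines), here is a minimal word-count sketch in Python. The function names and the local "shuffle" simulation are illustrative assumptions, not part of Hadoop itself; on a real cluster, Hadoop sorts the mapper output by key before handing it to the reducer.

```python
from itertools import groupby

def map_words(lines):
    """Mapper: emit one '<word>\t1' record per word, the format
    Hadoop Streaming expects on stdout (key, tab, value)."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reduce_counts(lines):
    """Reducer: records arrive sorted by key, so all counts for a
    word are adjacent; sum them and emit '<word>\t<total>'."""
    parsed = (line.rsplit("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

if __name__ == "__main__":
    # Simulate the framework's shuffle phase locally:
    # map, sort by key, then reduce.
    text = ["the quick brown fox", "the lazy dog"]
    shuffled = sorted(map_words(text))
    for record in reduce_counts(shuffled):
        print(record)
```

On a cluster these two functions would be split into separate mapper and reducer scripts and submitted with the streaming jar; the point of the sketch is that each side is just a line-oriented text filter.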