Chapter 7. HBase and MapReduce

Table of Contents

7.1. Map-Task Splitting
7.1.1. The Default HBase MapReduce Splitter
7.1.2. Custom Splitters
7.2. HBase MapReduce Examples
7.2.1. HBase MapReduce Read Example
7.2.2. HBase MapReduce Read/Write Example
7.2.3. HBase MapReduce Read/Write Example With Multi-Table Output
7.2.4. HBase MapReduce Summary to HBase Example
7.2.5. HBase MapReduce Summary to File Example
7.2.6. HBase MapReduce Summary to HBase Without Reducer
7.2.7. HBase MapReduce Summary to RDBMS
7.3. Accessing Other HBase Tables in a MapReduce Job
7.4. Speculative Execution

Start with the HBase and MapReduce documentation in the javadocs. The sections below offer some additional help.

For more information about MapReduce (i.e., the framework in general), see the Hadoop MapReduce Tutorial.

7.1. Map-Task Splitting

7.1.1. The Default HBase MapReduce Splitter

When TableInputFormat is used to source an HBase table in a MapReduce job, its splitter creates one map task for each region of the table. Thus, if the table has 100 regions, the job will have 100 map tasks, regardless of how many column families are selected in the Scan.
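The region-to-task relationship can be illustrated with a small plain-Java model. This is only a sketch: RegionSplitModel, KeyRange, regionsFrom and planSplits are hypothetical names invented for illustration, row keys are simplified to strings, and nothing here calls the HBase API. It simply shows that the number of splits follows the number of regions and not the column families requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of one-split-per-region planning; not part of the HBase API. */
final class RegionSplitModel {

    /** A region's row-key range [startKey, endKey); "" stands for an open end. */
    static final class KeyRange {
        final String startKey;
        final String endKey;

        KeyRange(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + endKey + ")";
        }
    }

    /** Builds contiguous regions from sorted boundary keys (n boundaries give n + 1 regions). */
    static List<KeyRange> regionsFrom(List<String> boundaries) {
        List<KeyRange> regions = new ArrayList<>();
        String start = "";
        for (String boundary : boundaries) {
            regions.add(new KeyRange(start, boundary));
            start = boundary;
        }
        regions.add(new KeyRange(start, ""));
        return regions;
    }

    /** Plans one split per region; the selected families are deliberately ignored. */
    static List<KeyRange> planSplits(List<KeyRange> regions, List<String> selectedFamilies) {
        return new ArrayList<>(regions);
    }

    public static void main(String[] args) {
        List<KeyRange> regions = regionsFrom(Arrays.asList("g", "n", "t"));
        int withOneFamily = planSplits(regions, Arrays.asList("cf1")).size();
        int withThreeFamilies = planSplits(regions, Arrays.asList("cf1", "cf2", "cf3")).size();
        System.out.println(regions.size() + " regions: " + withOneFamily
                + " map tasks with 1 family, " + withThreeFamilies + " with 3 families");
    }
}
```

In this model, three boundary keys produce four regions and therefore four map tasks, whichever families are selected.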

7.1.2. Custom Splitters

For those interested in implementing custom splitters, see the getSplits method in TableInputFormatBase, which contains the logic for map-task assignment.
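As a rough idea of what a custom policy might do, the plain-Java sketch below merges adjacent regions so that one map task covers several of them. It is an assumption-laden model, not HBase code: MergingSplitterModel and mergeAdjacent are hypothetical names, row keys are simplified to strings, and in a real job comparable logic would belong in an override of getSplits that builds actual input splits.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a custom split policy; not part of the HBase API. */
final class MergingSplitterModel {

    /**
     * Merges every groupSize adjacent region ranges into one split.
     * Each range is {startKey, endKey}; the input must be sorted and contiguous.
     */
    static List<String[]> mergeAdjacent(List<String[]> regions, int groupSize) {
        if (groupSize < 1) {
            throw new IllegalArgumentException("groupSize must be >= 1");
        }
        List<String[]> splits = new ArrayList<>();
        for (int i = 0; i < regions.size(); i += groupSize) {
            // Index of the last region in this group (the final group may be short).
            int last = Math.min(i + groupSize, regions.size()) - 1;
            splits.add(new String[] { regions.get(i)[0], regions.get(last)[1] });
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String[]> regions = new ArrayList<>();
        regions.add(new String[] { "", "g" });
        regions.add(new String[] { "g", "n" });
        regions.add(new String[] { "n", "t" });
        regions.add(new String[] { "t", "" });

        // Four regions grouped in pairs yield two splits.
        for (String[] split : mergeAdjacent(regions, 2)) {
            System.out.println("[" + split[0] + ", " + split[1] + ")");
        }
    }
}
```

Merging trades parallelism for fewer, longer-running tasks; a policy that subdivides regions would go the other way.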
