MRUnit is a library designed to allow easy testing of Mapper and Reducer classes using existing tools such as JUnit. MRUnit provides mock implementations of OutputCollector and Reporter for use in calling Mapper.map() and Reducer.reduce(), as well as a set of "driver" classes that manage delivery of key/value pair inputs to tasks, and comparison of actual task outputs with expected outputs.

The primary advantage of MRUnit is that it allows you to test the outputs of individual maps and reduces, as well as the composition of the two, without needing to use the MiniMR cluster or start a real MapReduce job in Hadoop, both of which are time-consuming.

Using MRUnit

A MapDriver or ReduceDriver instance is created for each test, as well as a fresh instance of your Mapper or Reducer. The driver is configured with the input keys and values, and expected output keys and values.

The run() method executes the map or reduce and returns the outputs retrieved from the OutputCollector. The runTest() method executes the map or reduce, compares the actual outputs with the expected outputs, and throws a RuntimeException if they differ. When expecting multiple outputs, the test drivers require the actual outputs to arrive in the same order in which the expected outputs were configured (i.e., the order of calls to withOutput() or addOutput()).
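
The ordering rule can be illustrated without Hadoop on the classpath. The following is a minimal, self-contained sketch and not MRUnit's actual implementation: OrderedOutputCheck, matches(), and pair() are hypothetical names that model a driver comparing actual outputs against expected outputs position by position.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of MRUnit: models an order-sensitive
// comparison of expected and actual (key, value) outputs.
class OrderedOutputCheck {

  // Returns true only if both lists hold equal pairs in the same order.
  static <K, V> boolean matches(List<Map.Entry<K, V>> expected,
                                List<Map.Entry<K, V>> actual) {
    if (expected.size() != actual.size()) {
      return false;
    }
    for (int i = 0; i < expected.size(); i++) {
      if (!expected.get(i).equals(actual.get(i))) {
        return false;
      }
    }
    return true;
  }

  // Convenience factory for a (key, value) pair.
  static Map.Entry<String, Long> pair(String key, long value) {
    return new SimpleEntry<String, Long>(key, Long.valueOf(value));
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Long>> expected =
        Arrays.asList(pair("a", 1L), pair("b", 2L));
    List<Map.Entry<String, Long>> sameOrder =
        Arrays.asList(pair("a", 1L), pair("b", 2L));
    List<Map.Entry<String, Long>> swapped =
        Arrays.asList(pair("b", 2L), pair("a", 1L));

    System.out.println(matches(expected, sameOrder)); // prints true
    System.out.println(matches(expected, swapped));   // prints false
  }
}
```

In this sketch the swapped list contains the same pairs as the expected list but still fails the check, which mirrors why the order of withOutput() or addOutput() calls matters when a task emits more than one pair.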

Example

package org.apache.hadoop.mrunit;

import junit.framework.TestCase;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.junit.Before;
import org.junit.Test;

public class TestExample extends TestCase {

  private Mapper<Text, Text, Text, Text> mapper;
  private MapDriver<Text, Text, Text, Text> driver;

  @Before
  public void setUp() {
    // Create a fresh mapper and driver for every test.
    mapper = new IdentityMapper<Text, Text>();
    driver = new MapDriver<Text, Text, Text, Text>(mapper);
  }

  @Test
  public void testIdentityMapper() {
    // The identity mapper should emit its input pair unchanged.
    driver.withInput(new Text("foo"), new Text("bar"))
          .withOutput(new Text("foo"), new Text("bar"))
          .runTest();
  }
}

This test first instantiates the Mapper and MapDriver. It configures an input (key, value) pair consisting of the strings "foo" and "bar", and expects these same values as output. It then calls runTest() to actually invoke the mapper, and compare the actual and expected outputs. The runTest() method will throw a RuntimeException if the output is not what it expects, which causes JUnit to mark the test case as failed.

All with*() methods in MRUnit return a reference to this to allow them to be easily chained (e.g., driver.withInput(a, b).withOutput(c, d).withOutput(d, e)...). These methods are analogous to the more conventional setInput(), addOutput(), etc. methods, which are also included.
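
The relationship between the two method styles can be sketched in a few lines. This is an illustration under stated assumptions, not MRUnit source code: ChainingDriverSketch and its describe() method are hypothetical, and the class uses plain strings in place of Hadoop key and value types so that it runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a test driver, not an MRUnit class. It shows how
// a with*() method can delegate to a conventional setter and return `this`.
class ChainingDriverSketch {
  private String inputKey;
  private String inputValue;
  private final List<String> expectedOutputs = new ArrayList<String>();

  // Conventional style: returns nothing, so calls cannot be chained.
  void setInput(String key, String value) {
    this.inputKey = key;
    this.inputValue = value;
  }

  void addOutput(String key, String value) {
    expectedOutputs.add(key + "=" + value);
  }

  // Fluent style: same effect, but returns this driver for chaining.
  ChainingDriverSketch withInput(String key, String value) {
    setInput(key, value);
    return this;
  }

  ChainingDriverSketch withOutput(String key, String value) {
    addOutput(key, value);
    return this;
  }

  // Summarizes the configured input and the expected outputs in order.
  String describe() {
    return inputKey + "=" + inputValue + " -> " + expectedOutputs;
  }

  public static void main(String[] args) {
    // Fluent configuration in a single expression.
    ChainingDriverSketch fluent = new ChainingDriverSketch()
        .withInput("foo", "bar")
        .withOutput("foo", "bar")
        .withOutput("baz", "qux");

    // Equivalent configuration using the conventional methods.
    ChainingDriverSketch plain = new ChainingDriverSketch();
    plain.setInput("foo", "bar");
    plain.addOutput("foo", "bar");
    plain.addOutput("baz", "qux");

    System.out.println(fluent.describe().equals(plain.describe())); // prints true
  }
}
```

Both configurations end in the same state; the fluent form simply lets a whole test be written as one expression ending in a call such as runTest().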

Further examples of MRUnit usage can be found in its own test/ directory. The example above is in org.apache.hadoop.mrunit.TestExample. Additional tests built on the IdentityMapper verify the correctness of MRUnit itself: org.apache.hadoop.mrunit.TestMapDriver includes several correctness tests for the MapDriver class, and its testRunTest*() methods show how to apply the MapDriver to the IdentityMapper to confirm behavior for both correct and incorrect input/output data sets. The testRunTest*() methods in org.apache.hadoop.mrunit.TestReduceDriver show how to apply the ReduceDriver to the LongSumReducer class.