Hadoop 0.23.5 Release Notes
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
Changes since Hadoop 0.23.4
- YARN-216.
Major improvement reported by Todd Lipcon and fixed by Robert Joseph Evans
Remove jquery theming support
As of today we have 9.4MB of JQuery themes in our code tree. In addition to being a waste of space, it's a highly questionable feature. I've never heard anyone complain that the Hadoop interface isn't themeable enough, and there's far more value in consistency across installations than there is in themeability. Let's rip it out.
- YARN-214.
Major bug reported by Jason Lowe and fixed by Jonathan Eagles (resourcemanager)
RMContainerImpl does not handle event EXPIRE at state RUNNING
RMContainerImpl has a race condition where a container can enter the RUNNING state just as the container expires. This results in an invalid event transition error:
{noformat}
2012-11-11 05:31:38,954 [ResourceManager Event Processor] ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: EXPIRE at RUNNING
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:205)
at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:44)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApp.containerCompleted(SchedulerApp.java:203)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1337)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:739)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:659)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:80)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:340)
at java.lang.Thread.run(Thread.java:619)
{noformat}
EXPIRE needs to be handled (or at least ignored) in the RUNNING state to account for this race condition.
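The fix direction can be sketched with a toy transition table (illustrative Java only; this is not the actual org.apache.hadoop.yarn.state.StateMachineFactory API): a (RUNNING, EXPIRE) entry is added so the late EXPIRE event is tolerated instead of throwing an invalid-transition error.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal state-machine sketch: an explicit (RUNNING, EXPIRE) -> RUNNING entry
// means a container that expires just after entering RUNNING is ignored
// instead of triggering an InvalidStateTransitonException-style failure.
public class ContainerStateSketch {
    enum State { ALLOCATED, RUNNING, COMPLETED }
    enum Event { START, EXPIRE, FINISH }

    private static final Map<String, State> TRANSITIONS = new HashMap<>();
    static {
        TRANSITIONS.put(key(State.ALLOCATED, Event.START), State.RUNNING);
        TRANSITIONS.put(key(State.ALLOCATED, Event.EXPIRE), State.COMPLETED);
        TRANSITIONS.put(key(State.RUNNING, Event.FINISH), State.COMPLETED);
        // The fix: tolerate EXPIRE arriving just after the container started.
        TRANSITIONS.put(key(State.RUNNING, Event.EXPIRE), State.RUNNING);
    }

    private static String key(State s, Event e) { return s + "/" + e; }

    static State handle(State current, Event event) {
        State next = TRANSITIONS.get(key(current, event));
        if (next == null) {
            throw new IllegalStateException(
                "Invalid event: " + event + " at " + current);
        }
        return next;
    }

    public static void main(String[] args) {
        State s = handle(State.ALLOCATED, Event.START);
        s = handle(s, Event.EXPIRE);  // ignored: still RUNNING, no error
        System.out.println(s);
    }
}
```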
- YARN-212.
Blocker bug reported by Nathan Roberts and fixed by Nathan Roberts (nodemanager)
NM state machine ignores an APPLICATION_CONTAINER_FINISHED event when it shouldn't
The NM state machines can make the following two invalid state transitions when a speculative attempt is killed shortly after it gets started. When this happens the NM keeps the log aggregation context open for this application and therefore chews up FDs and leases on the NN, eventually running the NN out of FDs and bringing down the entire cluster.
2012-11-07 05:36:33,774 [AsyncDispatcher event handler] WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_CONTAINER_FINISHED at INITING
2012-11-07 05:36:33,775 [AsyncDispatcher event handler] WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [INIT_CONTAINER]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: INIT_CONTAINER at DONE
- YARN-206.
Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
TestApplicationCleanup.testContainerCleanup occasionally fails
testContainerCleanup is occasionally failing with the error:
testContainerCleanup(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup): expected:<2> but was:<1>
- YARN-202.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee
Log Aggregation generates a storm of fsync() for namenode
When log aggregation is on, each write to an aggregated container log causes hflush() to be called. For large clusters, this can create a large number of fsync() calls on the namenode.
We have seen a 6-7x increase in the average number of fsync operations compared to 1.0.x on a large busy cluster. Over 99% of fsync ops were for log aggregation writing to tmp files.
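One mitigation direction is to flush periodically instead of after every write; the sketch below is illustrative only (the BatchedFlushStream class and BATCH size are hypothetical, not the actual patch), with flush() standing in for hflush().

```java
import java.io.IOException;
import java.io.OutputStream;

// Hedged sketch: batch flushes so O(writes) fsync() calls on the namenode
// become O(writes / BATCH). flush() here stands in for HDFS hflush().
public class BatchedFlushStream {
    private static final int BATCH = 256;  // illustrative batch size
    private final OutputStream out;
    private int pendingWrites = 0;

    public BatchedFlushStream(OutputStream out) { this.out = out; }

    public void write(byte[] data) throws IOException {
        out.write(data);
        if (++pendingWrites >= BATCH) {
            out.flush();        // one flush per batch, not per write
            pendingWrites = 0;
        }
    }

    public void close() throws IOException {
        out.flush();            // final flush so no data is left unsynced
        out.close();
    }
}
```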
- YARN-201.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (capacityscheduler)
CapacityScheduler can take a very long time to schedule containers if requests are off cluster
When a user runs a job where one of the input files is a large file on another cluster, the job can create many splits on nodes which are unreachable for computation from the current cluster. The off-switch delay logic in LeafQueue can then cause the ResourceManager to allocate containers for the job very slowly. In one case the job was only getting one container every 23 seconds, even though the queue had plenty of spare capacity.
- YARN-189.
Blocker bug reported by Thomas Graves and fixed by Thomas Graves (resourcemanager)
deadlock in RM - AMResponse object
we ran into a deadlock in the RM.
=============================
"1128743461@qtp-1252749669-5201":
waiting for ownable synchronizer 0x00002aabbc87b960, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
which is held by "AsyncDispatcher event handler"
"AsyncDispatcher event handler":
waiting to lock monitor 0x00002ab0bba3a370 (object 0x00002aab3d4cd698, a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl),
which is held by "IPC Server handler 36 on 8030"
"IPC Server handler 36 on 8030":
waiting for ownable synchronizer 0x00002aabbc87b960, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
which is held by "AsyncDispatcher event handler"
Java stack information for the threads listed above:
===================================================
"1128743461@qtp-1252749669-5201":
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00002aabbc87b960> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getFinalApplicationStatus(RMAppAttemptImpl.java:295)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getFinalApplicationStatus(RMAppImpl.java:222)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:328)
at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaM
...
...
..
"AsyncDispatcher event handler":
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.unregisterAttempt(ApplicationMasterService.java:307)
- waiting to lock <0x00002aab3d4cd698> (a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$BaseFinalTransition.transition(RMAppAttemptImpl.java:647)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:809)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:796)
at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
- locked <0x00002aabbb673090> (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:478)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:81)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:436)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:417)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:619)
"IPC Server handler 36 on 8030":
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00002aabbc87b960> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.pullJustFinishedContainers(RMAppAttemptImpl.java:437)
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:285)
- locked <0x00002aab3d4cd698> (a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
at org.apache.hadoop.yarn.api.impl.pb.service.AMRMProtocolPBServiceImpl.allocate(AMRMProtocolPBServiceImpl.java:56)
at org.apache.hadoop.yarn.proto.AMRMProtocol$AMRMProtocolService$2.callBlockingMethod(AMRMProtocol.java:87)
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:353)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1528)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1524)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1522)
- YARN-188.
Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov (capacityscheduler)
Coverage fixing for CapacityScheduler
Added some tests for CapacityScheduler.
YARN-188-branch-0.23.patch patch for branch 0.23
YARN-188-branch-2.patch patch for branch 2
YARN-188-trunk.patch patch for trunk
- YARN-186.
Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov (resourcemanager , scheduler)
Coverage fixing LinuxContainerExecutor
Added some tests for LinuxContainerExecutor
YARN-186-branch-0.23.patch patch for branch-0.23
YARN-186-branch-2.patch patch for branch-2
YARN-186-trunk.patch patch for trunk
- YARN-180.
Critical bug reported by Thomas Graves and fixed by Arun C Murthy (capacityscheduler)
Capacity scheduler - containers that get reserved create container token too early
The capacity scheduler has the ability to 'reserve' containers. Unfortunately, before it decides that a container goes to reserved rather than assigned, the Container object is created, which creates a container token that expires in roughly 10 minutes by default.
This means that by the time the NM frees up enough space on that node for the container to move to assigned the container token may have expired.
- YARN-178.
Critical bug reported by Radim Kolar and fixed by Radim Kolar
Fix custom ProcessTree instance creation
1. The current pluggable ResourceCalculatorProcessTree is not passed the root process id, making custom implementations unusable.
2. The process tree class does not extend Configured as it should.
Added a constructor with a pid argument, along with a test suite. Also added a test that the process tree is correctly configured.
- YARN-177.
Critical bug reported by Thomas Graves and fixed by Arun C Murthy (capacityscheduler)
CapacityScheduler - adding a queue while the RM is running has wacky results
Adding a queue to the capacity scheduler while the RM is running and then running a job in the newly added queue results in very strange behavior. The cluster Total Memory can either decrease or increase. We had a cluster where total memory decreased to almost 1/6th of capacity. Running on a small test cluster resulted in the capacity going up simply by adding a queue and running wordcount.
Looking at the RM logs, used memory can go negative but other logs show the number positive:
2012-10-21 22:56:44,796 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.0375 absoluteUsedCapacity=0.0375 used=memory: 7680 cluster=memory: 204800
2012-10-21 22:56:45,831 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=-0.0225 absoluteUsedCapacity=-0.0225 used=memory: -4608 cluster=memory: 204800
- YARN-174.
Major bug reported by Robert Joseph Evans and fixed by Vinod Kumar Vavilapalli (nodemanager)
TestNodeStatusUpdater is failing in trunk
{noformat}
2012-10-19 12:18:23,941 FATAL [Node Status Updater] nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(277)) - Error starting NodeManager
org.apache.hadoop.yarn.YarnException: ${yarn.log.dir}/userlogs is not a valid path. Path should be with file scheme or without scheme
at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.validatePaths(LocalDirsHandlerService.java:321)
at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.<init>(LocalDirsHandlerService.java:95)
at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.init(LocalDirsHandlerService.java:123)
at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
at org.apache.hadoop.yarn.server.nodemanager.NodeHealthCheckerService.init(NodeHealthCheckerService.java:48)
at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.init(NodeManager.java:165)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:274)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stateChanged(NodeManager.java:256)
at org.apache.hadoop.yarn.service.AbstractService.changeState(AbstractService.java:163)
at org.apache.hadoop.yarn.service.AbstractService.stop(AbstractService.java:112)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.stop(NodeStatusUpdaterImpl.java:149)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.reboot(NodeStatusUpdaterImpl.java:157)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.access$900(NodeStatusUpdaterImpl.java:63)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:357)
{noformat}
The NM then calls System.exit(-1), which makes the unit test exit and produces an error that is hard to track down.
- YARN-166.
Major bug reported by Thomas Graves and fixed by Thomas Graves (capacityscheduler)
capacity scheduler doesn't allow capacity < 1.0
1.x supports queue capacity < 1, but in 0.23 the capacity scheduler doesn't. This is an issue for us since we have a large cluster running 1.x that currently has a queue with capacity 0.5%.
- YARN-165.
Blocker improvement reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
RM should point tracking URL to RM web page for app when AM fails
Currently when an ApplicationMaster fails the ResourceManager is updating the tracking URL to an empty string, see RMAppAttemptImpl.ContainerFinishedTransition. Unfortunately when the client attempts to follow the proxy URL it results in a web page showing an HTTP 500 error and an ugly backtrace because "http://" isn't a very helpful tracking URL.
It would be much more helpful if the proxy URL redirected to the RM webapp page for the specific application. That page shows the various AM attempts and pointers to their logs which will be useful for debugging the problems that caused the AM attempts to fail.
- YARN-163.
Major bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)
Retrieving container log via NM webapp can hang with multibyte characters in log
ContainerLogsBlock.printLogs currently assumes that skipping N bytes in the log file is the same as skipping N characters, but that is not true when the log contains multibyte characters. This can cause the loop that skips a portion of the log to try to skip past the end of the file and loop forever (or until Jetty kills the worker thread).
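The byte-vs-character confusion behind this hang can be shown in a few lines (illustrative example; the string content is made up): in UTF-8 a single character may occupy several bytes, so "skip N characters" and "skip N bytes" diverge as soon as the log contains multibyte text.

```java
import java.nio.charset.StandardCharsets;

// Demonstrates why assuming 1 char == 1 byte over-skips in UTF-8 logs:
// each CJK character below encodes to 3 bytes.
public class MultibyteSkip {
    public static void main(String[] args) {
        String log = "日志entry";  // two 3-byte CJK characters plus 5 ASCII
        int chars = log.length();                                 // 7
        int bytes = log.getBytes(StandardCharsets.UTF_8).length;  // 11
        // A loop that skips `chars` positions but counts `bytes` will
        // attempt to skip past end-of-file and can spin forever.
        System.out.println(chars + " chars, " + bytes + " bytes");
    }
}
```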
- YARN-161.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (api)
Yarn Common has multiple compiler warnings for unchecked operations
The warnings are in classes StateMachineFactory, RecordFactoryProvider, RpcFactoryProvider, and YarnRemoteExceptionFactoryProvider. OpenJDK 1.6.0_24 actually treats these as compilation errors, causing the build to fail.
- YARN-159.
Major bug reported by Thomas Graves and fixed by Thomas Graves (resourcemanager)
RM web ui applications page should be sorted to display last app first
It currently sorts with the smallest application id first, i.e. the first apps that were submitted. Once there is more than one page worth of apps, it is much more useful to sort so that the largest app id (the last submitted app) shows up first.
- YARN-151.
Major bug reported by Robert Joseph Evans and fixed by Ravi Prakash
Browser thinks RM main page JS is taking too long
The main RM page with the default settings of 10,000 applications can cause browsers to think that the JS on the page is stuck and ask you if you want to kill it. This is a big usability problem.
- YARN-144.
Major bug reported by Robert Parker and fixed by Robert Parker
MiniMRYarnCluster launches RM and JHS on default ports
MAPREDUCE-3867, MAPREDUCE-3869, and MAPREDUCE-4406 need to be combined and applied to branch-0.23
- YARN-139.
Major bug reported by Nathan Roberts and fixed by Vinod Kumar Vavilapalli (api)
Interrupted Exception within AsyncDispatcher leads to user confusion
Successful applications tend to get InterruptedExceptions during shutdown. The exception is harmless but it leads to lots of user confusion and therefore could be cleaned up.
2012-09-28 14:50:12,477 WARN [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Interrupted Exception while stopping
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1143)
at java.lang.Thread.join(Thread.java:1196)
at org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:105)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:437)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:402)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:619)
2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped.
2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.mapreduce.v2.app.MRAppMaster is stopped.
2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Exiting MR AppMaster..GoodBye
- YARN-131.
Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (capacityscheduler)
Incorrect ACL properties in capacity scheduler documentation
The CapacityScheduler apt file incorrectly specifies the property names controlling acls for application submission and queue administration.
{{yarn.scheduler.capacity.root.<queue-path>.acl_submit_jobs}}
should be
{{yarn.scheduler.capacity.root.<queue-path>.acl_submit_applications}}
{{yarn.scheduler.capacity.root.<queue-path>.acl_administer_jobs}}
should be
{{yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue}}
Uploading a patch momentarily.
- YARN-116.
Major bug reported by xieguiming and fixed by xieguiming (resourcemanager)
RM is missing ability to add include/exclude files without a restart
The "yarn.resourcemanager.nodes.include-path" default value is ""; if we need to add an include file, we must currently restart the RM.
I suggest that adding an include or exclude file should not require restarting the RM; executing the refresh command should be enough. The HDFS NameNode already has this ability.
The fix is to modify the HostsFileReader constructor:
From:
{code}
public HostsFileReader(String inFile,
String exFile)
{code}
To:
{code}
public HostsFileReader(Configuration conf,
String NODES_INCLUDE_FILE_PATH,String DEFAULT_NODES_INCLUDE_FILE_PATH,
String NODES_EXCLUDE_FILE_PATH,String DEFAULT_NODES_EXCLUDE_FILE_PATH)
{code}
And thus, we can read the config file dynamically when a {{refreshNodes}} is invoked and therefore have no need to restart the ResourceManager.
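The refresh pattern can be sketched as follows (illustrative Java; the RefreshableHostsReader class and its methods are hypothetical stand-ins, not the actual Hadoop HostsFileReader API): the include file is re-read on every refresh() call rather than only once at construction, so the refresh command needs no RM restart.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hedged sketch of the refreshNodes pattern: the path is remembered and the
// file is re-read on demand, so changes take effect without a restart.
public class RefreshableHostsReader {
    private final Path includeFile;
    private final Set<String> includes = new HashSet<>();

    public RefreshableHostsReader(Path includeFile) {
        this.includeFile = includeFile;
        refresh();  // initial load
    }

    // Invoked from the admin refresh command; no RM restart required.
    public synchronized void refresh() {
        includes.clear();
        try {
            for (String line : Files.readAllLines(includeFile)) {
                String host = line.trim();
                if (!host.isEmpty()) {
                    includes.add(host);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public synchronized boolean isIncluded(String host) {
        return includes.contains(host);
    }
}
```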
- YARN-102.
Trivial bug reported by Devaraj K and fixed by Devaraj K (resourcemanager)
Move the apache licence header to the top of the file in MemStore.java
- YARN-43.
Major bug reported by Thomas Graves and fixed by Thomas Graves
TestResourceTrackerService fail intermittently on jdk7
Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.73 sec <<< FAILURE!
testDecommissionWithIncludeHosts(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService) Time elapsed: 0.086 sec <<< FAILURE!
junit.framework.AssertionFailedError: expected:<0> but was:<1>
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.failNotEquals(Assert.java:283)
at junit.framework.Assert.assertEquals(Assert.java:64)
at junit.framework.Assert.assertEquals(Assert.java:195)
at junit.framework.Assert.assertEquals(Assert.java:201)
at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testDecommissionWithIncludeHosts(TestResourceTrackerService.java:90)
- YARN-32.
Major bug reported by Thomas Graves and fixed by Vinod Kumar Vavilapalli
TestApplicationTokens fails intermittently on jdk7
TestApplicationTokens fails intermittently on jdk7.
- YARN-30.
Major bug reported by Thomas Graves and fixed by Thomas Graves
TestNMWebServicesApps, TestRMWebServicesApps and TestRMWebServicesNodes fail on jdk7
It looks like the string changed from "const class" to "constant".
Tests run: 19, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 6.786 sec <<< FAILURE!
testNodeAppsStateInvalid(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps) Time elapsed: 0.248 sec <<< FAILURE!
java.lang.AssertionError: exception message doesn't match, got: No enum constant org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationState.FOO_STATE expected: No enum const class org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationState.FOO_STATE
- YARN-28.
Major bug reported by Thomas Graves and fixed by Thomas Graves
TestCompositeService fails on jdk7
test TestCompositeService fails when run with jdk7.
It appears to expect the testCallSequence test to be called first and the sequence numbers to start at 0. On jdk7 it is not called first and the sequence number has already been incremented.
- MAPREDUCE-4802.
Major improvement reported by Ravi Prakash and fixed by Ravi Prakash (mr-am , mrv2 , webapps)
Takes a long time to load the task list on the AM for large jobs
- MAPREDUCE-4801.
Critical bug reported by Jason Lowe and fixed by Jason Lowe
ShuffleHandler can generate large logs due to prematurely closed channels
- MAPREDUCE-4797.
Major bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster)
LocalContainerAllocator can loop forever trying to contact the RM
- MAPREDUCE-4787.
Major bug reported by Ravi Prakash and fixed by Robert Parker (test)
TestJobMonitorAndPrint is broken
- MAPREDUCE-4786.
Major bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2)
Job End Notification retry interval is 5 milliseconds by default
- MAPREDUCE-4782.
Blocker bug reported by Mark Fuhs and fixed by Mark Fuhs (client)
NLineInputFormat skips first line of last InputSplit
- MAPREDUCE-4774.
Major bug reported by Ivan A. Veselovsky and fixed by Jason Lowe (applicationmaster , mrv2)
JobImpl does not handle asynchronous task events in FAILED state
- MAPREDUCE-4772.
Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Fetch failures can take way too long for a map to be restarted
- MAPREDUCE-4771.
Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
KeyFieldBasedPartitioner not partitioning properly when configured
- MAPREDUCE-4763.
Minor improvement reported by Ivan A. Veselovsky and fixed by
repair test org.apache.hadoop.mapreduce.security.TestUmbilicalProtocolWithJobToken
- MAPREDUCE-4752.
Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Reduce MR AM memory usage through String Interning
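The idea behind this improvement is that MR jobs see the same counter names, host names, and similar strings many times; String.intern() makes the duplicates share one canonical object instead of each holding its own copy. A minimal illustration (example strings are made up):

```java
// Demonstrates the memory effect of interning: two distinct String objects
// with equal contents collapse to one canonical instance via intern().
public class InternSketch {
    public static void main(String[] args) {
        String a = new String("FILE_BYTES_READ");
        String b = new String("FILE_BYTES_READ");
        System.out.println(a == b);                   // false: two objects
        System.out.println(a.intern() == b.intern()); // true: one shared object
    }
}
```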
- MAPREDUCE-4751.
Major bug reported by Ravi Prakash and fixed by Vinod Kumar Vavilapalli
AM stuck in KILL_WAIT for days
- MAPREDUCE-4748.
Blocker bug reported by Robert Joseph Evans and fixed by Jason Lowe (mrv2)
Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED
- MAPREDUCE-4746.
Major bug reported by Robert Parker and fixed by Robert Parker (applicationmaster)
The MR Application Master does not have a config to set environment variables
- MAPREDUCE-4741.
Minor bug reported by Jason Lowe and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2)
WARN and ERROR messages logged during normal AM shutdown
- MAPREDUCE-4740.
Blocker bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
only .jars can be added to the Distributed Cache classpath
- MAPREDUCE-4733.
Major bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster , mrv2)
Reducer can fail to make progress during shuffle if too many reducers complete consecutively
- MAPREDUCE-4730.
Blocker bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster , mrv2)
AM crashes due to OOM while serving up map task completion events
- MAPREDUCE-4729.
Major bug reported by Thomas Graves and fixed by Vinod Kumar Vavilapalli (jobhistoryserver)
job history UI not showing all job attempts
- MAPREDUCE-4724.
Major bug reported by Thomas Graves and fixed by Thomas Graves (jobhistoryserver)
job history web ui applications page should be sorted to display last app first
- MAPREDUCE-4721.
Major bug reported by Ravi Prakash and fixed by Ravi Prakash (jobhistoryserver)
Task startup time in JHS is the same as job startup time.
- MAPREDUCE-4720.
Major bug reported by Robert Joseph Evans and fixed by Ravi Prakash
Browser thinks History Server main page JS is taking too long
- MAPREDUCE-4705.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (jobhistoryserver , mrv2)
Historyserver links expire before the history data does
- MAPREDUCE-4674.
Minor bug reported by Robert Justice and fixed by Robert Justice
Hadoop examples secondarysort has a typo "secondarysrot" in the usage
- MAPREDUCE-4666.
Minor improvement reported by Jason Lowe and fixed by Jason Lowe (jobhistoryserver)
JVM metrics for history server
- MAPREDUCE-4596.
Major task reported by Siddharth Seth and fixed by Siddharth Seth (applicationmaster , mrv2)
Split StateMachine state from states seen by MRClientProtocol (for Job, Task, TaskAttempt)
- MAPREDUCE-4554.
Major bug reported by Benoy Antony and fixed by Benoy Antony (job submission , security)
Job Credentials are not transmitted if security is turned off
- MAPREDUCE-4549.
Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Distributed cache conflicts break backwards compatibility
- MAPREDUCE-4521.
Major bug reported by Jason Lowe and fixed by Ravi Prakash (mrv2)
mapreduce.user.classpath.first incompatibility with 0.20/1.x
- MAPREDUCE-4517.
Minor improvement reported by James Kinley and fixed by Jason Lowe (applicationmaster)
Too many INFO messages written out during AM to RM heartbeat
- MAPREDUCE-4479.
Major bug reported by Mariappan Asokan and fixed by Mariappan Asokan (test)
Fix parameter order in assertEquals() in TestCombineInputFileFormat.java
- MAPREDUCE-4425.
Critical bug reported by Siddharth Seth and fixed by Jason Lowe (mrv2)
Speculation + Fetch failures can lead to a hung job
- MAPREDUCE-4266.
Major task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
remove Ant remnants from MR
- MAPREDUCE-4229.
Major improvement reported by Todd Lipcon and fixed by Miomir Boljanovic (jobtracker)
Counter names' memory usage can be decreased by interning
- MAPREDUCE-4107.
Major sub-task reported by Devaraj K and fixed by Devaraj K (mrv2)
Fix tests in org.apache.hadoop.ipc.TestSocketFactory
- MAPREDUCE-1806.
Major bug reported by Paul Yang and fixed by Gera Shegalov (harchive)
CombineFileInputFormat does not work with paths not on default FS
- HDFS-4186.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (name-node)
logSync() is called with the write lock held while releasing lease
- HDFS-4182.
Critical bug reported by Todd Lipcon and fixed by Robert Joseph Evans (name-node)
SecondaryNameNode leaks NameCache entries
- HDFS-4181.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (name-node)
LeaseManager tries to double remove and prints extra messages
- HDFS-4172.
Minor bug reported by Derek Dagit and fixed by Derek Dagit (name-node)
namenode does not URI-encode parameters when building URI for datanode request
- HDFS-4162.
Minor bug reported by Derek Dagit and fixed by Derek Dagit (data-node)
Some malformed and unquoted HTML strings are returned from datanode web ui
- HDFS-4090.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (hdfs client)
getFileChecksum() result incompatible when called against zero-byte files.
- HDFS-4080.
Major bug reported by Kihwal Lee and fixed by Kihwal Lee (name-node)
Add a separate logger for block state change logs to enable turning off those logs
- HDFS-4075.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (name-node)
Reduce recommissioning overhead
- HDFS-4016.
Minor bug reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
back-port HDFS-3582 to branch-0.23
- HDFS-3996.
Minor bug reported by Eli Collins and fixed by Eli Collins
Add debug log removed in HDFS-3873 back
- HDFS-3990.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (name-node)
NN's health report has severe performance problems
- HDFS-3919.
Minor bug reported by Andy Isaacson and fixed by Andy Isaacson (test)
MiniDFSCluster:waitClusterUp can hang forever
- HDFS-3905.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (hdfs client , security)
Secure cluster cannot use hftp to an insecure cluster
- HDFS-3829.
Major bug reported by Trevor Robinson and fixed by Trevor Robinson (test)
TestHftpURLTimeouts fails intermittently with JDK7
- HDFS-3824.
Major bug reported by Trevor Robinson and fixed by Trevor Robinson (test)
TestHftpDelegationToken fails intermittently with JDK7
- HDFS-3483.
Major improvement reported by Stephen Chu and fixed by Stephen Fritz
Better error message when hdfs fsck is run against a ViewFS config
- HDFS-3224.
Minor bug reported by Eli Collins and fixed by Jason Lowe
Bug in check for DN re-registration with different storage ID
- HADOOP-9025.
Major bug reported by Robert Joseph Evans and fixed by Jonathan Eagles
org.apache.hadoop.tools.TestCopyListing failing
- HADOOP-9022.
Major bug reported by Haiyang Jiang and fixed by Jonathan Eagles
Hadoop distcp tool fails to copy file if -m 0 specified
- HADOOP-8986.
Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (ipc)
Server$Call object is never released after it is sent
- HADOOP-8962.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (fs)
RawLocalFileSystem.listStatus fails when a child filename contains a colon
- HADOOP-8932.
Major improvement reported by Kihwal Lee and fixed by Kihwal Lee (security)
JNI-based user-group mapping modules can be too chatty on lookup failures
- HADOOP-8930.
Major improvement reported by Andrey Klochkov and fixed by Andrey Klochkov (test)
Cumulative code coverage calculation
- HADOOP-8926.
Major improvement reported by Gopal V and fixed by Gopal V (util)
hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
Speed up Crc32 by improving the cache hit-ratio of hadoop.util.PureJavaCrc32
- HADOOP-8906.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
paths with multiple globs are unreliable
- HADOOP-8889.
Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (build , test)
Upgrade to Surefire 2.12.3
- HADOOP-8851.
Minor improvement reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky (test)
Use -XX:+HeapDumpOnOutOfMemoryError JVM option in the forked tests
- HADOOP-8819.
Major bug reported by Brandon Li and fixed by Brandon Li (fs)
Should use && instead of & in a few places in FTPFileSystem,FTPInputStream,S3InputStream,ViewFileSystem,ViewFs
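The reason & is a bug here is that, on booleans, & evaluates both operands while && short-circuits, so a null guard written with & still dereferences null. A small illustration (method names are made up for the example):

```java
// Shows why a null guard must use && rather than &: the non-short-circuiting
// form evaluates s.length() even when s is null.
public class ShortCircuitSketch {
    static boolean nonEmptyUnsafe(String s) {
        return s != null & s.length() > 0;   // bug: s.length() runs even for null s
    }
    static boolean nonEmptySafe(String s) {
        return s != null && s.length() > 0;  // fix: && stops at the null check
    }
    public static void main(String[] args) {
        System.out.println(nonEmptySafe(null));   // safe: prints false
        try {
            nonEmptyUnsafe(null);
        } catch (NullPointerException e) {
            System.out.println("NPE with single &");
        }
    }
}
```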
- HADOOP-8791.
Major bug reported by Bertrand Dechoux and fixed by Jing Zhao (documentation)
rm "Only deletes non empty directory and files."
- HADOOP-8789.
Minor improvement reported by Andy Isaacson and fixed by Andy Isaacson (test)
Tests setLevel(Level.OFF) should be Level.ERROR
- HADOOP-8775.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza
MR2 distcp permits non-positive value to -bandwidth option which causes job never to complete
- HADOOP-8755.
Major improvement reported by Andrey Klochkov and fixed by Andrey Klochkov (test)
Print thread dump when tests fail due to timeout
- HADOOP-8386.
Major bug reported by Christopher Berner and fixed by Christopher Berner (scripts)
hadoop script doesn't work if 'cd' prints to stdout (default behavior in Ubuntu)