Hadoop 0.23.5 Release Notes
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
Changes since Hadoop 0.23.4
- YARN-216.
Major improvement reported by Todd Lipcon and fixed by Robert Joseph Evans
Remove jquery theming support
As of today we have 9.4MB of JQuery themes in our code tree. In addition to being a waste of space, it's a highly questionable feature. I've never heard anyone complain that the Hadoop interface isn't themeable enough, and there's far more value in consistency across installations than there is in themeability. Let's rip it out.
- YARN-214.
Major bug reported by Jason Lowe and fixed by Jonathan Eagles (resourcemanager)
RMContainerImpl does not handle event EXPIRE at state RUNNING
RMContainerImpl has a race condition where a container can enter the RUNNING state just as the container expires. This results in an invalid event transition error:
{noformat}
2012-11-11 05:31:38,954 [ResourceManager Event Processor] ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: EXPIRE at RUNNING
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:205)
at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:44)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApp.containerCompleted(SchedulerApp.java:203)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1337)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:739)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:659)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:80)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:340)
at java.lang.Thread.run(Thread.java:619)
{noformat}
EXPIRE needs to be handled (or at least ignored) in the RUNNING state to account for this race condition.
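The fix direction can be sketched with a toy transition table (illustrative Java only; this is not the actual org.apache.hadoop.yarn.state.StateMachineFactory API): a (RUNNING, EXPIRE) entry is added so the late EXPIRE event is tolerated instead of throwing an invalid-transition error.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal state-machine sketch: an explicit (RUNNING, EXPIRE) -> RUNNING entry
// means a container that expires just after entering RUNNING is ignored
// instead of triggering an InvalidStateTransitonException-style failure.
public class ContainerStateSketch {
    enum State { ALLOCATED, RUNNING, COMPLETED }
    enum Event { START, EXPIRE, FINISH }

    private static final Map<String, State> TRANSITIONS = new HashMap<>();
    static {
        TRANSITIONS.put(key(State.ALLOCATED, Event.START), State.RUNNING);
        TRANSITIONS.put(key(State.ALLOCATED, Event.EXPIRE), State.COMPLETED);
        TRANSITIONS.put(key(State.RUNNING, Event.FINISH), State.COMPLETED);
        // The fix: tolerate EXPIRE arriving just after the container started.
        TRANSITIONS.put(key(State.RUNNING, Event.EXPIRE), State.RUNNING);
    }

    private static String key(State s, Event e) { return s + "/" + e; }

    static State handle(State current, Event event) {
        State next = TRANSITIONS.get(key(current, event));
        if (next == null) {
            throw new IllegalStateException(
                "Invalid event: " + event + " at " + current);
        }
        return next;
    }

    public static void main(String[] args) {
        State s = handle(State.ALLOCATED, Event.START);
        s = handle(s, Event.EXPIRE);  // ignored: still RUNNING, no error
        System.out.println(s);
    }
}
```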
- YARN-212.
Blocker bug reported by Nathan Roberts and fixed by Nathan Roberts (nodemanager)
NM state machine ignores an APPLICATION_CONTAINER_FINISHED event when it shouldn't
The NM state machines can make the following two invalid state transitions when a speculative attempt is killed shortly after it gets started. When this happens the NM keeps the log aggregation context open for this application and therefore chews up FDs and leases on the NN, eventually running the NN out of FDs and bringing down the entire cluster.
2012-11-07 05:36:33,774 [AsyncDispatcher event handler] WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_CONTAINER_FINISHED at INITING
2012-11-07 05:36:33,775 [AsyncDispatcher event handler] WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [INIT_CONTAINER]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: INIT_CONTAINER at DONE
- YARN-206.
Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
TestApplicationCleanup.testContainerCleanup occasionally fails
testContainerCleanup is occasionally failing with the error:
testContainerCleanup(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup): expected:<2> but was:<1>
- YARN-202.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee
Log Aggregation generates a storm of fsync() for namenode
When log aggregation is on, each write to an aggregated container log causes hflush() to be called. For large clusters, this can create a large number of fsync() calls on the namenode.
We have seen a 6-7x increase in the average number of fsync operations compared to 1.0.x on a large busy cluster. Over 99% of fsync ops were for log aggregation writing to tmp files.
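One mitigation direction is to flush periodically instead of after every write; the sketch below is illustrative only (the BatchedFlushStream class and BATCH size are hypothetical, not the actual patch), with flush() standing in for hflush().

```java
import java.io.IOException;
import java.io.OutputStream;

// Hedged sketch: batch flushes so O(writes) fsync() calls on the namenode
// become O(writes / BATCH). flush() here stands in for HDFS hflush().
public class BatchedFlushStream {
    private static final int BATCH = 256;  // illustrative batch size
    private final OutputStream out;
    private int pendingWrites = 0;

    public BatchedFlushStream(OutputStream out) { this.out = out; }

    public void write(byte[] data) throws IOException {
        out.write(data);
        if (++pendingWrites >= BATCH) {
            out.flush();        // one flush per batch, not per write
            pendingWrites = 0;
        }
    }

    public void close() throws IOException {
        out.flush();            // final flush so no data is left unsynced
        out.close();
    }
}
```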
- YARN-201.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (capacityscheduler)
CapacityScheduler can take a very long time to schedule containers if requests are off cluster
When a user runs a job where one of the input files is a large file on another cluster, the job can create many splits on nodes which are unreachable for computation from the current cluster. The off-switch delay logic in LeafQueue can then cause the ResourceManager to allocate containers for the job very slowly. In one case the job was only getting one container every 23 seconds, even though the queue had plenty of spare capacity.
- YARN-189.
Blocker bug reported by Thomas Graves and fixed by Thomas Graves (resourcemanager)
deadlock in RM - AMResponse object
we ran into a deadlock in the RM.
=============================
"1128743461@qtp-1252749669-5201":
waiting for ownable synchronizer 0x00002aabbc87b960, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
which is held by "AsyncDispatcher event handler"
"AsyncDispatcher event handler":
waiting to lock monitor 0x00002ab0bba3a370 (object 0x00002aab3d4cd698, a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl),
which is held by "IPC Server handler 36 on 8030"
"IPC Server handler 36 on 8030":
waiting for ownable synchronizer 0x00002aabbc87b960, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
which is held by "AsyncDispatcher event handler"
Java stack information for the threads listed above:
===================================================
"1128743461@qtp-1252749669-5201":
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00002aabbc87b960> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getFinalApplicationStatus(RMAppAttemptImpl.java:295)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getFinalApplicationStatus(RMAppImpl.java:222)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:328)
at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaM
...
...
..
"AsyncDispatcher event handler":
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.unregisterAttempt(ApplicationMasterService.java:307)
- waiting to lock <0x00002aab3d4cd698> (a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$BaseFinalTransition.transition(RMAppAttemptImpl.java:647)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:809)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:796)
at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
- locked <0x00002aabbb673090> (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:478)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:81)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:436)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:417)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:619)
"IPC Server handler 36 on 8030":
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00002aabbc87b960> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.pullJustFinishedContainers(RMAppAttemptImpl.java:437)
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:285)
- locked <0x00002aab3d4cd698> (a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
at org.apache.hadoop.yarn.api.impl.pb.service.AMRMProtocolPBServiceImpl.allocate(AMRMProtocolPBServiceImpl.java:56)
at org.apache.hadoop.yarn.proto.AMRMProtocol$AMRMProtocolService$2.callBlockingMethod(AMRMProtocol.java:87)
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:353)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1528)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1524)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1522)
- YARN-188.
Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov (capacityscheduler)
Coverage fixing for CapacityScheduler
Added some tests for CapacityScheduler.
YARN-188-branch-0.23.patch patch for branch 0.23
YARN-188-branch-2.patch patch for branch 2
YARN-188-trunk.patch patch for trunk
- YARN-186.
Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov (resourcemanager , scheduler)
Coverage fixing LinuxContainerExecutor
Added some tests for LinuxContainerExecutor
YARN-186-branch-0.23.patch patch for branch-0.23
YARN-186-branch-2.patch patch for branch-2
YARN-186-trunk.patch patch for trunk
- YARN-180.
Critical bug reported by Thomas Graves and fixed by Arun C Murthy (capacityscheduler)
Capacity scheduler - containers that get reserved create container token too early
The capacity scheduler has the ability to 'reserve' containers. Unfortunately, before it decides that a container goes to reserved rather than assigned, the Container object is created, which creates a container token that expires in roughly 10 minutes by default.
This means that by the time the NM frees up enough space on that node for the container to move to assigned the container token may have expired.
- YARN-178.
Critical bug reported by Radim Kolar and fixed by Radim Kolar
Fix custom ProcessTree instance creation
1. The current pluggable ResourceCalculatorProcessTree is not passed the root process id, making custom implementations unusable.
2. The process tree class does not extend Configured as it should.
Added a constructor with a pid argument, along with a test suite. Also added a test that the process tree is correctly configured.
- YARN-177.
Critical bug reported by Thomas Graves and fixed by Arun C Murthy (capacityscheduler)
CapacityScheduler - adding a queue while the RM is running has wacky results
Adding a queue to the capacity scheduler while the RM is running and then running a job in the newly added queue results in very strange behavior. The cluster Total Memory can either decrease or increase. We had a cluster where total memory decreased to almost 1/6th of capacity. Running on a small test cluster resulted in the capacity going up simply by adding a queue and running wordcount.
Looking at the RM logs, used memory can go negative but other logs show the number positive:
2012-10-21 22:56:44,796 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.0375 absoluteUsedCapacity=0.0375 used=memory: 7680 cluster=memory: 204800
2012-10-21 22:56:45,831 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=-0.0225 absoluteUsedCapacity=-0.0225 used=memory: -4608 cluster=memory: 204800
- YARN-174.
Major bug reported by Robert Joseph Evans and fixed by Vinod Kumar Vavilapalli (nodemanager)
TestNodeStatusUpdater is failing in trunk
{noformat}
2012-10-19 12:18:23,941 FATAL [Node Status Updater] nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(277)) - Error starting NodeManager
org.apache.hadoop.yarn.YarnException: ${yarn.log.dir}/userlogs is not a valid path. Path should be with file scheme or without scheme
at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.validatePaths(LocalDirsHandlerService.java:321)
at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.<init>(LocalDirsHandlerService.java:95)
at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.init(LocalDirsHandlerService.java:123)
at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
at org.apache.hadoop.yarn.server.nodemanager.NodeHealthCheckerService.init(NodeHealthCheckerService.java:48)
at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.init(NodeManager.java:165)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:274)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stateChanged(NodeManager.java:256)
at org.apache.hadoop.yarn.service.AbstractService.changeState(AbstractService.java:163)
at org.apache.hadoop.yarn.service.AbstractService.stop(AbstractService.java:112)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.stop(NodeStatusUpdaterImpl.java:149)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.reboot(NodeStatusUpdaterImpl.java:157)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.access$900(NodeStatusUpdaterImpl.java:63)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:357)
{noformat}
The NM then calls System.exit(-1), which makes the unit test exit and produces an error that is hard to track down.
- YARN-166.
Major bug reported by Thomas Graves and fixed by Thomas Graves (capacityscheduler)
capacity scheduler doesn't allow capacity < 1.0
1.x supports queue capacity < 1, but in 0.23 the capacity scheduler doesn't. This is an issue for us since we have a large cluster running 1.x that currently has a queue with capacity 0.5%.
- YARN-165.
Blocker improvement reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
RM should point tracking URL to RM web page for app when AM fails
Currently when an ApplicationMaster fails the ResourceManager is updating the tracking URL to an empty string, see RMAppAttemptImpl.ContainerFinishedTransition. Unfortunately when the client attempts to follow the proxy URL it results in a web page showing an HTTP 500 error and an ugly backtrace because "http://" isn't a very helpful tracking URL.
It would be much more helpful if the proxy URL redirected to the RM webapp page for the specific application. That page shows the various AM attempts and pointers to their logs which will be useful for debugging the problems that caused the AM attempts to fail.
- YARN-163.
Major bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)
Retrieving container log via NM webapp can hang with multibyte characters in log
ContainerLogsBlock.printLogs currently assumes that skipping N bytes in the log file is the same as skipping N characters, but that is not true when the log contains multibyte characters. This can cause the loop that skips a portion of the log to try to skip past the end of the file and loop forever (or until Jetty kills the worker thread).
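The byte-vs-character confusion behind this hang can be shown in a few lines (illustrative example; the string content is made up): in UTF-8 a single character may occupy several bytes, so "skip N characters" and "skip N bytes" diverge as soon as the log contains multibyte text.

```java
import java.nio.charset.StandardCharsets;

// Demonstrates why assuming 1 char == 1 byte over-skips in UTF-8 logs:
// each CJK character below encodes to 3 bytes.
public class MultibyteSkip {
    public static void main(String[] args) {
        String log = "日志entry";  // two 3-byte CJK characters plus 5 ASCII
        int chars = log.length();                                 // 7
        int bytes = log.getBytes(StandardCharsets.UTF_8).length;  // 11
        // A loop that skips `chars` positions but counts `bytes` will
        // attempt to skip past end-of-file and can spin forever.
        System.out.println(chars + " chars, " + bytes + " bytes");
    }
}
```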
- YARN-161.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (api)
Yarn Common has multiple compiler warnings for unchecked operations
The warnings are in classes StateMachineFactory, RecordFactoryProvider, RpcFactoryProvider, and YarnRemoteExceptionFactoryProvider. OpenJDK 1.6.0_24 actually treats these as compilation errors, causing the build to fail.
- YARN-159.
Major bug reported by Thomas Graves and fixed by Thomas Graves (resourcemanager)
RM web ui applications page should be sorted to display last app first
It currently sorts with the smallest application id first, i.e. the first apps that were submitted. Once there is more than one page worth of apps, it is much more useful to sort so that the largest app id (the last submitted app) shows up first.
- YARN-151.
Major bug reported by Robert Joseph Evans and fixed by Ravi Prakash
Browser thinks RM main page JS is taking too long
The main RM page with the default settings of 10,000 applications can cause browsers to think that the JS on the page is stuck and ask you if you want to kill it. This is a big usability problem.
- YARN-144.
Major bug reported by Robert Parker and fixed by Robert Parker
MiniMRYarnCluster launches RM and JHS on default ports
MAPREDUCE-3867, MAPREDUCE-3869, and MAPREDUCE-4406 need to be combined and applied to branch-0.23
- YARN-139.
Major bug reported by Nathan Roberts and fixed by Vinod Kumar Vavilapalli (api)
Interrupted Exception within AsyncDispatcher leads to user confusion
Successful applications tend to get InterruptedExceptions during shutdown. The exception is harmless but it leads to lots of user confusion and therefore could be cleaned up.
2012-09-28 14:50:12,477 WARN [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Interrupted Exception while stopping
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1143)
at java.lang.Thread.join(Thread.java:1196)
at org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:105)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:437)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:402)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:619)
2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped.
2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.mapreduce.v2.app.MRAppMaster is stopped.
2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Exiting MR AppMaster..GoodBye
- YARN-131.
Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (capacityscheduler)
Incorrect ACL properties in capacity scheduler documentation
The CapacityScheduler apt file incorrectly specifies the property names controlling acls for application submission and queue administration.
{{yarn.scheduler.capacity.root.<queue-path>.acl_submit_jobs}}
should be
{{yarn.scheduler.capacity.root.<queue-path>.acl_submit_applications}}
{{yarn.scheduler.capacity.root.<queue-path>.acl_administer_jobs}}
should be
{{yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue}}
Uploading a patch momentarily.
- YARN-116.
Major bug reported by xieguiming and fixed by xieguiming (resourcemanager)
RM is missing ability to add include/exclude files without a restart
The "yarn.resourcemanager.nodes.include-path" default value is ""; if we need to add an include file, we must currently restart the RM.
I suggest that adding an include or exclude file should not require restarting the RM; executing the refresh command should be enough. The HDFS NameNode already has this ability.
The fix is to modify the HostsFileReader constructor:
From:
{code}
public HostsFileReader(String inFile,
String exFile)
{code}
To:
{code}
public HostsFileReader(Configuration conf,
String NODES_INCLUDE_FILE_PATH,String DEFAULT_NODES_INCLUDE_FILE_PATH,
String NODES_EXCLUDE_FILE_PATH,String DEFAULT_NODES_EXCLUDE_FILE_PATH)
{code}
And thus, we can read the config file dynamically when a {{refreshNodes}} is invoked and therefore have no need to restart the ResourceManager.
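The refresh pattern can be sketched as follows (illustrative Java; the RefreshableHostsReader class and its methods are hypothetical stand-ins, not the actual Hadoop HostsFileReader API): the include file is re-read on every refresh() call rather than only once at construction, so the refresh command needs no RM restart.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hedged sketch of the refreshNodes pattern: the path is remembered and the
// file is re-read on demand, so changes take effect without a restart.
public class RefreshableHostsReader {
    private final Path includeFile;
    private final Set<String> includes = new HashSet<>();

    public RefreshableHostsReader(Path includeFile) {
        this.includeFile = includeFile;
        refresh();  // initial load
    }

    // Invoked from the admin refresh command; no RM restart required.
    public synchronized void refresh() {
        includes.clear();
        try {
            for (String line : Files.readAllLines(includeFile)) {
                String host = line.trim();
                if (!host.isEmpty()) {
                    includes.add(host);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public synchronized boolean isIncluded(String host) {
        return includes.contains(host);
    }
}
```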
- YARN-102.
Trivial bug reported by Devaraj K and fixed by Devaraj K (resourcemanager)
Move the apache licence header to the top of the file in MemStore.java
- YARN-43.
Major bug reported by Thomas Graves and fixed by Thomas Graves
TestResourceTrackerService fail intermittently on jdk7
Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.73 sec <<< FAILURE!
testDecommissionWithIncludeHosts(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService) Time elapsed: 0.086 sec <<< FAILURE!
junit.framework.AssertionFailedError: expected:<0> but was:<1>
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.failNotEquals(Assert.java:283)
at junit.framework.Assert.assertEquals(Assert.java:64)
at junit.framework.Assert.assertEquals(Assert.java:195)
at junit.framework.Assert.assertEquals(Assert.java:201)
at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testDecommissionWithIncludeHosts(TestResourceTrackerService.java:90)
- YARN-32.
Major bug reported by Thomas Graves and fixed by Vinod Kumar Vavilapalli
TestApplicationTokens fails intermittently on jdk7
TestApplicationTokens fails intermittently on jdk7.
- YARN-30.
Major bug reported by Thomas Graves and fixed by Thomas Graves
TestNMWebServicesApps, TestRMWebServicesApps and TestRMWebServicesNodes fail on jdk7
It looks like the string changed from "const class" to "constant".
Tests run: 19, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 6.786 sec <<< FAILURE!
testNodeAppsStateInvalid(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps) Time elapsed: 0.248 sec <<< FAILURE!
java.lang.AssertionError: exception message doesn't match, got: No enum constant org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationState.FOO_STATE expected: No enum const class org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationState.FOO_STATE
- YARN-28.
Major bug reported by Thomas Graves and fixed by Thomas Graves
TestCompositeService fails on jdk7
test TestCompositeService fails when run with jdk7.
It appears to expect the testCallSequence test to be called first and the sequence numbers to start at 0. On jdk7 it is not called first and the sequence number has already been incremented.
- MAPREDUCE-4802.
Major improvement reported by Ravi Prakash and fixed by Ravi Prakash (mr-am , mrv2 , webapps)
Takes a long time to load the task list on the AM for large jobs
- MAPREDUCE-4801.
Critical bug reported by Jason Lowe and fixed by Jason Lowe
ShuffleHandler can generate large logs due to prematurely closed channels
- MAPREDUCE-4797.
Major bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster)
LocalContainerAllocator can loop forever trying to contact the RM
- MAPREDUCE-4787.
Major bug reported by Ravi Prakash and fixed by Robert Parker (test)
TestJobMonitorAndPrint is broken
- MAPREDUCE-4786.
Major bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2)
Job End Notification retry interval is 5 milliseconds by default
- MAPREDUCE-4782.
Blocker bug reported by Mark Fuhs and fixed by Mark Fuhs (client)
NLineInputFormat skips first line of last InputSplit
- MAPREDUCE-4774.
Major bug reported by Ivan A. Veselovsky and fixed by Jason Lowe (applicationmaster , mrv2)
JobImpl does not handle asynchronous task events in FAILED state
- MAPREDUCE-4772.
Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Fetch failures can take way too long for a map to be restarted
- MAPREDUCE-4771.
Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
KeyFieldBasedPartitioner not partitioning properly when configured
- MAPREDUCE-4763.
Minor improvement reported by Ivan A. Veselovsky and fixed by
repair test org.apache.hadoop.mapreduce.security.TestUmbilicalProtocolWithJobToken
- MAPREDUCE-4752.
Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Reduce MR AM memory usage through String Interning
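The idea behind this improvement is that MR jobs see the same counter names, host names, and similar strings many times; String.intern() makes the duplicates share one canonical object instead of each holding its own copy. A minimal illustration (example strings are made up):

```java
// Demonstrates the memory effect of interning: two distinct String objects
// with equal contents collapse to one canonical instance via intern().
public class InternSketch {
    public static void main(String[] args) {
        String a = new String("FILE_BYTES_READ");
        String b = new String("FILE_BYTES_READ");
        System.out.println(a == b);                   // false: two objects
        System.out.println(a.intern() == b.intern()); // true: one shared object
    }
}
```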
- MAPREDUCE-4751.
Major bug reported by Ravi Prakash and fixed by Vinod Kumar Vavilapalli
AM stuck in KILL_WAIT for days
- MAPREDUCE-4748.
Blocker bug reported by Robert Joseph Evans and fixed by Jason Lowe (mrv2)
Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED
- MAPREDUCE-4746.
Major bug reported by Robert Parker and fixed by Robert Parker (applicationmaster)
The MR Application Master does not have a config to set environment variables
- MAPREDUCE-4741.
Minor bug reported by Jason Lowe and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2)
WARN and ERROR messages logged during normal AM shutdown
- MAPREDUCE-4740.
Blocker bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
only .jars can be added to the Distributed Cache classpath
- MAPREDUCE-4733.
Major bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster , mrv2)
Reducer can fail to make progress during shuffle if too many reducers complete consecutively
- MAPREDUCE-4730.
Blocker bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster , mrv2)
AM crashes due to OOM while serving up map task completion events
- MAPREDUCE-4729.
Major bug reported by Thomas Graves and fixed by Vinod Kumar Vavilapalli (jobhistoryserver)
job history UI not showing all job attempts
- MAPREDUCE-4724.
Major bug reported by Thomas Graves and fixed by Thomas Graves (jobhistoryserver)
job history web ui applications page should be sorted to display last app first
- MAPREDUCE-4721.
Major bug reported by Ravi Prakash and fixed by Ravi Prakash (jobhistoryserver)
Task startup time in JHS is the same as job startup time.
- MAPREDUCE-4720.
Major bug reported by Robert Joseph Evans and fixed by Ravi Prakash
Browser thinks History Server main page JS is taking too long
- MAPREDUCE-4705.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (jobhistoryserver , mrv2)
Historyserver links expire before the history data does
- MAPREDUCE-4674.
Minor bug reported by Robert Justice and fixed by Robert Justice
Hadoop examples secondarysort has a typo "secondarysrot" in the usage
- MAPREDUCE-4666.
Minor improvement reported by Jason Lowe and fixed by Jason Lowe (jobhistoryserver)
JVM metrics for history server
- MAPREDUCE-4596.
Major task reported by Siddharth Seth and fixed by Siddharth Seth (applicationmaster , mrv2)
Split StateMachine state from states seen by MRClientProtocol (for Job, Task, TaskAttempt)
- MAPREDUCE-4554.
Major bug reported by Benoy Antony and fixed by Benoy Antony (job submission , security)
Job Credentials are not transmitted if security is turned off
- MAPREDUCE-4549.
Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Distributed cache conflicts break backwards compatibility
- MAPREDUCE-4521.
Major bug reported by Jason Lowe and fixed by Ravi Prakash (mrv2)
mapreduce.user.classpath.first incompatibility with 0.20/1.x
- MAPREDUCE-4517.
Minor improvement reported by James Kinley and fixed by Jason Lowe (applicationmaster)
Too many INFO messages written out during AM to RM heartbeat
- MAPREDUCE-4479.
Major bug reported by Mariappan Asokan and fixed by Mariappan Asokan (test)
Fix parameter order in assertEquals() in TestCombineInputFileFormat.java
- MAPREDUCE-4425.
Critical bug reported by Siddharth Seth and fixed by Jason Lowe (mrv2)
Speculation + Fetch failures can lead to a hung job
- MAPREDUCE-4266.
Major task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
remove Ant remnants from MR
- MAPREDUCE-4229.
Major improvement reported by Todd Lipcon and fixed by Miomir Boljanovic (jobtracker)
Counter names' memory usage can be decreased by interning
- MAPREDUCE-4107.
Major sub-task reported by Devaraj K and fixed by Devaraj K (mrv2)
Fix tests in org.apache.hadoop.ipc.TestSocketFactory
- MAPREDUCE-1806.
Major bug reported by Paul Yang and fixed by Gera Shegalov (harchive)
CombineFileInputFormat does not work with paths not on default FS
- HDFS-4186.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (name-node)
logSync() is called with the write lock held while releasing lease
- HDFS-4182.
Critical bug reported by Todd Lipcon and fixed by Robert Joseph Evans (name-node)
SecondaryNameNode leaks NameCache entries
- HDFS-4181.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (name-node)
LeaseManager tries to double remove and prints extra messages
- HDFS-4172.
Minor bug reported by Derek Dagit and fixed by Derek Dagit (name-node)
namenode does not URI-encode parameters when building URI for datanode request
- HDFS-4162.
Minor bug reported by Derek Dagit and fixed by Derek Dagit (data-node)
Some malformed and unquoted HTML strings are returned from datanode web ui
- HDFS-4090.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (hdfs client)
getFileChecksum() result incompatible when called against zero-byte files.
- HDFS-4080.
Major bug reported by Kihwal Lee and fixed by Kihwal Lee (name-node)
Add a separate logger for block state change logs to enable turning off those logs
- HDFS-4075.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (name-node)
Reduce recommissioning overhead
- HDFS-4016.
Minor bug reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
back-port HDFS-3582 to branch-0.23
- HDFS-3996.
Minor bug reported by Eli Collins and fixed by Eli Collins
Add debug log removed in HDFS-3873 back
- HDFS-3990.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (name-node)
NN's health report has severe performance problems
- HDFS-3919.
Minor bug reported by Andy Isaacson and fixed by Andy Isaacson (test)
MiniDFSCluster:waitClusterUp can hang forever
- HDFS-3905.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (hdfs client , security)
Secure cluster cannot use hftp to an insecure cluster
- HDFS-3829.
Major bug reported by Trevor Robinson and fixed by Trevor Robinson (test)
TestHftpURLTimeouts fails intermittently with JDK7
- HDFS-3824.
Major bug reported by Trevor Robinson and fixed by Trevor Robinson (test)
TestHftpDelegationToken fails intermittently with JDK7
- HDFS-3483.
Major improvement reported by Stephen Chu and fixed by Stephen Fritz
Better error message when hdfs fsck is run against a ViewFS config
- HDFS-3224.
Minor bug reported by Eli Collins and fixed by Jason Lowe
Bug in check for DN re-registration with different storage ID
- HADOOP-9025.
Major bug reported by Robert Joseph Evans and fixed by Jonathan Eagles
org.apache.hadoop.tools.TestCopyListing failing
- HADOOP-9022.
Major bug reported by Haiyang Jiang and fixed by Jonathan Eagles
Hadoop distcp tool fails to copy file if -m 0 specified
- HADOOP-8986.
Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (ipc)
Server$Call object is never released after it is sent
- HADOOP-8962.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (fs)
RawLocalFileSystem.listStatus fails when a child filename contains a colon
- HADOOP-8932.
Major improvement reported by Kihwal Lee and fixed by Kihwal Lee (security)
JNI-based user-group mapping modules can be too chatty on lookup failures
- HADOOP-8930.
Major improvement reported by Andrey Klochkov and fixed by Andrey Klochkov (test)
Cumulative code coverage calculation
- HADOOP-8926.
Major improvement reported by Gopal V and fixed by Gopal V (util)
hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data
Speed up Crc32 by improving the cache hit-ratio of hadoop.util.PureJavaCrc32
- HADOOP-8906.
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
paths with multiple globs are unreliable
- HADOOP-8889.
Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (build , test)
Upgrade to Surefire 2.12.3
- HADOOP-8851.
Minor improvement reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky (test)
Use -XX:+HeapDumpOnOutOfMemoryError JVM option in the forked tests
- HADOOP-8819.
Major bug reported by Brandon Li and fixed by Brandon Li (fs)
Should use && instead of & in a few places in FTPFileSystem,FTPInputStream,S3InputStream,ViewFileSystem,ViewFs
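The reason & is a bug here is that, on booleans, & evaluates both operands while && short-circuits, so a null guard written with & still dereferences null. A small illustration (method names are made up for the example):

```java
// Shows why a null guard must use && rather than &: the non-short-circuiting
// form evaluates s.length() even when s is null.
public class ShortCircuitSketch {
    static boolean nonEmptyUnsafe(String s) {
        return s != null & s.length() > 0;   // bug: s.length() runs even for null s
    }
    static boolean nonEmptySafe(String s) {
        return s != null && s.length() > 0;  // fix: && stops at the null check
    }
    public static void main(String[] args) {
        System.out.println(nonEmptySafe(null));   // safe: prints false
        try {
            nonEmptyUnsafe(null);
        } catch (NullPointerException e) {
            System.out.println("NPE with single &");
        }
    }
}
```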
- HADOOP-8791.
Major bug reported by Bertrand Dechoux and fixed by Jing Zhao (documentation)
rm "Only deletes non empty directory and files."
- HADOOP-8789.
Minor improvement reported by Andy Isaacson and fixed by Andy Isaacson (test)
Tests setLevel(Level.OFF) should be Level.ERROR
- HADOOP-8775.
Major bug reported by Sandy Ryza and fixed by Sandy Ryza
MR2 distcp permits non-positive value to -bandwidth option which causes job never to complete
- HADOOP-8755.
Major improvement reported by Andrey Klochkov and fixed by Andrey Klochkov (test)
Print thread dump when tests fail due to timeout
- HADOOP-8386.
Major bug reported by Christopher Berner and fixed by Christopher Berner (scripts)
hadoop script doesn't work if 'cd' prints to stdout (default behavior in Ubuntu)