Hadoop 2.4.1 Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements.

Changes since Hadoop 2.4.0

YARN-2081. Minor bug reported by Hong Zhiguo and fixed by Hong Zhiguo (applications/distributed-shell)
TestDistributedShell fails after YARN-1962

java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198)
YARN-2066. Minor bug reported by Ted Yu and fixed by Hong Zhiguo
Wrong field is referenced in GetApplicationsRequestPBImpl#mergeLocalToBuilder()

{code} if (this.finish != null) { builder.setFinishBegin(start.getMinimumLong()); builder.setFinishEnd(start.getMaximumLong()); } {code} The body of the if block reads from start; it should reference this.finish instead.
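A minimal sketch of the corrected field selection, under stated assumptions: LongRange, Builder, and GetApplicationsRequestSketch below are simplified, hypothetical stand-ins written for illustration, not the real range or protobuf builder classes, and the committed patch may differ in detail.

```java
// Hypothetical stand-in for the range type used by the request.
final class LongRange {
    private final long min;
    private final long max;
    LongRange(long min, long max) { this.min = min; this.max = max; }
    long getMinimumLong() { return min; }
    long getMaximumLong() { return max; }
}

// Hypothetical stand-in for the generated protobuf builder.
final class Builder {
    long startBegin, startEnd, finishBegin, finishEnd;
    void setStartBegin(long v) { startBegin = v; }
    void setStartEnd(long v) { startEnd = v; }
    void setFinishBegin(long v) { finishBegin = v; }
    void setFinishEnd(long v) { finishEnd = v; }
}

final class GetApplicationsRequestSketch {
    LongRange start;   // may be null when no start range was set
    LongRange finish;  // may be null when no finish range was set
    final Builder builder = new Builder();

    void mergeLocalToBuilder() {
        if (this.start != null) {
            builder.setStartBegin(start.getMinimumLong());
            builder.setStartEnd(start.getMaximumLong());
        }
        if (this.finish != null) {
            // Read from finish, not start, so the finish range is what is copied.
            builder.setFinishBegin(finish.getMinimumLong());
            builder.setFinishEnd(finish.getMaximumLong());
        }
    }
}
```

With the original code, a request that sets only the finish range would dereference a null start inside the if block; in this sketch the finish range is copied independently of start.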
YARN-2053. Major sub-task reported by Sumit Mohanty and fixed by Wangda Tan (resourcemanager)
Slider AM fails to restart: NPE in RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts

Slider AppMaster restart fails with the following: {code} org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterResponseProto$Builder.addAllNmTokensFromPreviousAttempts(YarnServiceProtos.java:2700) {code}
YARN-2016. Major bug reported by Venkat Ranganathan and fixed by Junping Du (resourcemanager)
Yarn getApplicationRequest start time range is not honored

When we query for previous applications by creating an instance of GetApplicationsRequest and setting the start time range and application tag, the start range provided is not honored and all applications with the tag are returned. Attaching a reproducer.
YARN-1986. Critical bug reported by Jon Bringhurst and fixed by Hong Zhiguo
In Fifo Scheduler, node heartbeat in between creating app and attempt causes NPE

After upgrade from 2.2.0 to 2.4.0, NPE on first job start. -After RM was restarted, the job runs without a problem.- {noformat} 19:11:13,441 FATAL ResourceManager:600 - Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:714) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:743) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:104) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591) at java.lang.Thread.run(Thread.java:744) 19:11:13,443 INFO ResourceManager:604 - Exiting, bbye.. {noformat}
YARN-1976. Major bug reported by Yesha Vora and fixed by Junping Du
Tracking url missing http protocol for FAILED application

Run yarn application -list -appStates FAILED. Unlike for FINISHED apps, the tracking URL is printed without the http protocol prefix. {noformat} -bash-4.1$ yarn application -list -appStates FINISHED,FAILED,KILLED 14/04/15 23:55:07 INFO client.RMProxy: Connecting to ResourceManager at host Total number of applications (application-types: [] and states: [FINISHED, FAILED, KILLED]):4 Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL application_1397598467870_0004 Sleep job MAPREDUCE hrt_qa default FINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0004 application_1397598467870_0003 Sleep job MAPREDUCE hrt_qa default FINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0003 application_1397598467870_0002 Sleep job MAPREDUCE hrt_qa default FAILED FAILED 100% host:8088/cluster/app/application_1397598467870_0002 application_1397598467870_0001 word count MAPREDUCE hrt_qa default FINISHED SUCCEEDED 100% http://host:19888/jobhistory/job/job_1397598467870_0001 {noformat} It prints only 'host:8088/cluster/app/application_1397598467870_0002' instead of 'http://host:8088/cluster/app/application_1397598467870_0002'.
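The expected behaviour can be illustrated with a small sketch that prepends a scheme only when the tracking URL lacks one. TrackingUrlSketch and withScheme are hypothetical names introduced here; this is not necessarily how the actual patch addresses the issue.

```java
final class TrackingUrlSketch {
    // Returns the URL unchanged if it already starts with "<scheme>://",
    // otherwise prepends the given default scheme.
    static String withScheme(String url, String defaultScheme) {
        if (url == null || url.isEmpty()) {
            return url;
        }
        boolean hasScheme = url.matches("^[a-zA-Z][a-zA-Z0-9+.-]*://.*");
        return hasScheme ? url : defaultScheme + "://" + url;
    }

    public static void main(String[] args) {
        System.out.println(
            withScheme("host:8088/cluster/app/application_1397598467870_0002", "http"));
    }
}
```

The check looks for a scheme followed by "://" at the start of the string, so a bare "host:8088/..." value is not mistaken for a URL that already has a scheme.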
YARN-1975. Major bug reported by Nathan Roberts and fixed by Mit Desai (resourcemanager)
Used resources shows escaped html in CapacityScheduler and FairScheduler page

Used resources displays as &lt;memory:1111, vCores;&gt; with capacity scheduler
YARN-1962. Major sub-task reported by Mohammad Kamrul Islam and fixed by Mohammad Kamrul Islam
Timeline server is enabled by default

Since the Timeline server is not yet mature or secured, enabling it by default might create some confusion. We were playing with 2.4.0 and found a lot of exceptions in the distributed shell example related to connection refused errors. We didn't run the Timeline server because it is not secured yet. Although it is possible to turn it off explicitly through the yarn-site config, in my opinion this extra change for a new service is not worthwhile at this point. This JIRA is to turn it off by default. If there is agreement, I can put up a simple patch for this. {noformat} 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server. com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) at com.sun.jersey.api.client.Client.handle(Client.java:648) at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) Caused by: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at java.net.Socket.connect(Socket.java:528) at sun.net.NetworkClient.doConnect(NetworkClient.java:180) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.<init>(HttpClient.java:211) at sun.net.www.http.HttpClient.New(HttpClient.java:308) at sun.net.www.http.HttpClient.New(HttpClient.java:326) at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996) at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932) at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850) at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1091) at com.sun.jersey.client.urlconnection.URLConnectionClientHandler$1$1.getOutputStream(URLConnectionClientHandler.java:225) at com.sun.jersey.api.client.CommittingOutputStream.commitWrite(CommittingOutputStream.java:117) at com.sun.jersey.api.client.CommittingOutputStream.write(CommittingOutputStream.java:89) at org.codehaus.jackson.impl.Utf8Generator._flushBuffer(Utf8Generator.java:1754) at org.codehaus.jackson.impl.Utf8Generator.flush(Utf8Generator.java:1088) at org.codehaus.jackson.map.ObjectMapper.writeValue(ObjectMapper.java:1354) at org.codehaus.jackson.jaxrs.JacksonJsonProvider.writeTo(JacksonJsonProvider.java:527) at com.sun.jersey.api.client.RequestWriter.writeRequestEntity(RequestWriter.java:300) at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:204) at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147) ... 9 more {noformat}
YARN-1957. Major sub-task reported by Carlo Curino and fixed by Carlo Curino (resourcemanager)
ProportionalCapacityPreemptionPolicy handling of corner cases...

The current version of ProportionalCapacityPreemptionPolicy should be improved to deal with the following two scenarios: 1) when rebalancing over-capacity allocations, it potentially preempts without considering the maxCapacity constraints of a queue (i.e., preempting possibly more than strictly necessary) 2) a zero capacity queue is preempted even if there is no demand (coherent with the old use of zero capacity to disable queues). The proposed patch fixes both issues and introduces a few new test cases.
YARN-1947. Major test reported by Jian He and fixed by Jian He
TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently

java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens.testRMDTMasterKeyStateOnRollingMasterKey(TestRMDelegationTokens.java:117)
YARN-1934. Blocker bug reported by Rohith and fixed by Karthik Kambatla (resourcemanager)
Potential NPE in ZKRMStateStore caused by handling Disconnected event from ZK.

For the ZK Disconnected event, zkClient is set to null, which makes it very prone to throwing an NPE. {noformat} case Disconnected: LOG.info("ZKRMStateStore Session disconnected"); oldZkClient = zkClient; zkClient = null; break; {noformat}
YARN-1933. Major bug reported by Jian He and fixed by Jian He
TestAMRestart and TestNodeHealthService failing sometimes on Windows

TestNodeHealthService failures: testNodeHealthScript(org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService) Time elapsed: 1.405 sec <<< ERROR! java.io.FileNotFoundException: C:\Users\Administrator\Documents\hadoop-common\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService-localDir\failingscript.cmd (The process cannot access the file because it is being used by another process) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.<init>(FileOutputStream.java:221) at java.io.FileOutputStream.<init>(FileOutputStream.java:171) at org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService.writeNodeHealthScriptFile(TestNodeHealthService.java:82) at org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService.testNodeHealthScript(TestNodeHealthService.java:154) testNodeHealthScriptShouldRun(org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService) Time elapsed: 0 sec <<< ERROR! java.io.FileNotFoundException: C:\Users\Administrator\Documents\hadoop-common\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService-localDir\failingscript.cmd (Access is denied) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.<init>(FileOutputStream.java:221) at java.io.FileOutputStream.<init>(FileOutputStream.java:171) at org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService.writeNodeHealthScriptFile(TestNodeHealthService.java:82) at org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService.testNodeHealthScriptShouldRun(TestNodeHealthService.java:103)
YARN-1932. Blocker bug reported by Mit Desai and fixed by Mit Desai
Javascript injection on the job status page

Scripts can be injected into the job status page because the diagnostics field is not sanitized. Whatever string is set there shows up on the jobs page as is, i.e. any script commands it contains are executed in the browser of the user who opens the page. We need to escape the diagnostics string so that the scripts do not run.
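The kind of escaping described above can be sketched as follows. DiagnosticsEscapeSketch and escapeHtml are hypothetical names used for illustration only; the actual fix may rely on an existing escaping utility rather than hand-written code.

```java
final class DiagnosticsEscapeSketch {
    // Replaces the characters that are significant in HTML with entities so
    // that the diagnostics text is rendered literally instead of interpreted.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHtml("<script>alert(1)</script>"));
    }
}
```

Escaping at render time, rather than when the diagnostics are stored, keeps the stored string intact for non-HTML consumers such as the command line.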
YARN-1931. Blocker bug reported by Thomas Graves and fixed by Sandy Ryza (applications)
Private API change in YARN-1824 in 2.4 broke compatibility with previous releases

YARN-1824 broke compatibility with previous 2.x releases by changing the APIs in org.apache.hadoop.yarn.util.Apps.{setEnvFromInputString,addToEnvironment}. The old API should be added back in. This affects any ApplicationMasters that were using this API. It also prevents previously built MapReduce libraries from working with the new YARN release, as MR uses this API.
YARN-1929. Blocker bug reported by Rohith and fixed by Karthik Kambatla (resourcemanager)
DeadLock in RM when automatic failover is enabled.

Deadlock detected in RM when automatic failover is enabled. {noformat} Found one Java-level deadlock: ============================= "Thread-2": waiting to lock monitor 0x00007fb514303cf0 (object 0x00000000ef153fd0, a org.apache.hadoop.ha.ActiveStandbyElector), which is held by "main-EventThread" "main-EventThread": waiting to lock monitor 0x00007fb514750a48 (object 0x00000000ef154020, a org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService), which is held by "Thread-2" {noformat}
YARN-1928. Major bug reported by Zhijie Shen and fixed by Zhijie Shen
TestAMRMRPCNodeUpdates fails occasionally

{code} junit.framework.AssertionFailedError: expected:<0> but was:<4> at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at junit.framework.Assert.assertEquals(Assert.java:199) at junit.framework.Assert.assertEquals(Assert.java:205) at org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:136) {code}
YARN-1926. Major bug reported by Varun Vasudev and fixed by Varun Vasudev
DistributedShell unit tests fail on Windows

A couple of unit tests for the DistributedShell fail on Windows, specifically testDSShellWithShellScript and testDSRestartWithPreviousRunningContainers.
YARN-1924. Critical bug reported by Arpit Gupta and fixed by Jian He
STATE_STORE_OP_FAILED happens when ZKRMStateStore tries to update app(attempt) before storing it

Noticed on an HA cluster: both RMs shut down with this error.
YARN-1920. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
TestFileSystemApplicationHistoryStore.testMissingApplicationAttemptHistoryData fails on Windows

Though this was only failing in Windows, after debugging, I realized that the test fails because we are leaking a file-handle in the history service.
YARN-1914. Major bug reported by Varun Vasudev and fixed by Varun Vasudev
Test TestFSDownload.testDownloadPublicWithStatCache fails on Windows

The TestFSDownload.testDownloadPublicWithStatCache test in hadoop-yarn-common consistently fails on Windows environments. The root cause is that the test checks for execute permission for all users on every ancestor of the target directory. On Windows, by default, the group "Everyone" has no permissions on any directory in the install drive. It's unreasonable to expect this test to pass, and we should skip it on Windows.
YARN-1910. Major bug reported by Xuan Gong and fixed by Xuan Gong
TestAMRMTokens fails on Windows
YARN-1908. Major bug reported by Tassapol Athiapinya and fixed by Vinod Kumar Vavilapalli (applications/distributed-shell)
Distributed shell with custom script has permission error.

Create test1.sh having "pwd". Run this command as user1: hadoop jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar -shell_script test1.sh NM is run by yarn user. An exception is thrown because yarn user has no permissions on custom script in hdfs path. The custom script is created with distributed shell app. {code} Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=yarn, access=WRITE, inode="/user/user1/DistributedShell/70":user1:user1:drwxr-xr-x at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265) {code}
YARN-1907. Major bug reported by Mit Desai and fixed by Mit Desai
TestRMApplicationHistoryWriter#testRMWritingMassiveHistory runs slow and intermittently fails

The test has 10000 containers that it tries to clean up. The cleanup has a timeout of 20000ms, within which the test sometimes cannot finish cleaning up, resulting in an assertion failure.
YARN-1905. Trivial test reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager)
TestProcfsBasedProcessTree must only run on Linux.

The tests in {{TestProcfsBasedProcessTree}} only make sense on Linux, where the process tree calculations are based on reading the /proc file system. Right now, not all of the individual tests are skipped when the OS is not Linux. This patch will make it consistent.
YARN-1903. Major bug reported by Zhijie Shen and fixed by Zhijie Shen
Killing Container on NEW and LOCALIZING will result in exitCode and diagnostics not set

The container status after stopping container is not expected. {code} java.lang.AssertionError: 4: at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:382) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:346) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226) {code}
YARN-1898. Major sub-task reported by Yesha Vora and fixed by Xuan Gong (resourcemanager)
Standby RM's conf, stacks, logLevel, metrics, jmx and logs links are redirecting to Active RM

The Standby RM links /conf, /stacks, /logLevel, /metrics, and /jmx are redirected to the Active RM. They should not be redirected to the Active RM.
YARN-1892. Minor improvement reported by Siddharth Seth and fixed by Jian He (scheduler)
Excessive logging in RM

Mostly in the CS I believe {code} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Application application_1395435468498_0011 reserved container container_1395435468498_0011_01_000213 on node host: #containers=5 available=4096 used=20960, currently has 1 at priority 4; currentReservation 4096 {code} {code} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: hive2 usedResources: <memory:20480, vCores:5> clusterResources: <memory:81920, vCores:16> currentCapacity 0.25 required <memory:4096, vCores:1> potentialNewCapacity: 0.255 ( max-capacity: 0.25) {code}
YARN-1883. Major bug reported by Mit Desai and fixed by Mit Desai
TestRMAdminService fails due to inconsistent entries in UserGroups

testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider fails with the following error: {noformat} java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:421) at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testOrder(TestRMAdminService.java:104) {noformat} Line numbers will be inconsistent because I was running the tests in a particular order, but the line on which the failure occurs is {code} Assert.assertTrue(groupBefore.contains("test_group_A") && groupBefore.contains("test_group_B") && groupBefore.contains("test_group_C") && groupBefore.size() == 3); {code} Both testRMInitialsWithFileSystemBasedConfigurationProvider() and testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() call {{MockUnixGroupsMapping.updateGroups();}}, which changes the list of userGroups. testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider() tries to verify the groups before changing them and fails if testRMInitialsWithFileSystemBasedConfigurationProvider() has already run and made the changes.
YARN-1861. Blocker sub-task reported by Arpit Gupta and fixed by Karthik Kambatla (resourcemanager)
Both RM stuck in standby mode when automatic failover is enabled

In our HA tests we noticed that the tests got stuck because both RMs went into standby state and neither became active.
YARN-1837. Major bug reported by Tsuyoshi OZAWA and fixed by Hong Zhiguo
TestMoveApplication.testMoveRejectedByScheduler randomly fails

TestMoveApplication#testMoveRejectedByScheduler fails because of a NullPointerException. It appears to be caused by an unhandled exception on the server side.
YARN-1750. Major test reported by Ming Ma and fixed by Wangda Tan (nodemanager)
TestNodeStatusUpdater#testNMRegistration is incorrect in test case

This test case passes. However, the test output log has java.lang.AssertionError: Number of applications should only be one! expected:<1> but was:<2> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker.nodeHeartbeat(TestNodeStatusUpdater.java:267) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:469) at java.lang.Thread.run(Thread.java:695) TestNodeStatusUpdater.java has invalid asserts. } else if (heartBeatID == 3) { // Checks on the RM end Assert.assertEquals("Number of applications should only be one!", 1, appToContainers.size()); Assert.assertEquals("Number of container for the app should be two!", 2, appToContainers.get(appId2).size()); We should fix the asserts and add more checks to the test.
YARN-1701. Major sub-task reported by Gera Shegalov and fixed by Tsuyoshi OZAWA
Improve default paths of timeline store and generic history store

When I enable AHS via yarn.ahs.enabled, the app history is still not visible in the AHS web UI. This is because yarn.resourcemanager.history-writer.class is set to NullApplicationHistoryStore. It would be good to have just one key to enable basic functionality. yarn.ahs.fs-history-store.uri uses {code}${hadoop.log.dir}{code}, which is a local file system location. However, FileSystemApplicationHistoryStore uses DFS by default.
YARN-1696. Blocker sub-task reported by Karthik Kambatla and fixed by Tsuyoshi OZAWA (resourcemanager)
Document RM HA

Add documentation for RM HA. Marking this a blocker for 2.4 as this is required to call RM HA Stable and ready for public consumption.
YARN-1281. Major test reported by Karthik Kambatla and fixed by Tsuyoshi OZAWA (resourcemanager)
TestZKRMStateStoreZKClientConnections fails intermittently

The test fails intermittently - haven't been able to reproduce the failure deterministically.
YARN-1201. Minor bug reported by Nemon Lou and fixed by Wangda Tan (resourcemanager)
TestAMAuthorization fails with local hostname cannot be resolved

When hostname is 158-1-131-10, TestAMAuthorization fails. {code} Running org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.034 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) Time elapsed: 3.952 sec <<< ERROR! java.lang.NullPointerException: null at org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284) testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) Time elapsed: 3.116 sec <<< ERROR! java.lang.NullPointerException: null at org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:284) Results : Tests in error: TestAMAuthorization.testUnauthorizedAccess:284 NullPointer TestAMAuthorization.testUnauthorizedAccess:284 NullPointer Tests run: 4, Failures: 0, Errors: 2, Skipped: 0 {code}
MAPREDUCE-5843. Major test reported by Varun Vasudev and fixed by Varun Vasudev
TestMRKeyValueTextInputFormat failing on Windows
MAPREDUCE-5841. Major bug reported by Sangjin Lee and fixed by Sangjin Lee (mrv2)
uber job doesn't terminate on getting mapred job kill
MAPREDUCE-5835. Critical bug reported by Ming Ma and fixed by Ming Ma
Killing Task might cause the job to go to ERROR state
MAPREDUCE-5833. Major test reported by Zhijie Shen and fixed by Zhijie Shen
TestRMContainerAllocator fails occasionally
MAPREDUCE-5832. Major bug reported by Jian He and fixed by Vinod Kumar Vavilapalli
Few tests in TestJobClient fail on Windows
MAPREDUCE-5830. Blocker bug reported by Jason Lowe and fixed by Akira AJISAKA
HostUtil.getTaskLogUrl is not backwards binary compatible with 2.3
MAPREDUCE-5828. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
TestMapReduceJobControl fails on JDK 7 + Windows
MAPREDUCE-5827. Major bug reported by Zhijie Shen and fixed by Zhijie Shen
TestSpeculativeExecutionWithMRApp fails
MAPREDUCE-5826. Major bug reported by Varun Vasudev and fixed by Varun Vasudev
TestHistoryServerFileSystemStateStoreService.testTokenStore fails on Windows
MAPREDUCE-5824. Major bug reported by Xuan Gong and fixed by Xuan Gong
TestPipesNonJavaInputFormat.testFormat fails on Windows
MAPREDUCE-5821. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (performance , task)
IFile merge allocates new byte array for every value
MAPREDUCE-5818. Major bug reported by Jian He and fixed by Jian He
hsadmin cmd is missing in mapred.cmd
MAPREDUCE-5815. Blocker bug reported by Gera Shegalov and fixed by Akira AJISAKA (client , mrv2)
Fix NPE in TestMRAppMaster
MAPREDUCE-5714. Major bug reported by Jinghui Wang and fixed by Jinghui Wang (test)
TestMRAppComponentDependencies causes surefire to exit without saying proper goodbye
MAPREDUCE-3191. Trivial bug reported by Todd Lipcon and fixed by Chen He
docs for map output compression incorrectly reference SequenceFile
HDFS-6527. Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee
Edit log corruption due to deferred INode removal
HDFS-6411. Major bug reported by Zhongyi Xie and fixed by Brandon Li (nfs)
nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user attempts to access it
HDFS-6402. Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode)
Suppress findbugs warning for failure to override equals and hashCode in FsAclPermission.
HDFS-6397. Critical bug reported by Mohammad Kamrul Islam and fixed by Mohammad Kamrul Islam
NN shows inconsistent value in deadnode count
HDFS-6362. Blocker bug reported by Arpit Agarwal and fixed by Arpit Agarwal (namenode)
InvalidateBlocks is inconsistent in usage of DatanodeUuid and StorageID
HDFS-6361. Major bug reported by Yongjun Zhang and fixed by Yongjun Zhang (nfs)
TestIdUserGroup.testUserUpdateSetting failed due to out of range nfsnobody Id
HDFS-6340. Blocker bug reported by Rahul Singhal and fixed by Rahul Singhal (datanode)
DN can't finalize upgrade
HDFS-6329. Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee
WebHdfs does not work if HA is enabled on NN but logical URI is not configured.
HDFS-6326. Blocker bug reported by Daryn Sharp and fixed by Chris Nauroth (webhdfs)
WebHdfs ACL compatibility is broken
HDFS-6325. Major bug reported by Konstantin Shvachko and fixed by Keith Pak (namenode)
Append should fail if the last block has insufficient number of replicas

I have committed the fix to the trunk, branch-2, and branch-2.4 respectively. Thanks Keith!
HDFS-6313. Blocker bug reported by Daryn Sharp and fixed by Kihwal Lee (webhdfs)
WebHdfs may use the wrong NN when configured for multiple HA NNs
HDFS-6245. Major bug reported by Arpit Gupta and fixed by Arpit Agarwal
datanode fails to start with a bad disk even when failed volumes is set
HDFS-6236. Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode)
ImageServlet should use Time#monotonicNow to measure latency.
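A minimal sketch of measuring latency with a monotonic clock, assuming a System.nanoTime()-based helper similar in spirit to Time#monotonicNow. MonotonicLatencySketch and its methods are hypothetical names for illustration, not Hadoop APIs.

```java
import java.util.concurrent.TimeUnit;

final class MonotonicLatencySketch {
    // Captures a monotonic start mark. The value is only meaningful when
    // compared with another System.nanoTime() reading in the same JVM.
    static long startMark() {
        return System.nanoTime();
    }

    // Elapsed milliseconds since the given start mark. Unlike a difference of
    // System.currentTimeMillis() readings, this is not affected by wall-clock
    // adjustments, so it cannot go negative because the clock was set back.
    static long elapsedMillis(long startMarkNanos) {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startMarkNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        long start = startMark();
        Thread.sleep(50);  // stands in for the operation being timed
        System.out.println("elapsed ms: " + elapsedMillis(start));
    }
}
```

Wall-clock time remains the right choice for timestamps shown to users; the monotonic clock is preferable only for measuring durations.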
HDFS-6235. Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode , test)
TestFileJournalManager can fail on Windows due to file locking if tests run out of order.
HDFS-6234. Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (datanode , test)
TestDatanodeConfig#testMemlockLimit fails on Windows due to invalid file path.
HDFS-6232. Major bug reported by Stephen Chu and fixed by Akira AJISAKA (tools)
OfflineEditsViewer throws a NPE on edits containing ACL modifications
HDFS-6231. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (hdfs-client)
DFSClient hangs infinitely if using hedged reads and all eligible datanodes die.
HDFS-6229. Major bug reported by Jing Zhao and fixed by Jing Zhao (ha)
Race condition in failover can cause RetryCache to fail to work
HDFS-6215. Minor bug reported by Kihwal Lee and fixed by Kihwal Lee
Wrong error message for upgrade
HDFS-6209. Minor bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)
Fix flaky test TestValidateConfigurationSettings.testThatDifferentRPCandHttpPortsAreOK
HDFS-6208. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (datanode)
DataNode caching can leak file descriptors.
HDFS-6206. Major bug reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze
DFSUtil.substituteForWildcardAddress may throw NPE
HDFS-6204. Minor bug reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (test)
TestRBWBlockInvalidation may fail
HDFS-6198. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (datanode)
DataNode rolling upgrade does not correctly identify current block pool directory and replace with trash on Windows.
HDFS-6197. Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode)
Rolling upgrade rollback on Windows can fail attempting to rename edit log segment files to a destination that already exists.
HDFS-6189. Major test reported by Chris Nauroth and fixed by Chris Nauroth (test)
Multiple HDFS tests fail on Windows attempting to use a test root path containing a colon.
HDFS-4052. Minor improvement reported by Jing Zhao and fixed by Jing Zhao
BlockManager#invalidateWork should print logs outside the lock
HDFS-2882. Major bug reported by Todd Lipcon and fixed by Vinayakumar B (datanode)
DN continues to start up, even if block pool fails to initialize
HADOOP-10612. Major bug reported by Brandon Li and fixed by Brandon Li (nfs)
NFS failed to refresh the user group id mapping table
HADOOP-10562. Critical bug reported by Suresh Srinivas and fixed by Suresh Srinivas
Namenode exits on exception without printing stack trace in AbstractDelegationTokenSecretManager
HADOOP-10527. Major bug reported by Kihwal Lee and fixed by Kihwal Lee
Fix incorrect return code and allow more retries on EINTR
HADOOP-10522. Critical bug reported by Kihwal Lee and fixed by Kihwal Lee
JniBasedUnixGroupMapping mishandles errors
HADOOP-10490. Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (test)
TestMapFile and TestBloomMapFile leak file descriptors.
HADOOP-10473. Minor bug reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (test)
TestCallQueueManager is still flaky
HADOOP-10466. Minor improvement reported by Nicolas Liochon and fixed by Nicolas Liochon (security)
Lower the log level in UserGroupInformation
HADOOP-10456. Major bug reported by Nishkam Ravi and fixed by Nishkam Ravi (conf)
Bug in Configuration.java exposed by Spark (ConcurrentModificationException)
HADOOP-10455. Major bug reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (ipc)
When there is an exception, ipc.Server should first check whether it is a terse exception
HADOOP-8826. Minor bug reported by Robert Joseph Evans and fixed by Mit Desai
Docs still refer to 0.20.205 as stable line

Hadoop 2.4.0 Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements.

Changes since Hadoop 2.3.0

YARN-1893. Blocker sub-task reported by Xuan Gong and fixed by Xuan Gong (resourcemanager)
Make ApplicationMasterProtocol#allocate AtMostOnce
YARN-1891. Minor task reported by Varun Vasudev and fixed by Varun Vasudev
Document NodeManager health-monitoring

Start documenting node manager starting with the health monitoring.
YARN-1873. Major bug reported by Mit Desai and fixed by Mit Desai
TestDistributedShell#testDSShell fails when the test cases are out of order

testDSShell fails when the tests are run in random order. I see a cleanup issue here. {noformat} Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 72.222 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testOrder(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 44.127 sec <<< FAILURE! java.lang.AssertionError: expected:<1> but was:<6> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:204) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testOrder(TestDistributedShell.java:134) Results : Failed tests: TestDistributedShell.testOrder:134->testDSShell:204 expected:<1> but was:<6> {noformat} The line numbers deviate slightly because I was reproducing the error by running the tests in a specific order, but the line that causes the assertion failure is {{Assert.assertEquals(1, entitiesAttempts.getEntities().size());}}
YARN-1867. Blocker bug reported by Karthik Kambatla and fixed by Vinod Kumar Vavilapalli (resourcemanager)
NPE while fetching apps via the REST API

We ran into the following NPE when fetching applications using the REST API: {noformat} INTERNAL_SERVER_ERROR java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:123) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:418) {noformat}
YARN-1866. Blocker bug reported by Arpit Gupta and fixed by Jian He
YARN RM fails to load state store with delegation token parsing error

In our secure Nightlies we saw exceptions in the RM log where it failed to parse the delegation token.
YARN-1863. Blocker test reported by Ted Yu and fixed by Xuan Gong
TestRMFailover fails with 'AssertionError: null'

This happened in Hadoop-Yarn-trunk - Build # 516 and can be reproduced: {code} testWebAppProxyInStandAloneMode(org.apache.hadoop.yarn.client.TestRMFailover) Time elapsed: 5.834 sec <<< FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250) at org.apache.hadoop.yarn.client.TestRMFailover.testWebAppProxyInStandAloneMode(TestRMFailover.java:216) testEmbeddedWebAppProxy(org.apache.hadoop.yarn.client.TestRMFailover) Time elapsed: 5.341 sec <<< FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.client.TestRMFailover.verifyExpectedException(TestRMFailover.java:250) at org.apache.hadoop.yarn.client.TestRMFailover.testEmbeddedWebAppProxy(TestRMFailover.java:241) {code}
YARN-1859. Major bug reported by Zhijie Shen and fixed by Zhijie Shen
WebAppProxyServlet will throw ApplicationNotFoundException if the app is no longer cached in RM

WebAppProxyServlet checks for null to determine whether the application was found. {code} ApplicationReport applicationReport = getApplicationReport(id); if(applicationReport == null) { LOG.warn(req.getRemoteUser()+" Attempting to access "+id+ " that was not found"); {code} However, WebAppProxyServlet calls AppReportFetcher, which in turn calls ClientRMService. When the application is not found, ClientRMService throws ApplicationNotFoundException. Therefore, in WebAppProxyServlet, the logic that creates the tracking URL for a non-cached app is no longer used.
YARN-1855. Critical test reported by Ted Yu and fixed by Zhijie Shen
TestRMFailover#testRMWebAppRedirect fails in trunk

From https://builds.apache.org/job/Hadoop-Yarn-trunk/514/console : {code} testRMWebAppRedirect(org.apache.hadoop.yarn.client.TestRMFailover) Time elapsed: 5.39 sec <<< ERROR! java.lang.NullPointerException: null at org.apache.hadoop.yarn.client.TestRMFailover.testRMWebAppRedirect(TestRMFailover.java:269) {code}
YARN-1854. Blocker test reported by Mit Desai and fixed by Rohith
Race condition in TestRMHA#testStartAndTransitions

There is a race in the test. TestRMHA#testStartAndTransitions calls verifyClusterMetrics() immediately after the application is submitted, but QueueMetrics are updated only after the app attempt is scheduled. Calling verifyClusterMetrics() without first verifying that the app attempt is in the Scheduled state causes random test failures. MockRM.submitApp() returns when the application is in ACCEPTED, but QueueMetrics are updated at the APP_ATTEMPT_ADDED event, so there is a high chance of reading queue metrics before the app attempt is Scheduled. {noformat} testStartAndTransitions(org.apache.hadoop.yarn.server.resourcemanager.TestRMHA) Time elapsed: 5.883 sec <<< FAILURE! java.lang.AssertionError: Incorrect value for metric availableMB expected:<2048> but was:<4096> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.assertMetric(TestRMHA.java:396) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.verifyClusterMetrics(TestRMHA.java:387) at org.apache.hadoop.yarn.server.resourcemanager.TestRMHA.testStartAndTransitions(TestRMHA.java:160) Results : Failed tests: TestRMHA.testStartAndTransitions:160->verifyClusterMetrics:387->assertMetric:396 Incorrect value for metric availableMB expected:<2048> but was:<4096> {noformat}
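
A common way to close this kind of race in a test is to poll for the expected state before asserting on metrics. The following is a minimal sketch of such a helper, not the change committed for this issue; the class and method names are hypothetical and it does not use any MockRM API.

```java
import java.util.function.BooleanSupplier;

final class WaitFor {
    /**
     * Polls the condition until it is true or the timeout elapses.
     * Returns true if the condition was observed, false on timeout
     * or if the waiting thread is interrupted.
     */
    public static boolean waitFor(BooleanSupplier condition,
                                  long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test would call the helper with a condition such as "the app attempt has reached the Scheduled state" and only then verify the cluster metrics.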
YARN-1852. Major bug reported by Rohith and fixed by Rohith (resourcemanager)
Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

Recovering a failed or killed application throws InvalidStateTransitonException. The exceptions are logged during application recovery.
YARN-1850. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Make enabling timeline service configurable

As with the generic history service, enabling the timeline service should be configurable, in case the timeline server is not up.
YARN-1849. Blocker bug reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)
NPE in ResourceTrackerService#registerNodeManager for UAM

While running an UnmanagedAM on secure cluster, ran into an NPE on failover/restart. This is similar to YARN-1821.
YARN-1846. Major bug reported by Robert Kanter and fixed by Robert Kanter
TestRM#testNMTokenSentForNormalContainer assumes CapacityScheduler

TestRM.testNMTokenSentForNormalContainer assumes the CapacityScheduler is being used and tries to do: {code:java} CapacityScheduler cs = (CapacityScheduler) rm.getResourceScheduler(); {code} This throws a {{ClassCastException}} if you're not using the CapacityScheduler.
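
One defensive option for code that depends on a specific scheduler is to test the type before casting. The sketch below uses hypothetical stand-in types rather than the real YARN classes, and it is not necessarily how the test was fixed.

```java
final class SchedulerCastGuard {
    // Hypothetical stand-ins for the real scheduler types.
    interface ResourceScheduler {}
    static final class CapacityScheduler implements ResourceScheduler {}
    static final class FairScheduler implements ResourceScheduler {}

    /**
     * Returns the scheduler as a CapacityScheduler, or null when a
     * different scheduler is configured, instead of throwing
     * ClassCastException.
     */
    public static CapacityScheduler asCapacityScheduler(ResourceScheduler s) {
        return (s instanceof CapacityScheduler) ? (CapacityScheduler) s : null;
    }

    public static void main(String[] args) {
        System.out.println(asCapacityScheduler(new CapacityScheduler()) != null);
        System.out.println(asCapacityScheduler(new FairScheduler()) != null);
    }
}
```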
YARN-1839. Critical bug reported by Tassapol Athiapinya and fixed by Jian He (applications , capacityscheduler)
Capacity scheduler preempts an AM out. AM attempt 2 fails to launch task container with SecretManager$InvalidToken: No NMToken sent

Use single-node cluster. Turn on capacity scheduler preemption. Run MR sleep job as app 1. Take entire cluster. Run MR sleep job as app 2. Preempt app1 out. Wait till app 2 finishes. App 1 AM attempt 2 will start. It won't be able to launch a task container with this error stack trace in AM logs: {code} 2014-03-13 20:13:50,254 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1394741557066_0001_m_000000_1009: Container launch failed for container_1394741557066_0001_02_000021 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for <host>:45454 at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:196) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {code}
YARN-1838. Major sub-task reported by Srimanth Gunturi and fixed by Billie Rinaldi
Timeline service getEntities API should provide ability to get entities from given id

To support pagination, we need ability to get entities from a certain ID by providing a new param called {{fromid}}. For example on a page of 10 jobs, our first call will be like [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&limit=11] When user hits next, we would like to call [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID11&limit=11] and continue on for further _Next_ clicks On hitting back, we will make similar calls for previous items [http://server:8188/ws/v1/timeline/HIVE_QUERY_ID?fields=events,primaryfilters,otherinfo&fromid=JID1&limit=11] {{fromid}} should be inclusive of the id given.
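
The paging scheme described above can be sketched as a small URL builder. This is an illustration only, not a timeline service client: the class and method names are hypothetical, and the host, port, and entity type are copied from the example URLs.

```java
final class TimelinePageUrl {
    // Base URL copied from the example calls above.
    static final String BASE =
        "http://server:8188/ws/v1/timeline/HIVE_QUERY_ID";

    /**
     * Builds a getEntities URL. Pass null as fromId for the first page;
     * otherwise the page starts at fromId, inclusive.
     */
    public static String pageUrl(String fromId, int limit) {
        StringBuilder sb = new StringBuilder(BASE);
        sb.append("?fields=events,primaryfilters,otherinfo");
        if (fromId != null) {
            sb.append("&fromid=").append(fromId);
        }
        sb.append("&limit=").append(limit);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(pageUrl(null, 11));    // first page
        System.out.println(pageUrl("JID11", 11)); // next page
    }
}
```

Because {{fromid}} is inclusive, requesting 11 entities for a page of 10 presumably lets the caller use the id of the 11th entity as the {{fromid}} of the next page.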
YARN-1833. Major bug reported by Mit Desai and fixed by Mit Desai
TestRMAdminService Fails in trunk and branch-2 : Assert Fails due to different count of UserGroups for currentUser()

In the test testRefreshUserToGroupsMappingsWithFileSystemBasedConfigurationProvider, the following assert is not needed. {code} Assert.assertTrue(groupWithInit.size() != groupBefore.size()); {code} Because the assert takes the default groups for groupWithInit (which in my case are users, sshusers and wheel), it fails when groupWithInit and groupBefore are the same size. I do not think we need this assert here. Moreover, we already check that the groupInit does not contain the userGroups that are in the groupBefore, so removing the assert should not be harmful.
YARN-1830. Major bug reported by Karthik Kambatla and fixed by Zhijie Shen (resourcemanager)
TestRMRestart.testQueueMetricsOnRMRestart failure

TestRMRestart.testQueueMetricsOnRMRestart fails intermittently as follows (reported on YARN-1815): {noformat} java.lang.AssertionError: expected:<37> but was:<38> ... at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.assertQueueMetrics(TestRMRestart.java:1728) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1682) {noformat}
YARN-1824. Major bug reported by Jian He and fixed by Jian He
Make Windows client work with Linux/Unix cluster
YARN-1821. Blocker sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)
NPE on registerNodeManager if the request has containers for UnmanagedAMs

On RM restart (or failover), NM re-registers with the RM. If it was running containers for Unmanaged AMs, it runs into the following NPE: {noformat} Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.registerNodeManager(ResourceTrackerService.java:213) at org.apache.hadoop.yarn.server.api.impl.pb.service.ResourceTrackerPBServiceImpl.registerNodeManager(ResourceTrackerPBServiceImpl.java:54) {noformat}
YARN-1816. Major sub-task reported by Arpit Gupta and fixed by Jian He
Succeeded application remains in accepted after RM restart

{code} 2014-03-10 18:07:31,944|beaver.machine|INFO|Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL 2014-03-10 18:07:31,945|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCE hrt_qa default ACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008 2014-03-10 18:08:02,125|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING 2014-03-10 18:08:03,198|beaver.machine|INFO|14/03/10 18:08:03 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 2014-03-10 18:08:03,238|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1 2014-03-10 18:08:03,239|beaver.machine|INFO|Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL 2014-03-10 18:08:03,239|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCE hrt_qa default ACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008 2014-03-10 18:08:33,390|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING 2014-03-10 18:08:34,437|beaver.machine|INFO|14/03/10 18:08:34 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 2014-03-10 18:08:34,477|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1 2014-03-10 18:08:34,477|beaver.machine|INFO|Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL 2014-03-10 18:08:34,478|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCE hrt_qa default ACCEPTED SUCCEEDED 100% 
http://hostname:19888/jobhistory/job/job_1394449508064_0008 2014-03-10 18:09:04,628|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING 2014-03-10 18:09:05,688|beaver.machine|INFO|14/03/10 18:09:05 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 2014-03-10 18:09:05,728|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1 2014-03-10 18:09:05,728|beaver.machine|INFO|Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL 2014-03-10 18:09:05,729|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCE hrt_qa default ACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008 2014-03-10 18:09:35,879|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING 2014-03-10 18:09:36,951|beaver.machine|INFO|14/03/10 18:09:36 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 2014-03-10 18:09:36,992|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1 2014-03-10 18:09:36,993|beaver.machine|INFO|Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL 2014-03-10 18:09:36,993|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCE hrt_qa default ACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008 2014-03-10 18:10:07,142|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING 2014-03-10 18:10:08,201|beaver.machine|INFO|14/03/10 18:10:08 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 2014-03-10 
18:10:08,242|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1 2014-03-10 18:10:08,242|beaver.machine|INFO|Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL 2014-03-10 18:10:08,242|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCE hrt_qa default ACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008 2014-03-10 18:10:38,392|beaver.machine|INFO|RUNNING: /usr/bin/yarn application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING 2014-03-10 18:10:39,443|beaver.machine|INFO|14/03/10 18:10:39 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 2014-03-10 18:10:39,484|beaver.machine|INFO|Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING]):1 2014-03-10 18:10:39,484|beaver.machine|INFO|Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL 2014-03-10 18:10:39,485|beaver.machine|INFO|application_1394449508064_0008 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4 MAPREDUCE hrt_qa default ACCEPTED SUCCEEDED 100% http://hostname:19888/jobhistory/job/job_1394449508064_0008 {code}
YARN-1812. Major sub-task reported by Yesha Vora and fixed by Jian He
Job stays in PREP state for long time after RM Restarts

Steps followed: 1) Start a sort job with 80 maps and 5 reducers. 2) Restart the ResourceManager when 60 maps and 0 reducers are finished. 3) Wait for the job to come out of the PREP state. The job does not come out of the PREP state within 7-8 minutes, after which the test kills the job. The sort job should not take this long to come out of the PREP state.
YARN-1811. Major sub-task reported by Robert Kanter and fixed by Robert Kanter (resourcemanager)
RM HA: AM link broken if the AM is on nodes other than RM

When using RM HA, if you click on the "Application Master" link in the RM web UI while the job is running, you get an Error 500:
YARN-1800. Critical sub-task reported by Paul Isaychuk and fixed by Varun Vasudev (nodemanager)
YARN NodeManager with java.util.concurrent.RejectedExecutionException

Noticed this on tests running on Apache Hadoop 2.2 cluster {code} 2014-01-23 01:30:28,575 INFO localizer.LocalizedResource (LocalizedResource.java:handle(196)) - Resource hdfs://colo-2:8020/user/fertrist/oozie-oozi/0000605-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar transitioned from INIT to DOWNLOADING 2014-01-23 01:30:28,575 INFO localizer.LocalizedResource (LocalizedResource.java:handle(196)) - Resource hdfs://colo-2:8020/user/fertrist/.staging/job_1389742077466_0396/job.splitmetainfo transitioned from INIT to DOWNLOADING 2014-01-23 01:30:28,575 INFO localizer.LocalizedResource (LocalizedResource.java:handle(196)) - Resource hdfs://colo-2:8020/user/fertrist/.staging/job_1389742077466_0396/job.split transitioned from INIT to DOWNLOADING 2014-01-23 01:30:28,575 INFO localizer.LocalizedResource (LocalizedResource.java:handle(196)) - Resource hdfs://colo-2:8020/user/fertrist/.staging/job_1389742077466_0396/job.xml transitioned from INIT to DOWNLOADING 2014-01-23 01:30:28,576 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:addResource(651)) - Downloading public rsrc:{ hdfs://colo-2:8020/user/fertrist/oozie-oozi/0000605-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar, 1390440627435, FILE, null } 2014-01-23 01:30:28,576 FATAL event.AsyncDispatcher (AsyncDispatcher.java:dispatch(141)) - Error in dispatcher thread java.util.concurrent.RejectedExecutionException at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) at java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:152) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:678) at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:583) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:525) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2014-01-23 01:30:28,577 INFO event.AsyncDispatcher (AsyncDispatcher.java:dispatch(144)) - Exiting, bbye.. 2014-01-23 01:30:28,596 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@0.0.0.0:50060 2014-01-23 01:30:28,597 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(328)) - Applications still running : [application_1389742077466_0396] 2014-01-23 01:30:28,597 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(336)) - Wa {code}
YARN-1793. Critical bug reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)
yarn application -kill doesn't kill UnmanagedAMs

Trying to kill an Unmanaged AM through the CLI (yarn application -kill <id>) logs a success, but doesn't actually kill the AM or reclaim the containers allocated to it.
YARN-1789. Minor improvement reported by Akira AJISAKA and fixed by Tsuyoshi OZAWA (resourcemanager)
ApplicationSummary does not escape newlines in the app name

YARN-side of MAPREDUCE-5778. ApplicationSummary is not escaping newlines in the app name. This can result in an application summary log entry that spans multiple lines when users are expecting one-app-per-line output.
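
The idea of keeping each summary entry on one line can be sketched as follows. This is a minimal illustration under the assumption that carriage returns and line feeds are replaced by visible escapes; the class name is hypothetical and the escaping used by the actual fix may differ.

```java
final class AppNameEscaper {
    /**
     * Replaces CR and LF characters with the two-character sequences
     * \r and \n so the name cannot span multiple log lines.
     */
    public static String escapeNewlines(String name) {
        if (name == null) {
            return null;
        }
        return name.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        System.out.println(escapeNewlines("my\napp"));
    }
}
```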
YARN-1788. Critical bug reported by Tassapol Athiapinya and fixed by Varun Vasudev (resourcemanager)
AppsCompleted/AppsKilled metric is incorrect when MR job is killed with yarn application -kill

Run MR sleep job. Kill the application in RUNNING state. Observe RM metrics. Expected: AppsCompleted = 0/AppsKilled = 1. Actual: AppsCompleted = 1/AppsKilled = 0.
YARN-1787. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
yarn applicationattempt/container print wrong usage information

yarn applicationattempt prints: {code} Invalid Command Usage : usage: application -appStates <States> Works with -list to filter applications based on input comma-separated list of application states. The valid application state can be one of the following: ALL,NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUN NING,FINISHED,FAILED,KILLED -appTypes <Types> Works with -list to filter applications based on input comma-separated list of application types. -help Displays help for all commands. -kill <Application ID> Kills the application. -list <arg> List application attempts for aplication from AHS. -movetoqueue <Application ID> Moves the application to a different queue. -queue <Queue Name> Works with the movetoqueue command to specify which queue to move an application to. -status <Application ID> Prints the status of the application. {code} yarn container prints: {code} Invalid Command Usage : usage: application -appStates <States> Works with -list to filter applications based on input comma-separated list of application states. The valid application state can be one of the following: ALL,NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUN NING,FINISHED,FAILED,KILLED -appTypes <Types> Works with -list to filter applications based on input comma-separated list of application types. -help Displays help for all commands. -kill <Application ID> Kills the application. -list <arg> List application attempts for aplication from AHS. -movetoqueue <Application ID> Moves the application to a different queue. -queue <Queue Name> Works with the movetoqueue command to specify which queue to move an application to. -status <Application ID> Prints the status of the application. {code} Both commands print irrelevant yarn application usage information.
YARN-1785. Major bug reported by bc Wong and fixed by bc Wong
FairScheduler treats app lookup failures as ERRORs

When invoking the /ws/v1/cluster/apps endpoint, RM will eventually get to RMAppImpl#createAndGetApplicationReport, which calls RMAppAttemptImpl#getApplicationResourceUsageReport, which looks up the app in the scheduler, which may or may not exist. So FairScheduler shouldn't log an error for every lookup failure: {noformat} 2014-02-17 08:23:21,240 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1392419715319_0135_000001 {noformat}
YARN-1783. Critical bug reported by Arpit Gupta and fixed by Jian He
yarn application does not make any progress even when no other application is running when RM is being restarted in the background

Noticed that during HA tests, some tests took over 3 hours to run before failing. Looking at the logs, I see the application made no progress for a very long time. However, the application log from YARN shows it actually ran in 5 minutes. I am seeing the same behavior when the RM was being restarted in the background and when both the RM and AM were being restarted. This does not happen for all applications, but a few hit it in the nightly run.
YARN-1781. Major sub-task reported by Varun Vasudev and fixed by Varun Vasudev (nodemanager)
NM should allow users to specify max disk utilization for local disks

This is related to YARN-257 (it's probably a sub-task?). Currently, the NM does not detect full disks and allows full disks to be used by containers, leading to repeated failures. YARN-257 deals with graceful handling of full disks. This ticket is only about detection of full disks by the disk health checkers. The NM should allow users to set a maximum disk utilization for local disks and mark disks as bad once they exceed that utilization. At the very least, the NM should detect full disks.
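
The utilization check described above can be sketched with the standard java.io.File space queries. This is an assumption-laden illustration, not the NodeManager's disk health checker: the class name, method names, and the way the threshold is passed in are all hypothetical.

```java
import java.io.File;

final class DiskUtilizationCheck {
    /** Percentage of the partition in use; 0 if the size is unknown. */
    public static float usedPercent(long totalBytes, long usableBytes) {
        if (totalBytes <= 0) {
            return 0f;
        }
        long used = totalBytes - usableBytes;
        return 100f * used / totalBytes;
    }

    /** True when utilization is strictly above the configured maximum. */
    public static boolean isOverLimit(long totalBytes, long usableBytes,
                                      float maxPercent) {
        return usedPercent(totalBytes, usableBytes) > maxPercent;
    }

    /** Convenience overload that reads the sizes from a local directory. */
    public static boolean isOverLimit(File dir, float maxPercent) {
        return isOverLimit(dir.getTotalSpace(), dir.getUsableSpace(),
                           maxPercent);
    }

    public static void main(String[] args) {
        File dir = new File(".");
        System.out.println("over 90%? " + isOverLimit(dir, 90f));
    }
}
```

A disk that reports as over the limit would then be treated as bad until its utilization drops again.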
YARN-1780. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Improve logging in timeline service

It's difficult to trace whether the client has successfully posted the entity to the timeline service or not.
YARN-1776. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
renewDelegationToken should survive RM failover

When a delegation token is renewed, two RMStateStore operations happen: 1) removing the old DT, and 2) storing the new DT. If the RM fails in between, there would be a problem.
YARN-1775. Major sub-task reported by Rajesh Balamohan and fixed by Rajesh Balamohan (nodemanager)
Create SMAPBasedProcessTree to get PSS information

Create SMAPBasedProcessTree (by extending ProcfsBasedProcessTree), which will make use of PSS for computing the memory usage.
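
On Linux, per-mapping PSS (proportional set size) values are exposed as "Pss:" lines in /proc/<pid>/smaps. The sketch below only sums those lines from already-read text; the class name is hypothetical, and the class added by this change may combine smaps fields differently.

```java
import java.util.Arrays;
import java.util.List;

final class SmapsPssSum {
    /** Sums the "Pss:" values, in kB, found in smaps-formatted lines. */
    public static long totalPssKb(List<String> lines) {
        long total = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            // Matches "Pss:   123 kB" but not "SwapPss:" or "Pss_Anon:".
            if (trimmed.startsWith("Pss:")) {
                String[] parts = trimmed.split("\\s+");
                if (parts.length >= 2) {
                    total += Long.parseLong(parts[1]);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "Rss:                  8 kB",
            "Pss:                  4 kB",
            "Pss:                  6 kB");
        System.out.println(totalPssKb(sample) + " kB");
    }
}
```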
YARN-1774. Blocker bug reported by Anubhav Dhoot and fixed by Anubhav Dhoot (resourcemanager)
FS: Submitting to non-leaf queue throws NPE

If you create a hierarchy of queues and assign a job to a parent queue, FairScheduler quits with an NPE.
YARN-1771. Critical improvement reported by Sangjin Lee and fixed by Sangjin Lee (nodemanager)
many getFileStatus calls made from node manager for localizing a public distributed cache resource

We're observing that the getFileStatus calls are putting a fair amount of load on the name node as part of checking the public-ness for localizing a resource that belongs in the public cache. We see 7 getFileStatus calls made for each of these resources. We should look into reducing the number of calls to the name node. One example: {noformat} 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724 ... 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo src=/tmp ... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/ ... 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... 2014-02-27 18:07:27,355 INFO audit: ... cmd=open src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ... {noformat}
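
One way to cut repeated lookups like those in the audit log is to memoize the per-path check, so the file and each ancestor directory are queried at most once. The sketch below is an assumption, not the committed change: the class is hypothetical, the remote call is modeled as a plain predicate, and paths are assumed to be absolute with no trailing slash.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

final class CachedStatusLookup {
    private final Map<String, Boolean> cache = new HashMap<>();
    // Hypothetical stand-in for a remote permission check on one path.
    private final Predicate<String> remoteCheck;

    public CachedStatusLookup(Predicate<String> remoteCheck) {
        this.remoteCheck = remoteCheck;
    }

    /** Returns the parent of an absolute path, or null for the root. */
    public static String parent(String path) {
        if (path.equals("/")) {
            return null;
        }
        int i = path.lastIndexOf('/');
        return i == 0 ? "/" : path.substring(0, i);
    }

    /** Runs the remote check at most once per distinct path. */
    private boolean check(String path) {
        Boolean cached = cache.get(path);
        if (cached == null) {
            cached = remoteCheck.test(path);
            cache.put(path, cached);
        }
        return cached;
    }

    /** True if the path and every ancestor pass the check. */
    public boolean isPublic(String path) {
        for (String p = path; p != null; p = parent(p)) {
            if (!check(p)) {
                return false;
            }
        }
        return true;
    }
}
```

With this shape, a second resource in the same directory costs one additional remote call instead of one per ancestor. A real implementation would also need to bound the cache and decide when entries become stale.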
YARN-1768. Minor bug reported by Hitesh Shah and fixed by Tsuyoshi OZAWA (client)
yarn kill non-existent application is too verbose

Instead of catching ApplicationNotFound and logging a simple app not found message, the whole stack trace is logged.
YARN-1766. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
When RM does the initiation, it should use loaded Configuration instead of bootstrap configuration.

Right now, we have FileSystemBasedConfigurationProvider to let users upload the configurations into a remote file system, and let different RMs share the same configurations. During initialization, the RM will load the configurations from the remote file system. So when the RM initializes the services, it should use the loaded configurations instead of the bootstrap configurations.
YARN-1765. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
Write test cases to verify that killApplication API works in RM HA
YARN-1764. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
Handle RM fail overs after the submitApplication call.
YARN-1761. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
RMAdminCLI should check whether HA is enabled before executing transitionToActive/transitionToStandby
YARN-1760. Trivial bug reported by Karthik Kambatla and fixed by Karthik Kambatla
TestRMAdminService assumes CapacityScheduler

YARN-1611 adds TestRMAdminService which assumes the use of CapacityScheduler. {noformat} java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler cannot be cast to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testAdminRefreshQueuesWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:115) {noformat}
YARN-1758. Blocker bug reported by Hitesh Shah and fixed by Xuan Gong
MiniYARNCluster broken post YARN-1666

NPE seen when trying to use MiniYARNCluster
YARN-1752. Major bug reported by Jian He and fixed by Rohith
Unexpected Unregistered event at Attempt Launched state

{code} 2014-02-21 14:56:03,453 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: UNREGISTERED at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:647) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:103) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:714) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:695) {code}
YARN-1749. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Review AHS configs and sync them up with the timeline-service configs

We need to: 1. Review the configuration names and default values 2. Combine the two store class configurations Some other thoughts: 1. Maybe we don't need null implementation of ApplicationHistoryStore any more 2. Maybe if yarn.ahs.enabled = false, we should stop AHS web server returning historic information
YARN-1748. Blocker bug reported by Sravya Tirukkovalur and fixed by Sravya Tirukkovalur
hadoop-yarn-server-tests packages core-site.xml breaking downstream tests

Jars should not package config files, as this might come into the classpaths of clients causing the clients to break.
YARN-1742. Trivial bug reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)
Fix javadoc of parameter DEFAULT_NM_MIN_HEALTHY_DISKS_FRACTION

In YarnConfiguration.java, {code} /** * By default, at least 5% of disks are to be healthy to say that the node * is healthy in terms of disks. */ public static final float DEFAULT_NM_MIN_HEALTHY_DISKS_FRACTION = 0.25F; {code} The javadoc should say 25%, which matches the constant.
YARN-1734. Critical sub-task reported by Xuan Gong and fixed by Xuan Gong
RM should get the updated Configurations when it transits from Standby to Active

Currently, we have ConfigurationProvider, which can support LocalConfiguration and FileSystemBasedConfiguration. When HA is enabled and FileSystemBasedConfiguration is enabled, the RM cannot get the updated configurations when it transitions from Standby to Active.
YARN-1732. Major sub-task reported by Billie Rinaldi and fixed by Billie Rinaldi
Change types of related entities and primary filters in ATSEntity

The current types Map<String, List<String>> relatedEntities and Map<String, Object> primaryFilters have issues. The List<String> value of the related entities map could have multiple identical strings in it, which doesn't make sense. A more major issue is that we cannot allow primary filter values to be overwritten, because otherwise we will be unable to find those primary filter entries when we want to delete an entity (without doing a nearly full scan). I propose changing related entities to Map<String, Set<String>> and primary filters to Map<String, Set<Object>>. The basic methods to add primary filters and related entities are of the form add(key, value) and will not need to change.
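The effect of the proposed types can be seen in a small sketch. The class below is hypothetical (it is not ATSEntity); it only mirrors the `add(key, value)` shape described above, with `Set` values so duplicates collapse and a second primary filter value is kept alongside the first rather than replacing it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for an entity that uses the proposed Set-valued maps. */
final class EntityFieldsSketch {
  final Map<String, Set<String>> relatedEntities = new HashMap<>();
  final Map<String, Set<Object>> primaryFilters = new HashMap<>();

  /** Adding the same id twice leaves a single entry. */
  void addRelatedEntity(String entityType, String entityId) {
    relatedEntities.computeIfAbsent(entityType, k -> new HashSet<>()).add(entityId);
  }

  /** A new value for an existing key is added, never overwritten. */
  void addPrimaryFilter(String key, Object value) {
    primaryFilters.computeIfAbsent(key, k -> new HashSet<>()).add(value);
  }
}
```

Callers keep using the two-argument add methods; only the collection type behind them changes.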
YARN-1730. Major sub-task reported by Billie Rinaldi and fixed by Billie Rinaldi
Leveldb timeline store needs simple write locking

Although the leveldb writes are performed atomically in a batch, a start time for the entity needs to be identified before each write. Thus a per-entity write lock should be acquired.
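A minimal sketch of per-entity write locking follows. It is an in-memory illustration under stated assumptions, not the leveldb store's code: the class, the `startTimes` map, and the `put` signature are all hypothetical. The point is that the read of the start time and the subsequent write happen under one lock keyed by entity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: one lock per entity id guards "find start time, then write". */
final class EntityWriteLockSketch {
  private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();
  private final Map<String, Long> startTimes = new ConcurrentHashMap<>();

  /** Returns the start time used for this write; the first write fixes it. */
  long put(String entityId, long eventTimestamp) {
    ReentrantLock lock = locks.computeIfAbsent(entityId, id -> new ReentrantLock());
    lock.lock();
    try {
      Long start = startTimes.get(entityId);
      if (start == null) {
        start = eventTimestamp;
        startTimes.put(entityId, start);
      }
      // A real store would issue its atomic batch write here, keyed by start.
      return start;
    } finally {
      lock.unlock();
    }
  }
}
```

Writes to different entities do not block each other. This sketch never removes locks from the map, so a long-running store would need some form of eviction.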
YARN-1729. Major sub-task reported by Billie Rinaldi and fixed by Billie Rinaldi
TimelineWebServices always passes primary and secondary filters as strings

Primary filter and secondary filter values can be arbitrary JSON-compatible Objects. The web services should determine if the filters specified as query parameters are objects or strings before passing them to the store.
YARN-1724. Critical bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Race condition in Fair Scheduler when continuous scheduling is turned on

If nodes' resource allocations change during Collections.sort(nodeIdList, nodeAvailableResourceComparator); we'll hit: java.lang.IllegalArgumentException: Comparison method violates its general contract!
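The exception arises because the comparator reads values that change while the sort is running. One general way to avoid that, sketched below, is to sort against a private snapshot. This is an illustration, not necessarily the fix applied in the Fair Scheduler; the class name and the map of node id to available memory are hypothetical, and the copy itself must still be made safely (for example under the lock that guards the live map).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: the comparator only ever sees frozen values. */
final class SnapshotSortSketch {
  /** Returns node ids ordered by available memory, largest first, ties by id. */
  static List<String> sortByAvailable(Map<String, Integer> liveAvailableMb) {
    // Freeze the values once; later changes to the live map cannot affect the sort.
    Map<String, Integer> snapshot = new HashMap<>(liveAvailableMb);
    List<String> ids = new ArrayList<>(snapshot.keySet());
    Comparator<String> byAvailableDesc =
        (a, b) -> Integer.compare(snapshot.get(b), snapshot.get(a));
    ids.sort(byAvailableDesc.thenComparing(Comparator.<String>naturalOrder()));
    return ids;
  }
}
```

The alternative is to hold a lock for the duration of the sort so that allocations cannot change underneath it; the snapshot trades a copy for a shorter critical section.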
YARN-1721. Critical bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
When moving app between queues in Fair Scheduler, grab lock on FSSchedulerApp

FairScheduler.moveApplication should grab lock on FSSchedulerApp, so that allocate() can't be modifying it at the same time.
YARN-1719. Major sub-task reported by Billie Rinaldi and fixed by Billie Rinaldi
ATSWebServices produces jersey warnings

These don't appear to affect how the web services work, but the following warnings are logged: {noformat} WARNING: The following warnings have been detected with resource and/or provider classes: WARNING: A sub-resource method, public org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.ATSWebServices$AboutInfo org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.ATSWebServices.about(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse), with URI template, "/", is treated as a resource method WARNING: A sub-resource method, public org.apache.hadoop.yarn.api.records.apptimeline.ATSPutErrors org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.ATSWebServices.postEntities(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse,org.apache.hadoop.yarn.api.records.apptimeline.ATSEntities), with URI template, "/", is treated as a resource method {noformat}
YARN-1717. Major sub-task reported by Billie Rinaldi and fixed by Billie Rinaldi
Enable offline deletion of entries in leveldb timeline store

The leveldb timeline store implementation needs the following: * better documentation of its internal structures * internal changes to enable deleting entities ** never overwrite existing primary filter entries ** add hidden reverse pointers to related entities
YARN-1706. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Create a utility function to dump timeline records to json

For verification and logging purposes.
YARN-1704. Blocker sub-task reported by Billie Rinaldi and fixed by Billie Rinaldi
Review LICENSE and NOTICE to reflect new levelDB related libraries being used

Make any changes necessary in LICENSE and NOTICE related to dependencies introduced by the application timeline store.
YARN-1698. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Replace MemoryApplicationTimelineStore with LeveldbApplicationTimelineStore as default
YARN-1697. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)
NodeManager reports negative running containers

We're seeing the NodeManager metrics report a negative number of running containers.
YARN-1692. Major bug reported by Sangjin Lee and fixed by Sangjin Lee (scheduler)
ConcurrentModificationException in fair scheduler AppSchedulable

We saw a ConcurrentModificationException thrown in the fair scheduler: {noformat} 2014-02-07 01:40:01,978 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Exception in fair scheduler UpdateThread java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:926) at java.util.HashMap$ValueIterator.next(HashMap.java:954) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.updateDemand(AppSchedulable.java:85) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.updateDemand(FSLeafQueue.java:125) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.updateDemand(FSParentQueue.java:82) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:195) at java.lang.Thread.run(Thread.java:724) {noformat} The map returned by FSSchedulerApp.getResourceRequests() is iterated over without proper synchronization.
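The general pattern for avoiding this kind of failure is to iterate under the same lock that writers hold, or to copy under that lock and iterate the copy. The sketch below shows the second option. The class and its fields are hypothetical and only loosely modeled on a map of resource requests; it is not the scheduler's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: readers iterate a copy taken under the writers' lock. */
final class GuardedRequestsSketch {
  private final Map<String, Integer> requests = new HashMap<>();

  /** Writer path: mutates the map while holding this object's monitor. */
  synchronized void update(String resourceName, int containers) {
    requests.put(resourceName, containers);
  }

  /** Copies the values under the same monitor, so the copy is consistent. */
  synchronized List<Integer> snapshotValues() {
    return new ArrayList<>(requests.values());
  }

  /** Reader path: iterates the private copy, so concurrent updates cannot break it. */
  int totalDemand() {
    int total = 0;
    for (int n : snapshotValues()) {
      total += n;
    }
    return total;
  }
}
```

Handing out the live `HashMap` to callers, by contrast, leaves every caller responsible for taking the right lock before iterating.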
YARN-1690. Major sub-task reported by Mayank Bansal and fixed by Mayank Bansal
Sending timeline entities+events from Distributed shell
YARN-1689. Critical bug reported by Deepesh Khandelwal and fixed by Vinod Kumar Vavilapalli (resourcemanager)
RMAppAttempt is not killed when RMApp is at ACCEPTED

When running some Hive on Tez jobs, the RM after a while gets into an unusable state where no jobs run. In the RM log I see the following exception: {code} 2014-02-04 20:28:08,553 WARN ipc.Server (Server.java:run(1978)) - IPC Server handler 0 on 8030, call org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.registerApplicationMaster from 172.18.145.156:40474 Call#0 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getTransferredContainers(AbstractYarnScheduler.java:48) at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:278) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:90) at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956) ...... 
2014-02-04 20:28:08,544 ERROR rmapp.RMAppImpl (RMAppImpl.java:handle(626)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_REGISTERED at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:624) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:81) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:656) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:640) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:662) 2014-02-04 20:28:08,549 INFO resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(140)) - USER=hrt_qa IP=172.18.145.156 OPERATION=Kill Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1391543307203_0001 2014-02-04 20:28:08,553 WARN ipc.Server (Server.java:run(1978)) - IPC Server handler 0 on 8030, call org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.registerApplicationMaster from 172.18.145.156:40474 Call#0 Retry#0: error: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getTransferredContainers(AbstractYarnScheduler.java:48) at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:278) at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:90) at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956) {code}
YARN-1687. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Refactoring timeline classes to remove "app" related words

Remove ATS prefix, change package name, fix javadoc and so on
YARN-1686. Major bug reported by Rohith and fixed by Rohith (nodemanager)
NodeManager.resyncWithRM() does not handle exceptions, which causes NodeManager to hang.

During NodeManager startup, if registration with the ResourceManager throws an exception, the NodeManager shuts down. Consider the case where NM-1 is registered with the RM and the RM issues a Resync to the NM. If any exception is thrown in "resyncWithRM" (which starts a new thread that does not handle exceptions) during the RESYNC event, that thread is lost and the NodeManager hangs.
YARN-1685. Major sub-task reported by Mayank Bansal and fixed by Zhijie Shen
Bugs around log URL

1. Log URL should be different when the container is running and finished 2. Null case needs to be handled 3. The way of constructing log URL should be corrected
YARN-1684. Major sub-task reported by Billie Rinaldi and fixed by Billie Rinaldi
Fix history server heap size in yarn script

The yarn script currently has the following: {noformat} if [ "$YARN_RESOURCEMANAGER_HEAPSIZE" != "" ]; then JAVA_HEAP_MAX="-Xmx""$YARN_HISTORYSERVER_HEAPSIZE""m" fi {noformat}
YARN-1676. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
Make admin refreshUserToGroupsMappings of configuration work across RM failover
YARN-1673. Blocker bug reported by Tassapol Athiapinya and fixed by Mayank Bansal (client)
Valid yarn kill application prints out help message.

yarn application -kill <application ID> used to work previously. In 2.4.0 it prints out help message and does not kill the application.
YARN-1672. Trivial bug reported by Karthik Kambatla and fixed by Naren Koneru (nodemanager)
YarnConfiguration is missing a default for yarn.nodemanager.log.retain-seconds

YarnConfiguration is missing a default for yarn.nodemanager.log.retain-seconds
YARN-1670. Critical bug reported by Thomas Graves and fixed by Mit Desai
aggregated log writer can write more log data than it says is the log length

We have seen exceptions when using 'yarn logs' to read log files. at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:441) at java.lang.Long.parseLong(Long.java:483) at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogReader.readAContainerLogsForALogType(AggregatedLogFormat.java:518) at org.apache.hadoop.yarn.logaggregation.LogDumper.dumpAContainerLogs(LogDumper.java:178) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:130) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:246) We traced it down to the reader trying to read the file type of the next file while the position it reads from still holds log data from the previous file. The log length was written as a certain size, but the log data was actually longer than that. Inside the write() routine in LogValue, it first writes the logfile length, but when it then writes the log itself it just goes to the end of the file. There is a race condition here: if someone is still writing to the file when it is aggregated, the length written could be too small. We should have the write() routine stop once it has written the length it declared. It would be nice if we could somehow tell the user the log might be truncated, but I'm not sure of a good way to do this. We also noticed a bug in readAContainerLogsForALogType, where it uses an int for curRead whereas it should use a long. while (len != -1 && curRead < fileLength) { This isn't actually a problem right now, as it looks like the underlying decoder is doing the right thing and the len condition exits.
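The suggestion that write() stop at the declared length can be sketched as a bounded copy loop. The helper below is hypothetical and is not the LogValue code; it shows the two points from the description: never copy more bytes than the length already written to the header, and track progress in a long.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical sketch: copies no more than the length that was declared up front. */
final class BoundedCopySketch {
  /** Copies at most declaredLength bytes and returns how many were copied. */
  static long copyUpTo(InputStream in, OutputStream out, long declaredLength)
      throws IOException {
    byte[] buf = new byte[8192];
    long copied = 0; // long rather than int, so very large logs do not overflow
    while (copied < declaredLength) {
      int toRead = (int) Math.min(buf.length, declaredLength - copied);
      int len = in.read(buf, 0, toRead);
      if (len == -1) {
        break; // source ended early; caller can compare the result to declaredLength
      }
      out.write(buf, 0, len);
      copied += len;
    }
    return copied;
  }
}
```

If the file grows after its length was recorded, the extra bytes are simply left out, so the next record in the aggregated file starts where the reader expects it.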
YARN-1669. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
Make admin refreshServiceAcls work across RM failover
YARN-1668. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
Make admin refreshAdminAcls work across RM failover

Change the handling of admin-acls to be available across RM failover by making use of a remote configuration-provider.
YARN-1667. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
Make admin refreshSuperUserGroupsConfiguration work across RM failover
YARN-1666. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
Make admin refreshNodes work across RM failover
YARN-1665. Major sub-task reported by Arpit Gupta and fixed by Xuan Gong (resourcemanager)
Set better defaults for HA configs for automatic failover

In order to enable HA (automatic failover) I had to set the following configs {code} <property> <name>yarn.resourcemanager.ha.enabled</name> <value>true</value> </property> <property> <name>yarn.resourcemanager.ha.automatic-failover.enabled</name> <value>true</value> </property> <property> <name>yarn.resourcemanager.ha.automatic-failover.embedded</name> <value>true</value> </property> {code} I believe the user should just have to set yarn.resourcemanager.ha.enabled=true and the rest should be set as defaults. Basically automatic failover should be the default.
YARN-1661. Major bug reported by Tassapol Athiapinya and fixed by Vinod Kumar Vavilapalli (applications/distributed-shell)
AppMaster logs says failing even if an application does succeed.

Run: /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar <distributed shell jar> -shell_command ls Open AM logs. Last line would indicate AM failure even though container logs print good ls result. {code} 2014-01-24 21:45:29,592 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:finish(599)) - Application completed. Signalling finish to RM 2014-01-24 21:45:29,612 INFO [main] impl.AMRMClientImpl (AMRMClientImpl.java:unregisterApplicationMaster(315)) - Waiting for application to be successfully unregistered. 2014-01-24 21:45:29,816 INFO [main] distributedshell.ApplicationMaster (ApplicationMaster.java:main(267)) - Application Master failed. exiting {code}
YARN-1660. Major sub-task reported by Arpit Gupta and fixed by Xuan Gong (resourcemanager)
add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting all the various host:port properties for RM

Currently the user has to specify all the various host:port properties for RM. We should follow the pattern that we do for non HA setup where we can specify yarn.resourcemanager.hostname.rm-id and the defaults are used for all other affected properties.
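The idea of deriving every per-RM address from a single hostname can be sketched as follows. The helper class is hypothetical, and the property names and port numbers are assumptions based on the stock yarn-default.xml values (8032, 8030, 8031, 8033, 8088); check them against the configuration reference for your release before relying on them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: expands one hostname into the per-RM host:port settings. */
final class RmAddressDefaultsSketch {
  /** Builds the derived properties for one RM id; ports are assumed defaults. */
  static Map<String, String> deriveAddresses(String rmId, String hostname) {
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put("yarn.resourcemanager.address." + rmId, hostname + ":8032");
    conf.put("yarn.resourcemanager.scheduler.address." + rmId, hostname + ":8030");
    conf.put("yarn.resourcemanager.resource-tracker.address." + rmId, hostname + ":8031");
    conf.put("yarn.resourcemanager.admin.address." + rmId, hostname + ":8033");
    conf.put("yarn.resourcemanager.webapp.address." + rmId, hostname + ":8088");
    return conf;
  }
}
```

For example, `deriveAddresses("rm1", "host1.example.com")` yields five entries, including `yarn.resourcemanager.scheduler.address.rm1` set to `host1.example.com:8030`. An explicitly configured address would still take precedence over a derived one.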
YARN-1659. Major sub-task reported by Billie Rinaldi and fixed by Billie Rinaldi
Define the ApplicationTimelineStore store as an abstraction for implementing different storage impls for storing timeline information

These will be used by ApplicationTimelineStore interface. The web services will convert the store-facing objects to the user-facing objects.
YARN-1658. Major sub-task reported by Cindy Li and fixed by Cindy Li
Webservice should redirect to active RM when HA is enabled.

When HA is enabled, web service to standby RM should be redirected to the active RM. This is a related Jira to YARN-1525.
YARN-1641. Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)
ZK store should attempt a write periodically to ensure it is still Active

Fencing in ZK store kicks in when the RM tries to write something to the store. If the RM doesn't write anything to the store, it doesn't get fenced and can continue to assume being the Active. By periodically writing a file (say, every RM_ZK_TIMEOUT_MS milliseconds), we can ensure it gets fenced.
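A periodic probe of this kind might look like the sketch below. It is an illustration under stated assumptions rather than the ZK store's code: the class is hypothetical, and `tryWrite` is a stand-in for a small write to the store that returns false once the write is rejected.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: a periodic write that notices when this RM has been fenced. */
final class ActiveVerifierSketch {
  private final BooleanSupplier tryWrite; // stand-in for a small store write (assumption)
  private volatile boolean active = true;

  ActiveVerifierSketch(BooleanSupplier tryWrite) {
    this.tryWrite = tryWrite;
  }

  /** One probe: a rejected write means another RM has taken over. */
  void probe() {
    if (active && !tryWrite.getAsBoolean()) {
      active = false;
    }
  }

  boolean isActive() {
    return active;
  }

  /** Schedules probe() every periodMs milliseconds on a daemon thread. */
  ScheduledExecutorService start(long periodMs) {
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "active-verifier");
      t.setDaemon(true);
      return t;
    });
    ses.scheduleWithFixedDelay(this::probe, periodMs, periodMs, TimeUnit.MILLISECONDS);
    return ses;
  }
}
```

Without such a probe, an idle RM that has lost the Active role would only find out at its next real write, which may be arbitrarily far in the future.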
YARN-1640. Blocker sub-task reported by Xuan Gong and fixed by Xuan Gong
Manual Failover does not work in secure clusters

NodeManager gets rejected after manually making one RM as active.
YARN-1639. Major sub-task reported by Arpit Gupta and fixed by Xuan Gong (resourcemanager)
YARN RM HA requires different configs on different RM hosts

We need to set yarn.resourcemanager.ha.id to rm1 or rm2 based on which RM you want to be first or second. This means we have different configs on different RM nodes. This is unlike HDFS HA, where the same configs are pushed to both NNs, and it would be better to have the same setup for RM as this would make installation and management easier.
YARN-1637. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen
Implement a client library for java users to post entities+events

This is a wrapper around the web-service to facilitate easy posting of entity+event data to the time-line server.
YARN-1636. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen
Implement timeline related web-services inside AHS for storing and retrieving entities+events
YARN-1635. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Billie Rinaldi
Implement a Leveldb based ApplicationTimelineStore

As per the design doc, we need a levelDB + local-filesystem based implementation to start with and for small deployments.
YARN-1634. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen
Define an in-memory implementation of ApplicationTimelineStore

As per the design doc, the store needs to be pluggable. We need a base interface, and an in-memory implementation for testing.
YARN-1633. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen
Define user-faced entity, entity-info and event objects

Define the core objects of the application-timeline effort.
YARN-1632. Minor bug reported by Chen He and fixed by Chen He
TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package

ApplicationMasterService is under org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test file TestApplicationMasterService is placed under org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice package which only contains one file (TestApplicationMasterService).
YARN-1625. Trivial sub-task reported by Shinichi Yamashita and fixed by Shinichi Yamashita
mvn apache-rat:check outputs warning message in YARN-321 branch

When I ran dev-support/test-patch.sh, the following message was output. {code} mvn apache-rat:check -DHadoopPatchProcess > /tmp/patchReleaseAuditOutput.txt 2>&1 There appear to be 1 release audit warnings after applying the patch. {code} {code} !????? /home/sinchii/git/YARN-321-test/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/applicationhistory/.keep Lines that start with ????? in the release audit report indicate files that do not have an Apache license header. {code} To avoid the release audit warning, pom.xml should be fixed.
YARN-1617. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Remove ancient comment and surround LOG.debug in AppSchedulingInfo.allocate

{code} synchronized private void allocate(Container container) { // Update consumption and track allocations //TODO: fixme sharad /* try { store.storeContainer(container); } catch (IOException ie) { // TODO fix this. we shouldnt ignore }*/ LOG.debug("allocate: applicationId=" + applicationId + " container=" + container.getId() + " host=" + container.getNodeId().toString()); } {code}
YARN-1613. Major sub-task reported by Zhijie Shen and fixed by Akira AJISAKA
Fix config name YARN_HISTORY_SERVICE_ENABLED

YARN_HISTORY_SERVICE_ENABLED property name is "yarn.ahs..enabled", which is wrong.
YARN-1611. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
Make admin refresh of capacity scheduler configuration work across RM failover

Currently, if we do refresh* for a standby RM, it will fail over to the current active RM and do the refresh* based on the local configuration file of the active RM.
YARN-1605. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
Fix formatting issues with new module in YARN-321 branch

There are a bunch of formatting issues. I'm restricting myself to a sweep of all the files in the new module.
YARN-1597. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
FindBugs warnings on YARN-321 branch

There are a bunch of findBugs warnings on YARN-321 branch.
YARN-1596. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
Javadoc failures on YARN-321 branch

There are some javadoc issues on YARN-321 branch.
YARN-1595. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
Test failures on YARN-321 branch

mvn test doesn't pass on YARN-321 branch anymore.
YARN-1594. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
YARN-321 branch needs to be updated after YARN-888 pom changes

YARN-888 changed the pom structure, so the latest merge to trunk breaks the YARN-321 branch.
YARN-1591. Major bug reported by Vinod Kumar Vavilapalli and fixed by Tsuyoshi OZAWA
TestResourceTrackerService fails randomly on trunk

As evidenced by Jenkins at https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621. It's failing randomly on trunk on my local box too
YARN-1590. Major bug reported by Mohammad Kamrul Islam and fixed by Mohammad Kamrul Islam (resourcemanager)
_HOST doesn't expand properly for RM, NM, ProxyServer and JHS

_HOST is not properly substituted when we use a VIP address. Currently it always uses the host name of the machine and disregards the VIP address. This is true mainly for the RM, NM, WebProxy, and JHS RPC services. It appears to work fine for webservice authentication. On the other hand, the same thing works fine for NN and SNN in RPC as well as webservice.
YARN-1588. Major sub-task reported by Jian He and fixed by Jian He
Rebind NM tokens for previous attempt's running containers to the new attempt
YARN-1587. Major sub-task reported by Mayank Bansal and fixed by Vinod Kumar Vavilapalli
[YARN-321] Merge Patch for YARN-321

Merge Patch
YARN-1578. Major sub-task reported by Shinichi Yamashita and fixed by Shinichi Yamashita
Fix how to read history file in FileSystemApplicationHistoryStore

I ran a PiEstimator job on a Hadoop cluster with YARN-321 applied. After the job ended, accessing the HistoryServer web UI displayed "500", and the HistoryServer daemon log showed the following. {code} 2014-01-09 13:31:12,227 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /applicationhistory/appattempt/appattempt_1389146249925_0008_000001 java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) (snip...) Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.mergeContainerHistoryData(FileSystemApplicationHistoryStore.java:696) at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getContainers(FileSystemApplicationHistoryStore.java:429) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainers(ApplicationHistoryManagerImpl.java:201) at org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:110) (snip...) {code} I confirmed from the ApplicationHistory file that there was a container which had not finished. In the ResourceManager daemon log, the ResourceManager reserved this container but did not allocate it. This problem occurs when FileSystemApplicationHistoryStore reads container information that has no finish data in the history file. To handle the case where there is no finish data, we should fix how FileSystemApplicationHistoryStore reads the history file.
YARN-1577. Blocker sub-task reported by Jian He and fixed by Jian He
Unmanaged AM is broken because of YARN-1493

Today the unmanaged AM client waits for the app state to be Accepted before launching the AM. This is broken since YARN-1493 changed the RM to start the attempt after the application is Accepted. We may need to introduce an attempt state report that the client can rely on to query the attempt state and choose when to launch the unmanaged AM.
YARN-1570. Minor improvement reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)
Formatting the lines within 80 chars in YarnCommands.apt.vm

In YarnCommands.apt.vm, there are some lines longer than 80 characters. For example: {code} Yarn commands are invoked by the bin/yarn script. Running the yarn script without any arguments prints the description for all commands. {code}
YARN-1566. Major sub-task reported by Jian He and fixed by Jian He
Change distributed-shell to retain containers from previous AppAttempt

Change distributed-shell to reuse previous AM's running containers when AM is restarting. It can also be made configurable whether to enable this feature or not.
YARN-1555. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
[YARN-321] Failing tests in org.apache.hadoop.yarn.server.applicationhistoryservice.*

Several tests are failing on the latest YARN-321 branch.
YARN-1553. Major bug reported by Haohui Mai and fixed by Haohui Mai
Do not use HttpConfig.isSecure() in YARN

HDFS-5305 and related JIRAs decided that each individual project will have its own configuration for HTTP policy. {{HttpConfig.isSecure}} is a global static method which does not fit the design anymore. The same functionality should be moved into the YARN code base.
YARN-1536. Minor improvement reported by Karthik Kambatla and fixed by Anubhav Dhoot (resourcemanager)
Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead

Both ResourceManager and RMContext have methods to access the secret managers, and it should be safe (cleaner) to get rid of the ResourceManager methods.
YARN-1534. Major sub-task reported by Shinichi Yamashita and fixed by Shinichi Yamashita
TestAHSWebApp failed in YARN-321 branch

I ran the following command and confirmed that TestAHSWebApp fails. {code} [sinchii@hdX YARN-321-test]$ mvn clean test -Dtest=org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.* {code} {code} Running org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.492 sec - in org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices Running org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebApp Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.193 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebApp initializationError(org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebApp) Time elapsed: 0.016 sec <<< ERROR! java.lang.Exception: Test class should have exactly one public zero-argument constructor at org.junit.runners.BlockJUnit4ClassRunner.validateZeroArgConstructor(BlockJUnit4ClassRunner.java:144) at org.junit.runners.BlockJUnit4ClassRunner.validateConstructor(BlockJUnit4ClassRunner.java:121) at org.junit.runners.BlockJUnit4ClassRunner.collectInitializationErrors(BlockJUnit4ClassRunner.java:101) at org.junit.runners.ParentRunner.validate(ParentRunner.java:344) (*snip*) {code}
YARN-1531. Major bug reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)
True up yarn command documentation

Some options are not described in the YARN command documentation. For example, the "yarn rmadmin" command options are as follows: {code} Usage: yarn rmadmin -refreshQueues -refreshNodes -refreshSuperUserGroupsConfiguration -refreshUserToGroupsMappings -refreshAdminAcls -refreshServiceAcl -getGroups [username] -help [cmd] -transitionToActive <serviceId> -transitionToStandby <serviceId> -failover [--forcefence] [--forceactive] <serviceId> <serviceId> -getServiceState <serviceId> -checkHealth <serviceId> {code} But some of the new options such as "-getGroups", "-transitionToActive", and "-transitionToStandby" are not documented.
YARN-1528. Blocker bug reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)
Allow setting auth for ZK connections

ZK store and embedded election allow setting ZK-acls but not auth information
YARN-1525. Major sub-task reported by Xuan Gong and fixed by Cindy Li
Web UI should redirect to active RM when HA is enabled.

When failover happens, web UI should redirect to the current active rm.
YARN-1521. Blocker sub-task reported by Xuan Gong and fixed by Xuan Gong
Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation

After YARN-1028, automatic failover was added to RMProxy. This JIRA is to identify whether we need to add the idempotent annotation and which methods can be marked as idempotent.
YARN-1512. Major improvement reported by Arun C Murthy and fixed by Arun C Murthy
Enhance CS to decouple scheduling from node heartbeats

Enhance CS to decouple scheduling from node heartbeats; a prototype has improved latency significantly.
YARN-1493. Major sub-task reported by Jian He and fixed by Jian He
Schedulers don't recognize apps separately from app-attempts

Today, scheduler is tied to attempt only. We need to separate app-level handling logic in scheduler. We can add new app-level events to the scheduler and separate the app-level logic out. This is good for work-preserving AM restart, RM restart, and also needed for differentiating app-level metrics and attempt-level metrics.
YARN-1490. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Jian He
RM should optionally not kill all containers when an ApplicationMaster exits

This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers, some may not want to. This should be an option.
YARN-1470. Major bug reported by Sandy Ryza and fixed by Anubhav Dhoot
Add audience annotation to MiniYARNCluster

We should make it clear whether this is a public interface.
YARN-1461. Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)
RM API and RM changes to handle tags for running jobs
YARN-1459. Major sub-task reported by Karthik Kambatla and fixed by Xuan Gong (resourcemanager)
RM services should depend on ConfigurationProvider during startup too

YARN-1667, YARN-1668, YARN-1669 already changed RM to depend on a configuration provider so as to be able to refresh many configuration files across RM fail-over. The dependency on the configuration-provider by the RM should happen at its boot up time too.
YARN-1452. Major task reported by Zhijie Shen and fixed by Zhijie Shen
Document the usage of the generic application history and the timeline data service

We need to write documents to guide users, covering command line tools, configurations and REST APIs.
YARN-1444. Blocker bug reported by Robert Grandl and fixed by Wangda Tan (client , resourcemanager)
RM crashes when node resource request sent without corresponding off-switch request

I have tried to force reducers to execute on certain nodes. For reduce tasks, I changed RMContainerRequestor#addResourceRequest(req.priority, ResourceRequest.ANY, req.capability) to RMContainerRequestor#addResourceRequest(req.priority, HOST_NAME, req.capability). However, this change led to RM crashes when reducers need to be assigned, with the following exception: FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:841) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:640) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:554) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:695) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:739) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:549) at java.lang.Thread.run(Thread.java:722)
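The crash above comes from a node-level request arriving without the matching off-switch (ANY) request. A defensive lookup might look like the sketch below; RequestTable and its methods are hypothetical names for illustration and do not reflect the actual LeafQueue code or the committed fix. The "*" value mirrors ResourceRequest.ANY.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bookkeeping of outstanding requests keyed by resource name.
final class RequestTable {
    // Mirrors ResourceRequest.ANY; redefined here to keep the sketch self-contained.
    static final String ANY = "*";

    private final Map<String, Integer> outstanding = new HashMap<>();

    void add(String resourceName, int containers) {
        outstanding.merge(resourceName, containers, Integer::sum);
    }

    // Treat a missing off-switch entry as "nothing to assign" rather than
    // dereferencing a null lookup result.
    int offSwitchContainers() {
        Integer n = outstanding.get(ANY);
        return n == null ? 0 : n;
    }

    boolean canAssign() {
        return offSwitchContainers() > 0;
    }

    public static void main(String[] args) {
        RequestTable table = new RequestTable();
        table.add("host1.example.com", 2); // node-level request only
        System.out.println("canAssign=" + table.canAssign());
    }
}
```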
YARN-1428. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
RM cannot write the final state of RMApp/RMAppAttempt to the application history store in the transition to the final state

ApplicationFinishData and ApplicationAttemptFinishData are written in the final transitions of RMApp/RMAppAttempt respectively. However, in the transitions, getState() is not getting the state that RMApp/RMAppAttempt is going to enter, but the prior one.
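The effect described above can be reproduced in a toy state machine: a hook that runs inside the transition sees the old state through getState(), so the target state has to be passed explicitly. AppStateDemo is a hypothetical class for illustration only, not the RMApp implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy state machine in which the transition hook runs before the
// current state is updated.
final class AppStateDemo {
    enum State { RUNNING, FINISHED }

    private State current = State.RUNNING;

    // What a history writer would record with each approach.
    final List<State> recordedViaGetState = new ArrayList<>();
    final List<State> recordedViaTarget = new ArrayList<>();

    State getState() {
        return current;
    }

    void transitionTo(State target) {
        // Inside the transition, getState() still returns the prior state.
        recordedViaGetState.add(getState());
        // Passing the target state explicitly records the state being entered.
        recordedViaTarget.add(target);
        current = target;
    }

    public static void main(String[] args) {
        AppStateDemo app = new AppStateDemo();
        app.transitionTo(State.FINISHED);
        System.out.println("getState() saw: " + app.recordedViaGetState.get(0));
        System.out.println("target was:     " + app.recordedViaTarget.get(0));
    }
}
```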
YARN-1417. Blocker bug reported by Omkar Vinit Joshi and fixed by Jian He
RM may issue expired container tokens to AM while issuing new containers.

Today we create a new container token when we create a container in the RM as part of the scheduling cycle. However, that container may get either reserved or assigned. If the container gets reserved and remains in the reserved state for longer than the container token expiry interval, the RM will end up issuing the container with an expired token.
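The timing problem can be sketched with explicit timestamps. ContainerTokenDemo, its methods, and the expiry value are assumptions made for illustration; minting the token at allocation time is shown as one possible approach, not necessarily the committed fix.

```java
// Illustrates why a token minted at container creation can already be
// expired by the time a long-reserved container is handed to the AM.
final class ContainerTokenDemo {
    // Assumed expiry interval for illustration; not a cluster's configured value.
    static final long TOKEN_EXPIRY_MS = 600_000L;

    static final class Token {
        final long expiresAt;

        Token(long createdAt) {
            this.expiresAt = createdAt + TOKEN_EXPIRY_MS;
        }

        boolean isExpiredAt(long now) {
            return now >= expiresAt;
        }
    }

    // Problematic pattern: the token's lifetime starts when the container
    // object is created, even if the container is only reserved.
    static Token tokenMintedAtCreation(long createdAt) {
        return new Token(createdAt);
    }

    // Alternative: the token's lifetime starts when the container is
    // actually allocated and handed to the AM.
    static Token tokenMintedAtAllocation(long allocatedAt) {
        return new Token(allocatedAt);
    }

    public static void main(String[] args) {
        long createdAt = 0L;
        long allocatedAt = TOKEN_EXPIRY_MS + 1; // reserved longer than the expiry interval
        System.out.println("minted at creation, expired:   "
                + tokenMintedAtCreation(createdAt).isExpiredAt(allocatedAt));
        System.out.println("minted at allocation, expired: "
                + tokenMintedAtAllocation(allocatedAt).isExpiredAt(allocatedAt));
    }
}
```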
YARN-1410. Major sub-task reported by Bikas Saha and fixed by Xuan Gong
Handle RM fails over after getApplicationID() and before submitApplication().

App submission involves 1) creating an appId and 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create the app id), the new RM may reject the app submission, resulting in an unexpected failure on the client side. The same may happen for other two-step client API operations.
YARN-1398. Blocker bug reported by Sunil G and fixed by Vinod Kumar Vavilapalli (resourcemanager)
Deadlock in capacity scheduler leaf queue and parent queue for getQueueInfo and completedContainer call

getQueueInfo in ParentQueue calls child.getQueueInfo(), which tries to acquire the leaf queue lock while holding the parent queue lock. If a completedContainer call arrives at the same time and acquires the LeafQueue lock, it will then wait for ParentQueue's completedContainer call. The two paths acquire the locks in an inconsistent order, which can lead to deadlock. JCarder reports this as a potential deadlock scenario.
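A common remedy for this kind of lock-order inversion is to acquire the locks in one fixed order on every path. The sketch below uses two plain ReentrantLocks and hypothetical method bodies; it is not the CapacityScheduler code and may differ from how the issue was actually resolved.

```java
import java.util.concurrent.locks.ReentrantLock;

// Both operations take the parent lock first and the leaf lock second,
// so no thread can hold the leaf lock while waiting for the parent lock.
final class QueueLocks {
    private final ReentrantLock parentLock = new ReentrantLock();
    private final ReentrantLock leafLock = new ReentrantLock();
    private int completed = 0;

    int getQueueInfo() {
        parentLock.lock();
        try {
            leafLock.lock();
            try {
                return completed;
            } finally {
                leafLock.unlock();
            }
        } finally {
            parentLock.unlock();
        }
    }

    void completedContainer() {
        parentLock.lock(); // same order as getQueueInfo: parent, then leaf
        try {
            leafLock.lock();
            try {
                completed++;
            } finally {
                leafLock.unlock();
            }
        } finally {
            parentLock.unlock();
        }
    }

    // Runs a reader and a writer concurrently; returns true if both finish
    // within the timeout and every completion was counted.
    static boolean runConcurrently(int iterations) {
        QueueLocks q = new QueueLocks();
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                q.getQueueInfo();
            }
        });
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                q.completedContainer();
            }
        });
        reader.setDaemon(true);
        writer.setDaemon(true);
        reader.start();
        writer.start();
        try {
            reader.join(5000);
            writer.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !reader.isAlive() && !writer.isAlive() && q.getQueueInfo() == iterations;
    }

    public static void main(String[] args) {
        System.out.println("finished without deadlock: " + runConcurrently(10_000));
    }
}
```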
YARN-1389. Major sub-task reported by Mayank Bansal and fixed by Mayank Bansal
ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs

As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports of *finished* application attempts and containers, we should do the same for ApplicationClientProtocol, which will return the reports of *running* attempts and containers. Later on, we can improve YarnClient to direct the query of running instance to ApplicationClientProtocol, while that of finished instance to ApplicationHistoryProtocol, making it transparent to the users.
YARN-1379. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
[YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170

Found this while merging YARN-321 to the latest branch-2. Without this, compilation fails.
YARN-1345. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Removing FINAL_SAVING from YarnApplicationAttemptState

Whenever YARN-891 is done, we need to add the mapping of RMAppAttemptState.FINAL_SAVING -> YarnApplicationAttemptState.FINAL_SAVING in RMServerUtils#createApplicationAttemptState
YARN-1301. Minor bug reported by Zhijie Shen and fixed by Tsuyoshi OZAWA
Need to log the blacklist additions/removals when YarnSchedule#allocate

Now without the log, it's hard to debug whether blacklist is updated on the scheduler side or not
YARN-1285. Major bug reported by Zhijie Shen and fixed by Kenji Kikushima
Inconsistency of default "yarn.acl.enable" value

In yarn-default.xml, "yarn.acl.enable" is true while in YarnConfiguration, DEFAULT_YARN_ACL_ENABLE is false.
YARN-1266. Major sub-task reported by Mayank Bansal and fixed by Mayank Bansal
Implement PB service and client wrappers for ApplicationHistoryProtocol

Adding ApplicationHistoryProtocolPBService to make the web apps work and changing yarn to run AHS as a separate process
YARN-1242. Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal
Script changes to start AHS as an individual process

Add the command in yarn and yarn.cmd to start and stop AHS
YARN-1206. Blocker bug reported by Jian He and fixed by Rohith
AM container log link broken on NM web page even though local container logs are available

With log aggregation disabled, when container is running, its logs link works properly, but after the application is finished, the link shows 'Container does not exist.'
YARN-1191. Major sub-task reported by Mayank Bansal and fixed by Mayank Bansal
[YARN-321] Update artifact versions for application history service

Compilation is failing for YARN-321 branch
YARN-1171. Major improvement reported by Sandy Ryza and fixed by Naren Koneru (documentation , scheduler)
Add default queue properties to Fair Scheduler documentation

The Fair Scheduler doc is missing the following properties. - defaultMinSharePreemptionTimeout - queueMaxAppsDefault
YARN-1166. Blocker bug reported by Srimanth Gunturi and fixed by Zhijie Shen (resourcemanager)
YARN 'appsFailed' metric should be of type 'counter'

Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge' - which means the exact value will be reported. All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are of type 'counter' - meaning Ganglia will use slope to provide deltas between time-points. To be consistent, the AppsFailed metric should also be of type 'counter'.
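The practical difference between the two metric types is how a consumer interprets the samples: a gauge is read as-is, while a cumulative counter is turned into per-interval deltas. The helper below is a hypothetical illustration of that delta computation and is not part of the Hadoop metrics or Ganglia code.

```java
import java.util.Arrays;

// Converts successive samples of a cumulative counter into the
// per-interval deltas that a monitoring system would plot.
final class CounterDelta {
    static long[] deltas(long[] cumulative) {
        if (cumulative.length < 2) {
            return new long[0];
        }
        long[] out = new long[cumulative.length - 1];
        for (int i = 1; i < cumulative.length; i++) {
            out[i - 1] = cumulative[i] - cumulative[i - 1];
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical appsFailed samples taken at successive time-points.
        long[] appsFailed = {2, 2, 5, 9};
        System.out.println(Arrays.toString(deltas(appsFailed)));
    }
}
```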
YARN-1123. Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal
[YARN-321] Adding ContainerReport and Protobuf implementation

Like YARN-978, we need some client-oriented class to expose the container history info. Neither Container nor RMContainer is the right one.
YARN-1071. Major bug reported by Srimanth Gunturi and fixed by Jian He (resourcemanager)
ResourceManager's decommissioned and lost node count is 0 after restart

I had 6 nodes in a cluster with 2 NMs stopped. Then I put a host into YARN's {{yarn.resourcemanager.nodes.exclude-path}}. After running {{yarn rmadmin -refreshNodes}}, RM's JMX correctly showed decommissioned node count: {noformat} "NumActiveNMs" : 3, "NumDecommissionedNMs" : 1, "NumLostNMs" : 2, "NumUnhealthyNMs" : 0, "NumRebootedNMs" : 0 {noformat} After restarting RM, the counts were shown as below in JMX. {noformat} "NumActiveNMs" : 3, "NumDecommissionedNMs" : 0, "NumLostNMs" : 0, "NumUnhealthyNMs" : 0, "NumRebootedNMs" : 0 {noformat} Notice that the lost and decommissioned NM counts are both 0.
YARN-1041. Major sub-task reported by Steve Loughran and fixed by Jian He (resourcemanager)
Protocol changes for RM to bind and notify a restarted AM of existing containers

For long lived containers we don't want the AM to be a SPOF. When the RM restarts a (failed) AM, it should be given the list of containers it had already been allocated. The AM should then be able to contact the NMs to get details on them. NMs would also need to do any binding of the containers needed to handle a moved/restarted AM.
YARN-1023. Major sub-task reported by Devaraj K and fixed by Zhijie Shen
[YARN-321] Webservices REST API's support for Application History
YARN-1017. Blocker sub-task reported by Jian He and fixed by Jian He (resourcemanager)
Document RM Restart feature

This should give users a general idea about how RM Restart works and how to use RM Restart
YARN-1007. Major sub-task reported by Devaraj K and fixed by Mayank Bansal
[YARN-321] Enhance History Reader interface for Containers

If we want to show the containers used by an application/app attempt, we need two more APIs which return a collection of ContainerHistoryData for an application id and an application attempt id, something like below. {code:xml} Collection<ContainerHistoryData> getContainers( ApplicationAttemptId appAttemptId); Collection<ContainerHistoryData> getContainers(ApplicationId appId); {code} {code:xml} /** * This method returns {@link Container} for specified {@link ContainerId}. * * @param {@link ContainerId} * @return {@link Container} for ContainerId */ ContainerHistoryData getAMContainer(ContainerId containerId); {code} In the above API, we need to change the argument to an application attempt id, or we can remove this API, because every attempt history data has a master container id field; using the master container id, the history data can be obtained with the API below if it takes a container id as its argument. {code:xml} /** * This method returns {@link ContainerHistoryData} for specified * {@link ApplicationAttemptId}. * * @param {@link ApplicationAttemptId} * @return {@link ContainerHistoryData} for ApplicationAttemptId */ ContainerHistoryData getContainer(ApplicationAttemptId appAttemptId); {code} Here an application attempt can use a number of containers, but we cannot choose which container history data to return. This API's argument also needs to be changed to take a container id instead of an app attempt id.
YARN-987. Major sub-task reported by Mayank Bansal and fixed by Mayank Bansal
Adding ApplicationHistoryManager responsible for exposing reports to all clients
YARN-986. Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Karthik Kambatla
RM DT token service should have service addresses of both RMs

Previously: YARN should use cluster-id as token service address. This needs to be done to support non-IP based failover of the RM. Once the server sets the token service address to be this generic ClusterId/ServiceId, clients can translate it to the appropriate final IP and then be able to select tokens via TokenSelectors. Some workarounds for other related issues were put in place at YARN-945.
YARN-984. Major sub-task reported by Devaraj K and fixed by Devaraj K
[YARN-321] Move classes from applicationhistoryservice.records.pb.impl package to applicationhistoryservice.records.impl.pb

Creating an instance of the applicationhistoryservice.records.* PB records throws a ClassNotFoundException. {code:xml} Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.server.applicationhistoryservice.records.impl.pb.ApplicationHistoryDataPBImpl not found at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1619) at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:56) ... 49 more {code}
YARN-979. Major sub-task reported by Mayank Bansal and fixed by Mayank Bansal
[YARN-321] Add more APIs related to ApplicationAttempt and Container in ApplicationHistoryProtocol

ApplicationHistoryProtocol should have the following APIs as well: * getApplicationAttemptReport * getApplicationAttempts * getContainerReport * getContainers The corresponding request and response classes need to be added as well.
YARN-978. Major sub-task reported by Mayank Bansal and fixed by Mayank Bansal
[YARN-321] Adding ApplicationAttemptReport and Protobuf implementation

We don't have ApplicationAttemptReport and its Protobuf implementation. Adding that.
YARN-975. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Add a file-system implementation for history-storage

HDFS implementation should be a standard persistence strategy of history storage
YARN-974. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
RMContainer should collect more useful information to be recorded in Application-History

To record the history of a container, users may be also interested in the following information: 1. Start Time 2. Stop Time 3. Diagnostic Information 4. URL to the Log File 5. Actually Allocated Resource 6. Actually Assigned Node These should be remembered during the RMContainer's life cycle.
YARN-967. Major sub-task reported by Devaraj K and fixed by Mayank Bansal
[YARN-321] Command Line Interface(CLI) for Reading Application History Storage Data
YARN-962. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Update application_history_service.proto

1. Change its name to application_history_client.proto 2. Fix the incorrect proto reference. 3. Correct the dir in pom.xml
YARN-956. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen
[YARN-321] Add a testable in-memory HistoryStorage
YARN-955. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Mayank Bansal
[YARN-321] Implementation of ApplicationHistoryProtocol
YARN-954. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen
[YARN-321] History Service should create the webUI and wire it to HistoryStorage
YARN-953. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen
[YARN-321] Enable ResourceManager to write history data
YARN-947. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Defining the history data classes for the implementation of the reading/writing interface

We need to define history data classes that have the exact fields to be stored. Then the implementations don't need duplicate logic to extract the required information from RMApp, RMAppAttempt and RMContainer. We use protobuf to define these classes, so that they can be serialized to and deserialized from bytes, which makes persistence easier.
YARN-935. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
YARN-321 branch is broken due to applicationhistoryserver module's pom.xml

The branch was created from branch-2, so hadoop-yarn-server-applicationhistoryserver/pom.xml should use 2.2.0-SNAPSHOT, not 3.0.0-SNAPSHOT. Otherwise, the sub-project cannot be built correctly because of the wrong dependency.
YARN-934. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
HistoryStorage writer interface for Application History Server
YARN-930. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
Bootstrap ApplicationHistoryService module
YARN-713. Critical bug reported by Jason Lowe and fixed by Jian He (resourcemanager)
ResourceManager can exit unexpectedly if DNS is unavailable

As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups.
MAPREDUCE-5813. Blocker bug reported by Gera Shegalov and fixed by Gera Shegalov (mrv2 , task)
YarnChild does not load job.xml with mapreduce.job.classloader=true
MAPREDUCE-5810. Major bug reported by Mit Desai and fixed by Akira AJISAKA (contrib/streaming)
TestStreamingTaskLog#testStreamingTaskLogWithHadoopCmd is failing
MAPREDUCE-5806. Major bug reported by Eugene Koifman and fixed by Varun Vasudev
Log4j settings in container-log4j.properties cannot be overridden
MAPREDUCE-5805. Major bug reported by Fengdong Yu and fixed by Akira AJISAKA (jobhistoryserver)
Unable to parse launch time from job history file
MAPREDUCE-5795. Major bug reported by Yesha Vora and fixed by Xuan Gong
Job should be marked as Failed if it is recovered from commit.
MAPREDUCE-5794. Minor bug reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (test)
SliveMapper always uses default FileSystem.
MAPREDUCE-5791. Major bug reported by Nikola Vujic and fixed by Nikola Vujic (client)
Shuffle phase is slow in Windows - FadviseFileRegion::transferTo does not read disks efficiently
MAPREDUCE-5789. Major bug reported by Rushabh S Shah and fixed by Rushabh S Shah (jobhistoryserver , webapps)
Average Reduce time is incorrect on Job Overview page
MAPREDUCE-5787. Critical sub-task reported by Rajesh Balamohan and fixed by Rajesh Balamohan (nodemanager)
Modify ShuffleHandler to support Keep-Alive
MAPREDUCE-5780. Minor bug reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (test)
SliveTest always uses default FileSystem
MAPREDUCE-5778. Major bug reported by Jason Lowe and fixed by Akira AJISAKA (jobhistoryserver)
JobSummary does not escape newlines in the job name
MAPREDUCE-5773. Blocker improvement reported by Gera Shegalov and fixed by Gera Shegalov (mr-am)
Provide dedicated MRAppMaster syslog length limit
MAPREDUCE-5770. Major bug reported by Yesha Vora and fixed by Jian He
Redirection from AM-URL is broken with HTTPS_ONLY policy
MAPREDUCE-5769. Major bug reported by Rohith and fixed by Rohith
Unregistration to RM should not be called if AM is crashed before registering with RM
MAPREDUCE-5768. Major bug reported by Zhijie Shen and fixed by Gera Shegalov
TestMRJobs.testContainerRollingLog fails on trunk
MAPREDUCE-5766. Minor bug reported by Ramya Sunil and fixed by Jian He (applicationmaster)
Ping messages from attempts should be moved to DEBUG
MAPREDUCE-5761. Trivial improvement reported by Yesha Vora and fixed by Jian He
Add a log message like "encrypted shuffle is ON" in nodemanager logs
MAPREDUCE-5757. Major bug reported by Jason Lowe and fixed by Jason Lowe (client)
ConcurrentModificationException in JobControl.toList
MAPREDUCE-5754. Major improvement reported by Gera Shegalov and fixed by Gera Shegalov (jobhistoryserver , mr-am)
Preserve Job diagnostics in history
MAPREDUCE-5751. Major bug reported by Sangjin Lee and fixed by Sangjin Lee
MR app master fails to start in some cases if mapreduce.job.classloader is true
MAPREDUCE-5746. Major bug reported by Jason Lowe and fixed by Jason Lowe (jobhistoryserver)
Job diagnostics can implicate wrong task for a failed job
MAPREDUCE-5732. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza
Report proper queue when job has been automatically placed
MAPREDUCE-5699. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (applicationmaster)
Allow setting tags on MR jobs
MAPREDUCE-5688. Major bug reported by Mit Desai and fixed by Mit Desai
TestStagingCleanup fails intermittently with JDK7
MAPREDUCE-5670. Minor bug reported by Jason Lowe and fixed by Chen He (mrv2)
CombineFileRecordReader should report progress when moving to the next file
MAPREDUCE-5570. Major bug reported by Jason Lowe and fixed by Rushabh S Shah (mr-am , mrv2)
Map task attempt with fetch failure has incorrect attempt finish time
MAPREDUCE-5553. Minor improvement reported by Paul Han and fixed by Paul Han (applicationmaster)
Add task state filters on Application/MRJob page for MR Application master
MAPREDUCE-5028. Critical bug reported by Karthik Kambatla and fixed by Karthik Kambatla
Maps fail when io.sort.mb is set to high value
MAPREDUCE-4052. Major bug reported by xieguiming and fixed by Jian He (job submission)
Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster.
MAPREDUCE-2349. Major improvement reported by Joydeep Sen Sarma and fixed by Siddharth Seth (task)
speed up list[located]status calls from input formats
HDFS-6166. Blocker bug reported by Nathan Roberts and fixed by Nathan Roberts (balancer)
revisit balancer so_timeout
HDFS-6163. Minor bug reported by Fengdong Yu and fixed by Fengdong Yu (documentation)
Fix a minor bug in the HA upgrade document
HDFS-6157. Major bug reported by Haohui Mai and fixed by Haohui Mai
Fix the entry point of OfflineImageViewer for hdfs.cmd
HDFS-6150. Minor improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)
Add inode id information in the logs to make debugging easier
HDFS-6140. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (webhdfs)
WebHDFS cannot create a file with spaces in the name after HA failover changes.
HDFS-6138. Minor improvement reported by Sanjay Radia and fixed by Sanjay Radia (documentation)
User Guide for how to use viewfs with federation
HDFS-6135. Blocker bug reported by Jing Zhao and fixed by Jing Zhao (journal-node)
In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back
HDFS-6131. Major bug reported by Jing Zhao and fixed by Jing Zhao (documentation)
Move HDFSHighAvailabilityWithNFS.apt.vm and HDFSHighAvailabilityWithQJM.apt.vm from Yarn to HDFS
HDFS-6130. Blocker bug reported by Fengdong Yu and fixed by Haohui Mai (namenode)
NPE when upgrading namenode from fsimages older than -32
HDFS-6129. Minor bug reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (datanode)
When a replica is not found for deletion, do not throw exception.
HDFS-6127. Major bug reported by Arpit Gupta and fixed by Haohui Mai (ha)
WebHDFS tokens cannot be renewed in HA setup
HDFS-6124. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas
Add final modifier to class members
HDFS-6123. Minor improvement reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (datanode)
Improve datanode error messages
HDFS-6120. Major improvement reported by Arpit Agarwal and fixed by Arpit Agarwal (namenode)
Fix and improve safe mode log messages
HDFS-6117. Minor bug reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)
Print file path information in FileNotFoundException
HDFS-6115. Minor bug reported by Vinayakumar B and fixed by Vinayakumar B (datanode)
flush() should be called for every append on block scan verification log
HDFS-6107. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (datanode)
When a block can't be cached due to limited space on the DataNode, that block becomes uncacheable
HDFS-6106. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms
HDFS-6105. Major bug reported by Kihwal Lee and fixed by Haohui Mai
NN web UI for DN list loads the same jmx page multiple times.
HDFS-6102. Blocker bug reported by Andrew Wang and fixed by Andrew Wang (namenode)
Lower the default maximum items per directory to fix PB fsimage loading
HDFS-6100. Major bug reported by Arpit Gupta and fixed by Haohui Mai (ha)
DataNodeWebHdfsMethods does not failover in HA mode
HDFS-6099. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode)
HDFS file system limits not enforced on renames.
HDFS-6097. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)
zero-copy reads are incorrectly disabled on file offsets above 2GB
HDFS-6096. Minor bug reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (test)
TestWebHdfsTokens may timeout
HDFS-6094. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (namenode)
The same block can be counted twice towards safe mode threshold
HDFS-6090. Minor improvement reported by Akira AJISAKA and fixed by Akira AJISAKA (test)
Use MiniDFSCluster.Builder instead of deprecated constructors
HDFS-6089. Major bug reported by Arpit Gupta and fixed by Jing Zhao (ha)
Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
HDFS-6086. Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (datanode)
Fix a case where zero-copy or no-checksum reads were not allowed even when the block was cached
HDFS-6085. Major improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (namenode)
Improve CacheReplicationMonitor log messages a bit
HDFS-6084. Minor improvement reported by Travis Thompson and fixed by Travis Thompson (namenode)
Namenode UI - "Hadoop" logo link shouldn't go to hadoop homepage
HDFS-6080. Major improvement reported by Abin Shahab and fixed by Abin Shahab (nfs , performance)
Improve NFS gateway performance by making rtmax and wtmax configurable
HDFS-6079. Major bug reported by Andrew Wang and fixed by Andrew Wang (hdfs-client)
Timeout for getFileBlockStorageLocations does not work
HDFS-6078. Minor bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)
TestIncrementalBlockReports is flaky
HDFS-6077. Major bug reported by Arpit Gupta and fixed by Jing Zhao
running slive with webhdfs on secure HA cluster fails with unknown host exception
HDFS-6076. Minor sub-task reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (datanode , test)
SimulatedDataSet should not create DatanodeRegistration with namenode layout version and type
HDFS-6072. Major improvement reported by Haohui Mai and fixed by Haohui Mai
Clean up dead code of FSImage
HDFS-6071. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
BlockReaderLocal doesn't return -1 on EOF when doing a zero-length read on a short file
HDFS-6070. Trivial improvement reported by Andrew Wang and fixed by Andrew Wang
Cleanup use of ReadStatistics in DFSInputStream
HDFS-6069. Trivial improvement reported by Andrew Wang and fixed by Chris Nauroth (namenode)
Quash stack traces when ACLs are disabled
HDFS-6068. Major bug reported by Andrew Wang and fixed by sathish (snapshots)
Disallow snapshot names that are also invalid directory names
HDFS-6067. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)
TestPread.testMaxOutHedgedReadPool is flaky
HDFS-6065. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)
HDFS zero-copy reads should return null on EOF when doing ZCR
HDFS-6064. Minor bug reported by Vinayakumar B and fixed by Vinayakumar B (datanode)
DFSConfigKeys.DFS_BLOCKREPORT_INTERVAL_MSEC_DEFAULT is not updated with latest block report interval of 6 hrs
HDFS-6063. Minor bug reported by Colin Patrick McCabe and fixed by Chris Nauroth (test , tools)
TestAclCLI fails intermittently when running test 24: copyFromLocal
HDFS-6062. Minor bug reported by Jing Zhao and fixed by Jing Zhao
TestRetryCacheWithHA#testConcat is flaky
HDFS-6061. Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (datanode)
Allow dfs.datanode.shared.file.descriptor.path to contain multiple entries and fall back when needed
HDFS-6060. Major sub-task reported by Brandon Li and fixed by Brandon Li (namenode)
NameNode should not check DataNode layout version
HDFS-6059. Major bug reported by Akira AJISAKA and fixed by Akira AJISAKA
TestBlockReaderLocal fails if native library is not available
HDFS-6058. Major bug reported by Vinayakumar B and fixed by Haohui Mai
Fix TestHDFSCLI failures after HADOOP-8691 change
HDFS-6057. Blocker bug reported by Eric Sirianni and fixed by Colin Patrick McCabe (hdfs-client)
DomainSocketWatcher.watcherThread should be marked as daemon thread
HDFS-6055. Major improvement reported by Suresh Srinivas and fixed by Chris Nauroth (namenode)
Change default configuration to limit file name length in HDFS

The default configuration of HDFS now sets dfs.namenode.fs-limits.max-component-length to 255 for improved interoperability with other file system implementations. This limits each component of a file system path to a maximum of 255 bytes in UTF-8 encoding. Attempts to create new files that violate this rule will fail with an error. Existing files that violate the rule are not affected. Previously, dfs.namenode.fs-limits.max-component-length was set to 0 (ignored). If necessary, the value can be set back to 0 in the cluster's configuration to restore the old behavior.
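Because the limit is counted in UTF-8 bytes rather than characters, a name shorter than 255 characters can still be rejected. The following is a minimal sketch of that arithmetic using only the JDK; the class and method names are hypothetical and this is not the NameNode's enforcement code.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: hypothetical helper, not part of HDFS. */
final class ComponentLengthCheck {

    /**
     * Returns the first path component whose UTF-8 encoding is longer than
     * maxBytes, or null if every component fits. A limit of 0 or less
     * disables the check, mirroring the "0 (ignored)" setting in the note.
     */
    static String firstTooLong(String path, int maxBytes) {
        if (maxBytes <= 0) {
            return null;
        }
        for (String component : path.split("/")) {
            if (component.getBytes(StandardCharsets.UTF_8).length > maxBytes) {
                return component;
            }
        }
        return null;
    }

    /** Repeats s the given number of times (avoids requiring Java 11). */
    static String repeat(String s, int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 255 ASCII characters encode to 255 bytes, so the name fits.
        String ascii = "/user/" + repeat("a", 255);
        // 128 copies of U+00E9 encode to 256 bytes, so the name is too long.
        String accented = "/user/" + repeat("\u00e9", 128);
        System.out.println(firstTooLong(ascii, 255) == null);
        System.out.println(firstTooLong(accented, 255) != null);
    }
}
```

The second name is only 128 characters long but is still over the limit, which is the case most likely to surprise users with non-ASCII file names.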
HDFS-6053. Major bug reported by Jing Zhao and fixed by Jing Zhao (namenode)
Fix TestDecommissioningStatus and TestDecommission in branch-2
HDFS-6051. Blocker bug reported by Chris Nauroth and fixed by Colin Patrick McCabe (hdfs-client)
HDFS cannot run on Windows since short-circuit shared memory segment changes.
HDFS-6047. Major bug reported by stack and fixed by stack
TestPread NPE inside DFSInputStream hedgedFetchBlockByteRange
HDFS-6046. Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)
add dfs.client.mmap.enabled
HDFS-6044. Minor improvement reported by Brandon Li and fixed by Brandon Li (nfs)
Add property for setting the NFS look up time for users
HDFS-6043. Major improvement reported by Brandon Li and fixed by Brandon Li (nfs)
Give HDFS daemons NFS3 and Portmap their own OPTS
HDFS-6040. Blocker sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)
fix DFSClient issue without libhadoop.so and some other ShortCircuitShm cleanups
HDFS-6039. Major bug reported by Yesha Vora and fixed by Chris Nauroth (namenode)
Uploading a File under a Dir with default acls throws "Duplicated ACLFeature"
HDFS-6038. Major sub-task reported by Haohui Mai and fixed by Jing Zhao (journal-node , namenode)
Allow JournalNode to handle editlog produced by new release with future layoutversion
HDFS-6033. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (caching)
PBImageXmlWriter incorrectly handles processing cache directives
HDFS-6030. Trivial task reported by Yongjun Zhang and fixed by Yongjun Zhang
Remove an unused constructor in INode.java
HDFS-6028. Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode)
Print clearer error message when user attempts to delete required mask entry from ACL.
HDFS-6025. Minor task reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (build)
Update findbugsExcludeFile.xml
HDFS-6018. Trivial improvement reported by Jing Zhao and fixed by Jing Zhao
Exception recorded in LOG when IPCLoggerChannel#close is called
HDFS-6008. Minor bug reported by Benoy Antony and fixed by Benoy Antony (namenode)
Namenode dead node link is giving HTTP error 500
HDFS-6006. Trivial improvement reported by Akira AJISAKA and fixed by Akira AJISAKA (namenode)
Remove duplicate code in FSNameSystem#getFileInfo
HDFS-5988. Blocker bug reported by Andrew Wang and fixed by Andrew Wang (namenode)
Bad fsimage always generated after upgrade
HDFS-5986. Major improvement reported by Suresh Srinivas and fixed by Chris Nauroth (namenode)
Capture the number of blocks pending deletion on namenode webUI
HDFS-5982. Critical bug reported by Tassapol Athiapinya and fixed by Jing Zhao (namenode)
Need to update snapshot manager when applying editlog for deleting a snapshottable directory
HDFS-5981. Minor bug reported by Haohui Mai and fixed by Haohui Mai (tools)
PBImageXmlWriter generates malformed XML
HDFS-5979. Minor improvement reported by Andrew Wang and fixed by Andrew Wang
Typo and logger fix for fsimage PB code
HDFS-5973. Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)
add DomainSocket#shutdown method
HDFS-5962. Critical bug reported by Kihwal Lee and fixed by Akira AJISAKA
Mtime and atime are not persisted for symbolic links
HDFS-5961. Critical bug reported by Kihwal Lee and fixed by Kihwal Lee
OIV cannot load fsimages containing a symbolic link
HDFS-5959. Minor bug reported by Akira AJISAKA and fixed by Akira AJISAKA
Fix typo at section name in FSImageFormatProtobuf.java
HDFS-5956. Major sub-task reported by Akira AJISAKA and fixed by Akira AJISAKA (tools)
A file size is multiplied by the replication factor in 'hdfs oiv -p FileDistribution' option
HDFS-5953. Major test reported by Ted Yu and fixed by Akira AJISAKA
TestBlockReaderFactory fails if libhadoop.so has not been built
HDFS-5950. Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (datanode , hdfs-client)
The DFSClient and DataNode should use shared memory segments to communicate short-circuit information
HDFS-5949. Minor bug reported by Travis Thompson and fixed by Travis Thompson (namenode)
New Namenode UI when trying to download a file, the browser doesn't know the file name
HDFS-5948. Major bug reported by Andrew Wang and fixed by Haohui Mai
TestBackupNode flakes with port in use error
HDFS-5944. Major bug reported by zhaoyunjiong and fixed by zhaoyunjiong (namenode)
LeaseManager:findLeaseWithPrefixPath can't handle paths like /a/b/ correctly, causing SecondaryNameNode checkpoints to fail
HDFS-5943. Major bug reported by Yesha Vora and fixed by Suresh Srinivas
'dfs.namenode.https-address.ns1' property is not used in federation setup
HDFS-5942. Minor sub-task reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation , tools)
Fix javadoc in OfflineImageViewer
HDFS-5941. Major bug reported by Haohui Mai and fixed by Haohui Mai (documentation , namenode)
add dfs.namenode.secondary.https-address and dfs.namenode.secondary.https-address in hdfs-default.xml
HDFS-5940. Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)
Minor cleanups to ShortCircuitReplica, FsDatasetCache, and DomainSocketWatcher
HDFS-5939. Major improvement reported by Yongjun Zhang and fixed by Yongjun Zhang (hdfs-client)
WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
HDFS-5938. Trivial sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)
Make BlockReaderFactory#BlockReaderPeer a static class
HDFS-5936. Major test reported by Andrew Wang and fixed by Binglin Chang (namenode , test)
MiniDFSCluster does not clean data left behind by SecondaryNameNode.
HDFS-5935. Minor improvement reported by Travis Thompson and fixed by Travis Thompson (namenode)
New Namenode UI FS browser should throw smarter error messages
HDFS-5934. Minor bug reported by Travis Thompson and fixed by Travis Thompson (namenode)
New Namenode UI back button doesn't work as expected
HDFS-5929. Major improvement reported by Siqi Li and fixed by Siqi Li (federation)
Add Block pool % usage to HDFS federated nn page
HDFS-5922. Major bug reported by Aaron T. Myers and fixed by Arpit Agarwal (datanode)
DN heartbeat thread can get stuck in tight loop
HDFS-5915. Major bug reported by Haohui Mai and fixed by Haohui Mai (namenode)
Refactor FSImageFormatProtobuf to simplify cross section reads
HDFS-5913. Minor bug reported by Ted Yu and fixed by Brandon Li (nfs)
Nfs3Utils#getWccAttr() should check attr parameter against null
HDFS-5910. Major improvement reported by Benoy Antony and fixed by Benoy Antony (security)
Enhance DataTransferProtocol to allow per-connection choice of encryption/plain-text
HDFS-5904. Major bug reported by Mit Desai and fixed by Mit Desai
TestFileStatus fails intermittently on trunk and branch2
HDFS-5901. Major bug reported by Vinayakumar B and fixed by Vinayakumar B (namenode)
NameNode new UI doesn't support IE8 and IE9 on windows 7
HDFS-5900. Major bug reported by Tassapol Athiapinya and fixed by Andrew Wang (caching)
Cannot set cache pool limit of "unlimited" via CacheAdmin
HDFS-5898. Major sub-task reported by Jing Zhao and fixed by Abin Shahab (nfs)
Allow NFS gateway to login/relogin from its kerberos keytab
HDFS-5895. Major bug reported by Tassapol Athiapinya and fixed by Tassapol Athiapinya (tools)
HDFS cacheadmin -listPools has exit_code of 1 when the command returns 0 result.
HDFS-5893. Major bug reported by Yesha Vora and fixed by Haohui Mai
HftpFileSystem.RangeHeaderUrlOpener uses the default URLConnectionFactory which does not import SSL certificates
HDFS-5892. Minor test reported by Ted Yu and fixed by
TestDeleteBlockPool fails in branch-2
HDFS-5891. Major bug reported by Haohui Mai and fixed by Haohui Mai (namenode , webhdfs)
webhdfs should not try connecting the DN during redirection
HDFS-5886. Major bug reported by Ted Yu and fixed by Brandon Li (nfs)
Potential null pointer dereference in RpcProgramNfs3#readlink()
HDFS-5882. Minor test reported by Jimmy Xiang and fixed by Jimmy Xiang
TestAuditLogs is flaky
HDFS-5881. Critical bug reported by Kihwal Lee and fixed by Kihwal Lee
Fix skip() of the short-circuit local reader (legacy).
HDFS-5879. Major bug reported by Gera Shegalov and fixed by Gera Shegalov (test)
Some TestHftpFileSystem tests do not close streams
HDFS-5868. Major sub-task reported by Taylor, Buddy and fixed by (datanode)
Make hsync implementation pluggable
HDFS-5866. Major sub-task reported by Akira AJISAKA and fixed by Akira AJISAKA (tools)
'-maxSize' and '-step' option fail in OfflineImageViewer
HDFS-5859. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (datanode)
DataNode#checkBlockToken should check block tokens even if security is not enabled
HDFS-5857. Major bug reported by Mit Desai and fixed by Mit Desai
TestWebHDFS#testNamenodeRestart fails intermittently with NPE
HDFS-5856. Minor bug reported by Josh Elser and fixed by Josh Elser (datanode)
DataNode.checkDiskError might throw NPE
HDFS-5847. Major sub-task reported by Haohui Mai and fixed by Jing Zhao
Consolidate INodeReference into a separate section
HDFS-5846. Major bug reported by Nikola Vujic and fixed by Nikola Vujic (namenode)
Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency
HDFS-5843. Major bug reported by Laurent Goujon and fixed by Laurent Goujon (datanode)
DFSClient.getFileChecksum() throws IOException if checksum is disabled
HDFS-5840. Blocker bug reported by Aaron T. Myers and fixed by Jing Zhao (ha , journal-node , namenode)
Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
HDFS-5828. Major bug reported by Taylor, Buddy and fixed by Taylor, Buddy (namenode)
BlockPlacementPolicyWithNodeGroup can place multiple replicas on the same node group when dfs.namenode.avoid.write.stale.datanode is true
HDFS-5821. Major bug reported by Gera Shegalov and fixed by Gera Shegalov (test)
TestHDFSCLI fails for user names with the dash character
HDFS-5810. Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)
Unify mmap cache and short-circuit file descriptor cache
HDFS-5807. Major bug reported by Mit Desai and fixed by Chen He (test)
TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails intermittently on Branch-2
HDFS-5804. Major sub-task reported by Abin Shahab and fixed by Abin Shahab (nfs)
HDFS NFS Gateway fails to mount and proxy when using Kerberos

Fixes NFS on Kerberized cluster.
HDFS-5803. Major bug reported by Mit Desai and fixed by Chen He
TestBalancer.testBalancer0 fails
HDFS-5791. Major bug reported by Brandon Li and fixed by Haohui Mai (test)
TestHttpsFileSystem should use a random port to avoid binding error during testing
HDFS-5790. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (namenode , performance)
LeaseManager.findPath is very slow when many leases need recovery

Committed to branch-2 and trunk.
HDFS-5781. Minor improvement reported by Jing Zhao and fixed by Jing Zhao (namenode)
Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value
HDFS-5780. Major bug reported by Mit Desai and fixed by Mit Desai
TestRBWBlockInvalidation times out intermittently on branch-2
HDFS-5776. Major improvement reported by Liang Xie and fixed by Liang Xie (hdfs-client)
Support 'hedged' reads in DFSClient

If a read from a block is slow, start up another parallel, 'hedged' read against a different block replica. We then take the result of whichever read returns first (the outstanding read is cancelled). This 'hedged' read feature will help rein in the outliers, the odd read that takes a long time because it hit a bad patch on the disk, etc.
This feature is off by default. To enable it, set dfs.client.hedged.read.threadpool.size to a positive number. The threadpool size is how many threads to dedicate to the running of these 'hedged', concurrent reads in your client. Then set dfs.client.hedged.read.threshold.millis to the number of milliseconds to wait before starting up a 'hedged' read. For example, if you set this property to 10, then if a read has not returned within 10 milliseconds, we will start up a new read against a different block replica.
This feature emits new metrics:
+ hedgedReadOps
+ hedgedReadOpsWin: how many times the hedged read 'beat' the original read
+ hedgedReadOpsInCurThread: how many times we went to do a hedged read but had to run it in the current thread because dfs.client.hedged.read.threadpool.size was at a maximum.
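The general pattern described above (wait for a threshold, then race a second request and keep whichever finishes first) can be sketched with the JDK's concurrency utilities. This is an illustration of the technique only, not the DFSClient implementation; the class name, method name, and parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Illustrative only: a generic hedged call, not HDFS code. */
final class HedgedReadSketch {

    /**
     * Runs primary; if it has not completed within thresholdMillis, also
     * runs backup and returns the result of whichever completes first.
     * Any task still outstanding is cancelled before returning.
     */
    static <T> T hedgedCall(ExecutorService pool, Callable<T> primary,
                            Callable<T> backup, long thresholdMillis) throws Exception {
        CompletionService<T> completions = new ExecutorCompletionService<>(pool);
        List<Future<T>> started = new ArrayList<>();
        started.add(completions.submit(primary));
        try {
            // Give the primary read a head start of thresholdMillis.
            Future<T> first = completions.poll(thresholdMillis, TimeUnit.MILLISECONDS);
            if (first == null) {
                // Primary is slow: start the hedged read and take the winner.
                started.add(completions.submit(backup));
                first = completions.take();
            }
            return first.get();
        } finally {
            // Cancelling an already-completed future is a no-op.
            for (Future<T> f : started) {
                f.cancel(true);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            String result = hedgedCall(pool,
                () -> { Thread.sleep(2000); return "slow replica"; },
                () -> "fast replica",
                50);
            System.out.println(result);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In this sketch a failure of the first task to complete is propagated to the caller rather than retried; a production implementation would likely handle that case differently.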
HDFS-5775. Major improvement reported by Haohui Mai and fixed by Haohui Mai (namenode)
Consolidate the code for serialization in CacheManager
HDFS-5768. Major improvement reported by Haohui Mai and fixed by Haohui Mai (namenode)
Consolidate the serialization code in DelegationTokenSecretManager
HDFS-5767. Blocker bug reported by Yongjun Zhang and fixed by Yongjun Zhang (nfs)
NFS implementation assumes userName userId mapping to be unique, which is not true sometimes
HDFS-5759. Major bug reported by Haohui Mai and fixed by Haohui Mai
Web UI does not show up during the period of loading FSImage
HDFS-5746. Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (datanode , hdfs-client)
add ShortCircuitSharedMemorySegment
HDFS-5742. Minor bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)
DatanodeCluster (mini cluster of DNs) fails to start
HDFS-5726. Minor sub-task reported by Jing Zhao and fixed by Jing Zhao (namenode)
Fix compilation error in AbstractINodeDiff for JDK7
HDFS-5716. Major bug reported by Haohui Mai and fixed by Haohui Mai (webhdfs)
Allow WebHDFS to use pluggable authentication filter
HDFS-5715. Major sub-task reported by Jing Zhao and fixed by Jing Zhao (namenode)
Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
HDFS-5709. Major improvement reported by Andrew Wang and fixed by Andrew Wang (namenode)
Improve NameNode upgrade with existing reserved paths and path components
HDFS-5705. Major bug reported by Ted Yu and fixed by Ted Yu (datanode)
TestSecondaryNameNodeUpgrade#testChangeNsIDFails may fail due to ConcurrentModificationException
HDFS-5698. Major improvement reported by Haohui Mai and fixed by Haohui Mai (namenode)
Use protobuf to serialize / deserialize FSImage

Use protobuf to serialize/deserialize the FSImage.
HDFS-5672. Major test reported by Ted Yu and fixed by Jing Zhao (namenode)
TestHASafeMode#testSafeBlockTracking fails in trunk
HDFS-5647. Major sub-task reported by Haohui Mai and fixed by Haohui Mai (namenode)
Merge INodeDirectory.Feature and INodeFile.Feature
HDFS-5638. Major sub-task reported by Chris Nauroth and fixed by Vinayakumar B (hdfs-client)
HDFS implementation of FileContext API for ACLs.
HDFS-5632. Major sub-task reported by Jing Zhao and fixed by Jing Zhao (namenode)
Add Snapshot feature to INodeDirectory
HDFS-5626. Major bug reported by Stephen Chu and fixed by Colin Patrick McCabe (caching)
dfsadmin -report shows incorrect cache values
HDFS-5554. Major sub-task reported by Jing Zhao and fixed by Jing Zhao (namenode)
Add Snapshot Feature to INodeFile
HDFS-5537. Major sub-task reported by Jing Zhao and fixed by Jing Zhao (namenode , snapshots)
Remove FileWithSnapshot interface
HDFS-5535. Major new feature reported by Nathan Roberts and fixed by Tsz Wo Nicholas Sze (datanode , ha , hdfs-client , namenode)
Umbrella jira for improved HDFS rolling upgrades
HDFS-5531. Minor sub-task reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (namenode)
Combine the getNsQuota() and getDsQuota() methods in INode
HDFS-5516. Major bug reported by Chris Nauroth and fixed by Miodrag Radulovic (webhdfs)
WebHDFS does not require user name when anonymous http requests are disallowed.
HDFS-5492. Minor bug reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)
Port HDFS-2069 (Incorrect default trash interval in the docs) to trunk
HDFS-5483. Major sub-task reported by Arpit Agarwal and fixed by Arpit Agarwal (namenode)
NN should gracefully handle multiple block replicas on same DN
HDFS-5339. Major bug reported by Stephen Chu and fixed by Haohui Mai (webhdfs)
WebHDFS URI does not accept logical nameservices when security is enabled
HDFS-5321. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Clean up the HTTP-related configuration in HDFS

dfs.http.port and dfs.https.port are removed. Filesystem clients, such as WebHdfsFileSystem, now have fixed instead of configurable default ports (i.e., 50070 for http and 50470 for https). Users can explicitly specify the port in the URI to access the file system which runs on non-default ports.
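As a rough illustration of the rule above, the sketch below resolves the port for a file system URI: an explicit port wins, otherwise the fixed default applies. The class is hypothetical, the host names are placeholders, and the mapping of the webhdfs and swebhdfs schemes to the http and https defaults is an assumption made for this example.

```java
import java.net.URI;

/** Illustrative only: hypothetical helper, not part of WebHdfsFileSystem. */
final class WebHdfsPortSketch {
    static final int DEFAULT_HTTP_PORT = 50070;   // fixed default for http
    static final int DEFAULT_HTTPS_PORT = 50470;  // fixed default for https

    /** Returns the explicit port in the URI, or the fixed default for its scheme. */
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        // Assumption for this sketch: swebhdfs uses https, anything else http.
        return "swebhdfs".equals(uri.getScheme()) ? DEFAULT_HTTPS_PORT : DEFAULT_HTTP_PORT;
    }

    public static void main(String[] args) {
        System.out.println(effectivePort(URI.create("webhdfs://nn.example.com/user/alice")));
        System.out.println(effectivePort(URI.create("swebhdfs://nn.example.com/user/alice")));
        System.out.println(effectivePort(URI.create("webhdfs://nn.example.com:14000/user/alice")));
    }
}
```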
HDFS-5318. Major improvement reported by Eric Sirianni and fixed by (namenode)
Support read-only and read-write paths to shared replicas
HDFS-5286. Major sub-task reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (namenode)
Flatten INodeDirectory hierarchy: add DirectoryWithQuotaFeature
HDFS-5285. Major sub-task reported by Tsz Wo Nicholas Sze and fixed by Jing Zhao (namenode)
Flatten INodeFile hierarchy: Add UnderConstruction Feature
HDFS-5244. Major bug reported by Jinghui Wang and fixed by Jinghui Wang (test)
TestNNStorageRetentionManager#testPurgeMultipleDirs fails
HDFS-5167. Minor sub-task reported by Jing Zhao and fixed by Tsuyoshi OZAWA (ha , namenode)
Add metrics about the NameNode retry cache
HDFS-5153. Major improvement reported by Arpit Agarwal and fixed by Arpit Agarwal (datanode)
Datanode should send block reports for each storage in a separate message
HDFS-5138. Blocker bug reported by Kihwal Lee and fixed by Aaron T. Myers
Support HDFS upgrade in HA
HDFS-5064. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (ha , namenode)
Standby checkpoints should not block concurrent readers
HDFS-4911. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
Reduce PeerCache timeout to be commensurate with dfs.datanode.socket.reuse.keepalive
HDFS-4858. Minor bug reported by Jagane Sundar and fixed by Henry Wang (datanode)
HDFS DataNode to NameNode RPC should timeout
HDFS-4685. Major new feature reported by Sachin Jose and fixed by Chris Nauroth (hdfs-client , namenode , security)
Implementation of ACLs in HDFS

HDFS now supports ACLs (Access Control Lists). ACLs can specify fine-grained file permissions for specific named users or named groups.
HDFS-4564. Blocker sub-task reported by Daryn Sharp and fixed by Daryn Sharp (webhdfs)
Webhdfs returns incorrect http response codes for denied operations
HDFS-4370. Major improvement reported by Konstantin Shvachko and fixed by Chu Tong (datanode)
Fix typo Blanacer in DataNode
HDFS-4200. Major improvement reported by Suresh Srinivas and fixed by Andrew Wang (datanode)
Reduce the size of synchronized sections in PacketResponder
HDFS-3969. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (hdfs-client)
Small bug fixes and improvements for disk locations API
HDFS-3405. Major improvement reported by Aaron T. Myers and fixed by Vinayakumar B
Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages
HDFS-3128. Minor bug reported by Eli Collins and fixed by Andrew Wang (test)
Unit tests should not use a test root in /tmp
HADOOP-10450. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (io , native)
Build zlib native code bindings in hadoop.dll for Windows.
HADOOP-10449. Minor sub-task reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (security)
Fix the javac warnings in the security packages.
HADOOP-10442. Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee
Group look-up can cause segmentation fault when certain JNI-based mapping module is used.
HADOOP-10441. Blocker bug reported by Jing Zhao and fixed by Jing Zhao (metrics)
Namenode metric "rpc.RetryCache/NameNodeRetryCache.CacheHit" can't be correctly processed by Ganglia
HADOOP-10440. Major bug reported by guodongdong and fixed by guodongdong (fs)
HarFsInputStream of HarFileSystem computes the position incorrectly when reading data
HADOOP-10437. Minor sub-task reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (conf , util)
Fix the javac warnings in the conf and the util package
HADOOP-10425. Critical bug reported by Brandon Li and fixed by Tsz Wo Nicholas Sze (fs)
Incompatible behavior of LocalFileSystem:getContentSummary
HADOOP-10423. Minor improvement reported by Chris Nauroth and fixed by Chris Nauroth (documentation)
Clarify compatibility policy document for combination of new client and old server.
HADOOP-10422. Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (ipc)
Remove redundant logging of RPC retry attempts.
HADOOP-10407. Minor sub-task reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (ipc)
Fix the javac warnings in the ipc package.
HADOOP-10399. Major sub-task reported by Chris Nauroth and fixed by Vinayakumar B (fs)
FileContext API for ACLs.
HADOOP-10395. Minor bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)
TestCallQueueManager is flaky
HADOOP-10394. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)
TestAuthenticationFilter is flaky
HADOOP-10393. Minor sub-task reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (security)
Fix hadoop-auth javac warnings
HADOOP-10386. Minor improvement reported by Arpit Gupta and fixed by Haohui Mai (ha)
Log proxy hostname in various exceptions being thrown in a HA setup
HADOOP-10383. Major improvement reported by Enis Soztutar and fixed by Enis Soztutar
InterfaceStability annotations should have RetentionPolicy.RUNTIME
HADOOP-10379. Major improvement reported by Haohui Mai and fixed by Haohui Mai
Protect authentication cookies with the HttpOnly and Secure flags
HADOOP-10374. Major improvement reported by Enis Soztutar and fixed by Enis Soztutar
InterfaceAudience annotations should have RetentionPolicy.RUNTIME
HADOOP-10368. Minor bug reported by Ted Yu and fixed by Tsuyoshi OZAWA (util)
InputStream is not closed in VersionInfo ctor
HADOOP-10355. Major bug reported by Akira AJISAKA and fixed by Haohui Mai
TestLoadGenerator#testLoadGenerator fails
HADOOP-10353. Major bug reported by Tudor Scurtu and fixed by Tudor Scurtu (fs)
FsUrlStreamHandlerFactory is not thread safe
HADOOP-10348. Major improvement reported by Haohui Mai and fixed by Haohui Mai
Deprecate hadoop.ssl.configuration in branch-2, and remove it in trunk
HADOOP-10346. Blocker bug reported by Jason Lowe and fixed by Jason Lowe (security)
Deadlock while logging tokens
HADOOP-10343. Minor improvement reported by Arpit Gupta and fixed by Arpit Gupta
Change info to debug log in LossyRetryInvocationHandler
HADOOP-10338. Major bug reported by Andrew Wang and fixed by Colin Patrick McCabe
Cannot get the FileStatus of the root inode from the new Globber
HADOOP-10337. Major bug reported by Liang Xie and fixed by Liang Xie (metrics)
ConcurrentModificationException from MetricsDynamicMBeanBase.createMBeanInfo()
HADOOP-10333. Trivial improvement reported by René Nyffenegger and fixed by René Nyffenegger
Fix grammatical error in overview.html document
HADOOP-10330. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)
TestFrameDecoder fails if it cannot bind port 12345
HADOOP-10328. Major bug reported by Arpit Gupta and fixed by Haohui Mai (tools)
loadGenerator exit code is not reliable
HADOOP-10327. Blocker bug reported by Vinayakumar B and fixed by Vinayakumar B (native)
Trunk windows build broken after HDFS-5746
HADOOP-10326. Major bug reported by Manuel DE FERRAN and fixed by bc Wong (security)
M/R jobs can not access S3 if Kerberos is enabled
HADOOP-10320. Trivial bug reported by René Nyffenegger and fixed by René Nyffenegger (documentation)
Javadoc in InterfaceStability.java lacks final </ul>
HADOOP-10314. Major bug reported by Kihwal Lee and fixed by Rushabh S Shah
The ls command help still shows outdated 0.16 format.
HADOOP-10301. Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (security)
AuthenticationFilter should return Forbidden for failed authentication
HADOOP-10295. Major improvement reported by Jing Zhao and fixed by Jing Zhao (tools/distcp)
Allow distcp to automatically identify the checksum type of source files and use it for the target

Add option for distcp to preserve the checksum type of the source files. Users can use "-pc" as distcp command option to preserve the checksum type.
HADOOP-10285. Major sub-task reported by Chris Li and fixed by
Admin interface to swap callqueue at runtime
HADOOP-10280. Major sub-task reported by Chris Li and fixed by Chris Li
Make Schedulables return a configurable identity of user or group
HADOOP-10278. Major sub-task reported by Chris Li and fixed by Chris Li (ipc)
Refactor to make CallQueue pluggable
HADOOP-10249. Major bug reported by Dilli Arumugam and fixed by Dilli Arumugam
LdapGroupsMapping should trim ldap password read from file
HADOOP-10221. Major improvement reported by Benoy Antony and fixed by Benoy Antony (security)
Add a plugin to specify SaslProperties for RPC protocol based on connection properties
HADOOP-10211. Major improvement reported by Benoy Antony and fixed by Benoy Antony (security)
Enable RPC protocol to negotiate SASL-QOP values between clients and servers

The hadoop.rpc.protection configuration property previously supported specifying a single value: one of authentication, integrity or privacy. An unrecognized value was silently assumed to mean authentication. This configuration property now accepts a comma-separated list of any of the 3 values, and unrecognized values are rejected with an error. Existing configurations containing an invalid value must be corrected. If the property is empty or not specified, authentication is assumed.
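The validation rules described above can be sketched as follows. This is a standalone illustration rather than Hadoop's own parsing code: the class and method names are hypothetical, and trimming whitespace and ignoring case are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Illustrative only: hypothetical parser for a hadoop.rpc.protection value. */
final class RpcProtectionSketch {
    private static final List<String> VALID =
        Arrays.asList("authentication", "integrity", "privacy");

    /**
     * Parses a comma-separated list of protection levels.
     * An empty or missing value means "authentication";
     * any unrecognized entry is rejected with an exception.
     */
    static List<String> parse(String value) {
        List<String> levels = new ArrayList<>();
        if (value == null || value.trim().isEmpty()) {
            levels.add("authentication");
            return levels;
        }
        for (String part : value.split(",")) {
            // Assumption: surrounding whitespace and letter case are ignored.
            String level = part.trim().toLowerCase(Locale.ROOT);
            if (!VALID.contains(level)) {
                throw new IllegalArgumentException(
                    "Unrecognized hadoop.rpc.protection value: '" + part.trim() + "'");
            }
            levels.add(level);
        }
        return levels;
    }

    public static void main(String[] args) {
        System.out.println(parse("integrity,privacy"));
        System.out.println(parse(""));
    }
}
```

A value such as "authentication,bogus" raises IllegalArgumentException in this sketch, which corresponds to the note's requirement that existing configurations containing an invalid value be corrected.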
HADOOP-10191. Blocker bug reported by Gera Shegalov and fixed by Gera Shegalov (viewfs)
Missing executable permission on viewfs internal dirs
HADOOP-10184. Major new feature reported by Chris Nauroth and fixed by Chris Nauroth (fs , security)
Hadoop Common changes required to support HDFS ACLs.
HADOOP-10139. Major improvement reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)
Update and improve the Single Cluster Setup document
HADOOP-10085. Blocker bug reported by Karthik Kambatla and fixed by Steve Loughran
CompositeService should allow adding services while being inited
HADOOP-10070. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (security)
RPC client doesn't use per-connection conf to determine server's expected Kerberos principal name
HADOOP-10015. Minor bug reported by Haohui Mai and fixed by Nicolas Liochon (security)
UserGroupInformation prints out excessive ERROR warnings
HADOOP-9525. Major test reported by Ivan Mitic and fixed by Ivan Mitic (test , util)
Add tests that validate winutils chmod behavior on folders
HADOOP-9454. Major improvement reported by Jordan Mendelson and fixed by Akira AJISAKA (fs/s3)
Support multipart uploads for s3native
HADOOP-8691. Minor improvement reported by Jason Lowe and fixed by Daryn Sharp (fs)
FsShell can print "Found xxx items" unnecessarily often

The `ls` command only prints "Found foo items" once when listing the directories recursively.

Hadoop 2.3.0 Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements.

Changes since Hadoop 2.2.0

YARN-1642. Blocker sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)
RMDTRenewer#getRMClient should use ClientRMProxy

RMDTRenewer#getRMClient gets a proxy to the RM in the conf directly instead of going through ClientRMProxy.
{code}
final YarnRPC rpc = YarnRPC.create(conf);
return (ApplicationClientProtocol) rpc.getProxy(ApplicationClientProtocol.class, addr, conf);
{code}
YARN-1630. Major bug reported by Aditya Acharya and fixed by Aditya Acharya (client)
Introduce timeout for async polling operations in YarnClientImpl

I ran an MR2 application that would have been long running, and killed it programmatically using a YarnClient. The app was killed, but the client hung forever. The message that I saw, which spammed the logs, was "Waiting for application application_1389036507624_0018 to be killed." The RM log indicated that the app had indeed transitioned from RUNNING to KILLED, but for some reason future responses to the RPC to kill the application did not indicate that the app had been terminated. I tracked this down to YarnClientImpl.java, and though I was unable to reproduce the bug, I wrote a patch to introduce a bound on the number of times that YarnClientImpl retries the RPC before giving up.
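The idea of bounding a polling loop, so that a client gives up instead of hanging forever, can be sketched as below. This is a generic illustration and not the YarnClientImpl patch; the class, method, and parameter names are hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Illustrative only: a bounded polling loop, not YARN code. */
final class BoundedPollSketch {

    /**
     * Checks the condition up to maxAttempts times, sleeping intervalMillis
     * between checks. Returns true as soon as the condition holds, or false
     * once the attempts are exhausted so the caller can report a timeout.
     */
    static boolean pollUntil(BooleanSupplier done, int maxAttempts, long intervalMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (done.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(intervalMillis);
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        long deadline = System.currentTimeMillis() + 30;
        // Stand-in for "has the application reached the KILLED state?"
        boolean killed = pollUntil(() -> System.currentTimeMillis() >= deadline, 20, 10);
        System.out.println(killed ? "application killed" : "gave up waiting");
    }
}
```

Returning false rather than looping indefinitely lets the caller decide whether to raise an error, log a warning, or retry at a higher level.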
YARN-1629. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer

This can occur when the second-to-last app in a queue's pending app list is made runnable. The app is pulled out from under the iterator.
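A general way to avoid pulling an element out from under an iterator is to remove it through the iterator itself. The sketch below assumes a simple list of pending app names; `PendingApps` and `promote` are hypothetical and do not reflect the actual MaxRunningAppsEnforcer code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a queue's pending-app list. */
final class PendingApps {
    /** Moves up to `slots` apps from `pending` to a new runnable list. */
    static List<String> promote(List<String> pending, int slots) {
        List<String> runnable = new ArrayList<>();
        Iterator<String> it = pending.iterator();
        while (it.hasNext() && runnable.size() < slots) {
            String app = it.next();
            runnable.add(app);
            it.remove(); // remove via the iterator, not pending.remove(app)
        }
        return runnable;
    }
}
```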
YARN-1628. Major bug reported by Mit Desai and fixed by Vinod Kumar Vavilapalli
TestContainerManagerSecurity fails on trunk

The test fails with the following error: {noformat} java.lang.IllegalArgumentException: java.net.UnknownHostException: InvalidHost at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377) at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145) at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:144) {noformat}
YARN-1624. Major bug reported by Aditya Acharya and fixed by Aditya Acharya (scheduler)
QueuePlacementPolicy format is not easily readable via a JAXB parser

The current format for specifying queue placement rules in the fair scheduler allocations file does not lend itself to easy parsing via a JAXB parser. In particular, relying on the tag name to encode information about which rule to use makes it very difficult for an xsd-based JAXB parser to preserve the order of the rules, which is essential.
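To illustrate why a shared tag name helps, the sketch below reads rules in document order when every rule uses the same element and carries its type in an attribute. The element name `rule`, the attribute `name`, and the class `PlacementRules` are assumptions made for this example, not a statement of the format adopted by the fix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for an assumed <rule name="..."/> layout. */
final class PlacementRules {
    /** Returns the rule names in the order they appear in the XML. */
    static List<String> ruleNames(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            // getElementsByTagName returns matches in document order.
            NodeList rules = root.getElementsByTagName("rule");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < rules.getLength(); i++) {
                names.add(((Element) rules.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse placement policy", e);
        }
    }
}
```

Because all rules share one element name, a schema can describe them as a single repeated element, and the list order carries the rule order.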
YARN-1623. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Include queue name in RegisterApplicationMasterResponse

This provides the YARN change necessary to support MAPREDUCE-5732.
YARN-1618. Blocker sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)
Fix invalid RMApp transition from NEW to FINAL_SAVING

YARN-891 augments the RMStateStore to store information on completed applications. In the process, it adds transitions from NEW to FINAL_SAVING. This leads to the RM trying to update entries in the state-store that do not exist. On ZKRMStateStore, this leads to the RM crashing. Previous description: ZKRMStateStore fails to handle updates to znodes that don't exist. For instance, this can happen when an app transitions from NEW to FINAL_SAVING. In these cases, the store should create the missing znode and handle the update.
YARN-1616. Trivial improvement reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)
RMFatalEventDispatcher should log the cause of the event

RMFatalEventDispatcher#handle() logs the receipt of an event and its type, but leaves out the cause. The cause captures why the event was raised and would help debugging issues.
YARN-1608. Trivial bug reported by Karthik Kambatla and fixed by Karthik Kambatla (nodemanager)
LinuxContainerExecutor has a few DEBUG messages at INFO level

LCE has a few INFO level log messages meant to be at debug level. In fact, they are logged both at INFO and DEBUG.
YARN-1607. Major bug reported by Sandy Ryza and fixed by Sandy Ryza
TestRM expects the capacity scheduler

We should either explicitly set the Capacity Scheduler in TestRM or make the test scheduler-agnostic.
YARN-1603. Trivial bug reported by Zhijie Shen and fixed by Zhijie Shen
Remove two *.orig files which were unexpectedly committed

FairScheduler.java.orig and TestFifoScheduler.java.orig
YARN-1601. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
3rd party JARs are missing from hadoop-dist output

With the build changes of YARN-888 we are leaving out all 3rd party JARs used directly by YARN under /share/hadoop/yarn/lib/. We did not notice this when running minicluster because they all happen to be in the classpath from hadoop-common and hadoop-yarn. As 3rd party JARs are not 'public' interfaces, we cannot rely on them being provided to yarn by common and hdfs (i.e., if common and hdfs stop using a 3rd party dependency that yarn uses, this would break yarn if yarn does not pull that dependency explicitly). Also, this will break the bigtop hadoop build when they move to use branch-2, as they expect to find jars in /share/hadoop/yarn/lib/.
YARN-1600. Blocker bug reported by Jason Lowe and fixed by Haohui Mai (resourcemanager)
RM does not startup when security is enabled without spnego configured

We have a custom auth filter in front of our various UI pages that handles user authentication. However, the RM currently assumes that if security is enabled, the user must also have configured spnego for the RM web pages, which is not true in our case.
YARN-1598. Critical sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (client , resourcemanager)
HA-related rmadmin commands don't work on a secure cluster

The HA-related commands like -getServiceState -checkHealth etc. don't work in a secure cluster.
YARN-1579. Trivial sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)
ActiveRMInfoProto fields should be optional

Per discussion on YARN-1568, ActiveRMInfoProto should have optional fields instead of required fields.
YARN-1575. Critical sub-task reported by Jason Lowe and fixed by Jason Lowe (nodemanager)
Public localizer crashes with "Localized unkown resource"

The public localizer can crash with the error: {noformat} 2014-01-08 14:11:43,212 [Thread-467] ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localized unkonwn resource to java.util.concurrent.FutureTask@852e26 2014-01-08 14:11:43,212 [Thread-467] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting {noformat}
YARN-1574. Blocker sub-task reported by Xuan Gong and fixed by Xuan Gong
RMDispatcher should be reset on transition to standby

Currently, we move rmDispatcher out of ActiveService, but we still register the event dispatchers, such as schedulerDispatcher and RMAppEventDispatcher, when we initialize the ActiveService. Almost every time the RM transitions from Active to Standby, we need to initialize the ActiveService again. That means the same event dispatchers are registered again, which causes the same event to be handled several times.
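The effect of repeated registration can be reproduced with a toy dispatcher. `TinyDispatcher` below is a hypothetical, simplified stand-in and not the YARN dispatcher; it only shows why handlers registered twice see each event twice, and why resetting the dispatcher before re-registering avoids that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical, simplified dispatcher used only for illustration. */
final class TinyDispatcher {
    private final List<Consumer<String>> handlers = new ArrayList<>();

    /** Registers a handler; registering the same handler twice adds it twice. */
    void register(Consumer<String> handler) {
        handlers.add(handler);
    }

    /** Delivers the event to every registered handler. */
    void dispatch(String event) {
        for (Consumer<String> handler : handlers) {
            handler.accept(event);
        }
    }

    /** Drops all registrations, as a reset on transition to standby would. */
    void reset() {
        handlers.clear();
    }
}
```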
YARN-1573. Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)
ZK store should use a private password for root-node-acls

Currently, when HA is enabled, ZK store uses cluster-timestamp as the password for root node ACLs to give the Active RM exclusive access to the store. A more private value like a random number might be better.
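A less guessable value than a timestamp can be drawn from a cryptographically strong random source. The sketch below assumes the password is the decimal form of a random long; the class and method names are hypothetical and the actual store may derive the value differently.

```java
import java.security.SecureRandom;

/** Hypothetical helper; shows one way to get a hard-to-guess password. */
final class RootNodePassword {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns a random long rendered as a decimal string. */
    static String newPassword() {
        return Long.toString(RANDOM.nextLong());
    }
}
```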
YARN-1568. Trivial task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)
Rename clusterid to clusterId in ActiveRMInfoProto

YARN-1029 introduces ActiveRMInfoProto - just realized it defines a field clusterid, which is inconsistent with other fields. Better to fix it immediately than leave the inconsistency.
YARN-1567. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
In Fair Scheduler, allow empty queues to change between leaf and parent on allocation file reload
YARN-1560. Major test reported by Ted Yu and fixed by Ted Yu
TestYarnClient#testAMMRTokens fails with null AMRM token

The following can be reproduced locally: {code} testAMMRTokens(org.apache.hadoop.yarn.client.api.impl.TestYarnClient) Time elapsed: 3.341 sec <<< FAILURE! junit.framework.AssertionFailedError: null at junit.framework.Assert.fail(Assert.java:48) at junit.framework.Assert.assertTrue(Assert.java:20) at junit.framework.Assert.assertNotNull(Assert.java:218) at junit.framework.Assert.assertNotNull(Assert.java:211) at org.apache.hadoop.yarn.client.api.impl.TestYarnClient.testAMMRTokens(TestYarnClient.java:382) {code} This test didn't appear in https://builds.apache.org/job/Hadoop-Yarn-trunk/442/consoleFull
YARN-1559. Blocker sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)
Race between ServerRMProxy and ClientRMProxy setting RMProxy#INSTANCE

RMProxy#INSTANCE is a non-final static field and both ServerRMProxy and ClientRMProxy set it. This leads to races as witnessed on - YARN-1482. Sample trace: {noformat} java.lang.IllegalArgumentException: RM does not support this client protocol at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.yarn.client.ClientRMProxy.checkAllowedProtocols(ClientRMProxy.java:119) at org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.init(ConfiguredRMFailoverProxyProvider.java:58) at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:158) at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:88) at org.apache.hadoop.yarn.server.api.ServerRMProxy.createRMProxy(ServerRMProxy.java:56) {noformat}
YARN-1549. Major test reported by Ted Yu and fixed by haosdent
TestUnmanagedAMLauncher#testDSShell fails in trunk

The following error is reproducible: {code} testDSShell(org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher) Time elapsed: 14.911 sec <<< ERROR! java.lang.RuntimeException: Failed to receive final expected state in ApplicationReport, CurrentState=RUNNING, ExpectedStates=FINISHED,FAILED,KILLED at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.monitorApplication(UnmanagedAMLauncher.java:447) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.run(UnmanagedAMLauncher.java:352) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.testDSShell(TestUnmanagedAMLauncher.java:147) {code} See https://builds.apache.org/job/Hadoop-Yarn-trunk/435
YARN-1541. Major bug reported by Jian He and fixed by Jian He
Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn’t get wrong information.
YARN-1527. Trivial bug reported by Jian He and fixed by Akira AJISAKA
yarn rmadmin command prints wrong usage info:

The usage should be: yarn rmadmin, instead of java RMAdmin, and the -refreshQueues should be in the second line. {code} Usage: java RMAdmin -refreshQueues -refreshNodes -refreshSuperUserGroupsConfiguration -refreshUserToGroupsMappings -refreshAdminAcls -refreshServiceAcl -getGroups [username] -help [cmd] -transitionToActive <serviceId> -transitionToStandby <serviceId> -failover [--forcefence] [--forceactive] <serviceId> <serviceId> -getServiceState <serviceId> -checkHealth <serviceId> {code}
YARN-1523. Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla
Use StandbyException instead of RMNotYetReadyException
YARN-1522. Major bug reported by Liyin Liang and fixed by Liyin Liang
TestApplicationCleanup.testAppCleanup occasionally fails

TestApplicationCleanup is occasionally failing with the error: {code} ------------------------------------------------------------------------------- Test set: org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup ------------------------------------------------------------------------------- Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.215 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup testAppCleanup(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup) Time elapsed: 5.555 sec <<< FAILURE! junit.framework.AssertionFailedError: expected:<1> but was:<0> at org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup.testAppCleanup(TestApplicationCleanup.java:119) {code}
YARN-1505. Blocker bug reported by Xuan Gong and fixed by Xuan Gong
WebAppProxyServer should not set localhost as YarnConfiguration.PROXY_ADDRESS by itself

WebAppProxyServer::startServer() sets YarnConfiguration.PROXY_ADDRESS to localhost:9099 by itself. So, no matter what value we set for YarnConfiguration.PROXY_ADDRESS in the configuration, the proxy server will bind to localhost:9099.
YARN-1491. Trivial bug reported by Jonathan Eagles and fixed by Chen He
Upgrade JUnit3 TestCase to JUnit 4

There are still four references to test classes that extend from junit.framework.TestCase hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestYarnVersionInfo.java hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsResourceCalculatorPlugin.java hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLinuxResourceCalculatorPlugin.java hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsBasedProcessTree.java
YARN-1485. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
Enabling HA should verify the RM service addresses configurations have been set for every RM Ids defined in RM_HA_IDs

After YARN-1325, the YarnConfiguration.RM_HA_IDS will contain multiple RM_Ids. We need to verify that the RM service addresses configurations have been set for all of RM_Ids.
YARN-1482. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Xuan Gong
WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM

This way, even if an RM goes to standby mode, we can effect a redirect to the active RM. More importantly, users will not suddenly see all their links stop working.
YARN-1481. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
Move internal services logic from AdminService to ResourceManager

This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles went there already. Some top level issues:
 - Not easy to follow RM's service life cycle
 -- RM adds only AdminService as its service directly.
 -- Other services are added to RM when AdminService's init calls RM.activeServices.init()
 - Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.
YARN-1463. Major test reported by Ted Yu and fixed by Vinod Kumar Vavilapalli
Tests should avoid starting http-server where possible or creates spnego keytab/principals

Here is stack trace: {code} testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity) Time elapsed: 1.756 sec <<< ERROR! org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110) {code}
YARN-1454. Critical bug reported by Jian He and fixed by Karthik Kambatla
TestRMRestart.testRMDelegationTokenRestoredOnRMRestart is failing intermittently
YARN-1451. Minor bug reported by Sandy Ryza and fixed by Sandy Ryza
TestResourceManager relies on the scheduler assigning multiple containers in a single node update

TestResourceManager relies on the capacity scheduler. It relies on a scheduler that assigns multiple containers in a single heartbeat, which not all schedulers do by default. It also relies on schedulers that don't consider CPU capacities. It would be simple to change the test to use multiple heartbeats and increase the vcore capacities of the nodes in the test.
YARN-1450. Major bug reported by Akira AJISAKA and fixed by Binglin Chang (applications/distributed-shell)
TestUnmanagedAMLauncher#testDSShell fails on trunk

TestUnmanagedAMLauncher fails on trunk. The console output is {code} Running org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 35.937 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher testDSShell(org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher) Time elapsed: 14.558 sec <<< ERROR! java.lang.RuntimeException: Failed to receive final expected state in ApplicationReport, CurrentState=ACCEPTED, ExpectedStates=FINISHED,FAILED,KILLED at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.monitorApplication(UnmanagedAMLauncher.java:447) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.run(UnmanagedAMLauncher.java:352) at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.testDSShell(TestUnmanagedAMLauncher.java:145) {code}
YARN-1448. Major sub-task reported by Wangda Tan and fixed by Wangda Tan (api , resourcemanager)
AM-RM protocol changes to support container resizing

As described in YARN-1197, we need to add an API in the RM to support: 1) adding an increase request in AllocateRequest; 2) getting successfully increased/decreased containers from the RM in AllocateResponse.
YARN-1447. Major sub-task reported by Wangda Tan and fixed by Wangda Tan (api)
Common PB type definitions for container resizing

As described in YARN-1197, we need to add some common PB types for container resource change, like ResourceChangeContext, etc. These types will be used by both the RM and NM protocols.
YARN-1446. Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)
Change killing application to wait until state store is done

When user kills an application, it should wait until the state store is done with saving the killed status of the application. Otherwise, if RM crashes in the middle between user killing the application and writing the status to the store, RM will relaunch this application after it restarts.
YARN-1435. Major bug reported by Tassapol Athiapinya and fixed by Xuan Gong (applications/distributed-shell)
Distributed Shell should not run other commands except "sh", and run the custom script at the same time.

Currently, if we want to run a custom script in DS, we can do it like this: --shell_command sh --shell_script custom_script.sh. But it may be better to separate running shell_command and shell_script.
YARN-1425. Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
TestRMRestart fails because MockRM.waitForState(AttemptId) uses current attempt instead of the attempt passed as argument

TestRMRestart is failing on trunk. Fixing it.
YARN-1423. Major improvement reported by Sandy Ryza and fixed by Ted Malaska (scheduler)
Support queue placement by secondary group in the Fair Scheduler
YARN-1419. Minor bug reported by Jonathan Eagles and fixed by Jonathan Eagles (scheduler)
TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7

QueueMetrics holds its data in a static variable, causing metrics to bleed over from test to test. clearQueueMetrics is to be called for tests that need to measure metrics correctly for a single test. jdk7 comes into play since tests are run out of order, which in this case makes the metrics unreliable.
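The bleed-over is a general property of static state and can be shown without YARN. `StaticMetrics` below is a hypothetical stand-in for a metrics holder with static storage; its `clear` method plays the role that clearQueueMetrics plays for QueueMetrics.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical metrics holder whose counters live in a static map. */
final class StaticMetrics {
    private static final Map<String, Integer> COUNTERS = new HashMap<>();

    static void increment(String name) {
        COUNTERS.merge(name, 1, Integer::sum);
    }

    static int get(String name) {
        return COUNTERS.getOrDefault(name, 0);
    }

    /** Call at the start of a test so earlier tests cannot affect its counts. */
    static void clear() {
        COUNTERS.clear();
    }
}
```

A test that asserts on an exact count should call `clear()` first, since the order in which earlier tests ran is not guaranteed.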
YARN-1416. Major bug reported by Omkar Vinit Joshi and fixed by Jian He
InvalidStateTransitions getting reported in multiple test cases even though they pass

It might be worth checking why they are reporting this. Test cases: TestRMAppTransitions and TestRM report a large number of such errors, e.g. can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED
YARN-1411. Critical sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla
HA config shouldn't affect NodeManager RPC addresses

When HA is turned on, {{YarnConfiguration#getSoketAddress()}} fetches rpc-addresses corresponding to the specified rm-id. This should only be for RM rpc-addresses. Other confs, like NM rpc-addresses shouldn't be affected by this. Currently, the NM address settings in yarn-site.xml aren't reflected in the actual ports.
YARN-1409. Major bug reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA
NonAggregatingLogHandler can throw RejectedExecutionException

This problem is caused by handling APPLICATION_FINISHED events after calling sched.shutdown() in NonAggregatingLogHandler#serviceStop(). org.apache.hadoop.mapred.TestJobCleanup can fail because of a RejectedExecutionException thrown by NonAggregatingLogHandler. {code} 2013-11-13 10:53:06,970 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Error in dispatcher thread java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@d51df63 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@7a20e369[Shutting down, pool size = 4, active threads = 0, queued tasks = 7, completed tasks = 0] at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325) at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530) at org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler.handle(NonAggregatingLogHandler.java:121) at org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler.handle(NonAggregatingLogHandler.java:49) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:159) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:95) at java.lang.Thread.run(Thread.java:724) {code}
YARN-1407. Major bug reported by Sandy Ryza and fixed by Sandy Ryza
RM Web UI and REST APIs should uniformly use YarnApplicationState

RMAppState isn't a public facing enum like YarnApplicationState, so we shouldn't return values or list filters that come from it. However, some Blocks and AppInfo are still using RMAppState. It is not 100% clear to me whether or not fixing this would be a backwards-incompatible change. The change would only reduce the set of possible strings that the API returns, so I think not. We have also been changing the contents of RMAppState since 2.2.0, e.g. in YARN-891. It would still be good to fix this ASAP (i.e. for 2.2.1).
YARN-1405. Major sub-task reported by Yesha Vora and fixed by Jian He
RM hangs on shutdown if calling system.exit in serviceInit or serviceStart

Set yarn.resourcemanager.recovery.enabled=true and pass a local path, such as "file:///tmp/MYTMP", to yarn.resourcemanager.fs.state-store.uri. If the directory /tmp/MYTMP is not readable or writable, the RM should crash and print a "Permission denied" error. Currently, the RM throws a "java.io.FileNotFoundException: File file:/tmp/MYTMP/FSRMStateRoot/RMDTSecretManagerRoot does not exist" error. The RM reports exit status 1, but the RM process does not shut down. Snapshot of the ResourceManager log: 2013-09-27 18:31:36,621 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:rollMasterKey(97)) - Rolling master-key for nm-tokens 2013-09-27 18:31:36,694 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(640)) - Failed to load/recover state java.io.FileNotFoundException: File file:/tmp/MYTMP/FSRMStateRoot/RMDTSecretManagerRoot does not exist at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:379) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1478) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1518) at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:564) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMDTSecretManagerState(FileSystemRMStateStore.java:188) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:112) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:635) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:855) 2013-09-27 18:31:36,697 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
YARN-1403. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza
Separate out configuration loading from QueueManager in the Fair Scheduler
YARN-1401. Major bug reported by Gera Shegalov and fixed by Gera Shegalov (nodemanager)
With zero sleep-delay-before-sigkill.ms, no signal is ever sent

If you set in yarn-site.xml yarn.nodemanager.sleep-delay-before-sigkill.ms=0 then an unresponsive child JVM is never killed. In MRv1, TT used to immediately SIGKILL in this case.
YARN-1400. Trivial bug reported by Raja Aluri and fixed by Raja Aluri (resourcemanager)
yarn.cmd uses HADOOP_RESOURCEMANAGER_OPTS. Should be YARN_RESOURCEMANAGER_OPTS.

yarn.cmd uses HADOOP_RESOURCEMANAGER_OPTS. Should be YARN_RESOURCEMANAGER_OPTS.
YARN-1395. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (applications/distributed-shell)
Distributed shell application master launched with debug flag can hang waiting for external ls process.

Distributed shell launched with the debug flag will run {{ApplicationMaster#dumpOutDebugInfo}}. This method launches an external process to run ls and print the contents of the current working directory. We've seen that this can cause the application master to hang on {{Process#waitFor}}.
YARN-1392. Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Allow sophisticated app-to-queue placement policies in the Fair Scheduler

Currently the Fair Scheduler supports app-to-queue placement by username. It would be beneficial to allow more sophisticated policies that rely on primary and secondary groups and fallbacks.
YARN-1388. Trivial bug reported by Liyin Liang and fixed by Liyin Liang (resourcemanager)
Fair Scheduler page always displays blank fair share

YARN-1044 fixed the min/max/used resource display problem in the scheduler page. But "Fair Share" has the same problem and needs to be fixed as well.
YARN-1387. Major improvement reported by Karthik Kambatla and fixed by Karthik Kambatla (api)
RMWebServices should use ClientRMService for filtering applications

YARN's REST API allows filtering applications. This filtering should be moved to ClientRMService so that the Java API can also support the same functionality.
YARN-1386. Critical bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)
NodeManager mistakenly loses resources and relocalizes them

When a local resource that should already be present is requested again, the nodemanager checks to see if it is still present. However, the method it uses to check for presence is File.exists(), run as the user of the nodemanager process. If the resource was a private resource localized for another user, it will be localized to a location that is not accessible by the nodemanager user. Therefore File.exists() returns false, the nodemanager mistakenly believes the resource is no longer available, and it proceeds to localize it over and over.
YARN-1381. Minor bug reported by Ted Yu and fixed by Ted Yu
Same relaxLocality appears twice in exception message of AMRMClientImpl#checkLocalityRelaxationConflict()

Here is related code: {code} throw new InvalidContainerRequestException("Cannot submit a " + "ContainerRequest asking for location " + location + " with locality relaxation " + relaxLocality + " when it has " + "already been requested with locality relaxation " + relaxLocality); {code} The last relaxLocality should be reqs.values().iterator().next().remoteRequest.getRelaxLocality()
YARN-1378. Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)
Implement a RMStateStore cleaner for deleting application/attempt info

Now that we are storing the final state of application/attempt instead of removing application/attempt info on application/attempt completion(YARN-891), we need a separate RMStateStore cleaner for cleaning the application/attempt state.
YARN-1374. Blocker bug reported by Devaraj K and fixed by Karthik Kambatla (resourcemanager)
Resource Manager fails to start due to ConcurrentModificationException

Resource Manager is failing to start with the below ConcurrentModificationException. {code:xml} 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.util.ConcurrentModificationException java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioning to standby 2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned to standby 2013-10-30 20:22:42,378 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944) 2013-10-30 20:22:42,379 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: /************************************************************ SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24 ************************************************************/ {code}
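The trace above shows a list being iterated while it is modified. One general way to make such a loop safe is to iterate over a snapshot of the list. The sketch below is a simplified assumption of the situation (a service that registers another service during init) with hypothetical names; it is not the code or the fix in CompositeService.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of initializing services from a snapshot. */
final class ServiceInitSketch {
    /**
     * "Initializes" each service name and returns the init order.
     * Initializing "parent" registers "child" in the live list.
     */
    static List<String> initAll(List<String> services) {
        List<String> initialized = new ArrayList<>();
        // Iterate over a copy so additions made during init
        // cannot invalidate the iteration.
        for (String service : new ArrayList<>(services)) {
            initialized.add(service);
            if (service.equals("parent")) {
                services.add("child"); // simulates registration during init
            }
        }
        return initialized;
    }
}
```

Iterating the live list directly in this scenario would throw ConcurrentModificationException on the next call to the iterator; with the snapshot, services added during the loop are simply not visited in that pass.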
YARN-1358. Minor test reported by Chuan Liu and fixed by Chuan Liu (client)
TestYarnCLI fails on Windows due to line endings

The unit test fails on Windows because incorrect line endings were used when comparing against the command line output. Error messages are as follows. {noformat} junit.framework.ComparisonFailure: expected:<...argument for options[] usage: application ...> but was:<...argument for options[ ] usage: application ...> at junit.framework.Assert.assertEquals(Assert.java:85) at junit.framework.Assert.assertEquals(Assert.java:91) at org.apache.hadoop.yarn.client.cli.TestYarnCLI.testMissingArguments(TestYarnCLI.java:878) {noformat}
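Two common ways to make such a comparison portable are to build the expected text with the platform line separator, or to normalize both sides before comparing. The helper below is a hypothetical sketch of both approaches and is not taken from TestYarnCLI.

```java
/** Hypothetical helpers for platform-independent text comparison. */
final class LineEndings {
    /** Joins lines using the platform separator, ending with a separator. */
    static String joinLines(String... lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append(System.lineSeparator());
        }
        return sb.toString();
    }

    /** Converts Windows line endings to "\n" so both sides compare equal. */
    static String normalize(String text) {
        return text.replace("\r\n", "\n");
    }
}
```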
YARN-1357. Minor test reported by Chuan Liu and fixed by Chuan Liu (nodemanager)
TestContainerLaunch.testContainerEnvVariables fails on Windows

This test fails on Windows due to incorrect use of batch script command. Error messages are as follows. {noformat} junit.framework.AssertionFailedError: expected:<java.nio.HeapByteBuffer[pos=0 lim=19 cap=19]> but was:<java.nio.HeapByteBuffer[pos=0 lim=19 cap=19]> at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at junit.framework.Assert.assertEquals(Assert.java:74) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:508) {noformat}
YARN-1351. Trivial bug reported by Konstantin Weitz and fixed by Konstantin Weitz (resourcemanager)
Invalid string format in Fair Scheduler log warn message

While trying to print a warning, two values of the wrong type (Resource instead of int) are passed into a String.format method call, leading to a runtime exception, in the file: _trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java_. The warning was intended to be printed whenever the resources don't fit into each other, either because the number of virtual cores or the memory is too small. I changed the %d's into %s so that the warning contains both the cores and the memory.
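The failure mode described for YARN-1351 can be reproduced in isolation. The sketch below is only an illustration: DemoResource is a hypothetical stand-in for YARN's Resource type, and the message text is invented rather than taken from QueueManager.

```java
import java.util.IllegalFormatConversionException;

// Hypothetical stand-in for YARN's Resource type; only toString() matters here.
final class DemoResource {
    private final int memoryMb;
    private final int vcores;

    DemoResource(int memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    @Override
    public String toString() {
        return "<memory:" + memoryMb + ", vCores:" + vcores + ">";
    }
}

final class FormatDemo {
    // %d accepts only integral types, so an arbitrary object fails at run time.
    static boolean integerSpecifierFails(Object value) {
        try {
            String.format("min %d", value);
            return false;
        } catch (IllegalFormatConversionException e) {
            return true;
        }
    }

    // %s calls toString(), so memory and cores both appear in the message.
    static String warning(DemoResource min, DemoResource max) {
        return String.format("Min share %s exceeds max share %s", min, max);
    }
}
```

Because the conversion is checked only when the format call executes, a mistake like this stays hidden until the warning path is actually taken.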
YARN-1349. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (client)
yarn.cmd does not support passthrough to any arbitrary class.

The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the predefined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument.
YARN-1343. Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (resourcemanager)
NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

If a NodeManager joins the cluster or gets restarted, running AMs never receive the node update indicating the Node is running.
YARN-1335. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication

FSSchedulerApp and FiCaSchedulerApp use duplicate code in a lot of places. They both extend SchedulerApplication. We can move a lot of this duplicate code into SchedulerApplication.
YARN-1333. Major improvement reported by Sandy Ryza and fixed by Tsuyoshi OZAWA (scheduler)
Support blacklisting in the Fair Scheduler
YARN-1332. Minor improvement reported by Sandy Ryza and fixed by Sebastian Wong
In TestAMRMClient, replace assertTrue with assertEquals where possible

TestAMRMClient uses a lot of "assertTrue(amClient.ask.size() == 0)" where "assertEquals(0, amClient.ask.size())" would make it easier to see why it's failing at a glance.
YARN-1331. Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (client)
yarn.cmd exits with NoClassDefFoundError trying to run rmadmin or logs

The yarn shell script was updated so that the rmadmin and logs sub-commands launch {{org.apache.hadoop.yarn.client.cli.RMAdminCLI}} and {{org.apache.hadoop.yarn.client.cli.LogsCLI}}. The yarn.cmd script also needs to be updated so that the commands work on Windows.
YARN-1325. Major sub-task reported by Tsuyoshi OZAWA and fixed by Xuan Gong (resourcemanager)
Enabling HA should check Configuration contains multiple RMs

Currently, we can enable RM HA configuration without multiple RM ids (YarnConfiguration.RM_HA_IDS). This behaviour can cause wrong operations. ResourceManager should verify that more than one RM id is specified in RM-HA-IDs. One idea is to support a "strict mode" that enforces this check through configuration (e.g. yarn.resourcemanager.ha.strict-mode.enabled).
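A check along these lines could look roughly like the following sketch. It works on a raw comma-separated string rather than a real Configuration object, and the class and method names are hypothetical, not the names used in the actual fix.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class HaIdsCheck {
    // Splits a comma-separated id list, ignoring blanks and duplicates.
    static Set<String> parseIds(String raw) {
        Set<String> ids = new LinkedHashSet<>();
        if (raw == null) {
            return ids;
        }
        for (String part : raw.split(",")) {
            String id = part.trim();
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        return ids;
    }

    // With HA disabled any id list is acceptable; with HA enabled at least two are needed.
    static boolean hasEnoughIds(boolean haEnabled, String rawIds) {
        return !haEnabled || parseIds(rawIds).size() >= 2;
    }

    // Fails fast with a message that names the offending value.
    static void verify(boolean haEnabled, String rawIds) {
        if (!hasEnoughIds(haEnabled, rawIds)) {
            throw new IllegalArgumentException(
                "HA is enabled but fewer than two RM ids are configured: " + rawIds);
        }
    }
}
```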
YARN-1323. Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla
Set HTTPS webapp address along with other RPC addresses in HAUtil

YARN-1232 adds the ability to configure multiple RMs, but missed the HTTPS web app address. It needs to be added.
YARN-1321. Blocker bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (client)
NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly

NMTokenCache is a singleton. Because of this, when multiple AMs run in a single JVM, NMTokens for the same node from different AMs step on each other, and starting containers fails due to mismatched tokens. The error observed on the client side is something like: {code} ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. NMToken for application attempt : appattempt_1382038445650_0002_000001 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_000001 {code}
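To illustrate why a process-wide cache keyed only by node is a problem, the sketch below uses one cache instance per AM. DemoTokenCache and its methods are hypothetical and do not reflect the real NMTokenCache API; tokens are plain strings for simplicity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-AM token cache: each application master owns its own instance,
// so tokens for the same node never overwrite each other across AMs.
final class DemoTokenCache {
    private final Map<String, String> tokensByNode = new ConcurrentHashMap<>();

    void put(String nodeAddress, String token) {
        tokensByNode.put(nodeAddress, token);
    }

    String get(String nodeAddress) {
        return tokensByNode.get(nodeAddress);
    }
}
```

With a single shared map, the second AM's put for a node would replace the first AM's token, which matches the "NMToken for application attempt ... was used for starting container" mismatch in the error above.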
YARN-1320. Major bug reported by Tassapol Athiapinya and fixed by Xuan Gong (applications/distributed-shell)
Custom log4j properties in Distributed shell does not work properly.

Distributed shell cannot pick up custom log4j properties (specified with -log_properties). It always uses default log4j properties.
YARN-1318. Blocker sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)
Promote AdminService to an Always-On service and merge in RMHAProtocolService

Per discussion in YARN-1068, we want AdminService to handle HA-admin operations in addition to the regular non-HA admin operations. To facilitate this, we need to make AdminService an Always-On service.
YARN-1315. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
TestQueueACLs should also test FairScheduler
YARN-1314. Major bug reported by Tassapol Athiapinya and fixed by Xuan Gong (applications/distributed-shell)
Cannot pass more than 1 argument to shell command

Distributed shell cannot accept more than one parameter in the shell arguments. In each of these commands, the arguments are treated as 1 parameter: /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar <distributed shell jar> -shell_command echo -shell_args "'"My name" "is Teddy"'" /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar <distributed shell jar> -shell_command echo -shell_args "''My name' 'is Teddy''" /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar <distributed shell jar> -shell_command echo -shell_args "'My name' 'is Teddy'"
YARN-1311. Trivial sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
Fix app specific scheduler-events' names to be app-attempt based

Today, APP_ADDED and APP_REMOVED are sent to the scheduler. They are misnomers as schedulers only deal with AppAttempts today. This JIRA is for fixing their names so that we can add App-level events in the near future, notably for work-preserving RM-restart.
YARN-1307. Major sub-task reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA (resourcemanager)
Rethink znode structure for RM HA

A rethink of the znode structure for RM HA is proposed in some JIRAs (YARN-659, YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in YARN-1222: {quote} We should move to creating a node hierarchy for apps such that all znodes for an app are stored under an app znode instead of the app root znode. This will help in removeApplication and also in scaling better on ZK. The earlier code was written this way to ensure create/delete happens under a root znode for fencing. But given that we have moved to multi-operations globally, this isnt required anymore. {quote}
YARN-1306. Major bug reported by Wei Yan and fixed by Wei Yan
Clean up hadoop-sls sample-conf according to YARN-1228

Move fair scheduler allocations configuration to fair-scheduler.xml, and move all scheduler settings to yarn-site.xml.
YARN-1305. Major sub-task reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA (resourcemanager)
RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException

When yarn.resourcemanager.ha.enabled is true, RMHAProtocolService#serviceInit calls HAUtil.setAllRpcAddresses. If the configuration values are null, it just throws IllegalArgumentException. It's messy to analyse which keys are null, so we should handle it and log the name of keys which are null. A current log dump is as follows: {code} 2013-10-15 06:24:53,431 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT] 2013-10-15 06:24:54,203 INFO org.apache.hadoop.service.AbstractService: Service RMHAProtocolService failed in state INITED; cause: java.lang.IllegalArgumentException: Property value must not be null java.lang.IllegalArgumentException: Property value must not be null at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.conf.Configuration.set(Configuration.java:816) at org.apache.hadoop.conf.Configuration.set(Configuration.java:798) at org.apache.hadoop.yarn.conf.HAUtil.setConfValue(HAUtil.java:100) at org.apache.hadoop.yarn.conf.HAUtil.setAllRpcAddresses(HAUtil.java:105) at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceInit(RMHAProtocolService.java:60) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:940) {code}
YARN-1303. Major improvement reported by Tassapol Athiapinya and fixed by Xuan Gong (applications/distributed-shell)
Allow multiple commands separating with ";" in distributed-shell

In shell, we can do "ls; ls" to run 2 commands at once. In distributed shell, this is not working. Distributed shell should be improved to allow this. There are practical use cases that I know of to run multiple commands or to set environment variables before a command.
YARN-1300. Major bug reported by Ted Yu and fixed by Ted Yu
SLS tests fail because conf puts yarn properties in fair-scheduler.xml

I was looking at https://builds.apache.org/job/PreCommit-YARN-Build/2165//testReport/org.apache.hadoop.yarn.sls/TestSLSRunner/testSimulatorRunning/ and I am able to reproduce the failure locally. I found that FairSchedulerConfiguration.getAllocationFile() doesn't read the yarn.scheduler.fair.allocation.file config entry from fair-scheduler.xml. This leads to the following: {code} Caused by: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Bad fair scheduler config file: top-level element not <allocations> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.reloadAllocs(QueueManager.java:302) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.initialize(QueueManager.java:108) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1145) {code}
YARN-1295. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)
In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" errors

I missed this when working on YARN-1271.
YARN-1293. Major bug reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA
TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk

{quote} ------------------------------------------------------------------------------- Test set: org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch ------------------------------------------------------------------------------- Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 0.114 sec <<< FAILURE! junit.framework.AssertionFailedError: null at junit.framework.Assert.fail(Assert.java:48) at junit.framework.Assert.assertTrue(Assert.java:20) at junit.framework.Assert.assertTrue(Assert.java:27) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273) {quote}
YARN-1290. Major improvement reported by Wei Yan and fixed by Wei Yan
Let continuous scheduling achieve more balanced task assignment

Currently, in continuous scheduling (YARN-1010), in each round, the thread iterates over pre-ordered nodes and assigns tasks. This mechanism may overload the first several nodes, while the latter nodes have no tasks. We should sort all nodes according to available resource. In each round, always assign tasks to nodes with larger capacity, which can balance the load distribution among all nodes.
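The proposed ordering can be sketched as a sort on available resource, largest first. DemoNode is a hypothetical type that tracks only available memory; the real scheduler compares full Resource objects and node identities.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical node record; only a name and available memory are modeled.
final class DemoNode {
    final String name;
    final int availableMb;

    DemoNode(String name, int availableMb) {
        this.name = name;
        this.availableMb = availableMb;
    }
}

final class NodeOrdering {
    // Returns node names ordered by available memory, largest first,
    // without modifying the caller's list.
    static List<String> byAvailableDescending(List<DemoNode> nodes) {
        List<DemoNode> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingInt((DemoNode n) -> n.availableMb).reversed());
        List<String> names = new ArrayList<>();
        for (DemoNode n : sorted) {
            names.add(n.name);
        }
        return names;
    }
}
```

Re-sorting at the start of each round means a node that has just received tasks drops down the list, so later assignments go to nodes with more headroom.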
YARN-1288. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Make Fair Scheduler ACLs more user friendly

The Fair Scheduler currently defaults the root queue's acl to empty and all other queues' acl to "*". Now that YARN-1258 enables configuring the root queue, we should reverse this. This will also bring the Fair Scheduler in line with the Capacity Scheduler. We should also not trim the acl strings, which makes it impossible to only specify groups in an acl.
YARN-1284. Blocker bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (nodemanager)
LCE: Race condition leaves dangling cgroups entries for killed containers

When LCE & cgroups are enabled and a container is killed (in this case by its owning AM, an MRAM), there seems to be a race condition at the OS level between delivering the SIGTERM/SIGKILL and the OS completing all necessary cleanup. LCE code, after sending the SIGTERM/SIGKILL and getting the exitcode, immediately attempts to clean up the cgroups entry for the container. But this is failing with an error like: {code} 2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1381179532433_0016_01_000011 is : 143 2013-10-07 15:21:24,359 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1381179532433_0016_01_000011 of type UPDATE_DIAGNOSTICS_MSG 2013-10-07 15:21:24,359 DEBUG org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: deleteCgroup: /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_000011 2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: Unable to delete cgroup at: /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_000011 {code} CgroupsLCEResourcesHandler.clearLimits() has logic to wait for 500 ms for AM containers to avoid this problem. It seems this should be done for all containers. Still, waiting for an extra 500 ms seems too expensive. We should look for a more time-efficient way of doing this, maybe spinning with a minimal sleep and a timeout while deleteCgroup() cannot complete.
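The "spin with a minimal sleep and a timeout" idea might look like the following sketch. RetryingDelete and its parameters are hypothetical; the delete attempt is passed in as a callback so the example does not depend on cgroups being present.

```java
import java.util.function.BooleanSupplier;

final class RetryingDelete {
    // Repeats the attempt until it succeeds or timeoutMs has elapsed,
    // sleeping sleepMs between tries. Returns whether the attempt succeeded.
    static boolean retryUntil(BooleanSupplier attempt, long timeoutMs, long sleepMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A caller could pass something like `() -> new java.io.File(cgroupPath).delete()` as the attempt, where cgroupPath is whatever directory the handler created. In the common case the first try succeeds and no sleep happens at all, which avoids paying a fixed 500 ms per container.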
YARN-1283. Major sub-task reported by Yesha Vora and fixed by Omkar Vinit Joshi
Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY

After setting yarn.http.policy=HTTPS_ONLY, the job output shows an incorrect "The url to track the job". Currently, it's printing http://RM:<httpsport>/proxy/application_1381162886563_0001/ instead of https://RM:<httpsport>/proxy/application_1381162886563_0001/ The URL http://hostname:8088/proxy/application_1381162886563_0001/ is invalid. hadoop jar hadoop-mapreduce-client-jobclient-tests.jar sleep -m 1 -r 1 13/10/07 18:39:39 INFO client.RMProxy: Connecting to ResourceManager at hostname/100.00.00.000:8032 13/10/07 18:39:40 INFO mapreduce.JobSubmitter: number of splits:1 13/10/07 18:39:40 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.partitioner.class is deprecated. Instead, use mapreduce.job.partitioner.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated.
Instead, use mapreduce.job.inputformat.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class 13/10/07 18:39:40 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir 13/10/07 18:39:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1381162886563_0001 13/10/07 18:39:40 INFO impl.YarnClientImpl: Submitted application application_1381162886563_0001 to ResourceManager at hostname/100.00.00.000:8032 13/10/07 18:39:40 INFO mapreduce.Job: The url to track the job: http://hostname:8088/proxy/application_1381162886563_0001/ 13/10/07 18:39:40 INFO mapreduce.Job: Running job: job_1381162886563_0001 13/10/07 18:39:46 INFO mapreduce.Job: Job job_1381162886563_0001 running in uber mode : false 13/10/07 18:39:46 INFO mapreduce.Job: map 0% reduce 0% 13/10/07 18:39:53 INFO mapreduce.Job: map 100% reduce 0% 13/10/07 18:39:58 INFO mapreduce.Job: map 100% reduce 100% 13/10/07 18:39:58 INFO mapreduce.Job: Job job_1381162886563_0001 completed successfully 13/10/07 18:39:58 INFO mapreduce.Job: Counters: 43 File System Counters FILE: Number of bytes read=26 FILE: Number of bytes written=177279 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=48 HDFS: Number of bytes written=0 HDFS: Number of read operations=1 HDFS: Number of large read operations=0 HDFS: Number of write operations=0 Job Counters Launched map tasks=1 Launched reduce tasks=1 Other local map tasks=1 Total time 
spent by all maps in occupied slots (ms)=7136 Total time spent by all reduces in occupied slots (ms)=6062 Map-Reduce Framework Map input records=1 Map output records=1 Map output bytes=4 Map output materialized bytes=22 Input split bytes=48 Combine input records=0 Combine output records=0 Reduce input groups=1 Reduce shuffle bytes=22 Reduce input records=1 Reduce output records=0 Spilled Records=2 Shuffled Maps =1 Failed Shuffles=0 Merged Map outputs=1 GC time elapsed (ms)=60 CPU time spent (ms)=1700 Physical memory (bytes) snapshot=567582720 Virtual memory (bytes) snapshot=4292997120 Total committed heap usage (bytes)=846594048 Shuffle Errors BAD_ID=0 CONNECTION=0 IO_ERROR=0 WRONG_LENGTH=0 WRONG_MAP=0 WRONG_REDUCE=0 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=0
YARN-1268. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
TestFairScheduler.testContinuousScheduling is flaky

It looks like there's a timeout in it that's causing it to be flaky.
YARN-1265. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
Fair Scheduler chokes on unhealthy node reconnect

Only nodes in the RUNNING state are tracked by schedulers. When a node reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if it's in the RUNNING state. The FairScheduler doesn't guard against this. I think the best way to fix this is to check to see whether a node is RUNNING before telling the scheduler to remove it.
YARN-1259. Trivial bug reported by Sandy Ryza and fixed by Robert Kanter (scheduler)
In Fair Scheduler web UI, queue num pending and num active apps switched

The values returned in FairSchedulerLeafQueueInfo by numPendingApplications and numActiveApplications should be switched.
YARN-1258. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Allow configuring the Fair Scheduler root queue

This would be useful for acls, maxRunningApps, scheduling modes, etc. The allocation file should be able to accept both: * An implicit root queue * A root queue at the top of the hierarchy with all queues under/inside of it
YARN-1253. Blocker new feature reported by Alejandro Abdelnur and fixed by Roman Shaposhnik (nodemanager)
Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode

When using cgroups we require LCE to be configured in the cluster to start containers. LCE starts containers as the user that submitted the job. While this works correctly in a secure setup, in a non-secure setup this presents a couple of issues: * LCE requires all Hadoop users submitting jobs to be Unix users in all nodes * Because users can impersonate other users, any user would have access to any local file of other users Particularly, the second issue is not desirable as a user could get access to ssh keys of other users in the nodes or, if there are NFS mounts, get to other users' data outside of the cluster.
YARN-1241. Major bug reported by Sandy Ryza and fixed by Sandy Ryza
In Fair Scheduler, maxRunningApps does not work for non-leaf queues

Setting the maxRunningApps property on a parent queue should ensure that the sum of apps in all subqueues can't exceed it.
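One way to picture the intended behaviour is a queue tree in which every queue counts the apps running anywhere beneath it, and an app may start only if no ancestor is at its limit. DemoQueue below is a hypothetical model, not the Fair Scheduler's actual queue classes.

```java
// Hypothetical queue model: runningApps counts apps in this queue and all descendants.
final class DemoQueue {
    private final DemoQueue parent; // null for the root queue
    private final int maxRunningApps;
    private int runningApps;

    DemoQueue(DemoQueue parent, int maxRunningApps) {
        this.parent = parent;
        this.maxRunningApps = maxRunningApps;
    }

    // An app may run only if this queue and every ancestor are below their limits.
    boolean canRunApp() {
        for (DemoQueue q = this; q != null; q = q.parent) {
            if (q.runningApps >= q.maxRunningApps) {
                return false;
            }
        }
        return true;
    }

    // Starts an app if allowed, updating the count on the whole path to the root.
    boolean tryRunApp() {
        if (!canRunApp()) {
            return false;
        }
        for (DemoQueue q = this; q != null; q = q.parent) {
            q.runningApps++;
        }
        return true;
    }
}
```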
YARN-1239. Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)
Save version information in the state store

When creating root dir for the first time we should write version 1. If root dir exists then we should check that the version in the state store matches the version from config.
YARN-1232. Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)
Configuration to support multiple RMs

We should augment the configuration to allow users to specify two RMs and the individual RPC addresses for them.
YARN-1222. Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla
Make improvements in ZKRMStateStore for fencing

Using multi-operations for every ZK interaction. In every operation, automatically creating/deleting a lock znode that is the child of the root znode. This is to achieve fencing by modifying the create/delete permissions on the root znode.
YARN-1210. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi
During RM restart, RM should start a new attempt only when previous attempt exits for real

When RM recovers, it can wait for existing AMs to contact RM back and then kill them forcefully before even starting a new AM. Worst case, RM will start a new AppAttempt after waiting for 10 mins (the expiry interval). This way we'll minimize multiple AMs racing with each other. This can help with issues in downstream components like Pig, Hive and Oozie during RM restart. In the meantime, new apps will proceed as usual as existing apps wait for recovery. This can continue to be useful after work-preserving restart, so that AMs which can properly sync back up with RM can continue to run and those that don't are guaranteed to be killed before starting a new attempt.
YARN-1199. Major improvement reported by Mit Desai and fixed by Mit Desai
Make NM/RM Versions Available

Now that we have the NM and RM versions available, we can display the YARN version of nodes running in the cluster.
YARN-1188. Trivial bug reported by Akira AJISAKA and fixed by Tsuyoshi OZAWA
The context of QueueMetrics becomes 'default' when using FairScheduler

I found the context of QueueMetrics changed to 'default' from 'yarn' when I was using FairScheduler. The context should always be 'yarn' by adding an annotation to FSQueueMetrics like below: {code} + @Metrics(context="yarn") public class FSQueueMetrics extends QueueMetrics { {code}
YARN-1185. Major sub-task reported by Jason Lowe and fixed by Omkar Vinit Joshi (resourcemanager)
FileSystemRMStateStore can leave partial files that prevent subsequent recovery

FileSystemRMStateStore writes directly to the destination file when storing state. However if the RM were to crash in the middle of the write, the recovery method could encounter a partially-written file and either outright crash during recovery or silently load incomplete state. To avoid this, the data should be written to a temporary file and renamed to the destination file afterwards.
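The write-then-rename pattern can be sketched with java.nio on a local file system. This is an assumption-laden illustration: the real store goes through Hadoop's FileSystem API, and AtomicStateWriter and the ".tmp" suffix are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class AtomicStateWriter {
    // Writes data to a sibling temporary file, then renames it over the destination.
    // A crash during the write leaves only the temporary file incomplete;
    // the destination holds either the old or the new contents.
    static void store(Path destination, byte[] data) throws IOException {
        Path tmp = destination.resolveSibling(destination.getFileName() + ".tmp");
        Files.write(tmp, data);
        // Whether an atomic move may replace an existing target is platform dependent.
        Files.move(tmp, destination,
            StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Recovery code following this scheme would also need to ignore or clean up leftover temporary files, since those are the only files that can be partially written.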
YARN-1183. Major bug reported by Andrey Klochkov and fixed by Andrey Klochkov
MiniYARNCluster shutdown takes several minutes intermittently

As described in MAPREDUCE-5501, sometimes M/R tests leave MRAppMaster java processes living for several minutes after successful completion of the corresponding test. There is a concurrency issue in MiniYARNCluster shutdown logic which leads to this. Sometimes RM stops before an app master sends its last report, and then the app master keeps retrying for >6 minutes. In some cases it leads to failures in subsequent tests, and it affects performance of tests as app masters eat resources.
YARN-1182. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla
MiniYARNCluster creates and inits the RM/NM only on start()

MiniYARNCluster creates and inits the RM/NM only on start(). It should create and init() during init() itself.
YARN-1181. Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla
Augment MiniYARNCluster to support HA mode

MiniYARNHACluster, along the lines of MiniYARNCluster, is needed for end-to-end HA tests.
YARN-1180. Trivial bug reported by Thomas Graves and fixed by Chen He (capacityscheduler)
Update capacity scheduler docs to include types on the configs

The capacity scheduler docs (http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) don't include types for all the configs. For instance, the minimum-user-limit-percent doesn't say it's an Int. It is also the only setting for the Resource Allocation configs that is an Int rather than a float.
YARN-1176. Critical bug reported by Thomas Graves and fixed by Jonathan Eagles (resourcemanager)
RM web services ClusterMetricsInfo total nodes doesn't include unhealthy nodes

In the web services api for the cluster/metrics, the totalNodes reported doesn't include the unhealthy nodes. this.totalNodes = activeNodes + lostNodes + decommissionedNodes + rebootedNodes;
YARN-1172. Major sub-task reported by Karthik Kambatla and fixed by Tsuyoshi OZAWA (resourcemanager)
Convert *SecretManagers in the RM to services
YARN-1145. Major bug reported by Rohith and fixed by Rohith
Potential file handle leak in aggregated logs web ui

If any problem occurs while getting aggregated logs for rendering on the web UI, the LogReader is not closed. Because the reader is not closed, many connections are left in the CLOSE_WAIT state. hadoopuser@hadoopuser:> jps *27909* JobHistoryServer DataNode port is 50010. When grepped with the DataNode port, many connections are in CLOSE_WAIT from JHS. hadoopuser@hadoopuser:> netstat -tanlp |grep 50010 tcp 0 0 10.18.40.48:50010 0.0.0.0:* LISTEN 21453/java tcp 1 0 10.18.40.48:20596 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp 1 0 10.18.40.48:19667 10.18.40.152:50010 CLOSE_WAIT *27909*/java tcp 1 0 10.18.40.48:20593 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp 1 0 10.18.40.48:12290 10.18.40.48:50010 CLOSE_WAIT *27909*/java tcp 1 0 10.18.40.48:19662 10.18.40.152:50010 CLOSE_WAIT *27909*/java
YARN-1138. Major bug reported by Yingda Chen and fixed by Chuan Liu (api)
yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows

yarn-default.xml has "yarn.application.classpath" entry set to $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/,$HADOOP_COMMON_HOME/share/hadoop/common/lib/,$HADOOP_HDFS_HOME/share/hadoop/hdfs/,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib. This does not work on Windows and needs to be fixed.
YARN-1121. Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)
RMStateStore should flush all pending store events before closing

On serviceStop, the RMStateStore should wait for all internal pending events to drain before stopping.
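Draining before stop can be illustrated with a plain single-threaded executor. This is only a sketch under that assumption: DrainingStore is hypothetical, and the real RMStateStore uses YARN's own event dispatcher rather than an ExecutorService.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class DrainingStore {
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();
    private final AtomicInteger stored = new AtomicInteger();

    // Queues one store event; the work itself is just a counter increment here.
    void storeAsync() {
        dispatcher.execute(stored::incrementAndGet);
    }

    // Stops accepting new events, then waits for already queued events to finish.
    // Returns false if the queue did not drain within the timeout.
    boolean stop(long timeoutMs) {
        dispatcher.shutdown();
        try {
            return dispatcher.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int storedCount() {
        return stored.get();
    }
}
```

The key point is using shutdown() rather than shutdownNow(): the former lets queued work run to completion, while the latter would discard pending events.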
YARN-1119. Major test reported by Robert Parker and fixed by Mit Desai (resourcemanager)
Add ClusterMetrics checks to the TestRMNodeTransitions tests

YARN-1101 identified an issue where UNHEALTHY nodes could double decrement the active nodes. We should add checks for RUNNING node transitions.
YARN-1109. Major improvement reported by Sandy Ryza and fixed by haosdent (nodemanager)
Demote NodeManager "Sending out status for container" logs to debug

Diagnosing NodeManager and container launch problems is made more difficult by the enormous number of logs like {code} Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 18, cluster_timestamp: 1377559361179, }, attemptId: 1, }, id: 1337, }, state: C_RUNNING, diagnostics: "Container killed by the ApplicationMaster.\n", exit_status: -1000 {code} On an NM with a few containers I am seeing tens of these per second.
YARN-1101. Major bug reported by Robert Parker and fixed by Robert Parker (resourcemanager)
Active nodes can be decremented below 0

The issue is in RMNodeImpl, where both RUNNING and UNHEALTHY states that transition to a deactivated state (LOST, DECOMMISSIONED, REBOOTED) use the same DeactivateNodeTransition class. The DeactivateNodeTransition class naturally decrements the active node count; however, in cases where the node has already transitioned to UNHEALTHY, the active count has already been decremented.
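The double decrement can be avoided by making the deactivation step depend on the state the node is leaving. The counter below is a hypothetical reduction of that idea and is not the actual RMNodeImpl or ClusterMetrics code.

```java
final class ActiveNodeCounter {
    enum State { RUNNING, UNHEALTHY, LOST, DECOMMISSIONED, REBOOTED }

    private int active;

    ActiveNodeCounter(int active) {
        this.active = active;
    }

    int active() {
        return active;
    }

    // RUNNING -> UNHEALTHY: the node stops counting as active here.
    void markUnhealthy() {
        active--;
    }

    // Transition to LOST, DECOMMISSIONED or REBOOTED: decrement only if the node
    // was still RUNNING, because an UNHEALTHY node has already been subtracted.
    void deactivate(State previous) {
        if (previous == State.RUNNING) {
            active--;
        }
    }
}
```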
YARN-1098. Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)
Separate out RM services into "Always On" and "Active"

From discussion on YARN-1027, it makes sense to separate out services that are stateful and stateless. The stateless services can run perennially irrespective of whether the RM is in Active/Standby state, while the stateful services need to be started on transitionToActive() and completely shut down on transitionToStandby(). The external-facing stateless services should respond to the client/AM/NM requests depending on whether the RM is Active/Standby.
YARN-1068. Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)
Add admin support for HA operations

Support HA admin operations to facilitate transitioning the RM to Active and Standby states.
YARN-1060. Major bug reported by Sandy Ryza and fixed by Niranjan Singh (scheduler)
Two tests in TestFairScheduler are missing @Test annotation

Amazingly, these tests appear to pass with the annotations added.
YARN-1053. Blocker bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
Diagnostic message from ContainerExitEvent is ignored in ContainerImpl

If the container launch fails, we send a ContainerExitEvent. This event contains the exitCode and a diagnostic message. Today we are ignoring the diagnostic message while handling this event inside ContainerImpl. Fixing it, as it is useful in diagnosing the failure.
YARN-1044. Critical bug reported by Sangjin Lee and fixed by Sangjin Lee (resourcemanager , scheduler)
used/min/max resources do not display info in the scheduler page

Go to the scheduler page in RM, and click any queue to display the detailed info. You'll find that none of the resources entries (used, min, or max) would display values. It is because the values contain brackets ("<" and ">") and are not properly html-escaped.
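The missing escaping step can be shown with a small hand-written helper. This is a sketch only: the sample value format in the usage note is an assumption about how a resource is rendered as text, and the real web UI would rely on its own escaping utilities.

```java
final class HtmlEscape {
    // Replaces the characters that are significant in HTML with entity references.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

Without escaping, a value such as `<memory:1024, vCores:1>` is parsed by the browser as an unknown tag and nothing is displayed, which matches the empty entries described above.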
YARN-1033. Major sub-task reported by Nemon Lou and fixed by Karthik Kambatla
Expose RM active/standby state to Web UI and REST API

Both the active and standby RMs shall expose their web servers and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well.
YARN-1029. Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla
Allow embedding leader election into the RM

It should be possible to embed the common ActiveStandbyElector into the RM such that ZooKeeper-based leader election and notification are built in. In conjunction with a ZK state store, this configuration will be a simple deployment option.
YARN-1028. Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla
Add FailoverProxyProvider like capability to RMProxy

The RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that provide RM discovery in extensible ways.
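The idea of a failover-capable provider can be sketched as follows. This is an illustration under assumptions, not the Hadoop FailoverProxyProvider interface: the class name, its methods, and the address strings are hypothetical, and a plain address stands in for a real RPC proxy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: cycles through a configured list of RM addresses.
// A real provider would hand out RPC proxies rather than address strings.
final class RmFailoverSketch {
    private final List<String> rmAddresses;
    private int current;

    RmFailoverSketch(List<String> rmAddresses) {
        if (rmAddresses.isEmpty()) {
            throw new IllegalArgumentException("need at least one RM address");
        }
        this.rmAddresses = new ArrayList<>(rmAddresses);
    }

    // Address the client should currently talk to.
    String currentAddress() {
        return rmAddresses.get(current);
    }

    // Called after a connection failure: move to the next RM, wrapping around.
    void performFailover() {
        current = (current + 1) % rmAddresses.size();
    }
}
```

A client would retry a failed call against currentAddress() after calling performFailover(); other discovery strategies could be swapped in behind the same two methods.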
YARN-1027. Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla
Implement RMHAProtocolService

Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services.
YARN-1022. Trivial bug reported by Bikas Saha and fixed by haosdent
Unnecessary INFO logs in AMRMClientAsync

Logs like the following should be at debug level; otherwise every legitimate stop causes unnecessary exception traces in the logs. 2013-08-03 20:01:34,459 INFO [AMRM Heartbeater thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl: Heartbeater interrupted java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:249) 2013-08-03 20:01:34,460 INFO [AMRM Callback Handler Thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl: Interrupted while waiting for queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1961) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1996) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:275)
YARN-1021. Major new feature reported by Wei Yan and fixed by Wei Yan (scheduler)
Yarn Scheduler Load Simulator

The Yarn Scheduler is a fertile area of interest with different implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations have also been made to improve scheduler performance for different scenarios and workloads. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm thoroughly before deploying it in a production cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time-consuming and costly, and it is also very hard to find a large-enough cluster. Hence, a simulator that can predict how well a scheduler algorithm works for a specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads on a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation. The simulator will exercise the real Yarn ResourceManager, removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AM heartbeat events from within the same JVM. To keep track of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler. The simulator will produce real-time metrics while executing, including:
* Resource usages for the whole cluster and each queue, which can be utilized to configure the cluster's and queues' capacity.
* The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual jobs' turnaround time, throughput, fairness, capacity guarantee, etc).
* Several key metrics of the scheduler algorithm, such as the time cost of each scheduler operation (allocate, handle, etc), which can be utilized by Hadoop developers to find the code spots and scalability limits.
The simulator will provide real-time charts showing the behavior of the scheduler and its performance. A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use the simulator to simulate the Fair Scheduler and Capacity Scheduler.
YARN-1010. Critical improvement reported by Alejandro Abdelnur and fixed by Wei Yan (scheduler)
FairScheduler: decouple container scheduling from nodemanager heartbeats

Currently, scheduling for a node is done when that node heartbeats. For large clusters where the heartbeat interval is set to several seconds, this delays scheduling of incoming allocations significantly. We could have a continuous loop scanning all nodes and doing scheduling. If there is availability, AMs will get the allocation in the next heartbeat after the one that placed the request.
YARN-985. Major improvement reported by Ravi Prakash and fixed by Ravi Prakash (nodemanager)
Nodemanager should log where a resource was localized

When a resource is localized, we should log WHERE on the local disk it was localized. This helps in debugging afterwards (e.g. if the disk was to go bad).
YARN-976. Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (documentation)
Document the meaning of a virtual core

As virtual cores are a somewhat novel concept, it would be helpful to have thorough documentation that clarifies their meaning.
YARN-895. Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)
RM crashes if it restarts while the state-store is down
YARN-891. Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)
Store completed application information in RM state store

Store completed application/attempt info in RMStateStore when the application/attempt completes. This solves problems such as finished applications getting lost after an RM restart, as well as some other races such as YARN-1195.
YARN-888. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
clean up POM dependencies

Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules as in common, hdfs and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependency.
YARN-879. Major bug reported by Junping Du and fixed by Junping Du
Fix tests w.r.t o.a.h.y.server.resourcemanager.Application

getResources() should return a list of containers that were allocated by the RM. However, it now returns null directly. Worse, if LOG.debug is enabled, it will definitely cause an NPE.
YARN-819. Major sub-task reported by Robert Parker and fixed by Robert Parker (nodemanager , resourcemanager)
ResourceManager and NodeManager should check for a minimum allowed version

Our use case is an upgrade on a large cluster, during which several NodeManagers may not restart with the new version. Once the RM comes back up, such a NodeManager will re-register with the RM without issue. The NM should report its version to the RM. The RM should have a configuration that either disables the check (the default), requires a version equal to the RM's (to prevent a config change for each release), requires a version equal to or greater than the RM's (to allow NM upgrades), or specifies an explicit version or version range. The RM should also have a configuration for how to treat a mismatch: REJECT or REBOOT the NM.
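The check described above could look roughly like the sketch below. Everything here is an assumption made for illustration: the class, the mode names NONE, EQUAL_TO_RM and AT_LEAST_RM, and the restriction to purely numeric dotted versions are hypothetical, and version ranges are not handled.

```java
// Hypothetical sketch of an RM-side minimum NodeManager version check.
// Assumes versions are dotted numbers such as "2.1.0" (no "-SNAPSHOT" suffix).
final class MinimumVersionSketch {
    // Returns a negative, zero, or positive value, like Comparator#compare.
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            // Missing components count as 0, so "2.1" equals "2.1.0".
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // "setting" is NONE, EQUAL_TO_RM, AT_LEAST_RM, or an explicit minimum version.
    static boolean isAllowed(String nmVersion, String rmVersion, String setting) {
        switch (setting) {
            case "NONE":
                return true; // check disabled
            case "EQUAL_TO_RM":
                return compareVersions(nmVersion, rmVersion) == 0;
            case "AT_LEAST_RM":
                return compareVersions(nmVersion, rmVersion) >= 0;
            default:
                return compareVersions(nmVersion, setting) >= 0;
        }
    }
}
```

When isAllowed returns false, the RM would apply the configured mismatch action (REJECT or REBOOT) to the registering NM.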
YARN-807. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza
When querying apps by queue, iterating over all apps is inefficient and limiting

The question "which apps are in queue x" can be asked via the RM REST APIs, through the ClientRMService, and through the command line. In all these cases, the question is answered by scanning through every RMApp and filtering by the app's queue name. All schedulers maintain a mapping of queues to applications. I think it would make more sense to ask the schedulers which applications are in a given queue. This is what was done in MR1. This would also have the advantage of allowing a parent queue to return all the applications on leaf queues under it, and allow queue name aliases, as in the way that "root.default" and "default" refer to the same queue in the fair scheduler.
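A rough sketch of the proposed lookup follows, assuming a scheduler-side index from queue name to application IDs. The class and method names are hypothetical; the alias handling mirrors the "root.default" / "default" example above, and a parent queue returns the applications of all queues under it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: the scheduler keeps a queue -> apps index, so a query
// does not need to scan every application and filter by queue name.
final class QueueAppsSketch {
    private final Map<String, Set<String>> appsByQueue = new TreeMap<>();

    // Treats "default" and "root.default" as the same queue.
    static String canonical(String queue) {
        if (queue.equals("root") || queue.startsWith("root.")) {
            return queue;
        }
        return "root." + queue;
    }

    void addApp(String queue, String appId) {
        appsByQueue.computeIfAbsent(canonical(queue), q -> new LinkedHashSet<>()).add(appId);
    }

    // Returns the apps in the given queue and in every queue beneath it.
    List<String> appsInQueue(String queue) {
        String name = canonical(queue);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : appsByQueue.entrySet()) {
            String q = e.getKey();
            if (q.equals(name) || q.startsWith(name + ".")) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }
}
```

The loop runs over queues rather than applications, which is the point of the change: the number of queues is normally far smaller than the number of applications.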
YARN-786. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza
Expose application resource usage in RM REST API

It might be good to require users to explicitly ask for this information, as it's a little more expensive to collect than the other fields in AppInfo.
YARN-764. Major bug reported by Nemon Lou and fixed by Nemon Lou (resourcemanager)
blank Used Resources on Capacity Scheduler page

Even when there are jobs running, used resources is empty on the Capacity Scheduler page for leaf queues. (I use Google Chrome on Windows 7.) After changing resource.java's toString method by replacing "<>" with "{}", this bug gets fixed.
YARN-709. Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)
verify that new jobs submitted with old RM delegation tokens after RM restart are accepted

A more elaborate test for restoring RM delegation tokens on RM restart. New jobs with old RM delegation tokens should be accepted by the new RM as long as the token is still valid.
YARN-674. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi (resourcemanager)
Slow or failing DelegationToken renewals on submission itself make RM unavailable

This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as the RM may run out of RPC handlers due to blocked client submissions.
YARN-649. Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)
Make container logs available over HTTP in plain text

It would be good to make container logs available over the REST API for MAPREDUCE-4362, and so that they can be accessed programmatically in general.
YARN-584. Major bug reported by Sandy Ryza and fixed by Harshit Daga (scheduler)
In scheduler web UIs, queues unexpand on refresh

In the fair scheduler web UI, you can expand queue information. Refreshing the page causes the expansions to go away, which is annoying for someone who wants to monitor the scheduler page and needs to reopen all the queues they care about each time.
YARN-546. Major bug reported by Lohit Vijayarenu and fixed by Sandy Ryza (scheduler)
Allow disabling the Fair Scheduler event log

Hadoop 1.0 supported an option to turn on/off FairScheduler event logging using mapred.fairscheduler.eventlog.enabled. In Hadoop 2.0, it looks like this option has been removed (or not ported?) which causes event logging to be enabled by default and there is no way to turn it off.
YARN-478. Major sub-task reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov
fix coverage org.apache.hadoop.yarn.webapp.log

fix coverage org.apache.hadoop.yarn.webapp.log one patch for trunk, branch-2, branch-0.23
YARN-465. Major sub-task reported by Aleksey Gorshkov and fixed by Andrey Klochkov
fix coverage org.apache.hadoop.yarn.server.webproxy

Fix coverage for org.apache.hadoop.yarn.server.webproxy. Patch YARN-465-trunk.patch is for trunk, YARN-465-branch-2.patch is for branch-2, and YARN-465-branch-0.23.patch is for branch-0.23. There is an issue in branch-0.23: the patch does not create the .keep file. To fix it, run these two commands: mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy followed by touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep
YARN-461. Major bug reported by Sandy Ryza and fixed by Wei Yan (resourcemanager)
Fair scheduler should not accept apps with empty string queue name

When an app is submitted with "" for the queue, the RMAppManager passes it on like it does with any other string.
YARN-427. Major sub-task reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov
Coverage fix for org.apache.hadoop.yarn.server.api.*

Coverage fix for org.apache.hadoop.yarn.server.api.* patch YARN-427-trunk.patch for trunk patch YARN-427-branch-2.patch for branch-2 and branch-0.23
YARN-425. Major sub-task reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov
coverage fix for yarn api

coverage fix for yarn api patch YARN-425-trunk-a.patch for trunk patch YARN-425-branch-2.patch for branch-2 patch YARN-425-branch-0.23.patch for branch-0.23
YARN-408. Minor bug reported by Mayank Bansal and fixed by Mayank Bansal (scheduler)
Capacity Scheduler delay scheduling should not be disabled by default

Capacity Scheduler delay scheduling should not be disabled by default. Enable it with a value equal to the number of nodes in one rack.
YARN-353. Major sub-task reported by Hitesh Shah and fixed by Karthik Kambatla (resourcemanager)
Add Zookeeper-based store implementation for RMStateStore

Add a store that writes RM state data to ZK.
YARN-312. Major sub-task reported by Junping Du and fixed by Junping Du (api)
Add updateNodeResource in ResourceManagerAdministrationProtocol

Add a fundamental RPC (ResourceManagerAdministrationProtocol) to support changes to a node's resource. For design details, please refer to the parent JIRA: YARN-291.
YARN-311. Major sub-task reported by Junping Du and fixed by Junping Du (resourcemanager , scheduler)
Dynamic node resource configuration: core scheduler changes

As the first step, we go for resource change on the RM side and expose admin APIs (admin protocol, CLI, REST and JMX API) later. This JIRA contains only the scheduler changes. The flow to update a node's resource and make resource scheduling aware of it is: 1. The resource update goes through the admin API to the RM and takes effect on RMNodeImpl. 2. When the next NM heartbeat for updating status arrives, the RMNode's resource change is noticed and the delta resource is added to the SchedulerNode's availableResource before actual scheduling happens. 3. The scheduler does resource allocation according to the new availableResource in SchedulerNode. For more design details, please refer to the proposal and discussions in the parent JIRA: YARN-291.
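The three-step flow above can be sketched as follows. This is a simplified illustration, not the scheduler code: the class and method names are hypothetical, and only memory (in MB) is tracked.

```java
// Hypothetical, memory-only sketch of the update flow: an admin update is
// recorded first and only reaches the available resource at the next heartbeat.
final class NodeResourceUpdateSketch {
    private int totalMb;
    private int availableMb;
    private Integer pendingTotalMb; // set by the admin call, null if no update is pending

    NodeResourceUpdateSketch(int totalMb) {
        this.totalMb = totalMb;
        this.availableMb = totalMb;
    }

    // Step 1: the admin API records the node's new total resource.
    void adminUpdateTotal(int newTotalMb) {
        pendingTotalMb = newTotalMb;
    }

    // Step 2: at the next heartbeat, the delta is added to the available resource.
    void onHeartbeat() {
        if (pendingTotalMb != null) {
            int delta = pendingTotalMb - totalMb;
            availableMb += delta;
            totalMb = pendingTotalMb;
            pendingTotalMb = null;
        }
    }

    // Step 3: allocation works against the (possibly updated) available resource.
    void allocate(int mb) {
        if (mb > availableMb) {
            throw new IllegalStateException("not enough available memory");
        }
        availableMb -= mb;
    }

    int availableMb() {
        return availableMb;
    }
}
```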
YARN-305. Critical bug reported by Lohit Vijayarenu and fixed by Lohit Vijayarenu (resourcemanager)
Fair scheduler logs too many "Node offered to app:..." messages

Running YARN with the fair scheduler shows that the RM logs lots of messages like the one below. {noformat} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: Node offered to app: application_1357147147433_0002 reserved: false {noformat} They don't seem to tell much, and the same line is dumped many times in the RM log. It would be good to improve it with node information or move it to some other logging level with enough debug information.
YARN-7. Major sub-task reported by Arun C Murthy and fixed by Junping Du
Add support for DistributedShell to ask for CPUs along with memory
MAPREDUCE-5744. Blocker bug reported by Sangjin Lee and fixed by Gera Shegalov
Job hangs because RMContainerAllocator$AssignedRequests.preemptReduce() violates the comparator contract
MAPREDUCE-5743. Major bug reported by Ted Yu and fixed by Ted Yu
TestRMContainerAllocator is failing
MAPREDUCE-5729. Critical bug reported by Karthik Kambatla and fixed by Karthik Kambatla (mrv2)
mapred job -list throws NPE
MAPREDUCE-5725. Major bug reported by Sandy Ryza and fixed by Sandy Ryza
TestNetworkedJob relies on the Capacity Scheduler
MAPREDUCE-5724. Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (jobhistoryserver)
JobHistoryServer does not start if HDFS is not running
MAPREDUCE-5723. Blocker bug reported by Mohammad Kamrul Islam and fixed by Mohammad Kamrul Islam (applicationmaster)
MR AM container log can be truncated or empty
MAPREDUCE-5694. Major bug reported by Mohammad Kamrul Islam and fixed by Mohammad Kamrul Islam
MR AM container syslog is empty
MAPREDUCE-5693. Major bug reported by Gera Shegalov and fixed by Gera Shegalov (mrv2)
Restore MRv1 behavior for log flush
MAPREDUCE-5692. Major improvement reported by Gera Shegalov and fixed by Gera Shegalov (mrv2)
Add explicit diagnostics when a task attempt is killed due to speculative execution
MAPREDUCE-5689. Critical bug reported by Lohit Vijayarenu and fixed by Lohit Vijayarenu
MRAppMaster does not preempt reducers when scheduled maps cannot be fulfilled
MAPREDUCE-5687. Major test reported by Ted Yu and fixed by Jian He
TestYARNRunner#testResourceMgrDelegate fails with NPE after YARN-1446
MAPREDUCE-5685. Blocker bug reported by Yi Song and fixed by Yi Song (client)
getCacheFiles() api doesn't work in WrappedReducer.java due to typo
MAPREDUCE-5679. Major bug reported by Liyin Liang and fixed by Liyin Liang
TestJobHistoryParsing has race condition
MAPREDUCE-5674. Major bug reported by Chuan Liu and fixed by Chuan Liu (client)
Missing start and finish time in mapred.JobStatus
MAPREDUCE-5672. Major improvement reported by Gera Shegalov and fixed by Gera Shegalov (mr-am , mrv2)
Provide optional RollingFileAppender for container log4j (syslog)
MAPREDUCE-5656. Critical bug reported by Jason Lowe and fixed by Jason Lowe
bzip2 codec can drop records when reading data in splits
MAPREDUCE-5650. Major bug reported by Gera Shegalov and fixed by Gera Shegalov (mrv2)
Job fails when hprof mapreduce.task.profile.map/reduce.params is specified
MAPREDUCE-5645. Major bug reported by Jonathan Eagles and fixed by Mit Desai
TestFixedLengthInputFormat fails with native libs
MAPREDUCE-5640. Trivial improvement reported by Jason Lowe and fixed by Jason Lowe (test)
Rename TestLineRecordReader in jobclient module
MAPREDUCE-5632. Major test reported by Ted Yu and fixed by Jonathan Eagles
TestRMContainerAllocator#testUpdatedNodes fails
MAPREDUCE-5631. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles
TestJobEndNotifier.testNotifyRetries fails with Should have taken more than 5 seconds in jdk7
MAPREDUCE-5625. Major test reported by Jonathan Eagles and fixed by Mariappan Asokan
TestFixedLengthInputFormat fails in jdk7 environment
MAPREDUCE-5623. Major bug reported by Tsuyoshi OZAWA and fixed by Jason Lowe
TestJobCleanup fails because of RejectedExecutionException and NPE.
MAPREDUCE-5616. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (client)
MR Client-AppMaster RPC max retries on socket timeout is too high.
MAPREDUCE-5613. Major bug reported by Gera Shegalov and fixed by Gera Shegalov (applicationmaster)
DefaultSpeculator holds and checks hashmap that is always empty
MAPREDUCE-5610. Major test reported by Jonathan Eagles and fixed by Jonathan Eagles
TestSleepJob fails in jdk7
MAPREDUCE-5604. Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (test)
TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max path length
MAPREDUCE-5601. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza
ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
MAPREDUCE-5598. Major bug reported by Robert Kanter and fixed by Robert Kanter (test)
TestUserDefinedCounters.testMapReduceJob is flakey
MAPREDUCE-5596. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza
Allow configuring the number of threads used to serve shuffle connections
MAPREDUCE-5587. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles
TestTextOutputFormat fails on JDK7
MAPREDUCE-5586. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles
TestCopyMapper#testCopyFailOnBlockSizeDifference fails when run from hadoop-tools/hadoop-distcp directory
MAPREDUCE-5585. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles
TestCopyCommitter#testNoCommitAction Fails on JDK7
MAPREDUCE-5569. Major bug reported by Nathan Roberts and fixed by Nathan Roberts
FloatSplitter is not generating correct splits
MAPREDUCE-5561. Critical bug reported by Cindy Li and fixed by Karthik Kambatla
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl testcase failing on trunk
MAPREDUCE-5550. Major bug reported by Vrushali C and fixed by Gera Shegalov
Task Status message (reporter.setStatus) not shown in UI with Hadoop 2.0
MAPREDUCE-5546. Major bug reported by Chuan Liu and fixed by Chuan Liu
mapred.cmd on Windows set HADOOP_OPTS incorrectly
MAPREDUCE-5522. Minor bug reported by Jinghui Wang and fixed by Jinghui Wang (test)
Incorrectly expect the array of JobQueueInfo returned by o.a.h.mapred.QueueManager#getJobQueueInfos to have a specific order.
MAPREDUCE-5518. Trivial bug reported by Albert Chu and fixed by Albert Chu (examples)
Fix typo "can't read paritions file"
MAPREDUCE-5514. Blocker bug reported by Zhijie Shen and fixed by Zhijie Shen
TestRMContainerAllocator fails on trunk
MAPREDUCE-5504. Major bug reported by Thomas Graves and fixed by Kousuke Saruta (client)
mapred queue -info inconsistent with types
MAPREDUCE-5487. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (performance , task)
In task processes, JobConf is unnecessarily loaded again in Limits
MAPREDUCE-5484. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (task)
YarnChild unnecessarily loads job conf twice
MAPREDUCE-5481. Blocker bug reported by Jason Lowe and fixed by Sandy Ryza (mrv2 , test)
Enable uber jobs to have multiple reducers
MAPREDUCE-5464. Major task reported by Sandy Ryza and fixed by Sandy Ryza
Add analogs of the SLOTS_MILLIS counters that jive with the YARN resource model
MAPREDUCE-5463. Major task reported by Sandy Ryza and fixed by Tsuyoshi OZAWA
Deprecate SLOTS_MILLIS counters
MAPREDUCE-5457. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza
Add a KeyOnlyTextOutputReader to enable streaming to write out text files without separators
MAPREDUCE-5451. Major bug reported by Mostafa Elhemali and fixed by Yingda Chen
MR uses LD_LIBRARY_PATH which doesn't mean anything in Windows
MAPREDUCE-5431. Major bug reported by Timothy St. Clair and fixed by Timothy St. Clair (build)
Missing pom dependency in MR-client
MAPREDUCE-5411. Major sub-task reported by Ashwin Shankar and fixed by Ashwin Shankar (jobhistoryserver)
Refresh size of loaded job cache on history server
MAPREDUCE-5409. Major sub-task reported by Devaraj K and fixed by Gera Shegalov
MRAppMaster throws InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at KILLED for TaskAttemptImpl
MAPREDUCE-5404. Major bug reported by Ted Yu and fixed by Ted Yu (jobhistoryserver)
HSAdminServer does not use ephemeral ports in minicluster mode
MAPREDUCE-5386. Major sub-task reported by Ashwin Shankar and fixed by Ashwin Shankar (jobhistoryserver)
Ability to refresh history server job retention and job cleaner settings
MAPREDUCE-5380. Major bug reported by Stephen Chu and fixed by Stephen Chu
Invalid mapred command should return non-zero exit code
MAPREDUCE-5373. Major bug reported by Chuan Liu and fixed by Jonathan Eagles
TestFetchFailure.testFetchFailureMultipleReduces could fail intermittently
MAPREDUCE-5356. Major sub-task reported by Ashwin Shankar and fixed by Ashwin Shankar (jobhistoryserver)
Ability to refresh aggregated log retention period and check interval
MAPREDUCE-5332. Major new feature reported by Jason Lowe and fixed by Jason Lowe (jobhistoryserver)
Support token-preserving restart of history server
MAPREDUCE-5329. Major bug reported by Avner BenHanoch and fixed by Avner BenHanoch (mr-am)
APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler
MAPREDUCE-5316. Major bug reported by Ashwin Shankar and fixed by Ashwin Shankar (client)
job -list-attempt-ids command does not handle illegal task-state
MAPREDUCE-5266. Major new feature reported by Jason Lowe and fixed by Ashwin Shankar (jobhistoryserver)
Ability to refresh retention settings on history server
MAPREDUCE-5265. Major new feature reported by Jason Lowe and fixed by Ashwin Shankar (jobhistoryserver)
History server admin service to refresh user and superuser group mappings
MAPREDUCE-5186. Critical bug reported by Sangjin Lee and fixed by Robert Parker (job submission)
mapreduce.job.max.split.locations causes some splits created by CombineFileInputFormat to fail
MAPREDUCE-5102. Major test reported by Aleksey Gorshkov and fixed by Andrey Klochkov
fix coverage org.apache.hadoop.mapreduce.lib.db and org.apache.hadoop.mapred.lib.db
MAPREDUCE-5084. Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov
fix coverage org.apache.hadoop.mapreduce.v2.app.webapp and org.apache.hadoop.mapreduce.v2.hs.webapp
MAPREDUCE-5052. Critical bug reported by Kendall Thrapp and fixed by Chen He (jobhistoryserver , webapps)
Job History UI and web services confusing job start time and job submit time
MAPREDUCE-5020. Major bug reported by Trevor Robinson and fixed by Trevor Robinson (client)
Compile failure with JDK8
MAPREDUCE-4680. Major bug reported by Sandy Ryza and fixed by Robert Kanter (jobhistoryserver)
Job history cleaner should only check timestamps of files in old enough directories
MAPREDUCE-4421. Major improvement reported by Arun C Murthy and fixed by Jason Lowe
Run MapReduce framework via the distributed cache
MAPREDUCE-3310. Major improvement reported by Mathias Herberts and fixed by Alejandro Abdelnur (client)
Custom grouping comparator cannot be set for Combiners
MAPREDUCE-1176. Major new feature reported by BitsOfInfo and fixed by Mariappan Asokan
FixedLengthInputFormat and FixedLengthRecordReader

Addition of FixedLengthInputFormat and FixedLengthRecordReader in the org.apache.hadoop.mapreduce.lib.input package. These two classes can be used when you need to read data from files containing fixed length (fixed width) records. Such files have no CR/LF (or any combination thereof), no delimiters etc, but each record is a fixed length, and extra data is padded with spaces. The data is one gigantic line within a file. When creating a job that specifies this input format, the job must have the "mapreduce.input.fixedlengthinputformat.record.length" property set as follows: myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",[myFixedRecordLength]); Please see the javadoc for more details.
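To illustrate what a fixed-length record layout means, here is a minimal stand-alone sketch that cuts a byte array into records of a given length. It does not use the Hadoop classes; the class name and the choice to reject a trailing partial record are assumptions of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of fixed-length records: the input has no line
// breaks or delimiters, so record boundaries come from the length alone.
final class FixedLengthRecordsSketch {
    static List<String> split(byte[] data, int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("record length must be positive");
        }
        if (data.length % recordLength != 0) {
            // Assumption of this sketch: a trailing partial record is an error.
            throw new IllegalArgumentException("input ends with a partial record");
        }
        List<String> records = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += recordLength) {
            records.add(new String(data, offset, recordLength, StandardCharsets.US_ASCII));
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "AB  CD  EFGH".getBytes(StandardCharsets.US_ASCII);
        // Three 4-byte records; short values are padded with spaces.
        System.out.println(split(data, 4));
    }
}
```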
MAPREDUCE-434. Minor improvement reported by Yoram Arnon and fixed by Aaron Kimball
LocalJobRunner limited to single reducer
HDFS-5921. Critical bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)
Cannot browse file system via NN web UI if any directory has the sticky bit set
HDFS-5876. Major bug reported by Haohui Mai and fixed by Haohui Mai (datanode)
SecureDataNodeStarter does not pick up configuration in hdfs-site.xml
HDFS-5873. Major bug reported by Yesha Vora and fixed by Haohui Mai
dfs.http.policy should have higher precedence over dfs.https.enable
HDFS-5845. Blocker bug reported by Andrew Wang and fixed by Andrew Wang (namenode)
SecondaryNameNode dies when checkpointing with cache pools
HDFS-5844. Minor bug reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)
Fix broken link in WebHDFS.apt.vm
HDFS-5842. Major bug reported by Arpit Gupta and fixed by Jing Zhao (security)
Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster
HDFS-5841. Major improvement reported by Andrew Wang and fixed by Andrew Wang
Update HDFS caching documentation with new changes
HDFS-5837. Major bug reported by Bryan Beaudreault and fixed by Tao Luo (namenode)
dfs.namenode.replication.considerLoad does not consider decommissioned nodes
HDFS-5833. Trivial improvement reported by Bangtao Zhou and fixed by (namenode)
SecondaryNameNode have an incorrect java doc
HDFS-5830. Blocker bug reported by Yongjun Zhang and fixed by Yongjun Zhang (caching , hdfs-client)
WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster.
HDFS-5825. Minor improvement reported by Haohui Mai and fixed by Haohui Mai
Use FileUtils.copyFile() to implement DFSTestUtils.copyFile()
HDFS-5806. Major bug reported by Nathan Roberts and fixed by Nathan Roberts (balancer)
balancer should set SoTimeout to avoid indefinite hangs
HDFS-5800. Trivial bug reported by Kousuke Saruta and fixed by Kousuke Saruta (hdfs-client)
Typo: soft-limit for hard-limit in DFSClient
HDFS-5789. Major bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (namenode)
Some of snapshot APIs missing checkOperation double check in fsn
HDFS-5788. Major improvement reported by Nathan Roberts and fixed by Nathan Roberts (namenode)
listLocatedStatus response can be very large
HDFS-5784. Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (namenode)
reserve space in edit log header and fsimage header for feature flag section
HDFS-5777. Major bug reported by Jing Zhao and fixed by Jing Zhao (namenode)
Update LayoutVersion for the new editlog op OP_ADD_BLOCK
HDFS-5766. Major bug reported by Liang Xie and fixed by Liang Xie (hdfs-client)
In DFSInputStream, do not add datanode to deadNodes after InvalidEncryptionKeyException in fetchBlockByteRange
HDFS-5762. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
BlockReaderLocal doesn't return -1 on EOF when doing zero-length reads
HDFS-5756. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)
hadoopRzOptionsSetByteBufferPool does not accept NULL argument, contrary to docs
HDFS-5748. Major improvement reported by Kihwal Lee and fixed by Haohui Mai
Too much information shown in the dfs health page.
HDFS-5747. Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Arpit Agarwal (namenode)
BlocksMap.getStoredBlock(..) and BlockInfoUnderConstruction.addReplicaIfNotPresent(..) may throw NullPointerException
HDFS-5728. Critical bug reported by Vinayakumar B and fixed by Vinayakumar B (datanode)
[Diskfull] Block recovery will fail if the metafile does not have crc for all chunks of the block
HDFS-5721. Minor improvement reported by Ted Yu and fixed by Ted Yu
sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
HDFS-5719. Minor bug reported by Ted Yu and fixed by Ted Yu (namenode)
FSImage#doRollback() should close prevState before return
HDFS-5710. Major bug reported by Ted Yu and fixed by Uma Maheswara Rao G
FSDirectory#getFullPathName should check inodes against null
HDFS-5704. Major bug reported by Suresh Srinivas and fixed by Jing Zhao (namenode)
Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK

Add a new editlog record (OP_ADD_BLOCK) that only records allocation of the new block instead of the entire block list, on every block allocation.
HDFS-5703. Major new feature reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (webhdfs)
Add support for HTTPS and swebhdfs to HttpFS
HDFS-5695. Major improvement reported by Haohui Mai and fixed by Haohui Mai (test)
Clean up TestOfflineEditsViewer and OfflineEditsViewerHelper
HDFS-5691. Minor bug reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)
Fix typo in ShortCircuitLocalRead document
HDFS-5690. Blocker bug reported by Haohui Mai and fixed by Haohui Mai
DataNode fails to start in secure mode when dfs.http.policy equals to HTTP_ONLY
HDFS-5681. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (namenode)
renewLease should not hold fsn write lock
HDFS-5677. Minor improvement reported by Vincent Sheffer and fixed by Vincent Sheffer (datanode , ha)
Need error checking for HA cluster configuration
HDFS-5676. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)
fix inconsistent synchronization of CachingStrategy
HDFS-5675. Minor bug reported by Plamen Jeliazkov and fixed by Plamen Jeliazkov (benchmarks)
Add Mkdirs operation to NNThroughputBenchmark
HDFS-5674. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Editlog code cleanup
HDFS-5671. Critical bug reported by JamesLi and fixed by JamesLi (hdfs-client)
Fix socket leak in DFSInputStream#getBlockReader
HDFS-5667. Major sub-task reported by Eric Sirianni and fixed by Arpit Agarwal (datanode)
Include DatanodeStorage in StorageReport
HDFS-5666. Minor bug reported by Colin Patrick McCabe and fixed by Jimmy Xiang (namenode)
Fix inconsistent synchronization in BPOfferService
HDFS-5663. Major improvement reported by Liang Xie and fixed by Liang Xie (hdfs-client)
make the retry time and interval value configurable in openInfo()

Makes the number of retries, and the interval between retries, configurable when getting the length of the last block of a file. The new configuration properties are dfs.client.retry.times.get-last-block-length and dfs.client.retry.interval-ms.get-last-block-length. They default to 3 and 4000 respectively, the values that were previously hardcoded.
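The retry behaviour described for HDFS-5663 can be pictured with a small, self-contained sketch. It is not the DFSInputStream code: the class name, the probe callback, and the convention that -1 means "length not yet known" are assumptions made for illustration. Only the two defaults (3 retries, 4000 ms) come from the note above.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of a bounded retry loop; not the actual HDFS client code.
final class LastBlockLengthRetrySketch {
    // Defaults quoted in the release note for the two new properties.
    static final int DEFAULT_RETRIES = 3;
    static final long DEFAULT_INTERVAL_MS = 4000L;

    /**
     * Calls the probe once, then up to `retries` more times while it keeps
     * returning -1 (assumed here to mean "length not available yet"),
     * sleeping `intervalMs` between attempts. Returns the last value seen.
     */
    static long fetchWithRetry(LongSupplier probe, int retries, long intervalMs) {
        long length = probe.getAsLong();
        int remaining = retries;
        while (length == -1L && remaining > 0) {
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                break;
            }
            length = probe.getAsLong();
            remaining--;
        }
        return length;
    }

    public static void main(String[] args) {
        // A probe that never succeeds, with a short interval so the demo ends quickly.
        long result = fetchWithRetry(() -> -1L, DEFAULT_RETRIES, 10L);
        System.out.println("length after retries: " + result);
    }
}
```

In a real deployment the two values would be read from the client configuration rather than passed in directly.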
HDFS-5662. Major improvement reported by Brandon Li and fixed by Brandon Li (namenode)
Can't decommission a DataNode due to file's replication factor larger than the rest of the cluster size
HDFS-5661. Major bug reported by Benoy Antony and fixed by Benoy Antony
Browsing FileSystem via web ui, should use datanode's fqdn instead of ip address
HDFS-5657. Major bug reported by Brandon Li and fixed by Brandon Li (nfs)
race condition causes writeback state error in NFS gateway
HDFS-5652. Minor improvement reported by Liang Xie and fixed by Liang Xie (hdfs-client)
Refactor and unify invalid block token exception handling in DFSInputStream
HDFS-5649. Major bug reported by Brandon Li and fixed by Brandon Li (nfs)
Unregister NFS and Mount service when NFS gateway is shutting down
HDFS-5637. Major improvement reported by Liang Xie and fixed by Liang Xie (hdfs-client , security)
Try to refetch the token when InvalidToken occurs during a local read
HDFS-5634. Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)
allow BlockReaderLocal to switch between checksumming and not
HDFS-5633. Minor improvement reported by Jing Zhao and fixed by Jing Zhao
Improve OfflineImageViewer to use less memory
HDFS-5629. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Support HTTPS in JournalNode and SecondaryNameNode
HDFS-5592. Major bug reported by Vinayakumar B and fixed by Vinayakumar B
"DIR* completeFile: /file is closed by DFSClient_" should be logged only for successful closure of the file.
HDFS-5590. Major bug reported by Jing Zhao and fixed by Jing Zhao
Block ID and generation stamp may be reused when persistBlocks is set to false
HDFS-5587. Minor improvement reported by Brandon Li and fixed by Brandon Li (nfs)
add debug information when NFS fails to start with duplicate user or group names
HDFS-5582. Minor bug reported by Henry Hung and fixed by sathish
hdfs getconf -excludeFile or -includeFile always fails
HDFS-5581. Major bug reported by Vinayakumar B and fixed by Vinayakumar B (namenode)
NameNodeFsck should use only one instance of BlockPlacementPolicy
HDFS-5580. Major bug reported by Binglin Chang and fixed by Binglin Chang
Infinite loop in Balancer.waitForMoveCompletion
HDFS-5579. Major bug reported by zhaoyunjiong and fixed by zhaoyunjiong (namenode)
Under-construction files make DataNode decommission take many hours
HDFS-5577. Trivial improvement reported by Brandon Li and fixed by Brandon Li (documentation)
NFS user guide update
HDFS-5568. Major improvement reported by Vinayakumar B and fixed by Vinayakumar B (snapshots)
Support inclusion of snapshot paths in Namenode fsck
HDFS-5563. Major improvement reported by Brandon Li and fixed by Brandon Li (nfs)
NFS gateway should commit the buffered data when read request comes after write to the same file
HDFS-5561. Minor improvement reported by Fengdong Yu and fixed by Haohui Mai (namenode)
FSNameSystem#getNameJournalStatus() in JMX should return plain text instead of HTML
HDFS-5560. Major bug reported by Josh Elser and fixed by Josh Elser
Trash configuration log statements print incorrect units
HDFS-5558. Major bug reported by Kihwal Lee and fixed by Kihwal Lee
LeaseManager monitor thread can crash if the last block is complete but another block is not.
HDFS-5557. Critical bug reported by Kihwal Lee and fixed by Kihwal Lee
Write pipeline recovery for the last packet in the block may cause rejection of valid replicas
HDFS-5552. Major bug reported by Shinichi Yamashita and fixed by Haohui Mai (namenode)
Fix wrong information of "Cluster summary" in dfshealth.html
HDFS-5548. Major improvement reported by Haohui Mai and fixed by Haohui Mai (nfs)
Use ConcurrentHashMap in portmap
HDFS-5545. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Allow specifying endpoints for listeners in HttpServer
HDFS-5544. Minor bug reported by sathish and fixed by sathish (hdfs-client)
Add test case for checking dfs.checksum type as NULL value
HDFS-5540. Minor bug reported by Binglin Chang and fixed by Binglin Chang
Fix intermittent failure in TestBlocksWithNotEnoughRacks
HDFS-5538. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
URLConnectionFactory should pick up the SSL related configuration by default
HDFS-5536. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Implement HTTP policy for Namenode and DataNode

Add a new HTTP policy configuration. Users can set "dfs.http.policy" to control the HTTP endpoints for NameNode and DataNode. The following values are supported:
- HTTP_ONLY : Service is provided only on http
- HTTPS_ONLY : Service is provided only on https
- HTTP_AND_HTTPS : Service is provided both on http and https
hadoop.ssl.enabled and dfs.https.enable are deprecated. When the deprecated configuration properties are still configured, the http policy is currently decided based on the following rules:
1. If dfs.http.policy is set to HTTPS_ONLY or HTTP_AND_HTTPS, it picks the specified policy; otherwise it proceeds to 2~4.
2. It picks HTTPS_ONLY if hadoop.ssl.enabled equals true.
3. It picks HTTP_AND_HTTPS if dfs.https.enable equals true.
4. It picks HTTP_ONLY for other configurations.
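The four resolution rules for HDFS-5536 can be sketched as a small decision function. This is an illustration only, assuming the configuration is available as a plain string map; the class and method names are hypothetical and this is not the NameNode/DataNode implementation. The deprecated key is spelled dfs.https.enable here, as in rule 3.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the documented rule order; not the Hadoop source.
final class HttpPolicySketch {
    enum Policy { HTTP_ONLY, HTTPS_ONLY, HTTP_AND_HTTPS }

    static Policy resolve(Map<String, String> conf) {
        String explicit = conf.get("dfs.http.policy");
        // Rule 1: an explicit HTTPS_ONLY or HTTP_AND_HTTPS wins outright.
        if ("HTTPS_ONLY".equals(explicit)) {
            return Policy.HTTPS_ONLY;
        }
        if ("HTTP_AND_HTTPS".equals(explicit)) {
            return Policy.HTTP_AND_HTTPS;
        }
        // Rule 2: deprecated hadoop.ssl.enabled=true maps to HTTPS_ONLY.
        if (Boolean.parseBoolean(conf.get("hadoop.ssl.enabled"))) {
            return Policy.HTTPS_ONLY;
        }
        // Rule 3: deprecated dfs.https.enable=true maps to HTTP_AND_HTTPS.
        if (Boolean.parseBoolean(conf.get("dfs.https.enable"))) {
            return Policy.HTTP_AND_HTTPS;
        }
        // Rule 4: everything else falls back to HTTP_ONLY.
        return Policy.HTTP_ONLY;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("dfs.https.enable", "true");
        System.out.println(resolve(conf));
    }
}
```

Read literally, the rules mean that an unset or HTTP_ONLY value of dfs.http.policy does not stop the deprecated properties from being consulted, which the sketch preserves.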
HDFS-5533. Minor bug reported by Binglin Chang and fixed by Binglin Chang (snapshots)
Symlink delete/create should be treated as DELETE/CREATE in snapshot diff report
HDFS-5532. Major improvement reported by Vinayakumar B and fixed by Vinayakumar B (webhdfs)
Enable the webhdfs by default to support new HDFS web UI
HDFS-5526. Blocker bug reported by Tsz Wo (Nicholas), SZE and fixed by Kihwal Lee (datanode)
Datanode cannot roll back to previous layout version
HDFS-5525. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Inline dust templates
HDFS-5519. Minor sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
COMMIT handler should update the commit status after sync
HDFS-5514. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (namenode)
FSNamesystem's fsLock should allow custom implementation
HDFS-5506. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Use URLConnectionFactory in DelegationTokenFetcher
HDFS-5504. Major bug reported by Vinayakumar B and fixed by Vinayakumar B (snapshots)
In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leading to NN safemode.
HDFS-5502. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Fix HTTPS support in HsftpFileSystem

Fix the https support in HsftpFileSystem. With the change the client now verifies the server certificate. In particular, client side will verify the Common Name of the certificate using a strategy specified by the configuration property "hadoop.ssl.hostname.verifier".
HDFS-5495. Major improvement reported by Andrew Wang and fixed by Jarek Jarcec Cecho
Remove further JUnit3 usages from HDFS
HDFS-5489. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Use TokenAspect in WebHDFSFileSystem
HDFS-5488. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Clean up TestHftpURLTimeout
HDFS-5487. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Introduce unit test for TokenAspect
HDFS-5476. Major bug reported by Jing Zhao and fixed by Jing Zhao
Snapshot: clean the blocks/files/directories under a renamed file/directory during deletion
HDFS-5474. Major bug reported by Uma Maheswara Rao G and fixed by sathish (snapshots)
deleteSnapshot can leave the NameNode in safemode on NN restart.
HDFS-5469. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Add configuration property for the sub-directory export path
HDFS-5467. Trivial improvement reported by Andrew Wang and fixed by Shinichi Yamashita
Remove tab characters in hdfs-default.xml
HDFS-5458. Major bug reported by Andrew Wang and fixed by Mike Mellenthin (datanode)
Datanode failed volume threshold ignored if exception is thrown in getDataDirsFromURIs
HDFS-5456. Critical bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode)
NameNode startup progress creates new steps if caller attempts to create a counter for a step that doesn't already exist.
HDFS-5454. Minor sub-task reported by Eric Sirianni and fixed by Arpit Agarwal (datanode)
DataNode UUID should be assigned prior to FsDataset initialization
HDFS-5449. Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee
WebHdfs compatibility broken between 2.2 and 1.x / 23.x
HDFS-5444. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Choose default web UI based on browser capabilities
HDFS-5443. Major bug reported by Uma Maheswara Rao G and fixed by Jing Zhao (snapshots)
Delete 0-sized block when deleting an under-construction file that is included in snapshot
HDFS-5440. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Extract the logic of handling delegation tokens in HftpFileSystem to the TokenAspect class
HDFS-5438. Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)
Flaws in block report processing can cause data loss
HDFS-5436. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Move HsftpFileSystem and HftpFileSystem into org.apache.hadoop.hdfs.web
HDFS-5434. Minor bug reported by Buddy and fixed by (namenode)
Write resiliency for replica count 1
HDFS-5433. Critical bug reported by Aaron T. Myers and fixed by Aaron T. Myers (snapshots)
When reloading fsimage during checkpointing, we should clear existing snapshottable directories
HDFS-5432. Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (datanode , test)
TestDatanodeJsp fails on Windows due to assumption that loopback address resolves to host name localhost.
HDFS-5428. Major bug reported by Vinayakumar B and fixed by Jing Zhao (snapshots)
Deleting under-construction files after snapshot+checkpoint+NN restart leads to NN safemode
HDFS-5427. Major bug reported by Vinayakumar B and fixed by Vinayakumar B (snapshots)
not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart
HDFS-5425. Major bug reported by sathish and fixed by Jing Zhao (namenode , snapshots)
Renaming an under-construction file with snapshots can cause NN failure on restart
HDFS-5413. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (scripts)
hdfs.cmd does not support passthrough to any arbitrary class.
HDFS-5407. Trivial bug reported by Haohui Mai and fixed by Haohui Mai
Fix typos in DFSClientCache
HDFS-5406. Major sub-task reported by Arpit Agarwal and fixed by Arpit Agarwal (datanode)
Send incremental block reports for all storages in a single call
HDFS-5403. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (webhdfs)
WebHdfs client cannot communicate with older WebHdfs servers post HDFS-5306
HDFS-5400. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
DFS_CLIENT_MMAP_CACHE_THREAD_RUNS_PER_TIMEOUT constant is set to the wrong value
HDFS-5399. Major improvement reported by Jing Zhao and fixed by Jing Zhao
Revisit SafeModeException and corresponding retry policies
HDFS-5393. Minor sub-task reported by Haohui Mai and fixed by Haohui Mai
Serve bootstrap and jQuery locally
HDFS-5382. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Implement the UI of browsing filesystems in HTML 5 page
HDFS-5379. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Update links to datanode information in dfshealth.html
HDFS-5375. Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (tools)
hdfs.cmd does not expose several snapshot commands.
HDFS-5374. Trivial bug reported by Suresh Srinivas and fixed by Suresh Srinivas
Remove deadcode in DFSOutputStream
HDFS-5372. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Vinayakumar B (namenode)
In FSNamesystem, hasReadLock() returns false if the current thread holds the write lock
HDFS-5371. Minor improvement reported by Jing Zhao and fixed by Jing Zhao (ha , test)
Let client retry the same NN when "dfs.client.test.drop.namenode.response.number" is enabled
HDFS-5370. Trivial bug reported by Kousuke Saruta and fixed by Kousuke Saruta (hdfs-client)
Typo in Error Message: difference between range in condition and range in error message
HDFS-5365. Major bug reported by Radim Kolar and fixed by Radim Kolar (build , libhdfs)
Fix libhdfs compile error on FreeBSD9
HDFS-5364. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Add OpenFileCtx cache
HDFS-5363. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Refactor WebHdfsFileSystem: move SPNEGO-authenticated connection creation to URLConnectionFactory
HDFS-5360. Minor improvement reported by Shinichi Yamashita and fixed by Shinichi Yamashita (snapshots)
Improvement of usage message of renameSnapshot and deleteSnapshot
HDFS-5353. Blocker bug reported by Haohui Mai and fixed by Colin Patrick McCabe
Short circuit reads fail when dfs.encrypt.data.transfer is enabled
HDFS-5352. Minor bug reported by Ted Yu and fixed by Ted Yu
Server#initLog() doesn't close InputStream in httpfs
HDFS-5350. Minor improvement reported by Rob Weltman and fixed by Jimmy Xiang (namenode)
Name Node should report fsimage transfer time as a metric
HDFS-5347. Major sub-task reported by Brandon Li and fixed by Brandon Li (documentation)
add HDFS NFS user guide
HDFS-5346. Major bug reported by Kihwal Lee and fixed by Ravi Prakash (namenode , performance)
Avoid unnecessary call to getNumLiveDataNodes() for each block during IBR processing
HDFS-5344. Minor improvement reported by sathish and fixed by sathish (snapshots , tools)
Make LsSnapshottableDir a Tool interface implementation
HDFS-5343. Major bug reported by sathish and fixed by sathish (hdfs-client)
cat command issued on snapshot files returns unexpected result
HDFS-5342. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Provide more information in the FSNamesystem JMX interfaces
HDFS-5341. Major bug reported by qus-jiawei and fixed by qus-jiawei (datanode)
Reduce fsdataset lock duration during directory scanning.
HDFS-5338. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Add a conf to disable hostname check in DN registration
HDFS-5337. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
should do hsync for a commit request even if there are no pending writes
HDFS-5336. Minor bug reported by Akira AJISAKA and fixed by Akira AJISAKA (namenode)
DataNode should not output 'StartupProgress' metrics
HDFS-5335. Major bug reported by Arpit Gupta and fixed by Haohui Mai
DFSOutputStream#close() keeps throwing exceptions when it is called multiple times
HDFS-5334. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Implement dfshealth.jsp in HTML pages
HDFS-5331. Major improvement reported by Vinayakumar B and fixed by Vinayakumar B (snapshots)
Make SnapshotDiff.java an o.a.h.util.Tool interface implementation
HDFS-5330. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
fix readdir and readdirplus for large directories
HDFS-5329. Major bug reported by Brandon Li and fixed by Brandon Li (namenode , nfs)
Update FSNamesystem#getListing() to handle inode path in startAfter token
HDFS-5325. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Remove WebHdfsFileSystem#ConnRunner
HDFS-5323. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (namenode)
Remove some deadcode in BlockManager
HDFS-5322. Major bug reported by Arpit Gupta and fixed by Jing Zhao (ha)
HDFS delegation token not found in cache errors seen on secure HA clusters
HDFS-5317. Critical sub-task reported by Suresh Srinivas and fixed by Haohui Mai
Go back to DFS Home link does not work on datanode webUI
HDFS-5316. Critical sub-task reported by Suresh Srinivas and fixed by Haohui Mai
Namenode ignores the default https port
HDFS-5312. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Generate HTTP / HTTPS URL in DFSUtil#getInfoServer() based on the configured http policy
HDFS-5307. Major sub-task reported by Haohui Mai and fixed by Haohui Mai
Support both HTTP and HTTPS in jsp pages
HDFS-5305. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas
Add https support in HDFS
HDFS-5297. Major bug reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)
Fix dead links in HDFS site documents
HDFS-5291. Critical bug reported by Arpit Gupta and fixed by Jing Zhao (ha)
Clients need to retry when Active NN is in SafeMode
HDFS-5288. Major sub-task reported by Haohui Mai and fixed by Haohui Mai (nfs)
Close idle connections in portmap
HDFS-5283. Critical bug reported by Vinayakumar B and fixed by Vinayakumar B (snapshots)
NN not coming out of startup safemode because under-construction blocks that exist only inside snapshots are also counted in the safemode threshold
HDFS-5281. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
COMMIT request should not block
HDFS-5276. Major bug reported by Chengxiang Li and fixed by Colin Patrick McCabe
FileSystem.Statistics has performance issues with multi-threaded read/write.
HDFS-5267. Minor improvement reported by Junping Du and fixed by Junping Du
Remove volatile from LightWeightHashSet
HDFS-5260. Major new feature reported by Chris Nauroth and fixed by Chris Nauroth (hdfs-client , libhdfs)
Merge zero-copy memory-mapped HDFS client reads to trunk and branch-2.
HDFS-5257. Major bug reported by Vinayakumar B and fixed by Vinayakumar B (hdfs-client , namenode)
addBlock() retry should return LocatedBlock with locations else client will get AIOBE
HDFS-5252. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Stable write is not handled correctly in some places
HDFS-5240. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (namenode)
Separate formatting from logging in the audit logger API
HDFS-5239. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (namenode)
Allow FSNamesystem lock fairness to be configurable
HDFS-5220. Major improvement reported by Rob Weltman and fixed by Jimmy Xiang (namenode)
Expose group resolution time as metric
HDFS-5207. Major improvement reported by Junping Du and fixed by Junping Du (namenode)
In BlockPlacementPolicy, update 2 parameters of chooseTarget()
HDFS-5188. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Clean up BlockPlacementPolicy and its implementations
HDFS-5171. Major sub-task reported by Brandon Li and fixed by Haohui Mai (nfs)
NFS should create input stream for a file and try to share it with multiple read requests
HDFS-5170. Trivial bug reported by Andrew Wang and fixed by Andrew Wang
BlockPlacementPolicyDefault uses the wrong classname when alerting to enable debug logging
HDFS-5164. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (namenode)
deleteSnapshot should check if OperationCategory.WRITE is possible before taking write lock
HDFS-5144. Minor improvement reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)
Document time unit in NameNodeMetrics.java
HDFS-5136. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
MNT EXPORT should give the full group list which can mount the exports
HDFS-5130. Minor test reported by Binglin Chang and fixed by Binglin Chang (test)
Add test for snapshot related FsShell and DFSAdmin commands
HDFS-5122. Major bug reported by Arpit Gupta and fixed by Haohui Mai (ha , webhdfs)
Support failover and retry in WebHdfsFileSystem for NN HA
HDFS-5110. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Change FSDataOutputStream to HdfsDataOutputStream for opened streams to fix type cast error
HDFS-5107. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Fix array copy error in Readdir and Readdirplus responses
HDFS-5104. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Support dotdot name in NFS LOOKUP operation
HDFS-5093. Minor bug reported by Chuan Liu and fixed by Chuan Liu (test)
TestGlobPaths should re-use the MiniDFSCluster to avoid failure on Windows
HDFS-5078. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Support file append in NFSv3 gateway to enable data streaming to HDFS
HDFS-5075. Major bug reported by Timothy St. Clair and fixed by Timothy St. Clair
httpfs-config.sh calls out incorrect env script name
HDFS-5074. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (ha , namenode)
Allow starting up from an fsimage checkpoint in the middle of a segment
HDFS-5073. Minor bug reported by Kihwal Lee and fixed by Arpit Agarwal (test)
TestListCorruptFileBlocks fails intermittently
HDFS-5071. Major sub-task reported by Kihwal Lee and fixed by Brandon Li (nfs)
Change hdfs-nfs parent project to hadoop-project
HDFS-5069. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Include hadoop-nfs and hadoop-hdfs-nfs into hadoop dist for NFS deployment
HDFS-5068. Major improvement reported by Konstantin Shvachko and fixed by Konstantin Shvachko (benchmarks)
Convert NNThroughputBenchmark to a Tool to allow generic options.
HDFS-5065. Major bug reported by Ivan Mitic and fixed by Ivan Mitic (hdfs-client , test)
TestSymlinkHdfsDisable fails on Windows
HDFS-5043. Major bug reported by Brandon Li and fixed by Brandon Li
For HdfsFileStatus, set default value of childrenNum to -1 instead of 0 to avoid confusing applications
HDFS-5037. Critical improvement reported by Todd Lipcon and fixed by Andrew Wang (ha , namenode)
Active NN should trigger its own edit log rolls
HDFS-5035. Major bug reported by Andrew Wang and fixed by Andrew Wang (namenode)
getFileLinkStatus and rename do not correctly check permissions of symlinks
HDFS-5034. Trivial improvement reported by Andrew Wang and fixed by Andrew Wang (namenode)
Remove debug prints from getFileLinkInfo
HDFS-5023. Major bug reported by Ravi Prakash and fixed by Mit Desai (snapshots , test)
TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7
HDFS-5014. Major bug reported by Vinayakumar B and fixed by Vinayakumar B (datanode , ha)
BPOfferService#processCommandFromActor() synchronization on namenode RPC call delays IBR to Active NN if Standby NN is unstable
HDFS-5004. Major improvement reported by Trevor Lorimer and fixed by Trevor Lorimer (namenode)
Add additional JMX bean for NameNode status data
HDFS-4997. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)
libhdfs doesn't return correct error codes in most cases

libhdfs now returns correct codes in errno. Previously, due to a bug, many functions set errno to 255 instead of the more specific error code.
HDFS-4995. Major bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)
Make getContentSummary() less expensive
HDFS-4994. Minor bug reported by Kihwal Lee and fixed by Robert Parker (namenode)
Audit log getContentSummary() calls
HDFS-4983. Major improvement reported by Harsh J and fixed by Yongjun Zhang (webhdfs)
Numeric usernames do not work with WebHDFS FS

Add a new configuration property "dfs.webhdfs.user.provider.user.pattern" for specifying user name filters for WebHDFS.
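To illustrate how a user name filter of this kind behaves, the following sketch checks names against two regular expressions. Both patterns are assumptions for illustration: the first is assumed to resemble the stock filter that rejects names starting with a digit, and the second is a hypothetical relaxed value an operator might set through "dfs.webhdfs.user.provider.user.pattern". Neither is taken from the note above, and the class is not part of Hadoop.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of regex-based user name filtering; not the WebHDFS code.
final class WebHdfsUserPatternSketch {
    // Assumed stock-style pattern: first character must be a letter or underscore.
    static final String ASSUMED_DEFAULT = "^[A-Za-z_][A-Za-z0-9._-]*[$]?$";
    // Hypothetical relaxed pattern: also allows a leading digit.
    static final String RELAXED = "^[A-Za-z0-9_][A-Za-z0-9._-]*[$]?$";

    /** Returns true when the whole user name matches the given pattern. */
    static boolean isValidUser(String pattern, String user) {
        return Pattern.compile(pattern).matcher(user).matches();
    }

    public static void main(String[] args) {
        String numericUser = "12345";
        System.out.println("assumed default accepts: " + isValidUser(ASSUMED_DEFAULT, numericUser));
        System.out.println("relaxed accepts: " + isValidUser(RELAXED, numericUser));
    }
}
```

The point of making the pattern configurable is that sites with numeric account names can widen the filter without patching the code.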
HDFS-4962. Minor sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (nfs)
Use enum for nfs constants
HDFS-4949. Major new feature reported by Andrew Wang and fixed by Andrew Wang (datanode , namenode)
Centralized cache management in HDFS
HDFS-4948. Major bug reported by Robert Joseph Evans and fixed by Brandon Li
mvn site for hadoop-hdfs-nfs fails
HDFS-4947. Major sub-task reported by Brandon Li and fixed by Jing Zhao (nfs)
Add NFS server export table to control export by hostname or IP range
HDFS-4885. Major sub-task reported by Junping Du and fixed by Junping Du
Update verifyBlockPlacement() API in BlockPlacementPolicy
HDFS-4879. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (namenode)
Add "blocked ArrayList" collection to avoid CMS full GCs
HDFS-4860. Major improvement reported by Trevor Lorimer and fixed by Trevor Lorimer (namenode)
Add additional attributes to JMX beans
HDFS-4816. Major bug reported by Andrew Wang and fixed by Andrew Wang (namenode)
transitionToActive blocks if the SBN is doing checkpoint image transfer
HDFS-4772. Minor improvement reported by Brandon Li and fixed by Brandon Li (namenode)
Add number of children in HdfsFileStatus
HDFS-4763. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Add script changes/utility for starting NFS gateway
HDFS-4762. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Provide HDFS based NFSv3 and Mountd implementation
HDFS-4657. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)
Limit the number of blocks logged by the NN after a block report to a configurable value.
HDFS-4633. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (hdfs-client , test)
TestDFSClientExcludedNodes fails sporadically if excluded nodes cache expires too quickly
HDFS-4517. Major test reported by Vadim Bondarev and fixed by Ivan A. Veselovsky
Cover class RemoteBlockReader with unit tests
HDFS-4516. Critical bug reported by Uma Maheswara Rao G and fixed by Vinayakumar B (namenode)
Client crash after block allocation and NN switch before lease recovery for the same file can cause readers to fail forever
HDFS-4512. Major test reported by Vadim Bondarev and fixed by Vadim Bondarev
Cover package org.apache.hadoop.hdfs.server.common with tests
HDFS-4511. Major test reported by Vadim Bondarev and fixed by Andrey Klochkov
Cover package org.apache.hadoop.hdfs.tools with unit test
HDFS-4510. Major test reported by Vadim Bondarev and fixed by Andrey Klochkov
Cover classes ClusterJspHelper/NamenodeJspHelper with unit tests
HDFS-4491. Major test reported by Tsuyoshi OZAWA and fixed by Andrey Klochkov (test)
Parallel testing HDFS
HDFS-4376. Major bug reported by Aaron T. Myers and fixed by Junping Du (balancer)
Fix several race conditions in Balancer and resolve intermittent timeout of TestBalancerWithNodeGroup
HDFS-4329. Major bug reported by Andy Isaacson and fixed by Cristina L. Abad (hdfs-client)
DFSShell issues with directories with spaces in name
HDFS-4278. Major improvement reported by Harsh J and fixed by Kousuke Saruta (datanode , namenode)
Log an ERROR when DFS_BLOCK_ACCESS_TOKEN_ENABLE config is disabled but security is turned on.
HDFS-4201. Critical bug reported by Eli Collins and fixed by Jimmy Xiang (namenode)
NPE in BPServiceActor#sendHeartBeat
HDFS-4096. Major sub-task reported by Jing Zhao and fixed by Haohui Mai (datanode , namenode)
Add snapshot information to namenode WebUI
HDFS-3987. Major sub-task reported by Alejandro Abdelnur and fixed by Haohui Mai
Support webhdfs over HTTPS
HDFS-3981. Major bug reported by Xiaobo Peng and fixed by Xiaobo Peng (namenode)
access time is set without holding FSNamesystem write lock
HDFS-3934. Minor bug reported by Andy Isaacson and fixed by Colin Patrick McCabe
Duplicate dfs_hosts entries handled incorrectly
HDFS-2933. Major improvement reported by Philip Zeyliger and fixed by Vivek Ganesan (datanode)
Improve DataNode Web UI Index Page
HADOOP-10317. Major bug reported by Andrew Wang and fixed by Andrew Wang
Rename branch-2.3 release version from 2.4.0-SNAPSHOT to 2.3.0-SNAPSHOT
HADOOP-10313. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
Script and jenkins job to produce Hadoop release artifacts
HADOOP-10311. Blocker bug reported by Suresh Srinivas and fixed by Alejandro Abdelnur
Cleanup vendor names from the code base
HADOOP-10310. Blocker bug reported by Aaron T. Myers and fixed by Aaron T. Myers (security)
SaslRpcServer should be initialized even when no secret manager present
HADOOP-10305. Major bug reported by Akira AJISAKA and fixed by Akira AJISAKA (metrics)
Add "rpc.metrics.quantile.enable" and "rpc.metrics.percentiles.intervals" to core-default.xml
HADOOP-10292. Major bug reported by Haohui Mai and fixed by Haohui Mai
Restore HttpServer from branch-2.2 in branch-2
HADOOP-10291. Major bug reported by Mit Desai and fixed by Mit Desai
TestSecurityUtil#testSocketAddrWithIP fails
HADOOP-10288. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (util)
Explicit reference to Log4JLogger breaks non-log4j users
HADOOP-10274. Minor improvement reported by takeshi.miao and fixed by takeshi.miao (security)
Lower the logging level from ERROR to WARN for UGI.doAs method
HADOOP-10273. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (build)
Fix 'mvn site'
HADOOP-10255. Blocker bug reported by Haohui Mai and fixed by Haohui Mai
Rename HttpServer to HttpServer2 to retain older HttpServer in branch-2 for compatibility
HADOOP-10252. Major bug reported by Jimmy Xiang and fixed by Jimmy Xiang
HttpServer can't start if hostname is not specified
HADOOP-10250. Major bug reported by Yongjun Zhang and fixed by Yongjun Zhang
VersionUtil returns wrong value when comparing two versions
HADOOP-10248. Major improvement reported by Ted Yu and fixed by Akira AJISAKA
Property name should be included in the exception where property value is null
HADOOP-10240. Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (documentation)
Windows build instructions incorrectly state requirement of protoc 2.4.1 instead of 2.5.0
HADOOP-10236. Trivial bug reported by Akira AJISAKA and fixed by Akira AJISAKA
Fix typo in o.a.h.ipc.Client#checkResponse
HADOOP-10235. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
Hadoop tarball has 2 versions of stax-api JARs
HADOOP-10234. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (scripts)
"hadoop.cmd jar" does not propagate exit code.
HADOOP-10228. Minor improvement reported by Haohui Mai and fixed by Haohui Mai (fs)
FsPermission#fromShort() should cache FsAction.values()
HADOOP-10223. Minor bug reported by Ted Yu and fixed by Ted Yu
MiniKdc#main() should close the FileReader it creates
HADOOP-10214. Major bug reported by Liang Xie and fixed by Liang Xie (ha)
Fix multithreaded correctness warnings in ActiveStandbyElector
HADOOP-10212. Major bug reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)
Incorrect compile command in Native Library document
HADOOP-10208. Trivial improvement reported by Benoy Antony and fixed by Benoy Antony
Remove duplicate initialization in StringUtils.getStringCollection
HADOOP-10207. Minor test reported by Jimmy Xiang and fixed by Jimmy Xiang
TestUserGroupInformation#testLogin is flaky
HADOOP-10203. Major bug reported by Andrei Savu and fixed by Andrei Savu (fs/s3)
Connection leak in Jets3tNativeFileSystemStore#retrieveMetadata
HADOOP-10198. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (native)
DomainSocket: add support for socketpair
HADOOP-10193. Minor bug reported by Gregory Chanan and fixed by Gregory Chanan (security)
hadoop-auth's PseudoAuthenticationHandler can consume getInputStream
HADOOP-10178. Major bug reported by shanyu zhao and fixed by shanyu zhao (conf)
Configuration deprecation always emits "deprecated" warnings when a new key is used
HADOOP-10175. Major bug reported by Chuan Liu and fixed by Chuan Liu (fs)
Har file system authority should preserve userinfo
HADOOP-10173. Critical improvement reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
Remove UGI from DIGEST-MD5 SASL server creation
HADOOP-10172. Critical improvement reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
Cache SASL server factories
HADOOP-10171. Major bug reported by Mit Desai and fixed by Mit Desai
TestRPC fails intermittently on JDK7
HADOOP-10169. Minor improvement reported by Liang Xie and fixed by Liang Xie (metrics)
remove the unnecessary synchronized in JvmMetrics class
HADOOP-10168. Major bug reported by Thejas M Nair and fixed by Thejas M Nair
fix javadoc of ReflectionUtils.copy
HADOOP-10167. Major improvement reported by Mikhail Antonov and fixed by (build)
Mark hadoop-common source as UTF-8 in Maven pom files / refactoring
HADOOP-10164. Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans
Allow UGI to login with a known Subject
HADOOP-10162. Major bug reported by Mit Desai and fixed by Mit Desai
Fix symlink-related test failures in TestFileContextResolveAfs and TestStat in branch-2
HADOOP-10147. Minor bug reported by Eric Sirianni and fixed by Steve Loughran (build)
Upgrade to commons-logging 1.1.3 to avoid potential deadlock in MiniDFSCluster
HADOOP-10146. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (util)
Workaround JDK7 Process fd close bug
HADOOP-10143. Major improvement reported by Liang Xie and fixed by Liang Xie (io)
replace WritableFactories's hashmap with ConcurrentHashMap
HADOOP-10142. Major bug reported by Vinayakumar B and fixed by Vinayakumar B
Avoid groups lookup for unprivileged users such as "dr.who"
HADOOP-10135. Major bug reported by David Dobbins and fixed by David Dobbins (fs)
writes to swift fs over partition size leave temp files and empty output file
HADOOP-10132. Minor improvement reported by Ted Yu and fixed by Ted Yu
RPC#stopProxy() should log the class of proxy when IllegalArgumentException is encountered
HADOOP-10130. Minor bug reported by Binglin Chang and fixed by Binglin Chang
RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics
HADOOP-10129. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (tools/distcp)
Distcp may succeed when it fails
HADOOP-10127. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (ipc)
Add ipc.client.connect.retry.interval to control the frequency of connection retries
HADOOP-10126. Minor bug reported by Vinayakumar B and fixed by Vinayakumar B (util)
LightWeightGSet log message is confusing : "2.0% max memory = 2.0 GB"
HADOOP-10125. Major bug reported by Ming Ma and fixed by Ming Ma (ipc)
no need to process RPC request if the client connection has been dropped
HADOOP-10112. Major bug reported by Brandon Li and fixed by Brandon Li (tools)
har file listing doesn't work with wild card
HADOOP-10111. Major improvement reported by Kihwal Lee and fixed by Kihwal Lee
Allow DU to be initialized with an initial value
HADOOP-10110. Blocker bug reported by Chuan Liu and fixed by Chuan Liu (build)
hadoop-auth has a build break due to missing dependency
HADOOP-10109. Major sub-task reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (test)
Fix test failure in TestOfflineEditsViewer introduced by HADOOP-10052
HADOOP-10107. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Kihwal Lee (ipc)
Server.getNumOpenConnections may throw NPE
HADOOP-10106. Minor bug reported by Ming Ma and fixed by Ming Ma
Incorrect thread name in RPC log messages
HADOOP-10103. Minor sub-task reported by Steve Loughran and fixed by Akira AJISAKA (build)
update commons-lang to 2.6
HADOOP-10102. Minor sub-task reported by Steve Loughran and fixed by Akira AJISAKA (build)
update commons IO from 2.1 to 2.4
HADOOP-10100. Major bug reported by Robert Kanter and fixed by Robert Kanter
MiniKDC shouldn't use apacheds-all artifact
HADOOP-10095. Minor improvement reported by Nicolas Liochon and fixed by Nicolas Liochon (io)
Performance improvement in CodecPool
HADOOP-10094. Trivial bug reported by Enis Soztutar and fixed by Enis Soztutar (util)
NPE in GenericOptionsParser#preProcessForWindows()
HADOOP-10093. Major bug reported by shanyu zhao and fixed by shanyu zhao (conf)
hadoop-env.cmd sets HADOOP_CLIENT_OPTS with a max heap size that is too small.
HADOOP-10090. Major bug reported by Ivan Mitic and fixed by Ivan Mitic (metrics)
Jobtracker metrics not updated properly after execution of a mapreduce job
HADOOP-10088. Major bug reported by Raja Aluri and fixed by Raja Aluri (build)
copy-nativedistlibs.sh needs to quote snappy lib dir
HADOOP-10087. Major bug reported by Yu Gao and fixed by Colin Patrick McCabe (security)
UserGroupInformation.getGroupNames() fails to return primary group first when JniBasedUnixGroupsMappingWithFallback is used
HADOOP-10086. Minor improvement reported by Masatake Iwasaki and fixed by Masatake Iwasaki (documentation)
User document for authentication in secure cluster
HADOOP-10081. Critical bug reported by Jason Lowe and fixed by Tsuyoshi OZAWA (ipc)
Client.setupIOStreams can leak socket resources on exception or error
HADOOP-10079. Major improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
log a warning message if group resolution takes too long.
HADOOP-10078. Minor bug reported by Robert Kanter and fixed by Robert Kanter (security)
KerberosAuthenticator always does SPNEGO
HADOOP-10072. Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (nfs , test)
TestNfsExports#testMultiMatchers fails due to non-deterministic timing around cache expiry check.
HADOOP-10067. Minor improvement reported by Robert Rati and fixed by Robert Rati
Missing POM dependency on jsr305
HADOOP-10064. Major improvement reported by Arpit Agarwal and fixed by Arpit Agarwal (build)
Upgrade to maven antrun plugin version 1.7
HADOOP-10058. Minor bug reported by Akira AJISAKA and fixed by Chen He (metrics)
TestMetricsSystemImpl#testInitFirstVerifyStopInvokedImmediately fails on trunk
HADOOP-10055. Trivial bug reported by Eli Collins and fixed by Akira AJISAKA (documentation)
FileSystemShell.apt.vm doc has typo "numRepicas"
HADOOP-10052. Major sub-task reported by Andrew Wang and fixed by Andrew Wang (fs)
Temporarily disable client-side symlink resolution
HADOOP-10047. Major new feature reported by Gopal V and fixed by Gopal V (io)
Add a directbuffer Decompressor API to hadoop

Direct Bytebuffer decompressors for Zlib (Deflate & Gzip) and Snappy
HADOOP-10046. Trivial improvement reported by David S. Wang and fixed by David S. Wang
Print a log message when SSL is enabled
HADOOP-10040. Major bug reported by Yingda Chen and fixed by Chris Nauroth
hadoop.cmd is in UNIX format and would not run by default on Windows
HADOOP-10039. Major bug reported by Suresh Srinivas and fixed by Haohui Mai (security)
Add Hive to the list of projects using AbstractDelegationTokenSecretManager
HADOOP-10031. Major bug reported by Chuan Liu and fixed by Chuan Liu (fs)
FsShell -get/copyToLocal/moveFromLocal should support Windows local path
HADOOP-10030. Major bug reported by Chuan Liu and fixed by Chuan Liu
FsShell -put/copyFromLocal should support Windows local path
HADOOP-10029. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (fs)
Specifying har file to MR job fails in secure cluster
HADOOP-10028. Minor bug reported by Jing Zhao and fixed by Haohui Mai
Malformed ssl-server.xml.example
HADOOP-10006. Blocker bug reported by Junping Du and fixed by Junping Du (fs , util)
Compilation failure in trunk for o.a.h.fs.swift.util.JSONUtil
HADOOP-10005. Trivial improvement reported by Jackie Chang and fixed by Jackie Chang
No need to check INFO severity level is enabled or not
HADOOP-9998. Major improvement reported by Junping Du and fixed by Junping Du (net)
Provide methods to clear only part of the DNSToSwitchMapping
HADOOP-9982. Major bug reported by Akira AJISAKA and fixed by Akira AJISAKA (documentation)
Fix dead links in hadoop site docs
HADOOP-9981. Critical bug reported by Kihwal Lee and fixed by Colin Patrick McCabe
globStatus should minimize its listStatus and getFileStatus calls
HADOOP-9964. Major bug reported by Junping Du and fixed by Junping Du (util)
o.a.h.util.ReflectionUtils.printThreadInfo() is not thread-safe, which causes TestHttpServer to hang for 10 minutes or longer.
HADOOP-9956. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
RPC listener inefficiently assigns connections to readers
HADOOP-9955. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
RPC idle connection closing is extremely inefficient
HADOOP-9929. Major bug reported by Jason Lowe and fixed by Colin Patrick McCabe (fs)
Insufficient permissions for a path reported as file not found
HADOOP-9915. Trivial improvement reported by Binglin Chang and fixed by Binglin Chang
o.a.h.fs.Stat support on Macosx
HADOOP-9909. Major improvement reported by Shinichi Yamashita and fixed by (fs)
org.apache.hadoop.fs.Stat should permit other LANG
HADOOP-9908. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (util)
Fix NPE when versioninfo properties file is missing
HADOOP-9898. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (ipc , net)
Set SO_KEEPALIVE on all our sockets
HADOOP-9897. Trivial improvement reported by Binglin Chang and fixed by Binglin Chang (fs)
Add method to get path start position without drive specifier in o.a.h.fs.Path
HADOOP-9889. Major bug reported by Wei Yan and fixed by Wei Yan
Refresh the Krb5 configuration when creating a new kdc in Hadoop-MiniKDC
HADOOP-9887. Major bug reported by Chris Nauroth and fixed by Chuan Liu (fs)
globStatus does not correctly handle paths starting with a drive spec on Windows
HADOOP-9875. Minor bug reported by Aaron T. Myers and fixed by Aaron T. Myers (test)
TestDoAsEffectiveUser can fail on JDK 7
HADOOP-9871. Minor bug reported by Luke Lu and fixed by Junping Du
Fix intermittent findbug warnings in DefaultMetricsSystem
HADOOP-9866. Major test reported by Alejandro Abdelnur and fixed by Wei Yan (test)
convert hadoop-auth testcases requiring kerberos to use minikdc
HADOOP-9865. Major bug reported by Chuan Liu and fixed by Chuan Liu
FileContext.globStatus() has a regression with respect to relative path
HADOOP-9860. Major improvement reported by Wei Yan and fixed by Wei Yan
Remove classes HackedKeytab and HackedKeytabEncoder from hadoop-minikdc once DIRSERVER-1882 is resolved
HADOOP-9848. Major new feature reported by Wei Yan and fixed by Wei Yan (security , test)
Create a MiniKDC for use with security testing
HADOOP-9847. Minor bug reported by Andrew Wang and fixed by Colin Patrick McCabe
TestGlobPath symlink tests fail to cleanup properly
HADOOP-9833. Minor improvement reported by Steve Loughran and fixed by Kousuke Saruta (build)
move slf4j to version 1.7.5
HADOOP-9830. Trivial bug reported by Dmitry Lysnichenko and fixed by Kousuke Saruta (documentation)
Typo at http://hadoop.apache.org/docs/current/
HADOOP-9820. Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)
RPCv9 wire protocol is insufficient to support multiplexing
HADOOP-9817. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
FileSystem#globStatus and FileContext#globStatus need to work with symlinks
HADOOP-9806. Major bug reported by Brandon Li and fixed by Brandon Li (nfs)
PortmapInterface should check if the procedure is out-of-range
HADOOP-9791. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
Add a test case covering long paths for new FileUtil access check methods
HADOOP-9787. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (util)
ShutdownHelper util to shutdown threads and threadpools
HADOOP-9784. Major improvement reported by Junping Du and fixed by Junping Du
Add a builder for HttpServer
HADOOP-9748. Critical sub-task reported by Daryn Sharp and fixed by Daryn Sharp (security)
Reduce blocking on UGI.ensureInitialized
HADOOP-9703. Minor bug reported by Mark Miller and fixed by Tsuyoshi OZAWA
org.apache.hadoop.ipc.Client leaks threads on stop.
HADOOP-9698. Blocker sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
RPCv9 client must honor server's SASL negotiate response

The RPC client now waits for the Server's SASL negotiate response before instantiating its SASL client.
HADOOP-9693. Trivial improvement reported by Steve Loughran and fixed by
Shell should add a probe for OSX
HADOOP-9686. Major improvement reported by Jason Lowe and fixed by Jason Lowe (conf)
Easy access to final parameters in Configuration
HADOOP-9683. Blocker sub-task reported by Luke Lu and fixed by Daryn Sharp (ipc)
Wrap IpcConnectionContext in RPC headers

Connection context is now sent as a rpc header wrapped protobuf.
HADOOP-9660. Major bug reported by Enis Soztutar and fixed by Enis Soztutar (scripts , util)
[WINDOWS] Powershell / cmd parses -Dkey=value from command line as [-Dkey, value] which breaks GenericOptionsParser
HADOOP-9652. Major improvement reported by Colin Patrick McCabe and fixed by Andrew Wang
Allow RawLocalFs#getFileLinkStatus to fill in the link owner and mode if requested
HADOOP-9635. Major bug reported by V. Karthik Kumar and fixed by (native)
Fix Potential Stack Overflow in DomainSocket.c
HADOOP-9623. Major improvement reported by Timothy St. Clair and fixed by Amandeep Khurana (fs/s3)
Update jets3t dependency to 0.9.0
HADOOP-9618. Major new feature reported by Todd Lipcon and fixed by Todd Lipcon (util)
Add thread which detects JVM pauses
HADOOP-9611. Major improvement reported by Timothy St. Clair and fixed by Timothy St. Clair (build)
mvn-rpmbuild against google-guice > 3.0 yields missing cglib dependency
HADOOP-9598. Major test reported by Aleksey Gorshkov and fixed by Andrey Klochkov
Improve code coverage of RMAdminCLI
HADOOP-9594. Major improvement reported by Timothy St. Clair and fixed by Timothy St. Clair (build)
Update apache commons math dependency
HADOOP-9582. Major bug reported by Ashwin Shankar and fixed by Ashwin Shankar (conf)
Non-existent file to "hadoop fs -conf" doesn't throw error
HADOOP-9527. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (fs , test)
Add symlink support to LocalFileSystem on Windows
HADOOP-9515. Major new feature reported by Brandon Li and fixed by Brandon Li
Add general interface for NFS and Mount
HADOOP-9509. Major new feature reported by Brandon Li and fixed by Brandon Li
Implement ONCRPC and XDR
HADOOP-9494. Major improvement reported by Dennis Y and fixed by Andrey Klochkov
Excluded auto-generated and examples code from clover reports
HADOOP-9487. Major improvement reported by Steve Loughran and fixed by (conf)
Deprecation warnings in Configuration should go to their own log or otherwise be suppressible
HADOOP-9470. Major improvement reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky (test)
eliminate duplicate FQN tests in different Hadoop modules
HADOOP-9432. Minor new feature reported by Steve Loughran and fixed by (build , documentation)
Add support for markdown .md files in site documentation
HADOOP-9421. Blocker sub-task reported by Sanjay Radia and fixed by Daryn Sharp
Convert SASL to use ProtoBuf and provide negotiation capabilities

Raw SASL protocol now uses protobufs wrapped with RPC headers. The negotiation sequence incorporates the state of the exchange. The server now has the ability to advertise its supported auth types.
HADOOP-9420. Major bug reported by Todd Lipcon and fixed by Liang Xie (ipc , metrics)
Add percentile or max metric for rpcQueueTime, processing time
HADOOP-9417. Major sub-task reported by Andrew Wang and fixed by Andrew Wang (fs)
Support for symlink resolution in LocalFileSystem / RawLocalFileSystem
HADOOP-9350. Minor bug reported by Steve Loughran and fixed by Robert Kanter (build)
Hadoop not building against Java7 on OSX
HADOOP-9319. Major improvement reported by Arpit Agarwal and fixed by Binglin Chang
Update bundled lz4 source to latest version
HADOOP-9291. Major test reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
enhance unit-test coverage of package o.a.h.metrics2
HADOOP-9254. Major test reported by Vadim Bondarev and fixed by Vadim Bondarev
Cover packages org.apache.hadoop.util.bloom, org.apache.hadoop.util.hash
HADOOP-9241. Trivial improvement reported by Harsh J and fixed by Harsh J
DU refresh interval is not configurable

The 'du' (disk usage command from Unix) script refresh monitor is now configurable in the same way as its 'df' counterpart, via the property 'fs.du.interval', the default of which is 10 minutes (in ms).
HADOOP-9225. Major test reported by Vadim Bondarev and fixed by Andrey Klochkov
Cover package org.apache.hadoop.compress.Snappy
HADOOP-9199. Major test reported by Vadim Bondarev and fixed by Andrey Klochkov
Cover package org.apache.hadoop.io with unit tests
HADOOP-9114. Minor bug reported by liuyang and fixed by sathish
After defining dfs.checksum.type as NULL, writing a file and calling hflush will throw java.lang.ArrayIndexOutOfBoundsException
HADOOP-9078. Major test reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
enhance unit-test coverage of class org.apache.hadoop.fs.FileContext
HADOOP-9063. Minor test reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
enhance unit-test coverage of class org.apache.hadoop.fs.FileUtil
HADOOP-9016. Minor bug reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
org.apache.hadoop.fs.HarFileSystem.HarFSDataInputStream.HarFsInputStream.skip(long) must never return negative value.
HADOOP-8814. Minor improvement reported by Brandon Li and fixed by Brandon Li (conf , fs , fs/s3 , ha , io , metrics , performance , record , security , util)
Inefficient comparison with the empty string. Use isEmpty() instead
HADOOP-8753. Minor bug reported by Nishan Shetty, Huawei and fixed by Benoy Antony
LocalDirAllocator throws "ArithmeticException: / by zero" when there is no available space on configured local dir
HADOOP-8704. Major improvement reported by Thomas Graves and fixed by Jonathan Eagles
add request logging to jetty/httpserver
HADOOP-8545. Major new feature reported by Tim Miller and fixed by Dmitry Mezhensky (fs)
Filesystem Implementation for OpenStack Swift

Added a file system implementation for OpenStack Swift. There are two implementations: block and native (similar to the Amazon S3 integration). The data locality issue is solved by a patch in Swift; the commit procedure to OpenStack is in progress.
To use the implementation, add the following to core-site.xml:
...
<property>
  <name>fs.swift.impl</name>
  <value>com.mirantis.fs.SwiftFileSystem</value>
</property>
<property>
  <name>fs.swift.block.impl</name>
  <value>com.mirantis.fs.block.SwiftBlockFileSystem</value>
</property>
...
In the MapReduce job, specify the following configs for OpenStack Keystone authentication:
conf.set("swift.auth.url", "http://172.18.66.117:5000/v2.0/tokens");
conf.set("swift.tenant", "superuser");
conf.set("swift.username", "admin1");
conf.set("swift.password", "password");
conf.setInt("swift.http.port", 8080);
conf.setInt("swift.https.port", 443);
Additional information is available on GitHub: https://github.com/DmitryMezhensky/Hadoop-and-Swift-integration
HADOOP-7344. Major bug reported by Daryn Sharp and fixed by Colin Patrick McCabe (fs)
globStatus doesn't grok groupings with a slash

Hadoop 2.2.0 Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements.

Changes since Hadoop 2.1.1-beta

YARN-1278. Blocker bug reported by Yesha Vora and fixed by Hitesh Shah
New AM does not start after rm restart

The new AM fails to start after the RM restarts. The new Application Master does not start and the job fails with the error below.
/usr/bin/mapred job -status job_1380985373054_0001
13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at hostname
Job: job_1380985373054_0001
Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
Job Tracking URL : http://hostname:8088/cluster/app/application_1380985373054_0001
Uber job : false
Number of maps: 0
Number of reduces: 0
map() completion: 0.0
reduce() completion: 0.0
Job state: FAILED
retired: false
reason for failure: There are no failed tasks for the job. Job is failed due to some other reason and reason can be found in the logs.
Counters: 0
YARN-1277. Major sub-task reported by Suresh Srinivas and fixed by Omkar Vinit Joshi
Add http policy support for YARN daemons

This is the YARN part of HADOOP-10022.
YARN-1274. Blocker bug reported by Alejandro Abdelnur and fixed by Siddharth Seth (nodemanager)
LCE fails to run containers that don't have resources to localize

LCE container launch assumes the usercache/USER directory exists and is owned by the user running the container process. But the directory is created by the LCE localization command only if there are resources to localize. If there are no resources to localize, LCE localization never executes and the launch fails with exit code 255, and the NM logs have something like:
{code}
2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command provided 1
2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is llama
2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create directory llama in /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_000004 - Permission denied
{code}
YARN-1273. Major bug reported by Hitesh Shah and fixed by Hitesh Shah
Distributed shell does not account for start container failures reported asynchronously.

2013-10-04 22:09:15,234 ERROR [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #1] distributedshell.ApplicationMaster (ApplicationMaster.java:onStartContainerError(719)) - Failed to start Container container_1380920347574_0018_01_000006
YARN-1271. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)
"Text file busy" errors launching containers again

The error is shown below in the comments. MAPREDUCE-2374 fixed this by removing "-c" when running the container launch script. It looks like the "-c" got brought back during the windows branch merge, so we should remove it again.
YARN-1262. Major bug reported by Sandy Ryza and fixed by Karthik Kambatla
TestApplicationCleanup relies on all containers assigned in a single heartbeat

TestApplicationCleanup submits container requests and waits for allocations to come in. It only sends a single node heartbeat to the node, expecting multiple containers to be assigned on this heartbeat, which not all schedulers do by default. This is causing the test to fail when run with the Fair Scheduler.
YARN-1260. Major sub-task reported by Yesha Vora and fixed by Omkar Vinit Joshi
RM_HOME link breaks when webapp.https.address related properties are not specified

This issue happens in a multi-node cluster where the resource manager and node manager are running on different machines. Steps to reproduce:
1) Set yarn.resourcemanager.hostname = <resourcemanager host> in yarn-site.xml.
2) Set hadoop.ssl.enabled = true in core-site.xml.
3) Do not specify the properties yarn.nodemanager.webapp.https.address and yarn.resourcemanager.webapp.https.address in yarn-site.xml, so that the default values of these two properties are used.
4) Go to the nodemanager web UI at "https://<nodemanager host>:8044/node".
5) Click on the RM_HOME link.
This link redirects to "https://<nodemanager host>:8090/cluster" instead of "https://<resourcemanager host>:8090/cluster".
YARN-1256. Critical sub-task reported by Bikas Saha and fixed by Xuan Gong
NM silently ignores non-existent service in StartContainerRequest

A container can set token service metadata for a service, say shuffle_service. If that service does not exist, the error is silently ignored. Later, when the next container wants to access data written to shuffle_service by the first task, it fails because the service does not have the token that was supposed to be set by the first task.
YARN-1254. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi
NM is polluting container's credentials

Before launching the container, the NM is using the same credentials object and so is polluting what the container should see. We should fix this.
YARN-1251. Major bug reported by Junping Du and fixed by Xuan Gong (applications/distributed-shell)
TestDistributedShell#TestDSShell failed with timeout

TestDistributedShell#TestDSShell has been failing consistently on trunk Jenkins recently. The stack trace is:
{code}
java.lang.Exception: test timed out after 90000 milliseconds
at com.google.protobuf.LiteralByteString.<init>(LiteralByteString.java:234)
at com.google.protobuf.ByteString.copyFromUtf8(ByteString.java:255)
at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getMethodNameBytes(ProtobufRpcEngineProtos.java:286)
at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getSerializedSize(ProtobufRpcEngineProtos.java:462)
at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:84)
at org.apache.hadoop.ipc.ProtobufRpcEngine$RpcMessageWithHeader.write(ProtobufRpcEngine.java:302)
at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:989)
at org.apache.hadoop.ipc.Client.call(Client.java:1377)
at org.apache.hadoop.ipc.Client.call(Client.java:1357)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at $Proxy70.getApplicationReport(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
at $Proxy71.getApplicationReport(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:195)
at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:622)
at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:597)
at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:125)
{code}
For details, please refer to: https://builds.apache.org/job/PreCommit-YARN-Build/2039//testReport/
YARN-1247. Major bug reported by Roman Shaposhnik and fixed by Roman Shaposhnik (nodemanager)
test-container-executor has gotten out of sync with the changes to container-executor

If run under the super-user account test-container-executor.c fails in multiple different places. It would be nice to fix it so that we have better testing of LCE functionality.
YARN-1246. Minor improvement reported by Arpit Gupta and fixed by Arpit Gupta
Log application status in the rm log when app is done running

Since there is no YARN history server, it is difficult to determine the status of an old application. One has to be familiar with the state transitions in YARN to know what counts as a success. We should add a log at info level that captures the finalStatus of an app. This would be helpful while debugging applications if the RM has restarted and we can no longer use the UI.
YARN-1236. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager)
FairScheduler setting queue name in RMApp is not working

The fair scheduler sometimes picks a different queue than the one an application was submitted to, such as when user-as-default-queue is turned on. It needs to update the queue name in the RMApp so that this choice will be reflected in the UI. This isn't working because the scheduler is looking up the RMApp by application attempt id instead of app id and failing to find it.
YARN-1229. Blocker bug reported by Tassapol Athiapinya and fixed by Xuan Gong (nodemanager)
Define constraints on Auxiliary Service names. Change ShuffleHandler service name from mapreduce.shuffle to mapreduce_shuffle.

I ran a sleep job. If the AM fails to start, this exception can occur:
13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_000001 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_000001/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= ': not a valid identifier
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
.Failing this attempt.. Failing the application.
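The YARN-1229 failure comes from the shell rejecting an exported variable name that contains a dot. A minimal Java sketch of that kind of name check follows, assuming the POSIX shell rule that a name consists of letters, digits and underscores and does not start with a digit. The class and method names are hypothetical, and this is not the NodeManager's actual validation code.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the Hadoop code base.
class AuxServiceNameCheck {
    // Assumed rule: POSIX shell variable names are [A-Za-z_][A-Za-z0-9_]*.
    private static final Pattern SHELL_NAME =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Returns true if the whole string is usable as a shell variable name.
    static boolean isValidShellName(String name) {
        return SHELL_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        // The dotted name from the log is rejected; the underscore form passes.
        System.out.println(isValidShellName("NM_AUX_SERVICE_mapreduce.shuffle"));
        System.out.println(isValidShellName("NM_AUX_SERVICE_mapreduce_shuffle"));
    }
}
```

Under this rule the rename from mapreduce.shuffle to mapreduce_shuffle yields a name the shell can export.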
YARN-1228. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Clean up Fair Scheduler configuration loading

Currently the Fair Scheduler is configured in two ways:
* An allocations file that has a different format than the standard Hadoop configuration file, which makes it easier to specify hierarchical objects like queues and their properties.
* Properties like yarn.scheduler.fair.max.assign that are specified in the standard Hadoop configuration format.
The standard and default way of configuring it is to use fair-scheduler.xml as the allocations file and to put the yarn.scheduler properties in yarn-site.xml. It is also possible to specify a different file as the allocations file, and to place the yarn.scheduler properties in fair-scheduler.xml, which will be interpreted as in the standard Hadoop configuration format. This flexibility is both confusing and unnecessary.
Additionally, the allocation file is loaded as fair-scheduler.xml from the classpath if it is not specified, but is loaded as a File if it is. This causes two problems:
1. We see different behavior when not setting yarn.scheduler.fair.allocation.file and when setting it to fair-scheduler.xml, which is its default.
2. Classloaders may choose to cache resources, which can break the reload logic when yarn.scheduler.fair.allocation.file is not specified.
We should never allow the yarn.scheduler properties to go into fair-scheduler.xml. And we should always load the allocations file as a file, not as a resource on the classpath. To preserve existing behavior and allow loading files from the classpath, we can look for files on the classpath, but strip off their scheme and interpret them as Files.
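The last step proposed in YARN-1228, stripping the scheme from a classpath resource location and treating the remainder as a plain file, could look roughly like the sketch below. It uses only java.net.URI and java.io.File. The class name, method name and example path are hypothetical, and the real Fair Scheduler code may differ.

```java
import java.io.File;
import java.net.URI;

// Hypothetical illustration only; not the Fair Scheduler implementation.
class AllocationFileLocator {
    // Takes the string form of a resource URL (for example, what
    // ClassLoader.getResource(...) would report) and returns a File for it.
    // Only "file:" locations can be mapped to a local File.
    static File toLocalFile(String resourceUrl) {
        URI uri = URI.create(resourceUrl);
        if (!"file".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException(
                "Not a local file resource: " + resourceUrl);
        }
        // Drop the scheme and keep only the path component.
        return new File(uri.getPath());
    }

    public static void main(String[] args) {
        File f = toLocalFile("file:/etc/hadoop/conf/fair-scheduler.xml");
        System.out.println(f.getName());
    }
}
```

Reading through a File rather than a cached classloader resource means a later reload can pick up changes on disk, which is the concern raised in problem 2.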
YARN-1221. Major bug reported by Sandy Ryza and fixed by Siqi Li (resourcemanager , scheduler)
With Fair Scheduler, reserved MB reported in RM web UI increases indefinitely
YARN-1219. Major bug reported by shanyu zhao and fixed by shanyu zhao (nodemanager)
FSDownload changes file suffix making FileUtil.unTar() throw exception

While running a Hive join operation on Yarn, I saw the exception described below. This is caused by FSDownload copying the files into a temp file and changing the suffix to ".tmp" before unpacking it. In unpack(), it uses FileUtil.unTar(), which determines whether the file is "gzipped" by looking at the file suffix: {code} boolean gzipped = inFile.toString().endsWith("gz"); {code} To fix this problem, we can remove the ".tmp" in the temp file name. Here is the detailed exception: org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:240) at org.apache.hadoop.fs.FileUtil.unTarUsingJava(FileUtil.java:676) at org.apache.hadoop.fs.FileUtil.unTar(FileUtil.java:625) at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:203) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:287) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722)
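The suffix problem described above can be reproduced in isolation. The following is a minimal sketch, not Hadoop code: the class and the looksGzipped helper are hypothetical, and only the endsWith("gz") test is taken from the description.

```java
// Sketch only: looksGzipped is a hypothetical helper that mirrors the
// suffix test quoted in the description above.
class GzipSuffixCheck {
    static boolean looksGzipped(String fileName) {
        return fileName.endsWith("gz");
    }

    public static void main(String[] args) {
        // Original archive name: the suffix test succeeds.
        System.out.println(looksGzipped("lib.tar.gz"));      // true
        // Same archive after ".tmp" is appended: the suffix test fails,
        // so the file would be treated as a plain tar archive.
        System.out.println(looksGzipped("lib.tar.gz.tmp"));  // false
    }
}
```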
YARN-1215. Major bug reported by Chuan Liu and fixed by Chuan Liu (api)
Yarn URL should include userinfo

In the {{org.apache.hadoop.yarn.api.records.URL}} class, we don't have a userinfo as part of the URL. When converting a {{java.net.URI}} object into the YARN URL object in the {{ConverterUtils.getYarnUrlFromURI()}} method, we set the URI host as the URL host. If the URI has a userinfo part, the userinfo is discarded. This leads to information loss if the original URI has the userinfo, e.g. foo://username:password@example.com will be converted to foo://example.com and the username/password information is lost during the conversion.
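The information loss can be illustrated with {{java.net.URI}} alone. This is a minimal sketch that does not use the YARN URL record; the class and the rebuild helper are hypothetical and only show what happens when a URI is reassembled with or without its userinfo component.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch only: rebuild() is a hypothetical helper, not the YARN conversion code.
class UserInfoSketch {
    // Reassembles a URI from its scheme, host, port and path.
    // When keepUserInfo is false the userinfo component is dropped,
    // which mirrors the lossy conversion described above.
    static URI rebuild(URI uri, boolean keepUserInfo) {
        String userInfo = keepUserInfo ? uri.getUserInfo() : null;
        try {
            return new URI(uri.getScheme(), userInfo, uri.getHost(),
                    uri.getPort(), uri.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI original = URI.create("foo://username:password@example.com");
        System.out.println(rebuild(original, false)); // userinfo lost
        System.out.println(rebuild(original, true));  // userinfo preserved
    }
}
```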
YARN-1214. Critical sub-task reported by Jian He and fixed by Jian He (resourcemanager)
Register ClientToken MasterKey in SecretManager after it is saved

Currently, the app attempt ClientToken master key is registered before it is saved. This can cause a problem: if the client gets the token before the master key is saved and the RM then crashes, the RM cannot reload the master key after it restarts because it was never saved. As a result, the client holds an invalid token. We can register the client token master key after it is saved in the store.
YARN-1213. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Restore config to ban submitting to undeclared pools in the Fair Scheduler
YARN-1204. Major sub-task reported by Yesha Vora and fixed by Omkar Vinit Joshi
Need to add https port related property in Yarn

There is no YARN property available to configure the https port for the ResourceManager, NodeManager and history server. Currently, YARN services use the port defined for http [defined by 'mapreduce.jobhistory.webapp.address','yarn.nodemanager.webapp.address', 'yarn.resourcemanager.webapp.address'] when running services on the https protocol. YARN should have a list of properties to assign the https port for the RM, NM and JHS, such as the following: yarn.nodemanager.webapp.https.address yarn.resourcemanager.webapp.https.address mapreduce.jobhistory.webapp.https.address
YARN-1203. Major sub-task reported by Yesha Vora and fixed by Omkar Vinit Joshi
Application Manager UI does not appear with Https enabled

Need to add support to disable 'hadoop.ssl.enabled' for MR jobs. A job should be able to run on http protocol by setting 'hadoop.ssl.enabled' property at job level.
YARN-1167. Major bug reported by Tassapol Athiapinya and fixed by Xuan Gong (applications/distributed-shell)
Submitted distributed shell application shows appMasterHost = empty

Submit a distributed shell application. Once the application reaches the RUNNING state, the app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED,
YARN-1157. Major bug reported by Tassapol Athiapinya and fixed by Xuan Gong (resourcemanager)
ResourceManager UI has invalid tracking URL link for distributed shell application

Submit a YARN distributed shell application. Go to the ResourceManager Web UI. The application appears. In the Tracking UI column, there will be a history link. Click on that link. Instead of showing the application master web UI, HTTP error 500 appears.
YARN-1149. Major bug reported by Ramya Sunil and fixed by Xuan Gong
NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING

When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/<host>_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_000004. 
Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 
2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat}
YARN-1141. Major bug reported by Zhijie Shen and fixed by Zhijie Shen
Updating resource requests should be decoupled with updating blacklist

Currently, in CapacityScheduler and FifoScheduler, blacklist is updated together with resource requests, only when the incoming resource requests are not empty. Therefore, when the incoming resource requests are empty, the blacklist will not be updated even when blacklist additions and removals are not empty.
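The coupling described above can be sketched as follows. This is an illustration only: the class, fields and method names are hypothetical and do not correspond to the scheduler classes, and hosts and requests are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: a hypothetical stand-in for per-application scheduling state.
class BlacklistUpdateSketch {
    final Set<String> blacklist = new HashSet<>();
    final List<String> pendingRequests = new ArrayList<>();

    // Coupled behavior: blacklist changes are applied only when the
    // incoming resource requests are non-empty.
    void coupledUpdate(List<String> requests, List<String> additions, List<String> removals) {
        if (!requests.isEmpty()) {
            pendingRequests.addAll(requests);
            blacklist.addAll(additions);
            blacklist.removeAll(removals);
        }
    }

    // Decoupled behavior: blacklist changes are applied regardless of
    // whether any resource requests arrived in the same call.
    void decoupledUpdate(List<String> requests, List<String> additions, List<String> removals) {
        if (!requests.isEmpty()) {
            pendingRequests.addAll(requests);
        }
        blacklist.addAll(additions);
        blacklist.removeAll(removals);
    }
}
```

With an empty request list and a non-empty additions list, coupledUpdate leaves the blacklist unchanged while decoupledUpdate applies the additions.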
YARN-1131. Minor sub-task reported by Tassapol Athiapinya and fixed by Siddharth Seth (client)
$yarn logs command should return an appropriate error message if YARN application is still running

When log aggregation is enabled, if a user submits a MapReduce job and runs $ yarn logs -applicationId <app ID> while the YARN application is running, the command prints no message and returns the user to the shell. It would be helpful to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} Likewise, if an invalid application ID is given, the YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_00000 Exception in thread "main" java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code}
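One way to report a malformed ID before parsing it is sketched below. The pattern is an assumption inferred from the IDs shown in these notes (application_, a numeric timestamp, an underscore, a numeric sequence); the class, the helper names and the message text are hypothetical and are not the YARN CLI's.

```java
import java.util.regex.Pattern;

// Sketch only: the pattern and message are assumptions, not YARN CLI behavior.
class AppIdCheckSketch {
    private static final Pattern APP_ID = Pattern.compile("application_\\d+_\\d+");

    static boolean isWellFormed(String appId) {
        return appId != null && APP_ID.matcher(appId).matches();
    }

    // Returns a readable message instead of letting a parser exception escape.
    static String describe(String appId) {
        return isWellFormed(appId)
                ? "OK: " + appId
                : "Invalid application ID: " + appId;
    }

    public static void main(String[] args) {
        System.out.println(describe("application_1377900193583_0002"));
        System.out.println(describe("application_00000"));
    }
}
```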
YARN-1128. Major bug reported by Sandy Ryza and fixed by Karthik Kambatla (scheduler)
FifoPolicy.computeShares throws NPE on empty list of Schedulables

FifoPolicy gives all of a queue's share to the earliest-scheduled application. {code} Schedulable earliest = null; for (Schedulable schedulable : schedulables) { if (earliest == null || schedulable.getStartTime() < earliest.getStartTime()) { earliest = schedulable; } } earliest.setFairShare(Resources.clone(totalResources)); {code} If the queue has no schedulables in it, earliest will be left null, leading to an NPE on the last line.
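A null guard is one way to avoid the NPE described above. The sketch below is self-contained and uses a hypothetical App class in place of Schedulable and a long in place of Resource; it is not necessarily how the committed fix handles the empty case.

```java
import java.util.Collection;

// Sketch only: App and computeShares are hypothetical stand-ins for
// Schedulable and FifoPolicy.computeShares.
class FifoShareSketch {
    static class App {
        final long startTime;
        long fairShare;

        App(long startTime) {
            this.startTime = startTime;
        }
    }

    // Gives the whole share to the earliest-started app, if there is one.
    static void computeShares(Collection<App> apps, long totalResources) {
        App earliest = null;
        for (App app : apps) {
            if (earliest == null || app.startTime < earliest.startTime) {
                earliest = app;
            }
        }
        // Guard: with an empty collection, earliest stays null.
        if (earliest != null) {
            earliest.fairShare = totalResources;
        }
    }
}
```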
YARN-1090. Major bug reported by Yesha Vora and fixed by Jian He
Job does not get into Pending State

When there is no resource available to run a job, the next job should go into the pending state. The RM UI should show the next job as a pending app, and the counter for pending apps should be incremented. Currently, however, the next job stays in the ACCEPTED state with no AM assigned to it, and the Pending App count is not incremented. Running 'job status <nextjob>' shows job state=PREP. $ mapred job -status job_1377122233385_0002 13/08/21 21:59:23 INFO client.RMProxy: Connecting to ResourceManager at host1/ip1 Job: job_1377122233385_0002 Job File: /ABC/.staging/job_1377122233385_0002/job.xml Job Tracking URL : http://host1:port1/application_1377122233385_0002/ Uber job : false Number of maps: 0 Number of reduces: 0 map() completion: 0.0 reduce() completion: 0.0 Job state: PREP retired: false reason for failure:
YARN-1070. Major sub-task reported by Hitesh Shah and fixed by Zhijie Shen (nodemanager)
ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL
YARN-1032. Critical bug reported by Lohit Vijayarenu and fixed by Lohit Vijayarenu
NPE in RackResolve

We found a case where our rack resolve script was not returning a rack due to a problem resolving the host address. This exception was seen in RackResolver.java as an NPE, ultimately caught in RMContainerAllocator. {noformat} 2013-08-01 07:11:37,708 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM. java.lang.NullPointerException at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:99) at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:92) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignMapsWithLocality(RMContainerAllocator.java:1039) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignContainers(RMContainerAllocator.java:925) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:861) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$400(RMContainerAllocator.java:681) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:219) at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:243) at java.lang.Thread.run(Thread.java:722) {noformat}
YARN-899. Major sub-task reported by Sandy Ryza and fixed by Xuan Gong (scheduler)
Get queue administration ACLs working

The Capacity Scheduler documents the yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1.
YARN-890. Major bug reported by Trupti Dhavle and fixed by Xuan Gong (resourcemanager)
The roundup for memory values on resource manager UI is misleading

From the yarn-site.xml, I see the following values: <property> <name>yarn.nodemanager.resource.memory-mb</name> <value>4192</value> </property> <property> <name>yarn.scheduler.maximum-allocation-mb</name> <value>4192</value> </property> <property> <name>yarn.scheduler.minimum-allocation-mb</name> <value>1024</value> </property> However, the resourcemanager UI shows total memory as 5MB
YARN-876. Major bug reported by PengZhang and fixed by PengZhang (resourcemanager)
Node resource is added twice when node comes back from unhealthy to healthy

When an unhealthy node restarts, its resource may be added twice in the scheduler. The first time is at the node's reconnection, while the node's final state is still "UNHEALTHY". The second time is at the node's update, when the node's state changes from "UNHEALTHY" to "HEALTHY".
YARN-621. Critical sub-task reported by Allen Wittenauer and fixed by Omkar Vinit Joshi (resourcemanager)
RM triggers web auth failure before first job

On a secure YARN setup, before the first job is executed, going to the web interface of the resource manager triggers authentication errors.
YARN-49. Major sub-task reported by Hitesh Shah and fixed by Vinod Kumar Vavilapalli (applications/distributed-shell)
Improve distributed shell application to work on a secure cluster
MAPREDUCE-5562. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
MR AM should exit when unregister() throws exception
MAPREDUCE-5554. Minor bug reported by Robert Kanter and fixed by Robert Kanter (test)
hdfs-site.xml included in hadoop-mapreduce-client-jobclient tests jar is breaking tests for downstream components
MAPREDUCE-5551. Blocker sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Binary Incompatibility of O.A.H.U.mapred.SequenceFileAsBinaryOutputFormat.WritableValueBytes
MAPREDUCE-5545. Major bug reported by Robert Kanter and fixed by Robert Kanter
org.apache.hadoop.mapred.TestTaskAttemptListenerImpl.testCommitWindow times out
MAPREDUCE-5544. Major bug reported by Sandy Ryza and fixed by Sandy Ryza
JobClient#getJob loads job conf twice
MAPREDUCE-5538. Blocker sub-task reported by Zhijie Shen and fixed by Zhijie Shen
MRAppMaster#shutDownJob shouldn't send job end notification before checking isLastRetry
MAPREDUCE-5536. Blocker bug reported by Yesha Vora and fixed by Omkar Vinit Joshi
mapreduce.jobhistory.webapp.https.address property is not respected
MAPREDUCE-5533. Major bug reported by Tassapol Athiapinya and fixed by Xuan Gong (applicationmaster)
Speculative execution does not function for reduce
MAPREDUCE-5531. Blocker sub-task reported by Robert Kanter and fixed by Robert Kanter (mrv1 , mrv2)
Binary and source incompatibility in mapreduce.TaskID and mapreduce.TaskAttemptID between branch-1 and branch-2
MAPREDUCE-5530. Blocker sub-task reported by Robert Kanter and fixed by Robert Kanter (mrv1 , mrv2)
Binary and source incompatibility in mapred.lib.CombineFileInputFormat between branch-1 and branch-2
MAPREDUCE-5529. Blocker sub-task reported by Robert Kanter and fixed by Robert Kanter (mrv1 , mrv2)
Binary incompatibilities in mapred.lib.TotalOrderPartitioner between branch-1 and branch-2
MAPREDUCE-5525. Minor test reported by Chuan Liu and fixed by Chuan Liu (mrv2 , test)
Increase timeout of TestDFSIO.testAppend and TestMRJobsWithHistoryService.testJobHistoryData
MAPREDUCE-5523. Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
Need to add https port related property in Job history server
MAPREDUCE-5515. Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
Application Manager UI does not appear with Https enabled
MAPREDUCE-5513. Major bug reported by Jason Lowe and fixed by Robert Parker
ConcurrentModificationException in JobControl
MAPREDUCE-5505. Critical sub-task reported by Jian He and fixed by Zhijie Shen
Clients should be notified job finished only after job successfully unregistered
MAPREDUCE-5503. Blocker bug reported by Jason Lowe and fixed by Jian He (mrv2)
TestMRJobClient.testJobClient is failing
MAPREDUCE-5489. Critical bug reported by Yesha Vora and fixed by Zhijie Shen
MR jobs hangs as it does not use the node-blacklisting feature in RM requests
MAPREDUCE-5488. Major bug reported by Arpit Gupta and fixed by Jian He
Job recovery fails after killing all the running containers for the app
MAPREDUCE-5459. Major bug reported by Zhijie Shen and fixed by Zhijie Shen
Update the doc of running MRv1 examples jar on YARN
MAPREDUCE-5442. Major bug reported by Yingda Chen and fixed by Yingda Chen (client)
$HADOOP_MAPRED_HOME/$HADOOP_CONF_DIR setting not working on Windows
MAPREDUCE-5170. Trivial bug reported by Sangjin Lee and fixed by Sangjin Lee (mrv2)
incorrect exception message if min node size > min rack size
HDFS-5308. Major improvement reported by Haohui Mai and fixed by Haohui Mai
Replace HttpConfig#getSchemePrefix with implicit schemes in HDFS JSP
HDFS-5306. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (datanode , namenode)
Datanode https port is not available at the namenode
HDFS-5300. Major bug reported by Vinay and fixed by Vinay (namenode)
FSNameSystem#deleteSnapshot() should not check owner in case of permissions disabled
HDFS-5299. Blocker bug reported by Vinay and fixed by Vinay (namenode)
DFS client hangs in updatePipeline RPC when failover happened
HDFS-5289. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (test)
Race condition in TestRetryCacheWithHA#testCreateSymlink causes spurious test failure
HDFS-5279. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode)
Guard against NullPointerException in NameNode JSP pages before initialization of FSNamesystem.
HDFS-5268. Major bug reported by Brandon Li and fixed by Brandon Li (nfs)
NFS write commit verifier is not set in a few places
HDFS-5265. Major bug reported by Haohui Mai and fixed by Haohui Mai
Namenode fails to start when dfs.https.port is unspecified
HDFS-5259. Major sub-task reported by Yesha Vora and fixed by Brandon Li (nfs)
Support client which combines appended data with old data before sends it to NFS server
HDFS-5258. Minor bug reported by Chris Nauroth and fixed by Chuan Liu (test)
Skip tests in TestHDFSCLI that are not applicable on Windows.
HDFS-5256. Major improvement reported by Haohui Mai and fixed by Haohui Mai (nfs)
Use guava LoadingCache to implement DFSClientCache
HDFS-5255. Major bug reported by Yesha Vora and fixed by Arpit Agarwal
Distcp job fails with hsftp when https is enabled in insecure cluster
HDFS-5251. Major bug reported by Haohui Mai and fixed by Haohui Mai
Race between the initialization of NameNode and the http server
HDFS-5246. Major sub-task reported by Jinghui Wang and fixed by Jinghui Wang (nfs)
Make Hadoop nfs server port and mount daemon port configurable
HDFS-5230. Major sub-task reported by Haohui Mai and fixed by Haohui Mai (nfs)
Introduce RpcInfo to decouple XDR classes from the RPC API
HDFS-5228. Blocker bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (hdfs-client)
The RemoteIterator returned by DistributedFileSystem.listFiles(..) may throw NPE
HDFS-5186. Minor test reported by Chuan Liu and fixed by Chuan Liu (namenode , test)
TestFileJournalManager fails on Windows due to file handle leaks
HDFS-5139. Major improvement reported by Arpit Agarwal and fixed by Arpit Agarwal (tools)
Remove redundant -R option from setrep
HDFS-5031. Blocker bug reported by Vinay and fixed by Vinay (datanode)
BlockScanner scans the block multiple times and on restart scans everything
HDFS-4817. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)
make HDFS advisory caching configurable on a per-file basis
HADOOP-10020. Blocker sub-task reported by Colin Patrick McCabe and fixed by Sanjay Radia (fs)
disable symlinks temporarily

During review of symbolic links, many issues were found related to the impact on the semantics of existing APIs such as FileSystem#listStatus, FileSystem#globStatus etc. There were also many issues brought up about symbolic links and their impact on the security and functionality of HDFS. All these issues will be addressed in the upcoming release 2.3. Until then the feature is temporarily disabled.
HADOOP-10017. Major sub-task reported by Jing Zhao and fixed by Haohui Mai
Fix NPE in DFSClient#getDelegationToken when doing Distcp from a secured cluster to an insecured cluster
HADOOP-10012. Blocker bug reported by Arpit Gupta and fixed by Suresh Srinivas (ha)
Secure Oozie jobs fail with delegation token renewal exception in Namenode HA setup
HADOOP-10003. Major bug reported by Jason Dere and fixed by (fs)
HarFileSystem.listLocatedStatus() fails
HADOOP-9976. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla
Different versions of avro and avro-maven-plugin
HADOOP-9948. Minor test reported by Chuan Liu and fixed by Chuan Liu (test)
Add a config value to CLITestHelper to skip tests on Windows
HADOOP-9776. Major bug reported by shanyu zhao and fixed by shanyu zhao (fs)
HarFileSystem.listStatus() returns invalid authority if port number is empty
HADOOP-9761. Blocker bug reported by Andrew Wang and fixed by Andrew Wang (viewfs)
ViewFileSystem#rename fails when using DistributedFileSystem
HADOOP-9758. Major improvement reported by Andrew Wang and fixed by Andrew Wang
Provide configuration option for FileSystem/FileContext symlink resolution
HADOOP-8315. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (auto-failover , ha)
Support SASL-authenticated ZooKeeper in ActiveStandbyElector

Hadoop 2.1.1-beta Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements.

Changes since Hadoop 2.1.0-beta

YARN-1194. Minor bug reported by Roman Shaposhnik and fixed by Roman Shaposhnik (nodemanager)
TestContainerLogsPage fails with native builds

Running TestContainerLogsPage on trunk while Native IO is enabled makes it fail
YARN-1189. Blocker bug reported by Jason Lowe and fixed by Omkar Vinit Joshi
NMTokenSecretManagerInNM is not being told when applications have finished

The {{appFinished}} method is not being called when applications have finished. This causes a couple of leaks as {{oldMasterKeys}} and {{appToAppAttemptMap}} are never being pruned.
YARN-1184. Major bug reported by J.Andreina and fixed by Chris Douglas (capacityscheduler , resourcemanager)
ClassCastException is thrown during preemption When a huge job is submitted to a queue B whose resources is used by a job in queueA

Preemption is enabled. Queue = a,b a capacity = 30% b capacity = 70% Step 1: Assign a big job to queue a (so that job_a will utilize some resources from queue b) Step 2: Assign a big job to queue b. The following exception is thrown at the Resource Manager {noformat} 2013-09-12 10:42:32,535 ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw an Exception. java.lang.ClassCastException: java.util.Collections$UnmodifiableSet cannot be cast to java.util.NavigableSet at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getContainersToPreempt(ProportionalCapacityPreemptionPolicy.java:403) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:202) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:173) at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72) at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82) at java.lang.Thread.run(Thread.java:662) {noformat}
YARN-1176. Critical bug reported by Thomas Graves and fixed by Jonathan Eagles (resourcemanager)
RM web services ClusterMetricsInfo total nodes doesn't include unhealthy nodes

In the web services api for the cluster/metrics, the totalNodes reported doesn't include the unhealthy nodes. this.totalNodes = activeNodes + lostNodes + decommissionedNodes + rebootedNodes;
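A minimal sketch of the total with unhealthy nodes included follows. The class, method and parameter names are illustrative and the sketch is not taken from the committed patch; it only extends the sum quoted above by one term.

```java
// Sketch only: names are hypothetical; the sum extends the formula quoted above.
class ClusterNodeCountSketch {
    static int totalNodes(int activeNodes, int lostNodes, int decommissionedNodes,
                          int rebootedNodes, int unhealthyNodes) {
        return activeNodes + lostNodes + decommissionedNodes
                + rebootedNodes + unhealthyNodes;
    }

    public static void main(String[] args) {
        // 3 active, 1 lost, 0 decommissioned, 0 rebooted, 2 unhealthy.
        System.out.println(totalNodes(3, 1, 0, 0, 2)); // 6
    }
}
```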
YARN-1170. Blocker bug reported by Arun C Murthy and fixed by Binglin Chang
yarn proto definitions should specify package as 'hadoop.yarn'

yarn proto definitions should specify package as 'hadoop.yarn' similar to protos with 'hadoop.common' & 'hadoop.hdfs' in Common & HDFS respectively.
YARN-1152. Blocker bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
Invalid key to HMAC computation error when getting application report for completed app attempt

On a secure cluster, an invalid key to HMAC error is thrown when trying to get an application report for an application with an attempt that has unregistered.
YARN-1144. Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (resourcemanager)
Unmanaged AMs registering a tracking URI should not be proxy-fied

Unmanaged AMs do not run in the cluster, their tracking URL should not be proxy-fied.
YARN-1137. Major improvement reported by Alejandro Abdelnur and fixed by Roman Shaposhnik (nodemanager)
Add support whitelist for system users to Yarn container-executor.c

Currently container-executor.c has a banned set of users (mapred, hdfs & bin) and a configurable min.user.id (defaulting to 1000). This presents a problem for systems that run as system users (below 1000) if these systems want to start containers. Systems like Impala fit in this category. A (local) 'impala' system user is created when installing Impala on the nodes. Note that the same thing happens when installing systems like HDFS, Yarn, Oozie, from packages (Bigtop); local system users are created. For Impala to be able to run containers in a secure cluster, the 'impala' system user must be whitelisted. For this, adding an 'allowed.system.users' configuration option to container-executor.cfg, with matching logic in container-executor.c, would allow the usernames in that list. Because system users are not guaranteed to have the same UID on different machines, the 'allowed.system.users' property should use usernames and not UIDs.
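The proposed check can be sketched as follows. The real logic lives in C in container-executor.c; this Java sketch only illustrates the rule described above, and the class, method and parameter names, as well as the choice to test the banned list first, are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch only: illustrates the described rule, not the container-executor code.
class UserCheckSketch {
    static boolean mayLaunch(String user, int uid, int minUserId,
                             Set<String> bannedUsers, Set<String> allowedSystemUsers) {
        // Assumption: banned users are rejected even if whitelisted.
        if (bannedUsers.contains(user)) {
            return false;
        }
        // Regular users pass on UID; system users pass only by username.
        return uid >= minUserId || allowedSystemUsers.contains(user);
    }

    public static void main(String[] args) {
        Set<String> banned = new HashSet<>(Arrays.asList("mapred", "hdfs", "bin"));
        Set<String> allowed = new HashSet<>(Arrays.asList("impala"));
        // The UIDs below are made up for illustration.
        System.out.println(mayLaunch("impala", 480, 1000, banned, allowed)); // true
        System.out.println(mayLaunch("daemon", 2, 1000, banned, allowed));   // false
        System.out.println(mayLaunch("hdfs", 1500, 1000, banned, allowed));  // false
    }
}
```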
YARN-1124. Blocker bug reported by Omkar Vinit Joshi and fixed by Xuan Gong
By default yarn application -list should display all the applications in a state other than FINISHED / FAILED

Today we list only applications in the RUNNING state by default for "yarn application -list". Instead we should show all the applications which are either submitted/accepted/running.
YARN-1120. Minor bug reported by Chuan Liu and fixed by Chuan Liu
Make ApplicationConstants.Environment.USER definition OS neutral

In YARN-557, we added some code to make {{ApplicationConstants.Environment.USER}} have an OS-specific definition in order to fix the unit test TestUnmanagedAMLauncher. In YARN-571, the relevant test code was corrected. In YARN-602, we explicitly set the environment variables for the child containers. With these changes, I think we can revert the YARN-557 change to make {{ApplicationConstants.Environment.USER}} OS neutral. The main benefit is that we can use the same method over the Enum constants. This should also fix the TestContainerLaunch#testContainerEnvVariables failure on Windows.
YARN-1117. Major improvement reported by Tassapol Athiapinya and fixed by Xuan Gong (client)
Improve help message for $ yarn applications and $yarn node

Help messages were standardized in YARN-1080. It would be nice to have similar changes for $ yarn applications and $ yarn node
YARN-1116. Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)
Populate AMRMTokens back to AMRMTokenSecretManager after RM restarts

The AMRMTokens are currently only saved in RMStateStore and are not populated back to AMRMTokenSecretManager after the RM restarts. This is needed more now that AMRMToken is also used in non-secure environments.
YARN-1107. Blocker bug reported by Arpit Gupta and fixed by Omkar Vinit Joshi (resourcemanager)
Job submitted with Delegation token in secured environment causes RM to fail during RM restart

If a secure RM with recovery enabled is restarted while Oozie jobs are running, the RM fails to come up.
YARN-1101. Major bug reported by Robert Parker and fixed by Robert Parker (resourcemanager)
Active nodes can be decremented below 0

The issue is in RMNodeImpl, where both RUNNING and UNHEALTHY states that transition to a deactivated state (LOST, DECOMMISSIONED, REBOOTED) use the same DeactivateNodeTransition class. The DeactivateNodeTransition class naturally decrements the active node count; however, in cases where the node has already transitioned to UNHEALTHY, the active count has already been decremented.
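The double decrement can be avoided by checking the state the node is leaving. The sketch below is a simplified counter, not RMNodeImpl: the class, enum and method names are hypothetical and the state machine is reduced to two operations.

```java
// Sketch only: a hypothetical counter, not the RMNodeImpl state machine.
class ActiveNodeCounterSketch {
    enum NodeState { RUNNING, UNHEALTHY, LOST, DECOMMISSIONED, REBOOTED }

    private int activeNodes;

    ActiveNodeCounterSketch(int initialActiveNodes) {
        this.activeNodes = initialActiveNodes;
    }

    // RUNNING -> UNHEALTHY: the node stops counting as active.
    void markUnhealthy() {
        activeNodes--;
    }

    // Transition to LOST, DECOMMISSIONED or REBOOTED: decrement only if
    // the node was still counted as active, i.e. it was RUNNING.
    void deactivate(NodeState previousState) {
        if (previousState == NodeState.RUNNING) {
            activeNodes--;
        }
    }

    int getActiveNodes() {
        return activeNodes;
    }
}
```

Starting from one active node, markUnhealthy() followed by deactivate(NodeState.UNHEALTHY) leaves the count at 0 rather than -1.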
YARN-1094. Blocker bug reported by Yesha Vora and fixed by Vinod Kumar Vavilapalli
RM restart throws Null pointer Exception in Secure Env

Enable the RM restart feature and restart the Resource Manager while a job is running. The Resource Manager fails to start with the error below: 2013-08-23 17:57:40,705 INFO resourcemanager.RMAppManager (RMAppManager.java:recover(370)) - Recovering application application_1377280618693_0001 2013-08-23 17:57:40,763 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(617)) - Failed to load/recover state java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.setTimerForTokenRenewal(DelegationTokenRenewer.java:371) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addApplication(DelegationTokenRenewer.java:307) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:291) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:371) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:819) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:613) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:832) 2013-08-23 17:57:40,766 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
YARN-1093. Major bug reported by Wing Yew Poon and fixed by (documentation)
Corrections to Fair Scheduler documentation

The fair scheduler is still evolving, but the current documentation contains some inaccuracies.
YARN-1085. Blocker task reported by Jaimin D Jetly and fixed by Omkar Vinit Joshi (nodemanager , resourcemanager)
Yarn and MRv2 should do HTTP client authentication in kerberos setup.

In a Kerberos setup, an HTTP client is expected to authenticate to Kerberos before the user is allowed to browse any information.
YARN-1083. Major bug reported by Yesha Vora and fixed by Zhijie Shen (resourcemanager)
ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval

If 'yarn.nm.liveness-monitor.expiry-interval-ms' is set to less than the heartbeat interval, all the NodeManagers will be added to 'Lost Nodes'. Instead, the ResourceManager should validate these properties and fail to start if the combination is invalid.
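A fail-fast check of this kind might look like the sketch below. The class and method names are hypothetical, the values are illustrative, and this is not the ResourceManager's actual validation code; it only assumes both intervals are available in milliseconds at startup.

```java
// Hypothetical startup check; not the ResourceManager's actual validation code.
class LivenessConfigCheck {
    // Throws if the expiry interval is shorter than the heartbeat interval.
    static void validate(long expiryIntervalMs, long heartbeatIntervalMs) {
        if (expiryIntervalMs < heartbeatIntervalMs) {
            throw new IllegalArgumentException(
                "expiry interval (" + expiryIntervalMs + " ms) must not be less than"
                + " the heartbeat interval (" + heartbeatIntervalMs + " ms)");
        }
    }

    public static void main(String[] args) {
        validate(600_000L, 1_000L); // valid combination, returns normally
        try {
            validate(500L, 1_000L); // invalid combination
        } catch (IllegalArgumentException e) {
            System.out.println("refusing to start: " + e.getMessage());
        }
    }
}
```

Failing at startup surfaces the misconfiguration immediately, rather than letting every node drift into 'Lost Nodes'.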
YARN-1082. Blocker bug reported by Arpit Gupta and fixed by Vinod Kumar Vavilapalli (resourcemanager)
Secure RM with recovery enabled and rm state store on hdfs fails with gss exception
YARN-1081. Minor improvement reported by Tassapol Athiapinya and fixed by Akira AJISAKA (client)
Minor improvement to output header for $ yarn node -list

The output of $ yarn node -list shows the number of running containers at each node. I found a case where a new YARN user thought this was a container ID, used it later in other YARN commands, and hit an error due to the misunderstanding. {code:title=current output} 2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list 2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1 2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Running-Containers 2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2 {code} {code:title=proposed output} 2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list 2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1 2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Number-of-Running-Containers 2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2 {code}
YARN-1080. Major improvement reported by Tassapol Athiapinya and fixed by Xuan Gong (client)
Improve help message for $ yarn logs

This JIRA proposes two changes, which can be fixed together in one patch. 1. Standardize the help message for the required parameter of $ yarn logs. The YARN CLI has a "logs" command ($ yarn logs). The command always requires the parameter "-applicationId <arg>". However, the command's help message does not make this clear: it lists -applicationId as an optional parameter. If it is not set, the YARN CLI complains that it is missing. It is better to use the standard required-parameter notation used in the help messages of other Linux commands, so that any user familiar with such commands can more easily see that this parameter is needed. {code:title=current help message} -bash-4.1$ yarn logs usage: general options are: -applicationId <arg> ApplicationId (required) -appOwner <arg> AppOwner (assumed to be current user if not specified) -containerId <arg> ContainerId (must be specified if node address is specified) -nodeAddress <arg> NodeAddress in the format nodename:port (must be specified if container id is specified) {code} {code:title=proposed help message} -bash-4.1$ yarn logs usage: yarn logs -applicationId <application ID> [OPTIONS] general options are: -appOwner <arg> AppOwner (assumed to be current user if not specified) -containerId <arg> ContainerId (must be specified if node address is specified) -nodeAddress <arg> NodeAddress in the format nodename:port (must be specified if container id is specified) {code} 2. Add a description to the help output. As far as I know, a user cannot get logs for a running job. Since I spent some time trying to get logs of running applications, it would be nice to say this in the command description. {code:title=proposed help} Retrieve logs for completed/killed YARN application usage: general options are... {code}
YARN-1078. Minor bug reported by Chuan Liu and fixed by Chuan Liu
TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows

The three unit tests fail on Windows due to host name resolution differences on Windows, i.e. 127.0.0.1 does not resolve to host name "localhost". {noformat} org.apache.hadoop.security.token.SecretManager$InvalidToken: Given Container container_0_0000_01_000000 identifier is not valid for current Node manager. Expected : 127.0.0.1:12345 Found : localhost:12345 {noformat} {noformat} testNMConnectionToRM(org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater) Time elapsed: 8343 sec <<< FAILURE! org.junit.ComparisonFailure: expected:<[localhost]:12345> but was:<[127.0.0.1]:12345> at org.junit.Assert.assertEquals(Assert.java:125) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker6.registerNodeManager(TestNodeStatusUpdater.java:712) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at $Proxy26.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:212) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:149) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyNodeStatusUpdater4.serviceStart(TestNodeStatusUpdater.java:369) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:213) at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNMConnectionToRM(TestNodeStatusUpdater.java:985) {noformat}
YARN-1077. Minor bug reported by Chuan Liu and fixed by Chuan Liu
TestContainerLaunch fails on Windows

Several cases in this unit test fail on Windows. (The error log is appended at the end.) testInvalidEnvSyntaxDiagnostics fails because of the difference between cmd and bash script error handling. If a command fails in a cmd script, cmd continues to execute the rest of the script. Error handling needs to be carried out explicitly in the script file. The error code of the last command is returned as the error code of the whole script. In this test, an error happens in the middle of the cmd script, and the test expects an exception and a non-zero error code. In the cmd script, the intermediate errors are ignored; the last command, "call", succeeds, so there is no exception. testContainerLaunchStdoutAndStderrDiagnostics fails due to wrong cmd commands used by the test. testContainerEnvVariables and testDelayedKill fail due to a regression from YARN-906. {noformat} ------------------------------------------------------------------------------- Test set: org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch ------------------------------------------------------------------------------- Tests run: 7, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 11.526 sec <<< FAILURE! testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 583 sec <<< FAILURE! junit.framework.AssertionFailedError: Should catch exception at junit.framework.Assert.fail(Assert.java:50) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:269) ... testContainerLaunchStdoutAndStderrDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 561 sec <<< FAILURE!
junit.framework.AssertionFailedError: Should catch exception at junit.framework.Assert.fail(Assert.java:50) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerLaunchStdoutAndStderrDiagnostics(TestContainerLaunch.java:314) ... testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 4136 sec <<< FAILURE! junit.framework.AssertionFailedError: expected:<137> but was:<143> at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at junit.framework.Assert.assertEquals(Assert.java:199) at junit.framework.Assert.assertEquals(Assert.java:205) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:500) ... testDelayedKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 2744 sec <<< FAILURE! junit.framework.AssertionFailedError: expected:<137> but was:<143> at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at junit.framework.Assert.assertEquals(Assert.java:199) at junit.framework.Assert.assertEquals(Assert.java:205) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testDelayedKill(TestContainerLaunch.java:601) ... {noformat}
YARN-1074. Major improvement reported by Tassapol Athiapinya and fixed by Xuan Gong (client)
Clean up YARN CLI app list to show only running apps.

Once a user brings up the YARN daemons and runs jobs, the jobs stay in the output returned by $ yarn application -list even after they have completed. We want the YARN command line to clean up this list. Specifically, we want to remove applications in the FINISHED state (not Final-State) or the KILLED state from the result. {code} [user1@host1 ~]$ yarn application -list Total Applications:150 Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL application_1374638600275_0109 Sleep job MAPREDUCE user1 default KILLED KILLED 100% host1:54059 application_1374638600275_0121 Sleep job MAPREDUCE user1 default FINISHED SUCCEEDED 100% host1:19888/jobhistory/job/job_1374638600275_0121 application_1374638600275_0020 Sleep job MAPREDUCE user1 default FINISHED SUCCEEDED 100% host1:19888/jobhistory/job/job_1374638600275_0020 application_1374638600275_0038 Sleep job MAPREDUCE user1 default .... {code}
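The requested filtering can be sketched as follows. The App class, the simplified State enum, and the "app_running" ID are made up for illustration; this is not the YARN CLI code, only a minimal example of dropping FINISHED and KILLED entries from a listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the actual YARN CLI implementation.
class AppListFilter {
    // Simplified stand-in for the application State column.
    enum State { ACCEPTED, RUNNING, FINISHED, KILLED }

    static final class App {
        final String id;
        final State state;

        App(String id, State state) {
            this.id = id;
            this.state = state;
        }
    }

    // Keeps only applications that are neither FINISHED nor KILLED.
    static List<String> visibleIds(List<App> apps) {
        List<String> ids = new ArrayList<>();
        for (App app : apps) {
            if (app.state != State.FINISHED && app.state != State.KILLED) {
                ids.add(app.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<App> apps = Arrays.asList(
            new App("application_1374638600275_0109", State.KILLED),
            new App("application_1374638600275_0121", State.FINISHED),
            new App("app_running", State.RUNNING)); // made-up ID for illustration
        System.out.println(visibleIds(apps)); // prints [app_running]
    }
}
```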
YARN-1049. Blocker bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (api)
ContainerExistStatus should define a status for preempted containers

With the current behavior it is impossible to determine whether a container has been preempted or lost due to an NM crash. Adding a PREEMPTED exit status (-102) will help an AM determine that a container has been preempted. Note the change of scope from the original summary/description. The original scope proposed API/behavior changes. Because we are past 2.1.0-beta, I'm reducing the scope of this JIRA.
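An AM could use the new status roughly as in the sketch below. Only the value -102 comes from the description above; the class name, the constant declared here, and the policy of not counting preemption as a failure are assumptions made for illustration.

```java
// Hypothetical AM-side helper; the failure-counting policy is an assumption.
class PreemptionAwareAm {
    // Value taken from the description above.
    static final int PREEMPTED = -102;

    static boolean wasPreempted(int exitStatus) {
        return exitStatus == PREEMPTED;
    }

    // Counts a container as failed only if it exited non-zero and was not preempted.
    static boolean countsAsFailure(int exitStatus) {
        return exitStatus != 0 && !wasPreempted(exitStatus);
    }

    public static void main(String[] args) {
        System.out.println(countsAsFailure(-102)); // prints false
        System.out.println(countsAsFailure(1));    // prints true
        System.out.println(countsAsFailure(0));    // prints false
    }
}
```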
YARN-1034. Trivial task reported by Sandy Ryza and fixed by Karthik Kambatla (documentation , scheduler)
Remove "experimental" in the Fair Scheduler documentation

The YARN Fair Scheduler is largely stable now, and should no longer be declared experimental.
YARN-1025. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager , resourcemanager)
ResourceManager and NodeManager do not load native libraries on Windows.

ResourceManager and NodeManager do not have the correct setting for java.library.path when launched on Windows. This prevents the processes from loading native code from hadoop.dll. The native code is required for correct functioning on Windows (not optional), so this ultimately can cause failures.
YARN-1008. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (nodemanager)
MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations

While the NMs are keyed using the NodeId, the allocation is done based on the hostname. This makes the different nodes indistinguishable to the scheduler. There should be an option to enable the host:port instead of just port for allocations. The nodes reported to the AM should report the 'key' (host or host:port).
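The collision can be shown with a small sketch. The helper names and port numbers are hypothetical; the example only assumes several NodeManagers running on one host with different ports, as in a MiniYARNCluster.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of host-only versus host:port allocation keys.
class NodeKeyExample {
    static String key(String host, int port, boolean usePort) {
        return usePort ? host + ":" + port : host;
    }

    // Counts how many distinct keys a set of same-host NodeManagers produces.
    static int distinctKeys(String host, int[] ports, boolean usePort) {
        Set<String> keys = new HashSet<>();
        for (int port : ports) {
            keys.add(key(host, port, usePort));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        int[] ports = {45454, 45455}; // illustrative ports
        System.out.println(distinctKeys("localhost", ports, false)); // prints 1
        System.out.println(distinctKeys("localhost", ports, true));  // prints 2
    }
}
```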
YARN-1006. Major bug reported by Jian He and fixed by Xuan Gong
Nodes list web page on the RM web UI is broken

The nodes web page, which lists all the connected nodes of the cluster, is broken. 1. The page is not shown in the correct format/style. 2. If we restart the NM, the node list is not refreshed; the newly started NM is just added to the list, and the old NM's information still remains.
YARN-1001. Blocker task reported by Srimanth Gunturi and fixed by Zhijie Shen (api)
YARN should provide per application-type and state statistics

In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts.
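The aggregated counts requested here can be sketched as a two-level map. The App class and the plain-string type and state values are hypothetical stand-ins; this is not the API that YARN exposes, only an illustration of counting per application type and state.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of per application-type and state aggregation.
class AppStatistics {
    static final class App {
        final String type;
        final String state;

        App(String type, String state) {
            this.type = type;
            this.state = state;
        }
    }

    // Returns type -> (state -> count), sorted by key for stable output.
    static Map<String, Map<String, Integer>> countByTypeAndState(List<App> apps) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (App app : apps) {
            counts.computeIfAbsent(app.type, k -> new TreeMap<>())
                  .merge(app.state, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<App> apps = Arrays.asList(
            new App("MAPREDUCE", "FINISHED"),
            new App("MAPREDUCE", "FINISHED"),
            new App("MAPREDUCE", "RUNNING"));
        // prints {MAPREDUCE={FINISHED=2, RUNNING=1}}
        System.out.println(countByTypeAndState(apps));
    }
}
```

Computing the counts once on the server side avoids having each client fetch and tally the full application list.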
YARN-994. Major bug reported by Xuan Gong and fixed by Xuan Gong
HeartBeat thread in AMRMClientAsync does not handle runtime exception correctly

YARN-654 performs sanity checks on the parameters of public methods in AMRMClient. Those checks may throw runtime exceptions. Currently, the heartbeat thread in AMRMClientAsync only catches IOException and YarnException, and does not handle runtime exceptions properly. A possible solution: the heartbeat thread catches Throwable and notifies the callback handler thread via the existing savedException.
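The proposed handling can be sketched as below. The HeartbeatLoop class and Heartbeat interface are hypothetical and simplified to a single-threaded loop; the sketch is not the AMRMClientAsync implementation and only borrows the savedException name from the description.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified heartbeat loop; not the AMRMClientAsync code.
class HeartbeatLoop {
    interface Heartbeat {
        void beat() throws Exception;
    }

    private final AtomicReference<Throwable> savedException = new AtomicReference<>();

    // Stops at the first failure and records any Throwable, including runtime exceptions.
    void run(Heartbeat heartbeat, int maxBeats) {
        for (int i = 0; i < maxBeats; i++) {
            try {
                heartbeat.beat();
            } catch (Throwable t) {
                savedException.set(t);
                return;
            }
        }
    }

    // A callback handler thread would poll this and report the error.
    Throwable savedException() {
        return savedException.get();
    }

    public static void main(String[] args) {
        HeartbeatLoop loop = new HeartbeatLoop();
        loop.run(() -> { throw new IllegalArgumentException("bad request"); }, 3);
        System.out.println(loop.savedException().getMessage()); // prints bad request
    }
}
```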
YARN-981. Major bug reported by Xuan Gong and fixed by Jian He
YARN/MR2/Job-history /logs link does not have correct content
YARN-966. Major bug reported by Zhijie Shen and fixed by Zhijie Shen
The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED

In ContainerImpl.getLocalizedResources(), there's: {code} assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!! {code} ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertionError will be thrown, which fails the thread without notifying the NM. As a result, the container cannot receive further events, which are supposed to be sent from ContainerLaunch.call(), and cannot move towards completion.
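One way to make the failure visible is to replace the assert with an explicit state check that the launch thread can handle. The sketch below uses hypothetical names and a simplified state enum, and is not necessarily how the issue was fixed. Note that an AssertionError is an Error, so a catch block for Exception does not catch it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; simplified stand-in for ContainerImpl and ContainerLaunch.
class LocalizedResourceGuard {
    enum ContainerState { LOCALIZING, LOCALIZED, RUNNING, KILLING }

    // Explicit check instead of an assert, so callers get a catchable exception.
    static List<String> getLocalizedResources(ContainerState state, List<String> resources) {
        if (state != ContainerState.LOCALIZED) {
            throw new IllegalStateException("Container is at " + state + ", not LOCALIZED");
        }
        return resources;
    }

    public static void main(String[] args) {
        List<String> resources = Arrays.asList("test.jar");
        try {
            getLocalizedResources(ContainerState.KILLING, resources);
        } catch (IllegalStateException e) {
            // The launch thread can report the failure instead of dying silently.
            System.out.println("launch aborted: " + e.getMessage());
        }
    }
}
```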
YARN-957. Blocker bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
Capacity Scheduler tries to reserve the memory more than what node manager reports.

I have 2 node managers: * one with 1024 MB of memory (nm1) * a second with 2048 MB of memory (nm2). I am submitting a simple MapReduce application with 1 mapper and 1 reducer with 1024 MB each. The steps to reproduce are: * Stop nm2 with 2048 MB of memory (to make sure that this node's heartbeat doesn't reach the RM first). * Now submit the application. As soon as the RM receives the first node's (nm1) heartbeat, it will try to reserve memory for the AM container (2048 MB), even though the node has only 1024 MB of memory. * Now start nm2 with 2048 MB of memory. It hangs forever... There are two potential issues here. * The scheduler should not try to reserve memory on a node manager that is never going to be able to provide the requested memory; i.e., the current max capability of the node manager is 1024 MB, but 2048 MB is reserved on it anyway. * Say 2048 MB is reserved on nm1, but nm2 comes back with 2048 MB of available memory. In this case, if the original request was made without any locality, the scheduler should unreserve the memory on nm1 and allocate the requested 2048 MB container on nm2.
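The first issue amounts to a missing capability check before reserving. A minimal sketch, with hypothetical names and not the Capacity Scheduler code, assuming the scheduler knows each node's total memory:

```java
// Hypothetical sketch; not the actual Capacity Scheduler reservation logic.
class ReservationCheck {
    // A reservation only makes sense if the node could ever satisfy the request.
    static boolean mayReserve(int requestedMb, int nodeTotalMb) {
        return requestedMb <= nodeTotalMb;
    }

    public static void main(String[] args) {
        System.out.println(mayReserve(2048, 1024)); // nm1: prints false
        System.out.println(mayReserve(2048, 2048)); // nm2: prints true
    }
}
```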
YARN-948. Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
RM should validate the release container list before actually releasing them

At present we are blindly passing the allocate request, which contains the containers to be released, to the scheduler. This may result in one application releasing another application's container. {code} @Override @Lock(Lock.NoLock.class) public Allocation allocate(ApplicationAttemptId applicationAttemptId, List<ResourceRequest> ask, List<ContainerId> release, List<String> blacklistAdditions, List<String> blacklistRemovals) { FiCaSchedulerApp application = getApplication(applicationAttemptId); .... .... // Release containers for (ContainerId releasedContainerId : release) { RMContainer rmContainer = getRMContainer(releasedContainerId); if (rmContainer == null) { RMAuditLogger.logFailure(application.getUser(), AuditConstants.RELEASE_CONTAINER, "Unauthorized access or invalid container", "CapacityScheduler", "Trying to release container not owned by app or with invalid id", application.getApplicationId(), releasedContainerId); } completedContainer(rmContainer, SchedulerUtils.createAbnormalContainerStatus( releasedContainerId, SchedulerUtils.RELEASED_CONTAINER), RMContainerEventType.RELEASED); } {code} The current checks are not sufficient and we should prevent this. Thoughts?
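An ownership check of the kind asked for here could look like the sketch below. It is hypothetical: plain strings stand in for ApplicationAttemptId and ContainerId, and the ownerByContainer map is an assumed lookup that the RM would have to provide. It is not the validation that was actually committed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; strings stand in for ApplicationAttemptId and ContainerId.
class ReleaseListValidator {
    // Returns only the requested container IDs that are owned by the given attempt.
    static List<String> ownedOnly(String attemptId, List<String> release,
                                  Map<String, String> ownerByContainer) {
        List<String> valid = new ArrayList<>();
        for (String containerId : release) {
            // Unknown containers map to null and are rejected as well.
            if (attemptId.equals(ownerByContainer.get(containerId))) {
                valid.add(containerId);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        Map<String, String> owners = new HashMap<>();
        owners.put("c1", "attempt_1");
        owners.put("c2", "attempt_2");
        List<String> release = Arrays.asList("c1", "c2", "c9");
        System.out.println(ownedOnly("attempt_1", release, owners)); // prints [c1]
    }
}
```

Filtering before the request reaches the scheduler means a foreign or invalid container ID is dropped rather than passed on for release.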
YARN-942. Major bug reported by Sandy Ryza and fixed by Akira AJISAKA (scheduler)
In Fair Scheduler documentation, inconsistency on which properties have prefix

locality.threshold.node and locality.threshold.rack should have the yarn.scheduler.fair prefix like the items before them http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
YARN-910. Major improvement reported by Sandy Ryza and fixed by Alejandro Abdelnur (nodemanager)
Allow auxiliary services to listen for container starts and completions

Making container start and completion events available to auxiliary services would allow them to be resource-aware. The auxiliary service would be able to notify a co-located service that is opportunistically using free capacity of allocation changes.
YARN-906. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Cancelling ContainerLaunch#call at KILLING causes that the container cannot be completed

See https://builds.apache.org/job/PreCommit-YARN-Build/1435//testReport/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClientNoCleanupOnStop/
YARN-903. Major bug reported by Abhishek Kapoor and fixed by Omkar Vinit Joshi (applications/distributed-shell)
DistributedShell throwing Errors in logs after successful completion

I have tried running DistributedShell and also used its ApplicationMaster for my test. The application runs successfully, though it logs some errors which would be useful to fix. Below are the logs from the NodeManager and ApplicationMaster. Log Snippet for NodeManager ============================= 2013-07-07 13:39:18,787 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1 2013-07-07 13:39:19,050 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -325382586 2013-07-07 13:39:19,052 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for nm-tokens, got key with id :1005046570 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as sunny-Inspiron:9993 with total resource of <memory:10240, vCores:8> 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_000001 (auth:SIMPLE) 2013-07-07 13:39:35,492 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1373184544832_0001_01_000001 by user sunny 2013-07-07 13:39:35,507 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1373184544832_0001 2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny IP=127.0.0.1 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_000001 2013-07-07
13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from NEW to INITING 2013-07-07 13:39:35,512 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1373184544832_0001_01_000001 to application application_1373184544832_0001 2013-07-07 13:39:35,518 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from INITING to RUNNING 2013-07-07 13:39:35,528 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from NEW to LOCALIZING 2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/application/test.jar transitioned from INIT to DOWNLOADING 2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1373184544832_0001_01_000001 2013-07-07 13:39:35,675 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_000001.tokens. 
Credentials list: 2013-07-07 13:39:35,694 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user sunny 2013-07-07 13:39:35,803 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_000001.tokens to /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001/container_1373184544832_0001_01_000001.tokens 2013-07-07 13:39:35,803 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set to /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001 = file:/home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001 2013-07-07 13:39:36,136 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000, 2013-07-07 13:39:36,406 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/application/test.jar transitioned from DOWNLOADING to LOCALIZED 2013-07-07 13:39:36,409 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from LOCALIZING to LOCALIZED 2013-07-07 13:39:36,524 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from LOCALIZED to RUNNING 2013-07-07 13:39:36,692 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, -c, 
/home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001/container_1373184544832_0001_01_000001/default_container_executor.sh] 2013-07-07 13:39:37,144 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000, 2013-07-07 13:39:38,147 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000, 2013-07-07 13:39:39,151 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000, 2013-07-07 13:39:39,209 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1373184544832_0001_01_000001 2013-07-07 13:39:39,259 WARN org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Unexpected: procfs stat file is not in the expected format for process with pid 11552 2013-07-07 13:39:39,264 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 29524 for container-id container_1373184544832_0001_01_000001: 79.9 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used 2013-07-07 13:39:39,645 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_000001 (auth:SIMPLE) 2013-07-07 13:39:39,651 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start 
request for container_1373184544832_0001_01_000002 by user sunny 2013-07-07 13:39:39,651 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny IP=127.0.0.1 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_000002 2013-07-07 13:39:39,651 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1373184544832_0001_01_000002 to application application_1373184544832_0001 2013-07-07 13:39:39,652 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000002 transitioned from NEW to LOCALIZED 2013-07-07 13:39:39,660 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Getting container-status for container_1373184544832_0001_01_000002 2013-07-07 13:39:39,661 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Returning container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 2, }, state: C_RUNNING, diagnostics: "", exit_status: -1000, 2013-07-07 13:39:39,728 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000002 transitioned from LOCALIZED to RUNNING 2013-07-07 13:39:39,873 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, -c, /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001/container_1373184544832_0001_01_000002/default_container_executor.sh] 2013-07-07 13:39:39,898 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1373184544832_0001_01_000002 succeeded 2013-07-07 13:39:39,899 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container 
container_1373184544832_0001_01_000002 transitioned from RUNNING to EXITED_WITH_SUCCESS 2013-07-07 13:39:39,900 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1373184544832_0001_01_000002 2013-07-07 13:39:39,942 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny OPERATION=Container Finished - Succeeded TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_000002 2013-07-07 13:39:39,943 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000002 transitioned from EXITED_WITH_SUCCESS to DONE 2013-07-07 13:39:39,944 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1373184544832_0001_01_000002 from application application_1373184544832_0001 2013-07-07 13:39:40,155 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000, 2013-07-07 13:39:40,157 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 2, }, state: C_COMPLETE, diagnostics: "", exit_status: 0, 2013-07-07 13:39:40,158 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed container container_1373184544832_0001_01_000002 2013-07-07 13:39:40,683 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Getting container-status for container_1373184544832_0001_01_000002 2013-07-07 13:39:40,686 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
as:appattempt_1373184544832_0001_000001 (auth:TOKEN) cause:org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000002 is not handled by this NodeManager 2013-07-07 13:39:40,687 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9993, call org.apache.hadoop.yarn.api.ContainerManagementProtocolPB.stopContainer from 127.0.0.1:51085: error: org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000002 is not handled by this NodeManager org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000002 is not handled by this NodeManager at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeGetAndStopContainerRequest(ContainerManagerImpl.java:614) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.stopContainer(ContainerManagerImpl.java:538) at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.stopContainer(ContainerManagementProtocolPBServiceImpl.java:88) at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:85) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1033) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1868) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1864) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1862) 2013-07-07 13:39:41,162 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: 
container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000, 2013-07-07 13:39:41,691 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1373184544832_0001_01_000001 succeeded 2013-07-07 13:39:41,692 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from RUNNING to EXITED_WITH_SUCCESS 2013-07-07 13:39:41,692 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1373184544832_0001_01_000001 2013-07-07 13:39:41,714 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny OPERATION=Container Finished - Succeeded TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_000001 2013-07-07 13:39:41,714 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from EXITED_WITH_SUCCESS to DONE 2013-07-07 13:39:41,714 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1373184544832_0001_01_000001 from application application_1373184544832_0001 2013-07-07 13:39:42,166 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_COMPLETE, diagnostics: "", exit_status: 0, 2013-07-07 13:39:42,166 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed container container_1373184544832_0001_01_000001 2013-07-07 13:39:42,191 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for 
appattempt_1373184544832_0001_000001 (auth:SIMPLE) 2013-07-07 13:39:42,195 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Getting container-status for container_1373184544832_0001_01_000001 2013-07-07 13:39:42,196 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:appattempt_1373184544832_0001_000001 (auth:TOKEN) cause:org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000001 is not handled by this NodeManager 2013-07-07 13:39:42,196 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9993, call org.apache.hadoop.yarn.api.ContainerManagementProtocolPB.stopContainer from 127.0.0.1:51086: error: org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000001 is not handled by this NodeManager org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000001 is not handled by this NodeManager at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeGetAndStopContainerRequest(ContainerManagerImpl.java:614) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.stopContainer(ContainerManagerImpl.java:538) at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.stopContainer(ContainerManagementProtocolPBServiceImpl.java:88) at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:85) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1033) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1868) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1864) at java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1862) 2013-07-07 13:39:42,264 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1373184544832_0001_01_000002 2013-07-07 13:39:42,265 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1373184544832_0001_01_000002 2013-07-07 13:39:42,265 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1373184544832_0001_01_000001 2013-07-07 13:39:43,173 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP 2013-07-07 13:39:43,174 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1373184544832_0001 2013-07-07 13:39:43,180 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED 2013-07-07 13:39:43,180 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler: Scheduling Log Deletion for application: application_1373184544832_0001, with delay of 10800 seconds Log Snippet for Application Manager ================================== 13/07/07 13:39:36 INFO client.SimpleApplicationMaster: Initializing ApplicationMaster 13/07/07 13:39:37 INFO client.SimpleApplicationMaster: Application master for app, appId=1, clustertimestamp=1373184544832, attemptId=1 13/07/07 13:39:37 INFO client.SimpleApplicationMaster: Starting ApplicationMaster 13/07/07 13:39:37 
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 13/07/07 13:39:37 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool size is 500 13/07/07 13:39:37 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500 13/07/07 13:39:37 INFO client.SimpleApplicationMaster: Max mem capabililty of resources in this cluster 8192 13/07/07 13:39:37 INFO client.SimpleApplicationMaster: Requested container ask: Capability[<memory:100, vCores:0>]Priority[0]ContainerCount[1] 13/07/07 13:39:39 INFO client.SimpleApplicationMaster: Got response from RM for container ask, allocatedCnt=1 13/07/07 13:39:39 INFO client.SimpleApplicationMaster: Launching shell command on a new container., containerId=container_1373184544832_0001_01_000002, containerNode=sunny-Inspiron:9993, containerNodeURI=sunny-Inspiron:8042, containerResourceMemory1024 13/07/07 13:39:39 INFO client.SimpleApplicationMaster: Setting up container launch container for containerid=container_1373184544832_0001_01_000002 13/07/07 13:39:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1373184544832_0001_01_000002 13/07/07 13:39:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : sunny-Inspiron:9993 13/07/07 13:39:39 INFO client.SimpleApplicationMaster: Succeeded to start Container container_1373184544832_0001_01_000002 13/07/07 13:39:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1373184544832_0001_01_000002 13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Got response from RM for container ask, completedCnt=1 13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Got container status for containerID=container_1373184544832_0001_01_000002, state=COMPLETE, exitStatus=0, diagnostics= 13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Container completed successfully., 
containerId=container_1373184544832_0001_01_000002 13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Application completed. Stopping running containers 13/07/07 13:39:40 ERROR impl.NMClientImpl: Failed to stop Container container_1373184544832_0001_01_000002when stopping NMClientImpl 13/07/07 13:39:40 INFO impl.ContainerManagementProtocolProxy: Closing proxy : sunny-Inspiron:9993 13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Application completed. Signalling finish to RM 13/07/07 13:39:41 INFO impl.AMRMClientAsyncImpl: Interrupted while waiting for queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:281) 13/07/07 13:39:41 INFO client.SimpleApplicationMaster: Application Master completed successfully. exiting
YARN-881. Major bug reported by Jian He and fixed by Jian He
Priority#compareTo method seems to be wrong.

If a lower int value means a higher priority, shouldn't compareTo "return other.getPriority() - this.getPriority()"?
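A minimal sketch of the ordering the question above proposes, assuming that a lower int value means a higher priority. SketchPriority is a hypothetical stand-in written for illustration, not the YARN Priority class, and Integer.compare is swapped in for the plain subtraction so that extreme values cannot overflow.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for illustration only; not the YARN Priority class.
final class SketchPriority implements Comparable<SketchPriority> {
    private final int priority;

    SketchPriority(int priority) {
        this.priority = priority;
    }

    int getPriority() {
        return priority;
    }

    // Lower int value means higher priority, so the operands are swapped.
    // The result has the same sign as other.getPriority() - this.getPriority()
    // whenever that subtraction does not overflow.
    @Override
    public int compareTo(SketchPriority other) {
        return Integer.compare(other.getPriority(), this.getPriority());
    }

    // Returns the raw int values ordered from lowest to highest priority.
    static List<Integer> sortedValues(int... values) {
        List<SketchPriority> list = new ArrayList<>();
        for (int v : values) {
            list.add(new SketchPriority(v));
        }
        Collections.sort(list);
        List<Integer> out = new ArrayList<>();
        for (SketchPriority p : list) {
            out.add(p.getPriority());
        }
        return out;
    }
}
```

With this ordering, a natural-order sort places the largest int value (the lowest priority) first and the smallest int value (the highest priority) last.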
YARN-771. Major sub-task reported by Bikas Saha and fixed by Junping Du
AMRMClient support for resource blacklisting

After YARN-750, AMRMClient should support blacklisting via the new YARN APIs.
YARN-758. Minor improvement reported by Bikas Saha and fixed by Karthik Kambatla
Augment MockNM to use multiple cores

YARN-757 got fixed by changing the scheduler from Fair to default (which is capacity).
YARN-707. Blocker improvement reported by Bikas Saha and fixed by Jason Lowe
Add user info in the YARN ClientToken

If user info is present in the client token then it can be used to do limited authz in the AM.
YARN-696. Major improvement reported by Trevor Lorimer and fixed by Trevor Lorimer (resourcemanager)
Enable multiple states to be specified in Resource Manager apps REST call

Within the YARN Resource Manager REST API, the GET call that returns all applications (http://<rm http address:port>/ws/v1/cluster/apps) can be filtered by only a single State query parameter. There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed). If no state parameter is specified, all states are returned; however, if a subset of states is required, multiple REST calls are needed (a maximum of 7). The proposal is to allow multiple states to be specified in a single REST call.
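One way a client might build the single call proposed above is sketched here. The parameter name "states", the comma separator, the class name, and the host rm.example.com:8088 are assumptions made for illustration; the description above does not fix the final syntax.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative helper; the class name and the "states" parameter are assumptions.
final class AppsQuerySketch {
    // Builds the cluster apps URL, adding one comma-separated filter
    // when at least one state is requested.
    static String buildAppsUrl(String rmHttpAddress, List<String> states) {
        String base = "http://" + rmHttpAddress + "/ws/v1/cluster/apps";
        if (states.isEmpty()) {
            return base; // no filter: applications in all states are returned
        }
        return base + "?states=" + String.join(",", states);
    }

    public static void main(String[] args) {
        System.out.println(buildAppsUrl("rm.example.com:8088",
                Arrays.asList("RUNNING", "FINISHED")));
    }
}
```

A single request built this way would replace the separate per-state requests described above.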
YARN-643. Major bug reported by Jian He and fixed by Xuan Gong
WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition

This JIRA tracks why appToken and clientToAMToken are removed separately and why the removals are spread across different transitions. Ideally there would be a common place where both tokens can be removed at the same time.
YARN-602. Major bug reported by Xuan Gong and fixed by Kenji Kikushima
NodeManager should mandatorily set some Environment variables into every container that it launches

NodeManager should mandatorily set some Environment variables, such as Environment.user and Environment.pwd, into every container that it launches. If both the user and the NodeManager set those variables, the value set by the NM should be used.
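The precedence rule described above can be sketched with plain maps. This is a minimal illustration, assuming the environments are simple String-to-String maps; ContainerEnvSketch and mergeEnv are hypothetical names, not NodeManager code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not NodeManager code.
final class ContainerEnvSketch {
    // Merges user-supplied and NodeManager-supplied variables.
    // NM entries are applied last, so they win on any key collision.
    static Map<String, String> mergeEnv(Map<String, String> userEnv,
                                        Map<String, String> nmEnv) {
        Map<String, String> merged = new HashMap<>(userEnv);
        merged.putAll(nmEnv);
        return merged;
    }
}
```

Variables that only the user sets pass through unchanged; only keys that the NM also sets are overridden.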
YARN-589. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Expose a REST API for monitoring the fair scheduler

The fair scheduler should have an HTTP interface that exposes information such as applications per queue, fair shares, demands, current allocations.
YARN-573. Critical sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
Shared data structures in Public Localizer and Private Localizer are not Thread safe.

PublicLocalizer: 1) pending is accessed by addResource (part of event handling) and by the run method (PublicLocalizer.run()). PrivateLocalizer: 1) pending is accessed by addResource (part of event handling) and by findNextResource (i.remove()). The update method should also be fixed, since it shares the pending list too.
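One possible way to make such a shared pending list safe is sketched below. It is an illustration under stated assumptions, not necessarily the fix that was applied: PendingResourcesSketch is a hypothetical class, resources are modeled as plain strings, and the list is guarded with Collections.synchronizedList plus an explicit lock during iteration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical class for illustration only; not the localizer code.
final class PendingResourcesSketch {
    private final List<String> pending =
            Collections.synchronizedList(new ArrayList<String>());

    // Called from the event-handling thread.
    void addResource(String resource) {
        pending.add(resource);
    }

    // Called from the localizer thread. Iterating a synchronized list
    // requires holding the list's lock for the whole traversal.
    String findNextResource() {
        synchronized (pending) {
            Iterator<String> i = pending.iterator();
            if (i.hasNext()) {
                String next = i.next();
                i.remove();
                return next;
            }
            return null; // nothing pending
        }
    }

    int size() {
        return pending.size();
    }
}
```

The explicit synchronized block matters because the wrapper only guards individual calls; an iterator with i.remove() spans several calls and would otherwise race with addResource.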
YARN-540. Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)
Race condition causing RM to potentially relaunch already unregistered AMs on RM restart

When a job succeeds and successfully calls finishApplicationMaster, and the RM then shuts down and restarts, the dispatcher may be stopped before it can process the REMOVE_APP event. The next time the RM comes back, it reloads the existing state files even though the job has succeeded.
YARN-502. Major sub-task reported by Lohit Vijayarenu and fixed by Mayank Bansal
RM crash with NPE on NODE_REMOVED event with FairScheduler

While running some test and adding/removing nodes, we see RM crashed with the below exception. We are testing with fair scheduler and running hadoop-2.0.3-alpha {noformat} 2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node YYYY:55680 as it is now LOST 2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: YYYY:55680 Node Transitioned from UNHEALTHY to LOST 2013-03-22 18:54:27,015 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_REMOVED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375) at java.lang.Thread.run(Thread.java:662) 2013-03-22 18:54:27,016 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. 2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped SelectChannelConnector@XXXX:50030 {noformat}
YARN-337. Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
RM handles killed application tracking URL poorly

When the ResourceManager kills an application, it leaves the proxy URL redirecting to the original tracking URL for the application even though the ApplicationMaster is no longer there to service it. It should redirect it somewhere more useful, like the RM's web page for the application, where the user can find that the application was killed and links to the AM logs. In addition, sometimes the AM during teardown from the kill can attempt to unregister and provide an updated tracking URL, but unfortunately the RM has "forgotten" the AM due to the kill and refuses to process the unregistration. Instead it logs: {noformat} 2013-01-09 17:37:49,671 [IPC Server handler 2 on 8030] ERROR org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AppAttemptId doesnt exist in cache appattempt_1357575694478_28614_000001 {noformat} It should go ahead and process the unregistration to update the tracking URL since the application offered it.
YARN-292. Major sub-task reported by Devaraj K and fixed by Zhijie Shen (resourcemanager)
ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt

{code:xml} 2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_000001 2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525 java.lang.ArrayIndexOutOfBoundsException: 0 at java.util.Arrays$ArrayList.get(Arrays.java:3381) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code}
YARN-107. Major bug reported by Devaraj K and fixed by Xuan Gong (resourcemanager)
ClientRMService.forceKillApplication() should handle the non-RUNNING applications properly
MAPREDUCE-5497. Major bug reported by Jian He and fixed by Jian He
'5s sleep' in MRAppMaster.shutDownJob is only needed before stopping ClientService
MAPREDUCE-5493. Blocker bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
In-memory map outputs can be leaked after shuffle completes
MAPREDUCE-5483. Major bug reported by Alejandro Abdelnur and fixed by Robert Kanter (distcp)
revert MAPREDUCE-5357
MAPREDUCE-5478. Minor improvement reported by Sandy Ryza and fixed by Sandy Ryza (examples)
TeraInputFormat unnecessarily defines its own FileSplit subclass
MAPREDUCE-5476. Blocker bug reported by Jian He and fixed by Jian He
Job can fail when RM restarts after staging dir is cleaned but before MR successfully unregister with RM
MAPREDUCE-5475. Blocker bug reported by Jason Lowe and fixed by Jason Lowe (mr-am , mrv2)
MRClientService does not verify ACLs properly
MAPREDUCE-5470. Major bug reported by Chris Nauroth and fixed by Sandy Ryza
LocalJobRunner does not work on Windows.
MAPREDUCE-5468. Blocker bug reported by Yesha Vora and fixed by Vinod Kumar Vavilapalli
AM recovery does not work for map only jobs
MAPREDUCE-5466. Blocker bug reported by Yesha Vora and fixed by Jian He
Historyserver does not refresh the result of restarted jobs after RM restart
MAPREDUCE-5462. Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (performance , task)
In map-side sort, swap entire meta entries instead of indexes for better cache performance
MAPREDUCE-5454. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (test)
TestDFSIO fails intermittently on JDK7
MAPREDUCE-5446. Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2 , test)
TestJobHistoryEvents and TestJobHistoryParsing have race conditions
MAPREDUCE-5441. Major bug reported by Rohith Sharma K S and fixed by Jian He (applicationmaster , client)
JobClient exits whenever RM issues Reboot command to 1st attempt App Master.
MAPREDUCE-5440. Major bug reported by Robert Parker and fixed by Robert Parker (mrv2)
TestCopyCommitter Fails on JDK7
MAPREDUCE-5428. Major bug reported by Jason Lowe and fixed by Karthik Kambatla (jobhistoryserver , mrv2)
HistoryFileManager doesn't stop threads when service is stopped
MAPREDUCE-5425. Major bug reported by Ashwin Shankar and fixed by Robert Parker (jobhistoryserver)
Junit in TestJobHistoryServer failing in jdk 7
MAPREDUCE-5414. Major bug reported by Nemon Lou and fixed by Nemon Lou (test)
TestTaskAttempt fails jdk7 with NullPointerException
MAPREDUCE-5385. Blocker bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
JobContext cache files api are broken
MAPREDUCE-5379. Major improvement reported by Sandy Ryza and fixed by Karthik Kambatla (job submission , security)
Include token tracking ids in jobconf
MAPREDUCE-5367. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza
Local jobs all use same local working directory
MAPREDUCE-5358. Major bug reported by Devaraj K and fixed by Devaraj K (mr-am)
MRAppMaster throws invalid transitions for JobImpl
MAPREDUCE-5317. Major bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2)
Stale files left behind for failed jobs
MAPREDUCE-5251. Major bug reported by Jason Lowe and fixed by Ashwin Shankar (mrv2)
Reducer should not implicate map attempt if it has insufficient space to fetch map output
MAPREDUCE-5164. Major bug reported by Nemon Lou and fixed by Nemon Lou
command "mapred job" and "mapred queue" omit HADOOP_CLIENT_OPTS
MAPREDUCE-5020. Major bug reported by Trevor Robinson and fixed by Trevor Robinson (client)
Compile failure with JDK8
MAPREDUCE-5001. Major bug reported by Brock Noland and fixed by Sandy Ryza
LocalJobRunner has race condition resulting in job failures
MAPREDUCE-3193. Major bug reported by Ramgopal N and fixed by Devaraj K (mrv1 , mrv2)
FileInputFormat doesn't read files recursively in the input path dir
MAPREDUCE-1981. Major improvement reported by Hairong Kuang and fixed by Hairong Kuang (job submission)
Improve getSplits performance by using listLocatedStatus
HDFS-5199. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Add more debug trace for NFS READ and WRITE
HDFS-5192. Minor bug reported by Jing Zhao and fixed by Jing Zhao
NameNode may fail to start when dfs.client.test.drop.namenode.response.number is set
HDFS-5159. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)
Secondary NameNode fails to checkpoint if error occurs downloading edits on first checkpoint
HDFS-5150. Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee
Allow per NN SPN for internal SPNEGO.
HDFS-5140. Blocker bug reported by Arpit Gupta and fixed by Jing Zhao (ha)
Too many safemode monitor threads being created in the standby namenode causing it to fail with out of memory error
HDFS-5136. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
MNT EXPORT should give the full group list which can mount the exports
HDFS-5132. Blocker bug reported by Arpit Gupta and fixed by Kihwal Lee (namenode)
Deadlock in NameNode between SafeModeMonitor#run and DatanodeManager#handleHeartbeat
HDFS-5128. Critical improvement reported by Kihwal Lee and fixed by Kihwal Lee
Allow multiple net interfaces to be used with HA namenode RPC server
HDFS-5124. Blocker bug reported by Deepesh Khandelwal and fixed by Daryn Sharp (namenode)
DelegationTokenSecretManager#retrievePassword can cause deadlock in NameNode
HDFS-5118. Major new feature reported by Jing Zhao and fixed by Jing Zhao
Provide testing support for DFSClient to drop RPC responses

Used for testing when NameNode HA is enabled. Users can use a new configuration property "dfs.client.test.drop.namenode.response.number" to specify the number of responses that DFSClient will drop in each RPC call. This feature can help testing functionalities such as NameNode retry cache.
HDFS-5111. Minor bug reported by Jing Zhao and fixed by Jing Zhao (snapshots)
Remove duplicated error message for snapshot commands when processing invalid arguments
HDFS-5110. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Change FSDataOutputStream to HdfsDataOutputStream for opened streams to fix type cast error
HDFS-5107. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Fix array copy error in Readdir and Readdirplus responses
HDFS-5106. Minor bug reported by Chuan Liu and fixed by Chuan Liu (test)
TestDatanodeBlockScanner fails on Windows due to incorrect path format
HDFS-5105. Minor bug reported by Chuan Liu and fixed by Chuan Liu
TestFsck fails on Windows
HDFS-5104. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Support dotdot name in NFS LOOKUP operation
HDFS-5103. Minor bug reported by Chuan Liu and fixed by Chuan Liu (test)
TestDirectoryScanner fails on Windows
HDFS-5102. Major bug reported by Aaron T. Myers and fixed by Jing Zhao (snapshots)
Snapshot names should not be allowed to contain slash characters
HDFS-5100. Minor bug reported by Chuan Liu and fixed by Chuan Liu (test)
TestNamenodeRetryCache fails on Windows due to incorrect cleanup
HDFS-5099. Major bug reported by Chuan Liu and fixed by Chuan Liu (namenode)
Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing
HDFS-5091. Minor bug reported by Jing Zhao and fixed by Jing Zhao
Support for spnego keytab separate from the JournalNode keytab for secure HA
HDFS-5085. Major sub-task reported by Brandon Li and fixed by Jing Zhao (nfs)
Refactor o.a.h.nfs to support different types of authentications
HDFS-5080. Major bug reported by Jing Zhao and fixed by Jing Zhao (ha , qjm)
BootstrapStandby not working with QJM when the existing NN is active
HDFS-5078. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Support file append in NFSv3 gateway to enable data streaming to HDFS
HDFS-5076. Minor new feature reported by Jing Zhao and fixed by Jing Zhao
Add MXBean methods to query NN's transaction information and JournalNode's journal status
HDFS-5071. Major sub-task reported by Kihwal Lee and fixed by Brandon Li (nfs)
Change hdfs-nfs parent project to hadoop-project
HDFS-5069. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Include hadoop-nfs and hadoop-hdfs-nfs into hadoop dist for NFS deployment
HDFS-5067. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Support symlink operations
HDFS-5061. Major improvement reported by Arpit Agarwal and fixed by Arpit Agarwal (namenode)
Make FSNameSystem#auditLoggers an unmodifiable list
HDFS-5055. Blocker bug reported by Allen Wittenauer and fixed by Vinay (namenode)
nn fails to download checkpointed image from snn in some setups
HDFS-5047. Major bug reported by Kihwal Lee and fixed by Robert Parker (namenode)
Suppress logging of full stack trace of quota and lease exceptions
HDFS-5045. Minor improvement reported by Jing Zhao and fixed by Jing Zhao
Add more unit tests for retry cache to cover all AtMostOnce methods
HDFS-5043. Major bug reported by Brandon Li and fixed by Brandon Li
For HdfsFileStatus, set default value of childrenNum to -1 instead of 0 to avoid confusing applications
HDFS-5028. Major bug reported by zhaoyunjiong and fixed by zhaoyunjiong
LeaseRenewer throws java.util.ConcurrentModificationException on timeout
HDFS-4993. Major bug reported by Kihwal Lee and fixed by Robert Parker
fsck can fail if a file is renamed or deleted
HDFS-4962. Minor sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (nfs)
Use enum for nfs constants
HDFS-4947. Major sub-task reported by Brandon Li and fixed by Jing Zhao (nfs)
Add NFS server export table to control export by hostname or IP range
HDFS-4926. Trivial improvement reported by Joseph Lorenzini and fixed by Vivek Ganesan (namenode)
namenode webserver's page has a tooltip that is inconsistent with the datanode HTML link
HDFS-4905. Minor improvement reported by Arpit Agarwal and fixed by Arpit Agarwal (tools)
Add appendToFile command to "hdfs dfs"
HDFS-4898. Minor bug reported by Eric Sirianni and fixed by Tsz Wo (Nicholas), SZE (namenode)
BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack
HDFS-4763. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Add script changes/utility for starting NFS gateway
HDFS-4680. Major bug reported by Andrew Wang and fixed by Andrew Wang (namenode , security)
Audit logging of delegation tokens for MR tracing
HDFS-4632. Major bug reported by Chris Nauroth and fixed by Chuan Liu (test)
globStatus using backslash for escaping does not work on Windows
HDFS-4594. Minor bug reported by Arpit Gupta and fixed by Chris Nauroth (webhdfs)
WebHDFS open sets Content-Length header to what is specified by length parameter rather than how much data is actually returned.
HDFS-4329. Major bug reported by Andy Isaacson and fixed by Cristina L. Abad (hdfs-client)
DFSShell issues with directories with spaces in name
HDFS-3245. Major improvement reported by Todd Lipcon and fixed by Ravi Prakash (namenode)
Add metrics and web UI for cluster version summary
HDFS-2933. Major improvement reported by Philip Zeyliger and fixed by Vivek Ganesan (datanode)
Improve DataNode Web UI Index Page
HADOOP-9962. Major improvement reported by Roman Shaposhnik and fixed by Roman Shaposhnik (build)
in order to avoid dependency divergence within Hadoop itself lets enable DependencyConvergence
HADOOP-9961. Minor bug reported by Roman Shaposhnik and fixed by Roman Shaposhnik (build)
versions of a few transitive dependencies diverged between hadoop subprojects
HADOOP-9960. Blocker bug reported by Brock Noland and fixed by Karthik Kambatla
Upgrade Jersey version to 1.9
HADOOP-9958. Major bug reported by Andrew Wang and fixed by Andrew Wang
Add old constructor back to DelegationTokenInformation to unbreak downstream builds
HADOOP-9945. Minor improvement reported by Karthik Kambatla and fixed by Karthik Kambatla (ha)
HAServiceState should have a state for stopped services
HADOOP-9944. Blocker bug reported by Arun C Murthy and fixed by Arun C Murthy
RpcRequestHeaderProto defines callId as uint32 while ipc.Client.CONNECTION_CONTEXT_CALL_ID is signed (-3)
HADOOP-9932. Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee
Improper synchronization in RetryCache
HADOOP-9924. Major bug reported by shanyu zhao and fixed by shanyu zhao (fs)
FileUtil.createJarWithClassPath() does not generate relative classpath correctly
HADOOP-9918. Minor improvement reported by Karthik Kambatla and fixed by Karthik Kambatla
Add addIfService() to CompositeService
HADOOP-9916. Minor bug reported by Binglin Chang and fixed by Binglin Chang
Race condition in ipc.Client causes TestIPC timeout
HADOOP-9910. Minor bug reported by André Kelpe and fixed by
proxy server start and stop documentation wrong
HADOOP-9906. Minor bug reported by Karthik Kambatla and fixed by Karthik Kambatla (ha)
Move HAZKUtil to o.a.h.util.ZKUtil and make inner-classes public
HADOOP-9899. Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (security)
Remove the debug message added by HADOOP-8855
HADOOP-9886. Minor improvement reported by Arpit Gupta and fixed by Arpit Gupta
Turn warning message in RetryInvocationHandler to debug
HADOOP-9880. Blocker bug reported by Kihwal Lee and fixed by Daryn Sharp
SASL changes from HADOOP-9421 break Secure HA NN
HADOOP-9879. Minor improvement reported by Karthik Kambatla and fixed by Karthik Kambatla (build)
Move the version info of zookeeper dependencies to hadoop-project/pom
HADOOP-9868. Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
Server must not advertise kerberos realm
HADOOP-9858. Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (fs)
Remove unused private RawLocalFileSystem#execCommand method from branch-2.
HADOOP-9857. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (build , test)
Tests block and sometimes timeout on Windows due to invalid entropy source.
HADOOP-9833. Minor improvement reported by Steve Loughran and fixed by Kousuke Saruta (build)
move slf4j to version 1.7.5
HADOOP-9831. Minor improvement reported by Chris Nauroth and fixed by Chris Nauroth (bin)
Make checknative shell command accessible on Windows.
HADOOP-9821. Minor improvement reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA
ClientId should have getMsb/getLsb methods
HADOOP-9820. Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)
RPCv9 wire protocol is insufficient to support multiplexing
HADOOP-9806. Major bug reported by Brandon Li and fixed by Brandon Li (nfs)
PortmapInterface should check if the procedure is out-of-range
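One way such a range check could look is sketched below. This is an illustration only, not the Hadoop change itself: ProcedureLookupSketch, its Procedure enum, and fromValue are hypothetical names, and the six constants are assumed to follow the portmap version 2 procedure numbers 0 through 5.

```java
import java.util.Optional;

/** Hypothetical sketch; this enum is illustrative and is not Hadoop's PortmapInterface. */
class ProcedureLookupSketch {

    /** Assumed portmap v2 procedures, numbered 0..5 by declaration order. */
    enum Procedure { NULL, SET, UNSET, GETPORT, DUMP, CALLIT }

    /** Maps a wire value to a procedure, returning empty when the value is out of range. */
    static Optional<Procedure> fromValue(int value) {
        Procedure[] all = Procedure.values();
        if (value < 0 || value >= all.length) {
            // Indexing without this check would throw ArrayIndexOutOfBoundsException.
            return Optional.empty();
        }
        return Optional.of(all[value]);
    }

    public static void main(String[] args) {
        System.out.println(fromValue(3));   // in range
        System.out.println(fromValue(42));  // out of range
    }
}
```

Returning an empty Optional lets the caller decide how to reject an unknown procedure instead of failing with an unchecked exception.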
HADOOP-9803. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (ipc)
Add generic type parameter to RetryInvocationHandler
HADOOP-9802. Major improvement reported by Chris Nauroth and fixed by Chris Nauroth (io)
Support Snappy codec on Windows.
HADOOP-9801. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (conf)
Configuration#writeXml uses platform default encoding, which may mishandle multi-byte characters.
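The general remedy for this class of bug is to name the charset explicitly when wrapping an output stream. The following is a minimal sketch using only the JDK; Utf8WriteSketch, writeUtf8, and encode are hypothetical names and do not reflect the actual Configuration code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Hadoop: writes text with an explicit charset. */
class Utf8WriteSketch {

    /** Encodes text as UTF-8 regardless of the platform default charset. */
    static void writeUtf8(String text, OutputStream out) throws IOException {
        // new OutputStreamWriter(out) without a charset would use the platform default.
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(text);
        writer.flush();
    }

    /** Convenience wrapper that returns the encoded bytes. */
    static byte[] encode(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeUtf8(text, buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Three multi-byte characters inside an XML-like value element.
        byte[] bytes = encode("<value>\u65e5\u672c\u8a9e</value>");
        System.out.println(bytes.length + " bytes written");
    }
}
```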
HADOOP-9789. Critical new feature reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)
Support server advertised kerberos principals
HADOOP-9774. Major bug reported by shanyu zhao and fixed by shanyu zhao (fs)
RawLocalFileSystem.listStatus() returns absolute paths when input path is relative on Windows
HADOOP-9768. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (fs)
chown and chgrp reject users and groups with spaces on platforms where spaces are otherwise acceptable
HADOOP-9757. Major bug reported by Jason Lowe and fixed by Cristina L. Abad (fs)
Har metadata cache can grow without limit
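A common way to bound a cache like this is an access-ordered LinkedHashMap that evicts its eldest entry. The sketch below illustrates that technique under stated assumptions: BoundedLruCache and its capacity are hypothetical, and the entry above does not say how the Hadoop fix is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache; the class name and capacity are illustrative. */
class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedLruCache(int maxEntries) {
        // accessOrder=true orders entries from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each insertion; returning true evicts the eldest entry.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedLruCache<String, String> cache = new BoundedLruCache<>(2);
        cache.put("a.har", "meta-a");
        cache.put("b.har", "meta-b");
        cache.get("a.har");            // touch a.har so b.har becomes the eldest
        cache.put("c.har", "meta-c");  // exceeds capacity, so b.har is evicted
        System.out.println(cache.keySet());
    }
}
```

LinkedHashMap is not thread-safe, so a shared cache would need external synchronization, for example Collections.synchronizedMap.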
HADOOP-9686. Major improvement reported by Jason Lowe and fixed by Jason Lowe (conf)
Easy access to final parameters in Configuration
HADOOP-9672. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza
Upgrade Avro dependency to 1.7.4
HADOOP-9557. Major bug reported by Lohit Vijayarenu and fixed by Lohit Vijayarenu (build)
hadoop-client excludes commons-httpclient
HADOOP-9446. Major improvement reported by Yu Gao and fixed by Yu Gao (security)
Support Kerberos HTTP SPNEGO authentication for non-SUN JDK
HADOOP-9435. Major bug reported by Tian Hong Wang and fixed by Tian Hong Wang (build)
Support building the JNI code against the IBM JVM
HADOOP-9381. Trivial bug reported by Keegan Witt and fixed by Keegan Witt
Document dfs cp -f option
HADOOP-9315. Major bug reported by Dennis Y and fixed by Chris Nauroth (build)
Port HADOOP-9249 hadoop-maven-plugins Clover fix to branch-2 to fix build failures
HADOOP-8814. Minor improvement reported by Brandon Li and fixed by Brandon Li (conf , fs , fs/s3 , ha , io , metrics , performance , record , security , util)
Inefficient comparison with the empty string. Use isEmpty() instead

Hadoop 2.1.0-beta Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements.

Changes since Hadoop 2.0.5-alpha

YARN-968. Blocker bug reported by Kihwal Lee and fixed by Vinod Kumar Vavilapalli
RM admin commands don't work

If an RM admin command is issued using the CLI, I get something like the following: 13/07/24 17:19:40 INFO client.RMProxy: Connecting to ResourceManager at xxxx.com/1.2.3.4:1234 refreshQueues: Unknown protocol: org.apache.hadoop.yarn.api.ResourceManagerAdministrationProtocolPB
YARN-961. Blocker bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
ContainerManagerImpl should enforce token on server. Today it is [TOKEN, SIMPLE]

We should only accept SecurityAuthMethod.TOKEN for ContainerManagementProtocol. Today it also accepts SIMPLE for unsecured environment.
YARN-960. Blocker bug reported by Alejandro Abdelnur and fixed by Daryn Sharp
TestMRCredentials and TestBinaryTokenFile are failing on trunk

Not sure, but this may be a fallout from YARN-701 and/or related to YARN-945. Making it a blocker until full impact of the issue is scoped.
YARN-945. Blocker bug reported by Bikas Saha and fixed by Vinod Kumar Vavilapalli
AM register failing after AMRMToken

2013-07-19 15:53:55,569 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54313: readAndProcess from client 127.0.0.1 threw exception [org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN]]
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN]
    at org.apache.hadoop.ipc.Server$Connection.initializeAuthContext(Server.java:1531)
    at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1482)
    at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:788)
    at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:587)
    at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:562)
YARN-937. Blocker bug reported by Arun C Murthy and fixed by Alejandro Abdelnur
Fix unmanaged AM in non-secure/secure setup post YARN-701

Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens will be used in both scenarios.
YARN-932. Major bug reported by Sandy Ryza and fixed by Karthik Kambatla
TestResourceLocalizationService.testLocalizationInit can fail on JDK7

It looks like this is occurring when testLocalizationInit doesn't run first. Somehow yarn.nodemanager.log-dirs is getting set by one of the other tests (to ${yarn.log.dir}/userlogs), but yarn.log.dir isn't being set.
YARN-927. Major task reported by Bikas Saha and fixed by Bikas Saha
Change ContainerRequest to not have more than 1 container count and remove StoredContainerRequest

The downside is having to use more than 1 container request when requesting more than 1 container at * priority. For most other use cases that have specific locations, we need to make multiple container requests anyway. This will also remove unnecessary duplication caused by StoredContainerRequest. It will make getMatchingRequest() always available and removeContainerRequest() easy to use.
YARN-926. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Jian He
ContainerManagerProtocol APIs should take in requests for multiple containers

AMs typically have to launch multiple containers on a node and the current single container APIs aren't helping. We should have all the APIs take in multiple requests and return multiple responses. The client libraries could expose both the single and multi-container requests.
YARN-922. Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)
Change FileSystemRMStateStore to use directories

Store each app and its attempts in the same directory so that removing application state is only one operation
YARN-919. Minor bug reported by Mayank Bansal and fixed by Mayank Bansal
Document setting default heap sizes in yarn env

Right now there are no defaults in the yarn env scripts for the resource manager and node manager, so a user who wants to override them has to consult the documentation, find the variables, and change the script. There is no straightforward way to change them in the script. This change just updates the variables with defaults.
YARN-918. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701

Once we use AMRMToken irrespective of Kerberos after YARN-701, we don't need ApplicationAttemptId in the RPC payload. This is an API change, so doing it as a blocker for 2.1.0-beta.
YARN-912. Major bug reported by Bikas Saha and fixed by Mayank Bansal
Create exceptions package in common/api for yarn and move client facing exceptions to them

Exceptions like InvalidResourceBlacklistRequestException, InvalidResourceRequestException, InvalidApplicationMasterRequestException etc are currently inside ResourceManager and not visible to clients.
YARN-909. Minor bug reported by Chuan Liu and fixed by Chuan Liu (nodemanager)
Disable TestLinuxContainerExecutorWithMocks on Windows

This unit test tests a Linux specific feature. We should skip this unit test on Windows. A similar unit test 'TestLinuxContainerExecutor' was already skipped when running on Windows.
YARN-897. Blocker bug reported by Djellel Eddine Difallah and fixed by Djellel Eddine Difallah (capacityscheduler)
CapacityScheduler wrongly sorted queues

The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On container assignment we correctly update the order, but we fail to do so on container completion. This corrupts the TreeSet structure, and under-capacity queues might starve for resources.
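The ordering problem described above can occur with any TreeSet whose sort key is mutable. The following is a minimal sketch, not the CapacityScheduler code: SchedQueue, BY_USED_CAPACITY, and updateUsedCapacity are hypothetical names. It shows the remove, update, re-insert pattern that keeps the set consistent when the key changes.

```java
import java.util.Comparator;
import java.util.TreeSet;

/** Illustrative sketch; SchedQueue is a hypothetical stand-in for a scheduler queue. */
class QueueOrderingSketch {

    static final class SchedQueue {
        final String name;
        float usedCapacity;

        SchedQueue(String name, float usedCapacity) {
            this.name = name;
            this.usedCapacity = usedCapacity;
        }
    }

    // Order by used capacity; ties are broken by name so distinct queues never compare equal.
    static final Comparator<SchedQueue> BY_USED_CAPACITY =
        Comparator.<SchedQueue>comparingDouble(q -> q.usedCapacity)
                  .thenComparing(q -> q.name);

    /** Re-inserts the queue so the set places it according to its new key. */
    static void updateUsedCapacity(TreeSet<SchedQueue> queues, SchedQueue queue, float newValue) {
        queues.remove(queue);          // must run while the old key is still in place
        queue.usedCapacity = newValue;
        queues.add(queue);
    }

    public static void main(String[] args) {
        TreeSet<SchedQueue> queues = new TreeSet<>(BY_USED_CAPACITY);
        SchedQueue a = new SchedQueue("a", 0.2f);
        SchedQueue b = new SchedQueue("b", 0.5f);
        queues.add(a);
        queues.add(b);
        updateUsedCapacity(queues, b, 0.1f);  // e.g. containers in b completed
        System.out.println("next queue to serve: " + queues.first().name);
    }
}
```

Changing usedCapacity in place, without the remove and add calls, leaves the element at its old position in the tree, so later lookups and removals can miss it.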
YARN-894. Minor bug reported by Chuan Liu and fixed by Chuan Liu (nodemanager)
NodeHealthScriptRunner timeout checking is inaccurate on Windows

In {{NodeHealthScriptRunner}}, we set the HealthChecker status based on the Shell execution results. Some statuses are based on the exception thrown during the Shell script execution. Currently, we catch a non-ExitCodeException from ShellCommandExecutor, and if Shell has the timeout status set at the same time, we also set the HealthChecker status to timeout. Shell has the following execution sequence:
1) In the main thread, schedule a delayed timer task that will kill the original process upon timeout.
2) In the main thread, open a buffered reader and feed in the process's standard input stream.
3) When the timeout happens, the timer task calls {{Process#destroy()}} to kill the main process.
On Linux, when the timeout happens and the process is killed, the buffered reader throws an IOException with the message "Stream closed" in the main thread. On Windows, no IOException is thrown. The reader only returns "-1", which indicates that the buffer is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this.
YARN-883. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Expose Fair Scheduler-specific queue metrics

When the Fair Scheduler is enabled, QueueMetrics should include fair share, minimum share, and maximum share.
YARN-877. Major sub-task reported by Junping Du and fixed by Junping Du (scheduler)
Allow for black-listing resources in FifoScheduler

YARN-750 already addressed blacklisting in the YARN API and the CapacityScheduler; this JIRA adds the implementation for FifoScheduler.
YARN-875. Major bug reported by Bikas Saha and fixed by Xuan Gong
Application can hang if AMRMClientAsync callback thread has exception

Currently that thread will die and never call back, so the app can hang. A possible solution is to catch Throwable in the callback and then call client.onError().
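The proposed solution can be sketched as a guard around each callback invocation. This is a simplified illustration under stated assumptions: CallbackHandler and deliver are hypothetical stand-ins, not the AMRMClientAsync API.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Illustrative sketch; the handler interface is a hypothetical stand-in. */
class CallbackGuardSketch {

    /** Hypothetical callback contract with an explicit error hook. */
    interface CallbackHandler {
        void onEvent(String event);
        void onError(Throwable error);
    }

    /** Delivers one event; any Throwable from the handler is routed to onError. */
    static void deliver(CallbackHandler handler, String event) {
        try {
            handler.onEvent(event);
        } catch (Throwable t) {
            // Without this catch the callback thread would die and never call back again.
            handler.onError(t);
        }
    }

    public static void main(String[] args) {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        CallbackHandler faulty = new CallbackHandler() {
            @Override
            public void onEvent(String event) {
                throw new IllegalStateException("failure while handling " + event);
            }

            @Override
            public void onError(Throwable error) {
                seen.set(error);
            }
        };
        deliver(faulty, "containers-allocated");
        System.out.println("reported: " + seen.get().getMessage());
    }
}
```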
YARN-874. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
Tracking YARN/MR test failures after HADOOP-9421 and YARN-827

HADOOP-9421 and YARN-827 broke some YARN/MR tests. Tracking those..
YARN-873. Major sub-task reported by Bikas Saha and fixed by Xuan Gong
YARNClient.getApplicationReport(unknownAppId) returns a null report

How can the client find out that app does not exist?
YARN-869. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
ResourceManagerAdministrationProtocol should neither be public(yet) nor in yarn.api

This is an admin-only API, and we don't know yet whether people can or should write new tools against it. I am going to move it to yarn.server.api and make it @Private.
YARN-866. Major test reported by Wei Yan and fixed by Wei Yan
Add test for class ResourceWeights

Add test case for the class ResourceWeights
YARN-865. Major improvement reported by Xuan Gong and fixed by Xuan Gong
RM webservices can't query based on application Types

The resource manager web service api to get the list of apps doesn't have a query parameter for appTypes.
YARN-861. Critical bug reported by Devaraj K and fixed by Vinod Kumar Vavilapalli (nodemanager)
TestContainerManager is failing

https://builds.apache.org/job/Hadoop-Yarn-trunk/246/ {code:xml} Running org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 19.249 sec <<< FAILURE! testContainerManagerInitialization(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager) Time elapsed: 286 sec <<< FAILURE! junit.framework.ComparisonFailure: expected:<[asf009.sp2.ygridcore.ne]t> but was:<[localhos]t> at junit.framework.Assert.assertEquals(Assert.java:85) {code}
YARN-854. Blocker bug reported by Ramya Sunil and fixed by Omkar Vinit Joshi
App submission fails on secure deploy

App submission on secure cluster fails with the following exception: {noformat} INFO mapreduce.Job: Job jobID failed with state FAILED due to: Application applicationID failed 2 times due to AM Container for appattemptID exited with exitCode: -1000 due to: App initialization failed (255) with output: main : command provided 0 main : user is qa_user javax.security.sasl.SaslException: DIGEST-MD5: digest response format violation. Mismatched response. [Caused by org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.] at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:65) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:348) Caused by: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response. 
at org.apache.hadoop.ipc.Client.call(Client.java:1298) at org.apache.hadoop.ipc.Client.call(Client.java:1250) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:204) at $Proxy7.heartbeat(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62) ... 3 more .Failing this attempt.. Failing the application. {noformat}
YARN-853. Major bug reported by Devaraj K and fixed by Devaraj K (capacityscheduler)
maximum-am-resource-percent doesn't work after refreshQueues command

If we update yarn.scheduler.capacity.maximum-am-resource-percent / yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent configuration and then do the refreshNodes, it uses the new config value to calculate Max Active Applications and Max Active Application Per User. If we add new node after issuing 'rmadmin -refreshQueues' command, it uses the old maximum-am-resource-percent config value to calculate Max Active Applications and Max Active Application Per User.
YARN-852. Minor bug reported by Chuan Liu and fixed by Chuan Liu
TestAggregatedLogFormat.testContainerLogsFileAccess fails on Windows

The YARN unit test case fails on Windows when comparing the expected message with the log message in the file. The expected message constructed in the test case has two problems: 1) It uses Path.separator to concatenate the path string. Path.separator is always a forward slash, which does not match the backslash used in the log message. 2) On Windows, the default file owner is the Administrators group if the file is created by an Administrators user. The test expects the user to be the current user.
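The first problem can be illustrated with plain JDK calls. This is a sketch under stated assumptions, not the test's actual fix: SeparatorSketch, joinLocal, and normalize are hypothetical helpers. It shows one way to make a path comparison independent of the platform separator.

```java
import java.io.File;

/** Hypothetical helpers for comparing paths across platforms. */
class SeparatorSketch {

    /** Joins with the platform separator, as a locally written log path would be. */
    static String joinLocal(String parent, String child) {
        return parent + File.separator + child;
    }

    /** Rewrites backslashes to forward slashes so both sides use one convention. */
    static String normalize(String path) {
        return path.replace('\\', '/');
    }

    public static void main(String[] args) {
        String local = joinLocal("logs", "container_01");
        // The normalized form is the same on Windows and on Unix-like systems.
        System.out.println(normalize(local));
    }
}
```

The alternative is to build the expected string with File.separator in the first place, which keeps the comparison exact on each platform.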
YARN-851. Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
Share NMTokens using NMTokenCache (api-based) instead of memory based approach which is used currently.

It is a follow up ticket for YARN-694. Changing the way NMTokens are shared.
YARN-850. Major sub-task reported by Jian He and fixed by Jian He
Rename getClusterAvailableResources to getAvailableResources in AMRMClients
YARN-848. Major bug reported by Hitesh Shah and fixed by Hitesh Shah
Nodemanager does not register with RM using the fully qualified hostname

If the hostname is misconfigured to not be fully qualified ( i.e. hostname returns foo and hostname -f returns foo.bar.xyz ), the NM ends up registering with the RM using only "foo". This can create problems if DNS cannot resolve the hostname properly. Furthermore, HDFS uses fully qualified hostnames which can end up affecting locality matches when allocating containers based on block locations.
YARN-846. Major sub-task reported by Jian He and fixed by Jian He
Move pb Impl from yarn-api to yarn-common
YARN-845. Major sub-task reported by Arpit Gupta and fixed by Mayank Bansal (resourcemanager)
RM crash with NPE on NODE_UPDATE

the following stack trace is generated in rm {code} n, service: 68.142.246.147:45454 }, ] resource=<memory:1536, vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:44544, vCores:29>usedCapacity=0.90625, absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=<memory:44544, vCores:29> cluster=<memory:49152, vCores:48> 2013-06-17 12:43:53,655 INFO capacity.ParentQueue (ParentQueue.java:completedContainer(696)) - completedContainer queue=root usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=<memory:44544, vCores:29> cluster=<memory:49152, vCores:48> 2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(832)) - Application appattempt_1371448527090_0844_000001 released container container_1371448527090_0844_01_000005 on node: host: hostXX:45454 #containers=4 available=2048 used=6144 with event: FINISHED 2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for application application_1371448527090_0844 on node: hostXX:45454 2013-06-17 12:43:53,656 INFO fica.FiCaSchedulerApp (FiCaSchedulerApp.java:unreserve(435)) - Application application_1371448527090_0844 unreserved on node host: hostXX:45454 #containers=4 available=2048 used=6144, currently has 4 at priority 20; currentReservation <memory:6144, vCores:4> 2013-06-17 12:43:53,656 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for deactivate... 
2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413) at java.lang.Thread.run(Thread.java:662) 2013-06-17 12:43:53,659 INFO resourcemanager.ResourceManager (ResourceManager.java:run(426)) - Exiting, bbye.. 
2013-06-17 12:43:53,665 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@hostXX:8088 2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted 2013-06-17 12:43:53,766 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system... 2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped. 2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete. 2013-06-17 12:43:53,768 WARN amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning. 2013-06-17 12:43:53,768 INFO ipc.Server (Server.java:stop(2167)) - Stopping server on 8033 2013-06-17 12:43:53,770 INFO ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8033 2013-06-17 12:43:53,770 INFO ipc.Server (Server.java:stop(2167)) - Stopping server on 8032 2013-06-17 12:43:53,770 INFO ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder 2013-06-17 12:43:53,771 INFO ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8032 2013-06-17 12:43:53,771 INFO ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder 2013-06-17 12:43:53,771 INFO ipc.Server (Server.java:stop(2167)) - Stopping server on 8030 2013-06-17 12:43:53,773 INFO ipc.Server (Server.java:run(686)) - Stopping IPC Server listener on 8030 2013-06-17 12:43:53,773 INFO ipc.Server (Server.java:stop(2167)) - Stopping server on 8031 2013-06-17 12:43:53,773 INFO ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder 2013-06-17 12:43:53,774 INFO ipc.Server 
(Server.java:run(686)) - Stopping IPC Server listener on 8031 2013-06-17 12:43:53,775 INFO ipc.Server (Server.java:run(828)) - Stopping IPC Server Responder {code}
YARN-841. Major sub-task reported by Siddharth Seth and fixed by Vinod Kumar Vavilapalli
Annotate and document AuxService APIs

For users writing their own AuxServices, these APIs should be annotated and need better documentation. Also, the classes may need to move out of the NodeManager.
YARN-840. Major sub-task reported by Jian He and fixed by Jian He
Move ProtoUtils to yarn.api.records.pb.impl
YARN-839. Minor bug reported by Chuan Liu and fixed by Chuan Liu
TestContainerLaunch.testContainerEnvVariables fails on Windows

The unit test case fails on Windows due to job id or container id was not printed out as part of the container script. Later, the test tries to read the pid from output of the file, and fails. Exception in trunk: {noformat} Running org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.903 sec <<< FAILURE! testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 1307 sec <<< ERROR! java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:278) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62) {noformat}
YARN-837. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
ClusterInfo.java doesn't seem to belong to org.apache.hadoop.yarn
YARN-834. Blocker sub-task reported by Arun C Murthy and fixed by Zhijie Shen
Review/fix annotations for yarn-client module and clearly differentiate *Async apis

Review/fix annotations for yarn-client module
YARN-833. Major bug reported by Zhijie Shen and fixed by Zhijie Shen
Move Graph and VisualizeStateMachine into yarn.state package

Graph and VisualizeStateMachine are used only by the state machine, so they should belong to the state package.
YARN-831. Blocker sub-task reported by Jian He and fixed by Jian He
Remove resource min from GetNewApplicationResponse
YARN-829. Major bug reported by Zhijie Shen and fixed by Zhijie Shen
Rename RMTokenSelector to be RMDelegationTokenSelector

This makes its name consistent with that of RMDelegationTokenIdentifier.
YARN-828. Major bug reported by Zhijie Shen and fixed by Zhijie Shen
Remove YarnVersionAnnotation

YarnVersionAnnotation is not used at all, and the version information can be accessed through YarnVersionInfo instead.
YARN-827. Critical sub-task reported by Bikas Saha and fixed by Jian He
Need to make Resource arithmetic methods accessible

org.apache.hadoop.yarn.server.resourcemanager.resource has stuff like Resources and Calculators that help compare/add resources etc. Without these users will be forced to replicate the logic, potentially incorrectly.
YARN-826. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Move Clock/SystemClock to util package

Clock/SystemClock should belong to util.
YARN-825. Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
Fix yarn-common javadoc annotations
YARN-824. Major sub-task reported by Jian He and fixed by Jian He
Add static factory to yarn client lib interface and change it to abstract class

Do this for AMRMClient, NMClient, and YarnClient, and annotate their implementations as private. The purpose is to avoid exposing the implementations.
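The pattern described above, an abstract class with a static factory that hides its implementation, can be sketched as follows. ExampleClient, createExampleClient, and getClusterName are hypothetical names chosen for illustration and are not the YARN client API.

```java
/** Hypothetical client type standing in for a public client-library class. */
abstract class ExampleClient {

    /** Static factory: callers never name the implementation class. */
    static ExampleClient createExampleClient() {
        return new ExampleClientImpl();
    }

    /** An example operation exposed through the abstract type. */
    abstract String getClusterName();

    /** Private implementation, free to change without affecting callers. */
    private static final class ExampleClientImpl extends ExampleClient {
        @Override
        String getClusterName() {
            return "example-cluster";
        }
    }

    public static void main(String[] args) {
        ExampleClient client = ExampleClient.createExampleClient();
        System.out.println(client.getClusterName());
    }
}
```

Because callers depend only on the abstract type and the factory, the implementation class can be renamed, replaced, or moved without an incompatible API change.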
YARN-823. Major sub-task reported by Jian He and fixed by Jian He
Move RMAdmin from yarn.client to yarn.client.cli and rename as RMAdminCLI
YARN-822. Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
Rename ApplicationToken to AMRMToken

API change. At present this token is used on the scheduler API AMRMProtocol. The current name is a little confusing because it suggests the application can use the token to talk to the whole YARN system (RM/NM), but that is not the case after YARN-694. The NM will have its own NMToken, so it is better to name this one AMRMToken.
YARN-821. Major sub-task reported by Jian He and fixed by Jian He
Rename FinishApplicationMasterRequest.setFinishApplicationStatus to setFinalApplicationStatus to be consistent with getter
YARN-820. Major sub-task reported by Bikas Saha and fixed by Mayank Bansal
NodeManager has invalid state transition after error in resource localization
YARN-814. Major sub-task reported by Hitesh Shah and fixed by Jian He
Difficult to diagnose a failed container launch when error due to invalid environment variable

The container's launch script sets up environment variables, symlinks etc. If there is any failure when setting up the basic context ( before the actual user's process is launched ), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure. To reproduce, set an env var where the value contains characters that throw syntax errors in bash.
YARN-812. Major bug reported by Ramya Sunil and fixed by Siddharth Seth
Enabling app summary logs causes 'FileNotFound' errors

RM app summary logs have been enabled as per the default config: {noformat} # # Yarn ResourceManager Application Summary Log # # Set the ResourceManager summary log filename yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log # Set the ResourceManager summary log level and appender yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY # Appender for ResourceManager Application Summary Log # Requires the following properties to be set # - hadoop.log.dir (Hadoop Log directory) # - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename) # - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender) log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger} log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file} log4j.appender.RMSUMMARY.MaxFileSize=256MB log4j.appender.RMSUMMARY.MaxBackupIndex=20 log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n {noformat} This however, throws errors while running commands as non-superuser: {noformat} -bash-4.1$ hadoop dfs -ls / DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it. log4j:ERROR setFile(null,true) call failed. 
java.io.FileNotFoundException: /var/log/hadoop/hadoopqa/rm-appsummary.log (No such file or directory) at java.io.FileOutputStream.openAppend(Native Method) at java.io.FileOutputStream.<init>(FileOutputStream.java:192) at java.io.FileOutputStream.<init>(FileOutputStream.java:116) at org.apache.log4j.FileAppender.setFile(FileAppender.java:294) at org.apache.log4j.RollingFileAppender.setFile(RollingFileAppender.java:207) at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165) at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307) at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172) at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104) at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842) at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768) at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672) at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516) at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580) at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526) at org.apache.log4j.LogManager.<clinit>(LogManager.java:127) at org.apache.log4j.Logger.getLogger(Logger.java:104) at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:289) at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:109) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1116) at 
org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:858) at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604) at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336) at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310) at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685) at org.apache.hadoop.fs.FsShell.<clinit>(FsShell.java:41) Found 1 items drwxr-xr-x - hadoop hadoop 0 2013-06-12 21:28 /user {noformat}
YARN-806. Major sub-task reported by Jian He and fixed by Jian He
Move ContainerExitStatus from yarn.api to yarn.api.records
YARN-805. Blocker sub-task reported by Jian He and fixed by Jian He
Fix yarn-api javadoc annotations
YARN-803. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (resourcemanager , scheduler)
factor out scheduler config validation from the ResourceManager to each scheduler implementation

Per discussion in YARN-789 we should factor out from the ResourceManager class the scheduler config validations.
YARN-799. Major bug reported by Chris Riccomini and fixed by Chris Riccomini (nodemanager)
CgroupsLCEResourcesHandler tries to write to cgroup.procs

The implementation of bq. ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java Tells the container-executor to write PIDs to cgroup.procs: {code} public String getResourcesOption(ContainerId containerId) { String containerName = containerId.toString(); StringBuilder sb = new StringBuilder("cgroups="); if (isCpuWeightEnabled()) { sb.append(pathForCgroup(CONTROLLER_CPU, containerName) + "/cgroup.procs"); sb.append(","); } if (sb.charAt(sb.length() - 1) == ',') { sb.deleteCharAt(sb.length() - 1); } return sb.toString(); } {code} Apparently, this file has not always been writeable: https://patchwork.kernel.org/patch/116146/ http://lkml.indiana.edu/hypermail/linux/kernel/1004.1/00536.html https://lists.linux-foundation.org/pipermail/containers/2009-July/019679.html The RHEL version of the Linux kernel that I'm using has a CGroup module that has a non-writeable cgroup.procs file. {quote} $ uname -a Linux criccomi-ld 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux {quote} As a result, when the container-executor tries to run, it fails with this error message: bq. fprintf(LOGFILE, "Failed to write pid %s (%d) to file %s - %s\n", This is because the executor is given a resource by the CgroupsLCEResourcesHandler that includes cgroup.procs, which is non-writeable: {quote} $ pwd /cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001 $ ls -l total 0 -r--r--r-- 1 criccomi eng 0 Jun 11 14:43 cgroup.procs -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_period_us -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.rt_runtime_us -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 cpu.shares -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 notify_on_release -rw-r--r-- 1 criccomi eng 0 Jun 11 14:43 tasks {quote} I patched CgroupsLCEResourcesHandler to use /tasks instead of /cgroup.procs, and this appears to have fixed the problem. 
I can think of several potential resolutions to this ticket: 1. Ignore the problem, and make people patch YARN when they hit this issue. 2. Write to /tasks instead of /cgroup.procs for everyone 3. Check permissioning on /cgroup.procs prior to writing to it, and fall back to /tasks. 4. Add a config to yarn-site that lets admins specify which file to write to. Thoughts?
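Resolution 3 above (check whether cgroup.procs is writable and fall back to tasks) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the committed patch: the class `CgroupPidFileSketch`, the method `pidFileFor`, and the injected writability check are hypothetical names introduced here.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Predicate;

// Hypothetical helper, not part of YARN.
class CgroupPidFileSketch {

    /**
     * Picks the file that PIDs should be written to: cgroup.procs when it is
     * writable, otherwise tasks. The writability check is passed in so the
     * decision can be exercised without a real cgroup mount.
     */
    public static Path pidFileFor(Path cgroupDir, Predicate<Path> isWritable) {
        Path procs = cgroupDir.resolve("cgroup.procs");
        return isWritable.test(procs) ? procs : cgroupDir.resolve("tasks");
    }

    public static void main(String[] args) {
        Path dir = Paths.get("/cgroup/cpu/hadoop-yarn/container_1370986842149_0001_01_000001");
        // On a real node the check would typically be Files::isWritable.
        System.out.println(pidFileFor(dir, Files::isWritable));
    }
}
```

Injecting the check keeps the fallback decision separate from the filesystem, which makes it straightforward to unit test on machines without cgroups.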
YARN-795. Major bug reported by Wei Yan and fixed by Wei Yan (scheduler)
Fair scheduler queue metrics should subtract allocated vCores from available vCores

The fair scheduler's queue metrics do not subtract allocated vCores from available vCores, so the reported available vCores value is incorrect. This happens because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores.
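The intended arithmetic can be illustrated with a small sketch. The class and field names below are hypothetical and do not mirror the real QueueMetrics code; the point is only that the subtraction has to be applied to every resource dimension, not just memory.

```java
// Hypothetical two-dimensional resource bookkeeping, not the YARN QueueMetrics class.
class QueueResourceSketch {

    /** Simple value holder: memory in MB plus virtual cores. */
    static final class Res {
        final int memoryMb;
        final int vCores;

        Res(int memoryMb, int vCores) {
            this.memoryMb = memoryMb;
            this.vCores = vCores;
        }
    }

    /** available = total - allocated, computed for both memory and vCores. */
    public static Res available(Res total, Res allocated) {
        return new Res(total.memoryMb - allocated.memoryMb,
                       total.vCores - allocated.vCores);
    }

    public static void main(String[] args) {
        Res left = available(new Res(8192, 8), new Res(2048, 3));
        System.out.println(left.memoryMb + " MB, " + left.vCores + " vCores available");
    }
}
```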
YARN-792. Major sub-task reported by Jian He and fixed by Jian He
Move NodeHealthStatus from yarn.api.record to yarn.server.api.record
YARN-791. Blocker sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api , resourcemanager)
Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
YARN-789. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (scheduler)
Enable zero capabilities resource requests in fair scheduler

Per discussion in YARN-689, reposting updated use case: 1. I have a set of services co-existing with a Yarn cluster. 2. These services run out of band from Yarn. They are not started as yarn containers and they don't use Yarn containers for processing. 3. These services use, dynamically, different amounts of CPU and memory based on their load. They manage their CPU and memory requirements independently. In other words, depending on their load, they may require more CPU but not memory or vice-versa. By using YARN as RM for these services I'm able to share and utilize the resources of the cluster appropriately and in a dynamic way. Yarn keeps tabs on all the resources. These services run an AM that reserves resources on their behalf. When this AM gets the requested resources, the services bump up their CPU/memory utilization out of band from Yarn. If the Yarn allocations are released/preempted, the services back off on their resource utilization. By doing this, Yarn and these services correctly share the cluster resources, with the Yarn RM being the only one that does the overall resource bookkeeping. The services' AM, so as not to break the lifecycle of containers, starts containers in the corresponding NMs. These container processes basically do a sleep forever (i.e. sleep 10000d). They use almost no CPU or memory (less than 1MB). Thus it is reasonable to assume their required CPU and memory utilization is NIL (more on hard enforcement later). Because of this almost NIL utilization of CPU and memory, it is possible to specify, when doing a request, zero as one of the dimensions (CPU or memory). The current limitation is that the increment is also the minimum. If we set the memory increment to 1MB, then when doing a pure CPU request we would have to specify 1MB of memory. That would work. However it would allow discretionary memory requests without a desired normalization (increments of 256, 512, etc). If we set the CPU increment to 1CPU, 
then when doing a pure memory request we would have to specify 1CPU. CPU amounts are much smaller than memory amounts, and because we don't have fractional CPUs, it would mean that all my pure memory requests will be wasting 1 CPU, thus reducing the overall utilization of the cluster. Finally, on hard enforcement. * For CPU. Hard enforcement can be done via a cgroup cpu controller. Using an absolute minimum of a few CPU shares (i.e. 10) in the LinuxContainerExecutor we ensure there are enough CPU cycles to run the sleep process. This absolute minimum would only kick in if zero is allowed; otherwise it will never kick in, as the shares for 1 CPU are 1024. * For Memory. Hard enforcement is currently done by the ProcfsBasedProcessTree.java; using an absolute minimum of 1 or 2 MBs would take care of zero memory resources. And again, this absolute minimum would only kick in if zero is allowed; otherwise it will never kick in, as the memory increment is several MBs if not 1GB.
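The separation of minimum and increment argued for above can be sketched as follows. This is a minimal illustration, assuming a request is first raised to the minimum and then rounded up to a multiple of the increment; the class `ZeroCapabilitySketch` and its method names are hypothetical and are not the scheduler's actual code. The CPU-share floor of 10 and the 1024 shares per CPU are the figures quoted in the description.

```java
// Hypothetical sketch of zero-capable request normalization; not YARN code.
class ZeroCapabilitySketch {

    /**
     * Raises the request to the minimum, rounds it up to the next multiple of
     * the increment, and caps it at the maximum. With a minimum of zero, a
     * zero request stays zero instead of being bumped to one increment.
     */
    public static int normalize(int requested, int minimum, int increment, int maximum) {
        int atLeastMin = Math.max(requested, minimum);
        int roundedUp = ((atLeastMin + increment - 1) / increment) * increment;
        return Math.min(roundedUp, maximum);
    }

    /**
     * CPU shares for hard enforcement: 1024 per CPU, with an absolute floor of
     * 10 that only matters for zero-CPU containers.
     */
    public static int cpuShares(int vcores) {
        return Math.max(10, vcores * 1024);
    }

    public static void main(String[] args) {
        // Pure CPU request: zero memory is allowed because the minimum is zero.
        System.out.println(normalize(0, 0, 512, 8192));
        // Non-zero requests are still normalized to the increment.
        System.out.println(normalize(600, 0, 512, 8192));
        System.out.println(cpuShares(0));
    }
}
```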
YARN-787. Blocker sub-task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (api)
Remove resource min from Yarn client API

Per discussions in YARN-689 and YARN-769 we should remove minimum from the API as this is a scheduler internal thing.
YARN-782. Critical improvement reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)
vcores-pcores ratio functions differently from vmem-pmem ratio in misleading way

The vcores-pcores ratio functions differently from the vmem-pmem ratio in the sense that the vcores-pcores ratio has an impact on allocations and the vmem-pmem ratio does not. If I double my vmem-pmem ratio, the only change that occurs is that my containers, after being scheduled, are less likely to be killed for using too much virtual memory. But if I double my vcore-pcore ratio, my nodes will appear to the ResourceManager to contain double the amount of CPU space, which will affect scheduling decisions. The lack of consistency will exacerbate the already difficult problem of resource configuration.
YARN-781. Major sub-task reported by Devaraj Das and fixed by Jian He
Expose LOGDIR that containers should use for logging

The LOGDIR is known. We should expose this to the container's environment.
YARN-777. Major sub-task reported by Jian He and fixed by Jian He
Remove unreferenced objects from proto
YARN-773. Major sub-task reported by Jian He and fixed by Jian He
Move YarnRuntimeException from package api.yarn to api.yarn.exceptions
YARN-767. Major bug reported by Jian He and fixed by Jian He
Initialize Application status metrics when QueueMetrics is initialized

Applications: ResourceManager.QueueMetrics.AppsSubmitted, ResourceManager.QueueMetrics.AppsRunning, ResourceManager.QueueMetrics.AppsPending, ResourceManager.QueueMetrics.AppsCompleted, ResourceManager.QueueMetrics.AppsKilled, ResourceManager.QueueMetrics.AppsFailed. For now these metrics are created only when they are needed; we want them to be visible as soon as QueueMetrics is initialized.
YARN-764. Major bug reported by nemon lou and fixed by nemon lou (resourcemanager)
blank Used Resources on Capacity Scheduler page

Even when there are jobs running, used resources is empty on the Capacity Scheduler page for a leaf queue. (I use google-chrome on Windows 7.) After changing resource.java's toString method by replacing "<>" with "{}", this bug gets fixed.
YARN-763. Major bug reported by Bikas Saha and fixed by Xuan Gong
AMRMClientAsync should stop heartbeating after receiving shutdown from RM
YARN-761. Major bug reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen
TestNMClientAsync fails sometimes

See https://builds.apache.org/job/PreCommit-YARN-Build/1101//testReport/. It passed on my machine though.
YARN-760. Major bug reported by Sandy Ryza and fixed by Niranjan Singh (nodemanager)
NodeManager throws AvroRuntimeException on failed start

NodeManager wraps exceptions that occur in its start method in AvroRuntimeExceptions, even though it doesn't use Avro anywhere else.
YARN-759. Major sub-task reported by Bikas Saha and fixed by Bikas Saha
Create Command enum in AllocateResponse

Use command enums for shutdown/resync instead of booleans.
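The idea can be sketched as follows. The enum and method names are hypothetical and chosen for illustration; they are not claimed to match the constants in the actual AllocateResponse API. A single nullable enum replaces two independent booleans, which rules out contradictory combinations such as "resync and shutdown".

```java
// Hypothetical illustration of replacing boolean flags with a command enum.
class AllocateCommandSketch {

    /** Hypothetical command set carried in an allocate response. */
    enum Command { RESYNC, SHUTDOWN }

    /** Returns what an AM might do for a given command; null means "no command". */
    public static String react(Command command) {
        if (command == null) {
            return "continue";
        }
        switch (command) {
            case RESYNC:
                return "resync with RM";
            case SHUTDOWN:
                return "stop heartbeating and exit";
            default:
                throw new IllegalArgumentException("Unknown command: " + command);
        }
    }

    public static void main(String[] args) {
        System.out.println(react(Command.RESYNC));
        System.out.println(react(null));
    }
}
```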
YARN-757. Blocker bug reported by Bikas Saha and fixed by Bikas Saha
TestRMRestart failing/stuck on trunk
YARN-756. Major sub-task reported by Jian He and fixed by Jian He
Move PreemptionContainer/PreemptionContract/PreemptionMessage/StrictPreemptionContract/PreemptionResourceRequest to api.records
YARN-755. Major sub-task reported by Bikas Saha and fixed by Bikas Saha
Rename AllocateResponse.reboot to AllocateResponse.resync

For work-preserving RM restart, the AMs will be resyncing instead of rebooting. Rebooting is an action that currently satisfies the resync requirement. Changing the name now so that it continues to make sense in the real resync case.
YARN-753. Major sub-task reported by Jian He and fixed by Jian He
Add individual factory method for api protocol records
YARN-752. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (api , applications)
In AMRMClient, automatically add corresponding rack requests for requested nodes

A ContainerRequest that includes node-level requests must also include matching rack-level requests for the racks that those nodes are on. When a node is present without its rack, it makes sense for the client to automatically add the node's rack.
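A minimal sketch of that behaviour, assuming the client has some way to resolve a node to its rack. The class `RackRequestSketch`, the method `withRacks`, and the resolver function are hypothetical names used only for illustration; they are not the AMRMClient implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: derive the rack-level requests implied by node-level requests.
class RackRequestSketch {

    /**
     * Returns the racks explicitly requested plus the rack of every requested
     * node, so that no node is present without its rack.
     */
    public static Set<String> withRacks(List<String> nodes, List<String> racks,
                                        Function<String, String> rackOf) {
        Set<String> result = new LinkedHashSet<>(racks);
        for (String node : nodes) {
            result.add(rackOf.apply(node));
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy resolver: node1 lives on /rack1, everything else on /rack2.
        Function<String, String> resolver = n -> n.equals("node1") ? "/rack1" : "/rack2";
        System.out.println(withRacks(Arrays.asList("node1", "node2"),
                                     Arrays.asList("/rack1"), resolver));
    }
}
```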
YARN-750. Major sub-task reported by Arun C Murthy and fixed by Arun C Murthy
Allow for black-listing resources in YARN API and Impl in CS

YARN-392 and YARN-398 enhance scheduler api to allow for white-lists of resources. This jira is a companion to allow for black-listing (in CS).
YARN-749. Major sub-task reported by Arun C Murthy and fixed by Arun C Murthy
Rename ResourceRequest (get,set)HostName to (get,set)ResourceName

We should rename ResourceRequest (get,set)HostName to (get,set)ResourceName since the name can be host, rack or *.
YARN-748. Major sub-task reported by Jian He and fixed by Jian He
Move BuilderUtils from yarn-common to yarn-server-common
YARN-746. Major sub-task reported by Steve Loughran and fixed by Steve Loughran
rename Service.register() and Service.unregister() to registerServiceListener() & unregisterServiceListener() respectively

Make it clear what you are registering on a {{Service}} by naming the methods {{registerServiceListener()}} & {{unregisterServiceListener()}} respectively. This only affects a couple of production classes; {{Service.register()}} is also used in some of the lifecycle tests of YARN-530. There are no tests of {{Service.unregister()}}, which is something that could be corrected.
YARN-742. Major bug reported by Kihwal Lee and fixed by Jason Lowe (nodemanager)
Log aggregation causes a lot of redundant setPermission calls

In one of our clusters, namenode RPC is spending 45% of its time on serving setPermission calls. Further investigation has revealed that most calls are redundantly made on /mapred/logs/<user>/logs. Also mkdirs calls are made before this.
YARN-739. Major sub-task reported by Siddharth Seth and fixed by Omkar Vinit Joshi
NM startContainer should validate the NodeId

The NM validates certain fields from the ContainerToken on a startContainer call. It should also validate the NodeId (which needs to be added to the ContainerToken).
YARN-737. Major sub-task reported by Jian He and fixed by Jian He
Some Exceptions no longer need to be wrapped by YarnException and can be directly thrown out after YARN-142
YARN-736. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Add a multi-resource fair sharing metric

Currently, at a regular interval, the fair scheduler computes a fair memory share for each queue and application inside it. This fair share is not used for scheduling decisions, but is displayed in the web UI, exposed as a metric, and used for preemption decisions. With DRF and multi-resource scheduling, assigning a memory share as the fair share metric to every queue no longer makes sense. It's not obvious what the replacement should be, but probably something like fractional fairness within a queue, or distance from an ideal cluster state.
YARN-735. Major sub-task reported by Jian He and fixed by Jian He
Make ApplicationAttemptID, ContainerID, NodeID immutable
YARN-733. Major bug reported by Zhijie Shen and fixed by Zhijie Shen
TestNMClient fails occasionally

The problem happens at: {code} // getContainerStatus can be called after stopContainer try { ContainerStatus status = nmClient.getContainerStatus( container.getId(), container.getNodeId(), container.getContainerToken()); assertEquals(container.getId(), status.getContainerId()); assertEquals(ContainerState.RUNNING, status.getState()); assertTrue("" + i, status.getDiagnostics().contains( "Container killed by the ApplicationMaster.")); assertEquals(-1000, status.getExitStatus()); } catch (YarnRemoteException e) { fail("Exception is not expected"); } {code} NMClientImpl#stopContainer returns, but the container may not have stopped yet, because ContainerManagerImpl implements stopContainer asynchronously. Therefore, the container's status is in transition: calling NMClientImpl#getContainerStatus immediately after stopContainer will get either the RUNNING status or the COMPLETE one. There is a similar problem with NMClientImpl#startContainer.
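One common way to make a test tolerate this kind of asynchronous transition is to poll until the expected state is observed instead of asserting immediately. The sketch below shows that general technique only; it is not claimed to be the change made for this issue, and `WaitForStateSketch` and `waitFor` are hypothetical names.

```java
import java.util.function.Supplier;

// Hypothetical polling helper for tests that observe an asynchronous state change.
class WaitForStateSketch {

    /**
     * Polls the supplier until it returns the expected value or the attempts
     * run out. Returns true if the expected value was seen.
     */
    public static <T> boolean waitFor(Supplier<T> current, T expected,
                                      int maxAttempts, long sleepMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            if (expected.equals(current.get())) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long deadline = System.currentTimeMillis() + 50;
        // Toy state source that flips from RUNNING to COMPLETE after about 50 ms.
        Supplier<String> state = () -> System.currentTimeMillis() < deadline ? "RUNNING" : "COMPLETE";
        System.out.println(waitFor(state, "COMPLETE", 100, 10));
    }
}
```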
YARN-731. Major sub-task reported by Siddharth Seth and fixed by Zhijie Shen
RPCUtil.unwrapAndThrowException should unwrap remote RuntimeExceptions

Will be required for YARN-662. Also, remote NPEs show up incorrectly for some unit tests.
YARN-727. Blocker sub-task reported by Siddharth Seth and fixed by Xuan Gong
ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter

Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type.
YARN-726. Critical bug reported by Siddharth Seth and fixed by Mayank Bansal
Queue, FinishTime fields broken on RM UI

The queue shows up as "Invalid Date". Finish Time shows up as a Long value.
YARN-724. Major sub-task reported by Jian He and fixed by Jian He
Move ProtoBase from api.records to api.records.impl.pb

Simply move ProtoBase to records.impl.pb
YARN-720. Major sub-task reported by Siddharth Seth and fixed by Zhijie Shen
container-log4j.properties should not refer to mapreduce properties

This refers to yarn.app.mapreduce.container.log.dir and yarn.app.mapreduce.container.log.filesize. This should either be moved into the MR codebase, or the parameters should be renamed.
YARN-719. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
Move RMIdentifier from Container to ContainerTokenIdentifier

This needs to be done for YARN-684 to happen.
YARN-717. Major sub-task reported by Jian He and fixed by Jian He
Copy BuilderUtil methods into token-related records

This is separated from YARN-711, as after changing yarn.api.token from interface to abstract class, e.g. ClientTokenPBImpl has to extend two classes: both TokenPBImpl and the ClientToken abstract class, which is not allowed in Java. We may remove the ClientToken/ContainerToken/DelegationToken interface and just use the common Token interface.
YARN-716. Major task reported by Siddharth Seth and fixed by Siddharth Seth
Make ApplicationID immutable
YARN-715. Major bug reported by Siddharth Seth and fixed by Vinod Kumar Vavilapalli
TestDistributedShell and TestUnmanagedAMLauncher are failing

Tests are timing out. Looks like this is related to YARN-617. {code} 2013-05-21 17:40:23,693 ERROR [IPC Server handler 0 on 54024] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:authorizeRequest(412)) - Unauthorized request to start container. Expected containerId: user Found: container_1369183214008_0001_01_000001 2013-05-21 17:40:23,694 ERROR [IPC Server handler 0 on 54024] security.UserGroupInformation (UserGroupInformation.java:doAs(1492)) - PriviledgedActionException as:user (auth:SIMPLE) cause:org.apache.hado Expected containerId: user Found: container_1369183214008_0001_01_000001 2013-05-21 17:40:23,695 INFO [IPC Server handler 0 on 54024] ipc.Server (Server.java:run(1864)) - IPC Server handler 0 on 54024, call org.apache.hadoop.yarn.api.ContainerManagerPB.startContainer from 10. Expected containerId: user Found: container_1369183214008_0001_01_000001 org.apache.hadoop.yarn.exceptions.YarnRemoteException: Unauthorized request to start container. Expected containerId: user Found: container_1369183214008_0001_01_000001 at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:43) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeRequest(ContainerManagerImpl.java:413) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainer(ContainerManagerImpl.java:440) at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagerPBServiceImpl.startContainer(ContainerManagerPBServiceImpl.java:72) at org.apache.hadoop.yarn.proto.ContainerManager$ContainerManagerService$2.callBlockingMethod(ContainerManager.java:83) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527) {code}
YARN-714. Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
AMRM protocol changes for sending NMToken list

NMToken will be sent to AM on allocate call if 1) AM doesn't already have NMToken for the underlying NM 2) Key rolled over on RM and AM gets new container on the same NM. On allocate call RM will send a consolidated list of all required NMTokens.
YARN-711. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Jian He
Copy BuilderUtil methods into individual records

BuilderUtils is one giant utils class which has all the factory methods needed for creating records. It is painful for users to figure out how to create records. We are better off having the factories in each record, that way users can easily create records. As a first step, we should just copy all the factory methods into individual classes, deprecate BuilderUtils and then slowly move all code off BuilderUtils.
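The pattern described above, a factory method living on the record itself, might look roughly like this. The record type below is a hypothetical stand-in written for illustration, not one of the real YARN record classes.

```java
// Hypothetical illustration of a per-record factory method.
class RecordFactorySketch {

    /** Hypothetical record type with its own static factory. */
    abstract static class ResourceRecord {
        abstract int getMemory();
        abstract int getVirtualCores();

        /** Callers create records here instead of searching a shared utils class. */
        static ResourceRecord newInstance(final int memory, final int vCores) {
            return new ResourceRecord() {
                @Override
                int getMemory() {
                    return memory;
                }

                @Override
                int getVirtualCores() {
                    return vCores;
                }
            };
        }
    }

    public static void main(String[] args) {
        ResourceRecord r = ResourceRecord.newInstance(1024, 2);
        System.out.println(r.getMemory() + " MB, " + r.getVirtualCores() + " vcores");
    }
}
```

Keeping the factory next to the type makes it discoverable through the record's own documentation and lets the concrete implementation stay hidden from callers.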
YARN-708. Major task reported by Siddharth Seth and fixed by Siddharth Seth
Move RecordFactory classes to hadoop-yarn-api, miscellaneous fixes to the interfaces

This is required for additional changes in YARN-528. Some of the interfaces could use some cleanup as well - they shouldn't be declaring YarnException (Runtime) in their signature.
YARN-706. Major bug reported by Zhijie Shen and fixed by Zhijie Shen
Race Condition in TestFSDownload

See the test failure in YARN-695 https://builds.apache.org/job/PreCommit-YARN-Build/957//testReport/org.apache.hadoop.yarn.util/TestFSDownload/testDownloadPatternJar/
YARN-701. Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
ApplicationTokens should be used irrespective of kerberos

- Single code path for secure and non-secure cases is useful for testing, coverage. - Having this in non-secure mode will help us avoid accidental bugs in AMs DDoS'ing and bringing down RM.
YARN-700. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
TestInfoBlock fails on Windows because of line ending mismatch

Exception: {noformat} Running org.apache.hadoop.yarn.webapp.view.TestInfoBlock Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.962 sec <<< FAILURE! testMultilineInfoBlock(org.apache.hadoop.yarn.webapp.view.TestInfoBlock) Time elapsed: 873 sec <<< FAILURE! java.lang.AssertionError: at org.junit.Assert.fail(Assert.java:91) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertTrue(Assert.java:54) at org.apache.hadoop.yarn.webapp.view.TestInfoBlock.testMultilineInfoBlock(TestInfoBlock.java:79) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28) {noformat}
YARN-695. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
masterContainer and status are in ApplicationReportProto but not in ApplicationReport

If masterContainer and status are no longer part of ApplicationReport, they should be removed from proto as well.
YARN-694. Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
Start using NMTokens to authenticate all communication with NM

AM uses the NMToken to authenticate all the AM-NM communication. NM will validate the NMToken in the following manner:
* If the NMToken is using the current or previous master key then the NMToken is valid. In this case it will update its cache with this key corresponding to appId.
* If the NMToken is using the master key which is present in NM's cache corresponding to AM's appId then it will be validated based on this.
* If the NMToken is invalid then NM will reject AM calls.
Modification for ContainerToken:
* At present RPC validates AM-NM communication based on ContainerToken. It will be replaced with NMToken. Also, from now onwards AM will use one NMToken per NM (replacing the earlier behavior of one ContainerToken per container per NM).
* startContainer in a secured environment is using the ContainerToken from UGI (YARN-617); however after this it will use it from the payload (Container).
* ContainerToken will exist and it will only be used to validate the AM's container start request.
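The three NMToken validation rules above can be sketched as follows. This is a simplified model under stated assumptions: master keys are represented by integer key ids, the per-application cache is a plain map, and the class and method names are hypothetical. It is not the NodeManager's actual secret manager.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of NMToken validation on the NM side.
class NMTokenValidationSketch {
    private int currentKeyId;
    private int previousKeyId;
    // Key id last accepted for each application (assumption: one entry per appId).
    private final Map<String, Integer> keyIdByApp = new HashMap<>();

    NMTokenValidationSketch(int currentKeyId, int previousKeyId) {
        this.currentKeyId = currentKeyId;
        this.previousKeyId = previousKeyId;
    }

    /** Models a master key roll-over: the current key becomes the previous one. */
    public void rollMasterKey(int newKeyId) {
        previousKeyId = currentKeyId;
        currentKeyId = newKeyId;
    }

    /** Returns true if a token created with tokenKeyId is acceptable for appId. */
    public boolean isValid(String appId, int tokenKeyId) {
        if (tokenKeyId == currentKeyId || tokenKeyId == previousKeyId) {
            // Rule 1: current or previous key is valid; remember it for this app.
            keyIdByApp.put(appId, tokenKeyId);
            return true;
        }
        // Rule 2: otherwise accept only the key cached for this app.
        Integer cached = keyIdByApp.get(appId);
        // Rule 3: anything else is rejected.
        return cached != null && cached == tokenKeyId;
    }
}
```

In this model an application that authenticated before two roll-overs keeps working with its cached key, while an application that was never seen with that key is rejected.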
YARN-693. Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
Sending NMToken to AM on allocate call

This is part of YARN-613. As per the updated design, AM will receive an NMToken per NM in the following scenarios:
* AM is receiving its first container on the underlying NM.
* AM is receiving a container on the underlying NM after either NM or RM rebooted.
** After RM reboot, as RM doesn't remember (persist) the information about keys issued per AM per NM, it will reissue tokens in case AM gets a new container on the underlying NM. However on the NM side, NM will still retain the older token until it receives a new token, to support long running jobs (in work preserving environment).
** After NM reboot, RM will delete the token information corresponding to that AM for all AMs.
* AM is receiving a container on the underlying NM after the NMToken master key is rolled over on the RM side.
In all these cases, if AM receives a new NMToken then it is supposed to store it for future NM communication until it receives a new one. AMRMClient should expose these NMTokens to the client.
YARN-692. Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
Creating NMToken master key on RM and sharing it with NM as a part of RM-NM heartbeat.

This is related to YARN-613. Here we will be implementing NMToken generation on the RM side and sharing it with NM during the RM-NM heartbeat. As a part of this JIRA the master key will only be made available to NM, but no validation will be done until AM-NM communication is fixed.
YARN-690. Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)
RM exits on token cancel/renew problems

The DelegationTokenRenewer thread is critical to the RM. When a non-IOException occurs, the thread calls System.exit to prevent the RM from running w/o the thread. It should be exiting only on non-RuntimeExceptions. The problem is especially bad in 23 because the yarn protobuf layer converts IOExceptions into UndeclaredThrowableExceptions (RuntimeException) which causes the renewer to abort the process. An UnknownHostException takes down the RM...
YARN-688. Major bug reported by Jian He and fixed by Jian He
Containers not cleaned up when NM received SHUTDOWN event from NodeStatusUpdater

Currently, both the SHUTDOWN event from NodeStatusUpdater and the CleanupContainers event are handled on the same dispatcher thread, so the CleanupContainers event will not be processed until the SHUTDOWN event is processed. See a similar problem in YARN-495. On normal NM shutdown, this is not a problem since the normal stop happens on the shutdownHook thread.
YARN-686. Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api)
Flatten NodeReport

The NodeReport returned by getClusterNodes or given to AMs in heartbeat responses includes both a NodeState (enum) and a NodeHealthStatus (object). As UNHEALTHY is already a NodeState, a separate NodeHealthStatus doesn't seem necessary. I propose eliminating NodeHealthStatus#getIsNodeHealthy and moving its two other methods, getHealthReport and getLastHealthReportTime, into NodeReport.
YARN-684. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
ContainerManager.startContainer needs to only have ContainerTokenIdentifier instead of the whole Container

The NM only needs the token, the whole Container is unnecessary.
YARN-663. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
Change ResourceTracker API and LocalizationProtocol API to throw YarnRemoteException and IOException
YARN-661. Major bug reported by Jason Lowe and fixed by Omkar Vinit Joshi (nodemanager)
NM fails to cleanup local directories for users

YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions.
YARN-660. Major sub-task reported by Bikas Saha and fixed by Bikas Saha
Improve AMRMClient with matching requests
YARN-655. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Fair scheduler metrics should subtract allocated memory from available memory

In the scheduler web UI, cluster metrics reports that the "Memory Total" goes up when an application is allocated resources.
YARN-654. Major bug reported by Bikas Saha and fixed by Xuan Gong
AMRMClient: Perform sanity checks for parameters of public methods
YARN-651. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
Change ContainerManagerPBClientImpl and RMAdminProtocolPBClientImpl to throw IOException and YarnRemoteException

YARN-632 and YARN-633 change the RMAdmin and ContainerManager APIs to throw YarnRemoteException and IOException. RMAdminProtocolPBClientImpl and ContainerManagerPBClientImpl should make the same changes.
YARN-648. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)
FS: Add documentation for pluggable policy

YARN-469 and YARN-482 make the scheduling policy in FS pluggable. Need to add documentation on how to use this.
YARN-646. Major bug reported by Dapeng Sun and fixed by Dapeng Sun (documentation)
Some issues in Fair Scheduler's document

Issues are found in the doc page for Fair Scheduler http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html: 1. In the section “Configuration”, there are two properties named “yarn.scheduler.fair.minimum-allocation-mb”; the second one should be “yarn.scheduler.fair.maximum-allocation-mb”. 2. In the section “Allocation file format”, the document says “The format contains three types of elements”, but it lists four types of elements following that.
YARN-645. Major bug reported by Jian He and fixed by Jian He
Move RMDelegationTokenSecretManager from yarn-server-common to yarn-server-resourcemanager

RMDelegationTokenSecretManager is specific to the resource manager and should not belong to server-common.
YARN-642. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (api , resourcemanager)
Fix up /nodes REST API to have 1 param and be consistent with the Java API

The code behind the /nodes RM REST API is unnecessarily muddled, logs the same misspelled INFO message repeatedly, and does not return unhealthy nodes, even when asked.
YARN-639. Major bug reported by Zhijie Shen and fixed by Zhijie Shen (applications/distributed-shell)
Make AM of Distributed Shell Use NMClient

YARN-422 adds NMClient. AM of Distributed Shell should use it instead of using ContainerManager directly.
YARN-638. Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)
Restore RMDelegationTokens after RM Restart

This was missed in YARN-581. After RM restart, RMDelegationTokens need to be added both in DelegationTokenRenewer (addressed in YARN-581) and in delegationTokenSecretManager.
YARN-637. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)
FS: maxAssign is not honored

maxAssign limits the number of containers that can be assigned in a single heartbeat. Currently, FS doesn't keep track of number of assigned containers to check this.
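The missing bookkeeping amounts to counting assignments within a single heartbeat and stopping at the limit. The sketch below illustrates that under stated assumptions: the class and method are hypothetical, and treating a non-positive maxAssign as "no limit" is an assumption of this example rather than something stated in the description.

```java
// Hypothetical sketch of honoring a per-heartbeat assignment limit.
class MaxAssignSketch {

    /**
     * Returns how many containers get assigned in one node heartbeat, given
     * the pending requests, the free capacity on the node (in containers),
     * and the maxAssign limit. A non-positive maxAssign is treated as
     * unlimited (assumption made for this example).
     */
    public static int assignOnHeartbeat(int pendingRequests, int freeSlots, int maxAssign) {
        int assigned = 0;
        while (assigned < pendingRequests && assigned < freeSlots) {
            if (maxAssign > 0 && assigned >= maxAssign) {
                break; // limit for this heartbeat reached
            }
            assigned++;
        }
        return assigned;
    }

    public static void main(String[] args) {
        System.out.println(assignOnHeartbeat(10, 8, 3));
    }
}
```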
YARN-635. Major sub-task reported by Xuan Gong and fixed by Siddharth Seth
Rename YarnRemoteException to YarnException
YARN-634. Major sub-task reported by Siddharth Seth and fixed by Siddharth Seth
Make YarnRemoteException not backed by PB and introduce a SerializedException

LocalizationProtocol sends an exception over the wire. This currently uses YarnRemoteException. Post YARN-627, this needs to be changed and a new serialized exception is required.
YARN-633. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
Change RMAdminProtocol api to throw IOException and YarnRemoteException
YARN-632. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
Change ContainerManager api to throw IOException and YarnRemoteException
YARN-631. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
Change ClientRMProtocol api to throw IOException and YarnRemoteException
YARN-630. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
Change AMRMProtocol api to throw IOException and YarnRemoteException
YARN-629. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
Make YarnRemoteException not be rooted at IOException

After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException
YARN-628. Major sub-task reported by Siddharth Seth and fixed by Siddharth Seth
Fix YarnException unwrapping

Unwrapping of YarnRemoteExceptions (currently in YarnRemoteExceptionPBImpl, RPCUtil post YARN-625) is broken, and often ends up throwing UndeclaredThrowableException. This needs to be fixed.
YARN-625. Major sub-task reported by Siddharth Seth and fixed by Siddharth Seth
Move unwrapAndThrowException from YarnRemoteExceptionPBImpl to RPCUtil
YARN-618. Major bug reported by Jian He and fixed by Jian He
Modify RM_INVALID_IDENTIFIER to a -ve number

RM_INVALID_IDENTIFIER set to 0 doesn't sound right, as many tests set the identifier to 0. A negative number is probably what we want.
YARN-617. Minor sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi
In unsecure mode, AM can fake resource requirements

Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over the unauthenticated RM-NM channel. At a minimum, this will avoid accidental bugs in AMs in unsecure mode.
YARN-615. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
ContainerLaunchContext.containerTokens should simply be called tokens

ContainerToken is the name of the specific token that AMs use to launch containers on NMs, so we should rename CLC.containerTokens to be simply tokens.
YARN-613. Major sub-task reported by Bikas Saha and fixed by Omkar Vinit Joshi
Create NM proxy per NM instead of per container

Currently a new NM proxy has to be created per container since the secure authentication is using a containertoken from the container.
YARN-610. Blocker sub-task reported by Siddharth Seth and fixed by Omkar Vinit Joshi
ClientToken (ClientToAMToken) should not be set in the environment

Similar to YARN-579, this can be set via ContainerTokens
YARN-605. Major bug reported by Hitesh Shah and fixed by Hitesh Shah
Failing unit test in TestNMWebServices when using git for source control

Failed tests: testNode(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 testNodeSlash(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 testNodeDefault(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 testNodeInfo(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 testNodeInfoSlash(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from 
fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 testNodeInfoDefault(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 testSingleNodesXML(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices): hadoopBuildVersion doesn't match, got: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789 expected: 3.0.0-SNAPSHOT from fddcdcfb3cfe7dcc4f77c1ac953dd2cc0a890c62 (HEAD, origin/trunk, origin/HEAD, mrx-track) by Hitesh source checksum f89f5c9b9c9d44cf3be5c2686f2d789
YARN-600. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
Hook up cgroups CPU settings to the number of virtual cores allocated

YARN-3 introduced CPU isolation and monitoring through cgroups. YARN-2 introduced CPU scheduling in the capacity scheduler, and YARN-326 will introduce it in the fair scheduler. The number of virtual cores allocated to a container should be used to weight the number of cgroups CPU shares given to it.
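One simple way to do this weighting is to scale a per-core base value linearly with the virtual core count. The sketch below assumes a base of 1024 shares per virtual core (1024 is the kernel's default cpu.shares value in cgroups v1); the class and method names are hypothetical, and the patch itself may weight differently.

```java
/** Hypothetical sketch: derive a cgroups cpu.shares value from virtual cores. */
final class CpuSharesSketch {
    /** Shares granted per virtual core; an assumption of this sketch. */
    static final int SHARES_PER_VCORE = 1024;

    /** Returns the cpu.shares value for a container with the given virtual cores. */
    static int cpuShares(int virtualCores) {
        if (virtualCores < 1) {
            throw new IllegalArgumentException("virtualCores must be >= 1");
        }
        // multiplyExact fails loudly instead of silently overflowing
        return Math.multiplyExact(SHARES_PER_VCORE, virtualCores);
    }

    public static void main(String[] args) {
        // A 4-vcore container gets four times the CPU weight of a 1-vcore one.
        System.out.println(cpuShares(1) + " vs " + cpuShares(4));
    }
}
```

Because cpu.shares is a relative weight, only the ratio between containers matters when the CPU is contended.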
YARN-599. Major bug reported by Zhijie Shen and fixed by Zhijie Shen
Refactoring submitApplication in ClientRMService and RMAppManager

Currently, ClientRMService#submitApplication calls RMAppManager#handle, and consequently calls RMAppManager#submitApplication directly, though the code looks like it is scheduling an APP_SUBMIT event. In addition, the validation code before creating an RMApp instance is not well organized. Ideally, the dynamic validation, which depends on the RM's configuration, should be put in RMAppManager#submitApplication. RMAppManager#submitApplication is called by ClientRMService#submitApplication and RMAppManager#recover. Since the configuration may be changed after the RM restarts, the validation needs to be done again even in recovery mode. Therefore, resource request validation, which is based on min/max resource limits, should be moved from ClientRMService#submitApplication to RMAppManager#submitApplication. On the other hand, the static validation, which is independent of the RM's configuration, should be put in ClientRMService#submitApplication, because it only needs to be done once during the first submission. Furthermore, the try-catch flow in RMAppManager#submitApplication has a flaw. RMAppManager#submitApplication is not synchronized. If two application submissions with the same application ID enter the function, and one progresses to the completion of RMApp instantiation while the other progresses to the completion of putting the RMApp instance into rmContext, the slower submission will cause an exception due to the duplicate application ID. However, with the current code flow, that exception will cause the RMApp instance already in rmContext (which belongs to the faster submission) to be rejected.
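The duplicate-ID race described above can be avoided by making "check for an existing entry" and "insert" a single atomic step, so that the slower submission fails without disturbing the entry of the faster one. The following is a minimal sketch under that assumption; `SubmitOnceSketch` and its methods are hypothetical and stand in for the map of applications held by rmContext.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch: register an application at most once per ID. */
final class SubmitOnceSketch {
    private final ConcurrentMap<String, Object> apps = new ConcurrentHashMap<>();

    /**
     * Atomically registers app under appId. Returns false when the ID is
     * already taken, leaving the previously registered instance untouched.
     */
    boolean submit(String appId, Object app) {
        return apps.putIfAbsent(appId, app) == null;
    }

    Object get(String appId) {
        return apps.get(appId);
    }

    public static void main(String[] args) {
        SubmitOnceSketch rm = new SubmitOnceSketch();
        System.out.println(rm.submit("app_1", "first"));  // accepted
        System.out.println(rm.submit("app_1", "second")); // duplicate, rejected
        System.out.println(rm.get("app_1"));              // still the first instance
    }
}
```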
YARN-598. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
Add virtual cores to queue metrics

QueueMetrics includes allocatedMB, availableMB, pendingMB, reservedMB. It should have equivalents for CPU.
YARN-597. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
TestFSDownload fails on Windows because of dependencies on tar/gzip/jar tools

{{testDownloadArchive}}, {{testDownloadPatternJar}} and {{testDownloadArchiveZip}} fail with a similar Shell ExitCodeException: {code} testDownloadArchiveZip(org.apache.hadoop.yarn.util.TestFSDownload) Time elapsed: 480 sec <<< ERROR! org.apache.hadoop.util.Shell$ExitCodeException: bash: line 0: cd: /D:/svn/t/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/TestFSDownload: No such file or directory gzip: 1: No such file or directory at org.apache.hadoop.util.Shell.runCommand(Shell.java:377) at org.apache.hadoop.util.Shell.run(Shell.java:292) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:497) at org.apache.hadoop.yarn.util.TestFSDownload.createZipFile(TestFSDownload.java:225) at org.apache.hadoop.yarn.util.TestFSDownload.testDownloadArchiveZip(TestFSDownload.java:503) {code}
YARN-595. Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Refactor fair scheduler to use common Resources

resourcemanager.fair and resourcemanager.resources have two copies of basically the same code for operations on Resource objects
YARN-594. Major bug reported by Jian He and fixed by Jian He
Update test and add comments in YARN-534

This jira is simply to add some comments in the patch YARN-534 and update the test case
YARN-593. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager)
container launch on Windows does not correctly populate classpath with new process's environment variables and localized resources

On Windows, we must bundle the classpath of a launched container in an intermediate jar with a manifest. Currently, this logic incorrectly uses the nodemanager process's environment variables for substitution. Instead, it needs to use the new environment for the launched process. Also, the bundled classpath is missing some localized resources for directories, due to a quirk in the way {{File#toURI}} decides whether or not to append a trailing '/'.
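The {{File#toURI}} quirk mentioned above is that a trailing '/' is appended only when the path can be determined to be a directory, which requires the directory to exist at the time of the call. A class loader treats a URL without the trailing '/' as a file rather than a directory, so a directory that is created later can be missed. The sketch below shows the behavior and one possible workaround; the helper name `directoryUri` is hypothetical and is not taken from the patch.

```java
import java.io.File;

/** Hypothetical sketch of the File#toURI trailing-slash quirk. */
final class ToUriTrailingSlashSketch {
    /**
     * Returns a URI string for a path known to be a directory, always
     * ending in '/', whether or not the directory exists yet.
     */
    static String directoryUri(File dir) {
        String uri = dir.toURI().toString();
        return uri.endsWith("/") ? uri : uri + "/";
    }

    public static void main(String[] args) {
        File existing = new File(System.getProperty("java.io.tmpdir"));
        File missing = new File(existing, "not-created-yet-" + System.nanoTime());
        System.out.println(existing.toURI());      // existing directory: ends with '/'
        System.out.println(missing.toURI());       // not yet created: no trailing '/'
        System.out.println(directoryUri(missing)); // normalized: ends with '/'
    }
}
```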
YARN-591. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
RM recovery related records do not belong to the API

We need to move ApplicationStateData and ApplicationAttemptStateData out into the resourcemanager module. They are not part of the public API.
YARN-590. Major improvement reported by Vinod Kumar Vavilapalli and fixed by Mayank Bansal
Add an optional message to RegisterNodeManagerResponse as to why NM is being asked to resync or shutdown

We should log such a message in the NM itself. This helps in debugging issues on the NM directly, instead of distributed debugging between RM and NM, when such an action is received from the RM.
YARN-586. Trivial bug reported by Zhijie Shen and fixed by Zhijie Shen
Typo in ApplicationSubmissionContext#setApplicationId

The parameter should be applicationId instead of appplicationId
YARN-585. Major bug reported by Zhijie Shen and fixed by Zhijie Shen
TestFairScheduler#testNotAllowSubmitApplication is broken due to YARN-514

TestFairScheduler#testNotAllowSubmitApplication is broken due to YARN-514. See the discussions in YARN-514.
YARN-583. Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
Application cache files should be localized under local-dir/usercache/userid/appcache/appid/filecache

Currently application cache files are localized under local-dir/usercache/userid/appcache/appid/. However, they should be localized under the filecache subdirectory.
YARN-582. Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)
Restore appToken and clientToken for app attempt after RM restart

These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter.
YARN-581. Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)
Test and verify that app delegation tokens are added to tokenRenewer after RM restart

The code already saves the delegation tokens in AppSubmissionContext. Upon restart the AppSubmissionContext is used to submit the application again and so restores the delegation tokens. This jira tracks testing and verifying this functionality in a secure setup.
YARN-579. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
Make ApplicationToken part of Container's token list to help RM-restart

Container is already persisted for helping RM restart. Instead of explicitly setting ApplicationToken in AM's env, if we change it to be in Container, we can avoid env and can also help restart.
YARN-578. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi (nodemanager)
NodeManager should use SecureIOUtils for serving and aggregating logs

Log servlets for serving logs and the ShuffleService for serving intermediate outputs both should use SecureIOUtils for avoiding symlink attacks.
YARN-577. Major sub-task reported by Hitesh Shah and fixed by Hitesh Shah
ApplicationReport does not provide progress value of application

An application sends its progress % to the RM via AllocateRequest. This should be able to be retrieved by a client via the ApplicationReport.
YARN-576. Major bug reported by Hitesh Shah and fixed by Kenji Kikushima
RM should not allow registrations from NMs that do not satisfy minimum scheduler allocations

If the minimum resource allocation configured for the RM scheduler is 1 GB, the RM should drop all NMs that register with a total capacity of less than 1 GB.
YARN-571. Major sub-task reported by Hitesh Shah and fixed by Omkar Vinit Joshi
User should not be part of ContainerLaunchContext

Today, a user is expected to set the user name in the CLC when either submitting an application or launching a container from the AM. This does not make sense, as the user has already been identified by the RM as part of the RPC layer. A solution would be to move the user information into either the Container object or directly into the ContainerToken, which can then be used by the NM to launch the container. This user information would be set into the container by the RM.
YARN-569. Major sub-task reported by Carlo Curino and fixed by Carlo Curino (capacityscheduler)
CapacityScheduler: support for preemption (using a capacity monitor)

There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct for capacity balance. For this reason, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a "Capacity Monitor", which can optionally be run as a separate service (much like the NMLivelinessMonitor). The capacity monitor (similarly to equivalent functionality in the fair scheduler) runs at intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine whether preemption is needed and how best to "edit" the current schedule to improve capacity, and generates events that produce four possible actions:
# Container de-reservations
# Resource-based preemptions
# Container-based preemptions
# Container killing
The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. Note that due to the "lag" in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and should not try to tightly and consistently micromanage container allocations.
------------- Preemption policy (ProportionalCapacityPreemptionPolicy): -------------
Preemption policies are pluggable by design. In the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows:
# it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*)
# if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**)
# it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round)
# it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order)
# it removes reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exist
# (if not enough) it issues preemptions for containers from the same applications (reverse chronological order, last assigned container first), again for as long as necessary or until no containers except the AM container are left
# (if not enough) it moves on to unreserve and preempt from the next application
# containers that have been asked to preempt are tracked across executions. If a container is among those to be preempted for more than a certain time, the container is moved into the list of containers to be forcibly killed.
Notes:
(*) at the moment, in order to avoid double-counting of the requests, we only look at the "ANY" part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not any.
(**) The ideal balanced state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among the queues that want some as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue and the function runs to a fixed point.
Tunables of the ProportionalCapacityPreemptionPolicy:
# observe-only mode (i.e., log the actions it would take, but behave as read-only)
# how frequently to run the policy
# how long to wait between preemption and kill of a container
# which fraction of the containers I would like to obtain should I preempt (has to do with the natural rate at which containers are returned)
# deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect balance by some small % we ignore it)
# overall amount of preemption we can afford for each run of the policy (in terms of total cluster capacity)
In our current experiments this set of tunables seems to be a good start for shaping the preemption action properly. More sophisticated preemption policies could take into account the different types of applications running, job priorities, the cost of preemption, and the integral of capacity imbalance. This is very much a control-theory kind of problem, and some of the lessons on designing and tuning controllers are likely to apply.
Generality: The monitor-based scheduler edit and the preemption mechanisms we introduced here are designed to be more general than enforcing capacity/fairness. In fact, we are considering other monitors that leverage the same idea of "schedule edits" to target different global properties (e.g., allocating enough resources to guarantee deadlines for important jobs, data-locality optimizations, IO-balancing among nodes, etc.).
Note that by default the preemption policy we describe is disabled in the patch. Depends on YARN-45 and YARN-567, is related to YARN-568.
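The "ideal balanced state" from note (**) can be sketched as an iterative weighted fair share. This is a simplified illustration and not the policy's actual code: the class and method names are hypothetical, `demand` is assumed to mean current usage plus pending requests, and the sketch additionally assumes that a queue is never given more than its demand, so the unused part of an idle queue's guarantee counts as spare capacity.

```java
/** Hypothetical sketch of the "ideal balanced state" computation. */
final class IdealBalanceSketch {
    private static final double EPS = 1e-9;

    /**
     * Gives each queue min(guaranteed, demand), then repeatedly splits the
     * spare capacity among still-unsatisfied queues in proportion to their
     * guaranteed capacity, until nothing changes (a fixed point).
     */
    static double[] idealAssignment(double total, double[] guaranteed, double[] demand) {
        int n = guaranteed.length;
        double[] ideal = new double[n];
        double unassigned = total;
        for (int i = 0; i < n; i++) {
            ideal[i] = Math.min(guaranteed[i], demand[i]);
            unassigned -= ideal[i];
        }
        while (unassigned > EPS) {
            // Sum the weights of the queues that still want more.
            double weightSum = 0.0;
            for (int i = 0; i < n; i++) {
                if (demand[i] - ideal[i] > EPS) {
                    weightSum += guaranteed[i];
                }
            }
            if (weightSum <= EPS) {
                break; // nobody with a positive weight wants more
            }
            double spare = unassigned;
            for (int i = 0; i < n; i++) {
                double unmet = demand[i] - ideal[i];
                if (unmet > EPS) {
                    // Offer a weighted share; the queue accepts at most its unmet demand.
                    double accepted = Math.min(spare * guaranteed[i] / weightSum, unmet);
                    ideal[i] += accepted;
                    unassigned -= accepted;
                }
            }
        }
        return ideal;
    }

    public static void main(String[] args) {
        // Two queues guaranteed 50 each; the first wants 100, the second only 20.
        double[] ideal = idealAssignment(100, new double[] {50, 50}, new double[] {100, 20});
        System.out.println(ideal[0] + " " + ideal[1]);
    }
}
```

Each round either hands out all remaining spare capacity or fully satisfies at least one queue, so the loop terminates after at most one round per queue. The difference between a queue's current usage and its ideal value is what the later steps of the policy would try to reclaim.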
YARN-568. Major improvement reported by Carlo Curino and fixed by Carlo Curino (scheduler)
FairScheduler: support for work-preserving preemption

In the attached patch, we modified the FairScheduler to substitute its preemption-by-killing with a work-preserving version of preemption (followed by killing if the AMs do not respond quickly enough). This should allow preemption checks to run more often while killing less often (proper tuning to be investigated). Depends on YARN-567 and YARN-45, is related to YARN-569.
YARN-567. Major sub-task reported by Carlo Curino and fixed by Carlo Curino (resourcemanager)
RM changes to support preemption for FairScheduler and CapacityScheduler

A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties. FairScheduler and CapacityScheduler take opposite stances on how to achieve this. The FairScheduler leverages task-killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy but wasting useful work. The CapacityScheduler is typically tuned to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work, but risking keeping the cluster underutilized or having jobs wait to obtain their rightful capacity. By introducing the notion of work-preserving preemption we can remove this tradeoff. This requires a protocol for preemption (YARN-45), ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), and a scheduler that can issue preemption requests (discussed in separate JIRAs YARN-568 and YARN-569). The changes we track with this JIRA are common to FairScheduler and CapacityScheduler, and are mostly propagation of preemption decisions through the ApplicationMasterService.
YARN-563. Major sub-task reported by Thomas Weise and fixed by Mayank Bansal
Add application type to ApplicationReport

This field is needed to distinguish different types of applications (app master implementations). For example, we may run applications of type XYZ in a cluster alongside MR and would like to filter applications by type.
YARN-562. Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)
NM should reject containers allocated by previous RM

It's possible that after RM shutdown, before the AM goes down, the AM still calls startContainer on the NM with containers allocated by the previous RM. When the RM comes back, the NM doesn't know whether this container launch request comes from the previous RM or the current RM. We should reject containers allocated by the previous RM.
YARN-561. Major sub-task reported by Hitesh Shah and fixed by Xuan Gong
Nodemanager should set some key information into the environment of every container that it launches.

Information such as containerId, nodemanager hostname, nodemanager port is not set in the environment when any container is launched. For an AM, the RM does all of this for it but for a container launched by an application, all of the above need to be set by the ApplicationMaster. At the minimum, container id would be a useful piece of information. If the container wishes to talk to its local NM, the nodemanager related information would also come in handy.
YARN-557. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (applications)
TestUnmanagedAMLauncher fails on Windows

{{TestUnmanagedAMLauncher}} fails on Windows due to attempting to run a Unix-specific command in distributed shell and use of a Unix-specific environment variable to determine username for the {{ContainerLaunchContext}}.
YARN-553. Minor sub-task reported by Harsh J and fixed by Karthik Kambatla (client)
Have YarnClient generate a directly usable ApplicationSubmissionContext

Right now, we're doing multiple steps to create a relevant ApplicationSubmissionContext for a pre-received GetNewApplicationResponse. {code} GetNewApplicationResponse newApp = yarnClient.getNewApplication(); ApplicationId appId = newApp.getApplicationId(); ApplicationSubmissionContext appContext = Records.newRecord(ApplicationSubmissionContext.class); appContext.setApplicationId(appId); {code} A simplified way may be to have the GetNewApplicationResponse itself provide a helper method that builds a usable ApplicationSubmissionContext for us. Something like: {code} GetNewApplicationResponse newApp = yarnClient.getNewApplication(); ApplicationSubmissionContext appContext = newApp.generateApplicationSubmissionContext(); {code} [The above method can also take an arg for the container launch spec, or perhaps pre-load defaults like min-resource, etc. in the returned object, aside from just associating the application ID automatically.]
YARN-549. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
YarnClient.submitApplication should wait for application to be accepted by the RM

Currently, when submitting an application, storeApplication will be called for recovery. However, it is a blocking API, and is likely to block concurrent application submissions. Therefore, it is good to make application submission asynchronous, and postpone storeApplication. YarnClient needs to change to wait for the whole operation to complete so that clients can be notified after the application is really submitted. YarnClient needs to wait for application to reach SUBMITTED state or beyond.
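The client-side wait described above amounts to polling the application state until it leaves the initial state. The sketch below is a minimal illustration: the `AppState` enum is a simplified stand-in for the real application states, and `fetchState`, `pollMillis` and `maxPolls` are hypothetical parameters rather than YarnClient API.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: block until an application reaches SUBMITTED or beyond. */
final class WaitForSubmittedSketch {
    /** Simplified stand-in for the application states; NEW is the only "not yet submitted" one. */
    enum AppState { NEW, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    /**
     * Polls fetchState until it returns a state other than NEW and returns
     * that state. Throws IllegalStateException after maxPolls attempts or
     * if the waiting thread is interrupted.
     */
    static AppState waitUntilSubmitted(Supplier<AppState> fetchState, long pollMillis, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            AppState state = fetchState.get();
            if (state != AppState.NEW) {
                return state; // SUBMITTED or any later state
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                throw new IllegalStateException("interrupted while waiting for submission", e);
            }
        }
        throw new IllegalStateException("application not submitted after " + maxPolls + " polls");
    }
}
```

Returning the observed state (rather than a boolean) lets the caller notice when the application went straight to a terminal state such as FAILED.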
YARN-548. Major sub-task reported by Vadim Bondarev and fixed by Vadim Bondarev
Add tests for YarnUncaughtExceptionHandler
YARN-547. Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
Race condition in Public / Private Localizer may result into resource getting downloaded again

Public Localizer: At present, when multiple containers try to request a localized resource:
* If the resource is not present, it is first created and resource localization starts (LocalizedResource is in the DOWNLOADING state).
* If multiple ResourceRequestEvents arrive while it is in this state, ResourceLocalizationEvents are sent for all of them.
Most of the time this does not result in a duplicate resource download, but there is a race condition. Inside ResourceLocalization (for public download), all the requests are added to a local attempts map. When a new request comes in, it is first checked against this map before a new download starts for it. For the current download, the request will be in the map. If a request for the same resource comes in, it will be rejected (i.e., the resource is already being downloaded). However, once the current download completes, the request is removed from this local map. If a LocalizerRequestEvent comes in after this removal, the resource will be downloaded again because it is no longer present in the local map.
PrivateLocalizer: Here a different but similar race condition is present.
* Inside the findNextResource method call, each LocalizerRunner tries to grab a lock on LocalizerResource. If the lock is not acquired, it keeps trying until the resource state changes to LOCALIZED. This lock is released by the LocalizerRunner when the download completes.
* If another ContainerLocalizer tries to grab the lock on a resource before the LocalizedResource state changes to LOCALIZED, the resource will be downloaded again.
In both places the root cause is that all the threads try to acquire the lock on the resource, but the current state of the LocalizedResource is not taken into consideration.
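One way to take the resource state into account is to make "check the state" and "claim the download" a single atomic transition, so that only the caller that moves the resource out of its initial state starts a download. The sketch below illustrates that idea with a compare-and-set; the class, its state names and its methods are hypothetical and do not reproduce the actual LocalizedResource state machine.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: state-aware claim of a resource download. */
final class LocalizedResourceStateSketch {
    enum State { INIT, DOWNLOADING, LOCALIZED }

    private final AtomicReference<State> state = new AtomicReference<>(State.INIT);

    /**
     * Returns true only for the single caller that moves the resource from
     * INIT to DOWNLOADING. Callers arriving while it is DOWNLOADING or
     * after it is LOCALIZED get false and must not download again.
     */
    boolean tryStartDownload() {
        return state.compareAndSet(State.INIT, State.DOWNLOADING);
    }

    /** Marks a running download as complete; returns false if none was running. */
    boolean markLocalized() {
        return state.compareAndSet(State.DOWNLOADING, State.LOCALIZED);
    }

    State current() {
        return state.get();
    }
}
```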
YARN-542. Major bug reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen
Change the default global AM max-attempts value to be not one

Today, the global AM max-attempts is set to 1, which is a bad choice. AM max-attempts accounts for both AM-level failures and container crashes due to localization issues, lost nodes, etc. To account for AM crashes due to problems that are not caused by user code, mainly lost nodes, we want to give AMs some retries. I propose we change it to at least two. We can change it to 4 to match other retry configs.
YARN-541. Blocker bug reported by Krishna Kishore Bonagiri and fixed by Bikas Saha (resourcemanager)
getAllocatedContainers() is not returning all the allocated containers

I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore
YARN-539. Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
LocalizedResources are leaked in memory in case resource localization fails

If resource localization fails, the resource remains in memory and is either 1) cleaned up the next time cache cleanup runs and there is a space crunch (if sufficient space is available in the cache, it will remain in memory), or 2) reused if a LocalizationRequest comes again for the same resource. I think that when resource localization fails, the event should be sent to LocalResourceTracker, which will then remove the resource from its cache.
YARN-538. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza
RM address DNS lookup can cause unnecessary slowness on every JHS page load

When I run the job history server locally, every page load takes tens of seconds. I profiled the process and discovered that all the extra time was spent inside YarnConfiguration#getRMWebAppURL, trying to resolve 0.0.0.0 to a hostname. When I changed my yarn.resourcemanager.address to localhost, the page load times decreased drastically. There's no reason that we need to perform this resolution on every page load.
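One way to avoid repeating an expensive lookup on every page load is to resolve once and reuse the result. The sketch below shows a generic memoizing wrapper; `CachedLookupSketch` is hypothetical, and whether the actual fix caches the value or avoids the lookup in some other way is not stated here.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: run an expensive lookup once and reuse the result. */
final class CachedLookupSketch<T> implements Supplier<T> {
    private final Supplier<T> resolver;
    private volatile T cached;

    CachedLookupSketch(Supplier<T> resolver) {
        this.resolver = resolver;
    }

    /**
     * Returns the cached value, invoking the resolver only on first use.
     * A null result is not cached, so the resolver would be retried.
     */
    @Override
    public T get() {
        T value = cached;
        if (value == null) {
            synchronized (this) {
                value = cached;
                if (value == null) {
                    value = resolver.get(); // e.g. a slow hostname resolution
                    cached = value;
                }
            }
        }
        return value;
    }
}
```

A caller would build the wrapper once at startup and call `get()` on each page load, paying the resolution cost a single time.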
YARN-536. Major sub-task reported by Xuan Gong and fixed by Xuan Gong
Remove ContainerStatus, ContainerState from Container api interface as they will not be called by the container object

Remove containerstate, containerStatus from container interface. They will not be called by container object
YARN-534. Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)
AM max attempts is not checked when RM restart and try to recover attempts

Currently, AM max attempts is only checked when the current attempt fails, to decide whether to create a new attempt. If the RM restarts before the final attempt fails, it does not clean the state store, so when the RM comes back it will retry the attempt again.
YARN-532. Major bug reported by Siddharth Seth and fixed by Siddharth Seth
RMAdminProtocolPBClientImpl should implement Closeable

Required for RPC.stopProxy to work. Already done in most of the other protocols. (MAPREDUCE-5117 addressing the one other protocol missing this)
YARN-530. Major sub-task reported by Steve Loughran and fixed by Steve Loughran
Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

# Extend the YARN {{Service}} interface as discussed in YARN-117
# Implement the changes in {{AbstractService}} and {{FilterService}}.
# Migrate all services in yarn-common to the more robust service model, test.
YARN-525. Major improvement reported by Thomas Graves and fixed by Thomas Graves (capacityscheduler)
make CS node-locality-delay refreshable

The config yarn.scheduler.capacity.node-locality-delay doesn't change when you change the value in capacity-scheduler.xml and then run yarn rmadmin -refreshQueues.
YARN-523. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Jian He
Container localization failures aren't reported from NM to RM

This is mainly a pain on crashing AMs, but once we fix this, containers also can benefit - same fix for both.
YARN-521. Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api)
Augment AM - RM client module to be able to request containers only at specific locations

When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality
YARN-518. Major improvement reported by Dapeng Sun and fixed by Sandy Ryza (documentation)
Fair Scheduler's document link could be added to the hadoop 2.x main doc page

Currently the doc page for Fair Scheduler looks good and it's here: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html. It would be better to add the document link to the YARN section in the Hadoop 2.x main doc page, so that users can easily find the doc and try out Fair Scheduler as they can Capacity Scheduler.
YARN-515. Blocker bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans
Node Manager not getting the master key

On branch-2 the latest version I see the following on a secure cluster. {noformat} 2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Security enabled - updating secret keys now 2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as RM:PORT with total resource of <memory:12288, vCores:16> 2013-03-28 19:21:06,244 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is started. 2013-03-28 19:21:06,245 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started. 2013-03-28 19:21:07,257 [Node Status Updater] ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught exception in status-updater java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:121) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:407) {noformat} The Null pointer exception just keeps repeating and all of the nodes end up being lost. It looks like it never gets the secret key when it registers.
YARN-514. Major sub-task reported by Bikas Saha and fixed by Zhijie Shen (resourcemanager)
Delayed store operations should not result in RM unavailability for app submission

Currently, app submission is the only store operation performed synchronously because the app must be stored before the request returns with success. This makes the RM susceptible to blocking all client threads on slow store operations, resulting in RM being perceived as unavailable by clients.
YARN-513. Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)
Create common proxy client for communicating with RM

When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.
YARN-512. Minor bug reported by Jason Lowe and fixed by Maysam Yabandeh (nodemanager)
Log aggregation root directory check is more expensive than it needs to be

The log aggregation root directory check first does an {{exists}} call followed by a {{getFileStatus}} call. That effectively stats the file twice. It should just use {{getFileStatus}} and catch {{FileNotFoundException}} to handle the non-existent case. In addition we may consider caching the presence of the directory rather than checking it each time a node aggregates logs for an application.
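A minimal sketch of the single-stat pattern described above. It is written against java.nio.file so that it runs standalone, not against Hadoop's FileSystem API: Files.readAttributes stands in for getFileStatus and NoSuchFileException for FileNotFoundException. The class, enum, and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

final class RootDirCheck {
    /** Possible outcomes of inspecting the log root directory. */
    enum State { MISSING, DIRECTORY, NOT_A_DIRECTORY }

    /**
     * Stats the path exactly once. A missing path surfaces as
     * NoSuchFileException, so no separate exists() call is needed.
     */
    static State check(Path root) {
        try {
            BasicFileAttributes attrs =
                Files.readAttributes(root, BasicFileAttributes.class);
            return attrs.isDirectory() ? State.DIRECTORY : State.NOT_A_DIRECTORY;
        } catch (NoSuchFileException e) {
            return State.MISSING;
        } catch (IOException e) {
            // Any other I/O failure is rethrown unchecked in this sketch.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println(check(tmp));
        System.out.println(check(tmp.resolve("absent-" + System.nanoTime())));
    }
}
```

Caching the result, as the description also suggests, would be a separate step layered on top of this check.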
YARN-507. Minor bug reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)
Add interface visibility and stability annotations to FS interfaces/classes

Many of FS classes/interfaces are missing annotations on visibility and stability.
YARN-506. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute

Move to common utils described in HADOOP-9413 that work well cross-platform.
YARN-500. Major bug reported by Nishan Shetty and fixed by Kenji Kikushima (resourcemanager)
ResourceManager webapp is using next port if configured port is already in use
YARN-496. Minor bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Fair scheduler configs are refreshed inconsistently in reinitialize

When FairScheduler#reinitialize is called, some of the scheduler-wide configs are refreshed and others aren't. They should all be refreshed. Ones that are refreshed: userAsDefaultQueue, nodeLocalityThreshold, rackLocalityThreshold, preemptionEnabled Ones that aren't: minimumAllocation, maximumAllocation, assignMultiple, maxAssign
YARN-495. Major bug reported by Jian He and fixed by Jian He
Change NM behavior of reboot to resync

When a reboot command is sent from RM, the node manager doesn't clean up the containers while it's stopping.
YARN-493. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager)
NodeManager job control logic flaws on Windows

Both product and test code contain some platform-specific assumptions, such as availability of bash for executing a command in a container and signals to check existence of a process and terminate it.
YARN-491. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager)
TestContainerLogsPage fails on Windows

{{TestContainerLogsPage}} contains some code for initializing a log directory that doesn't work correctly on Windows.
YARN-490. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (applications/distributed-shell)
TestDistributedShell fails on Windows

There are a few platform-specific assumptions in distributed shell (both main code and test code) that prevent it from working correctly on Windows.
YARN-488. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager)
TestContainerManagerSecurity fails on Windows

These tests are failing to launch containers correctly when running on Windows.
YARN-487. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager)
TestDiskFailures fails on Windows due to path mishandling

{{TestDiskFailures#testDirFailuresOnStartup}} fails due to insertion of an extra leading '/' on the path within {{LocalDirsHandlerService}} when running on Windows. The test assertions also fail to account for the fact that {{Path}} normalizes '\' to '/'.
YARN-486. Major sub-task reported by Bikas Saha and fixed by Xuan Gong
Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land

Currently, id, resource request etc. need to be copied over from Container to ContainerLaunchContext. This can be brittle. Also it leads to duplication of information (such as Resource from CLC and Resource from Container and Container.tokens). Sending Container directly to startContainer solves these problems. It also makes CLC clean by only having stuff in it that is set by the client/AM.
YARN-485. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla
TestProcfsProcessTree#testProcessTree() doesn't wait long enough for the process to die

TestProcfsProcessTree#testProcessTree fails occasionally with the following stack trace {noformat} Stack Trace: junit.framework.AssertionFailedError: expected:<false> but was:<true> at org.apache.hadoop.util.TestProcfsBasedProcessTree.testProcessTree(TestProcfsBasedProcessTree.java) {noformat} kill -9 is executed asynchronously; the signal is delivered when the process comes out of the kernel (sys call). Checking whether the process has died immediately afterwards can fail at times.
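One common way to make such a check robust is to poll with a timeout instead of asserting immediately after the kill. The following is a generic sketch of that idea, not the fix that was committed; the WaitUtil class and its waitFor method are hypothetical names.

```java
import java.util.function.BooleanSupplier;

final class WaitUtil {
    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true if the condition became true in time, false otherwise.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        final long start = System.currentTimeMillis();
        // Stand-in for "the process is gone": becomes true after about 100 ms.
        boolean died = waitFor(() -> System.currentTimeMillis() - start >= 100, 2000, 10);
        System.out.println("condition met: " + died);
    }
}
```

A test would then assert on the return value of waitFor rather than on a single immediate check.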
YARN-482. Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)
FS: Extend SchedulingMode to intermediate queues

FS allows setting {{SchedulingMode}} for leaf queues. Extending this to non-leaf queues allows using different kinds of fairness: e.g., root can have three child queues - fair-mem, drf-cpu-mem, drf-cpu-disk-mem taking different number of resources into account. In turn, this allows users to decide on the scheduling latency vs sophistication of the scheduling mode.
YARN-481. Major bug reported by Chris Riccomini and fixed by Chris Riccomini (client)
Add AM Host and RPC Port to ApplicationCLI Status Output

Hey Guys, I noticed that the ApplicationCLI is just randomly not printing some of the values in the ApplicationReport. I've added the getHost and getRpcPort. These are useful for me, since I want to make an RPC call to the AM (not the tracker call). Thanks! Chris
YARN-479. Major bug reported by Hitesh Shah and fixed by Jian He
NM retry behavior for connection to RM should be similar for lost heartbeats

Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow.
YARN-476. Minor bug reported by Jason Lowe and fixed by Sandy Ryza
ProcfsBasedProcessTree info message confuses users

ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such as the following: {noformat} 2013-03-13 12:41:51,957 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may have finished in the interim. 2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may have finished in the interim. 2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may have finished in the interim. {noformat} As described in MAPREDUCE-4570, this is something that naturally occurs in the process of monitoring processes via procfs. It's uninteresting at best and can confuse users who think it's a reason their job isn't running as expected when it appears in their logs. We should either make this DEBUG or remove it entirely.
YARN-475. Major sub-task reported by Hitesh Shah and fixed by Hitesh Shah
Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment

AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id.
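As an illustration only, the derivation can be sketched with plain string handling. The container id layout used here (container_<clusterTimestamp>_<appId>_<attemptId>_<containerId>) and the appattempt_ output layout are assumptions based on ids such as container_1358385982671_1304_01_000001 seen elsewhere in these notes; the class and method names are hypothetical. A real AM would normally parse the id into YARN's ContainerId record and call getApplicationAttemptId() instead of splitting strings.

```java
final class AttemptIdFromContainerId {
    /**
     * Derives an application attempt id string from a container id string.
     * Assumes the five-part layout described in the lead-in.
     */
    static String toAttemptId(String containerId) {
        String[] parts = containerId.split("_");
        if (parts.length != 5 || !parts[0].equals("container")) {
            throw new IllegalArgumentException("Unexpected container id: " + containerId);
        }
        int attempt = Integer.parseInt(parts[3]);
        // Assumed attempt id layout: appattempt_<clusterTimestamp>_<appId>_<6-digit attempt>
        return String.format("appattempt_%s_%s_%06d", parts[1], parts[2], attempt);
    }

    public static void main(String[] args) {
        System.out.println(toAttemptId("container_1358385982671_1304_01_000001"));
    }
}
```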
YARN-474. Major bug reported by Hitesh Shah and fixed by Zhijie Shen (capacityscheduler)
CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed

Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues. The 2 applications not yet in running state do not get launched even though limits are increased.
YARN-469. Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)
Make scheduling mode in FS pluggable

Currently, scheduling mode in FS is limited to Fair and FIFO. The code typically has an if condition at multiple places to determine the correct course of action. Making the scheduling mode pluggable helps in simplifying this process, particularly as we add new modes (DRF in this case).
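The idea of replacing scattered if conditions with a pluggable mode can be sketched as follows. This is not the Fair Scheduler's actual code: the App class, the SchedulingMode interface, and the two orderings are simplified, hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class PluggableModes {
    /** Hypothetical stand-in for a schedulable application. */
    static final class App {
        final String name;
        final long startTime;
        final int usage;

        App(String name, long startTime, int usage) {
            this.name = name;
            this.startTime = startTime;
            this.usage = usage;
        }
    }

    /** Each mode supplies its own ordering, so callers need no if/else on the mode. */
    interface SchedulingMode {
        Comparator<App> comparator();
    }

    /** FIFO: earliest start time first. */
    static final SchedulingMode FIFO =
        () -> Comparator.comparingLong((App a) -> a.startTime);

    /** Simplified "fair": lowest current usage first. */
    static final SchedulingMode FAIR =
        () -> Comparator.comparingInt((App a) -> a.usage);

    /** Returns app names in the order the given mode would serve them. */
    static List<String> order(List<App> apps, SchedulingMode mode) {
        List<App> sorted = new ArrayList<>(apps);
        sorted.sort(mode.comparator());
        List<String> names = new ArrayList<>();
        for (App a : sorted) {
            names.add(a.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<App> apps = Arrays.asList(new App("a", 1L, 5), new App("b", 2L, 1));
        System.out.println(order(apps, FIFO));
        System.out.println(order(apps, FAIR));
    }
}
```

Adding a new mode such as DRF then means adding one more SchedulingMode implementation, with no change to order().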
YARN-468. Major sub-task reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov
coverage fix for org.apache.hadoop.yarn.server.webproxy.amfilter

coverage fix org.apache.hadoop.yarn.server.webproxy.amfilter patch YARN-468-trunk.patch for trunk, branch-2, branch-0.23
YARN-467. Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi (nodemanager)
Jobs fail during resource localization when public distributed-cache hits unix directory limits

If multiple jobs use the distributed cache with small files, the directory limit is reached before the cache size limit, and the NM fails to create any more directories in the file cache (PUBLIC). The jobs start failing with the exception below. java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) We need a mechanism by which we can create a directory hierarchy and limit the number of files per directory.
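One way to picture such a mechanism is to spread sequentially numbered cache entries over a directory tree so that no directory receives more than a fixed number of numbered children. The sketch below is an illustration under that assumption and not the NodeManager's actual implementation; the class name, method name, and the perDir parameter are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class HierarchicalCachePath {
    /**
     * Maps a non-negative entry number to a relative path whose components
     * are the number's digits in base perDir. Every directory therefore has
     * at most perDir numbered subdirectories.
     */
    static String relativePath(long id, int perDir) {
        if (id < 0 || perDir < 2) {
            throw new IllegalArgumentException("id must be >= 0 and perDir >= 2");
        }
        Deque<String> parts = new ArrayDeque<>();
        long remaining = id;
        do {
            // Least significant digit becomes the deepest path component.
            parts.addFirst(Long.toString(remaining % perDir));
            remaining /= perDir;
        } while (remaining > 0);
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        System.out.println(relativePath(7, 10));
        System.out.println(relativePath(123, 10));
    }
}
```

With a flat layout, entry 123 would be the 124th child of a single directory; here it lands under 1/2/3, and the top level never holds more than perDir numbered entries.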
YARN-460. Blocker bug reported by Thomas Graves and fixed by Thomas Graves (capacityscheduler)
CS user left in list of active users for the queue even when application finished

We have seen a user get left in the queue's list of active users even though the application was removed. This can cause everyone else in the queue to get fewer resources if using the minimum user limit percent config.
YARN-458. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager , resourcemanager)
YARN daemon addresses must be placed in many different configs

The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address. A new user trying to configure a cluster needs to know the names of all four of these configs. The same issue exists for nodemanagers. It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports for the other ones would kick in.
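The intended effect can be sketched as a small expansion from one hostname to the four address settings. The property names come from the description above; the port numbers (8032, 8030, 8031, 8033) are assumed to be the stock defaults and should be checked against yarn-default.xml. The class and method names are hypothetical, and this is not how YARN itself resolves the values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RmAddressDefaults {
    /**
     * Expands a single ResourceManager hostname into the four address
     * settings, each with an assumed default port.
     */
    static Map<String, String> fromHostname(String rmHost) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("yarn.resourcemanager.address", rmHost + ":8032");
        conf.put("yarn.resourcemanager.scheduler.address", rmHost + ":8030");
        conf.put("yarn.resourcemanager.resource-tracker.address", rmHost + ":8031");
        conf.put("yarn.resourcemanager.admin.address", rmHost + ":8033");
        return conf;
    }

    public static void main(String[] args) {
        // rm.example.com is a placeholder hostname.
        for (Map.Entry<String, String> e : fromHostname("rm.example.com").entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```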
YARN-450. Major sub-task reported by Bikas Saha and fixed by Zhijie Shen
Define value for * in the scheduling protocol

The ResourceRequest has a string field to specify node/rack locations. For the cross-rack/cluster-wide location (i.e. when there is no locality constraint) the "*" string is used everywhere. However, it's not defined anywhere and each piece of code either defines a local constant or uses the string literal. Defining "*" in the protocol and removing other local references from the code base will be good.
YARN-448. Major bug reported by Kihwal Lee and fixed by Kihwal Lee (nodemanager)
Remove unnecessary hflush from log aggregation

AggregatedLogFormat#writeVersion() calls hflush() after writing the version. Calling hflush does not seem to be necessary. It can add a lot of load to hdfs in a big busy cluster.
YARN-447. Minor improvement reported by nemon lou and fixed by nemon lou (scheduler)
applicationComparator improvement for CS

Currently the compare code is: return a1.getApplicationId().getId() - a2.getApplicationId().getId(); It will be replaced with: return a1.getApplicationId().compareTo(a2.getApplicationId()); This brings some benefits: 1. It leaves the applicationId comparison logic to the ApplicationId class; 2. In a future HA mode the cluster timestamp may change, and the ApplicationId class already takes care of this condition.
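The difference between the two comparisons can be shown with a small stand-in class. AppId here is hypothetical, and ordering by cluster timestamp first and then by id is an assumption about what a timestamp-aware compareTo would do, not a statement about ApplicationId's actual implementation.

```java
final class AppIdCompare {
    /** Hypothetical stand-in for an application id. */
    static final class AppId implements Comparable<AppId> {
        final long clusterTimestamp;
        final int id;

        AppId(long clusterTimestamp, int id) {
            this.clusterTimestamp = clusterTimestamp;
            this.id = id;
        }

        /** Orders by cluster timestamp first, then by sequence id. */
        @Override
        public int compareTo(AppId other) {
            int byCluster = Long.compare(clusterTimestamp, other.clusterTimestamp);
            return byCluster != 0 ? byCluster : Integer.compare(id, other.id);
        }
    }

    /** The old style: looks only at the sequence id. */
    static int bySubtraction(AppId a, AppId b) {
        return a.id - b.id;
    }

    public static void main(String[] args) {
        AppId older = new AppId(100L, 5); // submitted before a restart
        AppId newer = new AppId(200L, 1); // submitted after it, ids start over
        System.out.println("subtraction says older first: " + (bySubtraction(older, newer) < 0));
        System.out.println("compareTo says older first:  " + (older.compareTo(newer) < 0));
    }
}
```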
YARN-444. Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api , applications/distributed-shell)
Move special container exit codes from YarnConfiguration to API

YarnConfiguration currently contains the special container exit codes INVALID_CONTAINER_EXIT_STATUS = -1000, ABORTED_CONTAINER_EXIT_STATUS = -100, and DISKS_FAILED = -101. These are not really related to configuration, and YarnConfiguration should not become a place to put miscellaneous constants. Per discussion on YARN-417, appmaster writers need to be able to provide special handling for them, so it might make sense to move these to their own user-facing class.
YARN-441. Major sub-task reported by Siddharth Seth and fixed by Xuan Gong
Clean up unused collection methods in various APIs

There are a number of unused methods like getAskCount() and getAsk(index) in AllocateRequest and other interfaces. These should be removed. In YARN, they were found in AllocateRequest and StartContainerResponse. MR will have its own set.
YARN-440. Major sub-task reported by Siddharth Seth and fixed by Xuan Gong
Flatten RegisterNodeManagerResponse

RegisterNodeManagerResponse has another wrapper RegistrationResponse under it, which can be removed.
YARN-439. Major sub-task reported by Siddharth Seth and fixed by Xuan Gong
Flatten NodeHeartbeatResponse

NodeHeartbeatResponse has another wrapper HeartbeatResponse under it, which can be removed.
YARN-426. Critical bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)
Failure to download a public resource on a node prevents further downloads of the resource from that node

If the NM encounters an error while downloading a public resource, it fails to empty the list of request events corresponding to the resource request in {{attempts}}. If the same public resource is subsequently requested on that node, {{PublicLocalizer.addResource}} will skip the download since it will mistakenly believe a download of that resource is already in progress. At that point any container that requests the public resource will just hang in the {{LOCALIZING}} state.
YARN-422. Major sub-task reported by Bikas Saha and fixed by Zhijie Shen
Add NM client library

Create a simple wrapper over the ContainerManager protocol to hide the details of the protocol implementation.
YARN-417. Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api , applications)
Create AMRMClient wrapper that provides asynchronous callbacks

Writing AMs would be easier for some if they did not have to handle heartbeating to the RM on their own.
YARN-412. Minor bug reported by Roger Hoover and fixed by Roger Hoover (scheduler)
FifoScheduler incorrectly checking for node locality

In the FifoScheduler, the assignNodeLocalContainers method is checking if the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. This seems to be incorrect as it should be checking hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234). In the CapacityScheduler, it's done using hostname. See LeafQueue.assignNodeLocalContainers, line 1129: application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler because even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it's rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node local.
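The mismatch can be reproduced with a plain map keyed by hostname. LocalityLookup and its methods are hypothetical stand-ins for the scheduler's request bookkeeping; only the two key formats come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

final class LocalityLookup {
    /** Outstanding container counts keyed by hostname, e.g. host1.foo.com. */
    private final Map<String, Integer> requestsByHost = new HashMap<>();

    void addRequest(String hostName, int containers) {
        requestsByHost.put(hostName, containers);
    }

    /** Returns the outstanding container count for the key, or null if none. */
    Integer getResourceRequest(String key) {
        return requestsByHost.get(key);
    }

    public static void main(String[] args) {
        LocalityLookup app = new LocalityLookup();
        app.addRequest("host1.foo.com", 2);

        // Looking up by node address (hostname:port) finds nothing.
        System.out.println(app.getResourceRequest("host1.foo.com:1234"));
        // Looking up by hostname finds the node-local request.
        System.out.println(app.getResourceRequest("host1.foo.com"));
    }
}
```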
YARN-410. Major bug reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi
New lines in diagnostics for a failed app on the per-application page make it hard to read

We need to fix the following issues on YARN web-UI: - Remove the "Note" column from the application list. When a failure happens, this "Note" spoils the table layout. - When the Application is still not running, the Tracking UI should be titled "UNASSIGNED"; for some reason it is titled "ApplicationMaster" but (correctly) links to "#". - The per-application page has all the RM related information like version, start-time etc. Must be some accidental change by one of the patches. - The diagnostics for a failed app on the per-application page don't retain new lines and wrap them around, which makes them hard to read.
YARN-406. Minor improvement reported by Hitesh Shah and fixed by Hitesh Shah
TestRackResolver fails when local network resolves "host1" to a valid host
YARN-400. Critical bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
RM can return null application resource usage report leading to NPE in client

RMAppImpl.createAndGetApplicationReport can return a report with a null resource usage report if full access to the app is allowed but the application has no current attempt. This leads to NPEs in client code that assumes an app report will always have at least an empty resource usage report.
YARN-398. Major sub-task reported by Arun C Murthy and fixed by Arun C Murthy
Enhance CS to allow for white-list of resources

Allow white-list and black-list of resources in scheduler api.
YARN-396. Major sub-task reported by Bikas Saha and fixed by Zhijie Shen
Rationalize AllocateResponse in RM scheduler API

AllocateResponse contains an AMResponse and cluster node count. AMResponse has more data. Unless there is a good reason for this object structure, there should be either AMResponse or AllocateResponse.
YARN-392. Major sub-task reported by Bikas Saha and fixed by Sandy Ryza (resourcemanager)
Make it possible to specify hard locality constraints in resource requests

Currently it's not possible to specify scheduling requests for specific nodes and nowhere else. The RM automatically relaxes locality to rack and * and assigns non-specified machines to the app.
YARN-391. Trivial improvement reported by Steve Loughran and fixed by Steve Loughran (nodemanager)
detabify LCEResourcesHandler classes

the LCEResourcesHandler classes from YARN-3 have had some tab chars that have snuck into the source tree. fix this before that code starts getting branched off and it's too late
YARN-390. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (client)
ApplicationCLI and NodeCLI use hard-coded platform-specific line separator, which causes test failures on Windows

{{ApplicationCLI}}, {{NodeCLI}}, and the corresponding test {{TestYarnCLI}} all use a hard-coded '\n' as the line separator. This causes test failures on Windows.
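A platform-neutral alternative is to let PrintWriter.println emit the separator and to build expected test strings with System.lineSeparator(). The sketch below illustrates that approach under the assumption that the output is assembled in memory; NodeReportFormat and its render method are hypothetical and are not the CLI's actual code.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class NodeReportFormat {
    /** Renders two report lines using the platform line separator. */
    static String render(String nodeId, String state) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        // println appends the platform separator instead of a hard-coded '\n'.
        out.println("Node-Id : " + nodeId);
        out.println("Node-State : " + state);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        String sep = System.lineSeparator();
        String expected = "Node-Id : foo.com:8041" + sep + "Node-State : RUNNING" + sep;
        System.out.println(expected.equals(render("foo.com:8041", "RUNNING")));
    }
}
```

Because both the producer and the expectation derive the separator from the platform, the comparison holds on Windows and on Unix-like systems alike.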
YARN-387. Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
Fix inconsistent protocol naming

We now have different and inconsistent naming schemes for various protocols. It was hard to explain to users, mainly in direct interactions at talks/presentations and user group meetings, with such naming. We should fix these before we go beta.
YARN-385. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (api)
ResourceRequestPBImpl's toString() is missing location and # containers

ResourceRequestPBImpl's toString method includes priority and resource capability, but omits location and number of containers.
YARN-383. Minor bug reported by Hitesh Shah and fixed by Hitesh Shah
AMRMClientImpl should handle null rmClient in stop()

2013-02-06 09:31:33,813 INFO [Thread-2] service.CompositeService (CompositeService.java:stop(101)) - Error stopping org.apache.hadoop.yarn.client.AMRMClientImpl org.apache.hadoop.HadoopIllegalArgumentException: Cannot close proxy since it is null at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:605) at org.apache.hadoop.yarn.client.AMRMClientImpl.stop(AMRMClientImpl.java:150) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
YARN-382. Major improvement reported by Thomas Graves and fixed by Zhijie Shen (scheduler)
SchedulerUtils improve way normalizeRequest sets the resource capabilities

In YARN-370, we changed it from setting the capability to directly setting memory and cores: - ask.setCapability(normalized); + ask.getCapability().setMemory(normalized.getMemory()); + ask.getCapability().setVirtualCores(normalized.getVirtualCores()); We did this because it is directly setting the values in the original resource object passed in when the AM gets allocated and without it the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details. I think we should find a better way of doing this long term: one, so we don't have to keep adding things there when new resources are added; two, because it's a bit confusing as to what it's doing and prone to someone accidentally breaking it in the future again. Something closer to what Arun suggested in YARN-370 would be better but we need to make sure all the places work and get some more testing on it before putting it in.
YARN-381. Minor improvement reported by Eli Collins and fixed by Sandy Ryza (documentation)
Improve FS docs

The MR2 FS docs could use some improvements. Configuration: - sizebasedweight - what is the "size" here? Total memory usage? Pool properties: - minResources - what does min amount of aggregate memory mean given that this is not a reservation? - maxResources - is this a hard limit? - weight: How is this ratio configured? Eg base is 1 and all weights are relative to that? - schedulingMode - what is the default? Is fifo pure fifo, eg waits until all tasks for the job are finished before launching the next job? There's no mention of ACLs, even though they're supported. See the CS docs for comparison. Also there are a couple typos worth fixing while we're at it, eg "finish. apps to run" Worth keeping in mind that some of these will need to be updated to reflect that resource calculators are now pluggable.
YARN-380. Major bug reported by Thomas Graves and fixed by Omkar Vinit Joshi (client)
yarn node -status prints Last-Last-Health-Update

I assume the Last-Last-Health-Update is a typo and it should just be Last-Health-Update. $ yarn node -status foo.com:8041 Node Report : Node-Id : foo.com:8041 Rack : /10.10.10.0 Node-State : RUNNING Node-Http-Address : foo.com:8042 Health-Status(isNodeHealthy) : true Last-Last-Health-Update : 1360118400219 Health-Report : Containers : 0 Memory-Used : 0M Memory-Capacity : 24576
YARN-378. Major sub-task reported by xieguiming and fixed by Zhijie Shen (client , resourcemanager)
ApplicationMaster retry times should be set by Client

We should support different clients or users having different ApplicationMaster retry counts. That is, "yarn.resourcemanager.am.max-retries" should be settable by the client.
YARN-377. Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Chris Nauroth
Fix TestContainersMonitor for HADOOP-9252

HADOOP-9252 slightly changed the format of some StringUtils outputs. It caused TestContainersMonitor to fail. Also, some methods were deprecated by HADOOP-9252. The use of them should be replaced with the new methods.
YARN-376. Blocker bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
Apps that have completed can appear as RUNNING on the NM UI

On a busy cluster we've noticed a growing number of applications appear as RUNNING on a nodemanager's web pages but the applications have long since finished. Looking at the NM logs, it appears the RM never told the nodemanager that the application had finished. This is also reflected in a jstack of the NM process, since many more log aggregation threads are running than one would expect from the number of actively running applications.
YARN-369. Major sub-task reported by Hitesh Shah and fixed by Mayank Bansal (resourcemanager)
Handle ( or throw a proper error when receiving) status updates from application masters that have not registered

Currently, an allocate call from an unregistered application is allowed and the status update for it throws a statemachine error that is silently dropped. org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:680) ApplicationMasterService should likely throw an appropriate error for applications' requests that should not be handled in such cases.
YARN-368. Trivial bug reported by Albert Chu and fixed by Albert Chu
Fix typo "defiend" should be "defined" in error output

Noticed the following in an error log output while doing some experiments ./1066018/nodes/hyperion987/log/yarn-achu-nodemanager-hyperion987.out:java.lang.RuntimeException: No class defiend for uda.shuffle "defiend" should be "defined"
YARN-365. Major sub-task reported by Siddharth Seth and fixed by Xuan Gong (resourcemanager , scheduler)
Each NM heartbeat should not generate an event for the Scheduler

Follow up from YARN-275 https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt
YARN-363. Major bug reported by Jason Lowe and fixed by Kenji Kikushima
yarn proxyserver fails to find webapps/proxy directory on startup

Starting up the proxy server fails with this error: {noformat} 2013-01-29 17:37:41,357 FATAL webproxy.WebAppProxy (WebAppProxy.java:start(99)) - Could not start proxy web server java.io.FileNotFoundException: webapps/proxy not found in CLASSPATH at org.apache.hadoop.http.HttpServer.getWebAppsPath(HttpServer.java:533) at org.apache.hadoop.http.HttpServer.<init>(HttpServer.java:225) at org.apache.hadoop.http.HttpServer.<init>(HttpServer.java:164) at org.apache.hadoop.yarn.server.webproxy.WebAppProxy.start(WebAppProxy.java:90) at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServer.main(WebAppProxyServer.java:94) {noformat}
YARN-362. Minor bug reported by Jason Lowe and fixed by Ravi Prakash
Unexpected extra results when using webUI table search

When using the search box on the web UI to search for a specific task number (e.g.: "0831"), sometimes unexpected extra results are shown. Using the web browser's built-in search-within-page does not show any hits, so these look like completely spurious results. It looks like the raw timestamp value for time columns, which is not shown in the table, is also being searched with the search box.
YARN-347. Major improvement reported by Junping Du and fixed by Junping Du (client)
YARN CLI should show CPU info besides memory info in node status

With YARN-2 checked in, CPU info is taken into consideration in resource scheduling. yarn node -status <NodeID> should show CPU used and capacity info as it does for memory.
YARN-345. Critical bug reported by Devaraj K and fixed by Robert Parker (nodemanager)
Many InvalidStateTransitonException errors for ApplicationImpl in Node Manager

{code:xml} org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} {code:xml} 2013-01-17 04:03:46,726 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at APPLICATION_RESOURCES_CLEANINGUP at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} {code:xml} 2013-01-17 00:01:11,006 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHING_CONTAINERS_WAIT at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} {code:xml} 2013-01-17 
10:56:36,975 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1358385982671_1304_01_000001 transitioned from NEW to DONE 2013-01-17 10:56:36,975 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_CONTAINER_FINISHED at FINISHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2013-01-17 10:56:36,975 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1358385982671_1304 transitioned from FINISHED to null {code} {code:xml} 2013-01-17 10:56:36,026 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: INIT_CONTAINER at FINISHED at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2013-01-17 10:56:36,026 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1358385982671_1304 transitioned from FINISHED to null {code}
YARN-333. Major bug reported by Sandy Ryza and fixed by Sandy Ryza
Schedulers cannot control the queue-name of an application

Currently, if an app is submitted without a queue, RMAppManager sets the RMApp's queue to "default". A scheduler may wish to make its own decision on which queue to place an app in if none is specified. For example, when the fair scheduler user-as-default-queue config option is set to true, and an app is submitted with no queue specified, the fair scheduler should assign the app to a queue with the user's name.
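The placement rule described above can be sketched as follows. This is a minimal illustration, not the fair scheduler's code: the class `QueuePlacement`, the method `resolveQueue`, and the `userAsDefaultQueue` parameter are hypothetical names standing in for the user-as-default-queue option.

```java
// Hypothetical sketch: the names below are illustrative, not YARN APIs.
final class QueuePlacement {
    static final String DEFAULT_QUEUE = "default";

    /**
     * Picks the queue for an app. An explicitly requested queue always wins;
     * otherwise the scheduler decides, instead of the caller forcing "default".
     */
    static String resolveQueue(String requestedQueue, String user, boolean userAsDefaultQueue) {
        if (requestedQueue != null && !requestedQueue.isEmpty()) {
            return requestedQueue;
        }
        return userAsDefaultQueue ? user : DEFAULT_QUEUE;
    }

    public static void main(String[] args) {
        System.out.println(resolveQueue(null, "alice", true));
        System.out.println(resolveQueue(null, "alice", false));
        System.out.println(resolveQueue("etl", "alice", true));
    }
}
```

The point of the sketch is that the fallback lives with the scheduler, so a different scheduler could apply a different rule when no queue is given.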
YARN-326. Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Add multi-resource scheduling to the fair scheduler

With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness. More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled.
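As a rough illustration of dominant resource fairness, the sketch below computes each app's dominant share (its largest fractional use of any one cluster resource) and picks the app with the smallest one. The class and method names are assumptions for this example, and only memory and vcores are modelled; it does not cover the min and max resource corner cases mentioned above.

```java
// Illustrative sketch only; not the fair scheduler's implementation.
final class DominantResourceFairness {

    /** Dominant share: the largest fraction of any one cluster resource held by an app. */
    static double dominantShare(long usedMemMb, long usedVcores,
                                long clusterMemMb, long clusterVcores) {
        double memShare = (double) usedMemMb / clusterMemMb;
        double cpuShare = (double) usedVcores / clusterVcores;
        return Math.max(memShare, cpuShare);
    }

    /**
     * Returns the index of the app with the smallest dominant share, which is
     * the app that dominant resource fairness would serve next, or -1 if there
     * are no apps. usage[i] holds {memoryMb, vcores} for app i.
     */
    static int pickNext(long[][] usage, long clusterMemMb, long clusterVcores) {
        int best = -1;
        double bestShare = Double.MAX_VALUE;
        for (int i = 0; i < usage.length; i++) {
            double share = dominantShare(usage[i][0], usage[i][1], clusterMemMb, clusterVcores);
            if (share < bestShare) {
                bestShare = share;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // App 0 is memory-heavy, app 1 is CPU-heavy.
        long[][] usage = { {20000, 10}, {5000, 30} };
        System.out.println(pickNext(usage, 100000, 100));
    }
}
```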
YARN-319. Major bug reported by shenhong and fixed by shenhong (resourcemanager , scheduler)
Submitting a job to a queue that is not allowed in fairScheduler makes the client hang forever

When the RM uses the fair scheduler and a client submits a job to a queue that does not allow that user to submit jobs, the client hangs forever.
YARN-309. Major sub-task reported by Xuan Gong and fixed by Xuan Gong (resourcemanager)
Make RM provide heartbeat interval to NM
YARN-297. Major improvement reported by Arun C Murthy and fixed by Xuan Gong
Improve hashCode implementations for PB records

As [~hsn] pointed out in YARN-2, we use very small primes in all our hashCode implementations.
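To show why the size of the multiplier matters, here is a small sketch of a hashCode that uses a large prime. The class `RecordKey` and both constants are assumptions chosen for illustration; they are not the values used in the YARN records.

```java
// Illustrative sketch; the class and the constants are assumptions, not YARN code.
final class RecordKey {
    private final long clusterTimestamp;
    private final int id;

    RecordKey(long clusterTimestamp, int id) {
        this.clusterTimestamp = clusterTimestamp;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RecordKey)) return false;
        RecordKey other = (RecordKey) o;
        return clusterTimestamp == other.clusterTimestamp && id == other.id;
    }

    @Override
    public int hashCode() {
        // With a small multiplier such as 31, different combinations of small
        // field values collide easily: (0, 31) and (1, 0) would hash the same.
        final int prime = 16777619; // the 32-bit FNV prime, chosen for illustration
        int result = 17;
        result = prime * result + Long.hashCode(clusterTimestamp);
        result = prime * result + id;
        return result;
    }
}
```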
YARN-295. Major sub-task reported by Devaraj K and fixed by Mayank Bansal (resourcemanager)
Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl

{code:xml} 2012-12-28 14:03:56,956 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code}
YARN-289. Major bug reported by Sandy Ryza and fixed by Sandy Ryza
Fair scheduler allows reservations that won't fit on node

An application requests a container with 1024 MB. It then requests a container with 2048 MB. A node shows up with 1024 MB available. Even if the application is the only one running, neither request will be scheduled on it.
YARN-269. Major bug reported by Thomas Graves and fixed by Jason Lowe (resourcemanager)
Resource Manager not logging the health_check_script result when taking it out

The Resource Manager does not log the health_check_script result when it takes a node out of service. This was added to the JobTracker in 1.x with MAPREDUCE-2451; we should do the same for the RM.
YARN-249. Major improvement reported by Ravi Prakash and fixed by Ravi Prakash (capacityscheduler)
Capacity Scheduler web page should show list of active users per queue like it used to (in 1.x)

On the jobtracker, the web UI showed the active users for each queue and how many resources each of those users was using. That currently isn't displayed on the RM capacity scheduler web UI.
YARN-237. Major improvement reported by Ravi Prakash and fixed by Jian He (resourcemanager)
Refreshing the RM page forgets how many rows I had in my Datatables

If I choose 100 rows and then refresh the page, DataTables goes back to showing me 20 rows. This user preference should be stored in a cookie.
YARN-236. Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
RM should point tracking URL to RM web page when app fails to start

Similar to YARN-165, the RM should redirect the tracking URL to the specific app page on the RM web UI when the application fails to start. For example, if the AM completely fails to start due to bad AM config or bad job config like invalid queuename, then the user gets the unhelpful "The requested application exited before setting a tracking URL". Usually the diagnostic string on the RM app page has something useful, so we might as well point there.
YARN-227. Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
Application expiration difficult to debug for end-users

When an AM attempt expires the AMLivelinessMonitor in the RM will kill the job and mark it as failed. However there are no diagnostic messages set for the application indicating that the application failed because of expiration. Even if the AM logs are examined, it's often not obvious that the application was externally killed. The only evidence of what happened to the application is currently in the RM logs, and those are often not accessible by users.
YARN-209. Major bug reported by Bikas Saha and fixed by Zhijie Shen (capacityscheduler)
Capacity scheduler doesn't trigger app-activation after adding nodes

Say application A is submitted but at that time it does not meet the bar for activation because of resource limit settings for applications. After that, if more hardware is added to the system and the application becomes valid, it still remains in the pending state, likely forever. This might be rare to hit in real life because enough NMs heartbeat to the RM before applications can get submitted. But a change in settings or heartbeat interval might make it easier to reproduce. In RM restart scenarios, this will likely hit more often if restart is implemented by replaying events and resubmitting applications to the scheduler before the RPC to the NMs is activated.
YARN-200. Major sub-task reported by Robert Joseph Evans and fixed by Ravi Prakash
yarn log does not output all needed information, and is in a binary format

yarn logs does not output attemptid, nodename, or container-id. Missing these makes it very difficult to look through the logs for failed containers and tie them back to actual tasks and task attempts. Also, the output currently includes several binary characters. This is OK for machine readability, but makes the output hard for humans to read, or even to process with standard tools like grep. The help message could also be more useful to users.
YARN-198. Minor improvement reported by Ramgopal N and fixed by Jian He (nodemanager)
If we are navigating to Nodemanager UI from Resourcemanager, then there is no link to navigate back to Resource manager

If we navigate to the Nodemanager by clicking on the node link in the RM, there is no link on the NM to navigate back to the RM. It would be good to have one.
YARN-196. Major bug reported by Ramgopal N and fixed by Xuan Gong (nodemanager)
Nodemanager should be more robust in handling connection failure to ResourceManager when a cluster is started

If NM is started before starting the RM ,NM is shutting down with the following error {code} ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149) at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242) Caused by: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145) ... 3 more Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131) at $Proxy23.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59) ... 
5 more Caused by: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857) at org.apache.hadoop.ipc.Client.call(Client.java:1141) at org.apache.hadoop.ipc.Client.call(Client.java:1100) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128) ... 7 more Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563) at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247) at org.apache.hadoop.ipc.Client.call(Client.java:1117) ... 9 more 2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76) at java.lang.Thread.run(Thread.java:619) 2012-01-16 15:04:13,337 INFO org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped. 
2012-01-16 15:04:13,392 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:9999 2012-01-16 15:04:13,493 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped. 2012-01-16 15:04:13,493 INFO org.apache.hadoop.ipc.Server: Stopping server on 24290 2012-01-16 15:04:13,494 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 24290 2012-01-16 15:04:13,495 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder 2012-01-16 15:04:13,496 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler is stopped. 2012-01-16 15:04:13,496 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76) at java.lang.Thread.run(Thread.java:619) {code}
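One way to make startup tolerate an RM that is not yet listening is to retry the registration call when the connection is refused. The sketch below shows the idea in isolation; `RegistrationRetry`, `Attempt`, and the `maxAttempts` and `waitMs` parameters are hypothetical and do not correspond to NodeManager classes or configuration keys.

```java
import java.net.ConnectException;

// Illustrative sketch; these names are hypothetical, not NodeManager APIs.
final class RegistrationRetry {

    /** One attempt to reach the ResourceManager, e.g. a registration call. */
    interface Attempt<T> {
        T run() throws ConnectException;
    }

    /**
     * Retries the attempt while the connection is refused, sleeping between
     * tries, and gives up with an unchecked exception after maxAttempts.
     */
    static <T> T retryOnConnectFailure(Attempt<T> attempt, int maxAttempts, long waitMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ConnectException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.run();
            } catch (ConnectException e) {
                last = e; // remember the failure and try again
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(waitMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    break;
                }
            }
        }
        throw new IllegalStateException("could not connect to the ResourceManager", last);
    }
}
```

A caller would wrap its registration call in an `Attempt` and choose the attempt count and wait time from its own configuration.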
YARN-193. Major bug reported by Hitesh Shah and fixed by Zhijie Shen (resourcemanager)
Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
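As a hedged illustration of what accounting for the maximum could look like, the sketch below rounds a memory request up to a multiple of the minimum allocation and then caps it at the maximum. `RequestNormalizer` and `normalizeMemory` are hypothetical names, and capping (rather than rejecting) oversized requests is an assumption of this example, not a statement of how the issue was resolved.

```java
// Illustrative sketch; not the scheduler's normalizeRequest implementation.
final class RequestNormalizer {

    /**
     * Raises the request to at least minMb, rounds it up to the next multiple
     * of minMb, and then caps the result at maxMb.
     */
    static int normalizeMemory(int requestedMb, int minMb, int maxMb) {
        if (minMb <= 0 || maxMb < minMb) {
            throw new IllegalArgumentException("invalid allocation limits");
        }
        long atLeastMin = Math.max((long) requestedMb, minMb);
        long roundedUp = ((atLeastMin + minMb - 1) / minMb) * minMb;
        return (int) Math.min(roundedUp, maxMb);
    }

    public static void main(String[] args) {
        System.out.println(normalizeMemory(1500, 1024, 8192));
        System.out.println(normalizeMemory(10000, 1024, 8192));
    }
}
```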
YARN-142. Blocker task reported by Siddharth Seth and fixed by
[Umbrella] Cleanup YARN APIs w.r.t exceptions

Ref: MAPREDUCE-4067 All YARN APIs currently throw YarnRemoteException. 1) This cannot be extended in its current form. 2) The RPC layer can throw IOExceptions. These end up showing up as UndeclaredThrowableExceptions.
YARN-125. Minor sub-task reported by Steve Loughran and fixed by Steve Loughran
Make Yarn Client service shutdown operations robust

Make the yarn client services more robust against being shut down while not started, or shut down more than once, by null-checking fields before closing them and setting them to null afterwards to prevent double-invocation. This is a subset of MAPREDUCE-3502.
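The null-check-then-clear pattern described above can be sketched like this. `ClientHolder` is a hypothetical stand-in for a client service that owns one closeable resource; it is not a YARN class.

```java
import java.io.Closeable;
import java.io.IOException;

// Illustrative sketch; ClientHolder is a hypothetical class, not a YARN service.
final class ClientHolder {
    private Closeable client; // stays null until start() is called

    synchronized void start(Closeable newClient) {
        client = newClient;
    }

    /** Safe to call before start() and safe to call more than once. */
    synchronized void stop() {
        if (client != null) {
            try {
                client.close();
            } catch (IOException e) {
                // A real service would log this; shutdown continues regardless.
            } finally {
                client = null; // prevents a second close on repeated stop()
            }
        }
    }
}
```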
YARN-124. Minor sub-task reported by Steve Loughran and fixed by Steve Loughran
Make Yarn Node Manager services robust against shutdown

Add the nodemanager bits of MAPREDUCE-3502 to shut down the Nodemanager services. This is done by checking for fields being non-null before shutting down/closing etc, and setting the fields to null afterwards -to be resilient against re-entrancy. No tests other than manual review.
YARN-123. Minor sub-task reported by Steve Loughran and fixed by Steve Loughran
Make yarn Resource Manager services robust against shutdown

Split MAPREDUCE-3502 patches to make the RM code more resilient to being stopped more than once, or before started. This depends on MAPREDUCE-4014.
YARN-117. Major improvement reported by Steve Loughran and fixed by Steve Loughran
Enhance YARN service model

Having played with the YARN service model, there are some issues that I've identified based on past work and initial use. This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs. h2. state model prevents stopped state being entered if you could not successfully start the service. In the current lifecycle you cannot stop a service unless it was successfully started, but * {{init()}} may acquire resources that need to be explicitly released * if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources. *Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non-null. Before anyone points out that the {{stop()}} operations assume that all fields are valid; and if called before a {{start()}} they will NPE; MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix for this. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than "stopped". MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take up; this can be done with issues linked to this one. h2. AbstractService doesn't prevent duplicate state change requests. The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} & {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this. This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods {{final}}. 
These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}} -something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers). h2. AbstractService state change doesn't defend against race conditions. There are no concurrency locks on the state transitions. Whatever fix for wrong state calls is added should correct this to prevent re-entrancy, such as {{stop()}} being called from two threads. h2. Static methods to choreograph lifecycle operations Helper methods to move things through lifecycles. init->start is common, stop-if-service!=null another. Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception. These could go into a class {{ServiceOps}} in the same package. These can be used by those services that wrap other services, and help manage more robust shutdowns. h2. state transition failures are something that registered service listeners may wish to be informed of. When a state transition fails a {{RuntimeException}} can be thrown -and the service listeners are not informed as the notification point isn't reached. They may wish to know this, especially for management and diagnostics. *Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service,Service.State targeted-state, RuntimeException e)}} that is invoked from the (final) state change methods in the {{AbstractService}} class (once they delegate to their inner {{innerStart()}}, {{innerStop()}} methods); make it a no-op on the existing implementations of the interface. h2. Service listener failures not handled Is this an error or not? Log and ignore may not be what is desired. 
*Proposed:* during {{stop()}} any exception by a listener is caught and discarded, to increase the likelihood of a better shutdown, but do not add try-catch clauses to the other state changes. h2. Support static listeners for all AbstractServices Add support to {{AbstractService}} that allows callers to register listeners for all instances. The existing listener interface could be used. This allows management tools to hook into the events. The static listeners would be invoked for all state changes except creation (base class shouldn't be handing out references to itself at this point). These static events could all be async, pushed through a shared {{ConcurrentLinkedQueue}}; failures logged at warn and the rest of the listeners invoked. h2. Add some example listeners for management/diagnostics * event to commons log for humans. * events for machines hooked up to the JSON logger. * for testing: something that can be told to fail. h2. Services should support signal interruptibility The services would benefit from a way of shutting them down on a kill signal; this can be done via a runtime hook. It should not be automatic though, as composite services will get into a very complex state during shutdown. Better to provide a hook that lets you register/unregister services to terminate, and have the relevant {{main()}} entry points tell their root services to register themselves.
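A minimal sketch of the lifecycle shape proposed in this issue: final, synchronized state-change methods that perform the checks and then delegate to protected inner methods, with stop() valid from every state. `SketchService` is a hypothetical class written for this example. It borrows the {{innerStart()}} and {{innerStop()}} names from the text, but it is not the AbstractService implementation and it omits listeners.

```java
// Illustrative sketch of the proposed lifecycle; not Hadoop's AbstractService.
abstract class SketchService {
    enum State { NOTINITED, INITED, STARTED, STOPPED }

    private State state = State.NOTINITED;

    public final synchronized State getState() {
        return state;
    }

    public final synchronized void init() {
        if (state != State.NOTINITED) {
            throw new IllegalStateException("cannot init from " + state);
        }
        innerInit();
        state = State.INITED;
    }

    public final synchronized void start() {
        if (state != State.INITED) {
            throw new IllegalStateException("cannot start from " + state);
        }
        try {
            innerStart();
            state = State.STARTED;
        } catch (RuntimeException e) {
            stop(); // release anything acquired before the failure
            throw e;
        }
    }

    /** Valid from every state; a repeated or re-entrant call is a no-op. */
    public final synchronized void stop() {
        if (state == State.STOPPED) {
            return;
        }
        state = State.STOPPED; // set first so a re-entrant stop() returns early
        innerStop();
    }

    // Subclasses override these; innerStop() must tolerate null fields.
    protected void innerInit() { }
    protected void innerStart() { }
    protected void innerStop() { }
}
```

Because the public methods are final and synchronized, a subclass cannot run its own start or stop logic before the state check, and two threads calling stop() cannot both reach innerStop().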
YARN-112. Major sub-task reported by Jason Lowe and fixed by Omkar Vinit Joshi (nodemanager)
Race in localization can cause containers to fail

On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up.
YARN-109. Major bug reported by Jason Lowe and fixed by Mayank Bansal (nodemanager)
.tmp file is not deleted for localized archives

When archives are localized they are initially created as a .tmp file and unpacked from that file. However the .tmp file is not deleted afterwards.
YARN-101. Minor bug reported by xieguiming and fixed by Xuan Gong (nodemanager)
If the heartbeat message is lost, the nodestatus info of completed containers is lost too.

see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java protected void startStatusUpdater() { new Thread("Node Status Updater") { @Override @SuppressWarnings("unchecked") public void run() { int lastHeartBeatID = 0; while (!isStopped) { // Send heartbeat try { synchronized (heartbeatMonitor) { heartbeatMonitor.wait(heartBeatInterval); } {color:red} // Before we send the heartbeat, we get the NodeStatus, // whose method removes completed containers. NodeStatus nodeStatus = getNodeStatus(); {color} nodeStatus.setResponseId(lastHeartBeatID); NodeHeartbeatRequest request = recordFactory .newRecordInstance(NodeHeartbeatRequest.class); request.setNodeStatus(nodeStatus); {color:red} // But if the nodeHeartbeat fails, we've already removed the containers away to know about it. We aren't handling a nodeHeartbeat failure case here. HeartbeatResponse response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); {color} if (response.getNodeAction() == NodeAction.SHUTDOWN) { LOG .info("Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat," + " hence shutting down."); NodeStatusUpdaterImpl.this.stop(); break; } if (response.getNodeAction() == NodeAction.REBOOT) { LOG.info("Node is out of sync with ResourceManager," + " hence rebooting."); NodeStatusUpdaterImpl.this.reboot(); break; } lastHeartBeatID = response.getResponseId(); List<ContainerId> containersToCleanup = response .getContainersToCleanupList(); if (containersToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedContainersEvent(containersToCleanup)); } List<ApplicationId> appsToCleanup = response.getApplicationsToCleanupList(); //Only start tracking for keepAlive on FINISH_APP trackAppsForKeepAlive(appsToCleanup); if (appsToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedAppsEvent(appsToCleanup)); } } catch (Throwable e) { // TODO Better error handling. Thread can die with the rest of the // NM still running. 
LOG.error("Caught exception in status-updater", e); } } } }.start(); } private NodeStatus getNodeStatus() { NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class); nodeStatus.setNodeId(this.nodeId); int numActiveContainers = 0; List<ContainerStatus> containersStatuses = new ArrayList<ContainerStatus>(); for (Iterator<Entry<ContainerId, Container>> i = this.context.getContainers().entrySet().iterator(); i.hasNext();) { Entry<ContainerId, Container> e = i.next(); ContainerId containerId = e.getKey(); Container container = e.getValue(); // Clone the container to send it to the RM org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = container.cloneAndGetContainerStatus(); containersStatuses.add(containerStatus); ++numActiveContainers; LOG.info("Sending out status for container: " + containerStatus); {color:red} // Here is the part that removes the completed containers. if (containerStatus.getState() == ContainerState.COMPLETE) { // Remove i.remove(); {color} LOG.info("Removed completed container " + containerId); } } nodeStatus.setContainersStatuses(containersStatuses); LOG.debug(this.nodeId + " sending out status for " + numActiveContainers + " containers"); NodeHealthStatus nodeHealthStatus = this.context.getNodeHealthStatus(); nodeHealthStatus.setHealthReport(healthChecker.getHealthReport()); nodeHealthStatus.setIsNodeHealthy(healthChecker.isHealthy()); nodeHealthStatus.setLastHealthReportTime( healthChecker.getLastHealthReportTime()); if (LOG.isDebugEnabled()) { LOG.debug("Node's health-status : " + nodeHealthStatus.getIsNodeHealthy() + ", " + nodeHealthStatus.getHealthReport()); } nodeStatus.setNodeHealthStatus(nodeHealthStatus); List<ApplicationId> keepAliveAppIds = createKeepAliveApplicationList(); nodeStatus.setKeepAliveApplications(keepAliveAppIds); return nodeStatus; }
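The problem highlighted above is that completed containers are removed while the status is being built, before the heartbeat is known to have succeeded. One possible remedy, sketched under the assumption that removal can be deferred until the heartbeat returns, is shown below. `HeartbeatStatusTracker` and its methods are hypothetical and greatly simplified compared with NodeStatusUpdaterImpl.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch; not NodeStatusUpdaterImpl.
final class HeartbeatStatusTracker {
    // containerId -> true once the container has completed
    private final Map<String, Boolean> containers = new LinkedHashMap<>();

    synchronized void update(String containerId, boolean completed) {
        containers.put(containerId, completed);
    }

    /** Lists completed containers for the next heartbeat without removing them. */
    synchronized List<String> completedToReport() {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : containers.entrySet()) {
            if (e.getValue()) {
                completed.add(e.getKey());
            }
        }
        return completed;
    }

    /** Call only after the heartbeat that carried these ids has succeeded. */
    synchronized void acknowledge(List<String> reported) {
        for (String id : reported) {
            if (Boolean.TRUE.equals(containers.get(id))) {
                containers.remove(id);
            }
        }
    }

    synchronized int trackedCount() {
        return containers.size();
    }
}
```

If the heartbeat throws, acknowledge() is simply not called, so the same completed containers are reported again on the next attempt.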
YARN-99. Major sub-task reported by Devaraj K and fixed by Omkar Vinit Joshi (nodemanager)
Jobs fail during resource localization when private distributed-cache hits unix directory limits

If multiple jobs use the distributed cache with many small files, the directory limit is reached before the cache size limit, and no further directories can be created in the file cache. The jobs start failing with the exception below. {code:xml} java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} We should have a mechanism to clean up cache files when the number of directories crosses a specified limit, as is done for cache size.
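Besides cleaning up, a per-directory entry limit can also be avoided by spreading cache entries over nested subdirectories. The sketch below maps a running entry index to a fixed-depth relative directory so that no directory holds more than `perDir` children. `CacheDirLayout`, `relativeDir`, and its parameters are hypothetical; this is one possible layout, not a description of the fix that was committed.

```java
// Illustrative sketch; one possible layout, not the NodeManager's implementation.
final class CacheDirLayout {

    /**
     * Maps the index of a cache entry to a relative directory of exactly
     * `depth` levels. Each leaf holds at most perDir entries and each inner
     * directory holds at most perDir subdirectories.
     */
    static String relativeDir(long index, int perDir, int depth) {
        if (index < 0 || perDir < 2 || depth < 1) {
            throw new IllegalArgumentException("invalid arguments");
        }
        long bucket = index / perDir; // which leaf directory the entry falls in
        String[] parts = new String[depth];
        for (int level = depth - 1; level >= 0; level--) {
            parts[level] = Long.toString(bucket % perDir);
            bucket /= perDir;
        }
        if (bucket != 0) {
            throw new IllegalArgumentException("index exceeds layout capacity");
        }
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        System.out.println(relativeDir(12345, 100, 2));
    }
}
```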
YARN-84. Minor improvement reported by Brandon Li and fixed by Brandon Li
Use Builder to get RPC server in YARN

In HADOOP-8736, a Builder is introduced to replace all the getServer() variants. This JIRA is the change in YARN.
YARN-71. Critical bug reported by Vinod Kumar Vavilapalli and fixed by Xuan Gong (nodemanager)
Ensure/confirm that the NodeManager cleans up local-dirs on restart

We have to make sure that NodeManagers clean up their local files on restart. This may already work, in which case we should have tests validating it.
YARN-62. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi
AM should not be able to abuse container tokens for repetitive container launches

Clone of YARN-51. The ApplicationMaster should not be able to store container tokens and reuse the same set of tokens for repetitive container launches. The current code allows such abuse for a duration of 1d+10mins; we need to fix this.
YARN-45. Major sub-task reported by Chris Douglas and fixed by Carlo Curino (resourcemanager)
Scheduler feedback to AM to release containers

The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed, or reserved, to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers. [1] http://research.yahoo.com/files/yl-2012-003.pdf
YARN-24. Major bug reported by Jason Lowe and fixed by Sandy Ryza (nodemanager)
Nodemanager fails to start if log aggregation enabled and namenode unavailable

If log aggregation is enabled and the namenode is currently unavailable, the nodemanager fails to start up.
MAPREDUCE-5421. Blocker bug reported by Junping Du and fixed by Junping Du (test)
TestNonExistentJob fails due to recent changes in YARN
MAPREDUCE-5419. Major bug reported by Robert Parker and fixed by Robert Parker (mrv2)
TestSlive is getting FileNotFound Exception
MAPREDUCE-5412. Major bug reported by Jian He and fixed by Jian He
Change MR to use multiple containers API of ContainerManager after YARN-926
MAPREDUCE-5398. Major improvement reported by Bikas Saha and fixed by Jian He
MR changes for YARN-513
MAPREDUCE-5366. Minor bug reported by Chuan Liu and fixed by Chuan Liu (test)
TestMRAsyncDiskService fails on Windows
MAPREDUCE-5360. Minor bug reported by Chuan Liu and fixed by Chuan Liu (test)
TestMRJobClient fails on Windows due to path format
MAPREDUCE-5359. Minor bug reported by Chuan Liu and fixed by Chuan Liu
JobHistory should not use File.separator to match timestamp in path
MAPREDUCE-5357. Minor bug reported by Chuan Liu and fixed by Chuan Liu
Job staging directory owner checking could fail on Windows
MAPREDUCE-5355. Minor bug reported by Chuan Liu and fixed by Chuan Liu
MiniMRYarnCluster with localFs does not work on Windows
MAPREDUCE-5349. Minor bug reported by Chuan Liu and fixed by Chuan Liu
TestClusterMapReduceTestCase and TestJobName fail on Windows in branch-2
MAPREDUCE-5334. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
TestContainerLauncherImpl is failing
MAPREDUCE-5333. Major test reported by Alejandro Abdelnur and fixed by Wei Yan (mr-am)
Add test that verifies MRAM works correctly when sending requests with non-normalized capabilities
MAPREDUCE-5328. Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
ClientToken should not be set in the environment
MAPREDUCE-5326. Blocker bug reported by Arun C Murthy and fixed by Zhijie Shen
Add version to shuffle header
MAPREDUCE-5325. Major bug reported by Xuan Gong and fixed by Xuan Gong
ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter---MR changes
MAPREDUCE-5319. Major bug reported by yeshavora and fixed by Xuan Gong
Job.xml file does not have 'user.name' property for Hadoop2
MAPREDUCE-5315. Critical bug reported by Mithun Radhakrishnan and fixed by Mithun Radhakrishnan (distcp)
DistCp reports success even on failure.
MAPREDUCE-5312. Major bug reported by Alejandro Abdelnur and fixed by Sandy Ryza
TestRMNMInfo is failing
MAPREDUCE-5310. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (applicationmaster)
MRAM should not normalize allocation request capabilities
MAPREDUCE-5308. Major bug reported by Nathan Roberts and fixed by Nathan Roberts
Shuffling to memory can get out-of-sync when fetching multiple compressed map outputs
MAPREDUCE-5304. Blocker sub-task reported by Alejandro Abdelnur and fixed by Karthik Kambatla
mapreduce.Job killTask/failTask/getTaskCompletionEvents methods have incompatible signature changes
MAPREDUCE-5303. Major bug reported by Jian He and fixed by Jian He
Changes on MR after moving ProtoBase to package impl.pb on YARN-724
MAPREDUCE-5301. Major bug reported by Siddharth Seth and fixed by Siddharth Seth
Update MR code to work with YARN-635 changes
MAPREDUCE-5300. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Two function signature changes in filecache.DistributedCache
MAPREDUCE-5299. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Mapred API: void setTaskID(TaskAttemptID) is missing in TaskCompletionEvent
MAPREDUCE-5298. Major new feature reported by Steve Loughran and fixed by Steve Loughran (applicationmaster)
Move MapReduce services to YARN-117 stricter lifecycle
MAPREDUCE-5297. Major bug reported by Jian He and fixed by Jian He
Update MR App since BuilderUtils is moved to yarn-server-common after YARN-748
MAPREDUCE-5296. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Mapred API: Function signature change in JobControl
MAPREDUCE-5291. Major bug reported by Siddharth Seth and fixed by Zhijie Shen
Change MR App to use updated property names in container-log4j.properties
MAPREDUCE-5289. Major bug reported by Vinod Kumar Vavilapalli and fixed by Jian He
Update MR App to use Token directly after YARN-717
MAPREDUCE-5286. Major task reported by Siddharth Seth and fixed by Vinod Kumar Vavilapalli
startContainer call should use the ContainerToken instead of Container [YARN-684]
MAPREDUCE-5285. Major bug reported by Jian He and fixed by
Update MR App to use immutable ApplicationAttemptID, ContainerID, NodeID after YARN-735
MAPREDUCE-5283. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (applicationmaster , test)
Over 10 different tests have near identical implementations of AppContext
MAPREDUCE-5282. Major bug reported by Vinod Kumar Vavilapalli and fixed by Siddharth Seth
Update MR App to use immutable ApplicationID after YARN-716
MAPREDUCE-5280. Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal
Mapreduce API: ClusterMetrics incompatibility issues with MR1
MAPREDUCE-5275. Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal
Mapreduce API: TokenCache incompatibility issues with MR1
MAPREDUCE-5274. Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal
Mapreduce API: String toHex(byte[]) is removed from SecureShuffleUtils
MAPREDUCE-5273. Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal
Protected variables are removed from CombineFileRecordReader in both mapred and mapreduce
MAPREDUCE-5270. Major bug reported by Jian He and fixed by Jian He
Migrate from using BuilderUtil factory methods to individual record factory method on MapReduce side
MAPREDUCE-5268. Major improvement reported by Jason Lowe and fixed by Karthik Kambatla (jobhistoryserver)
Improve history server startup performance
MAPREDUCE-5263. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
filecache.DistributedCache incompatibility issues with MR1
MAPREDUCE-5259. Major bug reported by Ivan Mitic and fixed by Ivan Mitic (test)
TestTaskLog fails on Windows because of path separators mismatch
MAPREDUCE-5257. Major bug reported by Jason Lowe and fixed by Omkar Vinit Joshi (mr-am , mrv2)
TestContainerLauncherImpl fails
MAPREDUCE-5246. Major improvement reported by Mayank Bansal and fixed by Mayank Bansal
Adding application type to submission context
MAPREDUCE-5245. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
A number of public static variables are removed from JobConf
MAPREDUCE-5244. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Two functions changed their visibility in JobStatus
MAPREDUCE-5240. Blocker bug reported by Roman Shaposhnik and fixed by Vinod Kumar Vavilapalli (mrv2)
inside of FileOutputCommitter the initialized Credentials cache appears to be empty
MAPREDUCE-5239. Major bug reported by Vinod Kumar Vavilapalli and fixed by Siddharth Seth
Update MR App to reflect YarnRemoteException changes after YARN-634
MAPREDUCE-5237. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
ClusterStatus incompatibility issues with MR1
MAPREDUCE-5235. Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal
mapred.Counters incompatibility issues with MR1
MAPREDUCE-5234. Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal
Signature changes for getTaskId of TaskReport in mapred
MAPREDUCE-5233. Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal
Functions are changed or removed from Job in jobcontrol
MAPREDUCE-5231. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Constructor of DBInputFormat.DBRecordReader in mapred is changed
MAPREDUCE-5230. Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal
createFileSplit is removed from NLineInputFormat of mapred
MAPREDUCE-5229. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
TEMP_DIR_NAME is removed from FileOutputCommitter of mapreduce
MAPREDUCE-5228. Major sub-task reported by Zhijie Shen and fixed by Mayank Bansal
Enum Counter is removed from FileInputFormat and FileOutputFormat of both mapred and mapreduce
MAPREDUCE-5226. Major bug reported by Xuan Gong and fixed by Xuan Gong
Handle exception related changes in YARN's AMRMProtocol api after YARN-630
MAPREDUCE-5222. Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla
Fix JobClient incompatibilities with MR1
MAPREDUCE-5220. Major sub-task reported by Sandy Ryza and fixed by Zhijie Shen (client)
Mapred API: TaskCompletionEvent incompatibility issues with MR1
MAPREDUCE-5213. Minor bug reported by Karthik Kambatla and fixed by Karthik Kambatla
Re-assess TokenCache methods marked @Private
MAPREDUCE-5212. Major bug reported by Xuan Gong and fixed by Xuan Gong
Handle exception related changes in YARN's ClientRMProtocol api after YARN-631
MAPREDUCE-5209. Minor bug reported by Radim Kolar and fixed by Tsuyoshi OZAWA (mrv2)
ShuffleScheduler log message incorrect
MAPREDUCE-5208. Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
SpillRecord and ShuffleHandler should use SecureIOUtils for reading index file and map output
MAPREDUCE-5205. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
Apps fail in secure cluster setup
MAPREDUCE-5204. Major bug reported by Xuan Gong and fixed by Xuan Gong
Handle YarnRemoteException separately from IOException in MR api
MAPREDUCE-5199. Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Daryn Sharp (security)
AppTokens file can/should be removed
MAPREDUCE-5194. Minor task reported by Chris Douglas and fixed by Chris Douglas (task)
Heed interrupts during Fetcher shutdown
MAPREDUCE-5193. Major bug reported by Aaron T. Myers and fixed by Andrew Wang (test)
A few MR tests use block sizes which are smaller than the default minimum block size
MAPREDUCE-5192. Minor task reported by Chris Douglas and fixed by Chris Douglas (task)
Separate TCE resolution from fetch
MAPREDUCE-5191. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
TestQueue#testQueue fails with timeout on Windows
MAPREDUCE-5187. Major bug reported by Chuan Liu and fixed by Chuan Liu (mrv2)
Create mapreduce command scripts on Windows
MAPREDUCE-5184. Major sub-task reported by Arun C Murthy and fixed by Zhijie Shen (documentation)
Document MR Binary Compatibility vis-a-vis hadoop-1 and hadoop-2

Document MR Binary Compatibility vis-a-vis hadoop-1 and hadoop-2 for end-users.
MAPREDUCE-5181. Major bug reported by Siddharth Seth and fixed by Vinod Kumar Vavilapalli (applicationmaster)
RMCommunicator should not use AMToken from the env
MAPREDUCE-5179. Major bug reported by Hitesh Shah and fixed by Hitesh Shah
Change TestHSWebServices to do string equal check on hadoop build version similar to YARN-605
MAPREDUCE-5178. Major bug reported by Hitesh Shah and fixed by Hitesh Shah
Fix use of BuilderUtils#newApplicationReport as a result of YARN-577.
MAPREDUCE-5177. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
MAPREDUCE-5176. Major improvement reported by Carlo Curino and fixed by Carlo Curino (mrv2)
Preemptable annotations (to support preemption in MR)
MAPREDUCE-5175. Major bug reported by Vinod Kumar Vavilapalli and fixed by Xuan Gong
Update MR App to not set envs that will be set by NMs anyways after YARN-561
MAPREDUCE-5171. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (applicationmaster)
Expose blacklisted nodes from the MR AM REST API
MAPREDUCE-5167. Major bug reported by Vinod Kumar Vavilapalli and fixed by Jian He
Update MR App after YARN-562
MAPREDUCE-5166. Blocker bug reported by Gunther Hagleitner and fixed by Sandy Ryza
ConcurrentModificationException in LocalJobRunner
MAPREDUCE-5163. Major bug reported by Vinod Kumar Vavilapalli and fixed by Xuan Gong
Update MR App after YARN-441
MAPREDUCE-5159. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Aggregatewordcount and aggregatewordhist in hadoop-1 examples are not binary compatible with hadoop-2 mapred.lib.aggregate
MAPREDUCE-5157. Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Sort in hadoop-1 examples is not binary compatible with hadoop-2 mapred.lib
MAPREDUCE-5156. Blocker sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Hadoop-examples-1.x.x.jar cannot run on Yarn
MAPREDUCE-5152. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
MR App is not using Container from RM
MAPREDUCE-5151. Major bug reported by Vinod Kumar Vavilapalli and fixed by Sandy Ryza
Update MR App after YARN-444
MAPREDUCE-5147. Major bug reported by Robert Parker and fixed by Robert Parker (mrv2)
Maven build should create hadoop-mapreduce-client-app-VERSION.jar directly
MAPREDUCE-5146. Minor bug reported by Sangjin Lee and fixed by Sangjin Lee (task)
application classloader may be used too early to load classes
MAPREDUCE-5145. Major bug reported by Zhijie Shen and fixed by Zhijie Shen
Change default max-attempts to be more than one for MR jobs as well
MAPREDUCE-5140. Major bug reported by Zhijie Shen and fixed by Zhijie Shen
MR part of YARN-514
MAPREDUCE-5139. Major bug reported by Vinod Kumar Vavilapalli and fixed by Xuan Gong
Update MR App after YARN-486
MAPREDUCE-5138. Major bug reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi
Fix LocalDistributedCacheManager after YARN-112
MAPREDUCE-5137. Major bug reported by Thomas Graves and fixed by Thomas Graves (applicationmaster)
AM web UI: clicking on Map Task results in 500 error
MAPREDUCE-5136. Major bug reported by Amir Sanjar and fixed by Amir Sanjar
TestJobImpl->testJobNoTasks fails with IBM JAVA
MAPREDUCE-5129. Minor new feature reported by Billie Rinaldi and fixed by Billie Rinaldi
Add tag info to JH files
MAPREDUCE-5128. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (documentation , jobhistoryserver)
mapred-default.xml is missing a bunch of history server configs
MAPREDUCE-5113. Major bug reported by Sandy Ryza and fixed by Sandy Ryza
Streaming input/output types are ignored with java mapper/reducer
MAPREDUCE-5098. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (contrib/gridmix)
Fix findbugs warnings in gridmix
MAPREDUCE-5086. Major bug reported by Jian He and fixed by Jian He
MR app master deletes staging dir when sent a reboot command from the RM
MAPREDUCE-5079. Critical improvement reported by Jason Lowe and fixed by Jason Lowe (mr-am)
Recovery should restore task state from job history info directly
MAPREDUCE-5078. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (client)
TestMRAppMaster fails on Windows due to mismatched path separators
MAPREDUCE-5077. Minor bug reported by Karthik Kambatla and fixed by Karthik Kambatla (mrv2)
Cleanup: mapreduce.util.ResourceCalculatorPlugin and related code should be removed
MAPREDUCE-5075. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (distcp)
DistCp leaks input file handles
MAPREDUCE-5069. Minor improvement reported by Sangjin Lee and fixed by (mrv1 , mrv2)
add concrete common implementations of CombineFileInputFormat
MAPREDUCE-5066. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
JobTracker should set a timeout when calling into job.end.notification.url
MAPREDUCE-5065. Major bug reported by Mithun Radhakrishnan and fixed by Mithun Radhakrishnan (distcp)
DistCp should skip checksum comparisons if block-sizes are different on source/target.
MAPREDUCE-5062. Major bug reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen
MR AM should read max-retries information from the RM
MAPREDUCE-5060. Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans
Fetch failures that time out only count against the first map task
MAPREDUCE-5059. Major bug reported by Jason Lowe and fixed by Omkar Vinit Joshi (jobhistoryserver , webapps)
Job overview shows average merge time larger than for any reduce attempt
MAPREDUCE-5043. Blocker bug reported by Jason Lowe and fixed by Jason Lowe (mr-am)
Fetch failure processing can cause AM event queue to backup and eventually OOM
MAPREDUCE-5042. Blocker bug reported by Jason Lowe and fixed by Jason Lowe (mr-am , security)
Reducer unable to fetch for a map task that was recovered
MAPREDUCE-5033. Minor improvement reported by Andrew Wang and fixed by Andrew Wang
mapred shell script should respect usage flags (--help -help -h)
MAPREDUCE-5027. Major bug reported by Jason Lowe and fixed by Robert Parker
Shuffle does not limit number of outstanding connections
MAPREDUCE-5015. Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov
Coverage fix for org.apache.hadoop.mapreduce.tools.CLI
MAPREDUCE-5013. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (client)
mapred.JobStatus compatibility: MR2 missing constructors from MR1
MAPREDUCE-5009. Critical bug reported by Robert Parker and fixed by Robert Parker (mrv1)
Killing the Task Attempt slated for commit does not clear the value from the Task commitAttempt member
MAPREDUCE-5008. Major bug reported by Sandy Ryza and fixed by Sandy Ryza
Merger progress miscounts with respect to EOF_MARKER
MAPREDUCE-5007. Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov
fix coverage org.apache.hadoop.mapreduce.v2.hs
MAPREDUCE-5000. Critical bug reported by Jason Lowe and fixed by Jason Lowe (mr-am)
TaskImpl.getCounters() can return the counters for the wrong task attempt when task is speculating
MAPREDUCE-4994. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (client)
-jt generic command line option does not work
MAPREDUCE-4992. Critical bug reported by Robert Parker and fixed by Robert Parker (mr-am)
AM hangs in RecoveryService when recovering tasks with speculative attempts
MAPREDUCE-4991. Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov
coverage for gridmix
MAPREDUCE-4990. Trivial improvement reported by Karthik Kambatla and fixed by Karthik Kambatla
Construct debug strings conditionally in ShuffleHandler.Shuffle#sendMapOutput()
MAPREDUCE-4989. Major improvement reported by Ravi Prakash and fixed by Ravi Prakash (jobhistoryserver , mr-am)
JSONify DataTables input data for Attempts page
MAPREDUCE-4987. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (distributed-cache , nodemanager)
TestMRJobs#testDistributedCache fails on Windows due to classpath problems and unexpected behavior of symlinks
MAPREDUCE-4985. Trivial bug reported by Plamen Jeliazkov and fixed by Plamen Jeliazkov
TestDFSIO supports compression but usage doesn't reflect it
MAPREDUCE-4981. Minor bug reported by Plamen Jeliazkov and fixed by Plamen Jeliazkov
WordMean, WordMedian, WordStandardDeviation missing from ExamplesDriver
MAPREDUCE-4974. Major improvement reported by Arun A K and fixed by Gelesh (mrv1 , mrv2 , performance)
Optimising the LineRecordReader initialize() method
MAPREDUCE-4972. Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov
Coverage fixing for org.apache.hadoop.mapreduce.jobhistory
MAPREDUCE-4951. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (applicationmaster , mr-am , mrv2)
Container preemption interpreted as task failure
MAPREDUCE-4942. Major sub-task reported by Robert Kanter and fixed by Robert Kanter (mrv2)
mapreduce.Job has a bunch of methods that throw InterruptedException so it's incompatible with MR1
MAPREDUCE-4932. Major bug reported by Robert Kanter and fixed by Robert Kanter (mrv2)
mapreduce.job#getTaskCompletionEvents incompatible with Hadoop 1
MAPREDUCE-4927. Major bug reported by Jason Lowe and fixed by Ashwin Shankar (jobhistoryserver)
Historyserver 500 error due to NPE when accessing specific counters page for failed job
MAPREDUCE-4898. Major bug reported by Robert Kanter and fixed by Robert Kanter (mrv2)
FileOutputFormat.checkOutputSpecs and FileOutputFormat.setOutputPath incompatible with MR1
MAPREDUCE-4896. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (client , scheduler)
"mapred queue -info" spits out ugly exception when queue does not exist
MAPREDUCE-4892. Major bug reported by Bikas Saha and fixed by Bikas Saha
CombineFileInputFormat node input split can be skewed on small clusters
MAPREDUCE-4885. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (contrib/streaming , test)
Streaming tests have multiple failures on Windows
MAPREDUCE-4875. Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov (test)
coverage fixing for org.apache.hadoop.mapred
MAPREDUCE-4871. Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
AM uses mapreduce.jobtracker.split.metainfo.maxsize but mapred-default has mapreduce.job.split.metainfo.maxsize
MAPREDUCE-4846. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (client)
Some JobQueueInfo methods are public in MR1 but protected in MR2
MAPREDUCE-4794. Major bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster)
DefaultSpeculator generates error messages on normal shutdown
MAPREDUCE-4737. Major bug reported by Daniel Dai and fixed by Arun C Murthy
Hadoop does not close output file / does not call Mapper.cleanup if exception in map

Ensure that mapreduce APIs are semantically consistent with the mapred API w.r.t. Mapper.cleanup and Reducer.cleanup, in the sense that cleanup is now called even if there is an error. The old mapred API already ensures that Mapper.close and Reducer.close are invoked during error handling. Note that this is an incompatible change; however, end-users can override Mapper.run and Reducer.run to get the old (inconsistent) behaviour.
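The difference between the two behaviours can be illustrated with a simplified model. The classes below (SketchMapper, LegacyRunMapper, CleanupDemo) are hypothetical stand-ins and do not use the Hadoop Mapper API; they only show, under that assumption, how wrapping the map loop in try/finally guarantees cleanup on error and how overriding run can restore the old behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified stand-in for a mapper; not the Hadoop Mapper class. */
abstract class SketchMapper {
    final List<String> events = new ArrayList<>();

    void setup() { events.add("setup"); }

    abstract void map(String record);

    void cleanup() { events.add("cleanup"); }

    /** New semantics: cleanup runs even if map throws. */
    void run(List<String> records) {
        setup();
        try {
            for (String record : records) {
                map(record);
            }
        } finally {
            cleanup();
        }
    }
}

/** Overrides run to restore the old behaviour: no cleanup after an error. */
abstract class LegacyRunMapper extends SketchMapper {
    @Override
    void run(List<String> records) {
        setup();
        for (String record : records) {
            map(record);
        }
        cleanup();
    }
}

final class CleanupDemo {
    /** Builds a mapper whose map always fails, with new or legacy run semantics. */
    static SketchMapper failing(boolean legacy) {
        if (legacy) {
            return new LegacyRunMapper() {
                @Override
                void map(String record) {
                    throw new IllegalStateException("bad record: " + record);
                }
            };
        }
        return new SketchMapper() {
            @Override
            void map(String record) {
                throw new IllegalStateException("bad record: " + record);
            }
        };
    }

    /** Runs the mapper, records a failure as an event, and returns all events. */
    static List<String> runAndRecord(SketchMapper mapper, List<String> records) {
        try {
            mapper.run(records);
        } catch (RuntimeException e) {
            mapper.events.add("error");
        }
        return mapper.events;
    }

    public static void main(String[] args) {
        System.out.println("new:    " + runAndRecord(failing(false), Arrays.asList("a")));
        System.out.println("legacy: " + runAndRecord(failing(true), Arrays.asList("a")));
    }
}
```

In this model the new semantics record setup, cleanup, and then the error, while the legacy override skips cleanup once map has thrown.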
MAPREDUCE-4716. Major bug reported by Thomas Graves and fixed by Thomas Graves (jobhistoryserver)
TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7
MAPREDUCE-4693. Major bug reported by Jason Lowe and fixed by Xuan Gong (jobhistoryserver , mrv2)
Historyserver should provide counters for failed tasks
MAPREDUCE-4671. Major bug reported by Bikas Saha and fixed by Bikas Saha
AM does not tell the RM about container requests that are no longer needed
MAPREDUCE-4571. Major bug reported by Thomas Graves and fixed by Thomas Graves (webapps)
TestHsWebServicesJobs fails on jdk7
MAPREDUCE-4374. Minor bug reported by Chuan Liu and fixed by Chuan Liu (mrv2)
Fix child task environment variable config and add support for Windows
MAPREDUCE-4356. Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (tools/rumen)
Provide access to ParsedTask.obtainTaskAttempts()

Made the method ParsedTask.obtainTaskAttempts() public.
MAPREDUCE-4149. Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (tools/rumen)
Rumen fails to parse certain counter strings

Fixes Rumen to parse counter strings containing the special characters "{" and "}".
MAPREDUCE-4100. Minor bug reported by Karam Singh and fixed by Amar Kamat (contrib/gridmix)
Sometimes gridmix emulates data much larger than actual counter for map only jobs

Bug fixed in compression emulation feature for map only jobs.
MAPREDUCE-4087. Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi
[Gridmix] GenerateDistCacheData job of Gridmix can become slow in some cases

Fixes the issue of GenerateDistCacheData job slowness.
MAPREDUCE-4083. Major bug reported by Karam Singh and fixed by Amar Kamat (contrib/gridmix)
GridMix emulated job tasks.resource-usage emulator for CPU usage throws NPE when Trace contains cumulativeCpuUsage value of 0 at attempt level

Fixes NPE in cpu emulation in Gridmix
MAPREDUCE-4067. Critical bug reported by Jitendra Nath Pandey and fixed by Xuan Gong
Replace YarnRemoteException with IOException in MRv2 APIs
MAPREDUCE-4019. Minor bug reported by B Anil Kumar and fixed by Ashwin Shankar (client)
-list-attempt-ids is not working
MAPREDUCE-3953. Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi
Gridmix throws NPE and does not simulate a job if the trace contains null taskStatus for a task

Fixes NPE and makes Gridmix simulate succeeded-jobs-with-failed-tasks. All tasks of such simulated jobs (including the failed ones of the original job) will succeed.
MAPREDUCE-3872. Major bug reported by Patrick Hunt and fixed by Robert Kanter (client , mrv2)
event handling races in ContainerLauncherImpl and TestContainerLauncher
MAPREDUCE-3829. Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (contrib/gridmix)
[Gridmix] Gridmix should give better error message when input-data directory already exists and -generate option is given

Makes Gridmix emit the correct error message when the input data directory already exists and the -generate option is used. Makes Gridmix exit with proper exit codes when Gridmix fails in args-processing, startup/setup.
MAPREDUCE-3787. Major improvement reported by Amar Kamat and fixed by Amar Kamat (contrib/gridmix)
[Gridmix] Improve STRESS mode

JobMonitor can now deploy multiple threads for faster job-status polling. Use 'gridmix.job-monitor.thread-count' to set the number of threads. Stress mode now relies on the updates from the job monitor instead of polling for job status. Failures in job submission now get reported to the statistics module and ultimately reported to the user via summary.
MAPREDUCE-3757. Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (tools/rumen)
Rumen Folder is not adjusting the shuffleFinished and sortFinished times of reduce task attempts

Fixed the sortFinishTime and shuffleFinishTime adjustments in Rumen Folder.
MAPREDUCE-3685. Critical bug reported by anty.rao and fixed by anty (mrv2)
There are some bugs in implementation of MergeManager
MAPREDUCE-3533. Minor improvement reported by Steve Loughran and fixed by (mrv2)
have the service interface extend Closeable and use close() as its shutdown operation
MAPREDUCE-3502. Major task reported by Steve Loughran and fixed by Steve Loughran (mrv2)
Review all Service.stop() operations and make sure that they work before a service is started
MAPREDUCE-3008. Major sub-task reported by Amar Kamat and fixed by Amar Kamat (contrib/gridmix)
[Gridmix] Improve cumulative CPU usage emulation for short running tasks

Improves cumulative CPU emulation for short running tasks.
MAPREDUCE-2722. Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (contrib/gridmix)
Gridmix simulated job's map's hdfsBytesRead counter is wrong when compressed input is used

Makes Gridmix use the uncompressed input data size while simulating map tasks in the case where compressed input data was used in original job.
HDFS-5027. Major improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (datanode)
On startup, DN should scan volumes in parallel
HDFS-5025. Major sub-task reported by Jing Zhao and fixed by Jing Zhao (ha , namenode)
Record ClientId and CallId in EditLog to enable rebuilding retry cache in case of HA failover
HDFS-5024. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (namenode)
Make DatanodeProtocol#commitBlockSynchronization idempotent
HDFS-5020. Major improvement reported by Jing Zhao and fixed by Jing Zhao (namenode)
Make DatanodeProtocol#blockReceivedAndDeleted idempotent
HDFS-5018. Minor bug reported by Ted Yu and fixed by Ted Yu
Misspelled DFSConfigKeys#DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT in javadoc of DatanodeInfo#isStale()
HDFS-5016. Blocker bug reported by Devaraj Das and fixed by Suresh Srinivas
Deadlock in pipeline recovery causes Datanode to be marked dead
HDFS-5010. Major improvement reported by Kihwal Lee and fixed by Kihwal Lee (namenode , performance)
Reduce the frequency of getCurrentUser() calls from namenode
HDFS-5008. Major improvement reported by Suresh Srinivas and fixed by Jing Zhao (namenode)
Make ClientProtocol#abandonBlock() idempotent
HDFS-5007. Minor improvement reported by Kousuke Saruta and fixed by Kousuke Saruta
Replace hard-coded property keys with DFSConfigKeys fields
HDFS-5005. Major bug reported by Jing Zhao and fixed by Jing Zhao
Move SnapshotException and SnapshotAccessControlException to o.a.h.hdfs.protocol
HDFS-5003. Minor bug reported by Xi Fang and fixed by Xi Fang (test)
TestNNThroughputBenchmark failure caused by existing directories
HDFS-4999. Major bug reported by Kihwal Lee and fixed by Colin Patrick McCabe
fix TestShortCircuitLocalRead on branch-2
HDFS-4998. Major bug reported by Kihwal Lee and fixed by Kihwal Lee (test)
TestUnderReplicatedBlocks fails intermittently
HDFS-4996. Minor improvement reported by Chris Nauroth and fixed by Chris Nauroth (namenode)
ClientProtocol#metaSave can be made idempotent by overwriting the output file instead of appending to it

The dfsadmin -metasave command has been changed to overwrite the output file. Previously, this command would append to the output file if it already existed.
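Why overwriting makes the call safe to retry can be shown with a small sketch using only java.nio.file. The class MetaSaveSketch and its methods are hypothetical and do not reflect the namenode implementation; the sketch writes the same report twice, as a retried call would, and compares the resulting file sizes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration of append versus overwrite under retries. */
final class MetaSaveSketch {
    /** Old behaviour: append, so a retried call duplicates the report. */
    static void saveAppending(Path out, String report) throws IOException {
        Files.write(out, report.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** New behaviour: overwrite, so a retried call leaves the same file. */
    static void saveOverwriting(Path out, String report) throws IOException {
        Files.write(out, report.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE);
    }

    /** Writes the report twice (simulating a retry) and returns the file size in bytes. */
    static long sizeAfterRetry(boolean overwrite, String report) {
        try {
            Path out = Files.createTempFile("metasave", ".txt");
            try {
                for (int attempt = 0; attempt < 2; attempt++) {
                    if (overwrite) {
                        saveOverwriting(out, report);
                    } else {
                        saveAppending(out, report);
                    }
                }
                return Files.size(out);
            } finally {
                Files.deleteIfExists(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String report = "sample report line\n";
        System.out.println("append after retry:    " + sizeAfterRetry(false, report) + " bytes");
        System.out.println("overwrite after retry: " + sizeAfterRetry(true, report) + " bytes");
    }
}
```

With appending, the second write doubles the file; with overwriting, the file after the retry is identical to the file after the first call, which is the property an idempotent operation needs.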
HDFS-4992. Major improvement reported by Max Lapan and fixed by Max Lapan (balancer)
Make balancer's thread count configurable
HDFS-4982. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (journal-node , security)
JournalNode should relogin from keytab before fetching logs from other JNs
HDFS-4980. Major bug reported by Mark Grover and fixed by Mark Grover (build)
Incorrect logging.properties file for hadoop-httpfs
HDFS-4979. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)
Implement retry cache on the namenode
HDFS-4978. Major improvement reported by Jing Zhao and fixed by Jing Zhao
Make disallowSnapshot idempotent
HDFS-4974. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (ha , namenode)
Analyze and add annotations to Namenode protocol methods and enable retry
HDFS-4969. Blocker bug reported by Robert Kanter and fixed by Robert Kanter (test , webhdfs)
WebhdfsFileSystem expects non-standard WEBHDFS Json element
HDFS-4954. Major bug reported by Brandon Li and fixed by Brandon Li (nfs)
compile failure in branch-2: getFlushedOffset should catch or rethrow IOException
HDFS-4951. Major bug reported by Robert Kanter and fixed by Robert Kanter (security)
FsShell commands using secure httpfs throw exceptions due to missing TokenRenewer
HDFS-4948. Major bug reported by Robert Joseph Evans and fixed by Brandon Li
mvn site for hadoop-hdfs-nfs fails
HDFS-4944. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (webhdfs)
WebHDFS cannot create a file path containing characters that must be URI-encoded, such as space.
HDFS-4943. Minor bug reported by Jerry He and fixed by Jerry He (webhdfs)
WebHdfsFileSystem does not work when original file path has encoded chars
HDFS-4932. Minor improvement reported by Fengdong Yu and fixed by Fengdong Yu (ha , namenode)
Avoid a wide line on the name node webUI if we have more Journal nodes
HDFS-4927. Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (test)
CreateEditsLog creates inodes with an invalid inode ID, which then cannot be loaded by a namenode.
HDFS-4917. Major bug reported by Fengdong Yu and fixed by Fengdong Yu (datanode , namenode)
Start-dfs.sh cannot pass the parameters correctly
HDFS-4914. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (hdfs-client)
When possible, Use DFSClient.Conf instead of Configuration
HDFS-4912. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)
Cleanup FSNamesystem#startFileInternal
HDFS-4910. Major bug reported by Chuan Liu and fixed by Chuan Liu
TestPermission failed in branch-2
HDFS-4908. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode , snapshots)
Reduce snapshot inode memory usage
HDFS-4906. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (hdfs-client)
HDFS Output streams should not accept writes after being closed
HDFS-4903. Minor improvement reported by Suresh Srinivas and fixed by Arpit Agarwal (namenode)
Print trash configuration and trash emptier state in namenode log
HDFS-4902. Major bug reported by Binglin Chang and fixed by Binglin Chang (snapshots)
DFSClient.getSnapshotDiffReport should use string path rather than o.a.h.fs.Path
HDFS-4888. Major bug reported by Ravi Prakash and fixed by Ravi Prakash
Refactor and fix FSNamesystem.getTurnOffTip to sanity
HDFS-4887. Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee (benchmarks , test)
TestNNThroughputBenchmark exits abruptly
HDFS-4883. Major bug reported by Konstantin Shvachko and fixed by Tao Luo (namenode)
complete() should verify fileId
HDFS-4880. Major bug reported by Arpit Agarwal and fixed by Suresh Srinivas (namenode)
Diagnostic logging while loading name/edits files
HDFS-4878. Major bug reported by Tao Luo and fixed by Tao Luo (namenode)
On Remove Block, Block is not Removed from neededReplications queue
HDFS-4877. Blocker bug reported by Jing Zhao and fixed by Jing Zhao (snapshots)
Snapshot: fix the scenario where a directory is renamed under its prior descendant
HDFS-4876. Minor sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (snapshots)
The javadoc of FileWithSnapshot is incorrect
HDFS-4875. Minor sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Arpit Agarwal (snapshots , test)
Add a test for testing snapshot file length
HDFS-4873. Major bug reported by Hari Mankude and fixed by Jing Zhao (snapshots)
callGetBlockLocations returns incorrect number of blocks for snapshotted files
HDFS-4867. Major bug reported by Kihwal Lee and fixed by Plamen Jeliazkov (namenode)
metaSave NPEs when there are invalid blocks in repl queue.
HDFS-4866. Blocker bug reported by Ralph Castain and fixed by Arpit Agarwal (namenode)
Protocol buffer support cannot compile under C

The Protocol Buffers definition of the inter-namenode protocol required a change for compatibility with compiled C clients. This is a backwards-incompatible change. A namenode prior to this change will not be able to communicate with a namenode after this change.
HDFS-4865. Major bug reported by Wei Yan and fixed by Wei Yan
Remove sub resource warning from httpfs log at startup time
HDFS-4863. Major bug reported by Jing Zhao and fixed by Jing Zhao (snapshots)
The root directory should be added to the snapshottable directory list while loading fsimage
HDFS-4862. Major bug reported by Ravi Prakash and fixed by Ravi Prakash
SafeModeInfo.isManual() returns true when resources are low even if it wasn't entered into manually
HDFS-4857. Major bug reported by Jing Zhao and fixed by Jing Zhao (snapshots)
Snapshot.Root and AbstractINodeDiff#snapshotINode should not be put into INodeMap when loading FSImage
HDFS-4850. Major bug reported by Stephen Chu and fixed by Jing Zhao (tools)
fix OfflineImageViewer to work on fsimages with empty files or snapshots
HDFS-4848. Minor improvement reported by Stephen Chu and fixed by Jing Zhao (snapshots)
copyFromLocal and renaming a file to ".snapshot" should output that ".snapshot" is a reserved name
HDFS-4846. Minor bug reported by Stephen Chu and fixed by Jing Zhao (snapshots)
Clean up snapshot CLI commands output stacktrace for invalid arguments
HDFS-4845. Critical bug reported by Kihwal Lee and fixed by Arpit Agarwal (namenode)
FSEditLogLoader gets NPE while accessing INodeMap in TestEditLogRace
HDFS-4842. Major sub-task reported by Jing Zhao and fixed by Jing Zhao (snapshots)
Snapshot: identify the correct prior snapshot when deleting a snapshot under a renamed subtree
HDFS-4841. Major bug reported by Stephen Chu and fixed by Robert Kanter (security , webhdfs)
FsShell commands using secure webhdfs fail ClientFinalizer shutdown hook
HDFS-4840. Major bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)
ReplicationMonitor gets NPE during shutdown
HDFS-4832. Critical bug reported by Ravi Prakash and fixed by Ravi Prakash
Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

This change makes the name node keep its internal replication queues and data node state updated in manual safe mode. This allows metrics and the UI to present up-to-date information while in safe mode. The behavior during start-up safe mode is unchanged.
HDFS-4830. Minor bug reported by Aaron T. Myers and fixed by Aaron T. Myers
Typo in config settings for AvailableSpaceVolumeChoosingPolicy in hdfs-default.xml
HDFS-4827. Major bug reported by Devaraj Das and fixed by Devaraj Das
Slight update to the implementation of API for handling favored nodes in DFSClient
HDFS-4826. Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (test)
TestNestedSnapshots times out due to repeated slow edit log flushes when running on virtualized disk
HDFS-4825. Major bug reported by Andrew Wang and fixed by Andrew Wang (webhdfs)
webhdfs / httpfs tests broken because of min block size change
HDFS-4824. Major bug reported by Henry Robinson and fixed by Colin Patrick McCabe (hdfs-client)
FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner
HDFS-4819. Minor sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (documentation)
Update Snapshot doc for HDFS-4758
HDFS-4818. Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode , test)
several HDFS tests that attempt to make directories unusable do not work correctly on Windows
HDFS-4815. Major bug reported by Tian Hong Wang and fixed by Tian Hong Wang (datanode , test)
TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN: Double call countReplicas() to fetch corruptReplicas and liveReplicas is not needed
HDFS-4813. Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Jing Zhao (namenode)
BlocksMap may throw NullPointerException during shutdown
HDFS-4810. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)
several HDFS HA tests have timeouts that are too short
HDFS-4807. Major bug reported by Kihwal Lee and fixed by Cristina L. Abad
DFSOutputStream.createSocketForPipeline() should not include timeout extension on connect
HDFS-4805. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (webhdfs)
Webhdfs client is fragile to token renewal errors
HDFS-4804. Minor improvement reported by Stephen Chu and fixed by Stephen Chu
WARN when users set the block balanced preference percent below 0.5 or above 1.0
HDFS-4799. Blocker bug reported by Todd Lipcon and fixed by Todd Lipcon (namenode)
Corrupt replica can be prematurely removed from corruptReplicas map
HDFS-4797. Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (datanode)
BlockScanInfo does not override equals(..) and hashCode() consistently
HDFS-4787. Major improvement reported by Tian Hong Wang and fixed by Tian Hong Wang
Create a new HdfsConfiguration before each TestDFSClientRetries testcases
HDFS-4785. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)
Concat operation does not remove concatenated files from InodeMap
HDFS-4784. Major sub-task reported by Brandon Li and fixed by Brandon Li (namenode)
NPE in FSDirectory.resolvePath()
HDFS-4783. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)
TestDelegationTokensWithHA#testHAUtilClonesDelegationTokens fails on Windows
HDFS-4780. Minor bug reported by Kihwal Lee and fixed by Robert Parker (namenode)
Use the correct relogin method for services
HDFS-4778. Major bug reported by Devaraj Das and fixed by Devaraj Das (namenode)
Invoke getPipeline in the chooseTarget implementation that has favoredNodes
HDFS-4772. Minor improvement reported by Brandon Li and fixed by Brandon Li (namenode)
Add number of children in HdfsFileStatus
HDFS-4768. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (datanode)
File handle leak in datanode when a block pool is removed
HDFS-4765. Major bug reported by Andrew Wang and fixed by Andrew Wang (namenode)
Permission check of symlink deletion incorrectly throws UnresolvedLinkException
HDFS-4762. Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Provide HDFS based NFSv3 and Mountd implementation
HDFS-4751. Minor bug reported by Andrew Wang and fixed by Andrew Wang (test)
TestLeaseRenewer#testThreadName flakes
HDFS-4748. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (qjm , test)
MiniJournalCluster#restartJournalNode leaks resources, which causes sporadic test failures
HDFS-4745. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)
TestDataTransferKeepalive#testSlowReader has race condition that causes sporadic failure
HDFS-4743. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)
TestNNStorageRetentionManager fails on Windows
HDFS-4741. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)
TestStorageRestore#testStorageRestoreFailure fails on Windows
HDFS-4740. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)
Fixes for a few test failures on Windows
HDFS-4739. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)
NN can miscalculate the number of extra edit log segments to retain
HDFS-4737. Major bug reported by Sean Mackrory and fixed by Sean Mackrory
JVM path embedded in fuse binaries
HDFS-4734. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal
HDFS Tests that use ShellCommandFencer are broken on Windows
HDFS-4733. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
Make HttpFS username pattern configurable
HDFS-4732. Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (test)
TestDFSUpgradeFromImage fails on Windows due to failure to unpack old image tarball that contains hard links
HDFS-4725. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (namenode , test , tools)
fix HDFS file handle leaks
HDFS-4722. Major bug reported by Ivan Mitic and fixed by Ivan Mitic (test)
TestGetConf#testFederation times out on Windows
HDFS-4721. Major improvement reported by Varun Sharma and fixed by Varun Sharma (namenode)
Speed up lease/block recovery when DN fails and a block goes into recovery
HDFS-4714. Major bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)
Log short messages in Namenode RPC server for exceptions meant for clients
HDFS-4705. Minor bug reported by Ivan Mitic and fixed by Ivan Mitic
Address HDFS test failures on Windows because of invalid dfs.namenode.name.dir
HDFS-4699. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)
TestPipelinesFailover#testPipelineRecoveryStress fails sporadically
HDFS-4698. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (hdfs-client)
provide client-side metrics for remote reads, local reads, and short-circuit reads
HDFS-4695. Major bug reported by Ivan Mitic and fixed by Ivan Mitic (test)
TestEditLog leaks open file handles between tests
HDFS-4693. Minor bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)
Some test cases in TestCheckpoint do not clean up after themselves
HDFS-4687. Minor bug reported by Andrew Wang and fixed by Andrew Wang (test)
TestDelegationTokenForProxyUser#testWebHdfsDoAs is flaky with JDK7
HDFS-4679. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)
Namenode operation checks should be done in a consistent manner
HDFS-4677. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
Editlog should support synchronous writes
HDFS-4676. Minor bug reported by Suresh Srinivas and fixed by Suresh Srinivas (test)
TestHDFSFileSystemContract should set MiniDFSCluster variable to null to free up memory
HDFS-4674. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)
TestBPOfferService fails on Windows due to failure parsing datanode data directory as URI
HDFS-4669. Major bug reported by Tian Hong Wang and fixed by Tian Hong Wang (test)
TestBlockPoolManager fails using IBM java
HDFS-4659. Major bug reported by Brandon Li and fixed by Brandon Li (namenode)
Support setting execution bit for regular files
HDFS-4658. Trivial bug reported by Aaron T. Myers and fixed by Aaron T. Myers (ha , namenode)
Standby NN will log that it has received a block report "after becoming active"
HDFS-4655. Minor bug reported by Aaron T. Myers and fixed by Aaron T. Myers (datanode)
DNA_FINALIZE is logged as being an unknown command by the DN when received from the standby NN
HDFS-4646. Minor bug reported by Jagane Sundar and fixed by (namenode)
createNNProxyWithClientProtocol ignores configured timeout value
HDFS-4645. Major improvement reported by Suresh Srinivas and fixed by Arpit Agarwal (namenode)
Move from randomly generated block ID to sequentially generated block ID
HDFS-4643. Trivial bug reported by Todd Lipcon and fixed by Todd Lipcon (qjm , test)
Fix flakiness in TestQuorumJournalManager
HDFS-4639. Major bug reported by Konstantin Shvachko and fixed by Plamen Jeliazkov (namenode)
startFileInternal() should not increment generation stamp
HDFS-4635. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)
Move BlockManager#computeCapacity to LightWeightGSet
HDFS-4625. Minor bug reported by Arpit Agarwal and fixed by Ivan Mitic (test)
Make TestNNWithQJM#testNewNamenodeTakesOverWriter work on Windows
HDFS-4621. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (ha , qjm)
additional logging to help diagnose slow QJM logSync
HDFS-4620. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (documentation)
Documentation for dfs.namenode.rpc-address specifies wrong format
HDFS-4618. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (namenode)
default for checkpoint txn interval is too low
HDFS-4615. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)
Fix TestDFSShell failures on Windows
HDFS-4614. Trivial bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)
FSNamesystem#getContentSummary should use getPermissionChecker helper method
HDFS-4610. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
Move to using common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
HDFS-4609. Minor bug reported by Ivan Mitic and fixed by Ivan Mitic (test)
TestAuditLogs should release log handles between tests
HDFS-4607. Minor bug reported by Ivan Mitic and fixed by Ivan Mitic (test)
TestGetConf#testGetSpecificKey fails on Windows
HDFS-4604. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
TestJournalNode fails on Windows
HDFS-4603. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
TestMiniDFSCluster fails on Windows
HDFS-4602. Major sub-task reported by Suresh Srinivas and fixed by Uma Maheswara Rao G
TestBookKeeperHACheckpoints fails
HDFS-4598. Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
WebHDFS concat: the default value of sources in the code does not match the doc
HDFS-4596. Major bug reported by Andrew Wang and fixed by Andrew Wang (namenode)
Shutting down namenode during checkpointing can lead to md5sum error
HDFS-4595. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (hdfs-client)
When short circuit read fails, DFSClient does not fall back to regular reads
HDFS-4593. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal
TestSaveNamespace fails on Windows
HDFS-4592. Minor bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)
Default values for access time precision are out of sync between hdfs-default.xml and the code
HDFS-4591. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (ha , namenode)
HA clients can fail to fail over while Standby NN is performing long checkpoint
HDFS-4586. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
TestDataDirs.testGetDataDirsFromURIs fails with all directories in dfs.datanode.data.dir are invalid
HDFS-4583. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
TestNodeCount fails
HDFS-4582. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
TestHostsFiles fails on Windows
HDFS-4573. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)
Fix TestINodeFile on Windows
HDFS-4572. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (namenode , test)
Fix TestJournal failures on Windows
HDFS-4569. Trivial improvement reported by Andrew Wang and fixed by Andrew Wang
Small image transfer related cleanups.
HDFS-4565. Minor improvement reported by Arpit Gupta and fixed by Arpit Gupta (security)
use DFSUtil.getSpnegoKeytabKey() to get the spnego keytab key in secondary namenode and namenode http server
HDFS-4544. Major bug reported by Amareshwari Sriramadasu and fixed by Arpit Agarwal
Error in deleting blocks should not do check disk, for all types of errors
HDFS-4542. Blocker sub-task reported by Daryn Sharp and fixed by Daryn Sharp (webhdfs)
Webhdfs doesn't support secure proxy users
HDFS-4541. Major bug reported by Arpit Gupta and fixed by Arpit Gupta (datanode , security)
set hadoop.log.dir and hadoop.id.str when starting secure datanode so it writes the logs to the correct dir by default
HDFS-4540. Major bug reported by Arpit Gupta and fixed by Arpit Gupta (security)
namenode http server should use the web authentication keytab for spnego principal
HDFS-4533. Major bug reported by Fengdong Yu and fixed by Fengdong Yu (datanode , namenode)
start-dfs.sh ignored additional parameters besides -upgrade
HDFS-4532. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (namenode)
RPC call queue may fill due to current user lookup
HDFS-4525. Major sub-task reported by Uma Maheswara Rao G and fixed by SreeHari (namenode)
Provide an API for knowing whether a file is closed or not.
HDFS-4522. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
LightWeightGSet expects incrementing a volatile to be atomic
HDFS-4521. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
invalid network topologies should not be cached
HDFS-4519. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (datanode , scripts)
Support override of jsvc binary and log file locations when launching secure datanode.

With this improvement, the following options are available in release 1.2.0 and later on the 1.x release stream: 1. The jsvc location can be overridden by setting the environment variable JSVC_HOME. Defaults to the jsvc binary packaged within the Hadoop distro. 2. jsvc log output is directed to the file defined by JSVC_OUTFILE. Defaults to $HADOOP_LOG_DIR/jsvc.out. 3. jsvc error output is directed to the file defined by JSVC_ERRFILE. Defaults to $HADOOP_LOG_DIR/jsvc.err. The following options are available in release 2.0.4 and later on the 2.x release stream: 1. jsvc log output is directed to the file defined by JSVC_OUTFILE. Defaults to $HADOOP_LOG_DIR/jsvc.out. 2. jsvc error output is directed to the file defined by JSVC_ERRFILE. Defaults to $HADOOP_LOG_DIR/jsvc.err. For overriding the jsvc location on 2.x releases, here is the release note from HDFS-2303: To run secure Datanodes, users must install jsvc for their platform and set JSVC_HOME to point to the location of jsvc in their environment.
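
The defaulting rules for the two jsvc output files can be sketched as follows. This is an illustration only, not the launch script itself; the function name is hypothetical, and the environment is passed in as a plain dictionary so the example has no side effects.

```python
def resolve_jsvc_files(env):
    """Return (outfile, errfile) using the defaults described in the note.

    env is a mapping of environment variable names to values.
    HADOOP_LOG_DIR is assumed to be set.
    """
    log_dir = env["HADOOP_LOG_DIR"]
    # An explicitly set variable wins; otherwise fall back under HADOOP_LOG_DIR.
    outfile = env.get("JSVC_OUTFILE", log_dir + "/jsvc.out")
    errfile = env.get("JSVC_ERRFILE", log_dir + "/jsvc.err")
    return outfile, errfile
```

For example, with only HADOOP_LOG_DIR set to /var/log/hadoop (a hypothetical path), both files resolve to locations under that directory.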
HDFS-4518. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal
Finer grained metrics for HDFS capacity
HDFS-4502. Blocker sub-task reported by Alejandro Abdelnur and fixed by Brandon Li (webhdfs)
WebHdfsFileSystem handling of fileId breaks compatibility
HDFS-4495. Major bug reported by Kihwal Lee and fixed by Kihwal Lee (hdfs-client)
Allow client-side lease renewal to be retried beyond soft-limit
HDFS-4484. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
libwebhdfs compilation broken with gcc 4.6.2
HDFS-4477. Critical bug reported by Kihwal Lee and fixed by Daryn Sharp (security)
Secondary namenode may retain old tokens
HDFS-4471. Major bug reported by Andrew Wang and fixed by Andrew Wang (namenode)
Namenode WebUI file browsing does not work with wildcard addresses configured
HDFS-4470. Major bug reported by Chris Nauroth and fixed by Chris Nauroth
several HDFS tests attempt file operations on invalid HDFS paths when running on Windows
HDFS-4465. Major improvement reported by Suresh Srinivas and fixed by Aaron T. Myers (datanode)
Optimize datanode ReplicasMap and ReplicaInfo
HDFS-4461. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
DirectoryScanner: volume path prefix takes up memory for every block that is scanned
HDFS-4434. Major sub-task reported by Brandon Li and fixed by Suresh Srinivas (namenode)
Provide a mapping from INodeId to INode

This change adds support for referencing files and directories based on fileID/inodeID using a path /.reserved/.inodes/<inodeid>. With this change, creating a file or directory named /.reserved is no longer allowed. Before upgrading to a release with this change, any existing file or directory named /.reserved needs to be renamed.
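
A short sketch of the path convention described above. The helper names and the example inode ID are hypothetical; only the /.reserved/.inodes/<inodeid> layout comes from the note.

```python
RESERVED_ROOT = "/.reserved"
RESERVED_INODES_PREFIX = RESERVED_ROOT + "/.inodes"


def inode_path(inode_id):
    """Build the reserved path that refers to a file or directory by inode ID."""
    return "{}/{}".format(RESERVED_INODES_PREFIX, inode_id)


def conflicts_with_reserved(path):
    """Return True if path is /.reserved itself or lies underneath it."""
    return path == RESERVED_ROOT or path.startswith(RESERVED_ROOT + "/")
```

A pre-upgrade check could run something like conflicts_with_reserved over existing top-level names to find entries that need renaming.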
HDFS-4382. Major bug reported by Ted Yu and fixed by Ted Yu
Fix typo MAX_NOT_CHANGED_INTERATIONS
HDFS-4374. Major sub-task reported by Chris Nauroth and fixed by Chris Nauroth (namenode)
Display NameNode startup progress in UI
HDFS-4373. Major sub-task reported by Chris Nauroth and fixed by Chris Nauroth (namenode)
Add HTTP API for querying NameNode startup progress
HDFS-4372. Major sub-task reported by Chris Nauroth and fixed by Chris Nauroth (namenode)
Track NameNode startup progress
HDFS-4346. Minor sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Refactor INodeId and GenerationStamp
HDFS-4342. Major bug reported by Mark Yang and fixed by Arpit Agarwal (namenode)
Edits dir in dfs.namenode.edits.dir.required will be silently ignored if it is not in dfs.namenode.edits.dir
HDFS-4340. Major sub-task reported by Brandon Li and fixed by Brandon Li (hdfs-client , namenode)
Update addBlock() to include inode id as additional argument
HDFS-4339. Major sub-task reported by Brandon Li and fixed by Brandon Li (namenode)
Persist inode id in fsimage and editlog
HDFS-4334. Major sub-task reported by Brandon Li and fixed by Brandon Li (namenode)
Add a unique id to each INode
HDFS-4305. Minor bug reported by Todd Lipcon and fixed by Andrew Wang (namenode)
Add a configurable limit on number of blocks per file, and min block size

This change introduces a maximum number of blocks per file, by default one million, and a minimum block size, by default 1MB. These can optionally be changed via the configuration settings "dfs.namenode.fs-limits.max-blocks-per-file" and "dfs.namenode.fs-limits.min-block-size", respectively.
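
The two limits can be pictured as simple bounds checks. The sketch below is not the NameNode implementation; the function names are hypothetical, and the constants follow the defaults as worded in the note (one million blocks, 1MB).

```python
# Defaults as stated in the note; both are configurable on a real cluster.
MAX_BLOCKS_PER_FILE = 1000000
MIN_BLOCK_SIZE = 1024 * 1024  # 1MB in bytes


def check_block_size(block_size, min_block_size=MIN_BLOCK_SIZE):
    """Reject a requested block size below the configured minimum."""
    if block_size < min_block_size:
        raise ValueError(
            "block size {} is below the minimum {}".format(block_size, min_block_size))


def check_block_count(current_blocks, max_blocks=MAX_BLOCKS_PER_FILE):
    """Reject adding another block once the file is at the maximum."""
    if current_blocks >= max_blocks:
        raise ValueError(
            "file already has {} blocks, the maximum is {}".format(current_blocks, max_blocks))
```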
HDFS-4304. Major improvement reported by Todd Lipcon and fixed by Colin Patrick McCabe (namenode)
Make FSEditLogOp.MAX_OP_SIZE configurable
HDFS-4300. Critical bug reported by Todd Lipcon and fixed by Andrew Wang
TransferFsImage.downloadEditsToStorage should use a tmp file for destination
HDFS-4298. Major bug reported by Todd Lipcon and fixed by Aaron T. Myers (namenode)
StorageRetentionManager spews warnings when used with QJM
HDFS-4296. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)
Add layout version for HDFS-4256 for release 1.2.0
HDFS-4287. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (webhdfs)
HTTPFS tests fail on Windows
HDFS-4261. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Junping Du (balancer)
TestBalancerWithNodeGroup times out
HDFS-4249. Major new feature reported by Suresh Srinivas and fixed by Chris Nauroth (namenode)
Add status NameNode startup to webUI
HDFS-4246. Minor improvement reported by Harsh J and fixed by Harsh J (hdfs-client)
The exclude node list should be more forgiving, for each output stream
HDFS-4240. Major bug reported by Junping Du and fixed by Junping Du (namenode)
In the nodegroup-aware case, avoid placing a replica on nodes whose nodegroup already holds a replica
HDFS-4235. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
when outputting XML, OfflineEditsViewer can't handle some edits containing non-ASCII strings
HDFS-4234. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (balancer)
Use the generic code for choosing datanode in Balancer
HDFS-4222. Minor bug reported by Xiaobo Peng and fixed by Xiaobo Peng (namenode)
NN is unresponsive and loses heartbeats of DNs when Hadoop is configured to use LDAP and LDAP has issues
HDFS-4215. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Improvements on INode and image loading
HDFS-4209. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Clean up the addNode/addChild/addChildNoQuotaCheck methods in FSDirectory
HDFS-4206. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Change the fields in INode and its subclasses to private
HDFS-4205. Major bug reported by Andy Isaacson and fixed by Jason Lowe (hdfs-client)
fsck fails with symlinks
HDFS-4152. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Jing Zhao (namenode)
Add a new class for the parameter in INode.collectSubtreeBlocksAndClear(..)
HDFS-4151. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Passing INodesInPath instead of INode[] in FSDirectory
HDFS-4129. Minor test reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Add utility methods to dump NameNode in memory tree for testing
HDFS-4128. Major bug reported by Todd Lipcon and fixed by Kihwal Lee (namenode)
2NN gets stuck in inconsistent state if edit log replay fails in the middle
HDFS-4124. Minor new feature reported by Jing Zhao and fixed by Jing Zhao
Refactor INodeDirectory#getExistingPathINodes() to enable returning more than INode array
HDFS-4053. Major improvement reported by Eli Collins and fixed by Eli Collins
Increase the default block size

The default block size prior to this change was 64MB. This jira changes the default block size to 128MB. To go back to the previous behavior, set the configuration parameter "dfs.blocksize" to 67108864 in hdfs-site.xml.
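
The byte values behind these sizes, and the effect on block counts, can be checked with a few lines of arithmetic. The helper name is hypothetical and the 1GB file is only an example.

```python
MB = 1024 * 1024

OLD_DEFAULT_BLOCK_SIZE = 64 * MB    # 67108864 bytes, the value quoted in the note
NEW_DEFAULT_BLOCK_SIZE = 128 * MB   # 134217728 bytes


def blocks_needed(file_size, block_size):
    """Number of blocks required to hold file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


# A 1GB file needs half as many blocks under the new default.
one_gb = 1024 * MB
old_count = blocks_needed(one_gb, OLD_DEFAULT_BLOCK_SIZE)
new_count = blocks_needed(one_gb, NEW_DEFAULT_BLOCK_SIZE)
```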
HDFS-4013. Trivial bug reported by Chao Shi and fixed by Chao Shi (hdfs-client)
TestHftpURLTimeouts throws NPE
HDFS-3940. Minor improvement reported by Eli Collins and fixed by Suresh Srinivas
Add GSet#clear method and clear the block map when namenode is shut down
HDFS-3934. Minor bug reported by Andy Isaacson and fixed by Colin Patrick McCabe
duplicative dfs_hosts entries handled wrong
HDFS-3880. Minor improvement reported by Brandon Li and fixed by Brandon Li (datanode , ha , namenode , security)
Use Builder to get RPC server in HDFS
HDFS-3875. Critical bug reported by Todd Lipcon and fixed by Kihwal Lee (datanode , hdfs-client)
Issue handling checksum errors in write pipeline
HDFS-3817. Major improvement reported by Brandon Li and fixed by Brandon Li (namenode)
avoid printing stack information for SafeModeException
HDFS-3792. Trivial bug reported by Todd Lipcon and fixed by Todd Lipcon (build , namenode)
Fix two findbugs introduced by HDFS-3695
HDFS-3769. Critical sub-task reported by liaowenrui and fixed by (ha)
standby namenode fails to become active because starting the log segment fails on shared storage
HDFS-3601. Major new feature reported by Junping Du and fixed by Junping Du (namenode)
Implementation of ReplicaPlacementPolicyNodeGroup to support 4-layer network topology
HDFS-3499. Major bug reported by Junping Du and fixed by Junping Du (datanode)
Make NetworkTopology support user specified topology class
HDFS-3498. Major improvement reported by Junping Du and fixed by Junping Du (namenode)
Make Replica Removal Policy pluggable and ReplicaPlacementPolicyDefault extensible for reusing code in subclass
HDFS-3495. Major new feature reported by Junping Du and fixed by Junping Du (balancer)
Update Balancer to support new NetworkTopology with NodeGroup
HDFS-3277. Major bug reported by Colin Patrick McCabe and fixed by Andrew Wang
fail over to loading a different FSImage if the first one we try to load is corrupt
HDFS-3180. Major bug reported by Daryn Sharp and fixed by Chris Nauroth (webhdfs)
Add socket timeouts to webhdfs
HDFS-3163. Trivial improvement reported by Brandon Li and fixed by Brandon Li (test)
TestHDFSCLI.testAll fails if the user name is not all lowercase
HDFS-3009. Trivial bug reported by Hari Mankude and fixed by Hari Mankude (hdfs-client)
DFSClient isLocalAddress() can use a similar routine in NetUtils
HDFS-2857. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)
Cleanup BlockInfo class
HDFS-2576. Major new feature reported by Pritam Damania and fixed by Devaraj Das (hdfs-client , namenode)
Namenode should have a favored nodes hint to enable clients to have control over block placement.
HDFS-2572. Trivial improvement reported by Harsh J and fixed by Harsh J (datanode)
Unnecessary double-check in DN#getHostName
HDFS-2042. Minor improvement reported by Eli Collins and fixed by (libhdfs)
Require c99 when building libhdfs
HDFS-1804. Minor new feature reported by Harsh J and fixed by Aaron T. Myers (datanode)
Add a new block-volume device choosing policy that looks at free space

There is now a new option to have the DN take into account available disk space on each volume when choosing where to place a replica when performing an HDFS write. This can be enabled by setting the config "dfs.datanode.fsdataset.volume.choosing.policy" to the value "org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy".
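
The general idea of letting free space influence volume selection can be sketched as below. This is a deliberate simplification that always picks the volume with the most available space; it does not reproduce the actual logic of AvailableSpaceVolumeChoosingPolicy, and the function name and volume paths are hypothetical.

```python
def choose_volume(volumes, replica_size):
    """Pick the volume with the most available space that can hold the replica.

    volumes maps a volume path to its available bytes.
    """
    candidates = {path: free for path, free in volumes.items() if free >= replica_size}
    if not candidates:
        raise OSError("no volume has enough space for the replica")
    # Simplified rule: prefer the volume with the largest free space.
    return max(candidates, key=candidates.get)
```

For instance, given volumes /data/1 with 10 units free and /data/2 with 50 units free, a replica of size 5 would be placed on /data/2 under this simplified rule.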
HDFS-347. Major improvement reported by George Porter and fixed by Colin Patrick McCabe (datanode , hdfs-client , performance)
DFS read performance suboptimal when client co-located on nodes with data
HADOOP-9792. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (ipc)
Retry the methods that are tagged @AtMostOnce along with @Idempotent
HADOOP-9786. Major bug reported by Jing Zhao and fixed by Jing Zhao
RetryInvocationHandler#isRpcInvocation should support ProtocolTranslator
HADOOP-9773. Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (test)
TestLightWeightCache fails
HADOOP-9770. Minor improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (util)
Make RetryCache#state non volatile
HADOOP-9763. Major new feature reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (util)
Extends LightWeightGSet to support eviction of expired elements
HADOOP-9762. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (util)
RetryCache utility for implementing RPC retries
HADOOP-9760. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (util)
Move GSet and LightWeightGSet to hadoop-common
HADOOP-9759. Critical bug reported by Chuan Liu and fixed by Chuan Liu
Add support for NativeCodeLoader#getLibraryName on Windows
HADOOP-9756. Minor improvement reported by Junping Du and fixed by Junping Du (ipc)
Additional cleanup RPC code
HADOOP-9754. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (ipc)
Clean up RPC code
HADOOP-9751. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (ipc)
Add clientId and retryCount to RpcResponseHeaderProto
HADOOP-9738. Major bug reported by Kihwal Lee and fixed by Jing Zhao (tools)
TestDistCh fails
HADOOP-9734. Minor improvement reported by Jason Lowe and fixed by Jason Lowe (ipc)
Common protobuf definitions for GetUserMappingsProtocol, RefreshAuthorizationPolicyProtocol and RefreshUserMappingsProtocol
HADOOP-9720. Major sub-task reported by Suresh Srinivas and fixed by Arpit Agarwal
Rename Client#uuid to Client#clientId
HADOOP-9717. Major improvement reported by Suresh Srinivas and fixed by Jing Zhao (ipc)
Add retry attempt count to the RPC requests
HADOOP-9716. Major improvement reported by Suresh Srinivas and fixed by Tsz Wo (Nicholas), SZE (ipc)
Move the Rpc request call ID generation to client side InvocationHandler
HADOOP-9707. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (util)
Fix register lists for crc32c inline assembly
HADOOP-9701. Minor bug reported by Steve Loughran and fixed by Karthik Kambatla (documentation)
mvn site ambiguous links in hadoop-common
HADOOP-9698. Blocker sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
RPCv9 client must honor server's SASL negotiate response

The RPC client now waits for the Server's SASL negotiate response before instantiating its SASL client.
HADOOP-9691. Minor improvement reported by Chris Nauroth and fixed by Chris Nauroth (ipc)
RPC clients can generate call ID using AtomicInteger instead of synchronizing on the Client instance.
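
A minimal sketch of the technique named in the title, assuming nothing about the real Client class: an AtomicInteger hands out distinct call IDs to concurrent callers without taking a lock on the enclosing object. The CallIdSketch class and the masking of the sign bit are assumptions of this example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an RPC client; not the Hadoop Client class.
class CallIdSketch {
    private final AtomicInteger callIdCounter = new AtomicInteger();

    // Lock-free: each caller gets a distinct ID without synchronizing on
    // this instance. Masking the sign bit (an assumption of this sketch)
    // keeps the ID non-negative after the counter overflows.
    int nextCallId() {
        return callIdCounter.getAndIncrement() & 0x7FFFFFFF;
    }

    public static void main(String[] args) throws InterruptedException {
        CallIdSketch client = new CallIdSketch();
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    seen.add(client.nextCallId());
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        // Every generated ID is unique, so the set holds all of them.
        System.out.println(seen.size()); // prints 4000
    }
}
```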
HADOOP-9688. Blocker improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (ipc)
Add globally unique Client ID to RPC requests
HADOOP-9683. Blocker sub-task reported by Luke Lu and fixed by Daryn Sharp (ipc)
Wrap IpcConnectionContext in RPC headers

The connection context is now sent as a protobuf wrapped in an RPC header.
HADOOP-9681. Minor bug reported by Chuan Liu and fixed by Chuan Liu
FileUtil.unTarUsingJava() should close the InputStream upon finishing
HADOOP-9678. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
TestRPC#testStopsAllThreads intermittently fails on Windows
HADOOP-9676. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
make maximum RPC buffer size configurable
HADOOP-9673. Trivial improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (net)
NetworkTopology: when a node can't be added, print out its location for diagnostic purposes
HADOOP-9665. Critical bug reported by Zhijie Shen and fixed by Zhijie Shen
BlockDecompressorStream#decompress throws EOFException instead of returning -1 at EOF
HADOOP-9661. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (metrics)
Allow metrics sources to be extended
HADOOP-9656. Minor bug reported by Chuan Liu and fixed by Chuan Liu (test , tools)
Gridmix unit tests fail on Windows and Linux
HADOOP-9649. Blocker improvement reported by Zhijie Shen and fixed by Zhijie Shen
Promote YARN service life-cycle libraries into Hadoop Common
HADOOP-9643. Minor bug reported by Mark Miller and fixed by Mark Miller (security)
org.apache.hadoop.security.SecurityUtil calls toUpperCase(Locale.getDefault()) as well as toLowerCase(Locale.getDefault()) on hadoop.security.authentication value.
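
The following sketch shows why default-locale case conversion can be a problem for a configuration value such as "simple": under a Turkish locale, the letter "i" upper-cases to a dotted capital "İ" (U+0130), so the result no longer matches an ASCII constant. The LocaleCaseSketch class is a hypothetical example and does not reproduce SecurityUtil.

```java
import java.util.Locale;

// Hypothetical demonstration class; not part of Hadoop.
class LocaleCaseSketch {
    // Upper-cases a value using an explicitly supplied locale.
    static String upper(String value, Locale locale) {
        return value.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Turkish rules map 'i' to U+0130, so the comparison fails.
        System.out.println(upper("simple", turkish).equals("SIMPLE"));        // prints false
        // A fixed, locale-independent choice gives the expected ASCII result.
        System.out.println(upper("simple", Locale.ENGLISH).equals("SIMPLE")); // prints true
    }
}
```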
HADOOP-9638. Major bug reported by Chris Nauroth and fixed by Andrey Klochkov (test)
parallel test changes caused invalid test path for several HDFS tests on Windows
HADOOP-9637. Major bug reported by Chuan Liu and fixed by Chuan Liu
Adding Native Fstat for Windows as needed by YARN
HADOOP-9632. Minor bug reported by Chuan Liu and fixed by Chuan Liu
TestShellCommandFencer will fail if there is a 'host' machine in the network
HADOOP-9630. Major sub-task reported by Luke Lu and fixed by Junping Du (ipc)
Remove IpcSerializationType
HADOOP-9625. Minor improvement reported by Paul Han and fixed by (bin , conf)
HADOOP_OPTS not picked up by hadoop command
HADOOP-9624. Minor test reported by Xi Fang and fixed by Xi Fang (test)
TestFSMainOperationsLocalFileSystem failed when the Hadoop test root path has "X" in its name
HADOOP-9619. Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (documentation)
Mark stability of .proto files
HADOOP-9607. Minor bug reported by Timothy St. Clair and fixed by (documentation)
Fixes in Javadoc build
HADOOP-9605. Major improvement reported by Timothy St. Clair and fixed by (build)
Update junit dependency
HADOOP-9604. Minor improvement reported by Jingguo Yao and fixed by Jingguo Yao (fs)
Wrong Javadoc of FSDataOutputStream
HADOOP-9599. Major bug reported by Mostafa Elhemali and fixed by Mostafa Elhemali
hadoop-config.cmd doesn't set JAVA_LIBRARY_PATH correctly
HADOOP-9593. Major bug reported by Steve Loughran and fixed by Steve Loughran (util)
stack trace printed at ERROR for all yarn clients without hadoop.home set
HADOOP-9581. Major bug reported by Ashwin Shankar and fixed by Ashwin Shankar (scripts)
hadoop --config non-existent directory should result in error
HADOOP-9574. Major bug reported by Jian He and fixed by Jian He
Add new methods in AbstractDelegationTokenSecretManager for restoring RMDelegationTokens on RMRestart
HADOOP-9566. Major bug reported by Lenni Kuff and fixed by Colin Patrick McCabe (native)
Performing direct read using libhdfs sometimes raises SIGPIPE (which in turn throws SIGABRT) causing client crashes
HADOOP-9563. Major bug reported by Kihwal Lee and fixed by Tian Hong Wang (util)
Fix incompatibility introduced by HADOOP-9523
HADOOP-9560. Minor improvement reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA (metrics)
metrics2#JvmMetrics should have max memory size of JVM
HADOOP-9556. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (ha , test)
disable HA tests on Windows that fail due to ZooKeeper client connection management bug
HADOOP-9553. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)
TestAuthenticationToken fails on Windows
HADOOP-9550. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla
Remove aspectj dependency
HADOOP-9549. Blocker bug reported by Kihwal Lee and fixed by Daryn Sharp (security)
WebHdfsFileSystem hangs on close()
HADOOP-9532. Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (bin)
HADOOP_CLIENT_OPTS is appended twice by Windows cmd scripts
HADOOP-9526. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)
TestShellCommandFencer and TestShell fail on Windows
HADOOP-9524. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (ha)
Fix ShellCommandFencer to work on Windows
HADOOP-9523. Major improvement reported by Tian Hong Wang and fixed by Tian Hong Wang
Provide a generic IBM java vendor flag in PlatformName.java to support non-Sun JREs
HADOOP-9517. Blocker bug reported by Arun C Murthy and fixed by Karthik Kambatla (documentation)
Document Hadoop Compatibility
HADOOP-9515. Major new feature reported by Brandon Li and fixed by Brandon Li
Add general interface for NFS and Mount
HADOOP-9511. Major improvement reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
Adding support for additional input streams (FSDataInputStream and RandomAccessFile) in SecureIOUtils.
HADOOP-9509. Major new feature reported by Brandon Li and fixed by Brandon Li
Implement ONCRPC and XDR
HADOOP-9507. Minor bug reported by Mostafa Elhemali and fixed by Chris Nauroth (fs)
LocalFileSystem rename() is broken in some cases when destination exists
HADOOP-9504. Critical bug reported by Liang Xie and fixed by Liang Xie (metrics)
MetricsDynamicMBeanBase has concurrency issues in createMBeanInfo
HADOOP-9503. Minor improvement reported by Varun Sharma and fixed by Varun Sharma (ipc)
Remove sleep between IPC client connect timeouts
HADOOP-9500. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)
TestUserGroupInformation#testGetServerSideGroups fails on Windows due to failure to find winutils.exe
HADOOP-9496. Critical bug reported by Gopal V and fixed by Harsh J (bin)
Bad merge of HADOOP-9450 on branch-2 breaks all bin/hadoop calls that need HADOOP_CLASSPATH
HADOOP-9490. Major bug reported by Ivan Mitic and fixed by Ivan Mitic (fs)
LocalFileSystem#reportChecksumFailure not closing the checksum file handle before rename
HADOOP-9488. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (fs)
FileUtil#createJarWithClassPath only substitutes environment variables from current process environment/does not support overriding when launching new process
HADOOP-9486. Major bug reported by Vinod Kumar Vavilapalli and fixed by Chris Nauroth
Promote Windows and Shell related utils from YARN to Hadoop Common
HADOOP-9485. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (net)
No default value in the code for hadoop.rpc.socket.factory.class.default
HADOOP-9483. Major improvement reported by Chris Nauroth and fixed by Arpit Agarwal (util)
winutils support for readlink command
HADOOP-9481. Minor bug reported by Vadim Bondarev and fixed by Vadim Bondarev
Broken conditional logic with HADOOP_SNAPPY_LIBRARY
HADOOP-9473. Trivial bug reported by Glen Mazza and fixed by (fs)
typo in FileUtil copy() method
HADOOP-9469. Major bug reported by Thomas Graves and fixed by Robert Parker
mapreduce/yarn source jars not included in dist tarball
HADOOP-9459. Critical bug reported by Vinay and fixed by Vinay (ha)
ActiveStandbyElector can join election even before Service HEALTHY, and results in null data at ActiveBreadCrumb
HADOOP-9455. Minor bug reported by Sangjin Lee and fixed by Chris Nauroth (bin)
HADOOP_CLIENT_OPTS appended twice causes JVM failures
HADOOP-9451. Major bug reported by Junping Du and fixed by Junping Du (net)
Node with one topology layer should be handled as fault topology when NodeGroup layer is enabled
HADOOP-9450. Major improvement reported by Mitch Wyle and fixed by Harsh J (scripts)
HADOOP_USER_CLASSPATH_FIRST is not honored; CLASSPATH is PREpended instead of APpended
HADOOP-9443. Major bug reported by Chuan Liu and fixed by Chuan Liu
Port winutils static code analysis change to trunk
HADOOP-9439. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (native)
JniBasedUnixGroupsMapping: fix some crash bugs
HADOOP-9437. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)
TestNativeIO#testRenameTo fails on Windows due to assumption that POSIX errno is embedded in NativeIOException
HADOOP-9430. Major bug reported by Amir Sanjar and fixed by (security)
TestSSLFactory fails on IBM JVM
HADOOP-9429. Major bug reported by Amir Sanjar and fixed by (test)
TestConfiguration fails with IBM JAVA
HADOOP-9425. Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (ipc)
Add error codes to rpc-response
HADOOP-9421. Blocker sub-task reported by Sanjay Radia and fixed by Daryn Sharp
Convert SASL to use ProtoBuf and provide negotiation capabilities

Raw SASL protocol now uses protobufs wrapped with RPC headers. The negotiation sequence incorporates the state of the exchange. The server now has the ability to advertise its supported auth types.
HADOOP-9418. Major sub-task reported by Andrew Wang and fixed by Andrew Wang (fs)
Add symlink resolution support to DistributedFileSystem
HADOOP-9416. Major sub-task reported by Andrew Wang and fixed by Andrew Wang (fs)
Add new symlink resolution methods in FileSystem and FileSystemLinkResolver
HADOOP-9414. Major sub-task reported by Andrew Wang and fixed by Andrew Wang (fs)
Refactor out FSLinkResolver and relevant helper methods
HADOOP-9413. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
Introduce common utils for File#setReadable/Writable/Executable and File#canRead/Write/Execute that work cross-platform
HADOOP-9408. Minor bug reported by rajeshbabu and fixed by rajeshbabu (conf)
misleading description for net.topology.table.file.name property in core-default.xml
HADOOP-9407. Major bug reported by Sangjin Lee and fixed by Sangjin Lee (build)
commons-daemon 1.0.3 dependency has bad group id causing build issues
HADOOP-9405. Minor bug reported by Andrew Wang and fixed by Andrew Wang (test , tools)
TestGridmixSummary#testExecutionSummarizer is broken
HADOOP-9401. Major improvement reported by Karthik Kambatla and fixed by Karthik Kambatla
CodecPool: Add counters for number of (de)compressors leased out
HADOOP-9399. Minor bug reported by Todd Lipcon and fixed by Konstantin Boudnik (build)
protoc maven plugin doesn't work on mvn 3.0.2

Committed to 2.0.4-alpha branch
HADOOP-9397. Major bug reported by Jason Lowe and fixed by Chris Nauroth (build)
Incremental dist tar build fails
HADOOP-9388. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
TestFsShellCopy fails on Windows
HADOOP-9380. Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (ipc)
Add totalLength to rpc response
HADOOP-9379. Trivial improvement reported by Arpit Gupta and fixed by Arpit Gupta
capture the ulimit info after printing the log to the console
HADOOP-9376. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
TestProxyUserFromEnv fails on a Windows domain joined machine
HADOOP-9373. Minor bug reported by Suresh Srinivas and fixed by Suresh Srinivas
Merge CHANGES.branch-trunk-win.txt to CHANGES.txt
HADOOP-9369. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (net)
DNS#reverseDns() can return hostname with . appended at the end
HADOOP-9365. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
TestHAZKUtil fails on Windows
HADOOP-9364. Major bug reported by Ivan Mitic and fixed by Ivan Mitic
PathData#expandAsGlob does not return correct results for absolute paths on Windows
HADOOP-9358. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (ipc , security)
"Auth failed" log should include exception string
HADOOP-9355. Major sub-task reported by Andrew Wang and fixed by Andrew Wang (fs)
Abstract symlink tests to use either FileContext or FileSystem
HADOOP-9353. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (build)
Activate native-win profile by default on Windows
HADOOP-9352. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (security)
Expose UGI.setLoginUser for tests
HADOOP-9349. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (tools)
Confusing output when running hadoop version from one hadoop installation when HADOOP_HOME points to another
HADOOP-9343. Major improvement reported by Siddharth Seth and fixed by Siddharth Seth
Allow additional exceptions through the RPC layer
HADOOP-9342. Major bug reported by Thomas Weise and fixed by Thomas Weise (build)
Remove jline from distribution
HADOOP-9339. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
IPC.Server incorrectly sets UGI auth type
HADOOP-9338. Major new feature reported by Nick White and fixed by Nick White (fs)
FsShell Copy Commands Should Optionally Preserve File Attributes
HADOOP-9337. Major bug reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
org.apache.hadoop.fs.DF.getMount() does not work on Mac OS
HADOOP-9336. Critical improvement reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
Allow UGI of current connection to be queried
HADOOP-9334. Minor improvement reported by Nicolas Liochon and fixed by Nicolas Liochon (build)
Update netty version
HADOOP-9323. Minor bug reported by Hao Zhong and fixed by Suresh Srinivas (documentation , fs , io , record)
Typos in API documentation
HADOOP-9322. Minor improvement reported by Harsh J and fixed by Harsh J (security)
LdapGroupsMapping doesn't seem to set a timeout for its directory search
HADOOP-9318. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
when exiting on a signal, print the signal name first
HADOOP-9307. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (fs)
BufferedFSInputStream.read returns wrong results after certain seeks
HADOOP-9305. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (security)
Add support for running the Hadoop client on 64-bit AIX
HADOOP-9304. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
remove addition of avro generated-sources dirs to build
HADOOP-9303. Major bug reported by Thomas Graves and fixed by Andy Isaacson
command manual dfsadmin missing entry for restoreFailedStorage option
HADOOP-9302. Major bug reported by Thomas Graves and fixed by Andy Isaacson (documentation)
HDFS docs not linked from top level
HADOOP-9299. Blocker bug reported by Roman Shaposhnik and fixed by Daryn Sharp (security)
kerberos name resolution is kicking in even when kerberos is not configured
HADOOP-9297. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
remove old record IO generation and tests
HADOOP-9294. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)
GetGroupsTestBase fails on Windows
HADOOP-9290. Major bug reported by Arpit Agarwal and fixed by Chris Nauroth (build , native)
Some tests cannot load native library
HADOOP-9287. Major test reported by Tsuyoshi OZAWA and fixed by Andrey Klochkov (test)
Parallel testing hadoop-common
HADOOP-9283. Major new feature reported by Aaron T. Myers and fixed by Aaron T. Myers (security)
Add support for running the Hadoop client on AIX
HADOOP-9279. Major improvement reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA (build , documentation)
Document the need to build hadoop-maven-plugins for eclipse and separate project builds
HADOOP-9267. Minor bug reported by Andrew Wang and fixed by Andrew Wang
hadoop -help, -h, --help should show usage instructions
HADOOP-9264. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (fs)
port change to use Java untar API on Windows from branch-1-win to trunk
HADOOP-9253. Major improvement reported by Arpit Gupta and fixed by Arpit Gupta
Capture ulimit info in the logs at service start time
HADOOP-9246. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (build)
Execution phase for hadoop-maven-plugin should be process-resources
HADOOP-9245. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (build)
mvn clean without running mvn install before fails
HADOOP-9233. Major test reported by Vadim Bondarev and fixed by Vadim Bondarev
Cover package org.apache.hadoop.io.compress.zlib with unit tests
HADOOP-9230. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (test)
TestUniformSizeInputFormat fails intermittently
HADOOP-9222. Major test reported by Vadim Bondarev and fixed by Vadim Bondarev
Cover package org.apache.hadoop.io.lz4 with unit tests
HADOOP-9220. Critical bug reported by Tom White and fixed by Tom White (ha)
Unnecessary transition to standby in ActiveStandbyElector
HADOOP-9218. Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (ipc)
Document the Rpc-wrappers used internally
HADOOP-9211. Major bug reported by Sarah Weissman and fixed by Plamen Jeliazkov (conf)
HADOOP_CLIENT_OPTS default setting fixes max heap size at 128m, disregards HADOOP_HEAPSIZE
HADOOP-9209. Major new feature reported by Todd Lipcon and fixed by Todd Lipcon (fs , tools)
Add shell command to dump file checksums
HADOOP-9164. Minor improvement reported by Binglin Chang and fixed by Binglin Chang (native)
Print paths of loaded native libraries in NativeLibraryChecker
HADOOP-9163. Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (ipc)
The rpc msg in ProtobufRpcEngine.proto should be moved out to avoid an extra copy
HADOOP-9154. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (io)
SortedMapWritable#putAll() doesn't add key/value classes to the map
HADOOP-9151. Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (ipc)
Include RPC error info in RpcResponseHeader instead of sending it separately
HADOOP-9150. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (fs/s3 , ha , performance , viewfs)
Unnecessary DNS resolution attempts for logical URIs
HADOOP-9140. Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (ipc)
Cleanup rpc PB protos
HADOOP-9131. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)
TestLocalFileSystem#testListStatusWithColons cannot run on Windows
HADOOP-9125. Major bug reported by Kai Zheng and fixed by Kai Zheng (security)
LdapGroupsMapping threw CommunicationException after some idle time
HADOOP-9117. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
replace protoc ant plugin exec with a maven plugin
HADOOP-9043. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (util)
disallow in winutils creating symlinks with forward slashes
HADOOP-8982. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (net)
TestSocketIOWithTimeout fails on Windows
HADOOP-8973. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (util)
DiskChecker cannot reliably detect an inaccessible disk on Windows with NTFS ACLs
HADOOP-8958. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (viewfs)
ViewFs:Non absolute mount name failures when running multiple tests on Windows
HADOOP-8957. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (fs)
AbstractFileSystem#IsValidName should be overridden for embedded file systems like ViewFs
HADOOP-8924. Major improvement reported by Chris Nauroth and fixed by Chris Nauroth (build)
Add maven plugin alternative to shell script to save package-info.java
HADOOP-8917. Major bug reported by Arpit Gupta and fixed by Arpit Gupta
add LOCALE.US to toLowerCase in SecurityUtil.replacePattern
HADOOP-8886. Major improvement reported by Eli Collins and fixed by Eli Collins (fs)
Remove KFS support

Kosmos FS (KFS) is no longer maintained and Hadoop support has been removed. KFS has been replaced by QFS (HADOOP-8885).
HADOOP-8711. Major improvement reported by Brandon Li and fixed by Brandon Li (ipc)
provide an option for IPC server users to avoid printing stack information for certain exceptions
HADOOP-8569. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
CMakeLists.txt: define _GNU_SOURCE and _LARGEFILE_SOURCE
HADOOP-8562. Major new feature reported by Bikas Saha and fixed by Bikas Saha
Enhancements to support Hadoop on Windows Server and Windows Azure environments

This umbrella jira makes enhancements to support Hadoop natively on Windows Server and Windows Azure environments.
HADOOP-8470. Major sub-task reported by Junping Du and fixed by Junping Du
Implementation of 4-layer subclass of NetworkTopology (NetworkTopologyWithNodeGroup)

This patch should be checked in together with (or after) JIRA HADOOP-8469: https://issues.apache.org/jira/browse/HADOOP-8469
HADOOP-8469. Major sub-task reported by Junping Du and fixed by Junping Du
Make NetworkTopology class pluggable
HADOOP-8462. Major improvement reported by Govind Kamat and fixed by Govind Kamat (io)
Native-code implementation of bzip2 codec
HADOOP-8440. Minor bug reported by Ivan Mitic and fixed by Ivan Mitic (fs)
HarFileSystem.decodeHarURI fails for URIs whose host contains numbers
HADOOP-8415. Minor improvement reported by Jan van der Lugt and fixed by Jan van der Lugt (conf)
getDouble() and setDouble() in org.apache.hadoop.conf.Configuration
HADOOP-7487. Major bug reported by Todd Lipcon and fixed by Andrew Wang (fs)
DF should throw a more reasonable exception when mount cannot be determined
HADOOP-7391. Major bug reported by Sanjay Radia and fixed by Sanjay Radia
Document Interface Classification from HADOOP-5073

Hadoop 2.0.5-alpha Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements.

Changes since Hadoop 2.0.4-alpha

MAPREDUCE-5240. Blocker bug reported by Roman Shaposhnik and fixed by Vinod Kumar Vavilapalli (mrv2)
inside of FileOutputCommitter the initialized Credentials cache appears to be empty
HDFS-4482. Blocker bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (namenode)
ReplicationMonitor thread can exit with NPE due to the race between delete and replication of same file.
HADOOP-9407. Major bug reported by Sangjin Lee and fixed by Sangjin Lee (build)
commons-daemon 1.0.3 dependency has bad group id causing build issues
HADOOP-8419. Major bug reported by Luke Lu and fixed by Yu Li (io)
GzipCodec NPE upon reset with IBM JDK

Hadoop 2.0.4-alpha Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements.

Changes since Hadoop 2.0.3-alpha

YARN-470. Major bug reported by Hitesh Shah and fixed by Siddharth Seth (nodemanager)
Support a way to disable resource monitoring on the NodeManager

Currently, the memory management monitor's check is disabled when the maxMem is set to -1. However, the maxMem is also sent to the RM when the NM registers with it (to define the maximum limit of allocatable resources). We need an explicit flag to disable monitoring to avoid the problems caused by the overloading of the max memory value.
YARN-449. Blocker bug reported by Siddharth Seth and fixed by
HBase test failures when running against Hadoop 2

Post YARN-429, unit tests for HBase continue to fail since the classpath for the MRAppMaster is not being set correctly. Reverting YARN-129 may fix this, but I'm not sure that's the correct solution. My guess is, as Alexandro pointed out in YARN-129, maven classloader magic is messing up java.class.path.
YARN-443. Major improvement reported by Thomas Graves and fixed by Thomas Graves (nodemanager)
allow OS scheduling priority of NM to be different than the containers it launches

It would be nice if we could have the nodemanager run at a different OS scheduling priority than the containers so that you can still communicate with the nodemanager if the containers are out of control. On Linux we could launch the nodemanager at a higher priority, but then all the containers it launches would also be at that higher priority, so we need a way for the container executor to launch them at a lower priority. I'm not sure how this applies to Windows, if at all.
YARN-429. Blocker bug reported by Siddharth Seth and fixed by Siddharth Seth (resourcemanager)
capacity-scheduler config missing from yarn-test artifact

MiniYARNCluster and MiniMRCluster are unusable by downstream projects with the 2.0.3-alpha release, since the capacity-scheduler configuration is missing from the test artifact. hadoop-yarn-server-tests-3.0.0-SNAPSHOT-tests.jar should include the default capacity-scheduler configuration. Also, this doesn't need to be part of the default classpath - and should be moved out of the top level directory in the dist package.
MAPREDUCE-5117. Blocker bug reported by Roman Shaposhnik and fixed by Siddharth Seth (security)
With security enabled HS delegation token renewer fails
MAPREDUCE-5094. Major bug reported by Siddharth Seth and fixed by Siddharth Seth
Disable mem monitoring by default in MiniMRYarnCluster
MAPREDUCE-5088. Blocker bug reported by Roman Shaposhnik and fixed by Daryn Sharp
MR Client gets a renewer token exception while Oozie is submitting a job
MAPREDUCE-5083. Major bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
MiniMRCluster should use a random component when creating an actual cluster

Committed to branch-2.0.4. Modified changes.txt in trunk, branch-2 and branch-2.0.4 accordingly.
MAPREDUCE-5053. Major bug reported by Robert Parker and fixed by Robert Parker
java.lang.InternalError from decompression codec cause reducer to fail
MAPREDUCE-5023. Critical bug reported by Kendall Thrapp and fixed by Ravi Prakash (jobhistoryserver , webapps)
History Server Web Services missing Job Counters
MAPREDUCE-5006. Major bug reported by Alejandro Abdelnur and fixed by Sandy Ryza (contrib/streaming)
streaming tests failing
MAPREDUCE-4549. Blocker bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Distributed cache conflicts break backwards compatibility
MAPREDUCE-3685. Critical bug reported by anty.rao and fixed by anty (mrv2)
There are some bugs in implementation of MergeManager
HDFS-4649. Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (namenode , security , webhdfs)
Webhdfs cannot list large directories
HDFS-4646. Minor bug reported by Jagane Sundar and fixed by (namenode)
createNNProxyWithClientProtocol ignores configured timeout value
HDFS-4581. Major bug reported by Rohit Kochar and fixed by Rohit Kochar (datanode)
DataNode#checkDiskError should not be called on network errors
HDFS-4577. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (webhdfs)
Webhdfs operations should declare if authentication is required
HDFS-4571. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (webhdfs)
WebHDFS should not set the service hostname on the server side
HDFS-4567. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (webhdfs)
Webhdfs does not need a token for token operations
HDFS-4566. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (webhdfs)
Webhdfs token cancelation should use authentication
HDFS-4560. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (webhdfs)
Webhdfs cannot use tokens obtained by another user
HDFS-4548. Blocker sub-task reported by Daryn Sharp and fixed by Daryn Sharp
Webhdfs doesn't renegotiate SPNEGO token
HDFS-3344. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Kihwal Lee (namenode)
Unreliable corrupt blocks counting in TestProcessCorruptBlocks
HADOOP-9471. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
hadoop-client wrongfully excludes jetty-util JAR, breaking webhdfs
HADOOP-9467. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (metrics)
Metrics2 record filtering (.record.filter.include/exclude) does not filter by name
HADOOP-9444. Blocker bug reported by Konstantin Boudnik and fixed by Roman Shaposhnik (conf)
$var shell substitution in properties are not expanded in hadoop-policy.xml
HADOOP-9406. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
hadoop-client leaks dependency on JDK tools jar
HADOOP-9405. Minor bug reported by Andrew Wang and fixed by Andrew Wang (test , tools)
TestGridmixSummary#testExecutionSummarizer is broken
HADOOP-9399. Minor bug reported by Todd Lipcon and fixed by Konstantin Boudnik (build)
protoc maven plugin doesn't work on mvn 3.0.2

Committed to 2.0.4-alpha branch
HADOOP-9379. Trivial improvement reported by Arpit Gupta and fixed by Arpit Gupta
capture the ulimit info after printing the log to the console
HADOOP-9374. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (security)
Add tokens from -tokenCacheFile into UGI
HADOOP-9301. Blocker bug reported by Roman Shaposhnik and fixed by Alejandro Abdelnur (build)
hadoop client servlet/jsp/jetty/tomcat JARs creating conflicts in Oozie & HttpFS
HADOOP-9299. Blocker bug reported by Roman Shaposhnik and fixed by Daryn Sharp (security)
kerberos name resolution is kicking in even when kerberos is not configured

Hadoop 2.0.3-alpha Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements.

Changes since Hadoop 2.0.2

YARN-372. Minor task reported by Siddharth Seth and fixed by Siddharth Seth
Move InlineDispatcher from hadoop-yarn-server-resourcemanager to hadoop-yarn-common

InlineDispatcher is a utility used in unit tests. Belongs in yarn-common instead of yarn-server-resource-manager.
YARN-364. Major bug reported by Jason Lowe and fixed by Jason Lowe
AggregatedLogDeletionService can take too long to delete logs

AggregatedLogDeletionService uses the yarn.log-aggregation.retain-seconds property to determine which logs should be deleted, but it uses the same value to determine how often to check for old logs. This means logs could actually linger up to twice as long as configured.
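The "twice as long" figure follows from simple arithmetic, sketched below. This is an illustration only, not the service's code: the class and method names are hypothetical, and it assumes a log is deleted at the first periodic check that finds it older than the retention period.

```java
final class LogRetentionSketch {
    /**
     * Worst-case age, in seconds, of a log at the moment it is deleted.
     * A log that crosses the retention threshold just after a check has
     * to wait one full check interval before the next check removes it.
     */
    static long worstCaseAgeSeconds(long retainSeconds, long checkIntervalSeconds) {
        if (retainSeconds < 0 || checkIntervalSeconds <= 0) {
            throw new IllegalArgumentException("retention must be >= 0 and interval > 0");
        }
        return retainSeconds + checkIntervalSeconds;
    }
}
```

With the check interval equal to the retention period, the worst case is double the configured retention; a shorter, separately chosen interval shrinks the overshoot accordingly.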
YARN-360. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp
Allow apps to concurrently register tokens for renewal

{{DelegationTokenRenewer#addApplication}} has an unnecessary {{synchronized}} keyword. This serializes job submissions and can add unnecessary latency and/or hang all submissions if there are problems renewing the token.
YARN-357. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)
App submission should not be synchronized

MAPREDUCE-2953 fixed a race condition with querying of app status by making {{RMClientService#submitApplication}} synchronously invoke {{RMAppManager#submitApplication}}. However, the {{synchronized}} keyword was also added to {{RMAppManager#submitApplication}} with the comment: "I made the submitApplication synchronized to keep it consistent with the other routines in RMAppManager although I do not believe it needs it since the rmapp datastructure is already a concurrentMap and I don't see anything else that would be an issue." It's been observed that app submission latency is being unnecessarily impacted.
YARN-355. Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)
RM app submission jams under load

The RM performs a loopback connection to itself to renew its own tokens. If app submissions consume all RPC handlers for {{ClientRMProtocol}}, then app submissions block because the RM cannot loop back to itself to do the renewal.
YARN-354. Blocker bug reported by Liang Xie and fixed by Liang Xie
WebAppProxyServer exits immediately after startup

Please see HDFS-4426 for details. I found that the YARN WebAppProxyServer is broken by HADOOP-9181 as well. Here is the hot fix, which I verified manually in our test cluster. I apologize for causing such trouble.
YARN-343. Major bug reported by Thomas Graves and fixed by Xuan Gong (capacityscheduler)
Capacity Scheduler maximum-capacity value -1 is invalid

I tried to start the resource manager using the capacity scheduler with a particular queue's maximum-capacity set to -1, which is supposed to disable it according to the docs, but I got the following exception:
java.lang.IllegalArgumentException: Illegal value of maximumCapacity -0.01 used in call to setMaxCapacity for queue foo
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.checkMaxCapacity(CSQueueUtils.java:31)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:220)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:191)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:310)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:325)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:232)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:202)
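The value -0.01 in the exception is what -1 becomes when a percentage is divided by 100, which suggests the sentinel was converted before it was checked. A minimal sketch of checking the sentinel first follows. The names `MaxCapacitySketch` and `toFraction` are hypothetical, not CapacityScheduler APIs, and treating a disabled limit as 100% is an assumption made for illustration.

```java
final class MaxCapacitySketch {
    /** Configured value that, per the report above, is meant to disable the limit. */
    static final float UNDEFINED = -1f;

    /** Converts a configured percentage into a fraction in [0, 1]. */
    static float toFraction(float configuredPercent) {
        if (configuredPercent == UNDEFINED) {
            // Assumption: a disabled maximum behaves like "up to 100%".
            return 1.0f;
        }
        if (configuredPercent < 0f || configuredPercent > 100f) {
            throw new IllegalArgumentException(
                "Illegal value of maximumCapacity " + configuredPercent);
        }
        return configuredPercent / 100f;
    }
}
```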
YARN-336. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Fair scheduler FIFO scheduling within a queue only allows 1 app at a time

The fair scheduler allows apps to be scheduled in FIFO fashion within a queue. Currently, when this setting is turned on, the scheduler only allows one app to run at a time. While apps submitted earlier should get first priority for allocations, when there is space remaining, other apps should have a chance to get at them.
YARN-334. Critical bug reported by Thomas Graves and fixed by Thomas Graves
Maven RAT plugin is not checking all source files

YARN side of HADOOP-9097. Running 'mvn apache-rat:check' passes, but running RAT by hand (by downloading the JAR) produces some warnings for Java files, amongst others.
YARN-331. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Fill in missing fair scheduler documentation

In the fair scheduler documentation, a few config options are missing: locality.threshold.node, locality.threshold.rack, max.assign, aclSubmitApps, minSharePreemptionTimeout.
YARN-330. Major bug reported by Hitesh Shah and fixed by Sandy Ryza (nodemanager)
Flakey test: TestNodeManagerShutdown#testKillContainersOnShutdown

Seems to be timing related, as the container status RUNNING returned by the ContainerManager does not really indicate that the container task has been launched. A sleep of 5 seconds is not reliable.
Running org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.353 sec <<< FAILURE!
testKillContainersOnShutdown(org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown) Time elapsed: 9283 sec <<< FAILURE!
junit.framework.AssertionFailedError: Did not find sigterm message
    at junit.framework.Assert.fail(Assert.java:47)
    at junit.framework.Assert.assertTrue(Assert.java:20)
    at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.testKillContainersOnShutdown(TestNodeManagerShutdown.java:162)
Logs:
2013-01-09 14:13:08,401 INFO [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(835)) - Container container_0_0000_01_000000 transitioned from NEW to LOCALIZING
2013-01-09 14:13:08,412 INFO [AsyncDispatcher event handler] localizer.LocalizedResource (LocalizedResource.java:handle(194)) - Resource file:hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown/tmpDir/scriptFile.sh transitioned from INIT to DOWNLOADING
2013-01-09 14:13:08,412 INFO [AsyncDispatcher event handler] localizer.ResourceLocalizationService (ResourceLocalizationService.java:handle(521)) - Created localizer for container_0_0000_01_000000
2013-01-09 14:13:08,589 INFO [LocalizerRunner for container_0_0000_01_000000] localizer.ResourceLocalizationService (ResourceLocalizationService.java:writeCredentials(895)) - Writing credentials to the nmPrivate file hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown/nm0/nmPrivate/container_0_0000_01_000000.tokens. Credentials list:
2013-01-09 14:13:08,628 INFO [LocalizerRunner for container_0_0000_01_000000] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:createUserCacheDirs(373)) - Initializing user nobody
2013-01-09 14:13:08,709 INFO [main] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:getContainerStatus(538)) - Returning container_id {, app_attempt_id {, application_id {, id: 0, cluster_timestamp: 0, }, attemptId: 1, }, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-01-09 14:13:08,781 INFO [LocalizerRunner for container_0_0000_01_000000] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:startLocalizer(99)) - Copying from hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown/nm0/nmPrivate/container_0_0000_01_000000.tokens to hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown/nm0/usercache/nobody/appcache/application_0_0000/container_0_0000_01_000000.tokens
YARN-328. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (resourcemanager)
Use token request messages defined in hadoop common

YARN changes related to HADOOP-9192 to reuse the protobuf messages defined in common.
YARN-325. Blocker bug reported by Jason Lowe and fixed by Arun C Murthy (capacityscheduler)
RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing

If a client calls getQueueInfo on a parent queue (e.g.: the root queue) and containers are completing then the RM can deadlock. getQueueInfo() locks the ParentQueue and then calls the child queues' getQueueInfo() methods in turn. However when a container completes, it locks the LeafQueue then calls back into the ParentQueue. When the two mix, it's a recipe for deadlock. Stacktrace to follow.
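One common way to avoid this kind of deadlock is to make every code path take the two locks in the same order. The sketch below illustrates that idea only; it is not the change made for this issue, and `LockOrderSketch`, `Parent`, `Leaf`, and their methods are hypothetical stand-ins for the real queue classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

final class LockOrderSketch {
    static final class Parent {
        final ReentrantLock lock = new ReentrantLock();
        // Populated during single-threaded setup only.
        final List<Leaf> children = new ArrayList<>();
        int completedContainers = 0;

        /** Query path: parent lock first, then each child lock in turn. */
        int totalRunning() {
            lock.lock();
            try {
                int total = 0;
                for (Leaf leaf : children) {
                    total += leaf.running();
                }
                return total;
            } finally {
                lock.unlock();
            }
        }
    }

    static final class Leaf {
        final ReentrantLock lock = new ReentrantLock();
        final Parent parent;
        int running;

        Leaf(Parent parent, int running) {
            this.parent = parent;
            this.running = running;
            parent.children.add(this);
        }

        int running() {
            lock.lock();
            try {
                return running;
            } finally {
                lock.unlock();
            }
        }

        /** Completion path: same order as the query path, parent then leaf. */
        void completeContainer() {
            parent.lock.lock();
            try {
                lock.lock();
                try {
                    running--;
                    parent.completedContainers++;
                } finally {
                    lock.unlock();
                }
            } finally {
                parent.lock.unlock();
            }
        }
    }
}
```

Because both paths acquire the parent lock before any leaf lock, neither thread can hold one lock while waiting for the other in the opposite order.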
YARN-320. Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)
RM should always be able to renew its own tokens

YARN-280 introduced fast-fail for job submissions with bad tokens. Unfortunately, other stack components like oozie and customers are acquiring RM tokens with a hardcoded dummy renewer value. These jobs would fail after 24 hours because the RM token couldn't be renewed, but fast-fail is failing them immediately. The RM should always be able to renew its own tokens submitted with a job. The renewer field may continue to specify an external user who can renew.
YARN-319. Major bug reported by shenhong and fixed by shenhong (resourcemanager , scheduler)
Submitting a job to a queue that does not allow it in the fair scheduler makes the client hang forever.

When the RM uses the fair scheduler and a client submits a job to a queue that does not allow that user to submit jobs, the client hangs forever.
YARN-315. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas
Use security token protobuf definition from hadoop common

YARN part of HADOOP-9173.
YARN-302. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
Fair scheduler assignmultiple should default to false

The MR1 default was false. When true, it results in overloading some machines with many tasks and underutilizing others.
YARN-301. Major bug reported by shenhong and fixed by shenhong (resourcemanager , scheduler)
Fair scheduler throws ConcurrentModificationException when iterating over app's priorities

In my test cluster, the fair scheduler threw a ConcurrentModificationException and the RM crashed. Here is the message:
2012-12-30 17:14:17,171 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.util.ConcurrentModificationException
    at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
    at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:297)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:181)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:780)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:842)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:340)
    at java.lang.Thread.run(Thread.java:662)
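The failure mode in the trace, a TreeMap key iterator noticing a structural change, can be reproduced in isolation. The sketch below is a single-threaded illustration (in the report the change presumably came from another thread); `PriorityIterationSketch` and its methods are hypothetical names, and iterating over a snapshot is shown as one possible remedy, not necessarily the fix that was applied.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.TreeMap;

final class PriorityIterationSketch {
    /** Iterates the live key set while adding keys; returns true if the iterator failed fast. */
    static boolean liveIterationFails(TreeMap<Integer, String> priorities) {
        try {
            for (Integer p : priorities.keySet()) {
                // Structural modification while the iterator is active.
                priorities.put(p + 1000, "added during iteration");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Iterates a snapshot of the keys, so the map itself may change safely. */
    static int snapshotIteration(TreeMap<Integer, String> priorities) {
        List<Integer> snapshot = new ArrayList<>(priorities.keySet());
        int visited = 0;
        for (Integer p : snapshot) {
            priorities.put(p + 1000, "added during iteration");
            visited++;
        }
        return visited;
    }
}
```

A snapshot only protects the iteration itself; when several threads touch the map, access still needs to be synchronized or moved to a concurrent collection.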
YARN-300. Major bug reported by shenhong and fixed by Sandy Ryza (resourcemanager , scheduler)
After YARN-271, fair scheduler can infinite loop and not schedule any application.

After YARN-271, when yarn.scheduler.fair.max.assign <= 0 and a node has been reserved, the fair scheduler loops infinitely and does not schedule any application.
YARN-293. Critical bug reported by Devaraj K and fixed by Robert Joseph Evans (nodemanager)
Node Manager leaks LocalizerRunner object for every Container

The Node Manager creates a new LocalizerRunner object for every container and puts it in the ResourceLocalizationService.LocalizerTracker.privLocalizers map, but it never removes it from the map.
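The leak pattern and a straightforward remedy can be sketched with a plain map. This is an assumption-laden illustration, not the NodeManager code: `LocalizerTrackerSketch` is a hypothetical class, the localizer is modeled as a `Runnable`, and only the map name is borrowed from the description above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class LocalizerTrackerSketch {
    private final Map<String, Runnable> privLocalizers = new ConcurrentHashMap<>();

    /** Registers the localizer, runs it, and always unregisters it afterwards. */
    void runLocalizer(String containerId, Runnable localizer) {
        privLocalizers.put(containerId, localizer);
        try {
            localizer.run();
        } finally {
            // Without this removal the map grows by one entry per container.
            privLocalizers.remove(containerId);
        }
    }

    /** Number of localizers currently tracked. */
    int tracked() {
        return privLocalizers.size();
    }
}
```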
YARN-288. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
Fair scheduler queue doesn't accept any jobs when ACLs are configured.

If a queue is configured with an ACL for who can submit jobs, no jobs are allowed, even if a user on the list tries. This is caused by the scheduler thinking the user is "yarn", because it calls UserGroupInformation.getCurrentUser() instead of UserGroupInformation.createRemoteUser() with the given user name.
YARN-286. Major new feature reported by Tom White and fixed by Tom White (applications)
Add a YARN ApplicationClassLoader

Add a classloader that provides webapp-style class isolation for use by applications. This is the YARN part of MAPREDUCE-1700 (which was already developed in that JIRA).
YARN-285. Major improvement reported by Derek Dagit and fixed by Derek Dagit
RM should be able to provide a tracking link for apps that have already been purged

As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of applications remembered by the RM. When a user clicks History for a job, the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links that persist so that users are not frustrated by broken links.
YARN-283. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Fair scheduler fails to get queue info without root prefix

If queue1 exists, and a client calls "mapred queue -info queue1", an exception is thrown. If they use root.queue1, it works correctly.
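A lookup that accepts both forms can normalize the name before searching. The helper below is a hypothetical sketch of that idea, not the fair scheduler's actual code; only the "root" prefix comes from the description above.

```java
final class QueueNameSketch {
    static final String ROOT = "root";

    /** Returns the fully qualified queue name, adding the root prefix if it is missing. */
    static String normalize(String name) {
        if (name.equals(ROOT) || name.startsWith(ROOT + ".")) {
            return name;
        }
        return ROOT + "." + name;
    }
}
```

With this, "queue1" and "root.queue1" resolve to the same key, so either spelling passed to "mapred queue -info" would find the queue.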
YARN-282. Major bug reported by Sandy Ryza and fixed by Sandy Ryza
Fair scheduler web UI double counts Apps Submitted

Each app submitted is reported twice under "Apps Submitted"
YARN-280. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)
RM does not reject app submission with invalid tokens

The RM will launch an app with invalid tokens. The tasks will languish with failed connection retries, followed by task reattempts, followed by app reattempts.
YARN-278. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
Fair scheduler maxRunningApps config causes no apps to make progress

This occurs because the scheduler erroneously chooses apps to offer resources to that are not runnable, then later decides they are not runnable, and doesn't try to give the resources to anyone else.
YARN-277. Major improvement reported by Bikas Saha and fixed by Bikas Saha
Use AMRMClient in DistributedShell to exemplify the approach
YARN-272. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Fair scheduler log messages try to print objects without overridden toString methods

A lot of junk gets printed out like this: 2012-12-11 17:31:52,998 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp: Application application_1355270529654_0003 reserved container org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl@324f0f97 on node host: c1416.hal.cloudera.com:46356 #containers=7 available=0 used=8192, currently has 4 at priority org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl@33; currentReservation 4096
YARN-271. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
Fair scheduler hits IllegalStateException trying to reserve different apps on same node

After the fair scheduler reserves a container on a node, it doesn't check for reservations it just made when trying to make more reservations during the same heartbeat.
YARN-267. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
Fix fair scheduler web UI

The fair scheduler web UI was broken by MAPREDUCE-4720. The queues area is not shown, and changes are required to still show the fair share inside the applications table.
YARN-266. Critical bug reported by Ravi Prakash and fixed by Ravi Prakash (resourcemanager)
RM and JHS Web UIs are blank because AppsBlock is not escaping string properly

e.g. Job names with a line feed "\n" are causing a line feed in the JSON array being written out (since we are only using StringEscapeUtils.escapeHtml()) and the JavaScript parser complains that string quotes are unclosed. This
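HTML escaping leaves control characters untouched, so a value embedded in a JavaScript string literal needs a second kind of escaping. The sketch below shows the idea with a hand-written, hypothetical helper (`JsStringEscapeSketch.escape`); it covers only the characters named in its switch and is not the code used by the web UI.

```java
final class JsStringEscapeSketch {
    /** Escapes characters that would break a quoted JavaScript string literal. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break; // backslash
                case '"':  out.append("\\\""); break; // double quote
                case '\'': out.append("\\'");  break; // single quote
                case '\n': out.append("\\n");  break; // line feed
                case '\r': out.append("\\r");  break; // carriage return
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

The result still has to be HTML-escaped where it is placed into markup; the two escapes address different parsers.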
YARN-264. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla
y.s.rm.DelegationTokenRenewer attempts to renew token even after removing an app

yarn.s.rm.security.DelegationTokenRenewer uses TimerTask/Timer. When such a timer task is canceled, already scheduled tasks run to completion. The task should check for such cancellation before running. Also, delegationTokens needs to be synchronized on all accesses.
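The check described above can be sketched with a flag that the task consults before doing any work. `CancellableRenewalSketch` is a hypothetical class written for illustration; the counter stands in for the real renewal call, and this is not the DelegationTokenRenewer implementation.

```java
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class CancellableRenewalSketch extends TimerTask {
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    final AtomicInteger renewals = new AtomicInteger();

    @Override
    public boolean cancel() {
        // Record the cancellation so an execution that is already due can see it.
        cancelled.set(true);
        return super.cancel();
    }

    @Override
    public void run() {
        if (cancelled.get()) {
            return; // the app was removed; skip the renewal
        }
        renewals.incrementAndGet(); // stands in for the actual token renewal
    }
}
```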
YARN-258. Major bug reported by Ravi Prakash and fixed by Ravi Prakash (resourcemanager)
RM web page UI shows Invalid Date for start and finish times

Whenever the number of jobs was greater than 100, two JavaScript arrays were being populated: appsData and appsTableData. appsData was winning out (because it was coming out later), and so renderHadoopDate was trying to render a <br title=""...> string.
YARN-254. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
Update fair scheduler web UI for hierarchical queues

The fair scheduler should have a web UI similar to the capacity scheduler that shows nested queues.
YARN-253. Critical bug reported by Tom White and fixed by Tom White (nodemanager)
Container launch may fail if no files were localized

This can be demonstrated with DistributedShell. The containers running the shell do not have any files to localize (if there is no shell script to copy) so if they run on a different NM to the AM (which does localize files), then they will fail since the appcache directory does not exist.
YARN-251. Major bug reported by Tom White and fixed by Tom White (resourcemanager)
Proxy URI generation fails for blank tracking URIs

If the URI is an empty string (the default if not set), then a warning is displayed. A null URI displays no such warning. These two cases should be handled in the same way.
YARN-230. Major sub-task reported by Bikas Saha and fixed by Bikas Saha (resourcemanager)
Make changes for RM restart phase 1

As described in YARN-128, phase 1 of RM restart puts in place mechanisms to save application state and read it back after restart. Upon restart, the NMs are asked to reboot and the previously running AMs are restarted. After this is done, RM HA and work-preserving restart can continue in parallel. For more details please refer to the design document in YARN-128.
YARN-229. Major sub-task reported by Bikas Saha and fixed by Bikas Saha (resourcemanager)
Remove old code for restart

Much of the code is dead/commented out and is not executed. Removing it will help with making and understanding new changes.
YARN-225. Critical bug reported by Devaraj K and fixed by Devaraj K (resourcemanager)
Proxy Link in RM UI throws NPE in Secure mode

{code:xml}
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:241)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
    at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:975)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
{code}
YARN-224. Major bug reported by Sandy Ryza and fixed by Sandy Ryza
Fair scheduler logs too many nodeUpdate INFO messages

The RM logs are filled with an INFO message the fair scheduler logs every time it receives a nodeUpdate. It should be taken out or demoted to debug.
YARN-223. Critical bug reported by Radim Kolar and fixed by Radim Kolar
Change processTree interface to work better with native code

The problem is that every update of processTree requires a new object. This is undesirable when working with a processTree implementation in native code. Replace ProcessTree.getProcessTree() with updateProcessTree(). No new object allocation is needed, and it simplifies application code a bit.
YARN-222. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)
Fair scheduler should create queue for each user by default

In MR1 the fair scheduler's default behavior was to create a pool for each user. The YARN fair scheduler has this capability, but it should be turned on by default, for consistency.
YARN-219. Critical sub-task reported by Robert Joseph Evans and fixed by Robert Joseph Evans (nodemanager)
NM should aggregate logs when application finishes.

The NM should only aggregate logs when the application finishes. This will reduce the load on the NN, especially with respect to lease renewal.
YARN-217. Blocker bug reported by Devaraj K and fixed by Devaraj K (resourcemanager)
yarn rmadmin commands fail in secure cluster

All the rmadmin commands fail in secure mode with the "protocol org.apache.hadoop.yarn.server.nodemanager.api.RMAdminProtocolPB is unauthorized" message in RM logs.
YARN-216. Major improvement reported by Todd Lipcon and fixed by Robert Joseph Evans
Remove jquery theming support

As of today we have 9.4MB of JQuery themes in our code tree. In addition to being a waste of space, it's a highly questionable feature. I've never heard anyone complain that the Hadoop interface isn't themeable enough, and there's far more value in consistency across installations than there is in themeability. Let's rip it out.
YARN-214. Major bug reported by Jason Lowe and fixed by Jonathan Eagles (resourcemanager)
RMContainerImpl does not handle event EXPIRE at state RUNNING

RMContainerImpl has a race condition where a container can enter the RUNNING state just as the container expires. This results in an invalid event transition error:
{noformat}
2012-11-11 05:31:38,954 [ResourceManager Event Processor] ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: EXPIRE at RUNNING
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
    at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:205)
    at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:44)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApp.containerCompleted(SchedulerApp.java:203)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1337)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:739)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:659)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:80)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:340)
    at java.lang.Thread.run(Thread.java:619)
{noformat}
EXPIRE needs to be handled (well at least ignored) in the RUNNING state to account for this race condition.
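Ignoring an event in a given state amounts to registering a transition that leaves the state unchanged. The table-driven sketch below illustrates this with a deliberately simplified, hypothetical set of states and events; it does not use the YARN StateMachineFactory API and is not the actual RMContainerImpl transition table.

```java
import java.util.EnumMap;
import java.util.Map;

final class ContainerStateSketch {
    enum State { ACQUIRED, RUNNING, COMPLETED, EXPIRED }
    enum Event { LAUNCHED, FINISHED, EXPIRE }

    private static final Map<State, Map<Event, State>> TRANSITIONS =
        new EnumMap<State, Map<Event, State>>(State.class);

    private static void add(State from, Event on, State to) {
        TRANSITIONS.computeIfAbsent(from, k -> new EnumMap<Event, State>(Event.class))
                   .put(on, to);
    }

    static {
        add(State.ACQUIRED, Event.LAUNCHED, State.RUNNING);
        add(State.ACQUIRED, Event.EXPIRE, State.EXPIRED);
        add(State.RUNNING, Event.FINISHED, State.COMPLETED);
        // A late EXPIRE that loses the race with LAUNCHED is ignored.
        add(State.RUNNING, Event.EXPIRE, State.RUNNING);
    }

    /** Returns the next state, or throws if the event is not registered for the current state. */
    static State handle(State current, Event event) {
        Map<Event, State> row = TRANSITIONS.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        return next;
    }
}
```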
YARN-212. Blocker bug reported by Nathan Roberts and fixed by Nathan Roberts (nodemanager)
NM state machine ignores an APPLICATION_CONTAINER_FINISHED event when it shouldn't

The NM state machines can make the following two invalid state transitions when a speculative attempt is killed shortly after it gets started. When this happens the NM keeps the log aggregation context open for this application and therefore chews up FDs and leases on the NN, eventually running the NN out of FDs and bringing down the entire cluster.
2012-11-07 05:36:33,774 [AsyncDispatcher event handler] WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_CONTAINER_FINISHED at INITING
2012-11-07 05:36:33,775 [AsyncDispatcher event handler] WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [INIT_CONTAINER]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: INIT_CONTAINER at DONE
YARN-206. Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
TestApplicationCleanup.testContainerCleanup occasionally fails

testContainerCleanup is occasionally failing with the error: testContainerCleanup(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup): expected:<2> but was:<1>
YARN-204. Major bug reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov (applications)
test coverage for org.apache.hadoop.tools

Added some tests for org.apache.hadoop.tools
YARN-202. Critical bug reported by Kihwal Lee and fixed by Kihwal Lee
Log Aggregation generates a storm of fsync() for namenode

When log aggregation is on, each write to an aggregated container log causes hflush() to be called. For large clusters, this can create a lot of fsync() calls for the namenode. We have seen a 6-7x increase in the average number of fsync operations compared to 1.0.x on a large busy cluster. Over 99% of fsync ops were for log aggregation writing to tmp files.
YARN-201. Critical bug reported by Jason Lowe and fixed by Jason Lowe (capacityscheduler)
CapacityScheduler can take a very long time to schedule containers if requests are off cluster

When a user runs a job where one of the input files is a large file on another cluster, the job can create many splits on nodes which are unreachable for computation from the current cluster. The off-switch delay logic in LeafQueue can cause the ResourceManager to allocate containers for the job very slowly. In one case the job was only getting one container every 23 seconds, and the queue had plenty of spare capacity.
YARN-189. Blocker bug reported by Thomas Graves and fixed by Thomas Graves (resourcemanager)
deadlock in RM - AMResponse object

We ran into a deadlock in the RM.
=============================
"1128743461@qtp-1252749669-5201":
  waiting for ownable synchronizer 0x00002aabbc87b960, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
  which is held by "AsyncDispatcher event handler"
"AsyncDispatcher event handler":
  waiting to lock monitor 0x00002ab0bba3a370 (object 0x00002aab3d4cd698, a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl),
  which is held by "IPC Server handler 36 on 8030"
"IPC Server handler 36 on 8030":
  waiting for ownable synchronizer 0x00002aabbc87b960, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
  which is held by "AsyncDispatcher event handler"

Java stack information for the threads listed above:
===================================================
"1128743461@qtp-1252749669-5201":
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00002aabbc87b960> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getFinalApplicationStatus(RMAppAttemptImpl.java:295)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getFinalApplicationStatus(RMAppImpl.java:222)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:328)
    at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaM ... ... ..
"AsyncDispatcher event handler":
    at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.unregisterAttempt(ApplicationMasterService.java:307)
    - waiting to lock <0x00002aab3d4cd698> (a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$BaseFinalTransition.transition(RMAppAttemptImpl.java:647)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:809)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:796)
    at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
    - locked <0x00002aabbb673090> (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:478)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:81)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:436)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:417)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
    at java.lang.Thread.run(Thread.java:619)
"IPC Server handler 36 on 8030":
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00002aabbc87b960> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.pullJustFinishedContainers(RMAppAttemptImpl.java:437)
    at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:285)
    - locked <0x00002aab3d4cd698> (a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
    at org.apache.hadoop.yarn.api.impl.pb.service.AMRMProtocolPBServiceImpl.allocate(AMRMProtocolPBServiceImpl.java:56)
    at org.apache.hadoop.yarn.proto.AMRMProtocol$AMRMProtocolService$2.callBlockingMethod(AMRMProtocol.java:87)
    at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:353)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1528)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1524)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1522)
YARN-188. Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov (capacityscheduler)
Coverage fixing for CapacityScheduler

Added some tests for CapacityScheduler. Patches: YARN-188-branch-0.23.patch for branch 0.23, YARN-188-branch-2.patch for branch 2, and YARN-188-trunk.patch for trunk.
YARN-187. Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Add hierarchical queues to the fair scheduler
YARN-186. Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov (resourcemanager , scheduler)
Coverage fixing LinuxContainerExecutor

Added some tests for LinuxContainerExecutor. Patches: YARN-186-branch-0.23.patch for branch-0.23, YARN-186-branch-2.patch for branch-2, and YARN-186-trunk.patch for trunk.
YARN-184. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza
Remove unnecessary locking in fair scheduler, and address findbugs excludes.

In YARN-12, locks were added to all fields of QueueManager to address findbugs. In addition, findbugs exclusions were added in response to MAPREDUCE-4439, without a deep look at the code.
YARN-183. Minor improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Clean up fair scheduler code

The fair scheduler code has a bunch of minor stylistic issues.
YARN-181. Critical bug reported by Siddharth Seth and fixed by Siddharth Seth (resourcemanager)
capacity-scheduler.xml move breaks Eclipse import

Eclipse doesn't seem to handle "testResources" which resolve to an absolute path. YARN-140 moved capacity-scheduler.cfg a couple of levels up to the hadoop-yarn project.
YARN-180. Critical bug reported by Thomas Graves and fixed by Arun C Murthy (capacityscheduler)
Capacity scheduler - containers that get reserved create container token too early

The capacity scheduler has the ability to 'reserve' containers. Unfortunately, before it decides that a container goes to reserved rather than assigned, the Container object is created, which creates a container token that expires in roughly 10 minutes by default. This means that by the time the NM frees up enough space on that node for the container to move to assigned, the container token may have expired.
YARN-179. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (capacityscheduler)
Bunch of test failures on trunk

{{CapacityScheduler.setConf()}} mandates a YarnConfiguration. It doesn't need to: throughout YARN, components depend only on Configuration and rely on the callers to provide the correct configuration. This is causing multiple tests to fail.
YARN-178. Critical bug reported by Radim Kolar and fixed by Radim Kolar
Fix custom ProcessTree instance creation

1. The current pluggable ResourceCalculatorProcessTree does not pass the root process id to the custom implementation, making it unusable. 2. pstree does not extend Configured as it should. Added a constructor with a pid argument, along with a test suite. Also added a test that pstree is correctly configured.
YARN-177. Critical bug reported by Thomas Graves and fixed by Arun C Murthy (capacityscheduler)
CapacityScheduler - adding a queue while the RM is running has wacky results

Adding a queue to the capacity scheduler while the RM is running and then running a job in the queue added results in very strange behavior. The cluster Total Memory can either decrease or increase. We had a cluster where total memory decreased to almost 1/6th the capacity. Running on a small test cluster resulted in the capacity going up by simply adding a queue and running wordcount. Looking at the RM logs, used memory can go negative but other logs show the number positive: 2012-10-21 22:56:44,796 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.0375 absoluteUsedCapacity=0.0375 used=memory: 7680 cluster=memory: 204800 2012-10-21 22:56:45,831 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=-0.0225 absoluteUsedCapacity=-0.0225 used=memory: -4608 cluster=memory: 204800
YARN-170. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)
NodeManager stop() gets called twice on shutdown

The stop method in the NodeManager gets called twice when the NodeManager is shut down via the shutdown hook. The first call is made directly by the shutdown hook. The second occurs when the NodeStatusUpdaterImpl is stopped: the NodeManager responds to the NodeStatusUpdaterImpl stop stateChanged event by stopping itself. This is so that NodeStatusUpdaterImpl can notify the NodeManager to stop, by stopping itself in response to a request from the ResourceManager. This could be avoided if the NodeStatusUpdaterImpl were to stop the NodeManager by calling its stop method directly.
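A double call like the one described above is harmless when stop() is idempotent. The following is a minimal sketch of that guard, which is a different technique from the direct-call change the report suggests; the class and its counter are hypothetical and are not NodeManager code.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical service; illustrates an idempotent stop(), not the NodeManager itself.
final class IdempotentStopDemo {
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private final AtomicInteger shutdownRuns = new AtomicInteger();

    // Only the first caller performs the shutdown work; later calls return at once.
    void stop() {
        if (!stopped.compareAndSet(false, true)) {
            return;
        }
        shutdownRuns.incrementAndGet(); // stands in for the real shutdown work
    }

    int shutdownRuns() {
        return shutdownRuns.get();
    }

    public static void main(String[] args) {
        IdempotentStopDemo service = new IdempotentStopDemo();
        service.stop(); // e.g. called by a shutdown hook
        service.stop(); // e.g. called again in response to a state-change event
        System.out.println("shutdown work ran " + service.shutdownRuns() + " time(s)");
    }
}
```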
YARN-169. Minor improvement reported by Anthony Rojas and fixed by Anthony Rojas (nodemanager)
Update log4j.appender.EventCounter to use org.apache.hadoop.log.metrics.EventCounter

We should update the log4j.appender.EventCounter in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/resources/container-log4j.properties to use *org.apache.hadoop.log.metrics.EventCounter* rather than *org.apache.hadoop.metrics.jvm.EventCounter* to avoid triggering the following warning: {code}WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files{code}
YARN-166. Major bug reported by Thomas Graves and fixed by Thomas Graves (capacityscheduler)
capacity scheduler doesn't allow capacity < 1.0

1.x supports queue capacity < 1, but in 0.23 the capacity scheduler doesn't. This is an issue for us since we have a large cluster running 1.x that currently has a queue with capacity 0.5%.
YARN-165. Blocker improvement reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
RM should point tracking URL to RM web page for app when AM fails

Currently, when an ApplicationMaster fails, the ResourceManager updates the tracking URL to an empty string (see RMAppAttemptImpl.ContainerFinishedTransition). Unfortunately, when the client attempts to follow the proxy URL, the result is a web page showing an HTTP 500 error and an ugly backtrace, because "http://" isn't a very helpful tracking URL. It would be much more helpful if the proxy URL redirected to the RM webapp page for the specific application. That page shows the various AM attempts and pointers to their logs, which will be useful for debugging the problems that caused the AM attempts to fail.
YARN-163. Major bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)
Retrieving container log via NM webapp can hang with multibyte characters in log

ContainerLogsBlock.printLogs currently assumes that skipping N bytes in the log file is the same as skipping N characters, but that is not true when the log contains multibyte characters. This can cause the loop that skips a portion of the log to try to skip past the end of the file and loop forever (or until Jetty kills the worker thread).
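The byte/character mismatch can be reproduced with the JDK alone. The sketch below is not the ContainerLogsBlock code; the class and method names are hypothetical. It skips characters toward a target that was computed in bytes and stops when the reader reaches end of stream, which is the check whose absence lets such a loop spin.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not the actual NM webapp code.
final class SkipDemo {

    // Skips up to toSkip characters and returns how many were actually skipped.
    // Stops at end of stream instead of retrying forever.
    static long skipChars(Reader reader, long toSkip) throws IOException {
        long skipped = 0;
        while (skipped < toSkip) {
            long n = reader.skip(toSkip - skipped);
            if (n <= 0) {
                // No progress: use read() to tell end of stream from a short skip.
                if (reader.read() == -1) {
                    break;
                }
                n = 1; // read() consumed one character
            }
            skipped += n;
        }
        return skipped;
    }

    // Encodes text as UTF-8, then tries to skip a count that was measured in bytes.
    static long charsSkipped(String text, long bytesRequested) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            return skipChars(reader, bytesRequested);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String log = "h\u00e9llo w\u00f6rld"; // two 2-byte characters in UTF-8
        int byteLength = log.getBytes(StandardCharsets.UTF_8).length;
        System.out.println("chars=" + log.length() + " bytes=" + byteLength);
        System.out.println("chars skipped=" + charsSkipped(log, byteLength));
    }
}
```

The sample string has fewer characters than bytes, so a request to skip "byte length" characters can never be satisfied in full; the end-of-stream check is what lets the loop return.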
YARN-161. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (api)
Yarn Common has multiple compiler warnings for unchecked operations

The warnings are in classes StateMachineFactory, RecordFactoryProvider, RpcFactoryProvider, and YarnRemoteExceptionFactoryProvider. OpenJDK 1.6.0_24 actually treats these as compilation errors, causing the build to fail.
YARN-159. Major bug reported by Thomas Graves and fixed by Thomas Graves (resourcemanager)
RM web ui applications page should be sorted to display last app first

The RM web UI applications page should be sorted to display the last app first. It currently sorts with the smallest application id first, which shows the earliest-submitted apps first. Once there is more than one page of apps, it's much more useful to sort so that the biggest app id (the last submitted app) shows up first.
YARN-151. Major bug reported by Robert Joseph Evans and fixed by Ravi Prakash
Browser thinks RM main page JS is taking too long

The main RM page with the default settings of 10,000 applications can cause browsers to think that the JS on the page is stuck and ask you if you want to kill it. This is a big usability problem.
YARN-150. Major bug reported by Bikas Saha and fixed by Bikas Saha
AppRejectedTransition does not unregister app from master service and scheduler

AttemptStartedTransition() adds the app to the ApplicationMasterService and scheduler. When the scheduler rejects the app, AppRejectedTransition() forgets to unregister it from the ApplicationMasterService.
YARN-146. Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager)
Add unit tests for computing fair share in the fair scheduler

MR1 had TestComputeFairShares. This should go into the YARN fair scheduler.
YARN-145. Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager)
Add a Web UI to the fair share scheduler

The fair scheduler had a UI in MR1. Port the capacity scheduler web UI and modify appropriately for the fair share scheduler.
YARN-140. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (capacityscheduler)
Add capacity-scheduler-default.xml to provide a default set of configurations for the capacity scheduler.

When setting up the capacity scheduler, users run into problems like: {code} FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager java.lang.IllegalArgumentException: Illegal capacity of -1 for queue root {code} These basically arise from missing basic configurations that, in many cases, need not be provided explicitly because a default configuration would be sufficient. For example, to address the error above, the user needs to add a capacity of 100 to the root queue. So we need to add a capacity-scheduler-default.xml to provide the basic set of default configurations required to run the capacity scheduler. The user can still override existing configurations or provide new ones in capacity-scheduler.xml. This is similar to *-default.xml vs *-site.xml for yarn, core, mapred, hdfs, etc.
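The default-plus-override layering described above can be sketched with java.util.Properties, which supports a defaults table natively. This is an illustration only and does not use Hadoop's Configuration class; the class name is hypothetical and the property key is used purely as an example.

```java
import java.util.Properties;

// Hypothetical sketch of "defaults file + site file" layering using the JDK only.
final class LayeredConfigDemo {
    // Example key, modeled on capacity scheduler property naming.
    static final String ROOT_CAPACITY = "yarn.scheduler.capacity.root.capacity";

    // Values in site win; anything missing falls back to defaults.
    static Properties withDefaults(Properties defaults, Properties site) {
        Properties merged = new Properties(defaults);
        for (String name : site.stringPropertyNames()) {
            merged.setProperty(name, site.getProperty(name));
        }
        return merged;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty(ROOT_CAPACITY, "100"); // shipped default

        Properties site = new Properties();         // user file, initially empty
        System.out.println(withDefaults(defaults, site).getProperty(ROOT_CAPACITY));

        site.setProperty(ROOT_CAPACITY, "50");      // explicit user override
        System.out.println(withDefaults(defaults, site).getProperty(ROOT_CAPACITY));
    }
}
```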
YARN-139. Major bug reported by Nathan Roberts and fixed by Vinod Kumar Vavilapalli (api)
Interrupted Exception within AsyncDispatcher leads to user confusion

Successful applications tend to get InterruptedExceptions during shutdown. The exception is harmless but it leads to lots of user confusion and therefore could be cleaned up. 2012-09-28 14:50:12,477 WARN [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Interrupted Exception while stopping java.lang.InterruptedException at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1143) at java.lang.Thread.join(Thread.java:1196) at org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:105) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:437) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:402) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:619) 2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped. 2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.mapreduce.v2.app.MRAppMaster is stopped. 2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Exiting MR AppMaster..GoodBye
YARN-136. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (resourcemanager)
Make ClientTokenSecretManager part of RMContext

Helps to add it to the context instead of passing it all around as an extra parameter.
YARN-135. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (resourcemanager)
ClientTokens should be per app-attempt and be unregistered on App-finish.

Two issues: - ClientTokens are per app-attempt but are created per app. - Apps don't get unregistered from RMClientTokenSecretManager.
YARN-134. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
ClientToAMSecretManager creates keys without checking for validity of the appID
YARN-133. Major bug reported by Thomas Graves and fixed by Ravi Prakash (resourcemanager)
update web services docs for RM clusterMetrics

Looks like jira https://issues.apache.org/jira/browse/MAPREDUCE-3747 added in more RM cluster metrics but the docs didn't get updated: http://hadoop.apache.org/docs/r0.23.3/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Metrics_API
YARN-131. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (capacityscheduler)
Incorrect ACL properties in capacity scheduler documentation

The CapacityScheduler apt file incorrectly specifies the property names controlling acls for application submission and queue administration. {{yarn.scheduler.capacity.root.<queue-path>.acl_submit_jobs}} should be {{yarn.scheduler.capacity.root.<queue-path>.acl_submit_applications}} {{yarn.scheduler.capacity.root.<queue-path>.acl_administer_jobs}} should be {{yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue}} Uploading a patch momentarily.
YARN-129. Major improvement reported by Tom White and fixed by Tom White (client)
Simplify classpath construction for mini YARN tests

The test classpath includes a special file called 'mrapp-generated-classpath' (or similar in distributed shell) that is constructed at build time, and whose contents are a classpath with all the dependencies needed to run the tests. When the classpath for a container (e.g. the AM) is constructed, the contents of mrapp-generated-classpath are read and added to the classpath, and the file itself is then added to the classpath so that later, when the AM constructs a classpath for a task container, it can propagate the test classpath correctly. This mechanism can be drastically simplified by propagating the system classpath of the current JVM (read from the java.class.path property) to a launched JVM, but only if running in the context of the mini YARN cluster. Any tests that use the mini YARN cluster will automatically work with this change, although any that explicitly deal with mrapp-generated-classpath can be simplified.
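As a rough sketch of the simplified mechanism, the current JVM's classpath can be read from the java.class.path system property and appended to the environment of a launched process. The class, method, and the inMiniCluster flag below are hypothetical and do not reflect the actual YARN launch code.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the real container launch context is built differently.
final class ClasspathPropagationDemo {

    // Builds the environment for a child JVM. The parent's classpath is
    // appended only when running inside a (hypothetical) mini cluster.
    static Map<String, String> childEnvironment(boolean inMiniCluster, String baseClasspath) {
        StringBuilder classpath = new StringBuilder(baseClasspath);
        if (inMiniCluster) {
            classpath.append(File.pathSeparator)
                     .append(System.getProperty("java.class.path"));
        }
        Map<String, String> env = new HashMap<>();
        env.put("CLASSPATH", classpath.toString());
        return env;
    }

    public static void main(String[] args) {
        System.out.println(childEnvironment(false, "app.jar").get("CLASSPATH"));
        System.out.println(childEnvironment(true, "app.jar").get("CLASSPATH"));
    }
}
```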
YARN-127. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
Move RMAdmin tool to the client package

It clearly belongs in the client package, not in the RM.
YARN-116. Major bug reported by xieguiming and fixed by xieguiming (resourcemanager)
RM is missing ability to add include/exclude files without a restart

The default value of "yarn.resourcemanager.nodes.include-path" is "". To add an include file, we must currently restart the RM. Adding an include or exclude file should not require an RM restart; executing the refresh command should be enough. The HDFS NameNode already has this ability. The fix is to change the HostsFileReader constructor from: {code} public HostsFileReader(String inFile, String exFile) {code} to: {code} public HostsFileReader(Configuration conf, String NODES_INCLUDE_FILE_PATH,String DEFAULT_NODES_INCLUDE_FILE_PATH, String NODES_EXCLUDE_FILE_PATH,String DEFAULT_NODES_EXCLUDE_FILE_PATH) {code} This way, the config file can be read dynamically when a {{refreshNodes}} is invoked, and there is no need to restart the ResourceManager.
YARN-103. Major improvement reported by Bikas Saha and fixed by Bikas Saha
Add a yarn AM - RM client module

Add a basic client wrapper library to the AM RM protocol in order to prevent proliferation of code being duplicated everywhere. Provide helper functions to perform reverse mapping of container requests to RM allocation resource request table format.
YARN-102. Trivial bug reported by Devaraj K and fixed by Devaraj K (resourcemanager)
Move the apache licence header to the top of the file in MemStore.java
YARN-94. Major bug reported by Vinod Kumar Vavilapalli and fixed by Hitesh Shah (applications/distributed-shell)
DistributedShell jar should point to Client as the main class by default

Today, it says: {code} $ $YARN_HOME/bin/yarn jar $YARN_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-$VERSION.jar RunJar jarFile [mainClass] args... {code}
YARN-93. Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
Diagnostics missing from applications that have finished but failed

If an application finishes in the YARN sense but fails in the app framework sense (e.g.: a failed MapReduce job) then diagnostics are missing from the RM web page for the application. The RM should be reporting diagnostic messages even for successful YARN applications.
YARN-82. Minor bug reported by Andy Isaacson and fixed by Hemanth Yamijala (nodemanager)
YARN local-dirs defaults to /tmp/nm-local-dir

{{yarn.nodemanager.local-dirs}} defaults to {{/tmp/nm-local-dir}}. It should be {hadoop.tmp.dir}/nm-local-dir or similar. Among other problems, this can prevent multiple test clusters from starting on the same machine. Thanks to Hemanth Yamijala for reporting this issue.
YARN-78. Major bug reported by Bikas Saha and fixed by Bikas Saha (applications)
Change UnmanagedAMLauncher to use YarnClientImpl

YARN-29 added a common client impl to talk to the RM. Use that in the UnmanagedAMLauncher.
YARN-72. Major bug reported by Hitesh Shah and fixed by Sandy Ryza (nodemanager)
NM should handle cleaning up containers when it shuts down

Ideally, when it gets a shutdown signal, the NM should wait a limited amount of time for existing containers to complete and then kill the remaining containers (if we pick an aggressive approach). NMs that come up after an unclean shutdown should look through their directories for existing container.pids and try to kill any existing containers matching the pids found.
YARN-57. Major improvement reported by Radim Kolar and fixed by Radim Kolar (nodemanager)
Pluggable process tree

Trunk version of Pluggable process tree. Work based on MAPREDUCE-4204
YARN-50. Blocker sub-task reported by Siddharth Seth and fixed by Siddharth Seth
Implement renewal / cancellation of Delegation Tokens

Currently, delegation tokens issued by the RM and History server cannot be renewed or cancelled. This needs to be implemented.
YARN-43. Major bug reported by Thomas Graves and fixed by Thomas Graves
TestResourceTrackerService fails intermittently on jdk7

Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.73 sec <<< FAILURE! testDecommissionWithIncludeHosts(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService) Time elapsed: 0.086 sec <<< FAILURE! junit.framework.AssertionFailedError: expected:<0> but was:<1> at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.failNotEquals(Assert.java:283) at junit.framework.Assert.assertEquals(Assert.java:64) at junit.framework.Assert.assertEquals(Assert.java:195) at junit.framework.Assert.assertEquals(Assert.java:201) at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testDecommissionWithIncludeHosts(TestResourceTrackerService.java:90)
YARN-40. Major bug reported by Devaraj K and fixed by Devaraj K (client)
Provide support for missing yarn commands

1. status <app-id> 2. kill <app-id> (an issue already exists for this: MAPREDUCE-3793) 3. list-apps [all] 4. nodes-report
YARN-33. Major bug reported by Mayank Bansal and fixed by Mayank Bansal (nodemanager)
LocalDirsHandler should validate the configured local and log dirs

When yarn.nodemanager.log-dirs is configured with a file:// URI, startup of the node manager creates a directory like file:// under the current working directory, which should not be there.
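One way to picture the validation is to parse each configured entry as a URI and accept only plain paths or file: URIs. The sketch below is hypothetical and is not the LocalDirsHandler implementation; it assumes POSIX-style paths without spaces or other characters that java.net.URI rejects.

```java
import java.net.URI;

// Hypothetical validation helper; not the actual NodeManager code.
final class LocalDirDemo {

    // Returns the local filesystem path for a configured directory entry.
    // Plain paths pass through, file: URIs are reduced to their path,
    // and any other scheme is rejected.
    static String toLocalPath(String configured) {
        URI uri = URI.create(configured);
        String scheme = uri.getScheme();
        if (scheme == null) {
            return configured;
        }
        if ("file".equals(scheme)) {
            return uri.getPath();
        }
        throw new IllegalArgumentException("Unsupported scheme for a local dir: " + scheme);
    }

    public static void main(String[] args) {
        System.out.println(toLocalPath("/var/log/yarn"));
        System.out.println(toLocalPath("file:///var/log/yarn"));
    }
}
```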
YARN-32. Major bug reported by Thomas Graves and fixed by Vinod Kumar Vavilapalli
TestApplicationTokens fails intermittently on jdk7

TestApplicationTokens fails intermittently on jdk7.
YARN-30. Major bug reported by Thomas Graves and fixed by Thomas Graves
TestNMWebServicesApps, TestRMWebServicesApps and TestRMWebServicesNodes fail on jdk7

It looks like the string changed from "const class" to "constant". Tests run: 19, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 6.786 sec <<< FAILURE! testNodeAppsStateInvalid(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps) Time elapsed: 0.248 sec <<< FAILURE! java.lang.AssertionError: exception message doesn't match, got: No enum constant org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationState.FOO_STATE expected: No enum const class org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationState.FOO_STATE
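Because the exact wording of this message differs between JDK versions, a test can be made less brittle by checking only the part of the message that identifies the constant. The sketch below uses a hypothetical stand-in enum and helper names; it is not the change that was applied to the web services tests.

```java
// Hypothetical stand-in for the real ApplicationState enum.
final class EnumMessageDemo {
    enum ApplicationState { NEW, RUNNING, FINISHED }

    // Returns the exception message for an invalid name, or null if the name is valid.
    static String lookupFailureMessage(String name) {
        try {
            ApplicationState.valueOf(name);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    // Version-tolerant check: only requires the message to end with ".<name>".
    static boolean mentionsConstant(String message, String name) {
        return message != null && message.endsWith("." + name);
    }

    public static void main(String[] args) {
        String message = lookupFailureMessage("FOO_STATE");
        System.out.println(message);
        System.out.println(mentionsConstant(message, "FOO_STATE"));
    }
}
```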
YARN-28. Major bug reported by Thomas Graves and fixed by Thomas Graves
TestCompositeService fails on jdk7

The test TestCompositeService fails when run with jdk7. It appears to expect testCallSequence to be called first and the sequence numbers to start at 0. On jdk7 it is not called first, and the sequence number has already been incremented.
YARN-23. Major improvement reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)
FairScheduler: FSQueueSchedulable#updateDemand() - potential redundant aggregation

In FS, FSQueueSchedulable#updateDemand() limits the demand to maxTasks only after iterating through all the pools and computing the final demand. By checking whether the demand has reached maxTasks in every iteration, we can avoid redundant work, at the expense of one condition check per iteration.
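The early-exit idea can be sketched as follows. This is a simplified illustration with hypothetical names that models per-pool demand as non-negative integers; it is not the FSQueueSchedulable code.

```java
// Hypothetical, simplified model of capped demand aggregation.
final class DemandDemo {

    // Sums per-pool demands but stops as soon as the cap is reached,
    // so the remaining pools are never visited.
    // Assumes every demand value is non-negative.
    static int cappedDemand(int[] poolDemands, int maxTasks) {
        int demand = 0;
        for (int poolDemand : poolDemands) {
            demand += poolDemand;
            if (demand >= maxTasks) {
                return maxTasks;
            }
        }
        return demand;
    }

    public static void main(String[] args) {
        System.out.println(cappedDemand(new int[] {5, 10, 20}, 12)); // stops after the second pool
        System.out.println(cappedDemand(new int[] {1, 2, 3}, 100));  // cap never reached
    }
}
```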
YARN-3. Major sub-task reported by Arun C Murthy and fixed by Andrew Ferguson
Add support for CPU isolation/monitoring of containers
YARN-2. Major new feature reported by Arun C Murthy and fixed by Arun C Murthy (capacityscheduler , scheduler)
Enhance CS to schedule accounting for both memory and cpu cores

With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful for the CapacityScheduler to account for both.
MAPREDUCE-4977. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (documentation)
Documentation for pluggable shuffle and pluggable sort
MAPREDUCE-4971. Minor improvement reported by Arun C Murthy and fixed by Arun C Murthy
Minor extensibility enhancements
MAPREDUCE-4969. Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)
TestKeyValueTextInputFormat test fails with Open JDK 7
MAPREDUCE-4953. Major bug reported by Andy Isaacson and fixed by Andy Isaacson (pipes)
HadoopPipes misuses fprintf
MAPREDUCE-4949. Minor improvement reported by Sandy Ryza and fixed by Sandy Ryza (examples)
Enable multiple pi jobs to run in parallel
MAPREDUCE-4948. Critical bug reported by Junping Du and fixed by Junping Du (client)
TestYARNRunner.testHistoryServerToken failed on trunk
MAPREDUCE-4946. Critical bug reported by Jason Lowe and fixed by Jason Lowe (mr-am)
Type conversion of map completion events leads to performance problems with large jobs
MAPREDUCE-4936. Critical bug reported by Daryn Sharp and fixed by Arun C Murthy (mrv2)
JobImpl uber checks for cpu are wrong
MAPREDUCE-4934. Critical bug reported by Thomas Graves and fixed by Thomas Graves (build)
Maven RAT plugin is not checking all source files
MAPREDUCE-4928. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (applicationmaster , security)
Use token request messages defined in hadoop common

The renewer field of the protobuf message GetDelegationTokenRequestProto is changed from optional to required. This change is not wire compatible with older releases.
MAPREDUCE-4925. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (examples)
The pentomino option parser may be buggy
MAPREDUCE-4924. Trivial bug reported by Robert Kanter and fixed by Robert Kanter (mrv1)
flakey test: org.apache.hadoop.mapred.TestClusterMRNotification.testMR
MAPREDUCE-4923. Minor bug reported by Sandy Ryza and fixed by Sandy Ryza (mrv1 , mrv2 , task)
Add toString method to TaggedInputSplit
MAPREDUCE-4921. Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (client)
JobClient should acquire HS token with RM principal
MAPREDUCE-4920. Major bug reported by Vinod Kumar Vavilapalli and fixed by Suresh Srinivas
Use security token protobuf definition from hadoop common
MAPREDUCE-4913. Major bug reported by Jason Lowe and fixed by Jason Lowe (mr-am)
TestMRAppMaster#testMRAppMasterMissingStaging occasionally exits
MAPREDUCE-4907. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (mrv1 , tasktracker)
TrackerDistributedCacheManager issues too many getFileStatus calls
MAPREDUCE-4905. Major test reported by Aleksey Gorshkov and fixed by Aleksey Gorshkov
test org.apache.hadoop.mapred.pipes
MAPREDUCE-4902. Trivial bug reported by Albert Chu and fixed by Albert Chu
Fix typo "receievd" should be "received" in log output
MAPREDUCE-4899. Major improvement reported by Derek Dagit and fixed by Derek Dagit
Provide a plugin to the Yarn Web App Proxy to generate tracking links for M/R applications given the ID
MAPREDUCE-4895. Major bug reported by Dennis Y and fixed by Dennis Y
Fix compilation failure of org.apache.hadoop.mapred.gridmix.TestResourceUsageEmulators
MAPREDUCE-4894. Blocker bug reported by Siddharth Seth and fixed by Siddharth Seth (jobhistoryserver , mrv2)
Renewal / cancellation of JobHistory tokens
MAPREDUCE-4893. Major bug reported by Bikas Saha and fixed by Bikas Saha (applicationmaster)
MR AppMaster can do sub-optimal assignment of containers to map tasks leading to poor node locality
MAPREDUCE-4890. Critical bug reported by Jason Lowe and fixed by Jason Lowe (mr-am)
Invalid TaskImpl state transitions when task fails while speculating
MAPREDUCE-4884. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (contrib/streaming , test)
streaming tests fail to start MiniMRCluster due to "Queue configuration missing child queue names for root"
MAPREDUCE-4861. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla
Cleanup: Remove unused mapreduce.security.token.DelegationTokenRenewal
MAPREDUCE-4856. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (test)
TestJobOutputCommitter uses same directory as TestJobCleanup
MAPREDUCE-4848. Major bug reported by Jason Lowe and fixed by Jerry Chen (mr-am)
TaskAttemptContext cast error during AM recovery
MAPREDUCE-4845. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (client)
ClusterStatus.getMaxMemory() and getUsedMemory() exist in MR1 but not MR2
MAPREDUCE-4842. Blocker bug reported by Jason Lowe and fixed by Mariappan Asokan (mrv2)
Shuffle race can hang reducer
MAPREDUCE-4838. Major improvement reported by Arun C Murthy and fixed by Zhijie Shen
Add extra info to JH files
MAPREDUCE-4836. Major bug reported by Ravi Prakash and fixed by Ravi Prakash
Elapsed time for running tasks on AM web UI tasks page is 0
MAPREDUCE-4833. Critical bug reported by Robert Joseph Evans and fixed by Robert Parker (applicationmaster , mrv2)
Task can get stuck in FAIL_CONTAINER_CLEANUP
MAPREDUCE-4832. Critical bug reported by Robert Joseph Evans and fixed by Jason Lowe (applicationmaster)
MR AM can get in a split brain situation
MAPREDUCE-4825. Major bug reported by Jason Lowe and fixed by Jason Lowe (mr-am)
JobImpl.finished doesn't expect ERROR as a final job state
MAPREDUCE-4822. Trivial improvement reported by Robert Joseph Evans and fixed by Chu Tong (jobhistoryserver)
Unnecessary conversions in History Events
MAPREDUCE-4819. Blocker bug reported by Jason Lowe and fixed by Bikas Saha (mr-am)
AM can rerun job after reporting final job status to the client
MAPREDUCE-4817. Critical bug reported by Jason Lowe and fixed by Thomas Graves (applicationmaster , mr-am)
Hardcoded task ping timeout kills tasks localizing large amounts of data
MAPREDUCE-4813. Critical bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster)
AM timing out during job commit
MAPREDUCE-4811. Minor improvement reported by Ravi Prakash and fixed by Ravi Prakash (jobhistoryserver , mrv2)
JobHistoryServer should show when it was started in WebUI About page
MAPREDUCE-4810. Minor improvement reported by Jason Lowe and fixed by Jerry Chen (applicationmaster)
Add admin command options for ApplicationMaster
MAPREDUCE-4809. Major sub-task reported by Arun C Murthy and fixed by Mariappan Asokan
Change visibility of classes for pluggable sort changes
MAPREDUCE-4808. Major new feature reported by Arun C Murthy and fixed by Mariappan Asokan
Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
MAPREDUCE-4807. Major sub-task reported by Arun C Murthy and fixed by Mariappan Asokan
Allow MapOutputBuffer to be pluggable
MAPREDUCE-4803. Minor test reported by Mariappan Asokan and fixed by Mariappan Asokan (test)
Duplicate copies of TestIndexCache.java
MAPREDUCE-4802. Major improvement reported by Ravi Prakash and fixed by Ravi Prakash (mr-am , mrv2 , webapps)
Takes a long time to load the task list on the AM for large jobs
MAPREDUCE-4801. Critical bug reported by Jason Lowe and fixed by Jason Lowe
ShuffleHandler can generate large logs due to prematurely closed channels
MAPREDUCE-4797. Major bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster)
LocalContainerAllocator can loop forever trying to contact the RM
MAPREDUCE-4787. Major bug reported by Ravi Prakash and fixed by Robert Parker (test)
TestJobMonitorAndPrint is broken
MAPREDUCE-4786. Major bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2)
Job End Notification retry interval is 5 milliseconds by default
MAPREDUCE-4782. Blocker bug reported by Mark Fuhs and fixed by Mark Fuhs (client)
NLineInputFormat skips first line of last InputSplit
MAPREDUCE-4778. Major bug reported by Sandy Ryza and fixed by Sandy Ryza (jobtracker , scheduler)
Fair scheduler event log is only written if directory exists on HDFS
MAPREDUCE-4777. Minor improvement reported by Sandy Ryza and fixed by Sandy Ryza
In TestIFile, testIFileReaderWithCodec relies on testIFileWriterWithCodec
MAPREDUCE-4774. Major bug reported by Ivan A. Veselovsky and fixed by Jason Lowe (applicationmaster , mrv2)
JobImpl does not handle asynchronous task events in FAILED state
MAPREDUCE-4772. Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Fetch failures can take way too long for a map to be restarted
MAPREDUCE-4771. Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
KeyFieldBasedPartitioner not partitioning properly when configured
MAPREDUCE-4764. Major improvement reported by Ivan A. Veselovsky and fixed by
repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile
MAPREDUCE-4763. Minor improvement reported by Ivan A. Veselovsky and fixed by
repair test org.apache.hadoop.mapreduce.security.TestUmbilicalProtocolWithJobToken
MAPREDUCE-4752. Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Reduce MR AM memory usage through String Interning
MAPREDUCE-4751. Major bug reported by Ravi Prakash and fixed by Vinod Kumar Vavilapalli
AM stuck in KILL_WAIT for days
MAPREDUCE-4748. Blocker bug reported by Robert Joseph Evans and fixed by Jason Lowe (mrv2)
Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED
MAPREDUCE-4746. Major bug reported by Robert Parker and fixed by Robert Parker (applicationmaster)
The MR Application Master does not have a config to set environment variables
MAPREDUCE-4741. Minor bug reported by Jason Lowe and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2)
WARN and ERROR messages logged during normal AM shutdown
MAPREDUCE-4740. Blocker bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
only .jars can be added to the Distributed Cache classpath
MAPREDUCE-4736. Trivial improvement reported by Brandon Li and fixed by Brandon Li (test)
Remove obsolete option [-rootDir] from TestDFSIO
MAPREDUCE-4733. Major bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster , mrv2)
Reducer can fail to make progress during shuffle if too many reducers complete consecutively
MAPREDUCE-4730. Blocker bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster , mrv2)
AM crashes due to OOM while serving up map task completion events
MAPREDUCE-4729. Major bug reported by Thomas Graves and fixed by Vinod Kumar Vavilapalli (jobhistoryserver)
job history UI not showing all job attempts
MAPREDUCE-4724. Major bug reported by Thomas Graves and fixed by Thomas Graves (jobhistoryserver)
job history web ui applications page should be sorted to display last app first
MAPREDUCE-4723. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza
Fix warnings found by findbugs 2
MAPREDUCE-4721. Major bug reported by Ravi Prakash and fixed by Ravi Prakash (jobhistoryserver)
Task startup time in JHS is same as job startup time.
MAPREDUCE-4720. Major bug reported by Robert Joseph Evans and fixed by Ravi Prakash
Browser thinks History Server main page JS is taking too long
MAPREDUCE-4712. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (jobhistoryserver)
mr-jobhistory-daemon.sh doesn't accept --config
MAPREDUCE-4705. Critical bug reported by Jason Lowe and fixed by Jason Lowe (jobhistoryserver , mrv2)
Historyserver links expire before the history data does
MAPREDUCE-4703. Major improvement reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv1 , mrv2 , test)
Add the ability to start the MiniMRClientCluster using the configurations it used before it was stopped.
MAPREDUCE-4681. Major bug reported by Arun C Murthy and fixed by Arun C Murthy
HDFS-3910 broke MR tests
MAPREDUCE-4678. Minor bug reported by Chris McConnell and fixed by Chris McConnell (examples)
Running the Pentomino example with defaults throws java.lang.NegativeArraySizeException
MAPREDUCE-4674. Minor bug reported by Robert Justice and fixed by Robert Justice
Hadoop examples secondarysort has a typo "secondarysrot" in the usage
MAPREDUCE-4666. Minor improvement reported by Jason Lowe and fixed by Jason Lowe (jobhistoryserver)
JVM metrics for history server
MAPREDUCE-4654. Critical bug reported by Colin Patrick McCabe and fixed by Sandy Ryza (test)
TestDistCp is @ignored
MAPREDUCE-4637. Major bug reported by Tom White and fixed by Mayank Bansal (mrv2)
Killing an unassigned task attempt causes the job to fail

Handle TaskAttempt diagnostic updates while in the NEW and UNASSIGNED states.
MAPREDUCE-4616. Minor improvement reported by Tony Burton and fixed by Tony Burton (documentation)
Improvement to MultipleOutputs javadocs
MAPREDUCE-4607. Major bug reported by Bikas Saha and fixed by Bikas Saha
Race condition in ReduceTask completion can result in Task being incorrectly failed
MAPREDUCE-4596. Major task reported by Siddharth Seth and fixed by Siddharth Seth (applicationmaster , mrv2)
Split StateMachine state from states seen by MRClientProtocol (for Job, Task, TaskAttempt)
MAPREDUCE-4554. Major bug reported by Benoy Antony and fixed by Benoy Antony (job submission , security)
Job Credentials are not transmitted if security is turned off
MAPREDUCE-4521. Major bug reported by Jason Lowe and fixed by Ravi Prakash (mrv2)
mapreduce.user.classpath.first incompatibility with 0.20/1.x
MAPREDUCE-4520. Major new feature reported by Arun C Murthy and fixed by Arun C Murthy
Add experimental support for MR AM to schedule CPUs along with memory
MAPREDUCE-4517. Minor improvement reported by James Kinley and fixed by Jason Lowe (applicationmaster)
Too many INFO messages written out during AM to RM heartbeat
MAPREDUCE-4479. Major bug reported by Mariappan Asokan and fixed by Mariappan Asokan (test)
Fix parameter order in assertEquals() in TestCombineInputFileFormat.java
MAPREDUCE-4458. Major improvement reported by Robert Joseph Evans and fixed by Robert Parker (mrv2)
Warn if java.library.path is used for AM or Task
MAPREDUCE-4425. Critical bug reported by Siddharth Seth and fixed by Jason Lowe (mrv2)
Speculation + Fetch failures can lead to a hung job
MAPREDUCE-4279. Major bug reported by Rahul Jain and fixed by Devaraj K (jobtracker)
getClusterStatus() fails with null pointer exception when running jobs in local mode
MAPREDUCE-4278. Major bug reported by Araceli Henley and fixed by Sandy Ryza
cannot run two local jobs in parallel from the same gateway.
MAPREDUCE-4272. Major bug reported by Luke Lu and fixed by Yu Gao (task)
SortedRanges.Range#compareTo is not spec compliant
MAPREDUCE-4266. Major task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
remove Ant remnants from MR
MAPREDUCE-4229. Major improvement reported by Todd Lipcon and fixed by Miomir Boljanovic (jobtracker)
Counter names' memory usage can be decreased by interning
MAPREDUCE-4123. Critical bug reported by Nishan Shetty and fixed by Devaraj K (mrv2)
./mapred groups gives NoClassDefFoundError
MAPREDUCE-4049. Major sub-task reported by Avner BenHanoch and fixed by Avner BenHanoch (performance , task , tasktracker)
plugin for generic shuffle service

Allow ReduceTask to load a third-party plugin for shuffle (and merge) instead of the default shuffle.
MAPREDUCE-3678. Major new feature reported by Bejoy KS and fixed by Harsh J (mrv1 , mrv2)
The Map tasks logs should have the value of input split it processed

A map task's syslog now carries basic information about the InputSplit it processed.
MAPREDUCE-2454. Minor new feature reported by Mariappan Asokan and fixed by Mariappan Asokan
Allow external sorter plugin for MR

MAPREDUCE-4807 Allow external implementations of the sort phase in a Map task
MAPREDUCE-2264. Major bug reported by Adam Kramer and fixed by Devaraj K (jobtracker)
Job status exceeds 100% in some cases
MAPREDUCE-1806. Major bug reported by Paul Yang and fixed by Gera Shegalov (harchive)
CombineFileInputFormat does not work with paths not on default FS
MAPREDUCE-1700. Major bug reported by Tom White and fixed by Tom White (task)
User supplied dependencies may conflict with MapReduce system JARs
HDFS-4468. Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE
Fix TestHDFSCLI and TestQuota for HADOOP-9252
HDFS-4462. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)
2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS
HDFS-4458. Major bug reported by wenwupeng and fixed by Binglin Chang (balancer)
start balancer failed with "Failed to create file [/system/balancer.id]" if configure IP on fs.defaultFS
HDFS-4456. Major new feature reported by Tsz Wo (Nicholas), SZE and fixed by Plamen Jeliazkov (webhdfs)
Add concat to HttpFS and WebHDFS REST API docs
HDFS-4452. Critical bug reported by Konstantin Shvachko and fixed by Konstantin Shvachko (namenode)
getAdditionalBlock() can create multiple blocks if the client times out and retries.
HDFS-4451. Major bug reported by Joshua Blatt and fixed by (balancer)
hdfs balancer command returns exit code 1 on success instead of 0

This is an incompatible change from release 2.0.2-alpha and prior releases. Previously, the Balancer tool exited with exit code 1 on success. It now exits with exit code 0 on success; a non-zero exit code indicates failure.
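For scripts or tools that wrap the balancer, the change in exit-code convention can be sketched as follows. This is a minimal illustration only: the class name BalancerExitCode and the legacyConvention flag are hypothetical and not part of Hadoop, and treating every code other than 1 as a failure under the old convention is an assumption of this example.

```java
/** Hypothetical helper that interprets the balancer's exit code; not part of Hadoop. */
final class BalancerExitCode {
    private BalancerExitCode() {}

    /**
     * @param exitCode         exit code returned by the balancer process
     * @param legacyConvention true for release 2.0.2-alpha and earlier, where 1 meant success
     * @return whether the exit code signals success under the chosen convention
     */
    static boolean isSuccess(int exitCode, boolean legacyConvention) {
        // Assumption: under the old convention only 1 counts as success.
        int successCode = legacyConvention ? 1 : 0;
        return exitCode == successCode;
    }

    public static void main(String[] args) {
        // New convention: 0 is success, any non-zero code is a failure.
        System.out.println("new, code 0: " + isSuccess(0, false));
        System.out.println("new, code 1: " + isSuccess(1, false));
        // Old convention: 1 was reported on success.
        System.out.println("old, code 1: " + isSuccess(1, true));
    }
}
```

A wrapper that previously checked for exit code 1 would need to switch to the new convention after upgrading, otherwise it would report successful runs as failures.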
HDFS-4445. Blocker sub-task reported by Vinay and fixed by Vinay
All BKJM ledgers are not checked while tailing, so failover will fail.
HDFS-4444. Trivial bug reported by Stephen Chu and fixed by Stephen Chu
Add space between total transaction time and number of transactions in FSEditLog#printStatistics
HDFS-4443. Trivial bug reported by Christian Rohling and fixed by Christian Rohling (namenode)
Remove trailing '`' character from HDFS nodelist jsp
HDFS-4428. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
FsDatasetImpl should disclose what the error is when a rename fails
HDFS-4426. Blocker bug reported by Jason Lowe and fixed by Arpit Agarwal (namenode)
Secondary namenode shuts down immediately after startup
HDFS-4415. Major bug reported by Robert Kanter and fixed by Robert Kanter
HostnameFilter should handle hostname resolution failures and continue processing
HDFS-4404. Critical bug reported by liaowenrui and fixed by Todd Lipcon (ha , hdfs-client)
Create file failure when the machine of first attempted NameNode is down
HDFS-4403. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (hdfs-client)
DFSClient can infer checksum type when not provided by reading first byte

The HDFS implementation of getFileChecksum() can now operate correctly against earlier-version datanodes which do not include the checksum type information in their checksum response. The checksum type is automatically inferred by issuing a read of the first byte of each block.
HDFS-4393. Minor improvement reported by Brandon Li and fixed by Brandon Li
Empty request and responses in protocol translators can be static final members
HDFS-4392. Trivial improvement reported by Andrew Purtell and fixed by Andrew Purtell (test)
Use NetUtils#getFreeSocketPort in MiniDFSCluster
HDFS-4385. Critical bug reported by Thomas Graves and fixed by Thomas Graves (build)
Maven RAT plugin is not checking all source files
HDFS-4384. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)
test_libhdfs_threaded gets SEGV if JNIEnv cannot be initialized
HDFS-4381. Major improvement reported by Jing Zhao and fixed by Jing Zhao (namenode)
Document fsimage format details in FSImageFormat class javadoc
HDFS-4377. Trivial bug reported by Eli Collins and fixed by Eli Collins
Some trivial DN comment cleanup
HDFS-4375. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode , security)
Use token request messages defined in hadoop common
HDFS-4369. Blocker bug reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)
GetBlockKeysResponseProto does not handle null response

The member keys of the protobuf message GetBlockKeysResponseProto is changed from required to optional so that null values can be passed over the wire. This is an incompatible wire protocol change; it does not affect API backward compatibility.
HDFS-4367. Blocker bug reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)
GetDataEncryptionKeyResponseProto does not handle null response

The member dataEncryptionKey of the protobuf message GetDataEncryptionKeyResponseProto is changed from required to optional. This incompatible change is not likely to affect existing users (those using the HDFS FileSystem and other public APIs).
HDFS-4364. Blocker bug reported by Suresh Srinivas and fixed by Suresh Srinivas
GetLinkTargetResponseProto does not handle null path

The member targetPath of the protobuf message GetLinkTargetResponseProto is changed from required to optional so that null values can be passed over the wire. This is an incompatible wire protocol change; it does not affect API backward compatibility.
HDFS-4363. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas
Combine PBHelper and HdfsProtoUtil and remove redundant methods
HDFS-4362. Critical bug reported by Suresh Srinivas and fixed by Suresh Srinivas
GetDelegationTokenResponseProto does not handle null token
HDFS-4359. Major bug reported by Liang Xie and fixed by Liang Xie (datanode)
remove an unnecessary synchronized keyword in BPOfferService.java
HDFS-4351. Major bug reported by Andrew Wang and fixed by Andrew Wang (namenode)
Fix BlockPlacementPolicyDefault#chooseTarget when avoiding stale nodes
HDFS-4350. Major bug reported by Andrew Wang and fixed by Andrew Wang
Make enabling of stale marking on read and write paths independent

This patch makes an incompatible configuration change, as described below: In release 1.1.0 and other 1.1.x point releases, the configuration parameter "dfs.namenode.check.stale.datanode" could be used to turn on checking for stale nodes. This parameter is no longer supported from release 1.2.0 onwards and is renamed to "dfs.namenode.avoid.read.stale.datanode". How the feature works and how to configure it: As described in the HDFS-3703 release notes, the datanode stale period can be configured using the parameter "dfs.namenode.stale.datanode.interval" in milliseconds (the default value is 30000, that is, 30 seconds). The NameNode can be configured to use this staleness information for reads using the parameter "dfs.namenode.avoid.read.stale.datanode". When this parameter is set to true, the NameNode places stale datanodes last in the list of targets to read from when returning block locations for reads. Using staleness information for writes is described in the release notes of HDFS-3912.
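The read-path behaviour described above, where stale datanodes are moved to the end of the returned block locations, can be sketched roughly as follows. This is not the NameNode's implementation: the class StaleLastOrdering, its Node type, and the parameters nowMs, staleIntervalMs, and avoidStale are hypothetical names introduced for this illustration, and the millisecond timestamps are an assumption of the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of "stale nodes last" ordering; not Hadoop code. */
final class StaleLastOrdering {
    /** Minimal stand-in for a datanode: a name and the time of its last heartbeat. */
    static final class Node {
        final String name;
        final long lastHeartbeatMs;

        Node(String name, long lastHeartbeatMs) {
            this.name = name;
            this.lastHeartbeatMs = lastHeartbeatMs;
        }
    }

    /** A node is stale when no heartbeat arrived within the configured interval. */
    static boolean isStale(Node n, long nowMs, long staleIntervalMs) {
        return nowMs - n.lastHeartbeatMs > staleIntervalMs;
    }

    /** Returns node names in read order; stale nodes go last when avoidStale is true. */
    static List<String> orderForRead(List<Node> locations, long nowMs,
                                     long staleIntervalMs, boolean avoidStale) {
        List<Node> ordered = new ArrayList<>(locations);
        if (avoidStale) {
            // false sorts before true, and List.sort is stable, so fresh nodes
            // keep their original relative order and stale nodes follow them.
            ordered.sort(Comparator.comparing((Node n) -> isStale(n, nowMs, staleIntervalMs)));
        }
        List<String> names = new ArrayList<>();
        for (Node n : ordered) {
            names.add(n.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Node> locations = Arrays.asList(
                new Node("dn-a", 0L),       // last heard from long ago
                new Node("dn-b", 95_000L),  // recent heartbeat
                new Node("dn-c", 99_000L)); // recent heartbeat
        System.out.println(orderForRead(locations, 100_000L, 30_000L, true));
    }
}
```

Using a stable sort keeps the original preference order among the non-stale nodes, so only the stale ones change position.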
HDFS-4349. Major test reported by Konstantin Shvachko and fixed by Konstantin Shvachko (namenode , test)
Test reading files from BackupNode
HDFS-4347. Major bug reported by Konstantin Shvachko and fixed by Plamen Jeliazkov (namenode , test)
TestBackupNode can go into infinite loop "Waiting checkpoint to complete."
HDFS-4344. Major bug reported by tamtam180 and fixed by Andy Isaacson (namenode)
dfshealth.jsp throws NumberFormatException when dfs.hosts/dfs.hosts.exclude includes port number
HDFS-4326. Major task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
bump up Tomcat version for HttpFS to 6.0.36
HDFS-4315. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (datanode)
DNs with multiple BPs can have BPOfferServices fail to start due to unsynchronized map access
HDFS-4308. Major bug reported by Konstantin Shvachko and fixed by Plamen Jeliazkov (namenode)
addBlock() should persist file blocks once
HDFS-4307. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
SocketCache should use monotonic time
HDFS-4306. Major bug reported by Binglin Chang and fixed by Binglin Chang
PBHelper.convertLocatedBlock fails to convert BlockToken
HDFS-4302. Major bug reported by Eugene Koontz and fixed by Eugene Koontz (ha , namenode)
Precondition in EditLogFileInputStream's length() method is checked too early in NameNode startup, causing fatal exception
HDFS-4295. Major bug reported by Stephen Chu and fixed by Stephen Chu (security)
Using port 1023 should be valid when starting Secure DataNode
HDFS-4294. Major bug reported by Robert Parker and fixed by Robert Parker
Backwards compatibility is not maintained for TestVolumeId
HDFS-4292. Minor bug reported by Binglin Chang and fixed by Binglin Chang
Sanity check not correct in RemoteBlockReader2.newBlockReader
HDFS-4291. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
edit log unit tests leave stray test_edit_log_file around
HDFS-4288. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (namenode)
NN accepts incremental BR as IBR in safemode
HDFS-4282. Major bug reported by Junping Du and fixed by Todd Lipcon (namenode , test)
TestEditLog.testFuzzSequences FAILED in all pre-commit tests
HDFS-4279. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (namenode)
NameNode does not initialize generic conf keys when started with -recover
HDFS-4274. Minor bug reported by Chris Nauroth and fixed by Chris Nauroth (datanode)
BlockPoolSliceScanner does not close verification log during shutdown
HDFS-4270. Minor bug reported by Derek Dagit and fixed by Derek Dagit (namenode)
Replications of the highest priority should be allowed to choose a source datanode that has reached its max replication limit
HDFS-4268. Major bug reported by Konstantin Shvachko and fixed by Konstantin Shvachko (namenode)
Remove redundant enum NNHAStatusHeartbeat.State
HDFS-4259. Minor improvement reported by Harsh J and fixed by Harsh J (hdfs-client)
Improve pipeline DN replacement failure message
HDFS-4247. Blocker sub-task reported by Daryn Sharp and fixed by Daryn Sharp (namenode)
saveNamespace should be tolerant of dangling lease
HDFS-4242. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Map.Entry is incorrectly used in LeaseManager
HDFS-4238. Major bug reported by Vinay and fixed by Todd Lipcon (ha)
[HA] Standby namenode should not do purging of shared storage edits.
HDFS-4236. Blocker bug reported by Allen Wittenauer and fixed by Alejandro Abdelnur
Regression: HDFS-4171 puts artificial limit on username length
HDFS-4232. Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (namenode)
NN fails to write a fsimage with stale leases
HDFS-4231. Major improvement reported by Konstantin Shvachko and fixed by Konstantin Shvachko (ha , namenode)
Introduce HAState for BackupNode
HDFS-4216. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Adding symlink should not ignore QuotaExceededException
HDFS-4214. Trivial improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (tools)
OfflineEditsViewer should print out the offset at which it encountered an error
HDFS-4213. Major new feature reported by Jing Zhao and fixed by Jing Zhao (hdfs-client , namenode)
When the client calls hsync, allows the client to update the file length in the NameNode
HDFS-4199. Minor test reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
Provide test for HdfsVolumeId
HDFS-4186. Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)
logSync() is called with the write lock held while releasing lease
HDFS-4182. Critical bug reported by Todd Lipcon and fixed by Robert Joseph Evans (namenode)
SecondaryNameNode leaks NameCache entries
HDFS-4181. Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)
LeaseManager tries to double remove and prints extra messages
HDFS-4179. Major bug reported by Konstantin Shvachko and fixed by Konstantin Shvachko (namenode)
BackupNode: allow reads, fix checkpointing, safeMode
HDFS-4178. Major bug reported by Andy Isaacson and fixed by Andy Isaacson (scripts)
shell scripts should not close stderr
HDFS-4172. Minor bug reported by Derek Dagit and fixed by Derek Dagit (namenode)
namenode does not URI-encode parameters when building URI for datanode request
HDFS-4171. Major bug reported by Harsh J and fixed by Alejandro Abdelnur
WebHDFS and HttpFs should accept only valid Unix user names
HDFS-4164. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (fuse-dfs)
fuse_dfs: add -lrt to the compiler command line on Linux
HDFS-4162. Minor bug reported by Derek Dagit and fixed by Derek Dagit (datanode)
Some malformed and unquoted HTML strings are returned from datanode web ui
HDFS-4156. Major bug reported by Eli Collins and fixed by Eli Reisman
Seeking to a negative position should throw an IOE
HDFS-4155. Major improvement reported by Liang Xie and fixed by Liang Xie (libhdfs)
libhdfs implementation of hsync API
HDFS-4153. Major improvement reported by Liang Xie and fixed by Liang Xie (journal-node)
Add START_MSG/SHUTDOWN_MSG for JournalNode
HDFS-4143. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Change INodeFile.blocks to private
HDFS-4140. Major bug reported by Andy Isaacson and fixed by Colin Patrick McCabe (fuse-dfs)
fuse-dfs handles open(O_TRUNC) poorly
HDFS-4139. Major bug reported by Andy Isaacson and fixed by Colin Patrick McCabe (fuse-dfs)
fuse-dfs RO mode still allows file truncation
HDFS-4132. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)
when libwebhdfs is not enabled, nativeMiniDfsClient frees uninitialized memory
HDFS-4130. Major sub-task reported by Han Xiao and fixed by Han Xiao (ha , performance)
BKJM: Reading the edit log at NN startup using BKJM is not efficient
HDFS-4127. Minor bug reported by Junping Du and fixed by Junping Du (namenode)
Log message is not correct when there is a shortage of replicas
HDFS-4122. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (datanode , hdfs-client , namenode)
Cleanup HDFS logs and reduce the size of logged messages

This change alters the content of some log messages to reduce their size. No log messages are removed. If you have a tool that depends on the exact content of the logs, please review the patch and update the tool accordingly.
HDFS-4121. Minor improvement reported by Binglin Chang and fixed by Binglin Chang
Add namespace declarations in hdfs .proto files for languages other than java
HDFS-4112. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
A few improvements on INodeDirectory
HDFS-4110. Trivial improvement reported by Liang Xie and fixed by Liang Xie (journal-node)
Refine JNStorage log
HDFS-4107. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Add utility methods to cast INode to INodeFile and INodeFileUnderConstruction
HDFS-4106. Minor bug reported by Jing Zhao and fixed by Jing Zhao (namenode , test)
BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile
HDFS-4105. Major bug reported by Arpit Gupta and fixed by Arpit Gupta
the SPNEGO user for secondary namenode should use the web keytab
HDFS-4104. Minor bug reported by Andy Isaacson and fixed by Andy Isaacson
dfs -test -d prints inappropriate error on nonexistent directory
HDFS-4100. Major sub-task reported by Liang Xie and fixed by Liang Xie (datanode , journal-node , security)
Fix all findbugs security warnings
HDFS-4099. Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Clean up replication code and add more javadoc
HDFS-4090. Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (hdfs-client)
getFileChecksum() result incompatible when called against zero-byte files.
HDFS-4088. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Remove "throws QuotaExceededException" from an INodeDirectoryWithQuota constructor
HDFS-4080. Major bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)
Add a separate logger for block state change logs to enable turning off those logs
HDFS-4075. Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (namenode)
Reduce recommissioning overhead
HDFS-4074. Trivial improvement reported by Brandon Li and fixed by Brandon Li (namenode)
Remove empty constructors for INode
HDFS-4073. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Jing Zhao (namenode)
Two minor improvements to FSDirectory
HDFS-4072. Minor bug reported by Jing Zhao and fixed by Jing Zhao (namenode)
On file deletion remove corresponding blocks pending replication
HDFS-4068. Minor improvement reported by Eli Collins and fixed by Eli Collins (datanode)
DatanodeID and DatanodeInfo member should be private
HDFS-4061. Major bug reported by Eli Collins and fixed by Eli Collins
TestBalancer and TestUnderReplicatedBlocks need timeouts
HDFS-4059. Minor sub-task reported by Jing Zhao and fixed by Jing Zhao (datanode , namenode)
Add number of stale DataNodes to metrics

This JIRA adds a new metric named "StaleDataNodes" of type Gauge under the metrics context "dfs". It tracks the number of DataNodes marked as stale. A DataNode is marked stale when its heartbeat message is not received within the time configured by "dfs.namenode.stale.datanode.interval". Please see the hdfs-default.xml documentation for "dfs.namenode.stale.datanode.interval" for more details on how to configure this feature. When this feature is not configured, this metric returns zero.
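As an illustration of what such a gauge reports, the sketch below counts nodes whose last heartbeat is older than a configured interval and returns zero when the feature is switched off. It is a simplified model rather than the NameNode's metrics code: the class StaleDataNodeGauge, its constructor arguments, and its method names are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of a "number of stale DataNodes" gauge; not Hadoop code. */
final class StaleDataNodeGauge {
    private final Map<String, Long> lastHeartbeatMs = new ConcurrentHashMap<>();
    private final long staleIntervalMs;
    private final boolean enabled;

    StaleDataNodeGauge(long staleIntervalMs, boolean enabled) {
        this.staleIntervalMs = staleIntervalMs;
        this.enabled = enabled;
    }

    /** Records that a heartbeat from the given node arrived at time nowMs. */
    void heartbeat(String node, long nowMs) {
        lastHeartbeatMs.put(node, nowMs);
    }

    /** Current gauge value: nodes with no heartbeat within the stale interval. */
    int staleDataNodes(long nowMs) {
        if (!enabled) {
            return 0; // feature not configured: the gauge reads zero
        }
        int count = 0;
        for (long seen : lastHeartbeatMs.values()) {
            if (nowMs - seen > staleIntervalMs) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        StaleDataNodeGauge gauge = new StaleDataNodeGauge(30_000L, true);
        gauge.heartbeat("dn-1", 0L);
        gauge.heartbeat("dn-2", 90_000L);
        System.out.println("stale at t=100s: " + gauge.staleDataNodes(100_000L));
    }
}
```

Because the value is recomputed on each read, a node stops being counted as soon as a fresh heartbeat from it is recorded.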
HDFS-4058. Major improvement reported by Eli Collins and fixed by Eli Collins (datanode)
DirectoryScanner may fail with IOOB if the directory scanning threads return out of volume order
HDFS-4055. Major bug reported by Binglin Chang and fixed by Binglin Chang
TestAuditLogs is flaky
HDFS-4049. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (datanode , performance)
hflush performance regression due to nagling delays
HDFS-4048. Major improvement reported by Stephen Chu and fixed by Stephen Chu
Use ERROR instead of INFO for volume failure logs
HDFS-4046. Minor bug reported by Binglin Chang and fixed by Binglin Chang (datanode , namenode)
ChecksumTypeProto use NULL as enum value which is illegal in C/C++
HDFS-4044. Major bug reported by Binglin Chang and fixed by Binglin Chang (datanode)
Duplicate ChecksumType definition in HDFS .proto files
HDFS-4041. Major improvement reported by Chris Nauroth and fixed by Chris Nauroth (build)
Hadoop HDFS Maven protoc calls must not depend on external sh script
HDFS-4038. Minor sub-task reported by Vinay and fixed by Vinay (ha)
Override toString() for BookKeeperEditLogInputStream
HDFS-4037. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (namenode)
Rename the getReplication() method in BlockCollection to getBlockReplication()
HDFS-4036. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Jing Zhao (namenode)
FSDirectory.unprotectedAddFile(..) should not throw UnresolvedLinkException
HDFS-4035. Major sub-task reported by Eli Collins and fixed by Eli Collins
LightWeightGSet and LightWeightHashSet increment a volatile without synchronization
HDFS-4034. Major sub-task reported by Eli Collins and fixed by Eli Collins
Remove redundant null checks
HDFS-4033. Major sub-task reported by Eli Collins and fixed by Eli Collins
Miscellaneous findbugs 2 fixes
HDFS-4032. Major sub-task reported by Eli Collins and fixed by Eli Collins
Specify the charset explicitly rather than rely on the default
HDFS-4031. Major sub-task reported by Eli Collins and fixed by Eli Collins (namenode)
Update findbugsExcludeFile.xml to include findbugs 2 exclusions
HDFS-4030. Major sub-task reported by Eli Collins and fixed by Eli Collins (namenode)
BlockManager excessBlocksCount and postponedMisreplicatedBlocksCount should be AtomicLongs
HDFS-4029. Major sub-task reported by Eli Collins and fixed by Eli Collins (namenode)
GenerationStamp should use an AtomicLong
HDFS-4022. Blocker bug reported by suja s and fixed by Vinay
Replication not happening for appended block
HDFS-4021. Minor bug reported by Colin Patrick McCabe and fixed by Christopher Conner (namenode)
Misleading error message when resources are low on the NameNode
HDFS-4020. Major bug reported by Eli Collins and fixed by Eli Collins
TestRBWBlockInvalidation may time out
HDFS-4018. Minor bug reported by Eli Collins and fixed by Eli Collins
TestDataNodeMultipleRegistrations#testMiniDFSClusterWithMultipleNN is missing some cluster cleanup
HDFS-4008. Minor improvement reported by Eli Collins and fixed by Eli Collins (test)
TestBalancerWithEncryptedTransfer needs a timeout
HDFS-4007. Minor test reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (test)
Rehabilitate bit-rotted unit tests under hadoop-hdfs-project/hadoop-hdfs/src/test/unit/
HDFS-4006. Major bug reported by Eli Collins and fixed by Todd Lipcon (namenode)
TestCheckpoint#testSecondaryHasVeryOutOfDateImage occasionally fails due to unexpected exit
HDFS-4000. Major improvement reported by Eli Collins and fixed by Colin Patrick McCabe
TestParallelLocalRead fails with "input ByteBuffers must be direct buffers"
HDFS-3999. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
HttpFS OPEN operation expects len parameter, it should be length
HDFS-3997. Trivial bug reported by Mithun Radhakrishnan and fixed by Mithun Radhakrishnan (namenode)
OfflineImageViewer incorrectly passes value of imageVersion when visiting IS_COMPRESSED element
HDFS-3996. Minor bug reported by Eli Collins and fixed by Eli Collins
Add debug log removed in HDFS-3873 back
HDFS-3992. Minor bug reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
Method org.apache.hadoop.hdfs.TestHftpFileSystem.tearDown() sometimes throws NPEs
HDFS-3990. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (namenode)
NN's health report has severe performance problems
HDFS-3985. Major bug reported by Eli Collins and fixed by (test)
Add timeouts to TestMulitipleNNDataBlockScanner
HDFS-3979. Major bug reported by Lars Hofhansl and fixed by Lars Hofhansl (datanode)
Fix hsync semantics
HDFS-3970. Major bug reported by Vinay and fixed by Andrew Wang (datanode)
BlockPoolSliceStorage#doRollback(..) should use BlockPoolSliceStorage instead of DataStorage to read prev version file.
HDFS-3964. Minor bug reported by Eli Collins and fixed by Eli Collins (namenode)
Make NN log of fs.defaultFS debug rather than info
HDFS-3957. Minor improvement reported by Andrew Wang and fixed by Andrew Wang
Change MutableQuantiles to use a shared thread for rolling over metrics
HDFS-3951. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (datanode)
datanode web ui does not work over HTTPS when datanode is started in secure mode
HDFS-3949. Minor bug reported by Eli Collins and fixed by Eli Collins (namenode)
NameNodeRpcServer#join should join on both client and server RPC servers
HDFS-3948. Minor bug reported by Eli Collins and fixed by Jing Zhao (test)
TestWebHDFS#testNamenodeRestart occasionally fails
HDFS-3944. Major task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
Httpfs resolveAuthority() is not resolving host correctly
HDFS-3939. Minor improvement reported by Eli Collins and fixed by Eli Collins (namenode)
NN RPC address cleanup
HDFS-3938. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (documentation)
remove current limitations from HttpFS docs
HDFS-3936. Major bug reported by Eli Collins and fixed by Eli Collins
MiniDFSCluster shutdown races with BlocksMap usage
HDFS-3935. Major sub-task reported by Eli Collins and fixed by Andy Isaacson
QJM: Add JournalNode to the start / stop scripts
HDFS-3932. Major bug reported by Eli Collins and fixed by Eli Collins
NameNode Web UI broken if the rpc-address is set to the wildcard
HDFS-3931. Minor bug reported by Eli Collins and fixed by Andy Isaacson (test)
TestDatanodeBlockScanner#testBlockCorruptionPolicy2 is broken
HDFS-3925. Minor improvement reported by Andrew Wang and fixed by Andrew Wang
Prettify PipelineAck#toString() for printing to a log
HDFS-3924. Major bug reported by Andrew Wang and fixed by Andrew Wang (hdfs-client)
Multi-byte id in HdfsVolumeId
HDFS-3923. Major sub-task reported by Jing Zhao and fixed by Jing Zhao
libwebhdfs testing code cleanup
HDFS-3921. Major bug reported by Stephen Chu and fixed by Aaron T. Myers
NN will prematurely consider blocks missing when entering active state while still in safe mode
HDFS-3920. Major sub-task reported by Jing Zhao and fixed by Jing Zhao
libwebhdfs code cleanup: string processing and using strerror consistently to handle all errors
HDFS-3919. Minor bug reported by Andy Isaacson and fixed by Andy Isaacson (test)
MiniDFSCluster:waitClusterUp can hang forever
HDFS-3916. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (webhdfs)
libwebhdfs (C client) code cleanups
HDFS-3912. Major sub-task reported by Jing Zhao and fixed by Jing Zhao
Detecting and avoiding stale datanodes for writing
HDFS-3910. Minor improvement reported by Eli Collins and fixed by Eli Collins (test)
DFSTestUtil#waitReplication should timeout
HDFS-3896. Minor improvement reported by Jeff Lord and fixed by Jeff Lord
Add descriptions for dfs.namenode.rpc-address and dfs.namenode.servicerpc-address to hdfs-default.xml
HDFS-3831. Critical bug reported by Jason Lowe and fixed by Jason Lowe (security)
Failure to renew tokens due to test-sources left in classpath
HDFS-3829. Major bug reported by Trevor Robinson and fixed by Trevor Robinson (test)
TestHftpURLTimeouts fails intermittently with JDK7
HDFS-3824. Major bug reported by Trevor Robinson and fixed by Trevor Robinson (test)
TestHftpDelegationToken fails intermittently with JDK7
HDFS-3813. Major improvement reported by Stephen Chu and fixed by Stephen Chu (security , webhdfs)
Log error message if security and WebHDFS are enabled but principal/keytab are not configured
HDFS-3810. Major sub-task reported by Ivan Kelly and fixed by Ivan Kelly
Implement format() for BKJM
HDFS-3809. Major sub-task reported by Ivan Kelly and fixed by Ivan Kelly (namenode)
Make BKJM use protobufs for all serialization with ZK
HDFS-3804. Major bug reported by Trevor Robinson and fixed by Trevor Robinson (test)
TestHftpFileSystem fails intermittently with JDK7
HDFS-3789. Major sub-task reported by Ivan Kelly and fixed by Ivan Kelly (ha , namenode)
JournalManager#format() should be able to throw IOException
HDFS-3753. Major bug reported by Eli Collins and fixed by Colin Patrick McCabe (build , test)
Tests don't run with native libraries
HDFS-3703. Major improvement reported by nkeywal and fixed by Jing Zhao (datanode , namenode)
Decrease the datanode failure detection time

This jira adds a new DataNode state called "stale" at the NameNode. A DataNode is marked as stale if it does not send a heartbeat message to the NameNode within the timeout configured by the parameter "dfs.namenode.stale.datanode.interval" (default value is 30 seconds). When returning block locations for reads, the NameNode picks a stale DataNode as the last target to read from. This feature is turned off by default. To turn it on, set the HDFS configuration "dfs.namenode.check.stale.datanode" to true.
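
The read-ordering rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the NameNode implementation: the class, `Node`, `isStale`, and `orderForRead` are hypothetical names, and all times are plain millisecond values chosen for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class StaleDatanodeSketch {
    /** Hypothetical stand-in for a datanode record; not an HDFS class. */
    public static final class Node {
        public final String name;
        public final long lastHeartbeatMs;

        public Node(String name, long lastHeartbeatMs) {
            this.name = name;
            this.lastHeartbeatMs = lastHeartbeatMs;
        }
    }

    /** A node is stale when no heartbeat arrived within the interval. */
    public static boolean isStale(Node node, long nowMs, long staleIntervalMs) {
        return nowMs - node.lastHeartbeatMs > staleIntervalMs;
    }

    /** Returns node names with stale nodes moved to the end of the list. */
    public static List<String> orderForRead(List<Node> replicas, long nowMs, long staleIntervalMs) {
        List<Node> sorted = new ArrayList<>(replicas);
        // List.sort is stable and false sorts before true, so fresh nodes
        // keep their original relative order and stale nodes go last.
        sorted.sort(Comparator.comparing((Node n) -> isStale(n, nowMs, staleIntervalMs)));
        List<String> names = new ArrayList<>();
        for (Node n : sorted) {
            names.add(n.name);
        }
        return names;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        List<Node> replicas = new ArrayList<>();
        replicas.add(new Node("dn1", 60_000L)); // 40 s since last heartbeat: stale
        replicas.add(new Node("dn2", 95_000L)); // 5 s: fresh
        replicas.add(new Node("dn3", 90_000L)); // 10 s: fresh
        System.out.println(orderForRead(replicas, now, 30_000L));
    }
}
```

The stale node is kept in the list rather than dropped, which matches the note's wording that it is the last target to read from, not an excluded one.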
HDFS-3695. Major sub-task reported by Todd Lipcon and fixed by Todd Lipcon (ha , namenode)
Genericize format() to non-file JournalManagers
HDFS-3682. Minor improvement reported by Eli Collins and fixed by Todd Lipcon (test)
MiniDFSCluster#init should provide more info when it fails
HDFS-3680. Minor improvement reported by Marcelo Vanzin and fixed by Marcelo Vanzin (namenode)
Allow customized audit logging in HDFS FSNamesystem
HDFS-3678. Critical bug reported by Todd Lipcon and fixed by Aaron T. Myers (namenode)
Edit log files are never being purged from 2NN
HDFS-3626. Blocker bug reported by Todd Lipcon and fixed by Todd Lipcon (namenode)
Creating file with invalid path can corrupt edit log
HDFS-3623. Major sub-task reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (namenode)
BKJM: zkLatchWaitTimeout hard coded to 6000. Make use of ZKSessionTimeout instead.
HDFS-3616. Major bug reported by Uma Maheswara Rao G and fixed by Jing Zhao (datanode)
TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown
HDFS-3598. Major new feature reported by Tsz Wo (Nicholas), SZE and fixed by Plamen Jeliazkov (webhdfs)
WebHDFS: support file concat
HDFS-3573. Minor sub-task reported by Todd Lipcon and fixed by Todd Lipcon (namenode)
Supply NamespaceInfo when instantiating JournalManagers
HDFS-3571. Major sub-task reported by Todd Lipcon and fixed by Todd Lipcon (ha , namenode)
Allow EditLogFileInputStream to read from a remote URL
HDFS-3553. Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp
Hftp proxy tokens are broken
HDFS-3510. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
Improve FSEditLog pre-allocation
HDFS-3507. Critical bug reported by Vinay and fixed by Vinay (ha)
DFS#isInSafeMode needs to execute only on Active NameNode
HDFS-3483. Major improvement reported by Stephen Chu and fixed by Stephen Fritz
Better error message when hdfs fsck is run against a ViewFS config
HDFS-3429. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (datanode , performance)
DataNode reads checksums even if client does not need them
HDFS-3373. Major bug reported by Todd Lipcon and fixed by John George (hdfs-client)
FileContext HDFS implementation can leak socket caches
HDFS-3224. Minor bug reported by Eli Collins and fixed by Jason Lowe
Bug in check for DN re-registration with different storage ID
HDFS-3077. Major new feature reported by Todd Lipcon and fixed by Todd Lipcon (ha , namenode)
Quorum-based protocol for reading and writing edit logs
HDFS-3049. Minor new feature reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (namenode)
During normal NN startup loading, fall back on a different EditLog if we see one that is corrupt
HDFS-2946. Major improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (ha , namenode)
HA: Put a cap on the number of completed edits files retained by the NN
HDFS-2908. Minor sub-task reported by Suresh Srinivas and fixed by Brandon Li
Add apache license header for StorageReport.java
HDFS-2656. Major improvement reported by Zhanwei.Wang and fixed by Jing Zhao (webhdfs)
Implement a pure C client based on webhdfs
HDFS-2264. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)
NamenodeProtocol has the wrong value for clientPrincipal in KerberosInfo annotation
HDFS-1331. Minor bug reported by Allen Wittenauer and fixed by Andy Isaacson (tools)
dfs -test should work like /bin/test

"test" will not print a warning for non-existent paths when testing for existence
HDFS-1322. Major bug reported by Ravi Gummadi and fixed by Colin Patrick McCabe
Document umask in DistributedFileSystem#mkdirs javadocs
HDFS-1245. Major new feature reported by Dmytro Molkov and fixed by Konstantin Shvachko (namenode)
Pluggable block id generation
HADOOP-9289. Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
FsShell rm -f fails for non-matching globs
HADOOP-9278. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (fs)
HarFileSystem may leak file handle
HADOOP-9276. Minor improvement reported by Arun C Murthy and fixed by Arun C Murthy
Allow BoundedByteArrayOutputStream to be resettable
HADOOP-9260. Critical bug reported by Jerry Chen and fixed by Chris Nauroth
Hadoop version may not be correct when starting name node or data node
HADOOP-9255. Critical bug reported by Thomas Graves and fixed by Thomas Graves (scripts)
relnotes.py missing last jira
HADOOP-9252. Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (util)
StringUtils.humanReadableInt(..) has a race condition
HADOOP-9247. Minor improvement reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
parametrize Clover "generateXxx" properties to make them re-definable via -D in mvn calls
HADOOP-9231. Major bug reported by Konstantin Boudnik and fixed by Konstantin Boudnik (build)
Parametrize staging URL for the uniformity of distributionManagement
HADOOP-9221. Major bug reported by Andy Isaacson and fixed by Andy Isaacson
Convert remaining xdocs to APT
HADOOP-9217. Major test reported by Andrey Klochkov and fixed by Andrey Klochkov (test)
Print thread dumps when hadoop-common tests fail
HADOOP-9216. Major improvement reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA (io)
CompressionCodecFactory#getCodecClasses should trim the result of parsing by Configuration.
HADOOP-9215. Blocker bug reported by Thomas Graves and fixed by Colin Patrick McCabe
when using cmake-2.6, libhadoop.so doesn't get created (only libhadoop.so.1.0.0)
HADOOP-9212. Major bug reported by Tom White and fixed by Tom White (fs)
Potential deadlock in FileSystem.Cache/IPC/UGI
HADOOP-9203. Trivial bug reported by Andrew Purtell and fixed by Andrew Purtell (ipc , test)
RPCCallBenchmark should find a random available port
HADOOP-9193. Minor bug reported by Jason Lowe and fixed by Andy Isaacson (scripts)
hadoop script can inadvertently expand wildcard arguments when delegating to hdfs script
HADOOP-9192. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (security)
Move token related request/response messages to common
HADOOP-9190. Major bug reported by Thomas Graves and fixed by Andy Isaacson (documentation)
packaging docs is broken
HADOOP-9183. Major bug reported by Tom White and fixed by Tom White (ha)
Potential deadlock in ActiveStandbyElector
HADOOP-9181. Major bug reported by Liang Xie and fixed by Liang Xie
Set daemon flag for HttpServer's QueuedThreadPool
HADOOP-9178. Minor bug reported by Sandy Ryza and fixed by Sandy Ryza
src/main/conf is missing hadoop-policy.xml
HADOOP-9173. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas
Add security token protobuf definition to common and use it in hdfs
HADOOP-9162. Minor improvement reported by Binglin Chang and fixed by Binglin Chang (native)
Add utility to check native library availability
HADOOP-9155. Minor bug reported by Binglin Chang and fixed by Binglin Chang
FsPermission should have different default value, 777 for directory and 666 for file
HADOOP-9153. Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (viewfs)
Support createNonRecursive in ViewFileSystem
HADOOP-9152. Minor bug reported by Brock Noland and fixed by Brock Noland (fs)
HDFS can report negative DFS Used on clusters with very small amounts of data
HADOOP-9147. Trivial improvement reported by Jonathan Allen and fixed by Jonathan Allen
Add missing fields to FileStatus.toString

Update FileStatus.toString to include missing fields
HADOOP-9135. Trivial bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (security)
JniBasedUnixGroupsMappingWithFallback should log at debug rather than info during fallback
HADOOP-9127. Major improvement reported by Daisuke Kobayashi and fixed by Daisuke Kobayashi (documentation)
Update documentation for ZooKeeper Failover Controller
HADOOP-9119. Minor test reported by Steve Loughran and fixed by Steve Loughran (fs , test)
Add test to FileSystemContractBaseTest to verify integrity of overwritten files

The patch adds more tests to verify overwrites and more complex operation sequences such as write-delete-overwrite. By using differently sized datasets with different data inside, these tests verify that the overwrite really did take place. While HDFS meets all these requirements directly, eventually consistent object stores may not, hence these tests.
HADOOP-9118. Trivial improvement reported by Steve Loughran and fixed by (test)
FileSystemContractBaseTest test data for read/write isn't rigorous enough

Resolved as part of HADOOP-9119: its test data generator creates more bits in every test byte
HADOOP-9113. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (security , test)
o.a.h.fs.TestDelegationTokenRenewer is failing intermittently
HADOOP-9106. Major improvement reported by Todd Lipcon and fixed by Robert Parker (ipc)
Allow configuration of IPC connect timeout

This jira introduces a new configuration parameter "ipc.client.connect.timeout". This configuration defines the Hadoop RPC connection timeout in milliseconds for a client to connect to a server. For details see the description associated with this configuration in core-default.xml.
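
As a rough illustration of what a millisecond connect timeout means on the client side, the sketch below reads the value from a `java.util.Properties` object and passes it to `Socket.connect`. It is not Hadoop's IPC client: `Properties` stands in for the Hadoop configuration, the class and method names are hypothetical, and the fallback of 20000 is an arbitrary value picked for the example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Properties;

public class ConnectTimeoutSketch {
    public static final String KEY = "ipc.client.connect.timeout";

    /** Reads the timeout in milliseconds, falling back to the supplied default. */
    public static int resolveTimeoutMs(Properties conf, int defaultMs) {
        String raw = conf.getProperty(KEY);
        return raw == null ? defaultMs : Integer.parseInt(raw.trim());
    }

    /** Opens a TCP connection; throws SocketTimeoutException after timeoutMs. */
    public static Socket connect(String host, int port, int timeoutMs) throws IOException {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the socket on a failed connect
            throw e;
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(KEY, "5000");
        // The fallback value is an assumption made for this example only.
        System.out.println(resolveTimeoutMs(conf, 20000));
    }
}
```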
HADOOP-9105. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
FsShell -moveFromLocal erroneously fails
HADOOP-9103. Major bug reported by yixiaohua and fixed by Todd Lipcon (io)
UTF8 class does not properly decode Unicode characters outside the basic multilingual plane
HADOOP-9097. Critical bug reported by Tom White and fixed by Thomas Graves (build)
Maven RAT plugin is not checking all source files
HADOOP-9093. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas
Move all the Exception in PathExceptions to o.a.h.fs package
HADOOP-9090. Minor new feature reported by Mostafa Elhemali and fixed by Mostafa Elhemali (metrics)
Support on-demand publish of metrics
HADOOP-9072. Major bug reported by Robert Parker and fixed by Robert Parker
Hadoop-Common-0.23-Build Fails to build in Jenkins
HADOOP-9070. Blocker sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
Kerberos SASL server cannot find kerberos key
HADOOP-9067. Minor test reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
provide test for method org.apache.hadoop.fs.LocalFileSystem.reportChecksumFailure(Path, FSDataInputStream, long, FSDataInputStream, long)
HADOOP-9064. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (security)
Augment DelegationTokenRenewer API to cancel the tokens on calls to removeRenewAction
HADOOP-9054. Major new feature reported by Robert Kanter and fixed by Robert Kanter (security)
Add AuthenticationHandler that uses Kerberos but allows for an alternate form of authentication for browsers
HADOOP-9049. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (security)
DelegationTokenRenewer needs to be Singleton and FileSystems should register/deregister to/from.
HADOOP-9042. Minor test reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
Add a test for umask in FileSystemContractBaseTest
HADOOP-9041. Critical bug reported by Radim Kolar and fixed by Radim Kolar (fs)
FileSystem initialization can go into infinite loop
HADOOP-9038. Minor test reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
provide unit-test coverage of class org.apache.hadoop.fs.LocalDirAllocator.AllocatorPerContext.PathIterator
HADOOP-9035. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (security)
Generalize setup of LoginContext
HADOOP-9025. Major bug reported by Robert Joseph Evans and fixed by Jonathan Eagles
org.apache.hadoop.tools.TestCopyListing failing
HADOOP-9022. Major bug reported by Haiyang Jiang and fixed by Jonathan Eagles
Hadoop distcp tool fails to copy file if -m 0 specified
HADOOP-9021. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)
Enforce configured SASL method on the server
HADOOP-9020. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)
Add a SASL PLAIN server
HADOOP-9015. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
Standardize creation of SaslRpcServers
HADOOP-9014. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
Standardize creation of SaslRpcClients
HADOOP-9013. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (fs , security)
UGI should not hardcode loginUser's authenticationType
HADOOP-9012. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
IPC Client sends wrong connection context
HADOOP-9010. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (fs , security)
Map UGI authenticationMethod to RPC authMethod
HADOOP-9009. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (fs , security)
Add SecurityUtil methods to get/set authentication method
HADOOP-9004. Major improvement reported by Stephen Chu and fixed by Stephen Chu (security , test)
Allow security unit tests to use external KDC
HADOOP-8999. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
SASL negotiation is flawed

The RPC SASL negotiation now always ends with a final response. If the SASL mechanism does not have a final response (GSSAPI, PLAIN), then an empty success response is sent to the client. The client will now always expect a final response to definitively know if negotiation is complete/successful.
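
The client-side rule described above can be modeled in a few lines. This is only a toy model under stated assumptions: the `State` enum, `Message` class, and `negotiate` method are hypothetical and do not reflect Hadoop's wire format or the `javax.security.sasl` API.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class SaslFinalResponseSketch {
    /** Hypothetical message states; not Hadoop's wire format. */
    public enum State { CHALLENGE, SUCCESS }

    public static final class Message {
        public final State state;
        public final byte[] token;

        public Message(State state, byte[] token) {
            this.state = state;
            this.token = token;
        }
    }

    /**
     * Consumes server messages until a SUCCESS arrives. Returns true only
     * when an explicit final response was seen; running out of messages
     * without one is treated as a failed negotiation.
     */
    public static boolean negotiate(Queue<Message> fromServer) {
        Message m;
        while ((m = fromServer.poll()) != null) {
            if (m.state == State.SUCCESS) {
                return true; // the token may be empty for some mechanisms
            }
            // A real client would evaluate m.token and send a reply here.
        }
        return false;
    }

    public static void main(String[] args) {
        Queue<Message> fromServer = new ArrayDeque<>();
        fromServer.add(new Message(State.CHALLENGE, new byte[] {1, 2, 3}));
        fromServer.add(new Message(State.SUCCESS, new byte[0])); // empty final response
        System.out.println(negotiate(fromServer));
    }
}
```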
HADOOP-8998. Minor improvement reported by Andy Isaacson and fixed by Alejandro Abdelnur
set Cache-Control no-cache header on all dynamic content
HADOOP-8994. Minor bug reported by Andy Isaacson and fixed by Andy Isaacson (test)
TestDFSShell creates file named "noFileHere", making further tests hard to understand
HADOOP-8992. Minor improvement reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
Enhance unit-test coverage of class HarFileSystem
HADOOP-8986. Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (ipc)
Server$Call object is never released after it is sent
HADOOP-8985. Minor improvement reported by Binglin Chang and fixed by Binglin Chang (ha , ipc)
Add namespace declarations in .proto files for languages other than java
HADOOP-8981. Major bug reported by Chris Nauroth and fixed by Xuan Gong (metrics)
TestMetricsSystemImpl fails on Windows
HADOOP-8962. Critical bug reported by Jason Lowe and fixed by Jason Lowe (fs)
RawLocalFileSystem.listStatus fails when a child filename contains a colon
HADOOP-8951. Minor improvement reported by Steve Loughran and fixed by Steve Loughran (util)
RunJar to fail with user-comprehensible error message if jar missing
HADOOP-8948. Major bug reported by Chris Nauroth and fixed by Chris Nauroth (test)
TestFileUtil.testGetDU fails on Windows due to incorrect assumption of line separator
HADOOP-8932. Major improvement reported by Kihwal Lee and fixed by Kihwal Lee (security)
JNI-based user-group mapping modules can be too chatty on lookup failures
HADOOP-8931. Trivial improvement reported by Eli Collins and fixed by Eli Collins
Add Java version to startup message
HADOOP-8930. Major improvement reported by Andrey Klochkov and fixed by Andrey Klochkov (test)
Cumulative code coverage calculation
HADOOP-8929. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (metrics)
Add toString, other improvements for SampleQuantiles
HADOOP-8926. Major improvement reported by Gopal V and fixed by Gopal V (util)
hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data

Speed up Crc32 by improving the cache hit-ratio of hadoop.util.PureJavaCrc32
HADOOP-8925. Minor improvement reported by Eli Collins and fixed by Eli Collins (build)
Remove the packaging
HADOOP-8922. Trivial improvement reported by Damien Hardy and fixed by Damien Hardy (metrics)
Provide alternate JSONP output for JMXJsonServlet to allow javascript in browser dashboard

Add an alternative JSONP output for the /jmx HTTP interface to provide a JavaScript polling ability in browsers.
HADOOP-8913. Minor bug reported by Sandy Ryza and fixed by Sandy Ryza (metrics)
hadoop-metrics2.properties should give units in comment for sampling period
HADOOP-8912. Major bug reported by Raja Aluri and fixed by Raja Aluri (build)
adding .gitattributes file to prevent CRLF and LF mismatches for source and text files
HADOOP-8911. Major bug reported by Raja Aluri and fixed by Raja Aluri (build)
CRLF characters in source and text files
HADOOP-8909. Major improvement reported by Chris Nauroth and fixed by Chris Nauroth (build)
Hadoop Common Maven protoc calls must not depend on external sh script
HADOOP-8906. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
paths with multiple globs are unreliable
HADOOP-8901. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (native)
GZip and Snappy support may not work without unversioned libraries
HADOOP-8900. Major bug reported by Slavik Krassovsky and fixed by Andy Isaacson
BuiltInGzipDecompressor throws IOException - stored gzip size doesn't match decompressed size
HADOOP-8894. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon
GenericTestUtils.waitFor should dump thread stacks on timeout
HADOOP-8889. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (build , test)
Upgrade to Surefire 2.12.3
HADOOP-8883. Major bug reported by Robert Kanter and fixed by Robert Kanter
Anonymous fallback in KerberosAuthenticator is broken
HADOOP-8881. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
FileBasedKeyStoresFactory initialization logging should be debug not info
HADOOP-8878. Major bug reported by Arpit Gupta and fixed by Arpit Gupta
uppercase namenode hostname causes hadoop dfs calls with webhdfs filesystem and fsck to fail when security is on
HADOOP-8866. Minor improvement reported by Andrew Wang and fixed by Andrew Wang
SampleQuantiles#query is O(N^2) instead of O(N)
HADOOP-8860. Major task reported by Tom White and fixed by Tom White (documentation)
Split MapReduce and YARN sections in documentation navigation
HADOOP-8855. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (security)
SSL-based image transfer does not work when Kerberos is disabled
HADOOP-8851. Minor improvement reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky (test)
Use -XX:+HeapDumpOnOutOfMemoryError JVM option in the forked tests
HADOOP-8849. Minor improvement reported by Ivan A. Veselovsky and fixed by Ivan A. Veselovsky
FileUtil#fullyDelete should grant the target directories +rwx permissions before trying to delete them
HADOOP-8843. Critical bug reported by Robert Joseph Evans and fixed by Jason Lowe
Old trash directories are never deleted on upgrade from 1.x
HADOOP-8833. Major bug reported by Harsh J and fixed by Harsh J (fs)
fs -text should make sure to call inputstream.seek(0) before using input stream
HADOOP-8822. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans
relnotes.py was deleted post mavenization
HADOOP-8819. Major bug reported by Brandon Li and fixed by Brandon Li (fs)
Should use && instead of & in a few places in FTPFileSystem,FTPInputStream,S3InputStream,ViewFileSystem,ViewFs
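
For readers unfamiliar with the distinction, the generic sketch below shows why `&&` is usually wanted in a guard condition. It is not taken from the classes named in the title; the class and method names are hypothetical.

```java
public class ShortCircuitSketch {
    /** Non-short-circuit '&': both operands are always evaluated. */
    public static boolean nonEmptyEager(String s) {
        return s != null & !s.isEmpty(); // dereferences s even when it is null
    }

    /** Short-circuit '&&': the right operand is skipped when the left is false. */
    public static boolean nonEmptyLazy(String s) {
        return s != null && !s.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(nonEmptyLazy(null)); // right operand is never evaluated
        try {
            nonEmptyEager(null);
        } catch (NullPointerException e) {
            System.out.println("eager form threw NullPointerException");
        }
    }
}
```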
HADOOP-8816. Major bug reported by Moritz Moeller and fixed by Moritz Moeller (net)
HTTP Error 413 full HEAD if using kerberos authentication
HADOOP-8812. Minor improvement reported by Eli Collins and fixed by Eli Collins
ExitUtil#terminate should print Exception#toString
HADOOP-8811. Critical bug reported by Radim Kolar and fixed by Radim Kolar (native)
Compile hadoop native library in FreeBSD
HADOOP-8806. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (build)
libhadoop.so: dlopen should be better at locating libsnappy.so, etc.
HADOOP-8804. Minor improvement reported by Eli Collins and fixed by Senthil V Kumar
Improve Web UIs when the wildcard address is used
HADOOP-8795. Minor bug reported by Sean Mackrory and fixed by Sean Mackrory (scripts)
BASH tab completion doesn't look in PATH, assumes path to executable is specified
HADOOP-8791. Major bug reported by Bertrand Dechoux and fixed by Jing Zhao (documentation)
rm "Only deletes non empty directory and files."
HADOOP-8789. Minor improvement reported by Andy Isaacson and fixed by Andy Isaacson (test)
Tests setLevel(Level.OFF) should be Level.ERROR
HADOOP-8786. Major bug reported by Todd Lipcon and fixed by Todd Lipcon
HttpServer continues to start even if AuthenticationFilter fails to init
HADOOP-8784. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)
Improve IPC.Client's token use
HADOOP-8783. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)
Improve RPC.Server's digest auth
HADOOP-8780. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan
Update DeprecatedProperties apt file
HADOOP-8756. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (native)
Fix SEGV when libsnappy is in java.library.path but not LD_LIBRARY_PATH
HADOOP-8755. Major improvement reported by Andrey Klochkov and fixed by Andrey Klochkov (test)
Print thread dump when tests fail due to timeout
HADOOP-8736. Major improvement reported by Brandon Li and fixed by Brandon Li (ipc)
Add Builder for building an RPC server
HADOOP-8713. Major bug reported by Trevor Robinson and fixed by Trevor Robinson (test)
TestRPCCompatibility fails intermittently with JDK7
HADOOP-8712. Minor improvement reported by Robert Parker and fixed by Robert Parker (security)
Change default hadoop.security.group.mapping

The default group mapping policy has been changed to JniBasedUnixGroupsNetgroupMappingWithFallback. This should maintain the same semantics as the prior default for most users.
HADOOP-8684. Minor bug reported by Hiroshi Ikeda and fixed by Jing Zhao (io)
Deadlock between WritableComparator and WritableComparable
HADOOP-8616. Major bug reported by Eli Collins and fixed by Sandy Ryza (viewfs)
ViewFS configuration requires a trailing slash
HADOOP-8597. Major new feature reported by Harsh J and fixed by Ivan Vladimirov Ivanov (fs)
FsShell's Text command should be able to read avro data files
HADOOP-8589. Major bug reported by Andrey Klochkov and fixed by Sanjay Radia (fs , test)
ViewFs tests fail when tests and home dirs are nested
HADOOP-8561. Major improvement reported by Luke Lu and fixed by Yu Gao (security)
Introduce HADOOP_PROXY_USER for secure impersonation in child hadoop client processes
HADOOP-8427. Major task reported by Eli Collins and fixed by Andy Isaacson (documentation)
Convert Forrest docs to APT, incremental
HADOOP-8418. Major bug reported by Luke Lu and fixed by Yu Gao (security)
Fix UGI for IBM JDK running on Windows
HADOOP-7886. Minor improvement reported by Jakob Homan and fixed by SreeHari
Add toString to FileStatus
HADOOP-7688. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Uma Maheswara Rao G
When a servlet filter throws an exception in init(..), the Jetty server failed silently.
HADOOP-7115. Major bug reported by Arun C Murthy and fixed by Alejandro Abdelnur
Add a cache for getpwuid_r and getpwgid_r calls
HADOOP-6762. Critical bug reported by sam rash and fixed by sam rash
exception while doing RPC I/O closes channel
HADOOP-6607. Minor bug reported by Steve Loughran and fixed by Alejandro Abdelnur (io)
Add different variants of non caching HTTP headers

Hadoop 2.0.2-alpha Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements.

Changes since Hadoop 2.0.1-alpha

YARN-137. Major improvement reported by Siddharth Seth and fixed by Siddharth Seth (scheduler)
Change the default scheduler to the CapacityScheduler

There are some bugs in the FifoScheduler at the moment: it doesn't distribute tasks across nodes, and it has some headroom (available resource) issues. That's not the best experience for users trying out the 2.0 branch. The CS with the default configuration of a single queue behaves the same as the FifoScheduler and doesn't have these issues.
YARN-108. Critical bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)
FSDownload can create cache directories with the wrong permissions

When the cluster is configured with a restrictive umask, e.g.: {{fs.permissions.umask-mode=0077}}, the nodemanager can end up creating directory entries in the public cache with the wrong permissions. The resulting permissions can allow only the nodemanager user to access files in the public cache, preventing jobs from running properly.
YARN-106. Major bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)
Nodemanager needs to set permissions of local directories

If the nodemanager process is running with a restrictive default umask (e.g.: 0077) then it will create its local directories with permissions that are too restrictive to allow containers from other users to run.
YARN-88. Major bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)
DefaultContainerExecutor can fail to set proper permissions

{{DefaultContainerExecutor}} can fail to set the proper permissions on its local directories if the cluster has been configured with a restrictive umask, e.g.: fs.permissions.umask-mode=0077. The configured umask ends up defeating the permissions requested by {{DefaultContainerExecutor}} when it creates directories.
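
The way a umask defeats requested permissions is simple bit arithmetic, sketched below. The requested mode of 0755 is an assumption chosen for illustration, not a value taken from the nodemanager code, and the class and method names are hypothetical.

```java
public class UmaskSketch {
    /** Effective mode is the requested mode with the umask bits cleared. */
    public static int applyUmask(int requestedMode, int umask) {
        return requestedMode & ~umask & 0777;
    }

    /** Formats a mode as four octal digits, e.g. 0755. */
    public static String octal(int mode) {
        return String.format("%04o", mode);
    }

    public static void main(String[] args) {
        int requested = 0755; // illustrative: a directory other users must traverse
        System.out.println(octal(applyUmask(requested, 0022))); // 0755
        System.out.println(octal(applyUmask(requested, 0077))); // 0700
    }
}
```

With a umask of 0077 the group and other bits are stripped at creation time, so a directory meant to be shared ends up accessible to its owner only. A common remedy, consistent with the issue titles in this group, is to set the permissions explicitly after creating the directory rather than relying on the creation mode.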
YARN-87. Critical bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)
NM ResourceLocalizationService does not set permissions of local cache directories

{{ResourceLocalizationService}} creates a file cache and user cache directory when it starts up but doesn't specify the permissions for them when they are created. If the cluster configs are set to limit the default permissions (e.g.: fs.permissions.umask-mode=0077 instead of the default 0022), then the cache directories are created with too-restrictive permissions and no jobs are able to run.
YARN-83. Major bug reported by Bikas Saha and fixed by Bikas Saha (client)
Change package of YarnClient to include apache

Currently it's org.hadoop.* instead of org.apache.hadoop.*
YARN-80. Major improvement reported by Todd Lipcon and fixed by Arun C Murthy (capacityscheduler)
Support delay scheduling for node locality in MR2's capacity scheduler

The capacity scheduler in MR2 doesn't support delay scheduling for achieving node-level locality. So, jobs exhibit poor data locality even if they have good rack locality. Especially on clusters where disk throughput is much better than network capacity, this hurts overall job performance. We should optionally support node-level delay scheduling heuristics similar to what the fair scheduler implements in MR1.
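
The general idea of node-level delay scheduling can be sketched as a counter of skipped scheduling opportunities. This is a simplified illustration, not the capacity scheduler's or fair scheduler's code: the class name, the `shouldAccept` method, and the `maxMissedOpportunities` threshold are all hypothetical.

```java
public class DelaySchedulingSketch {
    private final int maxMissedOpportunities;
    private int missedOpportunities = 0;

    public DelaySchedulingSketch(int maxMissedOpportunities) {
        this.maxMissedOpportunities = maxMissedOpportunities;
    }

    /**
     * Called each time a node offers resources to the application.
     * Returns true if the application should take a container on that node.
     */
    public boolean shouldAccept(boolean nodeHasLocalData) {
        if (nodeHasLocalData) {
            missedOpportunities = 0;
            return true; // node-local placement: always accept
        }
        if (missedOpportunities >= maxMissedOpportunities) {
            missedOpportunities = 0;
            return true; // waited long enough: settle for a non-local node
        }
        missedOpportunities++;
        return false; // skip this offer and wait for a node-local one
    }

    public static void main(String[] args) {
        DelaySchedulingSketch app = new DelaySchedulingSketch(2);
        System.out.println(app.shouldAccept(false)); // false: first skip
        System.out.println(app.shouldAccept(false)); // false: second skip
        System.out.println(app.shouldAccept(false)); // true: threshold reached
        System.out.println(app.shouldAccept(true));  // true: node-local
    }
}
```

The threshold trades a short scheduling delay for better locality: a larger value waits longer for a node holding the data before falling back.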
YARN-79. Major bug reported by Bikas Saha and fixed by Vinod Kumar Vavilapalli (client)
Calling YarnClientImpl.close throws Exception

The following exception is thrown
===========
org.apache.hadoop.HadoopIllegalArgumentException: Cannot close proxy - is not Closeable or does not provide closeable invocation handler class org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl
    at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:624)
    at org.hadoop.yarn.client.YarnClientImpl.stop(YarnClientImpl.java:102)
    at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.run(UnmanagedAMLauncher.java:336)
    at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.testDSShell(TestUnmanagedAMLauncher.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
    at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
    at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)
===========
YARN-75. Major bug reported by Siddharth Seth and fixed by Siddharth Seth
RMContainer should handle a RELEASE event while RUNNING

An AppMaster can send a container release at any point. Currently this results in an exception if it is done while the RM considers the container to be RUNNING. Because the event is not processed correctly, these containers also do not show up in the Completed Container List seen by the AM (AMRMProtocol). MR-3902 depends on this set being complete.
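As a rough illustration of the behavior requested above, the sketch below models a container lifecycle as a transition table in which a RELEASE event is accepted while RUNNING. The class, state, and event names are hypothetical and simplified; this is not the actual RMContainer implementation.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical, simplified container lifecycle; not the real RMContainer code. */
final class ContainerLifecycleSketch {
    enum State { ALLOCATED, RUNNING, COMPLETED, RELEASED }
    enum Event { LAUNCHED, FINISHED, RELEASE }

    // Transition table: for each state, the events it accepts and the resulting state.
    private static final Map<State, Map<Event, State>> TRANSITIONS =
            new EnumMap<State, Map<Event, State>>(State.class);

    static {
        for (State s : State.values()) {
            TRANSITIONS.put(s, new EnumMap<Event, State>(Event.class));
        }
        TRANSITIONS.get(State.ALLOCATED).put(Event.LAUNCHED, State.RUNNING);
        TRANSITIONS.get(State.ALLOCATED).put(Event.RELEASE, State.RELEASED);
        TRANSITIONS.get(State.RUNNING).put(Event.FINISHED, State.COMPLETED);
        // The transition the issue asks for: a release is legal while RUNNING.
        TRANSITIONS.get(State.RUNNING).put(Event.RELEASE, State.RELEASED);
    }

    /** Applies an event, failing loudly when the state has no transition for it. */
    static State handle(State current, Event event) {
        State next = TRANSITIONS.get(current).get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event " + event + " at " + current);
        }
        return next;
    }

    public static void main(String[] args) {
        State s = handle(State.ALLOCATED, Event.LAUNCHED);
        s = handle(s, Event.RELEASE);
        System.out.println(s);
    }
}
```

Without the RUNNING/RELEASE entry, the same call would throw, which mirrors the exception described in the report.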
YARN-68. Major bug reported by patrick white and fixed by Daryn Sharp (nodemanager)
NodeManager will refuse to shutdown indefinitely due to container log aggregation

The nodemanager is able to get into a state where containermanager.logaggregation.AppLogAggregatorImpl will apparently wait indefinitely for log aggregation to complete for an application, even if that application has abnormally terminated and is no longer present. The observed behavior is that an attempt to stop the nodemanager daemon returns but has no effect, and the NM log continually displays messages similar to this: [Thread-1]2012-08-21 17:44:07,581 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Waiting for aggregation to complete for application_1345221477405_2733 The only recovery we found to work was to 'kill -9' the NM process. What exactly causes the NM to enter this state is unclear, but we do see this behavior reliably when the NM has run a task which failed. For example, when debugging oozie distcp actions and having a distcp map task fail, the NM that was running the container enters this state, and a shutdown on said NM never completes; 'never' in this case means we waited for 2 hours before killing the nodemanager process.
YARN-66. Critical bug reported by Thomas Graves and fixed by Thomas Graves (nodemanager)
aggregated logs permissions not set properly

If the default file permissions are set to something restrictive, such as 700, application logs get aggregated and created with those restrictive file permissions, which prevents the history server from serving them. They need to be created group-readable, similar to how log aggregation sets up the directory permissions.
YARN-63. Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
RMNodeImpl is missing valid transitions from the UNHEALTHY state

The ResourceManager isn't properly handling nodes that have been marked UNHEALTHY when they are lost or are decommissioned.
YARN-60. Blocker sub-task reported by Daryn Sharp and fixed by Vinod Kumar Vavilapalli (nodemanager)
NMs reject all container tokens after secret key rolls

The NM's token secret manager will reject all container tokens after the secret key is activated, which means the NM will not launch _any_ containers, including AMs. The whole YARN cluster becomes inoperable within 1 day.
YARN-58. Critical bug reported by Daryn Sharp and fixed by Jason Lowe (nodemanager)
NM leaks filesystems

The NM is exhausting its fds because it's not closing fs instances when the app is finished.
YARN-42. Major bug reported by Devaraj K and fixed by Devaraj K (nodemanager)
Node Manager throws NPE on startup

NM throws NPE on startup if it doesn't have permissions on the NM local dirs {code:xml} 2012-05-14 16:32:13,468 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager org.apache.hadoop.yarn.YarnException: Failed to initialize LocalizationService at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.init(ResourceLocalizationService.java:202) at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.init(ContainerManagerImpl.java:183) at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.init(NodeManager.java:166) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:268) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:284) Caused by: java.io.IOException: mkdir of /mrv2/tmp/nm-local-dir/usercache failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:907) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.init(ResourceLocalizationService.java:188) ... 
6 more 2012-05-14 16:32:13,472 INFO org.apache.hadoop.yarn.service.CompositeService: Error stopping org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler.stop(NonAggregatingLogHandler.java:82) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.stop(ContainerManagerImpl.java:266) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stop(NodeManager.java:182) at org.apache.hadoop.yarn.service.CompositeService$CompositeServiceShutdownHook.run(CompositeService.java:122) at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) {code}
YARN-39. Critical sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
RM-NM secret-keys should be randomly generated and rolled every so often

- RM should generate the master-key randomly
- The master-key should roll every so often
- NM should remember old expired keys so that already doled out container-requests can be satisfied.
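The three requirements above can be sketched as follows. This is a minimal illustration, not the actual RM/NM secret manager: the class name, the 20-byte key length, and the choice to remember only the single most recent old key are all assumptions made for the example.

```java
import java.security.SecureRandom;

/** Hypothetical sketch of a rolling master key; not the actual RM/NM secret manager. */
final class RollingMasterKeySketch {
    private final SecureRandom random = new SecureRandom();
    private int currentId = 0;
    private byte[] currentKey = newKey();
    private int previousId = -1;
    private byte[] previousKey = null;

    /** Generates a random key; the 20-byte length is an arbitrary choice for this sketch. */
    private byte[] newKey() {
        byte[] key = new byte[20];
        random.nextBytes(key);
        return key;
    }

    /** Rolls to a fresh random key and remembers the one it replaces. */
    synchronized void roll() {
        previousId = currentId;
        previousKey = currentKey;
        currentId++;
        currentKey = newKey();
    }

    synchronized int currentKeyId() {
        return currentId;
    }

    /** Returns a copy of the key for keyId if it is the current or the remembered old key, else null. */
    synchronized byte[] lookup(int keyId) {
        if (keyId == currentId) {
            return currentKey.clone();
        }
        if (keyId == previousId && previousKey != null) {
            return previousKey.clone();
        }
        return null;
    }

    public static void main(String[] args) {
        RollingMasterKeySketch keys = new RollingMasterKeySketch();
        int issuedUnder = keys.currentKeyId();
        keys.roll();
        System.out.println("accepted after one roll: " + (keys.lookup(issuedUnder) != null));
        keys.roll();
        System.out.println("accepted after two rolls: " + (keys.lookup(issuedUnder) != null));
    }
}
```

A request issued under the key in force just before a roll can still be verified afterwards, because the old key stays available until the next roll.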
YARN-37. Minor bug reported by Jason Lowe and fixed by Mayank Bansal (resourcemanager)
TestRMAppTransitions.testAppSubmittedKilled passes for the wrong reason

TestRMAppTransitions#testAppSubmittedKilled causes an invalid event exception, but the test doesn't catch the error since the final app state is still killed. The app is killed for the wrong reason, but the final state is the same.
YARN-36. Blocker bug reported by Eli Collins and fixed by Radim Kolar
branch-2.1.0-alpha doesn't build

branch-2.1.0-alpha doesn't build due to the following. Per YARN-1 I updated the mvn version to be 2.1.0-SNAPSHOT; before I hit this issue, it didn't compile due to the bogus version. {noformat} hadoop-branch-2.1.0-alpha $ mvn compile [INFO] Scanning for projects... [ERROR] The build could not read 1 project -> [Help 1] [ERROR] [ERROR] The project org.apache.hadoop:hadoop-yarn-project:2.1.0-SNAPSHOT (/home/eli/src/hadoop-branch-2.1.0-alpha/hadoop-yarn-project/pom.xml) has 1 error [ERROR] 'dependencies.dependency.version' for org.hsqldb:hsqldb:jar is missing. @ line 160, column 17 {noformat}
YARN-31. Major bug reported by Thomas Graves and fixed by Thomas Graves
TestDelegationTokenRenewer fails on jdk7

TestDelegationTokenRenewer fails when run with JDK7. With JDK7, test methods run in an undefined order. The test expects testDTRenewal to run first, but that is no longer the case.
YARN-29. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (client)
Add a yarn-client module

I see that we are duplicating (some) code for talking to the RM via the client API. In this light, a yarn-client module will be useful so that clients of all frameworks can use/extend it. That same module can also be the destination for all of YARN's command line tools.
YARN-27. Major bug reported by Ramya Sunil and fixed by Arun C Murthy
Failed refreshQueues due to misconfiguration prevents further refreshing of queues

Stumbled upon this problem while refreshing queues with an incorrect configuration. The exact scenario was:
1. Added a new queue "newQueue" without defining its capacity.
2. "bin/mapred queue -refreshQueues" fails correctly with "Illegal capacity of -1 for queue root.newQueue".
3. However, after defining the capacity of "newQueue", a second "bin/mapred queue -refreshQueues" throws "org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root,q1=newQueue already exists!"
Also, the Hadoop:name=QueueMetrics,q0=root,q1=newQueue,service=ResourceManager metrics are available even though the queue was not added. The expected behavior would be to refresh the queues correctly and allow addition of "newQueue".
YARN-25. Major bug reported by Thomas Graves and fixed by Robert Joseph Evans
remove old aggregated logs

Currently the aggregated user logs under NM_REMOTE_APP_LOG_DIR are never removed. We should have a mechanism to remove them after a certain period. It might make sense for the job history server to remove them.
YARN-22. Minor bug reported by Eli Collins and fixed by Mayank Bansal
Using URI for yarn.nodemanager log dirs fails

If I use URIs (e.g. file:///home/eli/hadoop/dirs) for yarn.nodemanager.log-dirs or yarn.nodemanager.remote-app-log-dir, the container log servlet fails with an NPE (it works if I remove the "file" scheme). Using a URI for yarn.nodemanager.local-dirs works.
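To illustrate why the two spellings of the same directory need explicit handling, the sketch below converts a configured value that may be either a plain path or a file: URI into a local java.io.File. The class and method names are hypothetical, and this is not the code path the NodeManager actually uses.

```java
import java.io.File;
import java.net.URI;

/** Hypothetical helper; not the NodeManager's actual handling of log-dir values. */
final class LocalDirValueSketch {
    /** Converts a configured value (plain path or file: URI) to a local File. */
    static File toLocalFile(String configured) {
        if (configured.startsWith("file:")) {
            // new File(URI) accepts only absolute, hierarchical file: URIs.
            return new File(URI.create(configured));
        }
        return new File(configured);
    }

    public static void main(String[] args) {
        System.out.println(toLocalFile("file:///home/eli/hadoop/dirs"));
        System.out.println(toLocalFile("/home/eli/hadoop/dirs"));
    }
}
```

Both calls resolve to the same abstract path, so code that later lists or opens the directory sees one consistent value regardless of how it was configured.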
YARN-15. Critical bug reported by Alejandro Abdelnur and fixed by Arun C Murthy (nodemanager)
YarnConfiguration DEFAULT_YARN_APPLICATION_CLASSPATH should be updated

{code} /** * Default CLASSPATH for YARN applications. A comma-separated list of * CLASSPATH entries */ public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = { "$HADOOP_CONF_DIR", "$HADOOP_COMMON_HOME/share/hadoop/common/*", "$HADOOP_COMMON_HOME/share/hadoop/common/lib/*", "$HADOOP_HDFS_HOME/share/hadoop/hdfs/*", "$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*", "$YARN_HOME/share/hadoop/mapreduce/*", "$YARN_HOME/share/hadoop/mapreduce/lib/*"}; {code} It should have {{share/yarn/}} and MR should add the {{share/mapreduce/}} (another JIRA?)
YARN-14. Major bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)
Symlinks to peer distributed cache files no longer work

Trying to create a symlink to another file that is specified for the distributed cache will fail to create the link. For example: hadoop jar ... -files "x,y,x#z" will localize the files x and y as x and y, but the z symlink for x will not be created. This is a regression from 1.x behavior.
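The expected outcome for the example above can be sketched by reading the part after '#' as the link name and defaulting to the file's own name when there is none. This is an illustrative parser only; the class and method names are hypothetical, and it is not the distributed cache's localization code.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for a -files style value; not the actual distributed cache code. */
final class CacheLinkNamesSketch {
    /** Maps each link name to the path it should point at. */
    static Map<String, String> linkTargets(String filesArg) {
        Map<String, String> links = new LinkedHashMap<String, String>();
        for (String entry : filesArg.split(",")) {
            URI uri = URI.create(entry.trim());
            String path = uri.getPath();
            String baseName = path.substring(path.lastIndexOf('/') + 1);
            // The URI fragment, when present, names the symlink; otherwise use the file name.
            String linkName = uri.getFragment() != null ? uri.getFragment() : baseName;
            links.put(linkName, path);
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(linkTargets("x,y,x#z"));
    }
}
```

For "x,y,x#z" this yields three links, x and y pointing at themselves and z pointing at x, which is the behavior the report describes as missing.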
YARN-13. Critical bug reported by Todd Lipcon and fixed by
Merge of yarn reorg into branch-2 copied trunk tree

When the move of yarn from inside MR to the project root was merged into branch-2, it seems like the trunk code base was actually copied into the branch-2 branch, instead of a parallel move occurring. So, the poms in branch-2 show the version as 3.0.0-SNAPSHOT instead of a 2.x snapshot version. This is breaking the branch-2 build.
YARN-12. Major bug reported by Junping Du and fixed by Junping Du (scheduler)
Several Findbugs issues with new FairScheduler in YARN

FairScheduler was recently added to YARN. A recent PreCommit test run for MAPREDUCE-4309 reported several Findbugs warnings related to FairScheduler: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerEventLog.shutdown() might ignore java.lang.Exception Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerEventLog.logDisabled; locked 50% of time Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.queueMaxAppsDefault; locked 50% of time Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.userMaxAppsDefault; locked 50% of time The details are in: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2612//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#DE_MIGHT_IGNORE
YARN-10. Major improvement reported by Arun C Murthy and fixed by Hitesh Shah
dist-shell shouldn't have a (test) dependency on hadoop-mapreduce-client-core

dist-shell shouldn't have a (test) dependency on hadoop-mapreduce-client-core, this should be removed.
YARN-9. Major improvement reported by Arun C Murthy and fixed by Vinod Kumar Vavilapalli
Rename YARN_HOME to HADOOP_YARN_HOME

We should rename YARN_HOME to HADOOP_YARN_HOME to be consistent with the rest of the Hadoop sub-projects.
YARN-1. Major task reported by Arun C Murthy and fixed by Arun C Murthy
Move YARN out of hadoop-mapreduce

Move YARN out of hadoop-mapreduce-project into hadoop-yarn-project in hadoop trunk
MAPREDUCE-4691. Critical bug reported by Jason Lowe and fixed by Robert Joseph Evans (jobhistoryserver , mrv2)
Historyserver can report "Unknown job" after RM says job has completed
MAPREDUCE-4689. Major bug reported by Jason Lowe and fixed by Jason Lowe (client)
JobClient.getMapTaskReports on failed job results in NPE
MAPREDUCE-4649. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (jobhistoryserver)
mr-jobhistory-daemon.sh needs to be updated post YARN-1
MAPREDUCE-4647. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
We should only unjar jobjar if there is a lib directory in it.
MAPREDUCE-4646. Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
client does not receive job diagnostics for failed jobs
MAPREDUCE-4642. Major bug reported by Robert Kanter and fixed by Robert Kanter (test)
MiniMRClientClusterFactory should not use job.setJar()
MAPREDUCE-4641. Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
Exception in commitJob marks job as successful in job history
MAPREDUCE-4638. Major improvement reported by Arun C Murthy and fixed by Arun C Murthy
MR AppMaster shouldn't rely on YARN_APPLICATION_CLASSPATH providing MR jars
MAPREDUCE-4635. Major bug reported by Bikas Saha and fixed by Bikas Saha
MR side of YARN-83. Changing package of YarnClient
MAPREDUCE-4633. Critical bug reported by Thomas Graves and fixed by Thomas Graves (jobhistoryserver)
history server doesn't set permissions on all subdirs
MAPREDUCE-4629. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla
Remove JobHistory.DEBUG_MODE
MAPREDUCE-4614. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (client , task)
Simplify debugging a job's tokens
MAPREDUCE-4612. Critical bug reported by Thomas Graves and fixed by Thomas Graves
job summary file permissions not set when it's created
MAPREDUCE-4611. Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans
MR AM dies badly when Node is decommissioned
MAPREDUCE-4610. Major bug reported by Tom White and fixed by Tom White (mrv2)
Support deprecated mapreduce.job.counters.limit property in MR2
MAPREDUCE-4608. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
hadoop-mapreduce-client is missing some dependencies
MAPREDUCE-4604. Critical bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2)
In mapred-default, mapreduce.map.maxattempts & mapreduce.reduce.maxattempts defaults are set to 4 as well as mapreduce.job.maxtaskfailures.per.tracker.
MAPREDUCE-4600. Critical bug reported by Robert Joseph Evans and fixed by Daryn Sharp
TestTokenCache.java from MRV1 no longer compiles
MAPREDUCE-4580. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
Change MapReduce to use the yarn-client module
MAPREDUCE-4579. Major bug reported by Thomas Graves and fixed by Thomas Graves
TestTaskAttempt fails on jdk7
MAPREDUCE-4577. Minor bug reported by Alejandro Abdelnur and fixed by Aaron T. Myers (test)
HDFS-3672 broke TestCombineFileInputFormat.testMissingBlocks() test
MAPREDUCE-4572. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (tasktracker , webapps)
Can not access user logs - Jetty is not configured by default to serve aliases/symlinks
MAPREDUCE-4570. Minor bug reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv2)
ProcfsBasedProcessTree#constructProcessInfo() prints a warning if procfsDir/<pid>/stat is not found.
MAPREDUCE-4569. Major bug reported by Thomas Graves and fixed by Thomas Graves
TestHsWebServicesJobsQuery fails on jdk7
MAPREDUCE-4562. Major bug reported by Jarek Jarcec Cecho and fixed by Jarek Jarcec Cecho
Support for "FileSystemCounter" legacy counter group name for compatibility reasons is creating incorrect counter name
MAPREDUCE-4511. Major improvement reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv1 , mrv2 , performance)
Add IFile readahead
MAPREDUCE-4504. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
SortValidator writes to wrong directory
MAPREDUCE-4503. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Should throw InvalidJobConfException if duplicates found in cacheArchives or cacheFiles
MAPREDUCE-4498. Critical bug reported by Robert Kanter and fixed by Robert Kanter (build , examples)
Remove hsqldb jar from Hadoop runtime classpath
MAPREDUCE-4496. Major bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster , mrv2)
AM logs link is missing user name
MAPREDUCE-4494. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv2 , test)
TestFifoScheduler failing with Metrics source QueueMetrics,q0=default already exists!
MAPREDUCE-4493. Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Distributed Cache Compatibility Issues
MAPREDUCE-4492. Minor bug reported by Nishan Shetty and fixed by Mayank Bansal (mrv2)
Configuring total queue capacity between 100.5 and 99.5 at particular level is successful
MAPREDUCE-4484. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv2)
Incorrect IS_MINI_YARN_CLUSTER property name in YarnConfiguration
MAPREDUCE-4483. Major bug reported by John George and fixed by John George
2.0 build does not work
MAPREDUCE-4470. Major bug reported by Kihwal Lee and fixed by Ilya Katsov (test)
Fix TestCombineFileInputFormat.testForEmptyFile
MAPREDUCE-4467. Critical bug reported by Andrey Klochkov and fixed by Kihwal Lee (nodemanager)
IndexCache failures due to missing synchronization
MAPREDUCE-4465. Trivial bug reported by Bo Wang and fixed by Bo Wang
Update description of yarn.nodemanager.address property
MAPREDUCE-4457. Critical bug reported by Thomas Graves and fixed by Robert Joseph Evans (mrv2)
mr job invalid transition TA_TOO_MANY_FETCH_FAILURE at FAILED
MAPREDUCE-4456. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
LocalDistributedCacheManager can get an ArrayIndexOutOfBounds when creating symlinks
MAPREDUCE-4449. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv2)
Incorrect MR_HISTORY_STORAGE property name in JHAdminConfig
MAPREDUCE-4448. Critical bug reported by Jason Lowe and fixed by Jason Lowe (mrv2 , nodemanager)
Nodemanager crashes upon application cleanup if aggregation failed to start
MAPREDUCE-4447. Major bug reported by Eli Collins and fixed by Eli Collins (build)
Remove aop cruft from the ant build
MAPREDUCE-4444. Blocker bug reported by Nathan Roberts and fixed by Jason Lowe (nodemanager)
nodemanager fails to start when one of the local-dirs is bad
MAPREDUCE-4441. Blocker bug reported by Karthik Kambatla and fixed by Karthik Kambatla
Fix build issue caused by MR-3451
MAPREDUCE-4440. Major bug reported by Arun C Murthy and fixed by Arun C Murthy
Change SchedulerApp & SchedulerNode to be a minimal interface
MAPREDUCE-4437. Critical bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster , mrv2)
Race in MR ApplicationMaster can cause reducers to never be scheduled
MAPREDUCE-4432. Trivial bug reported by Gabriel Reid and fixed by
Confusing warning message when GenericOptionsParser is not used
MAPREDUCE-4427. Major improvement reported by Bikas Saha and fixed by Bikas Saha
Enable the RM to work with AM's that are not managed by it
MAPREDUCE-4423. Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Potential infinite fetching of map output
MAPREDUCE-4422. Major improvement reported by Arun C Murthy and fixed by Ahmed Radwan (nodemanager)
YARN_APPLICATION_CLASSPATH needs a documented default value in YarnConfiguration
MAPREDUCE-4419. Major bug reported by Nishan Shetty and fixed by Devaraj K (mrv2)
./mapred queue -info <queuename> -showJobs displays all the jobs irrespective of <queuename>
MAPREDUCE-4417. Major new feature reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (mrv2 , security)
add support for encrypted shuffle
MAPREDUCE-4416. Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (client , mrv2)
Some tests fail if Clover is enabled
MAPREDUCE-4408. Major improvement reported by Alejandro Abdelnur and fixed by Robert Kanter (mrv1 , mrv2)
allow jobs to set a JAR that is in the distributed cache
MAPREDUCE-4407. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (build , mrv2)
Add hadoop-yarn-server-tests-<version>-tests.jar to hadoop dist package
MAPREDUCE-4406. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv2 , test)
Users should be able to specify the MiniCluster ResourceManager and JobHistoryServer ports
MAPREDUCE-4402. Major bug reported by Jason Lowe and fixed by Jason Lowe (test)
TestFileInputFormat fails intermittently
MAPREDUCE-4395. Critical bug reported by Bhallamudi Venkata Siva Kamesh and fixed by Bhallamudi Venkata Siva Kamesh (distributed-cache , job submission , mrv2)
Possible NPE at ClientDistributedCacheManager#determineTimestamps
MAPREDUCE-4392. Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
Counters.makeCompactString() changed behavior from 0.20
MAPREDUCE-4387. Major bug reported by Kihwal Lee and fixed by Kihwal Lee (resourcemanager)
RM gets fatal error and exits during TestRM
MAPREDUCE-4384. Major bug reported by Kihwal Lee and fixed by Kihwal Lee (nodemanager)
Race conditions in IndexCache
MAPREDUCE-4383. Minor bug reported by Andy Isaacson and fixed by Andy Isaacson (pipes)
HadoopPipes.cc needs to include unistd.h
MAPREDUCE-4380. Minor bug reported by Devaraj K and fixed by Devaraj K (mrv2 , nodemanager)
Empty Userlogs directory is getting created under logs directory
MAPREDUCE-4379. Blocker bug reported by Devaraj K and fixed by Devaraj K (mrv2 , nodemanager)
Node Manager throws java.lang.OutOfMemoryError: Java heap space due to org.apache.hadoop.fs.LocalDirAllocator.contexts
MAPREDUCE-4376. Major bug reported by Jason Lowe and fixed by Kihwal Lee (mrv2 , test)
TestClusterMRNotification times out
MAPREDUCE-4375. Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (applicationmaster)
Show Configuration Traceability in MR UI
MAPREDUCE-4372. Major bug reported by Devaraj K and fixed by Devaraj K (mrv2 , resourcemanager)
Deadlock in Resource Manager between SchedulerEventDispatcher.EventProcessor and Shutdown hook manager
MAPREDUCE-4361. Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
Fix detailed metrics for protobuf-based RPC on 0.23
MAPREDUCE-4355. Major new feature reported by Karthik Kambatla and fixed by Karthik Kambatla (mrv1 , mrv2)
Add RunningJob.getJobStatus()
MAPREDUCE-4341. Major bug reported by Thomas Graves and fixed by Karthik Kambatla (capacity-sched , mrv2)
add types to capacity scheduler properties documentation
MAPREDUCE-4336. Major bug reported by Siddharth Seth and fixed by Ahmed Radwan (mrv2)
Distributed Shell fails when used with the CapacityScheduler
MAPREDUCE-4320. Major bug reported by Thomas Graves and fixed by Thomas Graves (contrib/gridmix)
gridmix mainClass wrong in pom.xml
MAPREDUCE-4313. Blocker bug reported by Eli Collins and fixed by Robert Joseph Evans (build , test)
TestTokenCache doesn't compile due to TokenCache.getDelegationToken compilation error
MAPREDUCE-4311. Major bug reported by Thomas Graves and fixed by Karthik Kambatla (capacity-sched , mrv2)
Capacity scheduler.xml does not accept decimal values for capacity and maximum-capacity settings
MAPREDUCE-4307. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv2)
TeraInputFormat calls FileSystem.getDefaultBlockSize() without a Path - Failure when using ViewFileSystem
MAPREDUCE-4306. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv2)
Problem running Distributed Shell applications as a user other than the one who started the daemons
MAPREDUCE-4302. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (nodemanager)
NM goes down if error encountered during log aggregation
MAPREDUCE-4301. Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (applicationmaster)
Dedupe some strings in MRAM for memory savings
MAPREDUCE-4300. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (applicationmaster)
OOM in AM can turn it into a zombie.
MAPREDUCE-4299. Major bug reported by Tom White and fixed by Tom White (mrv2)
Terasort hangs with MR2 FifoScheduler
MAPREDUCE-4297. Major bug reported by Ravi Prakash and fixed by Ravi Prakash (contrib/gridmix)
Usersmap file in gridmix should not fail on empty lines
MAPREDUCE-4295. Critical bug reported by Thomas Graves and fixed by Thomas Graves (mrv2 , resourcemanager)
RM crashes due to DNS issue
MAPREDUCE-4290. Major bug reported by Nishan Shetty and fixed by Devaraj K (mrv2)
JobStatus.getState() API is giving ambiguous values
MAPREDUCE-4283. Major improvement reported by Jason Lowe and fixed by Jason Lowe (jobhistoryserver , mrv2)
Display tail of aggregated logs by default
MAPREDUCE-4276. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv2)
Allow setting yarn.nodemanager.delete.debug-delay-sec property to "-1" for easier container debugging.
MAPREDUCE-4270. Major bug reported by Brock Noland and fixed by Thomas Graves (mrv2)
data_join test classes are in the wrong package
MAPREDUCE-4269. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
documentation: Gridmix has javadoc warnings in StressJobFactory
MAPREDUCE-4267. Critical bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
mavenize pipes
MAPREDUCE-4264. Blocker bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
Got ClassCastException when using mapreduce.history.server.delegationtoken.required=true
MAPREDUCE-4262. Minor bug reported by Devaraj K and fixed by Devaraj K (mrv2 , nodemanager)
NM gives wrong log message saying "Connected to ResourceManager" before trying to connect
MAPREDUCE-4252. Major bug reported by Tom White and fixed by Tom White (mrv2)
MR2 job never completes with 1 pending task
MAPREDUCE-4250. Major bug reported by Patrick Hunt and fixed by Patrick Hunt (nodemanager)
hadoop-config.sh missing variable exports, causes Yarn jobs to fail with ClassNotFoundException MRAppMaster
MAPREDUCE-4238. Critical bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
mavenize data_join
MAPREDUCE-4237. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans
TestNodeStatusUpdater can fail if localhost has a domain associated with it
MAPREDUCE-4233. Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans
NPE can happen in RMNMNodeInfo.
MAPREDUCE-4228. Major bug reported by Jason Lowe and fixed by Jason Lowe (applicationmaster , mrv2)
mapreduce.job.reduce.slowstart.completedmaps is not working properly to delay the scheduling of the reduce tasks
MAPREDUCE-4226. Major bug reported by Tom White and fixed by Tom White (mrv2)
ConcurrentModificationException in FileSystemCounterGroup
MAPREDUCE-4224. Major bug reported by Devaraj K and fixed by Devaraj K (mrv2 , scheduler , test)
TestFifoScheduler throws org.apache.hadoop.metrics2.MetricsException
MAPREDUCE-4220. Minor bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
RM apps page starttime/endtime sorts are incorrect
MAPREDUCE-4215. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
RM app page shows 500 error on appid parse error
MAPREDUCE-4212. Major test reported by Daryn Sharp and fixed by Daryn Sharp (test)
TestJobClientGetJob sometimes fails
MAPREDUCE-4211. Minor bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
Error conditions (missing appid, appid not found) are masked in the RM app page
MAPREDUCE-4210. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (webapps)
Expose listener address for WebApp
MAPREDUCE-4209. Major bug reported by Radim Kolar and fixed by (build)
junit dependency in hadoop-mapreduce-client is missing scope test
MAPREDUCE-4206. Minor bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
Sorting by Last Health-Update on the RM nodes page does not work correctly
MAPREDUCE-4205. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (mrv2)
retrofit all JVM shutdown hooks to use ShutdownHookManager
MAPREDUCE-4197. Major bug reported by Ravi Prakash and fixed by Ravi Prakash
Include the hsqldb jar in the hadoop-mapreduce tar file
MAPREDUCE-4194. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
ConcurrentModificationError in DirectoryCollection
MAPREDUCE-4190. Major improvement reported by Thomas Graves and fixed by Thomas Graves (mrv2 , webapps)
Improve web UI for task attempts userlog link
MAPREDUCE-4189. Critical bug reported by Devaraj K and fixed by Devaraj K (mrv2)
TestContainerManagerSecurity is failing
MAPREDUCE-4169. Minor bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
Container Logs appear in unsorted order
MAPREDUCE-4165. Trivial bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
Committing is misspelled as commiting in task logs
MAPREDUCE-4163. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (mrv2)
consistently set the bind address
MAPREDUCE-4162. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (client , mrv2)
Correctly set token service
MAPREDUCE-4161. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (client , mrv2)
create sockets consistently
MAPREDUCE-4160. Major bug reported by Thomas Graves and fixed by Thomas Graves (test)
some mrv1 ant tests fail with timeout - due to 4156
MAPREDUCE-4159. Major bug reported by Nishan Shetty and fixed by Devaraj K (mrv2)
Job is running in Uber mode after setting "mapreduce.job.ubertask.maxreduces" to zero
MAPREDUCE-4157. Major improvement reported by Jason Lowe and fixed by Jason Lowe (mrv2)
ResourceManager should not kill apps that are well behaved
MAPREDUCE-4156. Major bug reported by Thomas Graves and fixed by Thomas Graves (build)
ant build fails compiling JobInProgress
MAPREDUCE-4152. Major bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
map task left hanging after AM dies trying to connect to RM
MAPREDUCE-4151. Major improvement reported by Jason Lowe and fixed by Jason Lowe (mrv2 , webapps)
RM scheduler web page should filter apps to those that are relevant to scheduling
MAPREDUCE-4148. Major bug reported by Tom White and fixed by Tom White (mrv2)
MapReduce should not have a compile-time dependency on HDFS
MAPREDUCE-4146. Major improvement reported by Tom White and fixed by Ahmed Radwan
Support limits on task status string length and number of block locations in branch-2
MAPREDUCE-4144. Critical bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
ResourceManager NPE while handling NODE_UPDATE
MAPREDUCE-4140. Major bug reported by Patrick Hunt and fixed by Patrick Hunt (client , mrv2)
mapreduce classes incorrectly importing "clover.org.apache.*" classes
MAPREDUCE-4139. Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
Potential ResourceManager deadlock when SchedulerEventDispatcher is stopped
MAPREDUCE-4134. Major task reported by Ravi Prakash and fixed by Ravi Prakash (mrv2)
Remove references of mapred.child.ulimit etc. since they are not being used any more
MAPREDUCE-4133. Major bug reported by John George and fixed by John George
MR over viewfs is broken
MAPREDUCE-4129. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv2)
Lots of unneeded counters log messages
MAPREDUCE-4128. Major bug reported by Bikas Saha and fixed by Bikas Saha (mrv2)
AM Recovery expects all attempts of a completed task to also be completed.
MAPREDUCE-4117. Critical bug reported by Devaraj K and fixed by Devaraj K (client , mrv2)
mapred job -status throws NullPointerException
MAPREDUCE-4102. Major bug reported by Thomas Graves and fixed by Bhallamudi Venkata Siva Kamesh (webapps)
job counters not available in Jobhistory webui for killed jobs
MAPREDUCE-4099. Critical bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
ApplicationMaster may fail to remove staging directory
MAPREDUCE-4097. Major bug reported by Alejandro Abdelnur and fixed by Roman Shaposhnik (build)
tools testcases fail because missing mrapp-generated-classpath file in classpath
MAPREDUCE-4092. Blocker bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
commitJob Exception does not fail job (regression in 0.23 vs 0.20)
MAPREDUCE-4091. Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build , test)
tools testcases failing because of MAPREDUCE-4082
MAPREDUCE-4089. Blocker bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Hung Tasks never time out.
MAPREDUCE-4082. Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
hadoop-mapreduce-client-app's mrapp-generated-classpath file should not be in the module JAR
MAPREDUCE-4079. Blocker improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mr-am , mrv2)
Allow MR AppMaster to limit ephemeral port range.
MAPREDUCE-4074. Major bug reported by Devaraj K and fixed by xieguiming
Client continuously retries to RM when RM goes down before launching Application Master
MAPREDUCE-4073. Critical bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2 , scheduler)
CS assigns multiple off-switch containers when using multi-level-queues
MAPREDUCE-4072. Major bug reported by Anupam Seth and fixed by Anupam Seth (mrv2)
User-set java.library.path seems to overwrite the default, creating problems with native lib loading

-Djava.library.path in mapred.child.java.opts can cause issues with native libraries. LD_LIBRARY_PATH through mapred.child.env should be used instead.
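As an illustration of the note above, the following is a minimal sketch of moving a library path out of the child JVM options and into an environment entry. The class and method names are hypothetical and not part of Hadoop, and the sketch assumes that mapred.child.env takes NAME=value entries and that option tokens contain no spaces.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Hadoop. */
final class ChildOptsMigration {
    static final String PREFIX = "-Djava.library.path=";

    /** Returns the options with every -Djava.library.path=... token removed. */
    static String stripLibraryPath(String childOpts) {
        List<String> kept = new ArrayList<>();
        // Assumption: tokens are whitespace-separated and unquoted.
        for (String token : childOpts.trim().split("\\s+")) {
            if (!token.isEmpty() && !token.startsWith(PREFIX)) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }

    /** Returns "LD_LIBRARY_PATH=<path>" for the last matching token, or "" if none. */
    static String toChildEnv(String childOpts) {
        String env = "";
        for (String token : childOpts.trim().split("\\s+")) {
            if (token.startsWith(PREFIX)) {
                env = "LD_LIBRARY_PATH=" + token.substring(PREFIX.length());
            }
        }
        return env;
    }

    public static void main(String[] args) {
        // Made-up example value for mapred.child.java.opts.
        String opts = "-Xmx200m -Djava.library.path=/opt/native";
        System.out.println("mapred.child.java.opts = " + stripLibraryPath(opts));
        System.out.println("mapred.child.env       = " + toChildEnv(opts));
    }
}
```

The two returned strings would then be set as the values of mapred.child.java.opts and mapred.child.env respectively.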
MAPREDUCE-4068. Blocker bug reported by Ahmed Radwan and fixed by Robert Kanter (mrv2)
Jars in lib subdirectory of the submittable JAR are not added to the classpath
MAPREDUCE-4062. Major bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
AM Launcher thread can hang forever
MAPREDUCE-4060. Major bug reported by Jason Lowe and fixed by Jason Lowe (build)
Multiple SLF4J binding warning
MAPREDUCE-4059. Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
The history server should have a separate pluggable storage/query interface
MAPREDUCE-4053. Major bug reported by Alejandro Abdelnur and fixed by Robert Joseph Evans (mrv2)
Counters group names deprecation is wrong: when iterating over group names, deprecated names don't show up
MAPREDUCE-4051. Major task reported by Ravi Prakash and fixed by Ravi Prakash
Remove the empty hadoop-mapreduce-project/assembly/all.xml file
MAPREDUCE-4050. Major bug reported by Bhallamudi Venkata Siva Kamesh and fixed by Bhallamudi Venkata Siva Kamesh (mrv2)
Invalid node link
MAPREDUCE-4048. Major bug reported by Devaraj K and fixed by Devaraj K (mrv2)
NullPointerException exception while accessing the Application Master UI
MAPREDUCE-4040. Minor bug reported by Bhallamudi Venkata Siva Kamesh and fixed by Bhallamudi Venkata Siva Kamesh (jobhistoryserver , mrv2)
History links should use hostname rather than IP address.
MAPREDUCE-4031. Critical bug reported by Devaraj K and fixed by Devaraj K (mrv2 , nodemanager)
Node Manager hangs on shut down
MAPREDUCE-4024. Major bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
RM webservices can't query on finalStatus
MAPREDUCE-4020. Major bug reported by Jason Lowe and fixed by Anupam Seth (mrv2 , webapps)
Web services returns incorrect JSON for deep queue tree
MAPREDUCE-4017. Trivial improvement reported by Koji Noguchi and fixed by Thomas Graves (jobhistoryserver , jobtracker)
Add jobname to jobsummary log

The Job Summary log may contain commas in values that are escaped by a '\' character. This was true before, but is more likely to be exposed now.
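Tools that parse the Job Summary log need to split only on unescaped commas. The following is a minimal sketch of such a split. The class name and the sample line are made up for illustration, and the sketch assumes that a backslash escapes whichever character follows it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser helper, not part of Hadoop. */
final class JobSummaryFields {
    /** Splits on commas that are not backslash-escaped and unescapes each field. */
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                // Assumption: the backslash escapes the next character, which is kept literally.
                current.append(line.charAt(++i));
            } else if (c == ',') {
                // An unescaped comma ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample line; the job name contains an escaped comma.
        String line = "jobId=job_1,jobName=sort\\, then join,user=alice";
        for (String field : split(line)) {
            System.out.println(field);
        }
    }
}
```

A naive line.split(",") would cut the job name in two, which is the failure mode the note warns about.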
MAPREDUCE-4012. Minor bug reported by Koji Noguchi and fixed by Thomas Graves
Hadoop Job setup error leaves no useful info to users (when LinuxTaskController is used)
MAPREDUCE-4010. Critical bug reported by Jason Lowe and fixed by Alejandro Abdelnur (mrv2)
TestWritableJobConf fails on trunk
MAPREDUCE-4002. Major bug reported by Bhallamudi Venkata Siva Kamesh and fixed by Bhallamudi Venkata Siva Kamesh (examples)
MultiFileWordCount job fails if the input path is not from default file system
MAPREDUCE-3999. Major bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2 , webapps)
Tracking link gives an error if the AppMaster hasn't started yet
MAPREDUCE-3993. Major bug reported by Todd Lipcon and fixed by Karthik Kambatla (mrv1 , mrv2)
Graceful handling of codec errors during decompression
MAPREDUCE-3992. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (mrv1)
Reduce fetcher doesn't verify HTTP status code of response
MAPREDUCE-3988. Major bug reported by Vinod Kumar Vavilapalli and fixed by Eric Payne (mrv2)
mapreduce.job.local.dir doesn't point to a single directory on a node.
MAPREDUCE-3983. Major test reported by Robert Joseph Evans and fixed by Ravi Prakash (mrv1)
TestTTResourceReporting can fail, and should just be deleted
MAPREDUCE-3972. Major sub-task reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Locking and exception issues in JobHistory Server.
MAPREDUCE-3947. Minor bug reported by Todd Lipcon and fixed by Devaraj K
yarn.app.mapreduce.am.resource.mb not documented
MAPREDUCE-3942. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2 , security)
Randomize master key generation for ApplicationTokenSecretManager and roll it every so often
MAPREDUCE-3940. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2 , security)
ContainerTokens should have an expiry interval

ContainerTokens now have an expiry interval so that stale tokens cannot be used for launching containers.
MAPREDUCE-3932. Critical bug reported by Vinod Kumar Vavilapalli and fixed by Robert Joseph Evans (mr-am , mrv2)
MR tasks failing and crashing the AM when available-resources/headRoom becomes zero
MAPREDUCE-3927. Critical bug reported by MengWang and fixed by Bhallamudi Venkata Siva Kamesh (mrv2)
Shuffle hangs when map.failures.percent is set
MAPREDUCE-3907. Minor improvement reported by Eugene Koontz and fixed by Eugene Koontz (documentation)
Document entries in mapred-default.xml for the jobhistory server.
MAPREDUCE-3906. Trivial improvement reported by Eugene Koontz and fixed by Eugene Koontz (documentation)
Fix inconsistency in documentation regarding mapreduce.jobhistory.principal
MAPREDUCE-3893. Critical bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
Allow capacity scheduler configs maximum-applications and maximum-am-resource-percent to be configurable on a per-queue basis
MAPREDUCE-3889. Critical bug reported by Thomas Graves and fixed by Devaraj K (mrv2)
job client tries to use /tasklog interface, but that doesn't exist anymore
MAPREDUCE-3873. Minor bug reported by Nishan Shetty and fixed by xieguiming (mrv2 , nodemanager)
Nodemanager is not getting decommissioned if the absolute ip is given in exclude file.

Fixed NodeManagers' decommissioning at RM to accept IP addresses also.
MAPREDUCE-3871. Major improvement reported by Tom White and fixed by Tom White (distributed-cache)
Allow symlinking in LocalJobRunner DistributedCache
MAPREDUCE-3870. Major bug reported by Bhallamudi Venkata Siva Kamesh and fixed by Bhallamudi Venkata Siva Kamesh (mrv2)
Invalid App Metrics
MAPREDUCE-3850. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (security)
Avoid redundant calls for tokens in TokenCache
MAPREDUCE-3842. Critical improvement reported by Alejandro Abdelnur and fixed by Thomas Graves (mrv2 , webapps)
stop webpages from automatic refreshing
MAPREDUCE-3812. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Harsh J (mrv2 , performance)
Lower default allocation sizes, fix allocation configurations and document them

Removes two sets of previously available config properties:
1. yarn.scheduler.fifo.minimum-allocation-mb and yarn.scheduler.fifo.maximum-allocation-mb
2. yarn.scheduler.capacity.minimum-allocation-mb and yarn.scheduler.capacity.maximum-allocation-mb
in favor of two new, generically named properties:
1. yarn.scheduler.minimum-allocation-mb - This acts as the floor value of memory resource requests for containers.
2. yarn.scheduler.maximum-allocation-mb - This acts as the ceiling value of memory resource requests for containers.
Both properties need to be set at the ResourceManager (RM) to take effect, as the RM is where the scheduler resides. Also changes the default minimum and maximum to 128 MB and 10 GB respectively.
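To make the floor and ceiling semantics concrete, the following is a simplified sketch that clamps a requested container size between the two bounds, using the defaults stated above. The class and method names are hypothetical, and treating both bounds as a plain clamp is an assumption for illustration; the actual scheduler may round or reject requests instead.

```java
/** Hypothetical illustration, not the scheduler's actual code. */
final class AllocationBounds {
    // Defaults from the note: 128 MB minimum and 10 GB maximum.
    static final int DEFAULT_MIN_MB = 128;
    static final int DEFAULT_MAX_MB = 10 * 1024;

    /** Clamps a memory request (in MB) to the range [minMb, maxMb]. */
    static int clamp(int requestedMb, int minMb, int maxMb) {
        if (minMb > maxMb) {
            throw new IllegalArgumentException("minimum exceeds maximum");
        }
        return Math.max(minMb, Math.min(requestedMb, maxMb));
    }

    public static void main(String[] args) {
        int[] requests = {64, 2048, 20480};
        for (int mb : requests) {
            System.out.println(mb + " MB -> " + clamp(mb, DEFAULT_MIN_MB, DEFAULT_MAX_MB) + " MB");
        }
    }
}
```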
MAPREDUCE-3782. Critical bug reported by Arpit Gupta and fixed by Jason Lowe (mrv2)
teragen terasort jobs fail when using webhdfs://
MAPREDUCE-3773. Major new feature reported by Owen O'Malley and fixed by Owen O'Malley (jobtracker)
Add queue metrics with buckets for job run times
MAPREDUCE-3728. Critical bug reported by Roman Shaposhnik and fixed by Giridharan Kesavan (mrv2 , nodemanager)
ShuffleHandler can't access results when configured in a secure mode
MAPREDUCE-3682. Major bug reported by David Capwell and fixed by Ravi Prakash (mrv2)
Tracker URL says AM tasks run on localhost
MAPREDUCE-3672. Major bug reported by Vinod Kumar Vavilapalli and fixed by Anupam Seth (mr-am , mrv2)
Killed maps shouldn't be counted towards JobCounter.NUM_FAILED_MAPS
MAPREDUCE-3659. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (security)
Host-based token support
MAPREDUCE-3650. Blocker bug reported by Thomas Graves and fixed by Ravi Prakash (mrv2)
testGetTokensForHftpFS() fails
MAPREDUCE-3621. Major bug reported by Thomas Graves and fixed by Ravi Prakash (mrv2)
TestDBJob and TestDataDrivenDBInputFormat ant tests fail
MAPREDUCE-3613. Critical sub-task reported by Thomas Graves and fixed by Thomas Graves (mrv2)
web service calls header contains 2 content types
MAPREDUCE-3543. Critical bug reported by Mahadev konar and fixed by Thomas Graves (mrv2)
Mavenize Gridmix.

Note that to apply this you should first run the script - ./MAPREDUCE-3543v3.sh svn, then apply the patch. If this is merged to more than trunk, the version inside of hadoop-tools/hadoop-gridmix/pom.xml will need to be updated accordingly.
MAPREDUCE-3506. Minor bug reported by Ratandeep Ratti and fixed by Jason Lowe (client , mrv2)
Calling getPriority on JobInfo after parsing a history log with JobHistoryParser throws a NullPointerException
MAPREDUCE-3493. Minor bug reported by Ahmed Radwan and fixed by (mrv2)
Add the default mapreduce.shuffle.port property to mapred-default.xml
MAPREDUCE-3451. Major new feature reported by Patrick Wendell and fixed by Patrick Wendell (mrv2 , scheduler)
Port Fair Scheduler to MR2
MAPREDUCE-3350. Critical bug reported by Vinod Kumar Vavilapalli and fixed by Jonathan Eagles (mrv2 , webapps)
Per-app RM page should have the list of application-attempts like on the app JHS page
MAPREDUCE-3348. Major bug reported by Devaraj K and fixed by Devaraj K (mrv2)
mapred job -status fails to give info even if the job is present in History

Fixed a bug in MR client to redirect to JobHistoryServer correctly when RM forgets the app.
MAPREDUCE-3289. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (mrv2 , nodemanager , performance)
Make use of fadvise in the NM's shuffle handler
MAPREDUCE-3082. Major bug reported by Rajit Saha and fixed by John George (harchive)
archive command takes the wrong path for input file with current directory
MAPREDUCE-2786. Minor improvement reported by Plamen Jeliazkov and fixed by Plamen Jeliazkov (benchmarks)
TestDFSIO should also test compression reading/writing from command-line.
MAPREDUCE-2739. Minor bug reported by Ahmed Radwan and fixed by Bo Wang (mrv2)
MR-279: Update installation docs (remove YarnClientFactory)
MAPREDUCE-2374. Major bug reported by Todd Lipcon and fixed by Andy Isaacson
"Text File Busy" errors launching MR tasks
MAPREDUCE-2289. Major bug reported by Todd Lipcon and fixed by Ahmed Radwan (job submission)
Permissions race can make getStagingDir fail on local filesystem
MAPREDUCE-2220. Minor bug reported by Rui KUBO and fixed by Rui KUBO (documentation)
Fix new API FileOutputFormat-related typos in mapred-default.xml
MAPREDUCE-987. Minor new feature reported by Philip Zeyliger and fixed by Ahmed Radwan (build , test)
Exposing MiniDFS and MiniMR clusters as a single process command-line
HDFS-3972. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Trash emptier fails in secure HA cluster
HDFS-3928. Major bug reported by Eli Collins and fixed by Eli Collins (test)
MiniDFSCluster should reset the first ExitException on shutdown
HDFS-3902. Minor bug reported by Andy Isaacson and fixed by Andy Isaacson
TestDatanodeBlockScanner#testBlockCorruptionPolicy is broken
HDFS-3895. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
hadoop-client must include commons-cli
HDFS-3890. Critical bug reported by Thomas Graves and fixed by Thomas Graves
filecontext mkdirs doesn't apply umask as expected
HDFS-3888. Minor bug reported by Jing Zhao and fixed by Jing Zhao
BlockPlacementPolicyDefault code cleanup
HDFS-3887. Trivial improvement reported by Jing Zhao and fixed by Jing Zhao (name-node)
Remove redundant chooseTarget methods in BlockPlacementPolicy.java
HDFS-3879. Minor bug reported by Eli Collins and fixed by Eli Collins (name-node)
Fix findbugs warning in TransferFsImage on branch-2
HDFS-3873. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (hdfs client)
Hftp assumes security is disabled if token fetch fails
HDFS-3871. Minor improvement reported by Arun C Murthy and fixed by Arun C Murthy (hdfs client)
Change NameNodeProxies to use HADOOP-8748
HDFS-3866. Minor improvement reported by Ryan Hennig and fixed by Plamen Jeliazkov (build)
HttpFS POM should have property where to download tomcat from
HDFS-3864. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (name-node)
NN does not update internal file mtime for OP_CLOSE when reading from the edit log
HDFS-3861. Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee (hdfs client)
Deadlock in DFSClient
HDFS-3860. Major bug reported by Jing Zhao and fixed by Jing Zhao
HeartbeatManager#Monitor may wrongly hold the writelock of namesystem
HDFS-3856. Blocker bug reported by Thomas Graves and fixed by Eli Collins (test)
TestHDFSServerPorts failure is causing surefire fork failure
HDFS-3853. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (name-node)
Port MiniDFSCluster enableManagedDfsDirsRedundancy option to branch-2
HDFS-3852. Major bug reported by Aaron T. Myers and fixed by Daryn Sharp (hdfs client , security)
TestHftpDelegationToken is broken after HADOOP-8225
HDFS-3849. Critical bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (name-node)
When re-loading the FSImage, we should clear the existing genStamp and leases.
HDFS-3844. Trivial improvement reported by Jing Zhao and fixed by Jing Zhao
Add @Override where necessary and remove unnecessary {@inheritdoc} and imports
HDFS-3841. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans
Port HDFS-3835 to branch-0.23
HDFS-3837. Major bug reported by Eli Collins and fixed by Eli Collins (data-node)
Fix DataNode.recoverBlock findbugs warning
HDFS-3835. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (name-node , security)
Long-lived 2NN cannot perform a checkpoint if security is enabled and the NN restarts with outstanding delegation tokens
HDFS-3833. Major bug reported by Brandon Li and fixed by Brandon Li (test)
TestDFSShell fails on Windows due to file concurrent read write
HDFS-3832. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (data-node , name-node)
Remove protocol methods related to DistributedUpgrade
HDFS-3830. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)
test_libhdfs_threaded: use forceNewInstance
HDFS-3819. Minor improvement reported by Jing Zhao and fixed by Jing Zhao
Should check whether invalidate work percentage default value is not greater than 1.0f
HDFS-3816. Major bug reported by Jing Zhao and fixed by Jing Zhao (name-node)
Invalidate work percentage default value should be 0.32f instead of 32
HDFS-3808. Critical bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (fuse-dfs)
fuse_dfs: postpone libhdfs initialization until after fork
HDFS-3803. Minor bug reported by Andrew Purtell and fixed by (data-node)
BlockPoolSliceScanner new work period notice is very chatty at INFO level
HDFS-3802. Trivial improvement reported by Jing Zhao and fixed by Jing Zhao
StartupOption.name in HdfsServerConstants should be final
HDFS-3796. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (test)
Speed up edit log tests by avoiding fsync()
HDFS-3794. Major bug reported by Ravi Prakash and fixed by Ravi Prakash (webhdfs)
WebHDFS Open used with Offset returns the original (and incorrect) Content Length in the HTTP Header.
HDFS-3790. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (fuse-dfs)
test_fuse_dfs.c doesn't compile on centos 5
HDFS-3788. Critical bug reported by Eli Collins and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
distcp can't copy large files using webhdfs due to missing Content-Length header
HDFS-3765. Major improvement reported by Vinay and fixed by Vinay (ha)
Namenode INITIALIZESHAREDEDITS should be able to initialize all shared storages
HDFS-3760. Minor bug reported by Andy Isaacson and fixed by Andy Isaacson (hdfs client)
primitiveCreate is a write, not a read
HDFS-3758. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (fuse-dfs)
TestFuseDFS test failing
HDFS-3756. Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
DelegationTokenFetcher creates 2 HTTP connections, the second one not properly configured
HDFS-3755. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Creating an already-open-for-write file with overwrite=true fails
HDFS-3754. Major bug reported by Eli Collins and fixed by Eli Collins (data-node)
BlockSender doesn't shutdown ReadaheadPool threads
HDFS-3738. Minor bug reported by Aaron T. Myers and fixed by Aaron T. Myers (test)
TestDFSClientRetries#testFailuresArePerOperation sets incorrect timeout config
HDFS-3733. Major bug reported by Andy Isaacson and fixed by Andy Isaacson (webhdfs)
Audit logs should include WebHDFS access
HDFS-3732. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (fuse-dfs)
fuse_dfs: incorrect configuration value checked for connection expiry timer period
HDFS-3731. Blocker bug reported by Suresh Srinivas and fixed by Kihwal Lee (data-node)
2.0 release upgrade must handle blocks being written from 1.0
HDFS-3724. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
add InterfaceAudience annotations to HttpFS classes and make inner enum static
HDFS-3723. Major improvement reported by E. Sammer and fixed by Jing Zhao (scripts , tools)
All commands should support meaningful --help
HDFS-3721. Critical bug reported by Todd Lipcon and fixed by Aaron T. Myers (data-node , hdfs client)
hsync support broke wire compatibility
HDFS-3720. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)
hdfs.h must get packaged
HDFS-3718. Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (data-node)
Datanode won't shutdown because of runaway DataBlockScanner thread
HDFS-3715. Major bug reported by Eli Collins and fixed by Andrew Wang (test)
Fix TestFileCreation#testFileCreationNamenodeRestart
HDFS-3711. Major improvement reported by Andrew Wang and fixed by Andrew Wang
Manually convert remaining tests to JUnit4
HDFS-3710. Minor bug reported by Andy Isaacson and fixed by Andy Isaacson (libhdfs)
libhdfs misuses O_RDONLY/WRONLY/RDWR
HDFS-3709. Major test reported by Eli Collins and fixed by Eli Collins (test)
TestStartup tests still binding to the ephemeral port
HDFS-3707. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
TestFSInputChecker: improper use of skip
HDFS-3697. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (data-node , performance)
Enable fadvise readahead by default

The datanode now performs 4MB readahead by default when reading data from its disks, if the native libraries are present. This has been shown to improve performance in many workloads. The feature may be disabled by setting dfs.datanode.readahead.bytes to "0".
HDFS-3696. Critical bug reported by Kihwal Lee and fixed by Tsz Wo (Nicholas), SZE
Creating files with WebHdfsFileSystem goes OOM when the file size is big
HDFS-3690. Major bug reported by Eli Collins and fixed by Eli Collins
BlockPlacementPolicyDefault incorrectly casts LOG
HDFS-3688. Major bug reported by Jason Lowe and fixed by Jason Lowe (data-node)
Namenode loses datanode hostname if datanode re-registers
HDFS-3683. Minor bug reported by Todd Lipcon and fixed by Plamen Jeliazkov (name-node)
Edit log replay progress indicator shows >100% complete
HDFS-3679. Minor bug reported by Conrad Meyer and fixed by Conrad Meyer (fuse-dfs)
fuse_dfs notrash option sets usetrash
HDFS-3675. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)
libhdfs: follow documented return codes
HDFS-3673. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
libhdfs: fix some compiler warnings
HDFS-3672. Major improvement reported by Andrew Wang and fixed by Andrew Wang
Expose disk-location information for blocks to enable better scheduling
HDFS-3666. Minor improvement reported by Eli Collins and fixed by Eli Collins
Plumb more exception messages to terminate
HDFS-3665. Major test reported by Eli Collins and fixed by Eli Collins (test)
Add a test for renaming across file systems via a symlink
HDFS-3664. Major bug reported by Eli Collins and fixed by Colin Patrick McCabe (test)
BlockManager race when stopping active services
HDFS-3663. Major improvement reported by Eli Collins and fixed by Eli Collins (test)
MiniDFSCluster should capture the code path that led to the first ExitException
HDFS-3658. Major bug reported by Eli Collins and fixed by Tsz Wo (Nicholas), SZE
TestDFSClientRetries#testNamenodeRestart failed
HDFS-3650. Major improvement reported by Andrew Wang and fixed by Andrew Wang
Use MutableQuantiles to provide latency histograms for various operations
HDFS-3646. Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (hdfs client)
LeaseRenewer can hold reference to inactive DFSClient instances forever
HDFS-3641. Minor improvement reported by Eli Collins and fixed by Eli Collins
Move server Util time methods to common and use now instead of System#currentTimeMillis
HDFS-3637. Major new feature reported by Aaron T. Myers and fixed by Aaron T. Myers (data-node , hdfs client , security)
Add support for encrypting the DataTransferProtocol
HDFS-3634. Minor test reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (fuse-dfs)
Add self-contained, mavenized fuse_dfs test
HDFS-3633. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)
libhdfs: hdfsDelete should pass JNI_FALSE or JNI_TRUE
HDFS-3629. Trivial bug reported by Brandon Li and fixed by Brandon Li (name-node)
fix the typo in the error message about inconsistent storage layout version
HDFS-3622. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans
Backport HDFS-3541 to branch-0.23
HDFS-3615. Major bug reported by Eli Collins and fixed by Aaron T. Myers (security)
Two BlockTokenSecretManager findbugs warnings
HDFS-3613. Trivial improvement reported by Harsh J and fixed by Andrew Wang (name-node)
GSet prints some INFO level values, which aren't really very useful to all
HDFS-3612. Trivial improvement reported by Harsh J and fixed by Andy Isaacson (name-node)
Single namenode image directory config warning can be improved
HDFS-3611. Trivial bug reported by Harsh J and fixed by Colin Patrick McCabe (name-node)
NameNode prints unnecessary WARNs about edit log normally skipping a few bytes
HDFS-3610. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
fuse_dfs: Provide a way to use the default (configured) NN URI
HDFS-3609. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)
libhdfs: don't force the URI to look like hdfs://hostname:port
HDFS-3608. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
fuse_dfs: detect changes in UID ticket cache
HDFS-3606. Minor test reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)
libhdfs: create self-contained unit test
HDFS-3605. Major bug reported by Brahma Reddy Battula and fixed by Todd Lipcon (ha , name-node)
Block mistakenly marked corrupt during edit log catchup phase of failover
HDFS-3604. Minor improvement reported by Eli Collins and fixed by Eli Collins
Add dfs.webhdfs.enabled to hdfs-default.xml
HDFS-3603. Major bug reported by Jason Lowe and fixed by Jason Lowe (test)
Decouple TestHDFSTrash from TestTrash
HDFS-3597. Minor bug reported by Andy Isaacson and fixed by Andy Isaacson
SNN can fail to start on upgrade
HDFS-3591. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans
Backport HDFS-3357 to branch-0.23
HDFS-3583. Major improvement reported by Eli Collins and fixed by Andrew Wang (test)
Convert remaining tests to Junit4
HDFS-3582. Minor improvement reported by Eli Collins and fixed by Eli Collins (test)
Hook daemon process exit for testing
HDFS-3581. Major bug reported by Eli Collins and fixed by Eli Collins (name-node)
FSPermissionChecker#checkPermission sticky bit check missing range check
HDFS-3580. Minor bug reported by Andy Isaacson and fixed by Andy Isaacson
incompatible types; no instance(s) of type variable(s) V exist so that V conforms to boolean compiling HttpFSServer.java with OpenJDK
HDFS-3579. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)
libhdfs: fix exception handling
HDFS-3577. Blocker bug reported by Alejandro Abdelnur and fixed by Tsz Wo (Nicholas), SZE (hdfs client)
WebHdfsFileSystem can not read files larger than 24KB
HDFS-3575. Minor bug reported by Brock Noland and fixed by Brock Noland
HttpFS does not log Exception Stacktraces
HDFS-3574. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Fix small race and do some cleanup in GetImageServlet
HDFS-3572. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (name-node , security)
Cleanup code which inits SPNEGO in HttpServer
HDFS-3568. Major improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
fuse_dfs: add support for security
HDFS-3559. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
DFSTestUtil: use Builder class to construct DFSTestUtil instances
HDFS-3555. Major bug reported by Jeff Lord and fixed by Andy Isaacson (data-node , hdfs client)
idle client socket triggers DN ERROR log (should be INFO or DEBUG)
HDFS-3551. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
WebHDFS CREATE does not use client location for redirection
HDFS-3548. Critical bug reported by Todd Lipcon and fixed by Colin Patrick McCabe (name-node)
NamenodeFsck.copyBlock fails to create a Block Reader
HDFS-3541. Major bug reported by suja s and fixed by Vinay (data-node)
Deadlock between recovery, xceiver and packet responder
HDFS-3539. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
libhdfs code cleanups
HDFS-3537. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (fuse-dfs , libhdfs)
Move libhdfs and fuse-dfs source to native subdirectories
HDFS-3535. Major new feature reported by Andy Isaacson and fixed by Andy Isaacson (name-node)
Audit logging should log denied accesses
HDFS-3531. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (name-node)
EditLogFileOutputStream#preallocate should check for incomplete writes
HDFS-3524. Major bug reported by Eli Collins and fixed by Brandon Li (test)
TestFileLengthOnClusterRestart failed due to error message change
HDFS-3522. Major bug reported by Brandon Li and fixed by Brandon Li (name-node)
If NN is in safemode, it should throw SafeModeException when getBlockLocations has zero locations

getBlockLocations(), and hence open() for read, will now throw SafeModeException if the NameNode is still in safe mode and there are no replicas reported yet for one of the blocks in the file.
HDFS-3520. Major improvement reported by Eli Collins and fixed by Eli Collins (name-node)
Add transfer rate logging to TransferFsImage
HDFS-3518. Major bug reported by Bikas Saha and fixed by Tsz Wo (Nicholas), SZE (hdfs client)
Provide API to check HDFS operational state

Add a utility method HdfsUtils.isHealthy(uri) for checking if the given HDFS is healthy.
HDFS-3517. Minor bug reported by Eli Collins and fixed by Eli Collins (test)
TestStartup should bind ephemeral ports
HDFS-3516. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (hdfs client)
Check content-type in WebHdfsFileSystem
HDFS-3514. Major improvement reported by Henry Robinson and fixed by Henry Robinson (test)
Add missing TestParallelLocalRead
HDFS-3513. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
HttpFS should cache filesystems
HDFS-3505. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
DirectoryScanner does not join all threads in shutdown
HDFS-3504. Major improvement reported by Siddharth Seth and fixed by Tsz Wo (Nicholas), SZE
Configurable retry in DFSClient
HDFS-3502. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Change INodeFile and INodeFileUnderConstruction to package private
HDFS-3501. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (ha , name-node)
Checkpointing with security enabled will stop working after ticket lifetime expires
HDFS-3491. Major bug reported by Romain Rigaux and fixed by Alejandro Abdelnur
HttpFs does not set permissions correctly
HDFS-3490. Minor bug reported by Todd Lipcon and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
DN WebHDFS methods throw NPE if Namenode RPC address param not specified
HDFS-3487. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (tools)
offlineimageviewer should give byte offset information when it encounters an exception
HDFS-3486. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (security , tools)
offlineimageviewer can't read fsimage files that contain persistent delegation tokens
HDFS-3485. Minor bug reported by Andy Isaacson and fixed by Andy Isaacson
DataTransferThrottler will over-throttle when currentTimeMillis jumps
HDFS-3484. Minor bug reported by Aaron T. Myers and fixed by Aaron T. Myers (hdfs client)
hdfs fsck doesn't work if NN HTTP address is set to 0.0.0.0 even if NN RPC address is configured
HDFS-3481. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
Refactor HttpFS handling of JAX-RS query string parameters
HDFS-3480. Major bug reported by Eli Collins and fixed by Vinay (build)
Multiple SLF4J binding warning
HDFS-3475. Trivial improvement reported by Harsh J and fixed by Harsh J
Make the replication and invalidation rates configurable

This change adds two new configuration parameters: (1) dfs.namenode.invalidate.work.pct.per.iteration, which controls the deletion rate of blocks, and (2) dfs.namenode.replication.work.multiplier.per.iteration, which controls the replication rate. This in turn allows controlling the time decommissioning takes. Please see hdfs-default.xml for a detailed description.
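The effect of the two HDFS-3475 parameters on per-iteration work can be sketched as follows. This is a simplified model, not NameNode code: it assumes the replication multiplier scales with the number of live DataNodes and that the invalidation percentage selects a fraction of live DataNodes, rounded up. The class name, method names, and sample values are hypothetical; hdfs-default.xml remains the authoritative description.

```java
// Illustrative only: a simplified model of per-iteration NameNode work sizing.
// The class, method names, and sample values are hypothetical, not Hadoop code.
final class WorkPerIteration {
    /** Blocks scheduled for replication in one iteration: live nodes times the multiplier. */
    static int blocksToReplicate(int liveDataNodes, int replicationWorkMultiplier) {
        return liveDataNodes * replicationWorkMultiplier;
    }

    /** DataNodes given deletion (invalidation) work in one iteration: a fraction of live nodes, rounded up. */
    static int nodesToInvalidate(int liveDataNodes, float invalidateWorkPct) {
        return (int) Math.ceil(liveDataNodes * invalidateWorkPct);
    }

    public static void main(String[] args) {
        int live = 100; // assumed cluster size for the example
        System.out.println("blocks to replicate: " + blocksToReplicate(live, 2));
        System.out.println("nodes to invalidate: " + nodesToInvalidate(live, 0.25f));
    }
}
```

Under this model, raising either value schedules more work per iteration, which is why the note ties them to decommissioning time.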
HDFS-3474. Major sub-task reported by Ivan Kelly and fixed by Ivan Kelly
Cleanup Exception handling in BookKeeper journal manager
HDFS-3469. Minor bug reported by Vinay and fixed by Vinay (auto-failover)
start-dfs.sh will start zkfc, but stop-dfs.sh will not stop zkfc similarly.
HDFS-3468. Major sub-task reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G
Make BKJM-ZK session timeout configurable.
HDFS-3466. Major bug reported by Owen O'Malley and fixed by Owen O'Malley (name-node , security)
The SPNEGO filter for the NameNode should come out of the web keytab file
HDFS-3460. Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
HttpFS proxyuser validation with Kerberos ON uses full principal name
HDFS-3454. Minor improvement reported by Eli Collins and fixed by Eli Collins (balancer)
Balancer unconditionally logs InterruptedException at INFO level on shutdown if security is enabled
HDFS-3452. Blocker sub-task reported by suja s and fixed by Uma Maheswara Rao G
BKJM:Switch from standby to active fails and NN gets shut down due to delay in clearing of lock
HDFS-3446. Major bug reported by Matthew Jacobs and fixed by Matthew Jacobs (name-node)
HostsFileReader silently ignores bad includes/excludes

HDFS no longer silently ignores missing or unreadable host files specified by dfs.hosts or dfs.hosts.exclude. In order to specify that no hosts should be included or excluded, administrators should either refrain from setting the relevant config properties, or create an empty file in order to represent an empty list.
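The behavior described for HDFS-3446 can be illustrated with a small stand-alone reader. This is a hypothetical sketch, not Hadoop's HostsFileReader: the class and method names are invented, and the whitespace-separated file format is an assumption. It shows the two cases the note distinguishes: a missing file is an error, while an empty file is a valid empty list.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch, not Hadoop's HostsFileReader.
final class StrictHostsFile {
    /**
     * Reads host names separated by whitespace or newlines.
     * A missing or unreadable file raises IOException instead of being ignored;
     * an empty file yields an empty set.
     */
    static Set<String> readHosts(Path file) throws IOException {
        Set<String> hosts = new LinkedHashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    hosts.add(token);
                }
            }
        }
        return hosts;
    }

    public static void main(String[] args) throws IOException {
        Path empty = Files.createTempFile("hosts", ".txt");
        System.out.println("empty file gives empty list: " + readHosts(empty).isEmpty());
        Files.delete(empty);
        try {
            readHosts(empty); // the file no longer exists
        } catch (IOException e) {
            System.out.println("missing file rejected: " + e.getClass().getSimpleName());
        }
    }
}
```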
HDFS-3444. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (hdfs client)
hdfs groups command doesn't work with security enabled
HDFS-3442. Minor bug reported by suja s and fixed by Andrew Wang
Incorrect count for Missing Replicas in FSCK report
HDFS-3441. Major sub-task reported by suja s and fixed by Rakesh R
Race condition between rolling logs at active NN and purging at standby
HDFS-3440. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
should more effectively limit stream memory consumption when reading corrupt edit logs
HDFS-3438. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (ha)
BootstrapStandby should not require a rollEdits on active node
HDFS-3436. Major bug reported by Brahma Reddy Battula and fixed by Vinay (data-node)
adding new datanode to existing pipeline fails in case of Append/Recovery
HDFS-3433. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (name-node)
GetImageServlet should allow administrative requestors when security is enabled
HDFS-3428. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
move DelegationTokenRenewer to common
HDFS-3423. Major sub-task reported by Rakesh R and fixed by Ivan Kelly
BKJM: NN startup fails when it tries to recoverUnfinalizedSegments() on bad inProgress_ ZNodes
HDFS-3422. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (test)
TestStandbyIsHot timeouts too aggressive
HDFS-3419. Minor improvement reported by Eli Collins and fixed by Eli Collins
Cleanup LocatedBlock
HDFS-3417. Minor improvement reported by Eli Collins and fixed by Eli Collins (data-node)
Rename BalancerDatanode#getName to getDisplayName to be consistent with Datanode
HDFS-3416. Minor improvement reported by Eli Collins and fixed by Eli Collins (data-node)
Cleanup DatanodeID and DatanodeRegistration constructors used by testing
HDFS-3415. Major bug reported by Brahma Reddy Battula and fixed by Brandon Li (name-node)
During NameNode starting up, it may pick wrong storage directory inspector when the layout versions of the storage directories are different
HDFS-3414. Minor bug reported by Aaron T. Myers and fixed by Aaron T. Myers (balancer)
Balancer does not find NameNode if rpc-address or servicerpc-address are not set in client configs
HDFS-3413. Critical bug reported by Todd Lipcon and fixed by Aaron T. Myers (ha , test)
TestFailureToReadEdits timing out
HDFS-3408. Minor sub-task reported by Rakesh R and fixed by Rakesh R (name-node)
BKJM : Namenode format fails, if there is no BK root
HDFS-3404. Major improvement reported by Aaron T. Myers and fixed by Aaron T. Myers
Make putImage in GetImageServlet infer remote address to fetch from request
HDFS-3401. Major improvement reported by Eli Collins and fixed by Eli Collins (data-node , test)
Cleanup DatanodeDescriptor creation in the tests
HDFS-3400. Major improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (data-node , scripts)
DNs should be able to start with jsvc even if security is disabled
HDFS-3398. Minor bug reported by Brahma Reddy Battula and fixed by amith (hdfs client)
Client will not retry when primaryDN is down once it's just got pipeline
HDFS-3394. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Do not use generic in INodeFile.getLastBlock()
HDFS-3391. Critical bug reported by Arun C Murthy and fixed by Todd Lipcon
TestPipelinesFailover#testLeaseRecoveryAfterFailover is failing
HDFS-3390. Minor improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (hdfs client)
DFSAdmin should print full stack traces of errors when DEBUG logging is enabled
HDFS-3389. Major sub-task reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (name-node)
Document the BKJM usage in Namenode HA.
HDFS-3385. Major bug reported by Brahma Reddy Battula and fixed by Tsz Wo (Nicholas), SZE (name-node)
ClassCastException when trying to append a file
HDFS-3372. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (tools)
offlineEditsViewer should be able to read a binary edits file with recovery mode
HDFS-3369. Minor sub-task reported by John George and fixed by John George (name-node)
change variable names referring to inode in blockmanagement to more appropriate
HDFS-3368. Major bug reported by Konstantin Shvachko and fixed by Konstantin Shvachko (name-node)
Missing blocks due to bad DataNodes coming up and down.
HDFS-3359. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (hdfs client)
DFSClient.close should close cached sockets
HDFS-3341. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon
Change minimum RPC versions to 2.0.0-SNAPSHOT instead of 2.0.0
HDFS-3335. Major improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
check for edit log corruption at the end of the log
HDFS-3334. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (hdfs client)
ByteRangeInputStream leaks streams
HDFS-3331. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
setBalancerBandwidth does not checkSuperuserPrivilege
HDFS-3321. Major bug reported by Ravi Prakash and fixed by Ravi Prakash
Error message for insufficient data nodes to come out of safemode is wrong.
HDFS-3318. Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (hdfs client)
Hftp hangs on transfers >2GB
HDFS-3312. Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (hdfs client)
Hftp selects wrong token service
HDFS-3308. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (webhdfs)
hftp/webhdfs can't get tokens if authority has no port
HDFS-3306. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
fuse_dfs: don't lock release operations
HDFS-3291. Major test reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
add test that covers HttpFS working w/ a non-HDFS Hadoop filesystem
HDFS-3276. Minor improvement reported by Vinithra Varadharajan and fixed by Todd Lipcon (ha , name-node)
initializeSharedEdits should have a -nonInteractive flag
HDFS-3266. Minor bug reported by Aaron T. Myers and fixed by madhukara phatak
DFSTestUtil#waitCorruptReplicas doesn't sleep between checks
HDFS-3258. Major test reported by Eli Collins and fixed by Junping Du (test)
Test for HADOOP-8144 (pseudoSortByDistance in NetworkTopology for first rack local node)
HDFS-3243. Major bug reported by Todd Lipcon and fixed by Henry Robinson (hdfs client , test)
TestParallelRead timing out on jenkins
HDFS-3235. Minor bug reported by Henry Robinson and fixed by Henry Robinson
MiniDFSClusterManager doesn't correctly support -format option
HDFS-3230. Minor improvement reported by Eli Collins and fixed by Eli Collins (test)
Cleanup DatanodeID creation in the tests
HDFS-3194. Major bug reported by suja s and fixed by Andy Isaacson (data-node)
DataNode block scanner is running too frequently
HDFS-3190. Minor sub-task reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Simple refactors in existing NN code to assist QuorumJournalManager extension
HDFS-3177. Major bug reported by Kihwal Lee and fixed by Kihwal Lee (data-node , hdfs client)
Allow DFSClient to find out and use the CRC type being used for a file.
HDFS-3176. Major bug reported by Kihwal Lee and fixed by Kihwal Lee (hdfs client)
JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own.
HDFS-3170. Major improvement reported by Todd Lipcon and fixed by Matthew Jacobs (data-node)
Add more useful metrics for write latency
HDFS-3168. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Clean up FSNamesystem and BlockManager
HDFS-3166. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (hdfs client)
Hftp connections do not have a timeout
HDFS-3157. Major bug reported by J.Andreina and fixed by Ashish Singhi (name-node)
Block deletion errors keep coming from the DN even after the block report and directory scan have happened
HDFS-3150. Major new feature reported by Eli Collins and fixed by Eli Collins (data-node , hdfs client)
Add option for clients to contact DNs via hostname
HDFS-3136. Major bug reported by Jason Lowe and fixed by Jason Lowe (build)
Multiple SLF4J binding warning
HDFS-3134. Major improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (name-node)
Harden edit log loader against malformed or malicious input
HDFS-3113. Major new feature reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
httpfs does not support delegation tokens
HDFS-3110. Major improvement reported by Henry Robinson and fixed by Henry Robinson (libhdfs , performance)
libhdfs implementation of direct read API

libhdfs is enhanced to read directly into user-supplied buffers when possible, reducing the number of memory copies.
HDFS-3067. Major bug reported by Henry Robinson and fixed by Henry Robinson (hdfs client)
NPE in DFSInputStream.readBuffer if read is repeated on corrupted block
HDFS-3058. Major sub-task reported by Ivan Kelly and fixed by Ivan Kelly
HA: Bring BookKeeperJournalManager up to date with HA changes
HDFS-3054. Major bug reported by patrick white and fixed by Colin Patrick McCabe (tools)
distcp -skipcrccheck has no effect
HDFS-3048. Major bug reported by Eli Collins and fixed by Andy Isaacson (name-node)
Small race in BlockManager#close
HDFS-3042. Major new feature reported by Todd Lipcon and fixed by Todd Lipcon (auto-failover , ha)
Automatic failover support for NN HA
HDFS-3040. Trivial improvement reported by Aaron T. Myers and fixed by madhukara phatak (test)
TestMulitipleNNDataBlockScanner is misspelled
HDFS-3037. Minor bug reported by Aaron T. Myers and fixed by Aaron T. Myers (test)
TestMulitipleNNDataBlockScanner#testBlockScannerAfterRestart is racy
HDFS-3031. Major bug reported by Stephen Chu and fixed by Todd Lipcon (ha)
HA: Error (failed to close file) when uploading large file + kill active NN + manual failover
HDFS-3002. Trivial improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (test)
TestNameNodeMetrics need not wait for metrics update with new metrics framework
HDFS-2988. Minor improvement reported by Todd Lipcon and fixed by Miomir Boljanovic (name-node)
Improve error message when storage directory lock fails
HDFS-2982. Critical bug reported by Todd Lipcon and fixed by Colin Patrick McCabe (name-node)
Startup performance suffers when there are many edit log segments
HDFS-2978. Major new feature reported by Aaron T. Myers and fixed by Aaron T. Myers (name-node)
The NameNode should expose name dir statuses via JMX
HDFS-2966. Minor bug reported by Steve Loughran and fixed by Steve Loughran (test)
TestNameNodeMetrics tests can fail under load
HDFS-2963. Minor bug reported by J.Andreina and fixed by Andrew Wang
Console Output is confusing while executing metasave (dfsadmin command)
HDFS-2914. Major bug reported by Hari Mankude and fixed by Vinay (ha , name-node)
HA: Standby should not enter safemode when resources are low
HDFS-2885. Major improvement reported by Eli Collins and fixed by Tsz Wo (Nicholas), SZE (name-node)
Remove "federation" from the nameservice config options
HDFS-2834. Major improvement reported by Henry Robinson and fixed by Henry Robinson (hdfs client , performance)
ByteBuffer-based read API for DFSInputStream
HDFS-2800. Major bug reported by Aaron T. Myers and fixed by Todd Lipcon (ha , test)
HA: TestStandbyCheckpoints.testCheckpointCancellation is racy
HDFS-2797. Major bug reported by Aaron T. Myers and fixed by Colin Patrick McCabe (ha , name-node)
Fix misuses of InputStream#skip in the edit log code
HDFS-2793. Major new feature reported by Aaron T. Myers and fixed by Todd Lipcon (name-node)
Add an admin command to trigger an edit log roll

Introduced a new command, "hdfs dfsadmin -rollEdits" which requests that the active NameNode roll its edit log. This can be useful for administrators manually backing up log segments.
HDFS-2759. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (ha , name-node)
Pre-allocate HDFS edit log files after writing version number
HDFS-2757. Major bug reported by Jean-Daniel Cryans and fixed by Jean-Daniel Cryans
Cannot read a local block that's being written to when using the local read short circuit
HDFS-2727. Minor improvement reported by Sho Shimauchi and fixed by Colin Patrick McCabe (libhdfs)
libhdfs should get the default block size from the server

libhdfs now uses the server block size configuration rather than the deprecated dfs.block.size client configuration.
HDFS-2717. Major sub-task reported by Ivan Kelly and fixed by Ivan Kelly
BookKeeper Journal output stream doesn't check addComplete rc
HDFS-2686. Major improvement reported by Todd Lipcon and fixed by Suresh Srinivas (data-node , name-node)
Remove DistributedUpgrade related code

This jira removes functionality that has not been used/applicable since release 0.17. The incompatibility introduced by this change will not affect any HDFS users.
HDFS-2652. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp
Port token service changes from 205
HDFS-2619. Major bug reported by Owen O'Malley and fixed by Owen O'Malley (build)
Remove my personal email address from the libhdfs build file.
HDFS-2617. Major improvement reported by Jakob Homan and fixed by Jakob Homan (security)
Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution

Due to the requirement that KSSL use weak encryption types for Kerberos tickets, HTTP authentication to the NameNode will now use SPNEGO by default. This will require users of previous branch-1 releases with security enabled to modify their configurations and create new Kerberos principals in order to use SPNEGO. The old behavior of using KSSL can optionally be enabled by setting the configuration option "hadoop.security.use-weak-http-crypto" to "true".
HDFS-2421. Major improvement reported by Hairong Kuang and fixed by Jing Zhao (name-node)
Improve the concurrency of SerialNumberMap in NameNode
HDFS-2391. Major improvement reported by Rajit Saha and fixed by Harsh J (balancer)
Newly set BalancerBandwidth value is not displayed anywhere
HDFS-2330. Major sub-task reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (name-node)
In NNStorage.java, IOExceptions of stream closures can mask root exceptions.
HDFS-2285. Major bug reported by Konstantin Shvachko and fixed by Konstantin Shvachko (name-node)
BackupNode should reject requests trying to modify namespace
HDFS-2025. Minor bug reported by sravankorumilli and fixed by Ashish Singhi (data-node)
Go Back to File View link is not working in tail.jsp
HDFS-1490. Minor bug reported by Dmytro Molkov and fixed by Vinay (name-node)
TransferFSImage should timeout
HDFS-1249. Minor bug reported by matsusaka kentaro and fixed by Colin Patrick McCabe (fuse-dfs)
with fuse-dfs, chown which only has owner (or only group) argument fails with Input/output error.
HDFS-1153. Minor bug reported by Ravi Phulari and fixed by Ravi Phulari (data-node)
dfsnodelist.jsp should handle invalid input parameters
HDFS-1013. Minor improvement reported by Todd Lipcon and fixed by Eugene Koontz
Miscellaneous improvements to HTML markup for web UIs
HDFS-799. Major improvement reported by Christian Kunz and fixed by Colin Patrick McCabe
libhdfs must call DetachCurrentThread when a thread is destroyed
HDFS-766. Minor bug reported by Ravi Phulari and fixed by Jon Zuanich
Error message not clear for set space quota out of boundary values.
HDFS-744. Major new feature reported by Hairong Kuang and fixed by Lars Hofhansl (data-node , hdfs client)
Support hsync in HDFS
HDFS-711. Major bug reported by freestyler and fixed by Colin Patrick McCabe (documentation)
hdfsUtime does not handle atime = 0 or mtime = 0 correctly
HDFS-470. Minor bug reported by Pete Wyckoff and fixed by Colin Patrick McCabe
libhdfs should handle 0-length reads from FSInputStream correctly
HADOOP-8801. Major bug reported by Eli Collins and fixed by Eli Collins
ExitUtil#terminate should capture the exception stack trace
HADOOP-8794. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
Modify bin/hadoop to point to HADOOP_YARN_HOME
HADOOP-8781. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (scripts)
hadoop-config.sh should add JAVA_LIBRARY_PATH to LD_LIBRARY_PATH
HADOOP-8775. Major bug reported by Sandy Ryza and fixed by Sandy Ryza
MR2 distcp permits non-positive value to -bandwidth option which causes job never to complete
HADOOP-8770. Blocker bug reported by Todd Lipcon and fixed by Eli Collins (trash)
NN should not RPC to self to find trash defaults (causes deadlock)
HADOOP-8766. Major bug reported by Eli Collins and fixed by Colin Patrick McCabe (test)
FileContextMainOperationsBaseTest should randomize the root dir
HADOOP-8764. Major bug reported by Trevor Robinson and fixed by Trevor Robinson (build)
CMake: HADOOP-8737 broke ARM build
HADOOP-8754. Minor improvement reported by Brandon Li and fixed by Brandon Li (ipc)
Deprecate all the RPC.getServer() variants
HADOOP-8749. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (conf)
HADOOP-8031 changed the way in which relative xincludes are handled in Configuration.
HADOOP-8748. Minor improvement reported by Arun C Murthy and fixed by Arun C Murthy (io)
Move dfsclient retry to a util class
HADOOP-8747. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (native)
Syntax error on cmake version 2.6 patch 2 in JNIFlags.cmake
HADOOP-8737. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (native)
cmake: always use JAVA_HOME to find libjvm.so, jni.h, jni_md.h
HADOOP-8727. Major bug reported by Harsh J and fixed by Harsh J (conf)
Gracefully deprecate dfs.umaskmode in 2.x onwards
HADOOP-8726. Major bug reported by Benoy Antony and fixed by Daryn Sharp (security)
The Secrets in Credentials are not available to MR tasks
HADOOP-8725. Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (security)
MR is broken when security is off
HADOOP-8722. Minor bug reported by Eli Collins and fixed by Colin Patrick McCabe (documentation)
Update BUILDING.txt with latest snappy info
HADOOP-8721. Critical bug reported by suja s and fixed by Vinay (auto-failover , ha)
ZKFC should not retry 45 times when attempting a graceful fence during a failover
HADOOP-8720. Trivial bug reported by Vlad Rozov and fixed by Vlad Rozov (test)
TestLocalFileSystem should use test root subdirectory
HADOOP-8710. Major improvement reported by Eli Collins and fixed by Eli Collins (fs)
Remove ability for users to easily run the trash emptier

The trash emptier may no longer be run using "hadoop org.apache.hadoop.fs.Trash". The trash emptier runs on the NameNode (if configured). Old trash checkpoints may be deleted using "hadoop fs -expunge".
HADOOP-8709. Critical bug reported by Jason Lowe and fixed by Jason Lowe (fs)
globStatus changed behavior from 0.20/1.x
HADOOP-8703. Major bug reported by Dave Thompson and fixed by Dave Thompson
distcpV2: turn CRC checking off for 0 byte size

distcp skips CRC on 0 byte files.
HADOOP-8700. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (util)
Move the checksum type constants to an enum
HADOOP-8699. Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (test)
some common testcases create core-site.xml in test-classes, causing other testcases to fail
HADOOP-8697. Major bug reported by Trevor Robinson and fixed by Trevor Robinson (test)
TestWritableName fails intermittently with JDK7
HADOOP-8695. Major bug reported by Trevor Robinson and fixed by Trevor Robinson (test)
TestPathData fails intermittently with JDK7
HADOOP-8693. Major bug reported by Trevor Robinson and fixed by Trevor Robinson (test)
TestSecurityUtil fails intermittently with JDK7
HADOOP-8692. Major bug reported by Trevor Robinson and fixed by Trevor Robinson (test)
TestLocalDirAllocator fails intermittently with JDK7
HADOOP-8689. Major improvement reported by Eli Collins and fixed by Eli Collins (fs)
Make trash a server side configuration option

If fs.trash.interval is configured on the server then the client's value for this configuration is ignored.
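The precedence rule in the HADOOP-8689 note can be sketched as a single function. The class and method names are hypothetical, and representing "configured on the server" as an OptionalLong is an assumption made for the illustration, not how Hadoop stores the setting.

```java
import java.util.OptionalLong;

// Hypothetical sketch of the precedence rule; names are illustrative, not Hadoop APIs.
final class TrashIntervalResolver {
    /** Returns the server's interval when the server configured one, otherwise the client's. */
    static long effectiveInterval(OptionalLong serverInterval, long clientInterval) {
        return serverInterval.isPresent() ? serverInterval.getAsLong() : clientInterval;
    }

    public static void main(String[] args) {
        // Server configured: the client's value of 60 is ignored.
        System.out.println(effectiveInterval(OptionalLong.of(1440), 60));
        // Server not configured: the client's value applies.
        System.out.println(effectiveInterval(OptionalLong.empty(), 60));
    }
}
```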
HADOOP-8686. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (native)
Fix warnings in native code
HADOOP-8660. Major bug reported by Eli Collins and fixed by Alejandro Abdelnur
TestPseudoAuthenticator failing with NPE
HADOOP-8659. Major bug reported by Trevor Robinson and fixed by Colin Patrick McCabe (native)
Native libraries must build with soft-float ABI for Oracle JVM on ARM
HADOOP-8655. Major bug reported by Arun A K and fixed by (util)
In TextInputFormat, while specifying textinputformat.record.delimiter the character/character sequences in data file similar to starting character/starting character sequence in delimiter were found missing in certain cases in the Map Output
HADOOP-8654. Major bug reported by Gelesh and fixed by (util)
TextInputFormat delimiter bug:- Input Text portion ends with & Delimiter starts with same char/char sequence
HADOOP-8648. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
libhadoop: native CRC32 validation crashes when io.bytes.per.checksum=1
HADOOP-8644. Critical new feature reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
AuthenticatedURL should be able to use SSLFactory
HADOOP-8637. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
FilterFileSystem#setWriteChecksum is broken
HADOOP-8635. Critical improvement reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Cannot cancel paths registered deleteOnExit
HADOOP-8634. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Ensure FileSystem#close doesn't squawk for deleteOnExit paths
HADOOP-8633. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Interrupted FsShell copies may leave tmp files
HADOOP-8632. Major bug reported by Costin Leau and fixed by Costin Leau (conf)
Configuration leaking class-loaders
HADOOP-8627. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
FS deleteOnExit may delete the wrong path
HADOOP-8626. Major bug reported by Jonathan Natkins and fixed by Jonathan Natkins (security)
Typo in default setting for hadoop.security.group.mapping.ldap.search.filter.user
HADOOP-8624. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (ipc)
ProtobufRpcEngine should log all RPCs if TRACE logging is enabled
HADOOP-8623. Minor improvement reported by Steven Willis and fixed by Steven Willis (scripts)
hadoop jar command should respect HADOOP_OPTS
HADOOP-8620. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (build)
Add -Drequire.fuse and -Drequire.snappy
HADOOP-8614. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
IOUtils#skipFully hangs forever on EOF
HADOOP-8613. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp
AbstractDelegationTokenIdentifier#getUser() should set token auth type
HADOOP-8611. Major bug reported by Kihwal Lee and fixed by Robert Parker (security)
Allow fall-back to the shell-based implementation when JNI-based users-group mapping fails
HADOOP-8609. Major improvement reported by Todd Lipcon and fixed by Jon Zuanich
IPC server logs a useless message when shutting down socket
HADOOP-8606. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
FileSystem.get may return the wrong filesystem
HADOOP-8599. Major bug reported by Andrey Klochkov and fixed by Andrey Klochkov (fs)
Non empty response from FileSystem.getFileBlockLocations when asking for data beyond the end of file
HADOOP-8587. Minor bug reported by Eli Collins and fixed by Eli Collins (fs)
HarFileSystem access of harMetaCache isn't threadsafe
HADOOP-8586. Major bug reported by Eli Collins and fixed by Eli Collins
Fixup a bunch of SPNEGO misspellings
HADOOP-8585. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
Fix initialization circularity between UserGroupInformation and HadoopConfiguration
HADOOP-8581. Major new feature reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
add support for HTTPS to the web UIs
HADOOP-8573. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (conf)
Configuration tries to read from an inputstream resource multiple times.
HADOOP-8566. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (io)
AvroReflectSerializer.accept(Class) throws a NPE if the class has no package (primitive types and arrays)
HADOOP-8563. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (build)
don't package hadoop-pipes examples/bin
HADOOP-8551. Major bug reported by Robert Joseph Evans and fixed by John George (fs)
fs -mkdir creates parent directories without the -p option

FsShell's "mkdir" no longer implicitly creates all non-existent parent directories. The command adopts the POSIX-compliant behavior of requiring the "-p" flag to auto-create parent directories.
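The difference between the two modes can be illustrated on the local filesystem with java.nio.file, where Files.createDirectory behaves like plain "mkdir" and Files.createDirectories behaves like "mkdir -p". This is an analogy only: it does not call FsShell or HDFS, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Local-filesystem analogy using java.nio.file; this does not call FsShell or HDFS.
final class MkdirSemantics {
    /** Like "mkdir": refuses to create the directory when its parent is missing. */
    static boolean mkdir(Path dir) throws IOException {
        try {
            Files.createDirectory(dir);
            return true;
        } catch (NoSuchFileException missingParent) {
            return false;
        }
    }

    /** Like "mkdir -p": creates any missing parent directories as well. */
    static void mkdirP(Path dir) throws IOException {
        Files.createDirectories(dir);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("mkdir-demo");
        Path nested = root.resolve("a").resolve("b");
        System.out.println("mkdir without parents succeeded: " + mkdir(nested));
        mkdirP(nested);
        System.out.println("mkdir -p created directory: " + Files.isDirectory(nested));
    }
}
```

Scripts that relied on the old implicit behavior would need to add "-p" to their mkdir invocations.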
HADOOP-8550. Major bug reported by Robert Joseph Evans and fixed by John George (fs)
hadoop fs -touchz automatically created parent directories
HADOOP-8547. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
Package hadoop-pipes examples/bin directory (again)
HADOOP-8543. Major bug reported by Radim Kolar and fixed by Radim Kolar (build)
Invalid pom.xml files on 0.23 branch
HADOOP-8541. Major improvement reported by Andrew Wang and fixed by Andrew Wang (metrics)
Better high-percentile latency metrics
HADOOP-8538. Major bug reported by Trevor Robinson and fixed by Trevor Robinson (native)
CMake builds fail on ARM
HADOOP-8537. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (io)
Two TFile tests failing recently
HADOOP-8535. Major improvement reported by Jonathan Eagles and fixed by Jonathan Eagles (build)
Cut hadoop build times in half (upgrade maven-compiler-plugin to 2.5.1)
HADOOP-8533. Major improvement reported by Suresh Srinivas and fixed by Brandon Li (ipc)
Remove Parallel Call in IPC

Merged the change to branch-2
HADOOP-8531. Trivial improvement reported by Harsh J and fixed by madhukara phatak (io)
SequenceFile Writer can throw out a better error if a serializer or deserializer isn't available
HADOOP-8525. Trivial improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans
Provide Improved Traceability for Configuration
HADOOP-8524. Trivial improvement reported by Harsh J and fixed by Harsh J (conf)
Allow users to get source of a Configuration parameter
HADOOP-8512. Minor bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
AuthenticatedURL should reset the Token when the server returns other than OK on authentication
HADOOP-8509. Minor bug reported by Matteo Bertozzi and fixed by Alejandro Abdelnur (util)
JarFinder duplicate entry: META-INF/MANIFEST.MF exception
HADOOP-8507. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
Avoid OOM while deserializing DelegationTokenIdentifer
HADOOP-8501. Major bug reported by Radim Kolar and fixed by Radim Kolar (benchmarks)
Gridmix fails to compile on OpenJDK7u4
HADOOP-8499. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
Lower min.user.id to 500 for the tests
HADOOP-8495. Critical bug reported by Jason Lowe and fixed by Jason Lowe (build)
Update Netty to avoid leaking file descriptors during shuffle
HADOOP-8488. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
test-patch.sh gives +1 even if the native build fails.
HADOOP-8485. Minor bug reported by Eli Collins and fixed by Eli Collins (documentation)
Don't hardcode "Apache Hadoop 0.23" in the docs
HADOOP-8481. Trivial bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (documentation)
update BUILDING.txt to talk about cmake rather than autotools
HADOOP-8480. Trivial bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
The native build should honor -DskipTests
HADOOP-8466. Major bug reported by Bruno Mahé and fixed by Bruno Mahé (build)
hadoop-client POM incorrectly excludes avro
HADOOP-8465. Major new feature reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
hadoop-auth should support ephemeral authentication
HADOOP-8463. Major improvement reported by Eli Collins and fixed by madhukara phatak (security)
hadoop.security.auth_to_local needs a key definition and doc
HADOOP-8460. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (documentation)
Document proper setting of HADOOP_PID_DIR and HADOOP_SECURE_DN_PID_DIR
HADOOP-8458. Major new feature reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
Add management hook to AuthenticationHandler to enable delegation token operations support
HADOOP-8452. Minor bug reported by Andy Isaacson and fixed by Andy Isaacson
DN logs backtrace when running under jsvc and /jmx is loaded
HADOOP-8450. Trivial bug reported by Colin Patrick McCabe and fixed by Eli Collins (test)
Remove src/test/system
HADOOP-8449. Minor bug reported by Joey Echeverria and fixed by Harsh J
hadoop fs -text fails with compressed sequence files with the codec file extension
HADOOP-8444. Major bug reported by Mariappan Asokan and fixed by madhukara phatak (fs , test)
Fix the tests FSMainOperationsBaseTest.java and FileContextMainOperationsBaseTest.java to avoid potential test failure
HADOOP-8438. Major bug reported by Devaraj K and fixed by Devaraj K
hadoop-validate-setup.sh refers to examples jar file which doesn't exist
HADOOP-8433. Major bug reported by Brahma Reddy Battula and fixed by Brahma Reddy Battula (scripts)
Don't set HADOOP_LOG_DIR in hadoop-env.sh
HADOOP-8431. Major bug reported by Eli Collins and fixed by Sandy Ryza
Running distcp without args throws IllegalArgumentException
HADOOP-8423. Major bug reported by Jason B and fixed by Todd Lipcon (io)
MapFile.Reader.get() crashes jvm or throws EOFException on Snappy or LZO block-compressed data
HADOOP-8422. Minor bug reported by Eli Collins and fixed by Eli Collins (fs)
Deprecate FileSystem#getDefault* and getServerDefault methods that don't take a Path argument
HADOOP-8408. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (viewfs)
MR doesn't work with a non-default ViewFS mount table and security enabled
HADOOP-8406. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (io)
CompressionCodecFactory.CODEC_PROVIDERS iteration is thread-unsafe
HADOOP-8403. Major task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
bump up POMs version to 2.0.1-SNAPSHOT
HADOOP-8400. Major bug reported by Eli Collins and fixed by Alejandro Abdelnur (security)
All commands warn "Kerberos krb5 configuration not found" when security is not enabled
HADOOP-8393. Major bug reported by Patrick Hunt and fixed by Patrick Hunt (scripts)
hadoop-config.sh missing variable exports, causes Yarn jobs to fail with ClassNotFoundException MRAppMaster
HADOOP-8390. Major bug reported by Trevor Robinson and fixed by Trevor Robinson (test)
TestFileSystemCanonicalization fails with JDK7
HADOOP-8373. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
Port RPC.getServerAddress to 0.23
HADOOP-8372. Major bug reported by Junping Du and fixed by Junping Du (io , util)
normalizeHostName() in NetUtils does not properly resolve a hostname that starts with a numeric character
HADOOP-8370. Major bug reported by Trevor Robinson and fixed by Trevor Robinson (native)
Native build failure: javah: class file for org.apache.hadoop.classification.InterfaceAudience not found
HADOOP-8368. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
Use CMake rather than autotools to build native code
HADOOP-8367. Major improvement reported by Sanjay Radia and fixed by Sanjay Radia
Improve documentation of declaringClassProtocolName in rpc headers
HADOOP-8362. Trivial improvement reported by Todd Lipcon and fixed by madhukara phatak (conf)
Improve exception message when Configuration.set() is called with a null key or value
HADOOP-8361. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
Avoid out-of-memory problems when deserializing strings
HADOOP-8358. Trivial improvement reported by Harsh J and fixed by Harsh J (conf)
Config-related WARN for dfs.web.ugi can be avoided.
HADOOP-8342. Major bug reported by Randy Clayton and fixed by Alejandro Abdelnur (fs)
HDFS command fails with exception following merge of HADOOP-8325
HADOOP-8341. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans
Fix or filter findbugs issues in hadoop-tools
HADOOP-8340. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (util)
SNAPSHOT build versions should compare as less than their eventual final release
HADOOP-8335. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (util)
Improve Configuration's address handling
HADOOP-8334. Major bug reported by Daryn Sharp and fixed by Daryn Sharp
HttpServer sometimes returns incorrect port
HADOOP-8330. Minor bug reported by John George and fixed by John George (test)
TestSequenceFile.testCreateUsesFsArg() is broken
HADOOP-8329. Major bug reported by Kumar Ravi and fixed by Eli Collins (build)
Build fails with Java 7
HADOOP-8328. Major bug reported by Tom White and fixed by Tom White (fs)
Duplicate FileSystem Statistics object for 'file' scheme
HADOOP-8327. Major bug reported by Dave Thompson and fixed by Dave Thompson
distcpv2 and distcpv1 jars should not coexist

Resolve sporadic distcp issue due to having two DistCp classes (v1 & v2) in the classpath.
HADOOP-8325. Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (fs)
Add a ShutdownHookManager to be used by different components instead of the JVM shutdownhook
HADOOP-8323. Critical improvement reported by Harsh J and fixed by Harsh J (io)
Revert HADOOP-7940 and improve javadocs and test for Text.clear()
HADOOP-8317. Major bug reported by Radim Kolar and fixed by (build)
Update maven-assembly-plugin to 2.3 - fix build on FreeBSD
HADOOP-8316. Major bug reported by Eli Collins and fixed by Eli Collins (conf)
Audit logging should be disabled by default
HADOOP-8305. Major bug reported by John George and fixed by John George (viewfs)
distcp over viewfs is broken
HADOOP-8288. Major bug reported by Ravi Prakash and fixed by Ravi Prakash
Remove references of mapred.child.ulimit etc. since they are not being used any more
HADOOP-8287. Major bug reported by Eli Collins and fixed by Eli Collins (conf)
etc/hadoop is missing hadoop-env.sh
HADOOP-8286. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (conf)
Simplify getting a socket address from conf
HADOOP-8283. Major test reported by Daryn Sharp and fixed by Daryn Sharp (test)
Allow tests to control token service value
HADOOP-8278. Major improvement reported by Tom White and fixed by Tom White (build)
Make sure components declare correct set of dependencies
HADOOP-8268. Major bug reported by Radim Kolar and fixed by Radim Kolar (build)
A few pom.xml across Hadoop project may fail XML validation
HADOOP-8244. Major improvement reported by Henry Robinson and fixed by Henry Robinson
Improve comments on ByteBufferReadable.read
HADOOP-8242. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
AbstractDelegationTokenIdentifier: add getter methods for owner and realuser
HADOOP-8240. Major improvement reported by Kihwal Lee and fixed by Kihwal Lee (fs)
Allow users to specify a checksum type on create()
HADOOP-8239. Major improvement reported by Kihwal Lee and fixed by Kihwal Lee (fs)
Extend MD5MD5CRC32FileChecksum to show the actual checksum type being used
HADOOP-8227. Blocker improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans
Allow RPC to limit ephemeral port range.
HADOOP-8225. Blocker bug reported by Mithun Radhakrishnan and fixed by Daryn Sharp (security)
DistCp fails when invoked by Oozie
HADOOP-8224. Major improvement reported by Eli Collins and fixed by Tomohiko Kinebuchi (conf)
Don't hardcode hdfs.audit.logger in the scripts
HADOOP-8197. Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (conf)
Configuration logs WARNs on every use of a deprecated key
HADOOP-8180. Major bug reported by Ravi Prakash and fixed by Ravi Prakash
Remove hsqldb from pom.xml since it's not needed
HADOOP-8179. Minor bug reported by Steve Loughran and fixed by Daryn Sharp (fs)
risk of NPE in CopyCommands processArguments()
HADOOP-8172. Critical bug reported by Robert Joseph Evans and fixed by Anupam Seth (conf)
Configuration no longer sets all keys in a deprecated key list.
HADOOP-8168. Major bug reported by Eugene Koontz and fixed by Eugene Koontz (fs)
empty-string owners or groups cause MissingFormatWidthException in o.a.h.fs.shell.Ls.ProcessPath()
HADOOP-8167. Blocker bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (conf)
Configuration deprecation logic breaks backwards compatibility
HADOOP-8144. Minor bug reported by Junping Du and fixed by Junping Du (io)
pseudoSortByDistance in NetworkTopology doesn't work properly if no local node and first node is local rack node
HADOOP-8135. Major new feature reported by Henry Robinson and fixed by Henry Robinson (fs)
Add ByteBufferReadable interface to FSDataInputStream
HADOOP-8129. Major bug reported by Ravi Prakash and fixed by Ahmed Radwan (fs , test)
ViewFileSystemTestSetup setupForViewFileSystem fails when the user's home directory is somewhere other than /home (e.g. /User)
HADOOP-8110. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Jason Lowe (fs)
TestViewFsTrash occasionally fails
HADOOP-8104. Major bug reported by Colin Patrick McCabe and fixed by Alejandro Abdelnur
Inconsistent Jackson versions
HADOOP-8088. Major bug reported by Kihwal Lee and fixed by Kihwal Lee (security)
User-group mapping cache incorrectly does negative caching on transient failures
HADOOP-8075. Major improvement reported by Eli Collins and fixed by Hızır Sefa İrken (native)
Lower native-hadoop library log from info to debug
HADOOP-8060. Major bug reported by Kihwal Lee and fixed by Kihwal Lee (fs , util)
Add a capability to discover and set checksum types per file.
HADOOP-8031. Major bug reported by Elias Ross and fixed by Elias Ross (conf)
Configuration class fails to find embedded .jar resources; should use URL.openStream()
HADOOP-8014. Major bug reported by Daryn Sharp and fixed by John George (fs)
ViewFileSystem does not correctly implement getDefaultBlockSize, getDefaultReplication, getContentSummary
HADOOP-8005. Major bug reported by Joe Crobak and fixed by Jason Lowe (scripts)
Multiple SLF4J binding message in .out file for all daemons
HADOOP-7967. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (fs , security)
Need generalized multi-token filesystem support
HADOOP-7868. Major bug reported by James Page and fixed by Trevor Robinson (native)
Hadoop native fails to compile when default linker option is -Wl,--as-needed
HADOOP-7818. Minor bug reported by Eli Collins and fixed by madhukara phatak (util)
DiskChecker#checkDir should fail if the directory is not executable
HADOOP-7754. Major sub-task reported by Todd Lipcon and fixed by Todd Lipcon (native , performance)
Expose file descriptors from Hadoop-wrapped local FileSystems
HADOOP-7703. Major bug reported by Devaraj K and fixed by Devaraj K
WebAppContext should also be stopped and cleared

Improved exception handling when shutting down the web server. (Devaraj K via Eric Yang)
HADOOP-7510. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (security)
Tokens should use original hostname provided instead of ip
HADOOP-6963. Critical bug reported by Owen O'Malley and fixed by Ravi Prakash (fs)
Fix FileUtil.getDU. It should not include the size of the directory or follow symbolic links
HADOOP-6802. Major improvement reported by Erik Steffl and fixed by Sho Shimauchi (conf , fs)
Remove FS_CLIENT_BUFFER_DIR_KEY = "fs.client.buffer.dir" from CommonConfigurationKeys.java (not used, deprecated)
HADOOP-3886. Minor bug reported by brien colwell and fixed by Jingguo Yao (documentation)
Error in javadoc of Reporter, Mapper and Progressable
HADOOP-3450. Minor improvement reported by Ari Rabkin and fixed by Sho Shimauchi (fs)
Add tests to Local Directory Allocator for asserting their URI-returning capability

Hadoop 2.0.1-alpha Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements.

Changes since Hadoop 2.0.0-alpha

HADOOP-8552. Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (conf , security)
Conflict: Same security.log.file for multiple users.

Hadoop 2.0.0-alpha Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements.

Changes since Hadoop 0.23.2

MAPREDUCE-4274. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (performance , task)
MapOutputBuffer should use native byte order for kvmeta
MAPREDUCE-4231. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (contrib/raid)
Update RAID to not use FSInodeInfo
MAPREDUCE-4219. Major improvement reported by Roman Shaposhnik and fixed by Roman Shaposhnik (security)
make default container-executor.conf.dir be a path relative to the container-executor binary
MAPREDUCE-4202. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (test)
TestYarnClientProtocolProvider is broken
MAPREDUCE-4193. Major bug reported by Patrick Hunt and fixed by Patrick Hunt (documentation)
broken doc link for yarn-default.xml in site.xml
MAPREDUCE-4147. Major bug reported by Tom White and fixed by Tom White
YARN should not have a compile-time dependency on HDFS
MAPREDUCE-4138. Major improvement reported by Tom White and fixed by Tom White
Reduce memory usage of counters due to non-static nested classes
MAPREDUCE-4113. Major sub-task reported by Devaraj K and fixed by Devaraj K (mrv2 , test)
Fix tests org.apache.hadoop.mapred.TestClusterMRNotification
MAPREDUCE-4112. Major sub-task reported by Devaraj K and fixed by Devaraj K (mrv2 , test)
Fix tests org.apache.hadoop.mapred.TestClusterMapReduceTestCase
MAPREDUCE-4111. Major sub-task reported by Devaraj K and fixed by Devaraj K (mrv2 , test)
Fix tests in org.apache.hadoop.mapred.TestJobName
MAPREDUCE-4110. Major sub-task reported by Devaraj K and fixed by Devaraj K (mrv2 , test)
Fix tests in org.apache.hadoop.mapred.TestMiniMRClasspath & org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers
MAPREDUCE-4108. Major sub-task reported by Devaraj K and fixed by Devaraj K (mrv2)
Fix tests in org.apache.hadoop.util.TestRunJar
MAPREDUCE-4107. Major sub-task reported by Devaraj K and fixed by Devaraj K (mrv2)
Fix tests in org.apache.hadoop.ipc.TestSocketFactory
MAPREDUCE-4105. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv2)
Yarn RackResolver ignores rack configurations
MAPREDUCE-4103. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (documentation)
Fix HA docs for changes to shell command fencer args
MAPREDUCE-4098. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (test)
TestMRApps testSetClasspath fails
MAPREDUCE-4093. Major improvement reported by Devaraj K and fixed by Devaraj K (mrv2)
Improve RM WebApp start up when proxy address is not set
MAPREDUCE-4081. Blocker bug reported by Jason Lowe and fixed by Jason Lowe (build , mrv2)
TestMROutputFormat.java does not compile
MAPREDUCE-4076. Blocker bug reported by Devaraj K and fixed by Devaraj K (mrv2)
Stream job fails with ZipException when use yarn jar command
MAPREDUCE-4066. Minor bug reported by xieguiming and fixed by xieguiming (job submission , mrv2)
Reading "yarn.app.mapreduce.am.staging-dir" should supply the default value
MAPREDUCE-4057. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Devaraj K (contrib/raid)
Compilation error in RAID
MAPREDUCE-4008. Major bug reported by Devaraj K and fixed by Devaraj K (mrv2 , scheduler)
ResourceManager throws MetricsException on start up saying QueueMetrics MBean already exists
MAPREDUCE-4007. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (mrv2)
JobClient getJob(JobID) should return NULL if the job does not exist (for backwards compatibility)
MAPREDUCE-3991. Trivial improvement reported by Harsh J and fixed by Harsh J (documentation)
Streaming FAQ has some wrong instructions about input files splitting
MAPREDUCE-3989. Major improvement reported by Patrick Hunt and fixed by Patrick Hunt
cap space usage of default log4j rolling policy (mr specific changes)
MAPREDUCE-3974. Blocker bug reported by Arun C Murthy and fixed by Aaron T. Myers
TestSubmitJob in MR1 tests doesn't compile after HDFS-1623 merge
MAPREDUCE-3958. Major bug reported by Bikas Saha and fixed by Bikas Saha (mrv2)
RM: Remove RMNodeState and replace it with NodeState
MAPREDUCE-3955. Blocker improvement reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey (mrv2)
Replace ProtoOverHadoopRpcEngine with ProtobufRpcEngine.
MAPREDUCE-3952. Major bug reported by Zhenxiao Luo and fixed by Bhallamudi Venkata Siva Kamesh (mrv2)
In MR2, when Total input paths to process == 1, CombineFileInputFormat.getSplits() returns 0 splits.
MAPREDUCE-3935. Major improvement reported by Tom White and fixed by Tom White (client)
Annotate Counters.Counter and Counters.Group as @Public
MAPREDUCE-3933. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv2 , test)
Failures because MALLOC_ARENA_MAX is not set
MAPREDUCE-3916. Critical bug reported by Roman Shaposhnik and fixed by Devaraj K (mrv2 , resourcemanager , webapps)
various issues with running yarn proxyserver
MAPREDUCE-3909. Trivial improvement reported by Steve Loughran and fixed by Steve Loughran (mrv2)
javadoc the Service interfaces
MAPREDUCE-3885. Major improvement reported by Devaraj Das and fixed by Devaraj Das (mrv2)
Apply the fix similar to HADOOP-8084
MAPREDUCE-3883. Minor improvement reported by Eugene Koontz and fixed by Eugene Koontz (documentation , mrv2)
Document yarn.nodemanager.delete.debug-delay-sec configuration property
MAPREDUCE-3869. Blocker bug reported by Devaraj K and fixed by Devaraj K (mrv2)
Distributed shell application fails with NoClassDefFoundError
MAPREDUCE-3867. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (test)
MiniMRYarn/MiniYarn uses fixed ports
MAPREDUCE-3818. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Suresh Srinivas (build , test)
Trunk MRV1 compilation is broken.

Fixed broken compilation in TestSubmitJob after the patch for HDFS-2895.
MAPREDUCE-3740. Blocker bug reported by Devaraj K and fixed by Devaraj K (mrv2)
Mapreduce Trunk compilation fails
MAPREDUCE-3578. Major bug reported by Gilad Wolff and fixed by Tom White (nodemanager)
starting nodemanager as 'root' gives "Unknown -jvm option"
MAPREDUCE-3545. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas
Remove Avro RPC
MAPREDUCE-3431. Minor bug reported by Steve Loughran and fixed by Steve Loughran (resourcemanager)
NPE in Resource Manager shutdown
MAPREDUCE-3377. Major bug reported by Jane Chen and fixed by Jane Chen
Compatibility issue with 0.20.203.
MAPREDUCE-3353. Major bug reported by Vinod Kumar Vavilapalli and fixed by Bikas Saha (applicationmaster , mrv2 , resourcemanager)
Need a RM->AM channel to inform AMs about faulty/unhealthy/lost nodes
MAPREDUCE-3173. Critical bug reported by Devaraj K and fixed by Devaraj K (mrv2)
MRV2 UI doesn't work properly without internet
MAPREDUCE-2942. Critical bug reported by Vinod Kumar Vavilapalli and fixed by Thomas Graves
TestNMAuditLogger.testNMAuditLoggerWithIP failing
MAPREDUCE-2934. Major improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (mrv2)
MR portion of HADOOP-7607 - Simplify the RPC proxy cleanup process
MAPREDUCE-2887. Major improvement reported by Sanjay Radia and fixed by Sanjay Radia
MR changes to match HADOOP-7524 (multiple RPC protocols)
HDFS-3418. Minor improvement reported by Eli Collins and fixed by Eli Collins
Rename BlockWithLocationsProto datanodeIDs field to storageIDs
HDFS-3396. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (fuse-dfs)
FUSE build fails on Ubuntu 12.04
HDFS-3395. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (name-node)
NN doesn't start with HA+security enabled and HTTP address set to 0.0.0.0
HDFS-3378. Trivial improvement reported by Eli Collins and fixed by Eli Collins
Remove DFS_NAMENODE_SECONDARY_HTTPS_PORT_KEY and DEFAULT
HDFS-3376. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (hdfs client)
DFSClient fails to make connection to DN if there are many unusable cached sockets
HDFS-3375. Trivial improvement reported by Todd Lipcon and fixed by Todd Lipcon (data-node)
Put client name in DataXceiver thread name for readBlock and keepalive
HDFS-3365. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (hdfs client)
Enable users to disable socket caching in DFS client configuration
HDFS-3363. Minor sub-task reported by John George and fixed by John George (name-node)
blockmanagement should stop using INodeFile & INodeFileUC
HDFS-3357. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (data-node)
DataXceiver reads from client socket with incorrect/no timeout
HDFS-3351. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (name-node)
NameNode#initializeGenericKeys should always set fs.defaultFS regardless of whether HA or Federation is enabled
HDFS-3350. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
findbugs warning: INodeFileUnderConstruction doesn't override INodeFile.equals(Object)
HDFS-3339. Minor sub-task reported by John George and fixed by John George (name-node)
change INode to package private
HDFS-3336. Minor bug reported by Roman Shaposhnik and fixed by Roman Shaposhnik (scripts)
hdfs launcher script will be better off not special casing namenode command with regards to hadoop.security.logger
HDFS-3332. Major bug reported by amith and fixed by amith (data-node)
NullPointerException in DN when directoryscanner is trying to report bad blocks
HDFS-3330. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
If GetImageServlet throws an Error or RTE, response has HTTP "OK" status
HDFS-3328. Minor bug reported by Uma Maheswara Rao G and fixed by Eli Collins (data-node)
NPE in DataNode.getIpcPort
HDFS-3326. Trivial bug reported by J.Andreina and fixed by Matthew Jacobs (name-node)
Append enabled log message uses the wrong variable
HDFS-3322. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (hdfs client)
Update file context to use HdfsDataInputStream and HdfsDataOutputStream
HDFS-3319. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (hdfs client)
DFSOutputStream should not start a thread in constructors
HDFS-3314. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
HttpFS operation for getHomeDirectory is incorrect
HDFS-3309. Major bug reported by Romain Rigaux and fixed by Alejandro Abdelnur
HttpFS (Hoop) chmod not supporting octal and sticky bit permissions
HDFS-3305. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (ha , name-node)
GetImageServlet should consider SBN a valid requestor in a secure HA setup
HDFS-3303. Minor bug reported by Brandon Li and fixed by Brandon Li (name-node)
RemoteEditLogManifest doesn't need to implement Writable
HDFS-3298. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (hdfs client)
Add HdfsDataOutputStream as a public API
HDFS-3294. Trivial improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node , name-node)
Fix indentation in NamenodeWebHdfsMethods and DatanodeWebHdfsMethods
HDFS-3286. Major bug reported by J.Andreina and fixed by Ashish Singhi (balancer)
When the threshold value for balancer is 0 (zero), unexpected output is displayed
HDFS-3284. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (ha , security)
bootstrapStandby fails in secure cluster
HDFS-3282. Major sub-task reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (hdfs client)
Add HdfsDataInputStream as a public API
HDFS-3280. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (hdfs client)
DFSOutputStream.sync should not be synchronized
HDFS-3279. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Arpit Gupta (name-node)
One of the FSEditLog constructors should be moved to TestEditLog
HDFS-3275. Major bug reported by Vinithra Varadharajan and fixed by amith (ha , name-node)
Format command overwrites contents of non-empty shared edits dir if name dirs are empty without any prompting
HDFS-3268. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (ha , hdfs client)
Hdfs mishandles token service & incompatible with HA
HDFS-3263. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
HttpFS should read HDFS config from Hadoop site.xml files
HDFS-3260. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers
TestDatanodeRegistration should set minimum DN version in addition to minimum NN version
HDFS-3259. Major improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (ha , name-node)
NameNode#initializeSharedEdits should populate shared edits dir with edit log segments
HDFS-3256. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers
HDFS considers blocks under-replicated if topology script is configured with only 1 rack
HDFS-3255. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (ha , hdfs client)
HA DFS returns wrong token service
HDFS-3254. Major bug reported by Anupam Seth and fixed by Anupam Seth (fuse-dfs)
Branch-2 build broken due to wrong version number in fuse-dfs' pom.xml
HDFS-3249. Trivial improvement reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Use ToolRunner.confirmPrompt in NameNode
HDFS-3248. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
bootstrapStandby repeated twice in hdfs namenode usage message
HDFS-3247. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (ha)
Improve bootstrapStandby behavior when original NN is not active
HDFS-3244. Major improvement reported by Eli Collins and fixed by Eli Collins
Remove dead writable code from hdfs/protocol
HDFS-3240. Trivial improvement reported by Todd Lipcon and fixed by Todd Lipcon (data-node)
Drop log level of "heartbeat: ..." in BPServiceActor to DEBUG
HDFS-3238. Major improvement reported by Eli Collins and fixed by Eli Collins
ServerCommand and friends don't need to be writables
HDFS-3236. Minor bug reported by Aaron T. Myers and fixed by Aaron T. Myers (ha , name-node)
NameNode does not initialize generic conf keys when started with -initializeSharedEditsDir
HDFS-3234. Trivial bug reported by Todd Lipcon and fixed by Todd Lipcon (tools)
Accidentally left log message in GetConf after HDFS-3226
HDFS-3226. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (tools)
Allow GetConf tool to print arbitrary keys
HDFS-3222. Major bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (hdfs client)
DFSInputStream#openInfo should not silently get the length as 0 when locations length is zero for last partial block.
HDFS-3214. Blocker bug reported by Todd Lipcon and fixed by Todd Lipcon (data-node)
InterDatanodeProtocolServerSideTranslatorPB doesn't handle null response from initReplicaRecovery
HDFS-3211. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (ha , name-node)
JournalProtocol changes required for introducing epoch and fencing
HDFS-3210. Major bug reported by Eli Collins and fixed by Eli Collins
JsonUtil#toJsonMap for a DatanodeInfo should use "ipAddr" instead of "name"
HDFS-3208. Major bug reported by Eli Collins and fixed by Eli Collins (name-node)
Bogus entries in hosts files are incorrectly displayed in the report
HDFS-3204. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (name-node)
Minor modification to JournalProtocol.proto to make it generic
HDFS-3202. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (data-node)
NamespaceInfo PB translation drops build version
HDFS-3199. Major bug reported by Eli Collins and fixed by Todd Lipcon
TestValidateConfigurationSettings is failing
HDFS-3187. Minor sub-task reported by Todd Lipcon and fixed by Todd Lipcon (build)
Upgrade guava to 11.0.2
HDFS-3181. Minor bug reported by Colin Patrick McCabe and fixed by Tsz Wo (Nicholas), SZE (test)
testHardLeaseRecoveryAfterNameNodeRestart fails when length before restart is 1 byte less than CRC chunk size
HDFS-3179. Major improvement reported by Zhanwei.Wang and fixed by Tsz Wo (Nicholas), SZE (hdfs client)
Improve the error message: DataStreamer throws an exception, "nodes.length != original.length + 1" on single datanode cluster
HDFS-3172. Trivial improvement reported by Eli Collins and fixed by Eli Collins (name-node)
dfs.upgrade.permission is dead code
HDFS-3171. Major improvement reported by Eli Collins and fixed by Eli Collins (data-node)
The DatanodeID "name" field is overloaded
HDFS-3169. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (test)
TestFsck should test multiple -move operations in a row
HDFS-3167. Minor new feature reported by Henry Robinson and fixed by Henry Robinson (test)
CLI-based driver for MiniDFSCluster
HDFS-3164. Major improvement reported by Eli Collins and fixed by Eli Collins (data-node)
Move DatanodeInfo#hostName to DatanodeID

This change modifies DatanodeID, which is part of the client-to-server protocol; therefore, clients must be upgraded with servers.
HDFS-3160. Major bug reported by Roman Shaposhnik and fixed by Roman Shaposhnik (scripts)
httpfs should exec catalina instead of forking it
HDFS-3158. Major improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (name-node)
LiveNodes member of NameNodeMXBean should list non-DFS used space and capacity per DN
HDFS-3156. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (test)
TestDFSHAAdmin is failing post HADOOP-8202
HDFS-3155. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node)
Clean up FSDataset implementation-related code.
HDFS-3148. Major new feature reported by Eli Collins and fixed by Eli Collins (hdfs client , performance)
The client should be able to use multiple local interfaces for data transfer
HDFS-3144. Major improvement reported by Eli Collins and fixed by Eli Collins (data-node)
Refactor DatanodeID#getName by use

This change modifies DatanodeID, which is part of the client-to-server protocol; therefore, clients must be upgraded with servers.
HDFS-3143. Major bug reported by Eli Collins and fixed by Arpit Gupta (test)
TestGetBlocks.testGetBlocks is failing
HDFS-3142. Blocker bug reported by Eli Collins and fixed by Brandon Li (test)
TestHDFSCLI.testAll is failing
HDFS-3139. Minor improvement reported by Eli Collins and fixed by Eli Collins (data-node)
Minor Datanode logging improvement
HDFS-3138. Major improvement reported by Eli Collins and fixed by Eli Collins
Move DatanodeInfo#ipcPort to DatanodeID

This change modifies DatanodeID, which is part of the client-to-server protocol; therefore, clients must be upgraded with servers.
HDFS-3137. Major improvement reported by Eli Collins and fixed by Eli Collins (name-node)
Bump LAST_UPGRADABLE_LAYOUT_VERSION to -16

Upgrade from Hadoop versions earlier than 0.18 is not supported as of 2.0. To upgrade from an earlier release, first upgrade to 0.18, and then upgrade again from there.
HDFS-3132. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Findbugs warning on HDFS trunk
HDFS-3130. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node)
Move FSDataset implementation to a package
HDFS-3129. Minor test reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
NetworkTopology: add test that getLeaf should check for invalid topologies
HDFS-3121. Major bug reported by John George and fixed by John George
hdfs tests for HADOOP-8014
HDFS-3120. Major improvement reported by Eli Collins and fixed by Eli Collins
Enable hsync and hflush by default
HDFS-3119. Minor bug reported by J.Andreina and fixed by Ashish Singhi (name-node)
Overreplicated block is not deleted even after the replication factor is reduced after sync followed by closing that file
HDFS-3111. Trivial task reported by Todd Lipcon and fixed by Uma Maheswara Rao G
Missing license headers in trunk
HDFS-3109. Major bug reported by Ravi Prakash and fixed by Ravi Prakash
Remove hsqldb exclusions from pom.xml
HDFS-3105. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node , hdfs client)
Add DatanodeStorage information to block recovery
HDFS-3102. Major new feature reported by Todd Lipcon and fixed by Aaron T. Myers (ha , name-node)
Add CLI tool to initialize the shared-edits dir
HDFS-3101. Major bug reported by Zhanwei.Wang and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
cannot read empty file using webhdfs
HDFS-3100. Major bug reported by Zhanwei.Wang and fixed by Brandon Li (data-node)
failed to append data
HDFS-3099. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (name-node)
SecondaryNameNode does not properly initialize metrics system
HDFS-3094. Major improvement reported by Arpit Gupta and fixed by Arpit Gupta
add -nonInteractive and -force option to namenode -format command

The 'namenode -format' command now supports the '-nonInteractive' and '-force' flags, so that it can be run without user input.
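The interaction of the two flags can be sketched as a small decision function. This assumes the commonly documented semantics: '-force' formats even if the name directory already holds data, and '-nonInteractive' aborts instead of prompting in that case unless '-force' is also given. The class, enum, and parameter names are hypothetical and do not mirror the NameNode code.

```java
// Illustrative only: FormatDecision is a hypothetical model of the flag semantics.
final class FormatDecision {
    enum Action { FORMAT, PROMPT, ABORT }

    // dirHasData: the name directory already contains a formatted file system.
    static Action decide(boolean dirHasData, boolean force, boolean nonInteractive) {
        if (!dirHasData || force) {
            return Action.FORMAT;          // nothing to overwrite, or overwrite was requested
        }
        // Existing data and no -force: either ask the user or give up.
        return nonInteractive ? Action.ABORT : Action.PROMPT;
    }
}
```

In this model, a scripted run with only '-nonInteractive' never overwrites an existing name directory, which is the safer choice for automation.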
HDFS-3093. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon
TestAllowFormat is trying to be interactive
HDFS-3091. Major improvement reported by Uma Maheswara Rao G and fixed by Tsz Wo (Nicholas), SZE (data-node , hdfs client , name-node)
Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
HDFS-3089. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node)
Move FSDatasetInterface and other related classes/interfaces to a package
HDFS-3088. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node)
Move FSDatasetInterface inner classes to a package
HDFS-3086. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node)
Change Datanode not to send storage list in registration - it will be sent in block report
HDFS-3084. Major improvement reported by Philip Zeyliger and fixed by Todd Lipcon (ha)
FenceMethod.tryFence() and ShellCommandFencer should pass namenodeId as well as host:port
HDFS-3083. Critical bug reported by Mingjie Lai and fixed by Aaron T. Myers (ha , security)
Cannot run an MR job with HA and security enabled when second-listed NN active
HDFS-3082. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node)
Clean up FSDatasetInterface
HDFS-3071. Major improvement reported by Philip Zeyliger and fixed by Todd Lipcon (ha)
haadmin failover command does not provide enough detail for when target NN is not ready to be active
HDFS-3070. Major bug reported by Stephen Chu and fixed by Aaron T. Myers (balancer)
HDFS balancer doesn't ensure that hdfs-site.xml is loaded
HDFS-3066. Major improvement reported by Patrick Hunt and fixed by Patrick Hunt (scripts)
cap space usage of default log4j rolling policy (hdfs specific changes)
HDFS-3062. Critical bug reported by Mingjie Lai and fixed by Mingjie Lai (ha , security)
Fail to submit mapred job on a secured-HA-HDFS: logical URI cannot be picked up by job submission.
HDFS-3057. Major bug reported by Roman Shaposhnik and fixed by Roman Shaposhnik (scripts)
httpfs and hdfs launcher scripts should honor CATALINA_HOME and HADOOP_LIBEXEC_DIR
HDFS-3056. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node)
Add an interface for DataBlockScanner logging
HDFS-3050. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (name-node)
rework OEV to share more code with the NameNode
HDFS-3044. Major improvement reported by Eli Collins and fixed by Colin Patrick McCabe (name-node)
fsck move should be non-destructive by default

The fsck "move" option is no longer destructive. It copies the accessible blocks of corrupt files to lost and found as before, but no longer deletes the corrupt files after copying the blocks. The original, destructive behavior can be enabled by specifying both the "move" and "delete" options.
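The option semantics described for HDFS-3044 can be modelled as follows. This is a sketch of the documented behavior only; the class name, method name, and step descriptions are hypothetical, and treating "delete" on its own as a plain deletion is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: lists the steps fsck would take for a corrupt file
// under a given combination of options.
final class FsckMoveSketch {
    static List<String> stepsFor(boolean move, boolean delete) {
        List<String> steps = new ArrayList<>();
        if (move) {
            // "move" alone is now non-destructive: it only copies.
            steps.add("copy accessible blocks to lost and found");
        }
        if (delete) {
            // The corrupt file is removed only when "delete" is requested.
            steps.add("delete corrupt file");
        }
        return steps;
    }
}
```

Calling stepsFor(true, false) yields only the copy step, while stepsFor(true, true) reproduces the original copy-then-delete behavior.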
HDFS-3038. Trivial bug reported by Todd Lipcon and fixed by Todd Lipcon
Add FSEditLog.metrics to findbugs exclude list
HDFS-3036. Trivial improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (name-node)
Remove unused method DFSUtil#isDefaultNamenodeAddress
HDFS-3032. Major bug reported by Kihwal Lee and fixed by Kihwal Lee (hdfs client)
Lease renewer tries forever even if renewal is not possible
HDFS-3030. Major improvement reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
Remove getProtocolVersion and getProtocolSignature from translators
HDFS-3026. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (ha , name-node)
HA: Handle failure during HA state transition
HDFS-3024. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Improve performance of stringification in addStoredBlock
HDFS-3021. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node)
Use generic type to declare FSDatasetInterface
HDFS-3020. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Auto-logSync based on edit log buffer size broken
HDFS-3014. Major improvement reported by Sho Shimauchi and fixed by Sho Shimauchi (name-node)
FSEditLogOp and its subclasses should have toString() method
HDFS-3005. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node)
ConcurrentModificationException in FSDataset$FSVolume.getDfsUsed(..)
HDFS-3004. Major new feature reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (tools)
Implement Recovery Mode

This is a new feature. It is documented in hdfs_user_guide.xml.
HDFS-3003. Trivial improvement reported by Brandon Li and fixed by Brandon Li (name-node)
Remove getHostPortString() from NameNode, replace it with NetUtils.getHostPortString()
HDFS-3000. Major new feature reported by Aaron T. Myers and fixed by Aaron T. Myers (hdfs client)
Add a public API for setting quotas
HDFS-2995. Major bug reported by Todd Lipcon and fixed by Eli Collins (scripts)
start-dfs.sh should only start the 2NN for namenodes with dfs.namenode.secondary.http-address configured
HDFS-2983. Major improvement reported by Eli Collins and fixed by Aaron T. Myers
Relax the build version check to permit rolling upgrades within a release
HDFS-2968. Blocker bug reported by Todd Lipcon and fixed by Todd Lipcon (data-node , name-node)
Protocol translator for BlockRecoveryCommand broken when multiple blocks need recovery
HDFS-2941. Major new feature reported by Aaron T. Myers and fixed by Aaron T. Myers (hdfs client , name-node)
Add an administrative command to download a copy of the fsimage from the NN
HDFS-2899. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas
Service protocol change to support multiple storages added in HDFS-2880
HDFS-2895. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (data-node , name-node)
Remove Writable wire protocol related code that is no longer necessary
HDFS-2880. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (data-node , name-node)
Protocol buffer changes to add support for multiple storages
HDFS-2878. Blocker bug reported by Eli Collins and fixed by Todd Lipcon (test)
TestBlockRecovery does not compile
HDFS-2815. Critical bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (name-node)
Namenode is not coming out of safemode when we perform (NN crash + restart). Also FSCK report shows blocks missed.
HDFS-2801. Major sub-task reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
Provide a method in client side translators to check for methods supported in the underlying protocol.
HDFS-2799. Major bug reported by Eli Collins and fixed by amith (name-node)
Trim fs.checkpoint.dir values
HDFS-2768. Major bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (name-node)
BackupNode stop can not close proxy connections because it is not a proxy instance.
HDFS-2765. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (test)
TestNameEditsConfigs is incorrectly swallowing IOE
HDFS-2739. Critical bug reported by Sho Shimauchi and fixed by Jitendra Nath Pandey
SecondaryNameNode doesn't start up
HDFS-2731. Major new feature reported by Aaron T. Myers and fixed by Todd Lipcon (ha)
HA: Autopopulate standby name dirs if they're empty

The HA NameNode may now be started with the "-bootstrapStandby" flag. This causes it to copy the namespace information and most recent checkpoint from its HA pair, and save it to local storage, allowing an HA setup to be bootstrapped without use of rsync or external tools.
HDFS-2708. Minor improvement reported by Eli Collins and fixed by Aaron T. Myers (data-node , name-node)
Stats for the # of blocks per DN
HDFS-2700. Major bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G
TestDataNodeMultipleRegistrations is failing in trunk
HDFS-2697. Major sub-task reported by Suresh Srinivas and fixed by Jitendra Nath Pandey
Move RefreshAuthPolicy, RefreshUserMappings, GetUserMappings protocol to protocol buffers
HDFS-2696. Major bug reported by Petru Dimulescu and fixed by Bruno Mahé (build , fuse-dfs)
Fix the fuse-dfs build
HDFS-2694. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (name-node)
Removal of Avro broke non-PB NN services
HDFS-2687. Major sub-task reported by Uma Maheswara Rao G and fixed by Suresh Srinivas (test)
Tests are failing with ClassCastException, due to new protocol changes
HDFS-2676. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas
Remove Avro RPC
HDFS-2669. Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia
Enable protobuf rpc for ClientNamenodeProtocol
HDFS-2666. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (test)
TestBackupNode fails
HDFS-2663. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas
Optional parameters are not handled correctly
HDFS-2661. Major sub-task reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
Enable protobuf RPC for DatanodeProtocol
HDFS-2651. Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia
ClientNameNodeProtocol Translators for Protocol Buffers
HDFS-2650. Minor improvement reported by Hari Mankude and fixed by Hari Mankude
Replace @inheritDoc with @Override
HDFS-2647. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (balancer , data-node , hdfs client , name-node)
Enable protobuf RPC for InterDatanodeProtocol, ClientDatanodeProtocol, JournalProtocol and NamenodeProtocol
HDFS-2642. Major sub-task reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
Protobuf translators for DatanodeProtocol
HDFS-2636. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas
Implement protobuf service for ClientDatanodeProtocol
HDFS-2629. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (data-node)
Implement protobuf service for InterDatanodeProtocol
HDFS-2618. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (name-node)
Implement protobuf service for NamenodeProtocol
HDFS-2597. Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia
ClientNameNodeProtocol in Protocol Buffers
HDFS-2581. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (name-node)
Implement protobuf service for JournalProtocol
HDFS-2532. Critical bug reported by Todd Lipcon and fixed by Uma Maheswara Rao G (test)
TestDfsOverAvroRpc timing out in trunk
HDFS-2526. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (hdfs client , name-node)
(Client)NamenodeProtocolTranslatorR23 do not need to keep a reference to rpcProxyWithoutRetry
HDFS-2520. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (data-node)
Protobuf - Add protobuf service for InterDatanodeProtocol
HDFS-2519. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (data-node , name-node)
Protobuf - Add protobuf service for DatanodeProtocol
HDFS-2518. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (name-node)
Protobuf - Add protobuf service for NamenodeProtocol
HDFS-2517. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (name-node)
Protobuf - Add protocol service for JournalProtocol
HDFS-2507. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
HA: Allow saveNamespace operations to be canceled
HDFS-2505. Minor test reported by Ravi Prakash and fixed by Ravi Prakash (test)
Add a test to verify getFileChecksum works with ViewFS
HDFS-2499. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (name-node)
Fix RPC client creation bug from HDFS-2459
HDFS-2497. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas
Fix TestBackupNode failure
HDFS-2496. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas
Separate datatypes for DatanodeProtocol
HDFS-2495. Major sub-task reported by Tomasz Nykiel and fixed by Tomasz Nykiel (name-node)
Increase granularity of write operations in ReplicationMonitor thus reducing contention for write lock
HDFS-2489. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas
Move commands Finalize and Register out of DatanodeCommand class.
HDFS-2488. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (data-node)
Separate datatypes for InterDatanodeProtocol
HDFS-2481. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Sanjay Radia
Unknown protocol: org.apache.hadoop.hdfs.protocol.ClientProtocol
HDFS-2480. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas
Separate datatypes for NamenodeProtocol
HDFS-2479. Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia
HDFS Client Data Types in Protocol Buffers
HDFS-2477. Major sub-task reported by Tomasz Nykiel and fixed by Tomasz Nykiel (name-node)
Optimize computing the diff between a block report and the namenode state.
HDFS-2476. Major sub-task reported by Tomasz Nykiel and fixed by Tomasz Nykiel (name-node)
More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks
HDFS-2459. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas
Separate datatypes for Journal protocol
HDFS-2430. Major new feature reported by Aaron T. Myers and fixed by Aaron T. Myers (name-node)
The number of failed or low-resource volumes the NN can tolerate should be configurable
HDFS-2413. Major improvement reported by Todd Lipcon and fixed by Harsh J (hdfs client)
Add public APIs for safemode
HDFS-2410. Minor improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (data-node , name-node , test)
Further clean up hard-coded configuration keys
HDFS-2351. Major improvement reported by Sanjay Radia and fixed by Sanjay Radia
Change Namenode and Datanode to register each of their protocols separately
HDFS-2337. Major improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (hdfs client)
DFSClient shouldn't keep multiple RPC proxy references
HDFS-2334. Major sub-task reported by Ivan Kelly and fixed by Ivan Kelly (name-node)
Add Closeable to JournalManager
HDFS-2303. Major improvement reported by Roman Shaposhnik and fixed by Mingjie Lai (build , scripts)
Unbundle jsvc

To run secure Datanodes users must install jsvc for their platform and set JSVC_HOME to point to the location of jsvc in their environment.
HDFS-2223. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Untangle dependencies between NN components
HDFS-2181. Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia
Separate HDFS Client wire protocol data types
HDFS-2158. Major sub-task reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
Add JournalSet to manage the set of journals.
HDFS-2038. Critical test reported by Daryn Sharp and fixed by Kihwal Lee (test)
Update test to handle relative paths with globs
HDFS-2018. Major sub-task reported by Ivan Kelly and fixed by Ivan Kelly
1073: Move all journal stream management code into one place
HDFS-1765. Major bug reported by Hairong Kuang and fixed by Uma Maheswara Rao G (name-node)
Block Replication should respect under-replication block priority
HDFS-1623. Major new feature reported by Sanjay Radia and fixed by
High Availability Framework for HDFS NN
HDFS-1580. Major improvement reported by Ivan Kelly and fixed by Jitendra Nath Pandey (name-node)
Add interface for generic Write Ahead Logging mechanisms
HDFS-891. Minor bug reported by Steve Loughran and fixed by Harsh J (data-node)
DataNode no longer needs to check for dfs.network.script
HDFS-860. Minor wish reported by Brian Bockelman and fixed by Brian Bockelman (fuse-dfs)
fuse-dfs truncate behavior causes issues with scp
HDFS-395. Major sub-task reported by dhruba borthakur and fixed by Tomasz Nykiel (data-node , name-node)
DFS Scalability: Incremental block reports
HDFS-309. Major improvement reported by Todd Lipcon and fixed by Sho Shimauchi
FSEditLog should log progress during replay
HDFS-234. Major new feature reported by Luca Telloli and fixed by Ivan Kelly
Integration with BookKeeper logging system
HDFS-208. Minor improvement reported by Allen Wittenauer and fixed by Uma Maheswara Rao G (name-node)
name node should warn if only one dir is listed in dfs.name.dir
HADOOP-8619. Major improvement reported by Radim Kolar and fixed by Chris Douglas (io)
WritableComparator must implement no-arg constructor
HADOOP-8398. Minor improvement reported by Eli Collins and fixed by Eli Collins
Cleanup BlockLocation
HADOOP-8388. Minor improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
Remove unused BlockLocation serialization
HADOOP-8366. Blocker improvement reported by Sanjay Radia and fixed by Sanjay Radia
Use ProtoBuf for RpcResponseHeader
HADOOP-8359. Trivial task reported by Harsh J and fixed by Anupam Seth (conf)
Clear up javadoc warnings in hadoop-common-project
HADOOP-8356. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (fs)
FileSystem service loading mechanism should print the FileSystem impl it is failing to load
HADOOP-8355. Minor bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
SPNEGO filter throws/logs exception when authentication fails
HADOOP-8353. Major improvement reported by Roman Shaposhnik and fixed by Roman Shaposhnik (scripts)
hadoop-daemon.sh and yarn-daemon.sh can be misleading on stop
HADOOP-8350. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (util)
Improve NetUtils.getInputStream to return a stream which has a tunable timeout
HADOOP-8349. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (viewfs)
ViewFS doesn't work when the root of a file system is mounted
HADOOP-8347. Major bug reported by Philip Zeyliger and fixed by Philip Zeyliger (security)
Hadoop Common logs misspell 'successful'
HADOOP-8343. Major new feature reported by Philip Zeyliger and fixed by Alejandro Abdelnur (util)
Allow configuration of authorization for JmxJsonServlet and MetricsServlet
HADOOP-8314. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
HttpServer#hasAdminAccess should return false if authorization is enabled but user is not authenticated
HADOOP-8310. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (fs)
FileContext#checkPath should handle URIs with no port
HADOOP-8309. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
Pseudo & Kerberos AuthenticationHandler should use getType() to create token
HADOOP-8296. Minor bug reported by Thomas Graves and fixed by Devaraj K
hadoop/yarn daemonlog usage wrong
HADOOP-8285. Major improvement reported by Sanjay Radia and fixed by Sanjay Radia (ipc)
Use ProtoBuf for RpcPayLoadHeader
HADOOP-8282. Minor bug reported by Devaraj K and fixed by Devaraj K (scripts)
start-all.sh incorrectly checks for the existence of start-dfs.sh before starting start-yarn.sh
HADOOP-8280. Major improvement reported by Ahmed Radwan and fixed by Ahmed Radwan (test , util)
Move VersionUtil/TestVersionUtil and GenericTestUtils from HDFS into Common.
HADOOP-8275. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
Range check DelegationKey length
HADOOP-8270. Minor bug reported by Roman Shaposhnik and fixed by Roman Shaposhnik (scripts)
hadoop-daemon.sh stop action should return 0 for an already stopped service

The daemon stop action no longer returns failure when stopping an already stopped service.
HADOOP-8264. Trivial bug reported by Bernd Fondermann and fixed by Bernd Fondermann
Remove irritating double double quotes in front of hostname
HADOOP-8263. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (ipc)
Stringification of IPC calls not useful
HADOOP-8261. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (fs)
Har file system doesn't deal with FS URIs with a host but no port
HADOOP-8251. Blocker bug reported by Todd Lipcon and fixed by Todd Lipcon (security)
SecurityUtil.fetchServiceTicket broken after HADOOP-6941
HADOOP-8243. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (ha , security)
Security support broken in CLI (manual) failover controller
HADOOP-8238. Major bug reported by Eli Collins and fixed by Eli Collins
NetUtils#getHostNameOfIP blows up if given ip:port string w/o port
HADOOP-8236. Major improvement reported by Philip Zeyliger and fixed by Todd Lipcon (ha)
haadmin should have configurable timeouts for failover commands
HADOOP-8218. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (ipc , test)
RPC.closeProxy shouldn't throw error when closing a mock
HADOOP-8214. Major improvement reported by Roman Shaposhnik and fixed by Roman Shaposhnik (scripts)
make hadoop script recognize a full set of deprecated commands
HADOOP-8211. Major sub-task reported by Eli Collins and fixed by Eli Collins (io , performance)
Update commons-net version to 3.1
HADOOP-8210. Major sub-task reported by Eli Collins and fixed by Eli Collins (io , performance)
Common side of HDFS-3148
HADOOP-8206. Major new feature reported by Todd Lipcon and fixed by Todd Lipcon (ha)
Common portion of ZK-based failover controller
HADOOP-8204. Major bug reported by Tom White and fixed by Todd Lipcon
TestHealthMonitor fails occasionally
HADOOP-8202. Minor bug reported by Hari Mankude and fixed by Hari Mankude (ipc)
stopproxy() is not closing the proxies correctly
HADOOP-8200. Minor improvement reported by Eli Collins and fixed by Eli Collins (conf)
Remove HADOOP_[JOBTRACKER|TASKTRACKER]_OPTS
HADOOP-8199. Major bug reported by Nishan Shetty and fixed by Devaraj K
Fix issues in start-all.sh and stop-all.sh
HADOOP-8193. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (ha)
Refactor FailoverController/HAAdmin code to add an abstract class for "target" services
HADOOP-8191. Major bug reported by Philip Zeyliger and fixed by Todd Lipcon (ha)
SshFenceByTcpPort uses netcat incorrectly
HADOOP-8189. Major bug reported by Jonathan Natkins and fixed by Jonathan Natkins (security)
LdapGroupsMapping shouldn't throw away IOException
HADOOP-8185. Major improvement reported by Arpit Gupta and fixed by Arpit Gupta (documentation)
Update namenode -format documentation and add -nonInteractive and -force
HADOOP-8184. Major improvement reported by Sanjay Radia and fixed by Sanjay Radia (ipc)
ProtoBuf RPC engine does not need its own reply packet - it can use the IPC layer reply packet.

This change will affect the output of errors for some Hadoop CLI commands. Specifically, the name of the exception class will no longer appear, and instead only the text of the exception message will appear.
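A rough illustration of the difference in CLI error text described for HADOOP-8184. The exact format of the old output is an assumption (class name, colon, message), and the class and method names below are hypothetical; only the statement that the exception class name no longer appears comes from the note.

```java
import java.io.FileNotFoundException;

// Illustrative only: contrasts the two styles of error text.
final class CliErrorFormat {
    // Assumed shape of the old output: exception class name followed by the message.
    static String withClassName(Exception e) {
        return e.getClass().getName() + ": " + e.getMessage();
    }

    // New behavior per the note: only the exception message text.
    static String messageOnly(Exception e) {
        return e.getMessage();
    }

    // Hypothetical sample exception for demonstration.
    static Exception sample() {
        return new FileNotFoundException("File does not exist: /tmp/x");
    }
}
```

Scripts that match on the exception class name in command output would need to match on the message text instead.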
HADOOP-8183. Minor improvement reported by Harsh J and fixed by Harsh J (util)
Stop using "mapred.used.genericoptionsparser" to avoid unnecessary warnings
HADOOP-8169. Critical bug reported by Thomas Graves and fixed by Thomas Graves (build)
javadoc generation fails with java.lang.OutOfMemoryError: Java heap space
HADOOP-8164. Major sub-task reported by Suresh Srinivas and fixed by Daryn Sharp (fs)
Handle paths using back slash as path separator for windows only

This jira allows paths that use the backslash as a separator on Windows only. On *nix systems the backslash is treated as an escape character. Support for paths using the backslash as a path separator will be removed in HADOOP-8139 in release 23.3.
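The platform-dependent handling described for HADOOP-8164 could look roughly like the sketch below. The class and method names are hypothetical, and passing the platform as a parameter is a simplification so that both branches can be exercised; this is not the Hadoop Path implementation.

```java
// Illustrative only: PathSeparatorSketch is a hypothetical helper.
final class PathSeparatorSketch {
    // On Windows, treat '\' as a path separator and convert it to '/'.
    // Elsewhere, leave the string untouched so '\' keeps its escape meaning.
    static String normalize(String path, boolean isWindows) {
        return isWindows ? path.replace('\\', '/') : path;
    }
}
```

For example, normalize("C:\\data\\in", true) gives "C:/data/in", while the same call with isWindows set to false returns the input unchanged.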
HADOOP-8163. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (ha)
Improve ActiveStandbyElector to provide hooks for fencing old active
HADOOP-8159. Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
NetworkTopology: getLeaf should check for invalid topologies
HADOOP-8154. Major bug reported by Eli Collins and fixed by Eli Collins (conf)
DNS#getIPs shouldn't silently return the local host IP for bogus interface names
HADOOP-8152. Major improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (security)
Expand public APIs for security library classes
HADOOP-8149. Major improvement reported by Patrick Hunt and fixed by Patrick Hunt (conf)
cap space usage of default log4j rolling policy

Hadoop log files are now rolled by size instead of date (daily) by default. Tools that depend on the log file name format will need to be updated. Users who would like to maintain the previous settings of hadoop.root.logger and hadoop.security.logger can use their current log4j.properties files and update the HADOOP_ROOT_LOGGER and HADOOP_SECURITY_LOGGER environment variables to use DRFA and DRFAS respectively.
HADOOP-8142. Major task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (build)
Update versions from 0.23.2 to 0.23.3
HADOOP-8141. Trivial improvement reported by Todd Lipcon and fixed by Todd Lipcon (security)
Add method to init krb5 cipher suites
HADOOP-8121. Major new feature reported by Jonathan Natkins and fixed by Jonathan Natkins (security)
Active Directory Group Mapping Service
HADOOP-8119. Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (test)
Fix javac warnings in TestAuthenticationFilter
HADOOP-8118. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (metrics)
Print the stack trace of InstanceAlreadyExistsException in trace level
HADOOP-8117. Trivial improvement reported by Todd Lipcon and fixed by Todd Lipcon (build , test)
Upgrade test build to Surefire 2.12
HADOOP-8113. Trivial improvement reported by Eugene Koontz and fixed by Eugene Koontz (documentation)
Correction to BUILDING.txt: HDFS needs ProtocolBuffer, too (not just MapReduce)
HADOOP-8098. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
KerberosAuthenticatorHandler should use _HOST replacement to resolve principal name
HADOOP-8086. Minor improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
KerberosName silently sets defaultRealm to "" if the Kerberos config is not found; it should log a WARN
HADOOP-8084. Major improvement reported by Devaraj Das and fixed by Devaraj Das (ipc)
Protobuf RPC engine can be optimized to not do copying for the RPC request/response
HADOOP-8077. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (ha)
HA: fencing method should be able to be configured on a per-NN or per-NS basis
HADOOP-8070. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (benchmarks , ipc)
Add standalone benchmark of protobuf IPC
HADOOP-8007. Major improvement reported by Aaron T. Myers and fixed by Todd Lipcon (ha)
HA: use substitution token for fencing argument
HADOOP-7994. Major sub-task reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
Remove getProtocolVersion and getProtocolSignature from the client side translator and server side implementation
HADOOP-7968. Minor bug reported by Todd Lipcon and fixed by Sho Shimauchi (ipc)
Errant println left in RPC.getHighestSupportedProtocol
HADOOP-7965. Major sub-task reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey (ipc)
Support for protocol version and signature in PB
HADOOP-7957. Major improvement reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
Classes deriving GetGroupsBase should be able to override proxy creation.
HADOOP-7940. Major bug reported by Aaron, and fixed by Csaba Miklos (io)
method clear() in org.apache.hadoop.io.Text does not work
HADOOP-7931. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (ipc)
o.a.h.ipc.WritableRpcEngine should have a way to force initialization
HADOOP-7920. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (ipc)
Remove Avro RPC
HADOOP-7913. Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (ipc)
Fix bug in ProtoBufRpcEngine -
HADOOP-7900. Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (fs)
LocalDirAllocator confChanged() accesses conf.get() twice
HADOOP-7899. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
Generate proto java files as part of the build
HADOOP-7897. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (ipc)
ProtobufRPCEngine client side exception mechanism is not consistent with WritableRpcEngine
HADOOP-7892. Trivial bug reported by Todd Lipcon and fixed by Todd Lipcon (ipc)
IPC logs too verbose after "RpcKind" introduction
HADOOP-7888. Major bug reported by Jason Lowe and fixed by Jason Lowe (test)
TestFailoverProxy fails intermittently on trunk
HADOOP-7876. Major new feature reported by Suresh Srinivas and fixed by Suresh Srinivas (ipc)
Allow access to BlockKey/DelegationKey encoded key for RPC over protobuf
HADOOP-7875. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (ipc)
Add helper class to unwrap RemoteException from ServiceException thrown on protobuf based RPC
HADOOP-7862. Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (ipc)
Move the support for multiple protocols to lower layer so that Writable, PB and Avro can all use it
HADOOP-7833. Major bug reported by John Lee and fixed by John Lee (ipc)
Inner classes of org.apache.hadoop.ipc.protobuf.HadoopRpcProtos generate findbugs warnings, which results in -1 for findbugs
HADOOP-7806. Major new feature reported by Harsh J and fixed by Harsh J (util)
Support binding to sub-interfaces
HADOOP-7788. Major new feature reported by Todd Lipcon and fixed by Todd Lipcon (ha)
HA: Simple HealthMonitor class to watch an HAService
HADOOP-7776. Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (ipc)
Make the Ipc-Header in a RPC-Payload an explicit header
HADOOP-7773. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (ipc)
Add support for protocol buffer based RPC engine
HADOOP-7729. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (ipc)
Send back valid HTTP response if user hits IPC port with HTTP GET
HADOOP-7717. Major improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (ipc)
Move handling of concurrent client fail-overs to RetryInvocationHandler
HADOOP-7716. Minor improvement reported by Sanjay Radia and fixed by Sanjay Radia
RPC protocol registration on SS does not log the protocol name (only the class which may be different)
HADOOP-7695. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (ipc)
RPC.stopProxy can throw unintended exception while logging error
HADOOP-7693. Major improvement reported by Doug Cutting and fixed by Doug Cutting (ipc)
fix RPC.Server#addProtocol to work in AvroRpcEngine
HADOOP-7687. Minor improvement reported by Sanjay Radia and fixed by Sanjay Radia
Make getProtocolSignature public
HADOOP-7635. Major improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (ipc)
RetryInvocationHandler should release underlying resources on close
HADOOP-7621. Critical bug reported by Alejandro Abdelnur and fixed by Aaron T. Myers (security)
alfredo config should be in a file not readable by users
HADOOP-7607. Major improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (ipc)
Simplify the RPC proxy cleanup process
HADOOP-7557. Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia
Make IPC header be extensible
HADOOP-7549. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (fs)
Use JDK ServiceLoader mechanism to find FileSystem implementations
HADOOP-7524. Major sub-task reported by Sanjay Radia and fixed by Sanjay Radia (ipc)
Change RPC to allow multiple protocols including multiple versions of the same protocol
HADOOP-7454. Major new feature reported by Aaron T. Myers and fixed by
Common side of High Availability Framework (HDFS-1623)
HADOOP-7358. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (ipc)
Improve log levels when exceptions caught in RPC handler
HADOOP-7350. Major improvement reported by Tom White and fixed by Tom White (conf , io)
Use ServiceLoader to discover compression codec classes
HADOOP-7069. Major improvement reported by Jakob Homan and fixed by (documentation)
Replace forrest with supported framework
HADOOP-7030. Major new feature reported by Patrick Angeles and fixed by Tom White
Add TableMapping topology implementation to read host to rack mapping from a file
HADOOP-6941. Major bug reported by Stephen Watt and fixed by Devaraj Das
Support non-SUN JREs in UserGroupInformation
HADOOP-6924. Major bug reported by Stephen Watt and fixed by Devaraj Das
Build fails with non-Sun JREs due to different pathing to the operating system architecture shared libraries

Hadoop 0.23.2 Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements.

Changes since Hadoop 0.23.1

MAPREDUCE-4043. Blocker bug reported by Jason Lowe and fixed by Jason Lowe (mrv2 , security)
Secret keys set in Credentials are not seen by tasks
MAPREDUCE-4034. Blocker bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
Unable to view task logs on history server with mapreduce.job.acl-view-job=*
MAPREDUCE-4025. Blocker bug reported by Jason Lowe and fixed by Jason Lowe (mr-am , mrv2)
AM can crash if task attempt reports bogus progress value
MAPREDUCE-4006. Major bug reported by Jason Lowe and fixed by Siddharth Seth (jobhistoryserver , mrv2)
history server container log web UI sometimes combines stderr/stdout/syslog contents together
MAPREDUCE-4005. Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
AM container logs URL is broken for completed apps when log aggregation is enabled
MAPREDUCE-3982. Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
TestEmptyJob fails with FileNotFound

Fixed FileOutputCommitter to not err out for an 'empty-job' whose tasks don't write any outputs.
MAPREDUCE-3977. Critical bug reported by Jason Lowe and fixed by Jason Lowe (mrv2 , nodemanager)
LogAggregationService leaks log aggregator objects
MAPREDUCE-3976. Major bug reported by Bikas Saha and fixed by Jason Lowe (mrv2)
TestRMContainerAllocator failing
MAPREDUCE-3975. Blocker bug reported by Eric Payne and fixed by Eric Payne (mrv2)
Default value not set for Configuration parameter mapreduce.job.local.dir

Exporting mapreduce.job.local.dir for mapreduce tasks to use as job-level shared scratch space.
MAPREDUCE-3964. Critical bug reported by Jason Lowe and fixed by Jason Lowe (mrv2 , resourcemanager)
ResourceManager does not have JVM metrics
MAPREDUCE-3961. Major bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
Map/ReduceSlotMillis computation incorrect
MAPREDUCE-3960. Critical bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
web proxy doesn't forward request to AM with configured hostname/IP
MAPREDUCE-3954. Blocker bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Clean up passing HEAPSIZE to yarn and mapred commands.

Added new envs to separate heap size for different daemons started via bin scripts.
MAPREDUCE-3944. Blocker sub-task reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
JobHistory web services are slower than the UI and can easily overload the JH
MAPREDUCE-3931. Major bug reported by Vinod Kumar Vavilapalli and fixed by Siddharth Seth (mrv2)
MR tasks failing due to changing timestamps on Resources to download

Changed PB implementation of LocalResource to take locks so that race conditions don't fail tasks by inadvertently changing the timestamps.
MAPREDUCE-3930. Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
The AM page for a Reducer that has not been launched causes an NPE
MAPREDUCE-3929. Major bug reported by John George and fixed by John George (mrv2)
output of mapred -showacl is not clear
MAPREDUCE-3922. Minor improvement reported by Eugene Koontz and fixed by Hitesh Shah (build , mrv2)
Fix the potential problem compiling 32 bit binaries on a x86_64 host.

Fixed build to not compile 32bit container-executor binary by default on all platforms.
MAPREDUCE-3920. Major bug reported by Dave Thompson and fixed by Dave Thompson (nodemanager , resourcemanager)
Revise yarn default port number selection

Changed the default port numbers for the resourcemanager and nodemanager.
MAPREDUCE-3918. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
proc_historyserver no longer in command line arguments for HistoryServer
MAPREDUCE-3913. Critical bug reported by Jason Lowe and fixed by Jason Lowe (mrv2 , webapps)
RM application webpage is unresponsive after 2000 jobs
MAPREDUCE-3910. Blocker bug reported by John George and fixed by John George (mrv2)
user not allowed to submit jobs even though queue -showacls shows it allows

Fixed a bug in CapacityScheduler LeafQueue which was causing app-submission to fail.
MAPREDUCE-3904. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
[NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true
MAPREDUCE-3903. Critical bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
no admin override to view jobs on mr app master and job history server
MAPREDUCE-3901. Major improvement reported by Siddharth Seth and fixed by Siddharth Seth (jobhistoryserver , mrv2)
lazy load JobHistory Task and TaskAttempt details

Modified JobHistory records in YARN to lazily load job and task reports so as to improve UI response times.
MAPREDUCE-3897. Critical bug reported by Thomas Graves and fixed by Eric Payne (mrv2)
capacity scheduler - maxActiveApplicationsPerUser calculation can be wrong
MAPREDUCE-3896. Blocker bug reported by John George and fixed by Vinod Kumar Vavilapalli (jobhistoryserver , mrv2)
pig job through oozie hangs
MAPREDUCE-3884. Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (mrv2)
PWD should be first in the classpath of MR tasks
MAPREDUCE-3878. Critical bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
Null user on filtered jobhistory job page
MAPREDUCE-3877. Minor test reported by Steve Loughran and fixed by Steve Loughran (mrv2)
Add a test to formalise the current state transitions of the yarn lifecycle
MAPREDUCE-3866. Minor bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2)
bin/yarn prints the command line unnecessarily

Fixed the bin/yarn script to not print the command line unnecessarily.
MAPREDUCE-3864. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (documentation , security)
Fix cluster setup docs for correct SNN HTTPS parameters
MAPREDUCE-3862. Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2 , nodemanager)
Nodemanager can appear to hang on shutdown due to lingering DeletionService threads
MAPREDUCE-3852. Blocker bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
test TestLinuxResourceCalculatorPlugin failing
MAPREDUCE-3849. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (security)
Change TokenCache's reading of the binary token file
MAPREDUCE-3816. Critical bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
capacity scheduler web ui bar graphs for used capacity wrong
MAPREDUCE-3798. Major test reported by Ravi Prakash and fixed by Ravi Prakash (test)
TestJobCleanup testCustomCleanup is failing

Fixed failing TestJobCleanup.testCustomCleanup() and moved it to the maven build.
MAPREDUCE-3792. Critical bug reported by Ramya Sunil and fixed by Jason Lowe (mrv2)
job -list displays only the jobs submitted by a particular user

Fix "bin/mapred job -list" to display all jobs instead of only the jobs owned by the user.
MAPREDUCE-3790. Major bug reported by Jason Lowe and fixed by Jason Lowe (contrib/streaming , mrv2)
Broken pipe on streaming job can lead to truncated output for a successful job
MAPREDUCE-3738. Critical bug reported by Jason Lowe and fixed by Jason Lowe (mrv2 , nodemanager)
NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

Committed to trunk and branch-0.23. Thanks Jason.
MAPREDUCE-3730. Minor improvement reported by Jason Lowe and fixed by Jason Lowe (mrv2 , resourcemanager)
Allow restarted NM to rejoin cluster before RM expires it

Modified RM to allow restarted NMs to be able to join the cluster without waiting for expiry.
MAPREDUCE-3706. Critical bug reported by Thomas Graves and fixed by Robert Joseph Evans (mrv2)
HTTP Circular redirect error on the job attempts page
MAPREDUCE-3687. Major bug reported by David Capwell and fixed by Ravi Prakash (mrv2)
If AM dies before it returns new tracking URL, proxy redirects to http://N/A/ and doesn't return error code
MAPREDUCE-3686. Critical bug reported by Thomas Graves and fixed by Bhallamudi Venkata Siva Kamesh (mrv2)
history server web ui - job counter values for map/reduce not shown properly

Fixed two bugs in Counters that caused the web app to display zero counter values for framework counters.
MAPREDUCE-3680. Major bug reported by Thomas Graves and fixed by (mrv2)
FifoScheduler web service rest API can print out invalid JSON
MAPREDUCE-3634. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2)
All daemons should crash instead of hanging around when their EventHandlers get exceptions

Fixed all daemons to crash instead of hanging around when their EventHandlers get exceptions.
MAPREDUCE-3614. Major bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2)
finalState UNDEFINED if AM is killed by hand

Fixed MR AM to close history file quickly and send a correct final state to the RM when it is killed.
MAPREDUCE-3583. Critical bug reported by Ted Yu and fixed by Ted Yu
ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
MAPREDUCE-3497. Major bug reported by Thomas Graves and fixed by Thomas Graves (documentation , mrv2)
missing documentation for yarn cli and subcommands - similar to commands_manual.html
MAPREDUCE-3034. Critical bug reported by Vinod Kumar Vavilapalli and fixed by Devaraj K (mrv2 , nodemanager)
NM should act on a REBOOT command from RM
MAPREDUCE-3009. Major bug reported by chackaravarthy and fixed by chackaravarthy (jobhistoryserver , mrv2)
RM UI -> Applications -> Application(Job History) -> Map Tasks -> Task ID -> Node link is not working

Fixed node link on JobHistory webapp.
MAPREDUCE-2855. Major bug reported by Todd Lipcon and fixed by Siddharth Seth
ResourceBundle lookup during counter name resolution takes a lot of time

Passing a cached class-loader to ResourceBundle creator to minimize counter names lookup time.
MAPREDUCE-2793. Critical bug reported by Ramya Sunil and fixed by Bikas Saha (mrv2)
[MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs

Corrected AppIDs, JobIDs, TaskAttemptIDs to be of correct format on the web pages.
HDFS-3853. Minor bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (name-node)
Port MiniDFSCluster enableManagedDfsDirsRedundancy option to branch-2
HDFS-3104. Major test reported by Daryn Sharp and fixed by Daryn Sharp (test)
Add tests for mkdir -p
HDFS-3101. Major bug reported by Zhanwei.Wang and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
cannot read empty file using webhdfs
HDFS-3098. Major test reported by Daryn Sharp and fixed by Daryn Sharp (test)
Update FsShell tests for quoted metachars
HDFS-3060. Minor test reported by Eli Collins and fixed by Eli Collins (test)
Bump TestDistributedUpgrade#testDistributedUpgrade timeout
HDFS-3012. Critical bug reported by Ramya Sunil and fixed by Robert Joseph Evans
Exception while renewing delegation token
HDFS-3008. Major bug reported by Eli Collins and fixed by Eli Collins (hdfs client)
Negative caching of local addrs doesn't work
HDFS-3006. Major bug reported by bc Wong and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
Webhdfs "SETOWNER" call returns incorrect content-type
HDFS-2985. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Improve logging when replicas are marked as corrupt
HDFS-2981. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE
The default value of dfs.client.block.write.replace-datanode-on-failure.enable should be true
HDFS-2969. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (data-node)
ExtendedBlock.equals is incorrectly implemented
HDFS-2950. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Secondary NN HTTPS address should be listed as a NAMESERVICE_SPECIFIC_KEY

The configuration dfs.secondary.https.port has been renamed to dfs.namenode.secondary.https-port for consistency. The old configuration is still supported via a deprecation path.
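
A deprecation path of the kind described in the note above can be pictured with a minimal sketch. This is not Hadoop's Configuration class; DeprecatedKeySketch and its methods are hypothetical names used only for illustration, and the port value is a placeholder. Only the two key names come from the release note.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a configuration object with a key deprecation table. */
public class DeprecatedKeySketch {
    // Old key -> new key, taken from the release note above.
    private static final Map<String, String> DEPRECATED = new HashMap<>();
    static {
        DEPRECATED.put("dfs.secondary.https.port", "dfs.namenode.secondary.https-port");
    }

    private final Map<String, String> values = new HashMap<>();

    /** Map a deprecated key to its replacement; other keys pass through unchanged. */
    public static String canonical(String key) {
        return DEPRECATED.getOrDefault(key, key);
    }

    /** Store the value under the canonical (new) key, whichever name was used. */
    public void set(String key, String value) {
        values.put(canonical(key), value);
    }

    /** Look the value up under the canonical (new) key, whichever name was used. */
    public String get(String key) {
        return values.get(canonical(key));
    }

    public static void main(String[] args) {
        DeprecatedKeySketch conf = new DeprecatedKeySketch();
        conf.set("dfs.secondary.https.port", "12345"); // placeholder value
        // The value set under the old name is visible under the new name.
        System.out.println(conf.get("dfs.namenode.secondary.https-port"));
    }
}
```

In this sketch both names resolve to the same stored value, which is the behavior a reader would expect from "still supported via a deprecation path".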
HDFS-2944. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (hdfs client)
Typo in hdfs-default.xml causes dfs.client.block.write.replace-datanode-on-failure.enable to be mistakenly disabled
HDFS-2943. Major new feature reported by Aaron T. Myers and fixed by Aaron T. Myers (name-node)
Expose last checkpoint time and transaction stats as JMX metrics
HDFS-2938. Major bug reported by Suresh Srinivas and fixed by Hari Mankude (name-node)
Recursive delete of a large directory makes namenode unresponsive
HDFS-2931. Minor task reported by Harsh J and fixed by Harsh J (data-node)
Switch the DataNode's BlockVolumeChoosingPolicy to be a private-audience interface
HDFS-2907. Minor improvement reported by Sanjay Radia and fixed by Tsz Wo (Nicholas), SZE
Make FSDataset in Datanode Pluggable

Add a private conf property dfs.datanode.fsdataset.factory to make FSDataset in Datanode pluggable.
HDFS-2887. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node)
Define a FSVolume interface
HDFS-2764. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (name-node , test)
TestBackupNode is racy
HDFS-2725. Major bug reported by Prashant Sharma and fixed by (hdfs client)
hdfs script usage information is missing the information about "dfs" command
HDFS-2506. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (data-node , name-node)
Umbrella jira for tracking separation of wire protocol datatypes from the implementation types
HDFS-1217. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Laxman (name-node)
Some methods in the NameNode should not be public
HDFS-776. Critical bug reported by Owen O'Malley and fixed by Uma Maheswara Rao G (balancer)
Fix exception handling in Balancer
HADOOP-8176. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Disambiguate the destination of FsShell copies
HADOOP-8175. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Add mkdir -p flag

FsShell mkdir now accepts a -p flag. Like unix, mkdir -p will not fail if the directory already exists. Unlike unix, intermediate directories are always created, regardless of the flag, to avoid incompatibilities at this time.
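
The semantics in the note above can be modeled with a small sketch against the local filesystem. This is an analogy written with java.nio, not FsShell code; MkdirSketch and its mkdir method are hypothetical names, and the error raised for an existing target without -p is an assumption about how the failure would surface.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Local-filesystem model of the mkdir behavior described in the note above. */
public class MkdirSketch {
    /**
     * Creates dir. Missing intermediate directories are always created,
     * regardless of the flag. Without -p, an existing target is an error.
     */
    public static void mkdir(Path dir, boolean pFlag) throws IOException {
        if (Files.exists(dir)) {
            if (pFlag && Files.isDirectory(dir)) {
                return; // -p: an existing directory is not an error
            }
            throw new FileAlreadyExistsException(dir.toString());
        }
        Files.createDirectories(dir); // also creates missing parents
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("mkdir-sketch");
        Path nested = root.resolve("a").resolve("b");
        mkdir(nested, false); // parents are created even without -p
        mkdir(nested, true);  // succeeds although the directory exists
        try {
            mkdir(nested, false);
        } catch (FileAlreadyExistsException e) {
            System.out.println("without -p, an existing directory is an error");
        }
    }
}
```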
HADOOP-8173. Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (fs)
FsShell needs to handle quoted metachars
HADOOP-8157. Major test reported by Eli Collins and fixed by Todd Lipcon
TestRPCCallBenchmark#testBenchmarkWithWritable fails with RTE
HADOOP-8146. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
FsShell commands cannot be interrupted
HADOOP-8140. Major bug reported by arkady borkovsky and fixed by Daryn Sharp
dfs -getmerge should process its arguments better
HADOOP-8137. Major bug reported by Vinod Kumar Vavilapalli and fixed by Thomas Graves (documentation)
Site side links for commands manual (MAPREDUCE-3497)
HADOOP-8131. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp
FsShell put doesn't correctly handle a non-existent dir
HADOOP-8123. Critical bug reported by Jonathan Eagles and fixed by Jonathan Eagles (build)
hadoop-project invalid pom warnings prevent transitive dependency resolution
HADOOP-8083. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
javadoc generation for some modules is not done under target/
HADOOP-8082. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
add hadoop-client and hadoop-minicluster to the dependency-management section
HADOOP-8074. Trivial bug reported by Eli Collins and fixed by Colin Patrick McCabe (scripts)
Small bug in hadoop error message for unknown commands
HADOOP-8071. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (ipc)
Avoid an extra packet in client code when nagling is disabled
HADOOP-8066. Major bug reported by Aaron T. Myers and fixed by Andrew Bayer (build)
The full docs build intermittently fails
HADOOP-8064. Major bug reported by Kihwal Lee and fixed by Kihwal Lee (build)
Remove unnecessary dependency on w3c.org in document processing
HADOOP-8057. Major bug reported by Vinay and fixed by Vinay (scripts)
hadoop-setup-conf.sh not working because of some extra spaces.
HADOOP-8051. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (documentation)
HttpFS documentation is not wired to the generated site
HADOOP-8050. Major bug reported by Kihwal Lee and fixed by Kihwal Lee (metrics)
Deadlock in metrics
HADOOP-8048. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (util)
Allow merging of Credentials
HADOOP-8046. Minor bug reported by Steve Loughran and fixed by Steve Loughran
Revert StaticMapping semantics to the existing ones, add DNS mapping diagnostics in progress
HADOOP-8042. Critical bug reported by Kevin J. Price and fixed by Daryn Sharp (fs)
When copying a file out of HDFS, modifying it, and uploading it back into HDFS, the put fails due to a CRC mismatch
HADOOP-8036. Major bug reported by Eli Collins and fixed by Colin Patrick McCabe (fs , test)
TestViewFsTrash assumes the user's home directory is 2 levels deep
HADOOP-8035. Minor bug reported by Andrew Bayer and fixed by Andrew Bayer (build)
Hadoop Maven site is inefficient and runs phases redundantly
HADOOP-8032. Major wish reported by Ravi Prakash and fixed by Ravi Prakash (build , documentation)
mvn site:stage-deploy should be able to use the scp protocol to stage documents
HADOOP-6502. Critical bug reported by Hairong Kuang and fixed by Sharad Agarwal (util)
DistributedFileSystem#listStatus is very slow when listing a directory with a size of 1300

Hadoop 0.23.1 Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements.

Changes since Hadoop 0.23.0

MAPREDUCE-3858. Critical bug reported by Tom White and fixed by Tom White (mrv2)
Task attempt failure during commit results in task never completing
MAPREDUCE-3856. Critical bug reported by Eric Payne and fixed by Eric Payne (mrv2)
Instances of RunningJob class give incorrect job tracking urls when multiple jobs are submitted from same client jvm.
MAPREDUCE-3854. Major test reported by Tom White and fixed by Tom White (mrv2)
Reinstate environment variable tests in TestMiniMRChildTask

Fixed and re-enabled tests related to MR child JVM's environment variables in TestMiniMRChildTask.
MAPREDUCE-3846. Critical sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2)
Restarted+Recovered AM hangs in some corner cases

Addressed MR AM hanging issues during AM restart and then the recovery.
MAPREDUCE-3843. Critical bug reported by Anupam Seth and fixed by Anupam Seth (jobhistoryserver , mrv2)
Job summary log file found missing on the RM host
MAPREDUCE-3840. Blocker bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2)
JobEndNotifier doesn't use the proxyToUse during connecting
MAPREDUCE-3834. Critical bug reported by Siddharth Seth and fixed by Siddharth Seth (mr-am , mrv2)
If multiple hosts for a split belong to the same rack, the rack is added multiple times in the AM request table

Changed MR AM to not add the same rack entry multiple times into the container request table when multiple hosts for a split happen to be on the same rack.
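
The idea behind this change can be sketched as collecting the racks for a split into a set, so that each rack appears once no matter how many of the split's hosts live on it. This is an illustration only and not the MR AM code; RackDedupSketch, racksFor, and the host-to-rack map are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of de-duplicating racks for one split. */
public class RackDedupSketch {
    /** Returns each rack once, in first-seen order, for the given hosts. */
    public static Set<String> racksFor(List<String> hosts, Map<String, String> hostToRack) {
        Set<String> racks = new LinkedHashSet<>();
        for (String host : hosts) {
            String rack = hostToRack.get(host);
            if (rack != null) {
                racks.add(rack); // a set ignores repeated racks
            }
        }
        return racks;
    }

    public static void main(String[] args) {
        Map<String, String> hostToRack = new HashMap<>();
        hostToRack.put("host1", "/rack1");
        hostToRack.put("host2", "/rack1");
        hostToRack.put("host3", "/rack2");
        // host1 and host2 share /rack1, so only two racks are reported.
        System.out.println(racksFor(Arrays.asList("host1", "host2", "host3"), hostToRack));
    }
}
```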
MAPREDUCE-3833. Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
Capacity scheduler queue refresh doesn't recompute queue capacities properly
MAPREDUCE-3828. Major bug reported by Ahmed Radwan and fixed by Siddharth Seth (mrv2)
Broken urls: AM tracking url and jobhistory url in a single node setup.
MAPREDUCE-3827. Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2 , performance)
Counters aggregation slowed down significantly after MAPREDUCE-3749
MAPREDUCE-3826. Major bug reported by Arpit Gupta and fixed by Jonathan Eagles (mrv2)
RM UI when loaded throws a message stating Data Tables warning and then the column sorting stops working
MAPREDUCE-3823. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2 , performance)
Counters are getting calculated twice at job-finish and delaying clients.
MAPREDUCE-3822. Critical bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
TestJobCounters is failing intermittently on trunk and 0.23.
MAPREDUCE-3817. Major bug reported by Arpit Gupta and fixed by Arpit Gupta (mrv2)
bin/mapred command cannot run distcp and archive jobs
MAPREDUCE-3815. Critical sub-task reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
Data Locality suffers if the AM asks for containers using IPs instead of hostnames

Fixed MR AM to always use hostnames and never IPs when requesting containers so that the scheduler can assign data-local containers correctly.
MAPREDUCE-3814. Major bug reported by Arun C Murthy and fixed by Arun C Murthy (mrv1 , mrv2)
MR1 compile fails
MAPREDUCE-3813. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2 , performance)
RackResolver should maintain a cache to avoid repetitive lookups.
MAPREDUCE-3811. Critical task reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
Make the Client-AM IPC retry count configurable
MAPREDUCE-3810. Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2 , performance)
MR AM's ContainerAllocator is assigning the allocated containers very slowly
MAPREDUCE-3809. Blocker sub-task reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
Tasks may take up to 3 seconds to exit after completion
MAPREDUCE-3808. Blocker bug reported by Siddharth Seth and fixed by Robert Joseph Evans (mrv2)
NPE in FileOutputCommitter when running a 0 reduce job

Fixed an NPE in FileOutputCommitter for jobs with maps but no reduces.
MAPREDUCE-3804. Major bug reported by Dave Thompson and fixed by Dave Thompson (jobhistoryserver , mrv2 , resourcemanager)
yarn webapp interface vulnerable to cross-site scripting attacks

Fixed a cross-site scripting vulnerability in the webapp interface.
MAPREDUCE-3803. Major test reported by Ravi Prakash and fixed by Ravi Prakash (build)
HDFS-2864 broke ant compilation
MAPREDUCE-3802. Critical sub-task reported by Robert Joseph Evans and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2)
If an MR AM dies twice it looks like the process freezes

Added test to validate that AM can crash multiple times and still can recover successfully after MAPREDUCE-3846.
MAPREDUCE-3795. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2)
"job -status" command line output is malformed
MAPREDUCE-3794. Major bug reported by Tom White and fixed by Tom White (mrv2)
Support mapred.Task.Counter and mapred.JobInProgress.Counter enums for compatibility
MAPREDUCE-3791. Major bug reported by Roman Shaposhnik and fixed by Mahadev konar (documentation , mrv2)
can't build site in hadoop-yarn-server-common
MAPREDUCE-3787. Major improvement reported by Amar Kamat and fixed by Amar Kamat (contrib/gridmix)
[Gridmix] Improve STRESS mode

JobMonitor can now deploy multiple threads for faster job-status polling. Use 'gridmix.job-monitor.thread-count' to set the number of threads. Stress mode now relies on the updates from the job monitor instead of polling for job status. Failures in job submission now get reported to the statistics module and ultimately reported to the user via summary.
MAPREDUCE-3784. Major bug reported by Ramya Sunil and fixed by Arun C Murthy (mrv2)
maxActiveApplications(|PerUser) per queue is too low for small clusters

Fixed CapacityScheduler so that maxActiveApplications and maxActiveApplicationsPerUser per queue are not too low for small clusters.
MAPREDUCE-3780. Blocker bug reported by Ramya Sunil and fixed by Hitesh Shah (mrv2)
RM assigns containers to killed applications
MAPREDUCE-3775. Minor bug reported by Hitesh Shah and fixed by Hitesh Shah (mrv2)
Change MiniYarnCluster to escape special chars in testname
MAPREDUCE-3774. Major bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
yarn-default.xml should be moved to hadoop-yarn-common.

Moved yarn-default.xml to hadoop-yarn-common from hadoop-server-common.
MAPREDUCE-3771. Major improvement reported by Arun C Murthy and fixed by Arun C Murthy
Port MAPREDUCE-1735 to trunk/0.23
MAPREDUCE-3770. Critical bug reported by Amar Kamat and fixed by Amar Kamat (tools/rumen)
[Rumen] Zombie.getJobConf() results into NPE
MAPREDUCE-3765. Minor bug reported by Hitesh Shah and fixed by Hitesh Shah (mrv2)
FifoScheduler does not respect yarn.scheduler.fifo.minimum-allocation-mb setting
MAPREDUCE-3764. Critical bug reported by Siddharth Seth and fixed by Arun C Murthy (mrv2)
AllocatedGB etc metrics incorrect if min-allocation-mb isn't a multiple of 1GB
MAPREDUCE-3762. Critical bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
Resource Manager fails to come up with default capacity scheduler configs.
MAPREDUCE-3760. Major bug reported by Ramya Sunil and fixed by Vinod Kumar Vavilapalli (mrv2)
Blacklisted NMs should not appear in Active nodes list

Changed active nodes list to not contain unhealthy nodes on the webUI and metrics.
MAPREDUCE-3759. Major bug reported by Ramya Sunil and fixed by Vinod Kumar Vavilapalli (mrv2)
ClassCastException thrown in -list-active-trackers when there are a few unhealthy nodes
MAPREDUCE-3756. Major improvement reported by Arun C Murthy and fixed by Hitesh Shah (mrv2)
Make single shuffle limit configurable
MAPREDUCE-3754. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2 , webapps)
RM webapp should have pages filtered based on App-state

Modified RM UI to filter applications based on state of the applications.
MAPREDUCE-3752. Blocker bug reported by Arun C Murthy and fixed by Arun C Murthy (mrv2)
Headroom should be capped by queue max-cap

Modified application limits to include queue max-capacities besides the usual user limits.
MAPREDUCE-3749. Blocker bug reported by Tom White and fixed by Tom White (mrv2)
ConcurrentModificationException in counter groups
MAPREDUCE-3748. Minor bug reported by Ramya Sunil and fixed by Ramya Sunil (mrv2)
Move CS related nodeUpdate log messages to DEBUG
MAPREDUCE-3747. Major bug reported by Ramya Sunil and fixed by Arun C Murthy (mrv2)
Memory Total is not refreshed until an app is launched
MAPREDUCE-3744. Blocker bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
Unable to retrieve application logs via "yarn logs" or "mapred job -logs"
MAPREDUCE-3742. Blocker bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
"yarn logs" command fails with ClassNotFoundException
MAPREDUCE-3737. Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
The Web Application Proxy is not documented very well
MAPREDUCE-3735. Blocker bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
Add distcp jar to the distribution (tar)
MAPREDUCE-3733. Major bug reported by Mahadev konar and fixed by Mahadev konar
Add Apache License Header to hadoop-distcp/pom.xml
MAPREDUCE-3732. Blocker bug reported by Arun C Murthy and fixed by Arun C Murthy (mrv2 , resourcemanager , scheduler)
CS should only use 'activeUsers with pending requests' for computing user-limits

Modified CapacityScheduler to use only users with pending requests for computing user-limits.
MAPREDUCE-3727. Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
jobtoken location property in jobconf refers to wrong jobtoken file
MAPREDUCE-3723. Major bug reported by Bhallamudi Venkata Siva Kamesh and fixed by Bhallamudi Venkata Siva Kamesh (mrv2 , test , webapps)
TestAMWebServicesJobs & TestHSWebServicesJobs incorrectly asserting tests
MAPREDUCE-3721. Blocker bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
Race in shuffle can cause it to hang
MAPREDUCE-3720. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (client , mrv2)
Command line listJobs should not visit each AM

Changed bin/mapred job -list to not print job-specific information not available at RM. Very minor incompatibility in cmd-line output, inevitable due to MRv2 architecture.
MAPREDUCE-3718. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Hitesh Shah (mrv2 , performance)
Default AM heartbeat interval should be one second
MAPREDUCE-3717. Blocker bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
JobClient test jar has missing files to run all the test programs.
MAPREDUCE-3716. Blocker bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
java.io.File.createTempFile fails in map/reduce tasks

Fixed YARN+MR to allow MR jobs to use java.io.File.createTempFile to create temporary files as part of their tasks.
MAPREDUCE-3714. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2 , task)
Reduce hangs in a corner case

Fixed EventFetcher and Fetcher threads to shut-down properly so that reducers don't hang in corner cases.
MAPREDUCE-3713. Blocker bug reported by Siddharth Seth and fixed by Arun C Murthy (mrv2 , resourcemanager)
Incorrect headroom reported to jobs

Fixed the way head-room is allocated to applications by CapacityScheduler so that it deducts current-usage per user and not per-application.
MAPREDUCE-3712. Blocker bug reported by Ravi Prakash and fixed by Mahadev konar (mrv2)
The mapreduce tar does not contain the hadoop-mapreduce-client-jobclient-tests.jar.
MAPREDUCE-3711. Blocker sub-task reported by Siddharth Seth and fixed by Robert Joseph Evans (mrv2)
AppMaster recovery for Medium to large jobs take long time

Fixed MR AM recovery so that only a single selected task output is recovered, thus reducing the unnecessarily bloated recovery time.
MAPREDUCE-3710. Major bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv1 , mrv2)
last split generated by FileInputFormat.getSplits may not have the best locality

Improved FileInputFormat to return better locality for the last split.
MAPREDUCE-3709. Major bug reported by Eli Collins and fixed by Hitesh Shah (mrv2 , test)
TestDistributedShell is failing
MAPREDUCE-3708. Major bug reported by Bhallamudi Venkata Siva Kamesh and fixed by Bhallamudi Venkata Siva Kamesh (mrv2)
Metrics: Incorrect Apps Submitted Count
MAPREDUCE-3705. Blocker bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
ant build fails on 0.23 branch
MAPREDUCE-3703. Critical bug reported by Eric Payne and fixed by Eric Payne (mrv2 , resourcemanager)
ResourceManager should provide node lists in JMX output

New JMX Bean in ResourceManager to provide list of live node managers: Hadoop:service=ResourceManager,name=RMNMInfo LiveNodeManagers
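A minimal sketch of how a client might read this bean through the standard javax.management API. The bean name and attribute are taken from the note above; the class name, the helper method, and the use of the local platform MBean server are assumptions for illustration. A real client would connect to the ResourceManager's JVM instead.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class LiveNodeManagersSketch {
    // Bean name and attribute as given in the release note.
    static final String BEAN = "Hadoop:service=ResourceManager,name=RMNMInfo";
    static final String ATTR = "LiveNodeManagers";

    // Returns the attribute value, or null if the bean is not registered
    // on the given connection (for example, when not attached to an RM).
    static Object readLiveNodeManagers(MBeanServerConnection conn) throws Exception {
        ObjectName name = new ObjectName(BEAN);
        if (!conn.isRegistered(name)) {
            return null;
        }
        return conn.getAttribute(name, ATTR);
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the local platform MBean server is used only so the
        // sketch runs standalone; it will not contain the RM bean.
        Object value = readLiveNodeManagers(ManagementFactory.getPlatformMBeanServer());
        System.out.println(value == null ? "RMNMInfo bean not registered here" : value.toString());
    }
}
```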
MAPREDUCE-3702. Critical bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
internal server error trying access application master via proxy with filter enabled
MAPREDUCE-3701. Major bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
Delete HadoopYarnRPC from 0.23 branch.
MAPREDUCE-3699. Major bug reported by Vinod Kumar Vavilapalli and fixed by Hitesh Shah (mrv2)
Default RPC handlers are very low for YARN servers

Increased RPC handlers for all YARN servers to reasonable values for working at scale.
MAPREDUCE-3698. Blocker sub-task reported by Siddharth Seth and fixed by Mahadev konar (mrv2)
Client cannot talk to the history server in secure mode
MAPREDUCE-3697. Blocker bug reported by John George and fixed by Mahadev konar (mrv2)
Hadoop Counters API limits Oozie's working across different hadoop versions
MAPREDUCE-3696. Blocker bug reported by John George and fixed by John George (mrv2)
MR job via oozie does not work on hadoop 23
MAPREDUCE-3693. Minor improvement reported by Roman Shaposhnik and fixed by Roman Shaposhnik (mrv2)
Add admin env to mapred-default.xml
MAPREDUCE-3692. Blocker improvement reported by Eli Collins and fixed by Eli Collins (mrv2)
yarn-resourcemanager out and log files can get big
MAPREDUCE-3691. Critical bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
webservices add support to compress response
MAPREDUCE-3689. Blocker bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
RM web UI doesn't handle newline in job name
MAPREDUCE-3684. Major bug reported by Tom White and fixed by Tom White (client)
LocalDistributedCacheManager does not shut down its thread pool
MAPREDUCE-3683. Blocker bug reported by Thomas Graves and fixed by Arun C Murthy (mrv2)
Capacity scheduler LeafQueues maximum capacity calculation issues
MAPREDUCE-3681. Critical bug reported by Thomas Graves and fixed by Arun C Murthy (mrv2)
capacity scheduler LeafQueues calculate used capacity wrong
MAPREDUCE-3679. Major improvement reported by Mahadev konar and fixed by Vinod Kumar Vavilapalli (mrv2)
AM logs and others should not automatically refresh after every 1 second.
MAPREDUCE-3669. Blocker bug reported by Thomas Graves and fixed by Mahadev konar (mrv2)
Getting a lot of PriviledgedActionException / SaslException when running a job
MAPREDUCE-3664. Minor bug reported by praveen sripati and fixed by Brandon Li (documentation)
HDFS Federation Documentation has incorrect configuration example
MAPREDUCE-3657. Minor bug reported by Jason Lowe and fixed by Jason Lowe (build , mrv2)
State machine visualize build fails
MAPREDUCE-3656. Blocker bug reported by Karam Singh and fixed by Siddharth Seth (applicationmaster , mrv2 , resourcemanager)
Sort job on 350 scale is consistently failing with latest MRV2 code

Fixed a race condition in MR AM that was consistently failing the sort benchmark.
MAPREDUCE-3652. Blocker bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
org.apache.hadoop.mapred.TestWebUIAuthorization.testWebUIAuthorization fails
MAPREDUCE-3651. Blocker bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
TestQueueManagerRefresh fails
MAPREDUCE-3649. Blocker bug reported by Mahadev konar and fixed by Ravi Prakash (mrv2)
Job End notification gives an error on calling back.
MAPREDUCE-3648. Blocker bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
TestJobConf failing
MAPREDUCE-3646. Major bug reported by Ramya Sunil and fixed by Jonathan Eagles (client , mrv2)
Remove redundant URL info from "mapred job" output
MAPREDUCE-3645. Blocker bug reported by Thomas Graves and fixed by Thomas Graves (mrv1)
TestJobHistory fails
MAPREDUCE-3641. Blocker sub-task reported by Arun C Murthy and fixed by Arun C Murthy (mrv2 , scheduler)
CapacityScheduler should be more conservative assigning off-switch requests

Made CapacityScheduler more conservative so that it assigns only one off-switch container in a single scheduling iteration.
MAPREDUCE-3640. Blocker sub-task reported by Siddharth Seth and fixed by Arun C Murthy (mrv2)
AMRecovery should pick completed task from partial JobHistory files
MAPREDUCE-3639. Blocker bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
TokenCache likely broken for FileSystems which don't issue delegation tokens

Fixed TokenCache to work with absent FileSystem canonical service-names.
MAPREDUCE-3630. Critical task reported by Amol Kekre and fixed by Mahadev konar (mrv2)
NullPointerException running teragen

Committed to trunk and branch-0.23. Thanks Mahadev.
MAPREDUCE-3625. Critical bug reported by Arun C Murthy and fixed by Jason Lowe (mrv2)
CapacityScheduler web-ui display of queue's used capacity is broken
MAPREDUCE-3624. Major bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
bin/yarn script adds jdk tools.jar to the classpath.
MAPREDUCE-3618. Major sub-task reported by Siddharth Seth and fixed by Siddharth Seth (mrv2 , performance)
TaskHeartbeatHandler holds a global lock for all task-updates

Fixed TaskHeartbeatHandler to not hold a global lock for all task-updates.
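One common way to avoid a single lock around all task updates is to keep per-task state in a concurrent map. The sketch below illustrates that general idea only; it is not the actual TaskHeartbeatHandler code, and the class, method names, and timeout values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HeartbeatSketch {
    // Last-heard time per task attempt. ConcurrentHashMap lets pings for
    // different attempts proceed without sharing one monitor.
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    void register(String attemptId, long nowMs) {
        lastSeen.put(attemptId, nowMs);
    }

    // Records a ping only for attempts that are still registered.
    void ping(String attemptId, long nowMs) {
        lastSeen.computeIfPresent(attemptId, (id, old) -> nowMs);
    }

    void unregister(String attemptId) {
        lastSeen.remove(attemptId);
    }

    // Returns the attempts whose last ping is older than timeoutMs.
    List<String> timedOut(long nowMs, long timeoutMs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        HeartbeatSketch h = new HeartbeatSketch();
        h.register("attempt_a", 0L);
        h.register("attempt_b", 0L);
        h.ping("attempt_a", 900L);
        System.out.println("timed out: " + h.timedOut(1000L, 500L));
    }
}
```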
MAPREDUCE-3617. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
Remove yarn default values for resource manager and nodemanager principal
MAPREDUCE-3616. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mr-am , mrv2 , performance)
Thread pool for launching containers in MR AM not expanding as expected
MAPREDUCE-3610. Minor improvement reported by Sho Shimauchi and fixed by Sho Shimauchi
Some parts in MR use old property dfs.block.size
MAPREDUCE-3608. Major bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
MAPREDUCE-3522 commit causes compilation to fail
MAPREDUCE-3604. Blocker bug reported by Arun C Murthy and fixed by Arun C Murthy (contrib/streaming)
Streaming's check for local mode is broken
MAPREDUCE-3597. Major improvement reported by Ravi Gummadi and fixed by Ravi Gummadi (tools/rumen)
Provide a way to access other info of history file from Rumentool

Rumen now provides {{Parsed*}} objects. These objects provide extra information that is not provided by {{Logged*}} objects.
MAPREDUCE-3596. Blocker bug reported by Ravi Prakash and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2)
Sort benchmark got hang after completion of 99% map phase
MAPREDUCE-3595. Major test reported by Tom White and fixed by Tom White (test)
Add missing TestCounters#testCounterValue test from branch 1 to 0.23
MAPREDUCE-3588. Blocker bug reported by Arun C Murthy and fixed by Arun C Murthy
bin/yarn broken after MAPREDUCE-3366
MAPREDUCE-3586. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mr-am , mrv2)
Lots of AMs hanging around in PIG testing

Modified CompositeService to avoid duplicate stop operations thereby solving race conditions in MR AM shutdown.
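Duplicate stop calls are typically guarded with an atomic flag so that only the first caller performs the shutdown work. The sketch below shows that pattern in isolation; it is not the CompositeService implementation, and the class and field names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class StopOnceSketch {
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    // Counts how many times the shutdown work actually ran.
    final AtomicInteger stopCount = new AtomicInteger();

    // Only the first caller wins the compare-and-set; later or concurrent
    // callers return without repeating the shutdown work.
    public void stop() {
        if (!stopped.compareAndSet(false, true)) {
            return;
        }
        stopCount.incrementAndGet(); // stands in for real shutdown work
    }

    public static void main(String[] args) {
        StopOnceSketch service = new StopOnceSketch();
        service.stop();
        service.stop();
        System.out.println("shutdown ran " + service.stopCount.get() + " time(s)");
    }
}
```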
MAPREDUCE-3582. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv2 , test)
Move successfully passing MR1 tests to MR2 maven tree.
MAPREDUCE-3579. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (mrv2)
ConverterUtils should not include a port in a path for a URL with no port
MAPREDUCE-3572. Critical sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mr-am , mrv2 , performance)
MR AM's dispatcher is blocked by heartbeats to ResourceManager
MAPREDUCE-3569. Critical sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mr-am , mrv2 , performance)
TaskAttemptListener holds a global lock for all task-updates
MAPREDUCE-3568. Critical sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mr-am , mrv2 , performance)
Optimize Job's progress calculations in MR AM

Optimized Job's progress calculations in MR AM.
MAPREDUCE-3567. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mr-am , mrv2 , performance)
Extraneous JobConf objects in AM heap
MAPREDUCE-3566. Critical sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mr-am , mrv2)
MR AM slows down due to repeatedly constructing ContainerLaunchContext
MAPREDUCE-3564. Blocker bug reported by Mahadev konar and fixed by Siddharth Seth (mrv2)
TestStagingCleanup and TestJobEndNotifier are failing on trunk.

Fixed failures in TestStagingCleanup and TestJobEndNotifier tests.
MAPREDUCE-3563. Major bug reported by Arun C Murthy and fixed by Arun C Murthy (mrv2)
LocalJobRunner doesn't handle Jobs using o.a.h.mapreduce.OutputCommitter
MAPREDUCE-3560. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Siddharth Seth (mrv2 , resourcemanager , test)
TestRMNodeTransitions is failing on trunk
MAPREDUCE-3557. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
MR1 test fail to compile because of missing hadoop-archives dependency
MAPREDUCE-3553. Minor sub-task reported by Thomas Graves and fixed by Thomas Graves (mrv2)
Add support for data returned when exceptions thrown from web service apis to be in either xml or in JSON
MAPREDUCE-3549. Blocker bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
write api documentation for web service apis for RM, NM, mapreduce app master, and job history server

New files added:
A hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/WebServicesIntro.apt.vm
A hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRest.apt.vm
A hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm
A hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/MapredAppMasterRest.apt.vm
A hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/HistoryServerRest.apt.vm
The hadoop-project/src/site/site.xml is split into separate patch.
MAPREDUCE-3548. Critical sub-task reported by Thomas Graves and fixed by Thomas Graves (mrv2)
write unit tests for web services for mapreduce app master and job history server
MAPREDUCE-3547. Critical sub-task reported by Thomas Graves and fixed by Thomas Graves (mrv2)
finish unit tests for web services for RM and NM
MAPREDUCE-3544. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build , tools/rumen)
gridmix build is broken, requires hadoop-archives to be added as ivy dependency
MAPREDUCE-3542. Major bug reported by Tom White and fixed by Tom White
Support "FileSystemCounter" legacy counter group name for compatibility
MAPREDUCE-3541. Blocker bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2)
Fix broken TestJobQueueClient test
MAPREDUCE-3537. Blocker bug reported by Arun C Murthy and fixed by Arun C Murthy
DefaultContainerExecutor has a race condn. with multiple concurrent containers
MAPREDUCE-3534. Blocker sub-task reported by Vinay Kumar Thota and fixed by Vinod Kumar Vavilapalli (mrv2)
Compression benchmark run-time increased by 13% in 0.23
MAPREDUCE-3532. Critical bug reported by Karam Singh and fixed by Bhallamudi Venkata Siva Kamesh (mrv2 , nodemanager)
When 0 is provided as port number in yarn.nodemanager.webapp.address, NMs webserver component picks up random port, NM keeps on Reporting 0 port to RM

Modified NM to report correct http address when an ephemeral web port is configured.
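The underlying behavior can be seen with plain java.net: binding to port 0 asks the operating system for a free port, and the port actually bound has to be read back from the socket. The sketch below is only an illustration of that point; the class and method names are hypothetical and it does not use the NodeManager's web server.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortSketch {
    // Binds to the configured port (0 means "let the OS choose") and returns
    // the port that was actually bound, which is the value to advertise
    // instead of the configured 0.
    static int bindAndReport(int configuredPort) throws IOException {
        try (ServerSocket socket = new ServerSocket(configuredPort)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        int actual = bindAndReport(0);
        System.out.println("configured port 0, actually bound port " + actual);
    }
}
```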
MAPREDUCE-3531. Blocker bug reported by Karam Singh and fixed by Robert Joseph Evans (mrv2 , resourcemanager , scheduler)
Sometimes java.lang.IllegalArgumentException: Invalid key to HMAC computation in NODE_UPDATE also causing RM to stop scheduling
MAPREDUCE-3530. Blocker bug reported by Karam Singh and fixed by Arun C Murthy (mrv2 , resourcemanager , scheduler)
Sometimes NODE_UPDATE to the scheduler throws an NPE causing the scheduling to stop

Fixed an NPE occurring during scheduling in the ResourceManager.
MAPREDUCE-3529. Critical bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
TokenCache does not cache viewfs credentials correctly
MAPREDUCE-3528. Major bug reported by Siddharth Seth and fixed by Siddharth Seth (mr-am , mrv2)
The task timeout check interval should be configurable independent of mapreduce.task.timeout

Fixed TaskHeartBeatHandler to use a new configuration for the thread loop interval separate from task-timeout configuration property.
MAPREDUCE-3527. Major bug reported by Tom White and fixed by Tom White
Fix minor API incompatibilities between 1.0 and 0.23
MAPREDUCE-3525. Blocker sub-task reported by Karam Singh and fixed by Vinod Kumar Vavilapalli (mrv2)
Shuffle benchmark is nearly 1.5x slower in 0.23
MAPREDUCE-3522. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
Capacity Scheduler ACLs not inherited by default
MAPREDUCE-3521. Minor bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Hadoop Streaming ignores unknown parameters
MAPREDUCE-3519. Blocker sub-task reported by Ravi Gummadi and fixed by Ravi Gummadi (mrv2 , nodemanager)
Deadlock in LocalDirsHandlerService and ShuffleHandler

Fixed a deadlock in the NodeManager's LocalDirsHandlerService.
MAPREDUCE-3518. Critical bug reported by Jonathan Eagles and fixed by Jonathan Eagles (client , mrv2)
mapred queue -info <queue> -showJobs throws NPE
MAPREDUCE-3513. Trivial bug reported by Mahadev konar and fixed by chackaravarthy (mrv2)
Capacity Scheduler web UI has a spelling mistake for Memory.
MAPREDUCE-3512. Blocker sub-task reported by Siddharth Seth and fixed by Siddharth Seth (mr-am , mrv2)
Batch jobHistory disk flushes

Batched JobHistory flushes to DFS so that the AM no longer flushes on every event, which was slowing it down.
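The general batching idea can be sketched as a writer that buffers events and flushes only once a threshold is reached. This is an illustration under stated assumptions, not the actual JobHistory event handling: the class name, the size-based threshold, and the simulated flush are all hypothetical, and the real change may use different triggers.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedFlushSketch {
    private final int batchSize; // hypothetical flush threshold
    private final List<String> pending = new ArrayList<>();
    int flushCount = 0; // number of simulated flushes so far

    BatchedFlushSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    // Buffers the event and flushes only when the batch is full.
    synchronized void write(String event) {
        pending.add(event);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Flushes whatever is buffered; a no-op when nothing is pending.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        pending.clear(); // stands in for an actual flush to DFS
        flushCount++;
    }

    public static void main(String[] args) {
        BatchedFlushSketch writer = new BatchedFlushSketch(3);
        for (int i = 0; i < 7; i++) {
            writer.write("event-" + i);
        }
        writer.flush(); // final flush for the partial batch
        System.out.println("7 events written with " + writer.flushCount + " flushes");
    }
}
```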
MAPREDUCE-3511. Blocker sub-task reported by Siddharth Seth and fixed by Vinod Kumar Vavilapalli (mr-am , mrv2)
Counters occupy a good part of AM heap

Removed a multitude of cloned/duplicate counters in the AM thereby reducing the AM heap size and preventing full GCs.
MAPREDUCE-3510. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles (capacity-sched , mrv2)
Capacity Scheduler inherited ACLs not displayed by mapred queue -showacls
MAPREDUCE-3505. Major bug reported by Bruno Mahé and fixed by Ahmed Radwan (mrv2)
yarn APPLICATION_CLASSPATH needs to be overridable
MAPREDUCE-3500. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (mrv2)
MRJobConfig creates an LD_LIBRARY_PATH using the platform ARCH
MAPREDUCE-3499. Blocker bug reported by Alejandro Abdelnur and fixed by John George (mrv2 , test)
New MiniMR does not setup proxyuser configuration correctly, thus tests using doAs do not work
MAPREDUCE-3496. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
Yarn initializes ACL operations from capacity scheduler config in a non-deterministic order
MAPREDUCE-3490. Blocker bug reported by Siddharth Seth and fixed by Sharad Agarwal (mr-am , mrv2)
RMContainerAllocator counts failed maps towards Reduce ramp up

Fixed MapReduce AM to count failed maps also towards Reduce ramp up.
MAPREDUCE-3488. Blocker bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
Streaming jobs are failing because the main class isn't set in the pom files.
MAPREDUCE-3487. Critical bug reported by Thomas Graves and fixed by Jason Lowe (mrv2)
jobhistory web ui task counters no longer links to singletakecounter page

Fixed JobHistory web-UI to display links to single task's counters' page.
MAPREDUCE-3485. Major sub-task reported by Hitesh Shah and fixed by Ravi Gummadi (mrv2)
DISKS_FAILED -101 error code should be defined in same location as ABORTED_CONTAINER_EXIT_STATUS
MAPREDUCE-3484. Major bug reported by Ravi Prakash and fixed by Ravi Prakash (mr-am , mrv2)
JobEndNotifier is getting interrupted before completing all its retries.

Fixed JobEndNotifier to not get interrupted before completing all its retries.
MAPREDUCE-3481. Major improvement reported by Amar Kamat and fixed by Amar Kamat (contrib/gridmix)
[Gridmix] Improve STRESS mode locking

Modified Gridmix STRESS mode locking structure. The submitted thread and the polling thread now run simultaneously without blocking each other.
MAPREDUCE-3479. Major bug reported by Tom White and fixed by Tom White (client)
JobClient#getJob cannot find local jobs
MAPREDUCE-3478. Minor bug reported by Andrew Bayer and fixed by Tom White (mrv2)
Cannot build against ZooKeeper 3.4.0
MAPREDUCE-3477. Major bug reported by Bruno Mahé and fixed by Jonathan Eagles (documentation , mrv2)
Hadoop site documentation cannot be built anymore on trunk and branch-0.23
MAPREDUCE-3468. Major task reported by Siddharth Seth and fixed by Siddharth Seth
Change version to 0.23.1 for ant builds on the 23 branch
MAPREDUCE-3465. Minor bug reported by Hitesh Shah and fixed by Hitesh Shah (mrv2)
org.apache.hadoop.yarn.util.TestLinuxResourceCalculatorPlugin fails on 0.23
MAPREDUCE-3464. Trivial bug reported by Dave Vronay and fixed by Dave Vronay
mapreduce jsp pages missing DOCTYPE [post-split branches]
MAPREDUCE-3463. Blocker bug reported by Karam Singh and fixed by Siddharth Seth (applicationmaster , mrv2)
Second AM fails to recover properly when first AM is killed with java.lang.IllegalArgumentException causing lost job
MAPREDUCE-3462. Blocker bug reported by Amar Kamat and fixed by Ravi Prakash (mrv2 , test)
Job submission failing in JUnit tests

Fixed failing JUnit tests in Gridmix.
MAPREDUCE-3460. Blocker bug reported by Siddharth Seth and fixed by Robert Joseph Evans (mr-am , mrv2)
MR AM can hang if containers are allocated on a node blacklisted by the AM
MAPREDUCE-3458. Major bug reported by Arun C Murthy and fixed by Devaraj K (mrv2)
Fix findbugs warnings in hadoop-examples
MAPREDUCE-3456. Blocker bug reported by Eric Payne and fixed by Eric Payne (mrv2)
$HADOOP_PREFIX/bin/yarn should set defaults for $HADOOP_*_HOME
MAPREDUCE-3454. Major bug reported by Amar Kamat and fixed by Hitesh Shah (contrib/gridmix)
[Gridmix] TestDistCacheEmulation is broken
MAPREDUCE-3453. Major bug reported by Thomas Graves and fixed by Jonathan Eagles (mrv2)
RM web ui application details page shows RM cluster about information
MAPREDUCE-3452. Major bug reported by Thomas Graves and fixed by Jonathan Eagles (mrv2)
fifoscheduler web ui page always shows 0% used for the queue
MAPREDUCE-3450. Major bug reported by Siddharth Seth and fixed by Siddharth Seth (mr-am , mrv2)
NM port info no longer available in JobHistory
MAPREDUCE-3448. Minor bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
TestCombineOutputCollector javac unchecked warning on mocked generics
MAPREDUCE-3447. Blocker bug reported by Thomas Graves and fixed by Mahadev konar (mrv2)
mapreduce examples not working
MAPREDUCE-3444. Blocker bug reported by Hitesh Shah and fixed by Hitesh Shah (mrv2)
trunk/0.23 builds broken
MAPREDUCE-3443. Blocker bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
Oozie jobs are running as oozie user even though they create the jobclient as doAs.
MAPREDUCE-3437. Blocker bug reported by Jonathan Eagles and fixed by Jonathan Eagles (build , mrv2)
Branch 23 fails to build with Failure to find org.apache.hadoop:hadoop-project:pom:0.24.0-SNAPSHOT
MAPREDUCE-3436. Major bug reported by Bruno Mahé and fixed by Ahmed Radwan (mrv2 , webapps)
JobHistory webapp address should use the host from the jobhistory address
MAPREDUCE-3434. Blocker bug reported by Hitesh Shah and fixed by Hitesh Shah (mrv2)
Nightly build broken
MAPREDUCE-3433. Major sub-task reported by Tom White and fixed by Tom White (client , mrv2)
Finding counters by legacy group name returns empty counters
MAPREDUCE-3427. Blocker bug reported by Alejandro Abdelnur and fixed by Hitesh Shah (contrib/streaming , mrv2)
streaming tests fail with MR2
MAPREDUCE-3426. Blocker sub-task reported by Hitesh Shah and fixed by Hitesh Shah (mrv2)
uber-jobs tried to write outputs into wrong dir

Fixed MR AM in uber mode to write map intermediate outputs in the correct directory to work properly in secure mode.
MAPREDUCE-3422. Major bug reported by Tom White and fixed by Jonathan Eagles (mrv2)
Counter display names are not being picked up
MAPREDUCE-3420. Major bug reported by Hitesh Shah and fixed by (mrv2)
[Umbrella ticket] Make uber jobs functional
MAPREDUCE-3417. Blocker bug reported by Thomas Graves and fixed by Jonathan Eagles (mrv2)
job access controls not working app master and job history UI's

Fixed job-access-controls to work with MR AM and JobHistoryServer web-apps.
MAPREDUCE-3415. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (mrv2)
improve MiniMRYarnCluster & DistributedShell JAR resolution
MAPREDUCE-3413. Minor bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
RM web ui applications not sorted in any order by default
MAPREDUCE-3412. Major bug reported by Amar Kamat and fixed by Amar Kamat
'ant docs' is broken

Fixes 'ant docs' by removing stale references to capacity-scheduler docs.
MAPREDUCE-3411. Minor improvement reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
Performance Upgrade for jQuery
MAPREDUCE-3408. Major bug reported by Bruno Mahé and fixed by Bruno Mahé (mrv2 , nodemanager , resourcemanager)
yarn-daemon.sh unconditionally sets yarn.root.logger
MAPREDUCE-3407. Minor bug reported by Hitesh Shah and fixed by Hitesh Shah (mrv2)
Wrong jar getting used in TestMR*Jobs* for MiniMRYarnCluster

Fixed pom files to refer to the correct MR app-jar needed by the integration tests.
MAPREDUCE-3404. Critical bug reported by patrick white and fixed by Eric Payne (job submission , mrv2)
Speculative Execution: speculative map tasks launched even if -Dmapreduce.map.speculative=false

Corrected MR AM to honor speculative configuration and enable speculating either maps or reduces.
MAPREDUCE-3402. Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2)
AMScalability test of Sleep job with 100K 1-sec maps regressed into running very slowly
MAPREDUCE-3399. Blocker sub-task reported by Siddharth Seth and fixed by Siddharth Seth (mrv2 , nodemanager)
ContainerLocalizer should request new resources after completing the current one

Modified ContainerLocalizer to send a heartbeat to NM immediately after downloading a resource instead of always waiting for a second.
MAPREDUCE-3398. Blocker bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2 , nodemanager)
Log Aggregation broken in Secure Mode

Fixed log aggregation to work correctly in secure mode. Contributed by Siddharth Seth.
MAPREDUCE-3392. Blocker sub-task reported by John George and fixed by John George
Cluster.getDelegationToken() throws NPE if client.getDelegationToken() returns null.

Fixed the Cluster.getDelegationToken() API to return null when there isn't a supported token.
MAPREDUCE-3391. Minor bug reported by Subroto Sanyal and fixed by Subroto Sanyal (applicationmaster)
Connecting to CM is logged as Connecting to RM
MAPREDUCE-3389. Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (mrv2)
MRApps loads the 'mrapp-generated-classpath' file with classpath from the build machine
MAPREDUCE-3387. Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
A tracking URL of N/A before the app master is launched breaks oozie

Fixed AM's tracking URL to always go through the proxy, even before the job started, so that it works properly with oozie throughout the job execution.
MAPREDUCE-3382. Critical bug reported by Vinod Kumar Vavilapalli and fixed by Ravi Prakash (applicationmaster , mrv2)
Network ACLs can prevent AMs to ping the Job-end notification URL

Enhanced MR AM to use a proxy to ping the job-end notification URL.
MAPREDUCE-3380. Blocker sub-task reported by Alejandro Abdelnur and fixed by Mahadev konar (mr-am , mrv2)
Token infrastructure for running clients which are not kerberos authenticated
MAPREDUCE-3379. Major bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2 , nodemanager)
LocalResourceTracker should not track deleted cache entries

Fixed LocalResourceTracker in NodeManager to remove deleted cache entries correctly.
MAPREDUCE-3376. Major bug reported by Robert Joseph Evans and fixed by Subroto Sanyal (mrv1 , mrv2)
Old mapred API combiner uses NULL reporter
MAPREDUCE-3375. Major task reported by Vinay Kumar Thota and fixed by Vinay Kumar Thota
Memory Emulation system tests.

Added system tests to test the memory emulation feature in Gridmix.
MAPREDUCE-3372. Major bug reported by Bruno Mahé and fixed by Bruno Mahé
HADOOP_PREFIX cannot be overridden
MAPREDUCE-3371. Minor improvement reported by Ravi Prakash and fixed by Ravi Prakash (documentation , mrv2)
Review and improve the yarn-api javadocs.
MAPREDUCE-3370. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv2 , test)
MiniMRYarnCluster uses a hard coded path location for the MapReduce application jar
MAPREDUCE-3369. Major improvement reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv1 , mrv2 , test)
Migrate MR1 tests to run on MR2 using the new interfaces introduced in MAPREDUCE-3169
MAPREDUCE-3368. Critical bug reported by Ramya Sunil and fixed by Hitesh Shah (build , mrv2)
compile-mapred-test fails

Fixed ant test compilation.
MAPREDUCE-3366. Major bug reported by Eric Yang and fixed by Eric Yang (mrv2)
Mapreduce component should use consistent directory structure layout as HDFS/common
MAPREDUCE-3360. Critical improvement reported by Bhallamudi Venkata Siva Kamesh and fixed by Bhallamudi Venkata Siva Kamesh (mrv2)
Provide information about lost nodes in the UI.

Added information about lost/rebooted/decommissioned nodes on the webapps.
MAPREDUCE-3355. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2)
AM scheduling hangs frequently with sort job on 350 nodes

Fixed MR AM's ContainerLauncher to handle node-command timeouts correctly.
MAPREDUCE-3354. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Jonathan Eagles (jobhistoryserver , mrv2)
JobHistoryServer should be started by bin/mapred and not by bin/yarn
MAPREDUCE-3349. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Amar Kamat (mrv2)
No rack-name logged in JobHistory for unsuccessful tasks

Unsuccessful tasks now log hostname and rackname to job history.
MAPREDUCE-3346. Blocker bug reported by Karam Singh and fixed by Amar Kamat (tools/rumen)
Rumen LoggedTaskAttempt getHostName call returns hostname as null
MAPREDUCE-3345. Major bug reported by Vinod Kumar Vavilapalli and fixed by Hitesh Shah (mrv2 , resourcemanager)
Race condition in ResourceManager causing TestContainerManagerSecurity to fail sometimes

Fixed a race condition in ResourceManager that was causing TestContainerManagerSecurity to fail sometimes.
MAPREDUCE-3344. Major bug reported by Brock Noland and fixed by Brock Noland
o.a.h.mapreduce.Reducer since 0.21 blindly casts to ReduceContext.ValueIterator
MAPREDUCE-3342. Critical bug reported by Thomas Graves and fixed by Jonathan Eagles (jobhistoryserver , mrv2)
JobHistoryServer doesn't show job queue

Fixed JobHistoryServer to also show the job's queue name.
MAPREDUCE-3341. Major improvement reported by Anupam Seth and fixed by Anupam Seth (mrv2)
Enhance logging of initialized queue limit values
MAPREDUCE-3339. Blocker bug reported by Ramgopal N and fixed by Siddharth Seth (mrv2)
Job hangs indefinitely if the child processes are killed on the NM. KILL_CONTAINER eventtype is continuously sent to containers that do not exist

Fixed MR AM to stop considering node blacklisting after the number of nodes blacklisted crosses a threshold.
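A minimal sketch of the threshold idea described in this note: once too large a share of the cluster is blacklisted, the blacklist is disregarded so the job can still make progress. The class, the percentage-based threshold, and the "at or above" comparison are assumptions for illustration; the note does not state how the MR AM defines or configures the threshold.

```java
import java.util.HashSet;
import java.util.Set;

public class BlacklistSketch {
    private final Set<String> blacklisted = new HashSet<>();
    private final int clusterNodes;
    private final int ignoreThresholdPercent; // hypothetical setting

    BlacklistSketch(int clusterNodes, int ignoreThresholdPercent) {
        this.clusterNodes = clusterNodes;
        this.ignoreThresholdPercent = ignoreThresholdPercent;
    }

    void blacklist(String host) {
        blacklisted.add(host);
    }

    // True once the blacklisted share of the cluster reaches the threshold.
    boolean blacklistIgnored() {
        return blacklisted.size() * 100 >= clusterNodes * ignoreThresholdPercent;
    }

    // A host is usable if it is not blacklisted, or if blacklisting is
    // currently being ignored because too many nodes are blacklisted.
    boolean isUsable(String host) {
        return blacklistIgnored() || !blacklisted.contains(host);
    }

    public static void main(String[] args) {
        BlacklistSketch b = new BlacklistSketch(4, 50);
        b.blacklist("node1");
        System.out.println("node1 usable after 1 of 4 blacklisted: " + b.isUsable("node1"));
        b.blacklist("node2");
        System.out.println("node1 usable after 2 of 4 blacklisted: " + b.isUsable("node1"));
    }
}
```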
MAPREDUCE-3336. Critical bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
com.google.inject.internal.Preconditions not public api - shouldn't be using it
MAPREDUCE-3333. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2)
MR AM for sort-job going out of memory

Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes.
MAPREDUCE-3331. Minor improvement reported by Anupam Seth and fixed by Anupam Seth (mrv2)
Improvement to single node cluster setup documentation for 0.23
MAPREDUCE-3329. Blocker bug reported by Thomas Graves and fixed by Arun C Murthy (mrv2)
capacity scheduler maximum-capacity allowed to be less than capacity
MAPREDUCE-3328. Critical bug reported by Thomas Graves and fixed by Ravi Prakash (mrv2)
mapred queue -list output inconsistent and missing child queues
MAPREDUCE-3327. Critical bug reported by Thomas Graves and fixed by Anupam Seth (mrv2)
RM web ui scheduler link doesn't show correct max value for queues
MAPREDUCE-3326. Critical bug reported by Thomas Graves and fixed by Jason Lowe (mrv2)
RM web UI scheduler link not as useful as should be
MAPREDUCE-3325. Major improvement reported by Thomas Graves and fixed by Thomas Graves (mrv2)
Improvements to CapacityScheduler doc

Documentation changes only.
MAPREDUCE-3324. Critical bug reported by Jonathan Eagles and fixed by Jonathan Eagles (jobhistoryserver , mrv2 , nodemanager)
Not All HttpServer tools links (stacks,logs,config,metrics) are accessible through all UI servers
MAPREDUCE-3312. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Make MR AM not send a stopContainer w/o corresponding start container

Modified MR AM to not send a stop-container request for a container that isn't launched at all.
MAPREDUCE-3299. Minor improvement reported by Siddharth Seth and fixed by Jonathan Eagles (mrv2)
Add AMInfo table to the AM job page

Added AMInfo table to the MR AM job pages to list all the job-attempts when AM restarts and recovers.
MAPREDUCE-3297. Major task reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
Move Log Related components from yarn-server-nodemanager to yarn-common

Moved log related components into yarn-common so that HistoryServer and clients can use them without depending on the yarn-server-nodemanager module.
MAPREDUCE-3291. Blocker bug reported by Ramya Sunil and fixed by Robert Joseph Evans (mrv2)
App fail to launch due to delegation token not found in cache
MAPREDUCE-3280. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2)
MR AM should not read the username from configuration

Removed the unnecessary job user-name configuration in mapred-site.xml.
MAPREDUCE-3265. Blocker improvement reported by Todd Lipcon and fixed by Arun C Murthy (mrv2)
Reduce log level on MR2 IPC construction, etc
MAPREDUCE-3251. Critical task reported by Anupam Seth and fixed by Anupam Seth (mrv2)
Network ACLs can prevent some clients from talking to the MR ApplicationMaster
MAPREDUCE-3243. Major bug reported by Ramya Sunil and fixed by Jonathan Eagles (contrib/streaming , mrv2)
Invalid tracking URL for streaming jobs
MAPREDUCE-3238. Trivial improvement reported by Todd Lipcon and fixed by Todd Lipcon (mrv2)
Small cleanup in SchedulerApp
MAPREDUCE-3221. Minor sub-task reported by Hitesh Shah and fixed by Devaraj K (mrv2 , test)
ant test TestSubmitJob failing on trunk

Fixed a bug in TestSubmitJob.
MAPREDUCE-3219. Minor sub-task reported by Hitesh Shah and fixed by Hitesh Shah (mrv2 , test)
ant test TestDelegationToken failing on trunk

Reenabled and fixed bugs in the failing test TestDelegationToken.
MAPREDUCE-3217. Minor sub-task reported by Hitesh Shah and fixed by Devaraj K (mrv2 , test)
ant test TestAuditLogger fails on trunk

Reenabled and fixed bugs in the failing ant test TestAuditLogger.
MAPREDUCE-3215. Minor sub-task reported by Hitesh Shah and fixed by Hitesh Shah (mrv2)
org.apache.hadoop.mapreduce.TestNoJobSetupCleanup failing on trunk

Reenabled and fixed bugs in the failing test TestNoJobSetupCleanup.
MAPREDUCE-3194. Major bug reported by Siddharth Seth and fixed by Jason Lowe (mrv2)
"mapred mradmin" command is broken in mrv2
MAPREDUCE-3169. Major improvement reported by Todd Lipcon and fixed by Ahmed Radwan (mrv1 , mrv2 , test)
Create a new MiniMRCluster equivalent which only provides client APIs cross MR1 and MR2
MAPREDUCE-3147. Major improvement reported by Ravi Prakash and fixed by Ravi Prakash (mrv2)
Handle leaf queues with the same name properly
MAPREDUCE-3121. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Ravi Gummadi (mrv2 , nodemanager)
DFIP aka 'NodeManager should handle Disk-Failures In Place'
MAPREDUCE-3102. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Hitesh Shah (mrv2 , security)
NodeManager should fail fast with wrong configuration or permissions for LinuxContainerExecutor

Changed NodeManager to fail fast when LinuxContainerExecutor has wrong configuration or permissions.
MAPREDUCE-3045. Minor bug reported by Ramya Sunil and fixed by Jonathan Eagles (jobhistoryserver , mrv2)
Elapsed time filter on jobhistory server displays incorrect table entries
MAPREDUCE-2950. Major bug reported by Amar Kamat and fixed by Ravi Gummadi (contrib/gridmix)
[Gridmix] TestUserResolve fails in trunk

Fixes bug in TestUserResolve.
MAPREDUCE-2863. Blocker improvement reported by Arun C Murthy and fixed by Thomas Graves (mrv2 , nodemanager , resourcemanager)
Support web-services for RM & NM

Support for web-services in YARN and MR components.
MAPREDUCE-2784. Major bug reported by Amar Kamat and fixed by Amar Kamat (contrib/gridmix)
[Gridmix] TestGridmixSummary fails with NPE when run in DEBUG mode.

Fixed bugs in ExecutionSummarizer and ResourceUsageMatcher.
MAPREDUCE-2765. Major new feature reported by Mithun Radhakrishnan and fixed by Mithun Radhakrishnan (distcp , mrv2)
DistCp Rewrite

DistCpV2 added to hadoop-tools.
MAPREDUCE-2733. Major task reported by Vinay Kumar Thota and fixed by Vinay Kumar Thota
Gridmix v3 cpu emulation system tests.

Adds system tests for the CPU emulation feature in Gridmix3.
MAPREDUCE-2450. Major bug reported by Matei Zaharia and fixed by Rajesh Balamohan
Calls from running tasks to TaskTracker methods sometimes fail and incur a 60s timeout
MAPREDUCE-1744. Major bug reported by Dick King and fixed by Dick King
DistributedCache creates its own FileSystem instance when adding a file/archive to the path
MAPREDUCE-778. Major new feature reported by Hong Tang and fixed by Amar Kamat (tools/rumen)
[Rumen] Need a standalone JobHistory log anonymizer

Added an anonymizer tool to Rumen. Anonymizer takes a Rumen trace file and/or topology as input. It supports persistence and plugins to override the default behavior.
HDFS-2923. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Namenode IPC handler count uses the wrong configuration key
HDFS-2893. Minor bug reported by Eli Collins and fixed by Eli Collins
The start/stop scripts don't start/stop the 2NN when using the default configuration
HDFS-2889. Major bug reported by Gregory Chanan and fixed by Gregory Chanan (hdfs client)
getNumCurrentReplicas is package private but should be public on 0.23 (see HDFS-2408)
HDFS-2879. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node)
Change FSDataset to package private
HDFS-2869. Minor bug reported by Harsh J and fixed by Harsh J (webhdfs)
Error in Webhdfs documentation for mkdir
HDFS-2868. Minor improvement reported by Harsh J and fixed by Harsh J (data-node)
Add number of active transfer threads to the DataNode status
HDFS-2864. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node)
Remove redundant methods and a constant from FSDataset
HDFS-2840. Major bug reported by Eli Collins and fixed by Alejandro Abdelnur (test)
TestHostnameFilter should work with localhost or localhost.localdomain
HDFS-2837. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans
mvn javadoc:javadoc not seeing LimitedPrivate class
HDFS-2836. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans
HttpFSServer still has 2 javadoc warnings in trunk
HDFS-2835. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (tools)
Fix org.apache.hadoop.hdfs.tools.GetConf$Command Findbug issue
HDFS-2827. Major bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (name-node)
Cannot save namespace after renaming a directory above a file with an open lease
HDFS-2826. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (name-node , test)
Test case for HDFS-1476 (safemode can initialize repl queues before exiting)
HDFS-2825. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Add test hook to turn off the writer preferring its local DN
HDFS-2822. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (ha , name-node)
processMisReplicatedBlock incorrectly identifies under-construction blocks as under-replicated
HDFS-2818. Trivial bug reported by Todd Lipcon and fixed by Devaraj K (name-node)
dfshealth.jsp missing space between role and node name
HDFS-2817. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (test)
Combine the two TestSafeMode test suites
HDFS-2816. Trivial bug reported by Hitesh Shah and fixed by Hitesh Shah
Fix missing license header in hadoop-hdfs-project/hadoop-hdfs-httpfs/dev-support/findbugsExcludeFile.xml
HDFS-2814. Minor improvement reported by Hitesh Shah and fixed by Hitesh Shah
NamenodeMXBean does not account for svn revision in the version information
HDFS-2810. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (hdfs client)
Leases not properly getting renewed by clients
HDFS-2803. Minor improvement reported by Jimmy Xiang and fixed by Jimmy Xiang (name-node)
Adding logging to LeaseRenewer for better lease expiration triage.
HDFS-2791. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (data-node , name-node)
If block report races with closing of file, replica is incorrectly marked corrupt
HDFS-2790. Minor bug reported by Arpit Gupta and fixed by Arpit Gupta
FSNamesystem.setTimes throws exception with wrong configuration name in the message
HDFS-2788. Major improvement reported by Eli Collins and fixed by Eli Collins (data-node)
HdfsServerConstants#DN_KEEPALIVE_TIMEOUT is dead code
HDFS-2786. Major sub-task reported by Daryn Sharp and fixed by Kihwal Lee (name-node , security)
Fix host-based token incompatibilities in DFSUtil
HDFS-2785. Major sub-task reported by Daryn Sharp and fixed by Robert Joseph Evans (webhdfs)
Update webhdfs and httpfs for host-based token support
HDFS-2784. Major sub-task reported by Daryn Sharp and fixed by Kihwal Lee (hdfs client , name-node , security)
Update hftp and hdfs for host-based token support
HDFS-2761. Major improvement reported by Roman Shaposhnik and fixed by Roman Shaposhnik (build , hdfs client , scripts)
Improve Hadoop subcomponent integration in Hadoop 0.23
HDFS-2751. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (data-node)
Datanode drops OS cache behind reads even for short reads
HDFS-2729. Minor improvement reported by Harsh J and fixed by Harsh J (name-node)
Update BlockManager's comments regarding the invalid block set
HDFS-2726. Major improvement reported by Michael Bieniosek and fixed by Harsh J
"Exception in createBlockOutputStream" shouldn't delete exception stack trace
HDFS-2722. Major bug reported by Harsh J and fixed by Harsh J (hdfs client)
HttpFs shouldn't be using an int for block size
HDFS-2710. Critical bug reported by Siddharth Seth and fixed by
HDFS part of MAPREDUCE-3529, HADOOP-7933
HDFS-2707. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
HttpFS should read the hadoop-auth secret from a file instead inline from the configuration
HDFS-2706. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Use configuration for blockInvalidateLimit if it is set
HDFS-2705. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
HttpFS server should check that upload requests have correct content-type
HDFS-2675. Trivial improvement reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Reduce verbosity when double-closing edit logs
HDFS-2658. Major bug reported by Eli Collins and fixed by Alejandro Abdelnur
HttpFS introduced 70 javadoc warnings
HDFS-2657. Major bug reported by Eli Collins and fixed by Alejandro Abdelnur
TestHttpFSServer and TestServerWebApp are failing on trunk
HDFS-2654. Major improvement reported by Eli Collins and fixed by Eli Collins (data-node)
Make BlockReaderLocal not extend RemoteBlockReader2
HDFS-2653. Major improvement reported by Eli Collins and fixed by Eli Collins (data-node)
DFSClient should cache whether addrs are non-local when short-circuiting is enabled
HDFS-2649. Major bug reported by Jason Lowe and fixed by Jason Lowe (build)
eclipse:eclipse build fails for hadoop-hdfs-httpfs
HDFS-2646. Major bug reported by Uma Maheswara Rao G and fixed by Alejandro Abdelnur
Hadoop HttpFS introduced 4 findbug warnings.
HDFS-2640. Major bug reported by Tom White and fixed by Tom White
Javadoc generation hangs
HDFS-2614. Major bug reported by Bruno Mahé and fixed by Alejandro Abdelnur (build)
hadoop dist tarball is missing hdfs headers
HDFS-2606. Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (webhdfs)
webhdfs client filesystem impl must set the content-type header for create/append
HDFS-2604. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
Add a log message to show if WebHDFS is enabled
HDFS-2596. Major bug reported by Eli Collins and fixed by Eli Collins (data-node , test)
TestDirectoryScanner doesn't test parallel scans
HDFS-2590. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
Some links in WebHDFS forrest doc do not work
HDFS-2588. Trivial bug reported by Dave Vronay and fixed by Dave Vronay (scripts)
hdfs jsp pages missing DOCTYPE [post-split branches]
HDFS-2587. Major task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
Add WebHDFS apt doc
HDFS-2575. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (test)
DFSTestUtil may create empty files
HDFS-2574. Trivial task reported by Joe Crobak and fixed by Joe Crobak (documentation)
remove references to deprecated properties in hdfs-site.xml template and hdfs-default.xml
HDFS-2572. Trivial improvement reported by Harsh J and fixed by Harsh J (data-node)
Unnecessary double-check in DN#getHostName
HDFS-2570. Trivial improvement reported by Eli Collins and fixed by Eli Collins (documentation)
Add descriptions for dfs.*.https.address in hdfs-default.xml
HDFS-2568. Trivial improvement reported by Harsh J and fixed by Harsh J (data-node)
Use a set to manage child sockets in XceiverServer
HDFS-2567. Major bug reported by Harsh J and fixed by Harsh J (name-node)
When 0 DNs are available, show a proper error when trying to browse DFS via web UI
HDFS-2566. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (data-node)
Move BPOfferService to be a non-inner class
HDFS-2563. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (data-node)
Some cleanup in BPOfferService
HDFS-2562. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (data-node)
Refactor DN configuration variables out of DataNode class
HDFS-2560. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (data-node)
Refactor BPOfferService to be a static inner class
HDFS-2553. Critical bug reported by Todd Lipcon and fixed by Uma Maheswara Rao G (data-node)
BlockPoolSliceScanner spinning in loop
HDFS-2552. Major task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
Add WebHdfs Forrest doc
HDFS-2545. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
Webhdfs: Support multiple namenodes in federation
HDFS-2544. Major bug reported by Bruno Mahé and fixed by Bruno Mahé (scripts)
Hadoop scripts unconditionally source "$bin"/../libexec/hadoop-config.sh.
HDFS-2543. Major bug reported by Bruno Mahé and fixed by Bruno Mahé (scripts)
HADOOP_PREFIX cannot be overridden
HDFS-2541. Major bug reported by Harsh J and fixed by Harsh J (data-node)
For a sufficiently large value of blocks, the DN Scanner may request a random number with a negative seed value.
HDFS-2536. Trivial improvement reported by Aaron T. Myers and fixed by Harsh J (name-node)
Remove unused imports
HDFS-2533. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (data-node , performance)
Remove needless synchronization on FSDataSet.getBlockFile
HDFS-2511. Minor improvement reported by Todd Lipcon and fixed by Alejandro Abdelnur (build)
Add dev script to generate HDFS protobufs
HDFS-2502. Minor improvement reported by Eli Collins and fixed by Harsh J (documentation)
hdfs-default.xml should include dfs.name.dir.restore
HDFS-2454. Minor improvement reported by Uma Maheswara Rao G and fixed by Harsh J (data-node)
Move maxXceiverCount check to before starting the thread in dataXceiver
HDFS-2397. Major improvement reported by Todd Lipcon and fixed by Eli Collins (name-node)
Undeprecate SecondaryNameNode
HDFS-2349. Trivial improvement reported by Harsh J and fixed by Harsh J (data-node)
DN should log a WARN, not an INFO when it detects a corruption during block transfer
HDFS-2335. Major improvement reported by Eli Collins and fixed by Uma Maheswara Rao G (data-node , name-node)
DataNodeCluster and NNStorage always pull fresh entropy
HDFS-2246. Major improvement reported by Sanjay Radia and fixed by Jitendra Nath Pandey
Shortcut local client reads to a DataNode's files directly

1. New configurations:
   a. dfs.block.local-path-access.user is the key in the datanode configuration that specifies the user allowed to do short circuit reads.
   b. dfs.client.read.shortcircuit is the key in the client-side configuration that enables short circuit reads.
   c. dfs.client.read.shortcircuit.skip.checksum is the key that bypasses the checksum check at the client side.
2. By default none of the above are enabled and short circuit read will not kick in.
3. If security is on, the feature can be used only by a user that has Kerberos credentials at the client; therefore MapReduce tasks cannot benefit from it in general.
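As a rough illustration, the three keys above could be rendered into hdfs-site.xml style fragments as sketched below. The helper name render_properties and the user name "hbase" are hypothetical placeholders, not part of the release; the first key belongs in the datanode configuration and the other two in the client configuration.

```python
import xml.etree.ElementTree as ET


def render_properties(props):
    """Render key/value pairs as a Hadoop-style <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Datanode side: the user allowed to do short circuit reads.
# "hbase" is only a placeholder user name for this sketch.
datanode_xml = render_properties({"dfs.block.local-path-access.user": "hbase"})

# Client side: enable the feature; checksum skipping stays off in this sketch.
client_xml = render_properties({
    "dfs.client.read.shortcircuit": "true",
    "dfs.client.read.shortcircuit.skip.checksum": "false",
})

print(datanode_xml)
print(client_xml)
```

Leaving dfs.client.read.shortcircuit.skip.checksum at "false" keeps client-side checksum verification in place while still reading local blocks directly.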
HDFS-2178. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
HttpFS - a read/write Hadoop file system proxy
HDFS-2130. Major sub-task reported by Todd Lipcon and fixed by Todd Lipcon (hdfs client)
Switch default checksum to CRC32C

The default checksum algorithm used on HDFS is now CRC32C. Data from previous versions of Hadoop can still be read backwards-compatibly.
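For readers unfamiliar with the difference, CRC32C uses the Castagnoli polynomial rather than the polynomial used by zlib's CRC32, so the two produce different values for the same bytes. The following is a slow bitwise reference sketch for illustration only; it is not Hadoop's implementation, and the function name crc32c is chosen here for the example.

```python
import zlib


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), using the reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


check = b"123456789"
print(hex(crc32c(check)))      # CRC-32C of the standard check string
print(hex(zlib.crc32(check)))  # zlib CRC-32 of the same bytes, for comparison
```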
HDFS-2129. Major sub-task reported by Todd Lipcon and fixed by Todd Lipcon (hdfs client , performance)
Simplify BlockReader to not inherit from FSInputChecker

BlockReader has been reimplemented to use direct byte buffers. If you use a custom socket factory, it must generate sockets that have associated Channels.
HDFS-2080. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (hdfs client , performance)
Speed up DFS read path by lessening checksum overhead
HDFS-1314. Minor bug reported by Karim Saadah and fixed by Sho Shimauchi
dfs.blocksize accepts only absolute value

The default blocksize property 'dfs.blocksize' now accepts unit symbols in place of a raw byte count. Values such as "10k", "128m", and "1g" are now accepted, where previously only a number of bytes was allowed.
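A minimal sketch of how such values could be interpreted is shown below. The function name parse_size is hypothetical, and treating the suffixes as binary multiples (k = 1024) is an assumption of this sketch rather than something stated in the note.

```python
# Binary multipliers; treating "k" as 1024 bytes is an assumption of this sketch.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_size(value: str) -> int:
    """Convert a size such as "128m" or a plain byte count into bytes."""
    text = value.strip().lower()
    if not text:
        raise ValueError("empty size value")
    suffix = text[-1]
    if suffix in _UNITS:
        return int(text[:-1]) * _UNITS[suffix]
    # No recognized suffix: fall back to the older plain-bytes form.
    return int(text)


for example in ("10k", "128m", "1g", "134217728"):
    print(example, "->", parse_size(example))
```

Plain byte counts still parse unchanged, which mirrors the note's point that the unit form is an addition to the numeric form rather than a replacement for it.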
HDFS-554. Minor improvement reported by Steve Loughran and fixed by Harsh J (name-node)
BlockInfo.ensureCapacity may get a speedup from System.arraycopy()
HDFS-442. Minor bug reported by Ramya Sunil and fixed by Harsh J (test)
dfsthroughput in test.jar throws NPE
HDFS-362. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Uma Maheswara Rao G (name-node)
FSEditLog should not write long and short as UTF8 and should not use ArrayWritable for writing non-array items
HDFS-69. Minor bug reported by Ravi Phulari and fixed by Harsh J
Improve dfsadmin command line help
HADOOP-8055. Major bug reported by Eric Charles and fixed by Harsh J (build)
Distribution tar.gz does not contain etc/hadoop/core-site.xml
HADOOP-8054. Critical bug reported by Amareshwari Sriramadasu and fixed by Daryn Sharp (fs)
NPE with FilterFileSystem
HADOOP-8052. Major bug reported by Varun Kapoor and fixed by Varun Kapoor (metrics)
Hadoop Metrics2 should emit Float.MAX_VALUE (instead of Double.MAX_VALUE) to avoid making Ganglia's gmetad core
HADOOP-8027. Minor improvement reported by Harsh J and fixed by Aaron T. Myers (metrics)
Visiting /jmx on the daemon web interfaces may print unnecessary error in logs
HADOOP-8018. Major bug reported by Matt Foley and fixed by Jonathan Eagles (build , test)
Hudson auto test for HDFS has started throwing javadoc: warning - Error fetching URL: http://java.sun.com/javase/6/docs/api/package-list
HADOOP-8015. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (fs)
ChRootFileSystem should extend FilterFileSystem
HADOOP-8013. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
ViewFileSystem does not honor setVerifyChecksum
HADOOP-8012. Minor bug reported by Roman Shaposhnik and fixed by Roman Shaposhnik (scripts)
hadoop-daemon.sh and yarn-daemon.sh are trying to mkdir and chown log/pid dirs which can fail
HADOOP-8009. Critical improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
Create hadoop-client and hadoop-minicluster artifacts for downstream projects

Generate integration artifacts "org.apache.hadoop:hadoop-client" and "org.apache.hadoop:hadoop-minicluster" containing all the jars needed to use Hadoop client APIs, and to run Hadoop MiniClusters, respectively. Push these artifacts to the Maven repository on mvn deploy, along with the existing artifacts.
HADOOP-8006. Major bug reported by Uma Maheswara Rao G and fixed by Daryn Sharp (fs)
TestFSInputChecker is failing in trunk.
HADOOP-8002. Major bug reported by Arpit Gupta and fixed by Arpit Gupta
SecurityUtil acquired token message should be a debug rather than info
HADOOP-8001. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
ChecksumFileSystem's rename doesn't correctly handle checksum files
HADOOP-8000. Critical bug reported by Arpit Gupta and fixed by Arpit Gupta
fetchdt command not available in bin/hadoop
HADOOP-7999. Critical bug reported by Jason Lowe and fixed by Jason Lowe (scripts)
"hadoop archive" fails with ClassNotFoundException
HADOOP-7998. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
CheckFileSystem does not correctly honor setVerifyChecksum
HADOOP-7993. Major bug reported by Anupam Seth and fixed by Anupam Seth (conf)
Hadoop ignores old-style config options for enabling compressed output
HADOOP-7988. Major bug reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
Upper case in hostname part of the principals doesn't work with kerberos.
HADOOP-7987. Major improvement reported by Devaraj Das and fixed by Jitendra Nath Pandey (security)
Support setting the run-as user in unsecure mode
HADOOP-7986. Major bug reported by Mahadev konar and fixed by Mahadev konar
Add config for History Server protocol in hadoop-policy for service level authorization.

Adding config for MapReduce History Server protocol in hadoop-policy.xml for service level authorization.
HADOOP-7982. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (security)
UserGroupInformation fails to login if thread's context classloader can't load HadoopLoginModule
HADOOP-7981. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles (io)
Improve documentation for org.apache.hadoop.io.compress.Decompressor.getRemaining
HADOOP-7975. Minor bug reported by Harsh J and fixed by Harsh J
Add entry to XML defaults for new LZ4 codec
HADOOP-7974. Major bug reported by Eli Collins and fixed by Harsh J (fs , test)
TestViewFsTrash incorrectly determines the user's home directory
HADOOP-7971. Blocker bug reported by Thomas Graves and fixed by Prashant Sharma
hadoop <job/queue/pipes> removed - should be added back, but deprecated
HADOOP-7964. Blocker bug reported by Kihwal Lee and fixed by Daryn Sharp (security , util)
Deadlock in class init.
HADOOP-7963. Blocker bug reported by Thomas Graves and fixed by Siddharth Seth
test failures: TestViewFileSystemWithAuthorityLocalFileSystem and TestViewFileSystemLocalFileSystem

Fix ViewFS to catch a null canonical service-name and pass tests TestViewFileSystem*
HADOOP-7949. Trivial bug reported by Eli Collins and fixed by Eli Collins (ipc)
Updated maxIdleTime default in the code to match core-default.xml
HADOOP-7948. Minor bug reported by Michajlo Matijkiw and fixed by Michajlo Matijkiw (build)
Shell scripts created by hadoop-dist/pom.xml to build tar do not properly propagate failure
HADOOP-7939. Major improvement reported by Roman Shaposhnik and fixed by Roman Shaposhnik (build , conf , documentation , scripts)
Improve Hadoop subcomponent integration in Hadoop 0.23
HADOOP-7936. Major bug reported by Eli Collins and fixed by Alejandro Abdelnur (build)
There's a Hoop README in the root dir of the tarball
HADOOP-7934. Critical improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
Normalize dependencies versions across all modules
HADOOP-7933. Critical bug reported by Siddharth Seth and fixed by Siddharth Seth (viewfs)
Viewfs changes for MAPREDUCE-3529
HADOOP-7919. Trivial improvement reported by Harsh J and fixed by Harsh J (documentation)
[Doc] Remove hadoop.logfile.* properties.
HADOOP-7917. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
compilation of protobuf files fails in windows/cygwin
HADOOP-7914. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (build)
duplicate declaration of hadoop-hdfs test-jar
HADOOP-7912. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (build)
test-patch should run eclipse:eclipse to verify that it does not break again
HADOOP-7910. Minor improvement reported by Sho Shimauchi and fixed by Sho Shimauchi (conf)
add configuration methods to handle human readable size values
HADOOP-7907. Blocker bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
hadoop-tools JARs are not part of the distro
HADOOP-7902. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Alejandro Abdelnur
skipping name rules setting (if already set) should be done on UGI initialization only
HADOOP-7898. Minor bug reported by Suresh Srinivas and fixed by Suresh Srinivas (security)
Fix javadoc warnings in AuthenticationToken.java
HADOOP-7890. Trivial improvement reported by Koji Noguchi and fixed by Koji Noguchi (scripts)
Redirect hadoop script's deprecation message to stderr
HADOOP-7887. Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
KerberosAuthenticatorHandler is not setting KerberosName name rules from configuration
HADOOP-7878. Minor bug reported by Steve Loughran and fixed by Steve Loughran (util)
Regression HADOOP-7777 switch changes break HDFS tests when the isSingleSwitch() predicate is used
HADOOP-7877. Major task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (documentation)
Federation: update Balancer documentation
HADOOP-7874. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
native libs should be under lib/native/ dir
HADOOP-7870. Major bug reported by Jonathan Hsieh and fixed by Jonathan Hsieh
fix SequenceFile#createWriter with boolean createParent arg to respect createParent.
HADOOP-7864. Major bug reported by Andrew Bayer and fixed by Andrew Bayer (build)
Building mvn site with Maven < 3.0.2 causes OOM errors
HADOOP-7859. Major bug reported by Eli Collins and fixed by Eli Collins (fs)
TestViewFsHdfs.testgetFileLinkStatus is failing an assert
HADOOP-7858. Trivial improvement reported by Todd Lipcon and fixed by Todd Lipcon
Drop some info logging to DEBUG level in IPC, metrics, and HTTP
HADOOP-7854. Critical bug reported by Daryn Sharp and fixed by Daryn Sharp (security)
UGI getCurrentUser is not synchronized
HADOOP-7853. Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (security)
multiple javax security configurations cause conflicts
HADOOP-7851. Major bug reported by Amar Kamat and fixed by Uma Maheswara Rao G (conf)
Configuration.getClasses() never returns the default value.

Fixed Configuration.getClasses() API to return the default value if the key is not set.
HADOOP-7843. Major bug reported by John George and fixed by John George
compilation failing because workDir not initialized in RunJar.java
HADOOP-7841. Trivial improvement reported by Todd Lipcon and fixed by Todd Lipcon (build)
Run tests with non-secure random
HADOOP-7837. Major bug reported by Steve Loughran and fixed by Eli Collins (conf)
no NullAppender in the log4j config
HADOOP-7813. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles (build , test)
test-patch +1 patches that introduce javadoc and findbugs warnings in some cases
HADOOP-7811. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles (security , test)
TestUserGroupInformation#testGetServerSideGroups test fails in chroot
HADOOP-7810. Blocker bug reported by John George and fixed by John George
move hadoop archive to core from tools
HADOOP-7808. Major new feature reported by Daryn Sharp and fixed by Daryn Sharp (fs , security)
Port token service changes from 205
HADOOP-7804. Major improvement reported by Arpit Gupta and fixed by Arpit Gupta (conf)
enable hadoop config generator to set dfs.block.local-path-access.user to enable short circuit read
HADOOP-7802. Major bug reported by Bruno Mahé and fixed by Bruno Mahé
Hadoop scripts unconditionally source "$bin"/../libexec/hadoop-config.sh.

Here is a patch to enable this behavior
HADOOP-7801. Major bug reported by Bruno Mahé and fixed by Bruno Mahé (build)
HADOOP_PREFIX cannot be overridden
HADOOP-7787. Major bug reported by Bruno Mahé and fixed by Bruno Mahé (build)
Make source tarball use conventional name.
HADOOP-7761. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (io , performance , util)
Improve performance of raw comparisons
HADOOP-7758. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (fs)
Make GlobFilter class public
HADOOP-7736. Trivial improvement reported by Harsh J and fixed by Harsh J (fs)
Remove duplicate call of Path#normalizePath during initialization.
HADOOP-7657. Major improvement reported by Bert Sanders and fixed by Binglin Chang
Add support for LZ4 compression
HADOOP-7590. Major sub-task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
Mavenize streaming and MR examples
HADOOP-7574. Trivial improvement reported by XieXianshan and fixed by XieXianshan (fs)
Improvement for FSshell -stat
HADOOP-7504. Trivial improvement reported by Eli Collins and fixed by Harsh J (metrics)
hadoop-metrics.properties missing some Ganglia31 options
HADOOP-7470. Minor improvement reported by Steve Loughran and fixed by Enis Soztutar (util)
move up to Jackson 1.8.8
HADOOP-7424. Major improvement reported by Eli Collins and fixed by Uma Maheswara Rao G
Log an error if the topology script doesn't handle multiple args
HADOOP-7348. Major improvement reported by XieXianshan and fixed by XieXianshan (fs)
Modify the option of FsShell getmerge from [addnl] to [-nl] for consistency

The 'fs -getmerge' tool now uses a -nl flag to determine if adding a newline at end of each file is required, in favor of the 'addnl' boolean flag that was used earlier.
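The effect of the flag can be sketched on local files as follows. This is an illustration of the described behavior only, not the FsShell code; the function names getmerge and demo and the file names are hypothetical.

```python
import os
import tempfile


def getmerge(sources, destination, add_newline=False):
    """Concatenate source files into destination, optionally ending each with a newline."""
    with open(destination, "wb") as out:
        for path in sources:
            with open(path, "rb") as src:
                out.write(src.read())
            if add_newline:
                # Corresponds to passing -nl: a newline after each source file.
                out.write(b"\n")


def demo():
    """Merge two small files with and without the newline option."""
    with tempfile.TemporaryDirectory() as tmp:
        parts = []
        for name, body in (("part-0", b"alpha"), ("part-1", b"beta")):
            path = os.path.join(tmp, name)
            with open(path, "wb") as handle:
                handle.write(body)
            parts.append(path)
        results = {}
        for flag in (False, True):
            target = os.path.join(tmp, "merged")
            getmerge(parts, target, add_newline=flag)
            with open(target, "rb") as handle:
                results[flag] = handle.read()
        return results


print(demo())
```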
HADOOP-6886. Minor improvement reported by Nicolas Spiegelberg and fixed by Nicolas Spiegelberg (fs)
LocalFileSystem Needs createNonRecursive API
HADOOP-6840. Minor improvement reported by Nicolas Spiegelberg and fixed by Nicolas Spiegelberg (fs , io)
Support non-recursive create() in FileSystem & SequenceFile.Writer
HADOOP-6614. Minor improvement reported by Steve Loughran and fixed by Jonathan Hsieh (util)
RunJar should provide more diags when it can't create a temp file
HADOOP-6490. Minor bug reported by Zheng Shao and fixed by Uma Maheswara Rao G (fs)
Path.normalize should use StringUtils.replace in favor of String.replace
HADOOP-4515. Minor improvement reported by Abhijit Bagri and fixed by Sho Shimauchi
conf.getBoolean must be case insensitive

Hadoop 0.23.0 Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements.

Changes since Hadoop 1.0.0

MAPREDUCE-3332. Trivial bug reported by Hitesh Shah and fixed by Hitesh Shah (contrib/raid)
contrib/raid compile breaks due to changes in hdfs/protocol/datatransfer/Sender#writeBlock related to checksum handling
MAPREDUCE-3322. Major improvement reported by Arun C Murthy and fixed by Arun C Murthy (documentation , mrv2)
Create a better index.html for maven docs
MAPREDUCE-3321. Minor bug reported by Hitesh Shah and fixed by Hitesh Shah (mrv2)
Disable some failing legacy tests for MRv2 builds to go through
MAPREDUCE-3317. Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (tools/rumen)
Rumen TraceBuilder is emitting null as hostname

Fixes Rumen to get correct hostName that includes rackName in attempt info.
MAPREDUCE-3316. Major bug reported by Bhallamudi Venkata Siva Kamesh and fixed by Bhallamudi Venkata Siva Kamesh (resourcemanager)
Rebooted link is not working properly
MAPREDUCE-3313. Blocker bug reported by Ravi Gummadi and fixed by Hitesh Shah (mrv2 , test)
TestResourceTrackerService failing in trunk sometimes
MAPREDUCE-3306. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2 , nodemanager)
Cannot run apps after MAPREDUCE-2989
MAPREDUCE-3304. Major bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2 , test)
TestRMContainerAllocator#testBlackListedNodes fails intermittently
MAPREDUCE-3296. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (build)
Pending(9) findBugs warnings
MAPREDUCE-3295. Critical bug reported by Mahadev konar and fixed by
TestAMAuthorization failing on branch 0.23.
MAPREDUCE-3292. Critical bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
In secure mode job submission fails with Provider org.apache.hadoop.mapreduce.security.token.JobTokenIndentifier$Renewer not found.
MAPREDUCE-3290. Major bug reported by Ramya Sunil and fixed by Arun C Murthy (mrv2)
list-active-trackers throws NPE
MAPREDUCE-3288. Blocker bug reported by Ramya Sunil and fixed by Mahadev konar (mrv2)
Mapreduce 23 builds failing
MAPREDUCE-3285. Blocker bug reported by Arun C Murthy and fixed by Siddharth Seth (mrv2)
Tests on branch-0.23 failing
MAPREDUCE-3284. Major bug reported by Ramya Sunil and fixed by Arun C Murthy (mrv2)
bin/mapred queue fails with JobQueueClient ClassNotFoundException
MAPREDUCE-3282. Critical bug reported by Ramya Sunil and fixed by Arun C Murthy (mrv2)
bin/mapred job -list throws exception
MAPREDUCE-3281. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (test)
TestLinuxContainerExecutorWithMocks failing on trunk.
MAPREDUCE-3279. Major bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
TestJobHistoryParsing broken
MAPREDUCE-3275. Critical improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (documentation , mrv2)
Add docs for WebAppProxy
MAPREDUCE-3274. Blocker bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (applicationmaster , mrv2)
Race condition in MR App Master Preemption can cause a deadlock
MAPREDUCE-3269. Blocker bug reported by Ramya Sunil and fixed by Mahadev konar (mrv2)
Jobsummary logs not being moved to a separate file
MAPREDUCE-3264. Blocker bug reported by Todd Lipcon and fixed by Arun C Murthy (mrv2)
mapreduce.job.user.name needs to be set automatically
MAPREDUCE-3263. Blocker bug reported by Ramya Sunil and fixed by Hitesh Shah (build , mrv2)
compile-mapred-test target fails
MAPREDUCE-3262. Critical bug reported by Hitesh Shah and fixed by Hitesh Shah (mrv2 , nodemanager)
A few events are not handled by the NodeManager in failure scenarios
MAPREDUCE-3261. Major bug reported by Chris Riccomini and fixed by (applicationmaster)
AM unable to release containers
MAPREDUCE-3259. Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee (mrv2 , nodemanager)
ContainerLocalizer should get the proper java.library.path from LinuxContainerExecutor
MAPREDUCE-3258. Blocker bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
Job counters missing from AM and history UI
MAPREDUCE-3257. Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2 , resourcemanager , security)
Authorization checks needed for AM->RM protocol
MAPREDUCE-3256. Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2 , nodemanager , security)
Authorization checks needed for AM->NM protocol
MAPREDUCE-3254. Blocker bug reported by Ramya Sunil and fixed by Arun C Murthy (contrib/streaming , mrv2)
Streaming jobs failing with PipeMapRunner ClassNotFoundException
MAPREDUCE-3253. Blocker bug reported by Daniel Dai and fixed by Arun C Murthy (mrv2)
ContextFactory throws NoSuchFieldException
MAPREDUCE-3252. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (mrv2 , task)
MR2: Map tasks rewrite data once even if output fits in sort buffer
MAPREDUCE-3250. Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2)
When AM restarts, client keeps reconnecting to the new AM and prints a lot of logs.
MAPREDUCE-3249. Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2)
Recovery of MR AMs with reduces fails the subsequent generation of the job
MAPREDUCE-3248. Blocker bug reported by Arun C Murthy and fixed by Vinod Kumar Vavilapalli (test)
Log4j logs from unit tests are lost
MAPREDUCE-3242. Major bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
Trunk compilation broken with bad interaction from MAPREDUCE-3070 and MAPREDUCE-3239.
MAPREDUCE-3241. Major bug reported by Devaraj K and fixed by Amar Kamat
(Rumen)TraceBuilder throws IllegalArgumentException

Rumen is fixed to ignore the AMRestartedEvent.
MAPREDUCE-3240. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Hitesh Shah (mrv2 , nodemanager)
NM should send a SIGKILL for completed containers also
MAPREDUCE-3239. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (mrv2)
Use new createSocketAddr API in MRv2 to give better error messages on misconfig
MAPREDUCE-3237. Major improvement reported by Tom White and fixed by Tom White (client)
Move LocalJobRunner to hadoop-mapreduce-client-core module
MAPREDUCE-3233. Blocker sub-task reported by Karam Singh and fixed by Mahadev konar (mrv2)
AM fails to restart when first AM is killed
MAPREDUCE-3228. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2)
MR AM hangs when one node goes bad
MAPREDUCE-3226. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2 , task)
Few reduce tasks hanging in a gridmix-run
MAPREDUCE-3220. Minor sub-task reported by Hitesh Shah and fixed by Devaraj K (mrv2 , test)
ant test TestCombineOutputCollector failing on trunk
MAPREDUCE-3212. Minor bug reported by Bhallamudi Venkata Siva Kamesh and fixed by Bhallamudi Venkata Siva Kamesh (mrv2)
Message displays while executing yarn command should be proper
MAPREDUCE-3209. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (build , mrv2)
Jenkins reports 160 FindBugs warnings
MAPREDUCE-3208. Minor bug reported by liangzhaowang and fixed by liangzhaowang (mrv2)
NPE while flushing TaskLogAppender
MAPREDUCE-3205. Blocker improvement reported by Todd Lipcon and fixed by Todd Lipcon (mrv2 , nodemanager)
MR2 memory limits should be pmem, not vmem

Resource limits are now expressed and enforced in terms of physical memory rather than virtual memory. The virtual memory limit is set as a configurable multiple of the physical limit. The NodeManager's memory usage is now configured in units of MB rather than GB.
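As a rough illustration of the relationship described in the note above, the following Python sketch derives a virtual memory limit from a physical limit given in MB. The function name and the example ratio of 2.1 are assumptions made for illustration only; they are not taken from this release note.

```python
def vmem_limit_mb(pmem_limit_mb: int, vmem_pmem_ratio: float) -> int:
    """Return the virtual memory limit (MB) implied by a physical limit and a ratio.

    Both parameter names are hypothetical; the note only states that the
    virtual limit is a configurable multiple of the physical limit.
    """
    if pmem_limit_mb <= 0:
        raise ValueError("physical memory limit must be positive")
    if vmem_pmem_ratio <= 0:
        raise ValueError("ratio must be positive")
    # Truncate to whole megabytes, since the limits are expressed in MB.
    return int(pmem_limit_mb * vmem_pmem_ratio)


if __name__ == "__main__":
    # Example: a 1024 MB physical limit with an assumed ratio of 2.1.
    print(vmem_limit_mb(1024, 2.1))
```

The point of the sketch is only that the physical limit is the primary setting and the virtual limit follows from it.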
MAPREDUCE-3204. Major bug reported by Suresh Srinivas and fixed by Alejandro Abdelnur (build)
mvn site:site fails on MapReduce
MAPREDUCE-3203. Major bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
Fix some javac warnings in MRAppMaster.
MAPREDUCE-3199. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2 , test)
TestJobMonitorAndPrint is broken on trunk
MAPREDUCE-3198. Trivial bug reported by Hitesh Shah and fixed by Arun C Murthy (mrv2)
Change mode for hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/mock-container-executor to 755
MAPREDUCE-3197. Major bug reported by Anupam Seth and fixed by Mahadev konar (mrv2)
TestMRClientService failing on building clean checkout of branch 0.23
MAPREDUCE-3196. Major bug reported by Arun C Murthy and fixed by Arun C Murthy (mrv2)
TestLinuxContainerExecutorWithMocks fails on Mac OSX
MAPREDUCE-3192. Major bug reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
Fix Javadoc warning in JobClient.java and Cluster.java
MAPREDUCE-3190. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (mrv2)
bin/yarn should barf early if HADOOP_COMMON_HOME or HADOOP_HDFS_HOME are not set
MAPREDUCE-3189. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (mrv2)
Add link decoration back to MR2's CSS
MAPREDUCE-3188. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (mrv2)
Lots of errors in logs when daemon startup fails
MAPREDUCE-3187. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (mrv2)
Add names for various unnamed threads in MR2
MAPREDUCE-3186. Blocker bug reported by Ramgopal N and fixed by Eric Payne (mrv2)
User jobs hang if the ResourceManager process goes down and comes back up while the job is executing.

New YARN configuration property:
Name: yarn.app.mapreduce.am.scheduler.connection.retries
Description: Number of times the AM should retry contacting the RM if the connection is lost.
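The behaviour this property controls can be pictured as a bounded retry loop. The Python sketch below is a generic illustration, not the AM's actual code: the function name, the `contact` callable, and the `delay_s` parameter are all hypothetical, and whether the real AM counts the first attempt toward the limit is not stated in this note.

```python
import time
from typing import Callable


def contact_with_retries(contact: Callable[[], bool],
                         retries: int,
                         delay_s: float = 0.0) -> bool:
    """Call `contact` until it succeeds or `retries` attempts are used up."""
    for _ in range(retries):
        try:
            if contact():
                return True
        except ConnectionError:
            # Connection lost: fall through and try again.
            pass
        time.sleep(delay_s)
    # All attempts failed; the caller decides how to give up.
    return False
```

A caller would pass a function that performs one attempt to reach the RM and returns True on success.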
MAPREDUCE-3185. Critical bug reported by Mahadev konar and fixed by Jonathan Eagles (mrv2)
RM Web UI does not sort the columns in some cases.
MAPREDUCE-3183. Trivial bug reported by Hitesh Shah and fixed by Hitesh Shah (build)
hadoop-assemblies/src/main/resources/assemblies/hadoop-mapreduce-dist.xml missing license header
MAPREDUCE-3181. Blocker bug reported by Anupam Seth and fixed by Arun C Murthy (mrv2)
Terasort fails with Kerberos exception on secure cluster
MAPREDUCE-3179. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2 , test)
Incorrect exit code for hadoop-mapreduce-test tests when exception thrown
MAPREDUCE-3176. Blocker bug reported by Ravi Prakash and fixed by Hitesh Shah (mrv2 , test)
ant mapreduce tests are timing out
MAPREDUCE-3175. Blocker sub-task reported by Thomas Graves and fixed by Jonathan Eagles (mrv2)
Yarn httpservers not created with access control lists
MAPREDUCE-3171. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
normalize nodemanager native code compilation with common/hdfs native
MAPREDUCE-3170. Critical bug reported by Mahadev konar and fixed by Hitesh Shah (build , mrv1 , mrv2)
Trunk nightly commit builds are failing.
MAPREDUCE-3167. Minor bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
container-executor is not being packaged with the assembly target.
MAPREDUCE-3166. Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (tools/rumen)
Make Rumen use job history api instead of relying on current history file name format

Makes Rumen use job history api instead of relying on current history file name format.
MAPREDUCE-3165. Blocker bug reported by Arun C Murthy and fixed by Todd Lipcon (applicationmaster , mrv2)
Ensure logging option is set on child command line
MAPREDUCE-3163. Blocker bug reported by Todd Lipcon and fixed by Mahadev konar (job submission , mrv2)
JobClient spews errors when killing MR2 job
MAPREDUCE-3162. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (mrv2 , nodemanager)
Separate application-init and container-init event types in NM's ApplicationImpl FSM
MAPREDUCE-3161. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (mrv2)
Improve javadoc and fix some typos in MR2 code
MAPREDUCE-3159. Blocker bug reported by Todd Lipcon and fixed by Todd Lipcon (mrv2)
DefaultContainerExecutor removes appcache dir on every localization
MAPREDUCE-3158. Major bug reported by Hitesh Shah and fixed by Hitesh Shah (mrv2)
Fix trunk build failures
MAPREDUCE-3157. Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (tools/rumen)
Rumen TraceBuilder is skipping analyzing 0.20 history files

Fixes TraceBuilder to handle 0.20 history file names also.
MAPREDUCE-3154. Major improvement reported by Abhijit Suresh Shingate and fixed by Abhijit Suresh Shingate (client , mrv2)
Validate the Jobs Output Specification as the first statement in JobSubmitter.submitJobInternal(Job, Cluster) method
MAPREDUCE-3153. Major bug reported by Vinod Kumar Vavilapalli and fixed by Mahadev konar (mrv2 , test)
TestFileOutputCommitter.testFailAbort() is failing on trunk on Jenkins
MAPREDUCE-3148. Blocker sub-task reported by Arun C Murthy and fixed by Arun C Murthy (mrv2)
Port MAPREDUCE-2702 to old mapred api
MAPREDUCE-3146. Critical sub-task reported by Vinod Kumar Vavilapalli and fixed by Siddharth Seth (mrv2 , nodemanager)
Add a MR specific command line to dump logs for a given TaskAttemptID
MAPREDUCE-3144. Critical sub-task reported by Vinod Kumar Vavilapalli and fixed by Siddharth Seth (mrv2)
Augment JobHistory to include information needed for serving aggregated logs.
MAPREDUCE-3143. Major bug reported by Vinod Kumar Vavilapalli and fixed by (mrv2 , nodemanager)
Complete aggregation of user-logs spit out by containers onto DFS
MAPREDUCE-3141. Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2 , security)
Yarn+MR secure mode is broken, uncovered after MAPREDUCE-3056
MAPREDUCE-3140. Major bug reported by Bhallamudi Venkata Siva Kamesh and fixed by Subroto Sanyal (mrv2)
Invalid JobHistory URL for failed applications
MAPREDUCE-3138. Blocker bug reported by Arun C Murthy and fixed by Owen O'Malley (client , mrv2)
Allow for applications to deal with MAPREDUCE-954
MAPREDUCE-3137. Trivial sub-task reported by Hitesh Shah and fixed by Hitesh Shah (mrv2)
Fix broken merge of MR-2719 to 0.23 branch for the distributed shell test case
MAPREDUCE-3136. Blocker sub-task reported by Arun C Murthy and fixed by Arun C Murthy (documentation , mrv2)
Add docs for setting up real-world MRv2 clusters
MAPREDUCE-3134. Blocker sub-task reported by Arun C Murthy and fixed by Arun C Murthy (documentation , mrv2 , scheduler)
Add documentation for CapacityScheduler
MAPREDUCE-3133. Major improvement reported by Jonathan Eagles and fixed by Jonathan Eagles (build)
Running a set of methods in a Single Test Class
MAPREDUCE-3127. Blocker sub-task reported by Amol Kekre and fixed by Arun C Murthy (mrv2 , resourcemanager)
Unable to restrict users based on resourcemanager.admin.acls value set
MAPREDUCE-3126. Blocker bug reported by Thomas Graves and fixed by Arun C Murthy (mrv2)
mr job stuck because reducers using all slots and mapper isn't scheduled
MAPREDUCE-3125. Critical bug reported by Thomas Graves and fixed by Hitesh Shah (mrv2)
app master web UI shows reduce task progress 100% even though reducers not complete and state running/scheduled
MAPREDUCE-3124. Blocker bug reported by Thomas Graves and fixed by John George (mrv2)
mapper failed with failed to load native libs
MAPREDUCE-3123. Blocker bug reported by Thomas Graves and fixed by Hitesh Shah (mrv2)
Symbolic links with special chars causing container/task.sh to fail
MAPREDUCE-3114. Major bug reported by Subroto Sanyal and fixed by Subroto Sanyal (mrv2)
Invalid ApplicationMaster URL in Applications Page
MAPREDUCE-3113. Minor improvement reported by XieXianshan and fixed by XieXianshan (mrv2)
the scripts yarn-daemon.sh and yarn are not working properly
MAPREDUCE-3112. Major bug reported by Eric Yang and fixed by Eric Yang (contrib/streaming)
Calling hadoop cli inside mapreduce job leads to errors

Removed inheritance of certain server environment variables (HADOOP_OPTS and HADOOP_ROOT_LOGGER) in task attempt process.
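The effect of this change can be sketched as filtering the server's environment before it is handed to the task attempt process. The Python below is an illustration only; the function and constant names are hypothetical, and only the two variable names come from the note above.

```python
from typing import Dict, Mapping

# Variables named in the release note; the constant name is hypothetical.
NOT_INHERITED = ("HADOOP_OPTS", "HADOOP_ROOT_LOGGER")


def task_environment(server_env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the server environment without the excluded variables."""
    return {name: value
            for name, value in server_env.items()
            if name not in NOT_INHERITED}
```

With this filtering, a hadoop CLI invoked inside a task no longer picks up the server's HADOOP_OPTS or HADOOP_ROOT_LOGGER settings.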
MAPREDUCE-3110. Major bug reported by Devaraj K and fixed by Vinod Kumar Vavilapalli (mrv2 , test)
TestRPC.testUnknownCall() is failing
MAPREDUCE-3104. Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2 , resourcemanager , security)
Implement Application ACLs, Queue ACLs and their interaction
MAPREDUCE-3099. Major sub-task reported by Mahadev konar and fixed by Mahadev konar
Add docs for setting up a single node MRv2 cluster.
MAPREDUCE-3098. Blocker sub-task reported by Hitesh Shah and fixed by Hitesh Shah (mrv2)
Report Application status as well as ApplicationMaster status in GetApplicationReportResponse
MAPREDUCE-3095. Major bug reported by John George and fixed by John George (mrv2)
fairscheduler ivy including wrong version for hdfs
MAPREDUCE-3092. Minor bug reported by Devaraj K and fixed by Devaraj K (mrv2)
Remove JOB_ID_COMPARATOR usage in JobHistory.java
MAPREDUCE-3090. Major improvement reported by Arun C Murthy and fixed by Arun C Murthy (applicationmaster , mrv2)
Change MR AM to use ApplicationAttemptId rather than <applicationId, startCount> everywhere
MAPREDUCE-3087. Critical bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2)
CLASSPATH not the same after MAPREDUCE-2880
MAPREDUCE-3081. Major bug reported by vitthal (Suhas) Gogate and fixed by (contrib/vaidya)
Change the name format for hadoop core and vaidya jar to be hadoop-{core/vaidya}-{version}.jar in vaidya.sh

contrib/vaidya/bin/vaidya.sh script fixed to use appropriate jars and classpath
MAPREDUCE-3078. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2 , resourcemanager)
Application's progress isn't updated from AM to RM.
MAPREDUCE-3073. Blocker bug reported by Mahadev konar and fixed by Mahadev konar
Build failure for MRv1 caused due to changes to MRConstants.
MAPREDUCE-3071. Major bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
app master configuration web UI link under the Job menu opens up application menu
MAPREDUCE-3070. Blocker bug reported by Ravi Teja Ch N V and fixed by Devaraj K (mrv2 , nodemanager)
NM not able to register with RM after NM restart
MAPREDUCE-3068. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Chris Riccomini (mrv2)
Should set MALLOC_ARENA_MAX for all YARN daemons and AMs/Containers
MAPREDUCE-3067. Blocker bug reported by Hitesh Shah and fixed by Hitesh Shah (mrv2)
Container exit status not set properly to launched process's exit code on successful completion of process
MAPREDUCE-3066. Major bug reported by Chris Riccomini and fixed by Chris Riccomini (mrv2 , nodemanager)
YARN NM fails to start
MAPREDUCE-3064. Blocker bug reported by Thomas Graves and fixed by Venu Gopala Rao
27 unit test failures with Invalid "mapreduce.jobtracker.address" configuration value for JobTracker: "local"
MAPREDUCE-3062. Major bug reported by Chris Riccomini and fixed by Chris Riccomini (mrv2 , nodemanager , resourcemanager)
YARN NM/RM fail to start
MAPREDUCE-3059. Blocker bug reported by Karam Singh and fixed by Devaraj K (mrv2)
QueueMetrics do not have metrics for aggregate containers-allocated and aggregate containers-released
MAPREDUCE-3058. Critical bug reported by Karam Singh and fixed by Vinod Kumar Vavilapalli (contrib/gridmix , mrv2)
Sometimes task keeps on running while its Syslog says that it is shutdown
MAPREDUCE-3057. Blocker bug reported by Karam Singh and fixed by Eric Payne (jobhistoryserver , mrv2)
Job History Server goes OutOfMemory with 1200 jobs and heap size set to 10 GB
MAPREDUCE-3056. Blocker bug reported by Devaraj K and fixed by Devaraj K (applicationmaster , mrv2)
Jobs fail when they are submitted by other users
MAPREDUCE-3055. Minor bug reported by Hitesh Shah and fixed by Vinod Kumar Vavilapalli (mrv2)
Simplify parameter passing to Application Master from Client. Simplify the approach for passing info such as appId, ClusterTimestamp and failcount required by App Master.
MAPREDUCE-3054. Blocker bug reported by Siddharth Seth and fixed by Mahadev konar (mrv2)
Unable to kill submitted jobs
MAPREDUCE-3053. Major bug reported by Chris Riccomini and fixed by Vinod Kumar Vavilapalli (mrv2 , resourcemanager)
YARN Protobuf RPC Failures in RM
MAPREDUCE-3050. Blocker bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2 , resourcemanager)
YarnScheduler needs to expose Resource Usage Information
MAPREDUCE-3048. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (build)
Fix test-patch to run tests via "mvn clean install test"
MAPREDUCE-3044. Blocker bug reported by Ramya Sunil and fixed by Mahadev konar (mrv2)
Pipes jobs stuck without making progress
MAPREDUCE-3042. Major bug reported by Chris Riccomini and fixed by Chris Riccomini (mrv2 , resourcemanager)
YARN RM fails to start

Simple typo fix to allow ResourceManager to start instead of fail
MAPREDUCE-3041. Blocker bug reported by Hitesh Shah and fixed by Hitesh Shah (mrv2)
Enhance YARN Client-RM protocol to provide access to information such as cluster's Min/Max Resource capabilities similar to that of AM-RM protocol
MAPREDUCE-3040. Major bug reported by Thomas Graves and fixed by Arun C Murthy (mrv2)
TestMRJobs, TestMRJobsWithHistoryService, TestMROldApiJobs fail
MAPREDUCE-3038. Blocker bug reported by Thomas Graves and fixed by Jeffrey Naisbitt (mrv2)
job history server not starting because conf() missing HsController
MAPREDUCE-3036. Blocker bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
Some of the Resource Manager memory metrics go negative.
MAPREDUCE-3035. Critical bug reported by Karam Singh and fixed by chackaravarthy (mrv2)
MR V2 jobhistory does not contain rack information
MAPREDUCE-3033. Blocker bug reported by Karam Singh and fixed by Hitesh Shah (job submission , mrv2)
JobClient requires mapreduce.jobtracker.address config even when mapreduce.framework.name is set to yarn
MAPREDUCE-3032. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Devaraj K (applicationmaster , mrv2)
JobHistory doesn't have error information from failed tasks
MAPREDUCE-3031. Blocker bug reported by Karam Singh and fixed by Siddharth Seth (mrv2)
Job Client goes into infinite loop when we kill AM
MAPREDUCE-3030. Blocker bug reported by Devaraj K and fixed by Devaraj K (mrv2 , resourcemanager)
RM is not processing heartbeat and continuously giving the message 'Node not found rebooting'
MAPREDUCE-3028. Blocker bug reported by Mohammad Kamrul Islam and fixed by Ravi Prakash (mrv2)
Support job end notification in .next /0.23
MAPREDUCE-3023. Major bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2)
Queue state is not being translated properly (is always assumed to be running)
MAPREDUCE-3021. Major bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
all yarn webapps use same base name of "yarn/"
MAPREDUCE-3020. Major bug reported by chackaravarthy and fixed by chackaravarthy (jobhistoryserver)
Node link in reduce task attempt page is not working [Job History Page]
MAPREDUCE-3018. Blocker bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
Streaming jobs with -file option fail to run.
MAPREDUCE-3017. Blocker bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
The Web UI shows FINISHED for killed/successful/failed jobs.
MAPREDUCE-3014. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
Rename and invert logic of '-cbuild' profile to 'native' and off by default
MAPREDUCE-3013. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2 , security)
Remove YarnConfiguration.YARN_SECURITY_INFO
MAPREDUCE-3007. Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (jobhistoryserver , mrv2)
JobClient cannot talk to JobHistory server in secure mode
MAPREDUCE-3006. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2)
MapReduce AM exits prematurely before completely writing and closing the JobHistory file
MAPREDUCE-3005. Major bug reported by Vinod Kumar Vavilapalli and fixed by Arun C Murthy (mrv2)
MR app hangs because of a NPE in ResourceManager
MAPREDUCE-3004. Minor bug reported by Hitesh Shah and fixed by Hitesh Shah (mrv2)
sort example fails in shuffle/reduce stage as it assumes a local job by default
MAPREDUCE-3003. Major bug reported by Tom White and fixed by Alejandro Abdelnur (build)
Publish MR JARs to Maven snapshot repository
MAPREDUCE-3001. Blocker improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (jobhistoryserver , mrv2)
Map Reduce JobHistory and AppMaster UI should have ability to display task specific counters.
MAPREDUCE-2999. Critical bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
hadoop.http.filter.initializers not working properly on yarn UI
MAPREDUCE-2998. Critical bug reported by Jeffrey Naisbitt and fixed by Vinod Kumar Vavilapalli (mrv2)
Failing to contact Am/History for jobs: java.io.EOFException in DataInputStream
MAPREDUCE-2997. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2)
MR task fails before launch itself with an NPE in ContainerLauncher
MAPREDUCE-2996. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Jonathan Eagles (jobhistoryserver , mrv2)
Log uberized information into JobHistory and use the same via CompletedJob
MAPREDUCE-2995. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2)
MR AM crashes when a container-launch hangs on a faulty NM
MAPREDUCE-2994. Major bug reported by Devaraj K and fixed by Devaraj K (mrv2 , resourcemanager)
Parse Error is coming for App ID when we click application link on the RM UI
MAPREDUCE-2991. Major bug reported by Priyo Mustafi and fixed by Priyo Mustafi (scheduler)
queueinfo.jsp fails to show queue status if any Capacity scheduler queue name has a dash/hyphen in it.
MAPREDUCE-2990. Blocker improvement reported by Mahadev konar and fixed by Subroto Sanyal (mrv2)
Health Report on Resource Manager UI is null if the NMs are all healthy.
MAPREDUCE-2989. Critical sub-task reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
JobHistory should link to task logs
MAPREDUCE-2988. Critical sub-task reported by Eric Payne and fixed by Robert Joseph Evans (mrv2 , security , test)
Reenable TestLinuxContainerExecutor reflecting the current NM code.
MAPREDUCE-2987. Major bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
RM UI displays logged in user as null
MAPREDUCE-2986. Critical task reported by Anupam Seth and fixed by Anupam Seth (mrv2 , test)
Multiple node managers support for the MiniYARNCluster
MAPREDUCE-2985. Major bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
findbugs error in ResourceLocalizationService.handle(LocalizationEvent)
MAPREDUCE-2984. Minor bug reported by Devaraj K and fixed by Devaraj K (mrv2 , nodemanager)
Throwing NullPointerException when we open the container page
MAPREDUCE-2979. Major bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
Remove ClientProtocolProvider configuration under mapreduce-client-core
MAPREDUCE-2977. Blocker sub-task reported by Owen O'Malley and fixed by Arun C Murthy (mrv2 , resourcemanager , security)
ResourceManager needs to renew and cancel tokens associated with a job
MAPREDUCE-2975. Blocker bug reported by Mahadev konar and fixed by Mahadev konar
ResourceManager Delegate is not getting initialized with yarn-site.xml as default configuration.
MAPREDUCE-2971. Blocker bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
ant build mapreduce fails protected access jc.displayJobList(jobs);
MAPREDUCE-2970. Major bug reported by Venu Gopala Rao and fixed by Venu Gopala Rao (job submission , mrv2)
Null Pointer Exception while submitting a job if the mapreduce.framework.name property is not set.
MAPREDUCE-2966. Major improvement reported by Abhijit Suresh Shingate and fixed by Abhijit Suresh Shingate (applicationmaster , jobhistoryserver , nodemanager , resourcemanager)
Add ShutDown hooks for MRV2 processes
MAPREDUCE-2965. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Siddharth Seth (mrv2)
Streamline hashCode(), equals(), compareTo() and toString() for all IDs
MAPREDUCE-2963. Critical bug reported by Mahadev konar and fixed by Siddharth Seth
TestMRJobs hangs waiting to connect to history server.
MAPREDUCE-2961. Blocker improvement reported by Mahadev konar and fixed by Vinod Kumar Vavilapalli (mrv2)
Increase the default threadpool size for container launching in the application master.
MAPREDUCE-2958. Critical bug reported by Thomas Graves and fixed by Arun C Murthy (mrv2)
mapred-default.xml not merged from mr279
MAPREDUCE-2954. Critical bug reported by Vinod Kumar Vavilapalli and fixed by Siddharth Seth (mrv2)
Deadlock in NM with threads racing for ApplicationAttemptId
MAPREDUCE-2953. Major bug reported by Vinod Kumar Vavilapalli and fixed by Thomas Graves (mrv2 , resourcemanager)
JobClient fails due to a race in RM, removes staged files and in turn crashes MR AM
MAPREDUCE-2952. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Arun C Murthy (mrv2 , resourcemanager)
Application failure diagnostics are not consumed in a couple of cases
MAPREDUCE-2949. Major bug reported by Ravi Teja Ch N V and fixed by Ravi Teja Ch N V (mrv2 , nodemanager)
NodeManager in an inconsistent state if a service startup fails.
MAPREDUCE-2948. Major bug reported by Milind Bhandarkar and fixed by Mahadev konar (contrib/streaming)
Hadoop streaming test failure, post MR-2767
MAPREDUCE-2947. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (mrv2)
Sort fails on YARN+MR with lots of task failures
MAPREDUCE-2938. Trivial bug reported by Arun C Murthy and fixed by Arun C Murthy (mrv2 , scheduler)
Missing log stmt for app submission fail CS
MAPREDUCE-2937. Critical bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
Errors in Application failures are not shown in the client trace.
MAPREDUCE-2936. Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli
Contrib Raid compilation broken after HDFS-1620
MAPREDUCE-2933. Blocker sub-task reported by Arun C Murthy and fixed by Arun C Murthy (applicationmaster , mrv2 , nodemanager , resourcemanager)
Change allocate call to return ContainerStatus for completed containers rather than Container
MAPREDUCE-2930. Major improvement reported by Sharad Agarwal and fixed by Binglin Chang (mrv2)
Generate state graph from the State Machine Definition

Generate state graph from State Machine Definition
MAPREDUCE-2925. Major bug reported by Devaraj K and fixed by Devaraj K (mrv2)
job -status <JOB_ID> continuously prints info messages on the console for completed jobs
MAPREDUCE-2917. Major bug reported by Arun C Murthy and fixed by Arun C Murthy (mrv2 , resourcemanager)
Corner case in container reservations
MAPREDUCE-2916. Major bug reported by Mahadev konar and fixed by Mahadev konar
Ivy build for MRv1 fails with bad organization for common daemon.
MAPREDUCE-2913. Critical bug reported by Robert Joseph Evans and fixed by Jonathan Eagles (mrv2 , test)
TestMRJobs.testFailingMapper does not assert the correct thing.
MAPREDUCE-2909. Major sub-task reported by Arun C Murthy and fixed by Arun C Murthy (documentation , mrv2)
Docs for remaining records in yarn-api
MAPREDUCE-2908. Critical bug reported by Mahadev konar and fixed by Vinod Kumar Vavilapalli (mrv2)
Fix findbugs warnings in Map Reduce.
MAPREDUCE-2907. Major bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2 , resourcemanager)
ResourceManager logs filled with [INFO] debug messages from org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue
MAPREDUCE-2904. Major bug reported by Sharad Agarwal and fixed by Sharad Agarwal
HDFS jars added incorrectly to yarn classpath
MAPREDUCE-2899. Major sub-task reported by Arun C Murthy and fixed by Arun C Murthy (mrv2 , resourcemanager)
Replace major parts of ApplicationSubmissionContext with a ContainerLaunchContext
MAPREDUCE-2898. Major sub-task reported by Arun C Murthy and fixed by Arun C Murthy (documentation , mrv2)
Docs for core protocols in yarn-api - ContainerManager
MAPREDUCE-2897. Major sub-task reported by Arun C Murthy and fixed by Arun C Murthy (documentation , mrv2)
Docs for core protocols in yarn-api - ClientRMProtocol
MAPREDUCE-2896. Major sub-task reported by Arun C Murthy and fixed by Arun C Murthy (mrv2)
Remove all apis other than getters and setters in all org/apache/hadoop/yarn/api/records/*
MAPREDUCE-2894. Blocker improvement reported by Arun C Murthy and fixed by (mrv2)
Improvements to YARN apis
MAPREDUCE-2893. Trivial improvement reported by Liang-Chi Hsieh and fixed by Liang-Chi Hsieh (client)
Removing duplicate service provider in hadoop-mapreduce-client-jobclient
MAPREDUCE-2891. Major sub-task reported by Arun C Murthy and fixed by Arun C Murthy (documentation , mrv2)
Docs for core protocols in yarn-api - AMRMProtocol
MAPREDUCE-2889. Critical sub-task reported by Arun C Murthy and fixed by Hitesh Shah (documentation , mrv2)
Add docs for writing new application frameworks
MAPREDUCE-2886. Critical bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
Fix Javadoc warnings in MapReduce.
MAPREDUCE-2885. Blocker bug reported by Arun C Murthy and fixed by Arun C Murthy
mapred-config.sh doesn't look for $HADOOP_COMMON_HOME/libexec/hadoop-config.sh
MAPREDUCE-2882. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (test)
TestLineRecordReader depends on ant jars
MAPREDUCE-2881. Major bug reported by Giridharan Kesavan and fixed by Giridharan Kesavan (build)
mapreduce ant compilation fails "java.lang.IllegalStateException: impossible to get artifacts"
MAPREDUCE-2880. Blocker improvement reported by Luke Lu and fixed by Arun C Murthy (mrv2)
Fix classpath construction for MRv2
MAPREDUCE-2879. Major bug reported by Arun C Murthy and fixed by Arun C Murthy
Change mrv2 version to be 0.23.0-SNAPSHOT
MAPREDUCE-2877. Major bug reported by Mahadev konar and fixed by Mahadev konar
Add missing Apache license header in some files in MR and also add the rat plugin to the poms.
MAPREDUCE-2876. Critical bug reported by Robert Joseph Evans and fixed by Anupam Seth (mrv2)
ContainerAllocationExpirer appears to use the incorrect configs
MAPREDUCE-2874. Major bug reported by Thomas Graves and fixed by Eric Payne (mrv2)
ApplicationId printed in 2 different formats and has 2 different toString routines that are used
MAPREDUCE-2868. Major bug reported by Thomas Graves and fixed by Mahadev konar (build)
ant build broken in hadoop-mapreduce dir
MAPREDUCE-2867. Major bug reported by Mahadev konar and fixed by Mahadev konar
Remove Unused TestApplicaitonCleanup in resourcemanager/applicationsmanager.
MAPREDUCE-2864. Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (jobhistoryserver , mrv2 , nodemanager , resourcemanager)
Renaming of configuration property names in yarn
MAPREDUCE-2860. Major bug reported by Mahadev konar and fixed by Mahadev konar (mrv2)
Fix log4j logging in the maven test cases.
MAPREDUCE-2859. Major bug reported by Giridharan Kesavan and fixed by Giridharan Kesavan
mapreduce trunk is broken with eclipse plugin contrib
MAPREDUCE-2858. Blocker sub-task reported by Luke Lu and fixed by Robert Joseph Evans (applicationmaster , mrv2 , security)
MRv2 WebApp Security

A new server has been added to YARN: a web proxy that sits in front of the AM web UI. The server is controlled by the yarn.web-proxy.address config. If that config is set and points to an address different from the RM web interface, a separate proxy server needs to be launched. This can be done by running: yarn-daemon.sh start proxyserver. If a separate proxy server is needed and security is enabled, other configs may also need to be set: yarn.web-proxy.principal and yarn.web-proxy.keytab. The proxy server is stateless and should be able to support a VIP or other load balancer sitting in front of multiple instances of this server.
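The launch condition described in this note can be sketched in Java. This is a minimal illustration, not YARN code: the class and method names are hypothetical, the example addresses are made up, and it assumes that "different" means a plain comparison of host:port strings.

```java
/** Hypothetical helper; not part of YARN. */
class WebProxyLaunchSketch {
    /**
     * Returns true when a standalone proxy server would need to be started:
     * yarn.web-proxy.address is set and differs from the RM web address.
     * Assumption: addresses are compared as trimmed, case-insensitive strings.
     */
    static boolean needsSeparateProxy(String webProxyAddress, String rmWebAddress) {
        if (webProxyAddress == null || webProxyAddress.trim().isEmpty()) {
            // Per the note, only a set and differing address calls for a separate server.
            return false;
        }
        String rm = (rmWebAddress == null) ? "" : rmWebAddress.trim();
        return !webProxyAddress.trim().equalsIgnoreCase(rm);
    }

    public static void main(String[] args) {
        // Example addresses are invented for illustration.
        System.out.println(needsSeparateProxy("proxy.example.com:9099", "rm.example.com:8088"));
        System.out.println(needsSeparateProxy(null, "rm.example.com:8088"));
    }
}
```

When the check returns true, the note's command (yarn-daemon.sh start proxyserver) is what starts the separate server.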
MAPREDUCE-2854. Major bug reported by Thomas Graves and fixed by Thomas Graves
update INSTALL with config necessary run mapred on yarn
MAPREDUCE-2848. Major improvement reported by Luke Lu and fixed by Luke Lu
Upgrade avro to 1.5.2
MAPREDUCE-2846. Blocker bug reported by Allen Wittenauer and fixed by Owen O'Malley (task , task-controller , tasktracker)
a small % of all tasks fail with DefaultTaskController

Fixed a race condition in writing the log index file that caused tasks to 'fail'.
MAPREDUCE-2844. Trivial bug reported by Ramya Sunil and fixed by Ravi Teja Ch N V (mrv2)
[MR-279] Incorrect node ID info
MAPREDUCE-2843. Major bug reported by Ramya Sunil and fixed by Abhijit Suresh Shingate (mrv2)
[MR-279] Node entries on the RM UI are not sortable
MAPREDUCE-2840. Minor bug reported by Thomas Graves and fixed by Jonathan Eagles (mrv2)
mr279 TestUberAM.testSleepJob test fails
MAPREDUCE-2839. Major bug reported by Siddharth Seth and fixed by Siddharth Seth
MR Jobs fail on a secure cluster with viewfs
MAPREDUCE-2821. Blocker bug reported by Ramya Sunil and fixed by Mahadev konar (mrv2)
[MR-279] Missing fields in job summary logs
MAPREDUCE-2808. Minor bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
pull MAPREDUCE-2797 into mr279 branch
MAPREDUCE-2807. Major sub-task reported by Sharad Agarwal and fixed by Sharad Agarwal (applicationmaster , mrv2 , resourcemanager)
MR-279: AM restart does not work after RM refactor
MAPREDUCE-2805. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (contrib/raid)
Update RAID for HDFS-2241
MAPREDUCE-2802. Critical improvement reported by Ramya Sunil and fixed by Jonathan Eagles (mrv2)
[MR-279] Jobhistory filenames should have jobID to help in better parsing
MAPREDUCE-2800. Major bug reported by Ramya Sunil and fixed by Siddharth Seth (mrv2)
clockSplits, cpuUsages, vMemKbytes, physMemKbytes is set to -1 in jhist files
MAPREDUCE-2797. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (contrib/raid , test)
Some java files cannot be compiled
MAPREDUCE-2796. Major bug reported by Ramya Sunil and fixed by Devaraj K (mrv2)
[MR-279] Start time for all the apps is set to 0
MAPREDUCE-2794. Blocker bug reported by Ramya Sunil and fixed by John George (mrv2)
[MR-279] Incorrect metrics value for AvailableGB per queue per user
MAPREDUCE-2792. Blocker sub-task reported by Ramya Sunil and fixed by Vinod Kumar Vavilapalli (mrv2 , security)
[MR-279] Replace IP addresses with hostnames
MAPREDUCE-2791. Blocker bug reported by Ramya Sunil and fixed by Devaraj K (mrv2)
[MR-279] Missing/incorrect info on job -status CLI
MAPREDUCE-2789. Major bug reported by Ramya Sunil and fixed by Eric Payne (mrv2)
[MR:279] Update the scheduling info on CLI

"mapred/job -list" now contains map/reduce, container, and resource information.
MAPREDUCE-2788. Major bug reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv2)
Normalize requests in FifoScheduler.allocate to prevent NPEs later
MAPREDUCE-2783. Critical bug reported by Thomas Graves and fixed by Eric Payne (mrv2)
mr279 job history handling after killing application
MAPREDUCE-2782. Major test reported by Arun C Murthy and fixed by Arun C Murthy (mrv2)
MR-279: Unit (mockito) tests for CS
MAPREDUCE-2781. Minor bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
mr279 RM application finishtime not set
MAPREDUCE-2779. Major bug reported by Ming Ma and fixed by Ming Ma (job submission)
JobSplitWriter.java can't handle large job.split file
MAPREDUCE-2776. Major bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
MR 279: Fix some of the yarn findbug warnings
MAPREDUCE-2775. Blocker bug reported by Ramya Sunil and fixed by Devaraj K (mrv2)
[MR-279] Decommissioned node does not shutdown
MAPREDUCE-2774. Minor bug reported by Ramya Sunil and fixed by Venu Gopala Rao (mrv2)
[MR-279] Add a startup msg while starting RM/NM
MAPREDUCE-2773. Minor bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
[MR-279] server.api.records.NodeHealthStatus renamed but not updated in client NodeHealthStatus.java
MAPREDUCE-2772. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
MR-279: mrv2 no longer compiles against trunk after common mavenization.
MAPREDUCE-2767. Blocker bug reported by Milind Bhandarkar and fixed by Milind Bhandarkar (security)
Remove Linux task-controller from 0.22 branch
MAPREDUCE-2766. Blocker sub-task reported by Ramya Sunil and fixed by Hitesh Shah (mrv2)
[MR-279] Set correct permissions for files in dist cache
MAPREDUCE-2764. Major bug reported by Daryn Sharp and fixed by Owen O'Malley
Fix renewal of dfs delegation tokens

Generalizes token renewal and canceling to a common interface and provides a plugin interface for adding renewers for new kinds of tokens. Hftp was changed to store the tokens as HFTP and renew them over HTTP.
MAPREDUCE-2763. Major bug reported by Ramya Sunil and fixed by (mrv2)
IllegalArgumentException while using the dist cache
MAPREDUCE-2762. Blocker bug reported by Ramya Sunil and fixed by Mahadev konar (mrv2)
[MR-279] - Cleanup staging dir after job completion
MAPREDUCE-2760. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (documentation)
mapreduce.jobtracker.split.metainfo.maxsize typoed in mapred-default.xml
MAPREDUCE-2756. Minor bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (client , mrv2)
JobControl can drop jobs if an error occurs
MAPREDUCE-2754. Blocker bug reported by Ramya Sunil and fixed by Ravi Teja Ch N V (mrv2)
MR-279: AM logs are incorrectly going to stderr and error messages going incorrectly to stdout
MAPREDUCE-2751. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Siddharth Seth (mrv2)
[MR-279] Lot of local files left on NM after the app finish.
MAPREDUCE-2749. Major bug reported by Vinod Kumar Vavilapalli and fixed by Thomas Graves (mrv2)
[MR-279] NM registers with RM even before it starts various servers
MAPREDUCE-2747. Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Robert Joseph Evans (mrv2 , nodemanager , security)
[MR-279] [Security] Cleanup LinuxContainerExecutor binary sources
MAPREDUCE-2746. Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Arun C Murthy (mrv2 , security)
[MR-279] [Security] Yarn servers can't communicate with each other with hadoop.security.authorization set to true
MAPREDUCE-2741. Major task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
Make ant build system work with hadoop-common JAR generated by Maven
MAPREDUCE-2740. Major bug reported by Todd Lipcon and fixed by Todd Lipcon
MultipleOutputs in new API creates needless TaskAttemptContexts
MAPREDUCE-2738. Blocker bug reported by Ramya Sunil and fixed by Robert Joseph Evans (mrv2)
Missing cluster level stats on the RM UI
MAPREDUCE-2737. Major bug reported by Ramya Sunil and fixed by Siddharth Seth (mrv2)
Update the progress of jobs on client side
MAPREDUCE-2736. Major task reported by Eli Collins and fixed by Eli Collins (jobtracker , tasktracker)
Remove unused contrib components dependent on MR1

The pre-MR2 MapReduce implementation (JobTracker, TaskTracker, etc.) and contrib components are no longer supported. That implementation continues to be supported in the 0.20.20x releases.
MAPREDUCE-2735. Major bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
MR279: finished applications should be added to an application summary log
MAPREDUCE-2732. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (test)
Some tests using FSNamesystem.LOG cannot be compiled
MAPREDUCE-2727. Major bug reported by Jeffrey Naisbitt and fixed by Jeffrey Naisbitt (mrv2)
MR-279: SleepJob throws divide by zero exception when count = 0
MAPREDUCE-2726. Blocker improvement reported by Jeffrey Naisbitt and fixed by Jeffrey Naisbitt (mrv2)
MR-279: Add the jobFile to the web UI
MAPREDUCE-2719. Major new feature reported by Sharad Agarwal and fixed by Hitesh Shah (mrv2)
MR-279: Write a shell command application

Adding a simple, DistributedShell application as an alternate framework to MapReduce and to act as an illustrative example for porting applications to YARN.
MAPREDUCE-2716. Major bug reported by Jeffrey Naisbitt and fixed by Jeffrey Naisbitt (mrv2)
MR279: MRReliabilityTest job fails because of missing job-file.
MAPREDUCE-2711. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (contrib/raid)
TestBlockPlacementPolicyRaid cannot be compiled
MAPREDUCE-2710. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (client)
Update DFSClient.stringifyToken(..) in JobSubmitter.printTokens(..) for HDFS-2161
MAPREDUCE-2708. Blocker sub-task reported by Sharad Agarwal and fixed by Sharad Agarwal (applicationmaster , mrv2)
[MR-279] Design and implement MR Application Master recovery
MAPREDUCE-2707. Major improvement reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
ProtoOverHadoopRpcEngine without using TunnelProtocol over WritableRpc
MAPREDUCE-2706. Major bug reported by Jeffrey Naisbitt and fixed by Jeffrey Naisbitt (mrv2)
MR-279: Submit jobs beyond the max jobs per queue limit no longer gets logged
MAPREDUCE-2705. Major bug reported by Thomas Graves and fixed by Thomas Graves (tasktracker)
tasks localized and launched serially by TaskLauncher - causing other tasks to be delayed
MAPREDUCE-2702. Blocker sub-task reported by Sharad Agarwal and fixed by Sharad Agarwal (applicationmaster , mrv2)
[MR-279] OutputCommitter changes for MR Application Master recovery

Enhance OutputCommitter and FileOutputCommitter to allow for recovery of tasks across job restarts.
MAPREDUCE-2701. Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
MR-279: app/Job.java needs UGI for the user that launched it
MAPREDUCE-2697. Major bug reported by Arun C Murthy and fixed by Arun C Murthy (mrv2)
Enhance CS to cap concurrently running jobs
MAPREDUCE-2696. Major sub-task reported by Arun C Murthy and fixed by Siddharth Seth (mrv2 , nodemanager)
Container logs aren't getting cleaned up when LogAggregation is disabled
MAPREDUCE-2693. Critical bug reported by Amol Kekre and fixed by Hitesh Shah (mrv2)
NPE in AM causes it to lose containers which are never returned back to RM
MAPREDUCE-2692. Major new feature reported by Amol Kekre and fixed by Sharad Agarwal (mrv2)
Ensure AM Restart and Recovery-on-restart is complete
MAPREDUCE-2691. Major improvement reported by Amol Kekre and fixed by Siddharth Seth (mrv2)
Finish up the cleanup of distributed cache file resources and related tests.
MAPREDUCE-2690. Major bug reported by Ramya Sunil and fixed by Eric Payne (mrv2)
Construct the web page for default scheduler
MAPREDUCE-2689. Major bug reported by Ramya Sunil and fixed by (mrv2)
InvalidStateTransisiton when AM is not assigned to a job
MAPREDUCE-2687. Blocker bug reported by Ramya Sunil and fixed by Mahadev konar (mrv2)
Non superusers unable to launch apps in both secure and non-secure cluster
MAPREDUCE-2682. Trivial improvement reported by Arun C Murthy and fixed by Vinod Kumar Vavilapalli
Add a -classpath option to bin/mapred
MAPREDUCE-2680. Minor improvement reported by Arun C Murthy and fixed by Arun C Murthy
Enhance job-client cli to show queue information for running jobs
MAPREDUCE-2679. Trivial improvement reported by Arun C Murthy and fixed by Arun C Murthy
MR-279: Merge MR-279 related minor patches into trunk
MAPREDUCE-2678. Major bug reported by Jeffrey Naisbitt and fixed by Jeffrey Naisbitt (capacity-sched)
MR-279: minimum-user-limit-percent no longer honored
MAPREDUCE-2677. Major bug reported by Ramya Sunil and fixed by Robert Joseph Evans (mrv2)
MR-279: 404 error while accessing pages from history server
MAPREDUCE-2676. Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
MR-279: JobHistory Job page needs reformatted
MAPREDUCE-2675. Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
MR-279: JobHistory Server main page needs to be reformatted
MAPREDUCE-2672. Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
MR-279: JobHistory Server needs Analysis this job
MAPREDUCE-2670. Trivial bug reported by Eli Collins and fixed by Eli Collins
Fixing spelling mistake in FairSchedulerServlet.java
MAPREDUCE-2668. Blocker bug reported by Robert Joseph Evans and fixed by Thomas Graves (mrv2)
MR-279: APPLICATION_STOP is never sent to AuxServices
MAPREDUCE-2667. Major bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
MR279: mapred job -kill leaves application in RUNNING state
MAPREDUCE-2666. Blocker sub-task reported by Robert Joseph Evans and fixed by Jonathan Eagles (mrv2)
MR-279: Need to retrieve shuffle port number on ApplicationMaster restart
MAPREDUCE-2664. Major improvement reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
MR 279: Implement JobCounters for MRv2 + Fix for Map Data Locality
MAPREDUCE-2663. Minor bug reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv2)
MR-279: Refactoring StateMachineFactory inner classes
MAPREDUCE-2661. Minor bug reported by Ahmed Radwan and fixed by Ahmed Radwan (mrv2)
MR-279: Accessing MapTaskImpl from TaskImpl
MAPREDUCE-2655. Major bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
MR279: Audit logs for YARN
MAPREDUCE-2649. Major bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)
MR279: Fate of finished Applications on RM

New config added: yarn.server.resourcemanager.expire.applications.completed.max, the maximum number of completed applications the RM keeps.
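A cap on retained completed applications can be pictured with the Java sketch below. It is only an illustration under assumptions: the class is hypothetical, application ids are plain strings, and it assumes the oldest completed application is dropped first once the limit is exceeded, which the note does not state.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical illustration; not the ResourceManager's implementation. */
class CompletedAppsSketch {
    private final Deque<String> completed = new ArrayDeque<>();
    private final int maxCompleted; // stands in for the new config value

    CompletedAppsSketch(int maxCompleted) {
        if (maxCompleted < 0) {
            throw new IllegalArgumentException("maxCompleted must be >= 0");
        }
        this.maxCompleted = maxCompleted;
    }

    /** Records a finished application and drops the oldest ones beyond the limit. */
    void applicationCompleted(String appId) {
        completed.addLast(appId);
        while (completed.size() > maxCompleted) {
            completed.removeFirst(); // assumption: evict in completion order
        }
    }

    int size() { return completed.size(); }

    boolean contains(String appId) { return completed.contains(appId); }

    public static void main(String[] args) {
        CompletedAppsSketch apps = new CompletedAppsSketch(2);
        apps.applicationCompleted("app_1");
        apps.applicationCompleted("app_2");
        apps.applicationCompleted("app_3");
        System.out.println("retained=" + apps.size());
    }
}
```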
MAPREDUCE-2646. Critical bug reported by Sharad Agarwal and fixed by Sharad Agarwal (applicationmaster , mrv2)
MR-279: AM with same sized maps and reduces hangs in presence of failing maps
MAPREDUCE-2644. Major bug reported by Josh Wills and fixed by Josh Wills (mrv2)
NodeManager fails to create containers when NM_LOG_DIR is not explicitly set in the Configuration
MAPREDUCE-2641. Minor sub-task reported by Josh Wills and fixed by Josh Wills (mrv2)
Fix the ExponentiallySmoothedTaskRuntimeEstimator and its unit test
MAPREDUCE-2630. Minor bug reported by Josh Wills and fixed by Josh Wills (mrv2)
MR-279: refreshQueues leads to NPEs when used w/FifoScheduler
MAPREDUCE-2629. Minor improvement reported by Eric Caspole and fixed by Eric Caspole (task)
Class loading quirk prevents inner class method compilation
MAPREDUCE-2628. Minor bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
MR-279: Add compiled on date to NM and RM info/about page
MAPREDUCE-2625. Minor bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
MR-279: Add Node Manager Version to NM info page
MAPREDUCE-2624. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (contrib/raid)
Update RAID for HDFS-2107
MAPREDUCE-2623. Minor improvement reported by Jim Plush and fixed by Harsh J (test)
Update ClusterMapReduceTestCase to use MiniDFSCluster.Builder
MAPREDUCE-2622. Minor task reported by Harsh J and fixed by Harsh J (test)
Remove the last remaining reference to "io.sort.mb"
MAPREDUCE-2620. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (contrib/raid)
Update RAID for HDFS-2087
MAPREDUCE-2618. Major bug reported by Jeffrey Naisbitt and fixed by Jeffrey Naisbitt (mrv2)
MR-279: 0 map, 0 reduce job fails with Null Pointer Exception
MAPREDUCE-2615. Major bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
MR 279: KillJob should go through AM whenever possible
MAPREDUCE-2611. Major improvement reported by Siddharth Seth and fixed by (mrv2)
MR 279: Metrics, finishTimes, etc in JobHistory
MAPREDUCE-2606. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
Remove IsolationRunner

IsolationRunner is no longer maintained. See MAPREDUCE-2637 for its replacement.
MAPREDUCE-2603. Major bug reported by Vinay Kumar Thota and fixed by Vinay Kumar Thota (contrib/gridmix)
Gridmix system tests are failing due to high ram emulation enable by default for normal mr jobs in the trace which exceeds the solt capacity.
MAPREDUCE-2602. Major improvement reported by Ahmed Radwan and fixed by Ahmed Radwan
Allow setting of end-of-record delimiter for TextInputFormat (for the old API)
MAPREDUCE-2598. Minor bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
MR 279: miscellaneous UI, NPE fixes for JobHistory, UI
MAPREDUCE-2596. Major improvement reported by Arun C Murthy and fixed by Amar Kamat (benchmarks , contrib/gridmix)
Gridmix should notify job failures

Gridmix now prints summary information after every run. It summarizes the run with respect to input trace details, input data statistics, CLI arguments, data-gen runtime, simulation runtimes, etc., and the cluster with respect to map slots, reduce slots, jobtracker-address, hdfs-address, etc.
MAPREDUCE-2595. Minor bug reported by Thomas Graves and fixed by Thomas Graves
MR279: update yarn INSTALL doc
MAPREDUCE-2588. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (contrib/raid)
Raid is not compile after DataTransferProtocol refactoring
MAPREDUCE-2587. Minor bug reported by Thomas Graves and fixed by Thomas Graves
MR279: Fix RM version in the cluster->about page
MAPREDUCE-2582. Major bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
MR 279: Cleanup JobHistory event generation
MAPREDUCE-2581. Trivial bug reported by Dave Syer and fixed by Tim Sell
Spelling errors in log messages (MapTask)
MAPREDUCE-2580. Minor improvement reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
MR 279: RM UI should redirect finished jobs to History UI
MAPREDUCE-2576. Trivial bug reported by Sherry Chen and fixed by Tim Sell
Typo in comment in SimulatorLaunchTaskAction.java
MAPREDUCE-2575. Major bug reported by Thomas Graves and fixed by Thomas Graves (test)
TestMiniMRDFSCaching fails if test.build.dir is set to something other than build/test
MAPREDUCE-2573. Major bug reported by Todd Lipcon and fixed by Robert Joseph Evans
New findbugs warning after MAPREDUCE-2494
MAPREDUCE-2569. Minor bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)
MR-279: Restarting resource manager with root capacity not equal to 100 percent should result in error
MAPREDUCE-2566. Major bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
MR 279: YarnConfiguration should reloadConfiguration if instantiated with a non YarnConfiguration object
MAPREDUCE-2563. Major task reported by Vinay Kumar Thota and fixed by Vinay Kumar Thota (contrib/gridmix)
Gridmix high ram jobs emulation system tests.

Adds system tests to test the High-Ram feature in Gridmix.
MAPREDUCE-2559. Major bug reported by Eric Yang and fixed by Eric Yang (build)
ant binary fails due to missing c++ lib dir
MAPREDUCE-2556. Major bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
MR 279: NodeStatus.getNodeHealthStatus().setBlah broken
MAPREDUCE-2554. Major task reported by Vinay Kumar Thota and fixed by Vinay Kumar Thota (contrib/gridmix)
Gridmix distributed cache emulation system tests.

Adds distributed cache related system tests to Gridmix.
MAPREDUCE-2552. Minor bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
MR 279: NPE when requesting attemptids for completed jobs
MAPREDUCE-2551. Major improvement reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
MR 279: Implement JobSummaryLog
MAPREDUCE-2550. Blocker bug reported by Eric Yang and fixed by Eric Yang (build)
bin/mapred no longer works from a source checkout
MAPREDUCE-2544. Major task reported by Vinay Kumar Thota and fixed by Vinay Kumar Thota (contrib/gridmix)
Gridmix compression emulation system tests.

Adds system tests for testing the compression emulation feature of Gridmix.
MAPREDUCE-2543. Major new feature reported by Amar Kamat and fixed by Amar Kamat (contrib/gridmix)
[Gridmix] Add support for HighRam jobs

Adds High-Ram feature emulation in Gridmix.
MAPREDUCE-2541. Critical bug reported by Binglin Chang and fixed by Binglin Chang (tasktracker)
Race Condition in IndexCache(readIndexFileToCache,removeMap) causes value of totalMemoryUsed corrupt, which may cause TaskTracker continue throw Exception
MAPREDUCE-2537. Minor bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
MR-279: The RM writes its log to yarn-mapred-resourcemanager-<RM_Host>.out
MAPREDUCE-2536. Minor test reported by Daryn Sharp and fixed by Daryn Sharp (test)
TestMRCLI broke due to change in usage output
MAPREDUCE-2534. Major bug reported by Luke Lu and fixed by Luke Lu (mrv2)
MR-279: Fix CI breaking hard coded version in jobclient pom
MAPREDUCE-2533. Major new feature reported by Luke Lu and fixed by Luke Lu (mrv2)
MR-279: Metrics for reserved resource in ResourceManager
MAPREDUCE-2532. Major new feature reported by Luke Lu and fixed by Luke Lu (mrv2)
MR-279: Metrics for NodeManager
MAPREDUCE-2531. Blocker bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (client)
org.apache.hadoop.mapred.jobcontrol.getAssignedJobID throw class cast exception
MAPREDUCE-2529. Major bug reported by Thomas Graves and fixed by Thomas Graves (tasktracker)
Recognize Jetty bug 1342 and handle it

Added 2 new config parameters: mapreduce.reduce.shuffle.catch.exception.stack.regex and mapreduce.reduce.shuffle.catch.exception.message.regex.
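Both parameters hold regular expressions. The Java sketch below shows one way a caught exception could be tested against them. It is an assumption-laden illustration, not the TaskTracker code: the class name is hypothetical, and it assumes that every configured pattern must match (the message as a whole, and at least one stack frame) and that nothing matches when neither pattern is set.

```java
import java.io.IOException;
import java.util.regex.Pattern;

/** Hypothetical illustration; not the TaskTracker's shuffle handler. */
class ShuffleExceptionMatchSketch {
    /**
     * messageRegex stands in for mapreduce.reduce.shuffle.catch.exception.message.regex,
     * stackRegex for mapreduce.reduce.shuffle.catch.exception.stack.regex.
     */
    static boolean matches(Throwable t, String messageRegex, String stackRegex) {
        if (messageRegex == null && stackRegex == null) {
            return false; // assumption: with nothing configured, nothing is recognized
        }
        if (messageRegex != null) {
            String msg = t.getMessage();
            if (msg == null || !Pattern.matches(messageRegex, msg)) {
                return false;
            }
        }
        if (stackRegex != null) {
            Pattern p = Pattern.compile(stackRegex);
            boolean found = false;
            for (StackTraceElement frame : t.getStackTrace()) {
                if (p.matcher(frame.toString()).matches()) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The message and pattern here are invented for illustration.
        IOException e = new IOException("connection reset by peer");
        System.out.println(matches(e, ".*reset.*", null));
    }
}
```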
MAPREDUCE-2527. Major new feature reported by Luke Lu and fixed by Luke Lu (mrv2)
MR-279: Metrics for MRAppMaster
MAPREDUCE-2522. Major sub-task reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
MR 279: Security for JobHistory service
MAPREDUCE-2521. Major new feature reported by Eric Yang and fixed by Eric Yang (build)
Mapreduce RPM integration project

Created RPM and Debian packages for MapReduce.
MAPREDUCE-2518. Major bug reported by Wei Yongjun and fixed by Wei Yongjun (distcp)
missing t flag in distcp help message '-p[rbugp]'
MAPREDUCE-2517. Major task reported by Vinay Kumar Thota and fixed by Vinay Kumar Thota (contrib/gridmix)
Porting Gridmix v3 system tests into trunk branch.

Adds system tests to Gridmix. These system tests cover various features like job types (load and sleep), user resolvers (round-robin, submitter-user, echo) and submission modes (stress, replay and serial).
MAPREDUCE-2514. Trivial bug reported by Jonathan Eagles and fixed by Jonathan Eagles (tasktracker)
ReinitTrackerAction class name misspelled RenitTrackerAction in task tracker log
MAPREDUCE-2509. Major bug reported by Luke Lu and fixed by Luke Lu (mrv2)
MR-279: Fix NPE in UI for pending attempts
MAPREDUCE-2504. Major bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
MR 279: race in JobHistoryEventHandler stop
MAPREDUCE-2501. Major improvement reported by Luke Lu and fixed by Luke Lu (mrv2)
MR-279: Attach sources in builds
MAPREDUCE-2500. Major bug reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
MR 279: PB factories are not thread safe
MAPREDUCE-2497. Trivial bug reported by Robert Henry and fixed by Eli Collins
missing spaces in error messages
MAPREDUCE-2495. Minor improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (distributed-cache)
The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
MAPREDUCE-2494. Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (distributed-cache)
Make the distributed cache delete entires using LRU priority

Added config option mapreduce.tasktracker.cache.local.keep.pct to the TaskTracker. It is the target percentage of the local distributed cache that should be kept in between garbage collection runs. In practice it will delete unused distributed cache entries in LRU order until the size of the cache is less than mapreduce.tasktracker.cache.local.keep.pct of the maximum cache size. This is a floating point value between 0.0 and 1.0. The default is 0.95.
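The trimming rule described here can be sketched in Java. This is a simplified illustration under assumptions, not the TaskTracker's cache manager: the class is hypothetical, entries are tracked only by path and size, and every entry is treated as unused (the real cache only deletes entries that are no longer referenced).

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration; not the TaskTracker's cache manager. */
class LruCacheTrimSketch {
    // Access-ordered map: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, Long> sizes = new LinkedHashMap<>(16, 0.75f, true);
    private final long maxCacheBytes;
    private final double keepPct; // stands in for mapreduce.tasktracker.cache.local.keep.pct
    private long usedBytes = 0;

    LruCacheTrimSketch(long maxCacheBytes, double keepPct) {
        if (keepPct < 0.0 || keepPct > 1.0) {
            throw new IllegalArgumentException("keepPct must be between 0.0 and 1.0");
        }
        this.maxCacheBytes = maxCacheBytes;
        this.keepPct = keepPct;
    }

    /** Adds or replaces an entry and marks it most recently used. */
    void put(String path, long sizeBytes) {
        Long previous = sizes.put(path, sizeBytes);
        usedBytes += sizeBytes - (previous == null ? 0L : previous);
    }

    /** Marks an entry as recently used. */
    void touch(String path) {
        sizes.get(path);
    }

    /** Deletes entries in LRU order until the cache is below keepPct of the maximum size. */
    int trim() {
        double target = maxCacheBytes * keepPct;
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = sizes.entrySet().iterator();
        while (usedBytes >= target && it.hasNext()) {
            usedBytes -= it.next().getValue();
            it.remove();
            removed++;
        }
        return removed;
    }

    long usedBytes() { return usedBytes; }

    boolean contains(String path) { return sizes.containsKey(path); }

    public static void main(String[] args) {
        LruCacheTrimSketch cache = new LruCacheTrimSketch(100, 0.5);
        cache.put("a", 30);
        cache.put("b", 30);
        cache.put("c", 30);
        cache.touch("a"); // "a" becomes the most recently used entry
        System.out.println("removed=" + cache.trim() + " used=" + cache.usedBytes());
    }
}
```

With the default of 0.95, a trim of this kind would stop as soon as usage falls below 95 percent of the maximum cache size.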
MAPREDUCE-2492. Major improvement reported by Amar Kamat and fixed by Amar Kamat (task)
[MAPREDUCE] The new MapReduce API should make available task's progress to the task

Map and Reduce tasks can access the attempt's overall progress via TaskAttemptContext.
MAPREDUCE-2490. Trivial improvement reported by Jonathan Eagles and fixed by Jonathan Eagles (jobtracker)
Log blacklist debug count
MAPREDUCE-2489. Major bug reported by Jeffrey Naisbitt and fixed by Jeffrey Naisbitt (jobtracker)
Jobsplits with random hostnames can make the queue unusable
MAPREDUCE-2483. Major bug reported by Eric Yang and fixed by Eric Yang (build)
Clean up duplication of dependent jar files

Removed duplicated hadoop-common library dependencies.
MAPREDUCE-2480. Major bug reported by Luke Lu and fixed by Luke Lu (mrv2)
MR-279: mr app should not depend on hard-coded version of shuffle
MAPREDUCE-2478. Major improvement reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
MR 279: Improve history server

MAPREDUCE-2475. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (test)
Disable IPV6 for junit tests
MAPREDUCE-2474. Minor improvement reported by Harsh J and fixed by Harsh J (documentation)
Add docs to the new API Partitioner on how to access Job Configuration data

Improve the Partitioner interface's docs to help fetch Job Configuration objects.
MAPREDUCE-2473. Major new feature reported by Aaron T. Myers and fixed by Aaron T. Myers (jobtracker)
MR portion of HADOOP-7214 - Hadoop /usr/bin/groups equivalent

Introduces a new command, "mapred groups", which displays what groups are associated with a user as seen by the JobTracker.
MAPREDUCE-2470. Major bug reported by Aaron Baff and fixed by Robert Joseph Evans (client)
Receiving NPE occasionally on RunningJob.getCounters() call
MAPREDUCE-2469. Major improvement reported by Amar Kamat and fixed by Amar Kamat (task)
Task counters should also report the total heap usage of the task

A task attempt's total heap usage is recorded and published via counters as COMMITTED_HEAP_BYTES.
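For readers unfamiliar with the term, the Java sketch below shows one way a JVM can sample its own committed heap. It is an illustration only: the class is hypothetical, and it assumes that the counter value corresponds to what Runtime.totalMemory() reports for the task JVM, which this note does not spell out.

```java
/** Hypothetical illustration; not the MapReduce Task class. */
class CommittedHeapSketch {
    /** Assumption: "committed heap" is the value of Runtime.totalMemory() for this JVM. */
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    public static void main(String[] args) {
        // The printed value depends on the JVM and its heap settings.
        System.out.println("COMMITTED_HEAP_BYTES=" + committedHeapBytes());
    }
}
```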
MAPREDUCE-2467. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (contrib/raid)
HDFS-1052 changes break the raid contrib module in MapReduce
MAPREDUCE-2466. Blocker bug reported by Todd Lipcon and fixed by Todd Lipcon
TestFileInputFormat.testLocality failing after federation merge
MAPREDUCE-2463. Major bug reported by Devaraj K and fixed by Devaraj K (jobtracker)
Job History files are not moving to done folder when job history location is hdfs location
MAPREDUCE-2462. Minor improvement reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)
MR 279: Write job conf along with JobHistory, other minor improvements
MAPREDUCE-2460. Blocker bug reported by Todd Lipcon and fixed by Todd Lipcon
TestFairSchedulerSystem failing on Hudson
MAPREDUCE-2459. Major improvement reported by Mac Yang and fixed by Mac Yang (harchive)
Cache HAR filesystem metadata
MAPREDUCE-2458. Major bug reported by Luke Lu and fixed by Luke Lu (mrv2)
MR-279: Rename sanitized pom.xml in build directory to work around IDE bug
MAPREDUCE-2456. Trivial improvement reported by Jeffrey Naisbitt and fixed by Jeffrey Naisbitt (jobtracker)
Show the reducer taskid and map/reduce tasktrackers for "Failed fetch notification #_ for task attempt..." log messages
MAPREDUCE-2455. Major sub-task reported by Tom White and fixed by Tom White (build , client)
Remove deprecated JobTracker.State in favour of JobTrackerStatus
MAPREDUCE-2451. Trivial bug reported by Thomas Graves and fixed by Thomas Graves (jobtracker)
Log the reason string of healthcheck script
MAPREDUCE-2449. Minor improvement reported by Jeff Zemerick and fixed by Jeff Zemerick (contrib/eclipse-plugin)
Allow for command line arguments when performing "Run on Hadoop" action.
MAPREDUCE-2440. Major bug reported by Luke Lu and fixed by Luke Lu (mrv2)
MR-279: Name clashes in TypeConverter
MAPREDUCE-2439. Major bug reported by Mahadev konar and fixed by Siddharth Seth (mrv2)
MR-279: Fix YarnRemoteException to give more details.
MAPREDUCE-2438. Major new feature reported by Mahadev konar and fixed by Krishna Ramachandran (mrv2)
MR-279: WebApp for Job History
MAPREDUCE-2434. Major new feature reported by Luke Lu and fixed by Luke Lu (mrv2)
MR-279: ResourceManager metrics

MAPREDUCE-2433. Blocker bug reported by Luke Lu and fixed by Mahadev konar (mrv2)
MR-279: YARNApplicationConstants hard code app master jar version
MAPREDUCE-2432. Major improvement reported by Luke Lu and fixed by Luke Lu (mrv2)
MR-279: Install sanitized poms for downstream sanity
MAPREDUCE-2430. Major task reported by Nigel Daley and fixed by Nigel Daley
Remove mrunit contrib

MRUnit is now available as a separate Apache project.
MAPREDUCE-2429. Major bug reported by Arun C Murthy and fixed by Siddharth Seth (tasktracker)
Check jvmid during task status report
MAPREDUCE-2428. Blocker bug reported by Tom White and fixed by Tom White
start-mapred.sh script fails if HADOOP_HOME is not set
MAPREDUCE-2426. Trivial test reported by Todd Lipcon and fixed by Todd Lipcon (contrib/fair-share)
Make TestFairSchedulerSystem fail with more verbose output
MAPREDUCE-2424. Major improvement reported by Greg Roelofs and fixed by Greg Roelofs (mrv2)
MR-279: counters/UI/etc. for uber-AppMaster (in-cluster LocalJobRunner for MRv2)
MAPREDUCE-2422. Major sub-task reported by Tom White and fixed by Tom White (client)
Removed unused internal methods from DistributedCache
MAPREDUCE-2417. Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (contrib/gridmix)
In Gridmix, in RoundRobinUserResolver mode, the testing/proxy users are not associated with unique users in a trace

Fixes Gridmix in RoundRobinUserResolver mode to map testing/proxy users to unique users in a trace.
MAPREDUCE-2416. Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (contrib/gridmix)
In Gridmix, in RoundRobinUserResolver, the list of groups for a user obtained from users-list-file is incorrect

Removes the restriction of specifying group names in users-list file for Gridmix in RoundRobinUserResolver mode.
MAPREDUCE-2414. Major improvement reported by Arun C Murthy and fixed by Siddharth Seth (mrv2)
MR-279: Use generic interfaces for protocols
MAPREDUCE-2409. Major bug reported by Siddharth Seth and fixed by Siddharth Seth (distributed-cache)
Distributed Cache does not differentiate between file/archive for files with the same path
MAPREDUCE-2408. Major new feature reported by Ravi Gummadi and fixed by Amar Kamat (contrib/gridmix)
Make Gridmix emulate usage of data compression

Emulates the MapReduce compression feature in Gridmix. By default, compression emulation is turned on. Compression emulation can be disabled by setting 'gridmix.compression-emulation.enable' to 'false'. Use 'gridmix.compression-emulation.map-input.decompression-ratio', 'gridmix.compression-emulation.map-output.compression-ratio' and 'gridmix.compression-emulation.reduce-output.compression-ratio' to configure the compression ratios at map input, map output and reduce output side respectively. Currently, compression ratios in the range [0.07, 0.68] are supported. Gridmix auto detects whether map-input, map output and reduce output should emulate compression based on original job's compression related configuration parameters.
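As a rough illustration of the settings above, the following Python sketch assembles the property names from this note into key/value pairs and rejects ratios outside the supported range [0.07, 0.68]. The helper name compression_emulation_settings, its keyword arguments and the sample ratios are hypothetical; only the property names and the range come from the note.

```python
# Hypothetical helper: only the property names and the supported range
# [0.07, 0.68] are taken from the release note.
PREFIX = "gridmix.compression-emulation."
RATIO_KEYS = {
    "map_input": PREFIX + "map-input.decompression-ratio",
    "map_output": PREFIX + "map-output.compression-ratio",
    "reduce_output": PREFIX + "reduce-output.compression-ratio",
}
MIN_RATIO, MAX_RATIO = 0.07, 0.68


def compression_emulation_settings(enable=True, **ratios):
    """Return a dict of Gridmix properties for compression emulation."""
    settings = {PREFIX + "enable": "true" if enable else "false"}
    for name, ratio in ratios.items():
        if name not in RATIO_KEYS:
            raise KeyError("unknown ratio: " + name)
        if not MIN_RATIO <= ratio <= MAX_RATIO:
            raise ValueError(
                "%s=%s is outside [%s, %s]" % (name, ratio, MIN_RATIO, MAX_RATIO)
            )
        settings[RATIO_KEYS[name]] = str(ratio)
    return settings
```

Calling compression_emulation_settings(map_input=0.5, map_output=0.3) yields three properties, while a ratio such as 0.9 raises ValueError before any job is configured.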
MAPREDUCE-2405. Major improvement reported by Mahadev konar and fixed by Greg Roelofs (mrv2)
MR-279: Implement uber-AppMaster (in-cluster LocalJobRunner for MRv2)

Runs small jobs efficiently by executing all tasks in the MR ApplicationMaster JVM, thereby achieving lower latency.
MAPREDUCE-2403. Major improvement reported by Mahadev konar and fixed by Krishna Ramachandran (mrv2)
MR-279: Improve job history event handling in AM to log to HDFS
MAPREDUCE-2399. Major improvement reported by Arun C Murthy and fixed by Luke Lu
The embedded web framework for MAPREDUCE-279
MAPREDUCE-2395. Critical bug reported by Todd Lipcon and fixed by Ramkumar Vadali (contrib/raid)
TestBlockFixer timing out on trunk
MAPREDUCE-2381. Major improvement reported by Philip Zeyliger and fixed by Philip Zeyliger
JobTracker instrumentation not consistent about error handling
MAPREDUCE-2379. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (distributed-cache , documentation)
Distributed cache sizing configurations are missing from mapred-default.xml
MAPREDUCE-2367. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon
Allow using a file to exclude certain tests from build
MAPREDUCE-2365. Major bug reported by Owen O'Malley and fixed by Siddharth Seth
Add counters for FileInputFormat (BYTES_READ) and FileOutputFormat (BYTES_WRITTEN)
MAPREDUCE-2351. Major improvement reported by Tom White and fixed by Tom White
mapred.job.tracker.history.completed.location should support an arbitrary filesystem URI
MAPREDUCE-2331. Major test reported by Todd Lipcon and fixed by Todd Lipcon
Add coverage of task graph servlet to fair scheduler system test
MAPREDUCE-2326. Major improvement reported by Arun C Murthy and fixed by
Port gridmix changes from hadoop-0.20.100 to trunk
MAPREDUCE-2323. Major new feature reported by Todd Lipcon and fixed by Todd Lipcon (contrib/fair-share)
Add metrics to the fair scheduler
MAPREDUCE-2317. Minor bug reported by Devaraj K and fixed by Devaraj K (harchive)
HadoopArchives throwing NullPointerException while creating hadoop archives (.har files)
MAPREDUCE-2311. Blocker bug reported by Todd Lipcon and fixed by Scott Chen (contrib/fair-share)
TestFairScheduler failing on trunk
MAPREDUCE-2307. Minor bug reported by Devaraj K and fixed by Devaraj K (contrib/fair-share)
Exception thrown in Jobtracker logs, when the Scheduler configured is FairScheduler.
MAPREDUCE-2302. Major improvement reported by Scott Chen and fixed by Scott Chen (contrib/raid)
Add static factory methods in GaloisField
MAPREDUCE-2290. Major bug reported by Eli Collins and fixed by Eli Collins (test)
TestTaskCommit missing getProtocolSignature override
MAPREDUCE-2271. Blocker bug reported by Todd Lipcon and fixed by Liyin Liang (jobtracker)
TestSetupTaskScheduling failing in trunk
MAPREDUCE-2263. Major improvement reported by Hairong Kuang and fixed by Hairong Kuang
MapReduce side of HADOOP-6904
MAPREDUCE-2260. Major improvement reported by Roman Shaposhnik and fixed by Roman Shaposhnik (build)
Remove auto-generated native build files

The native build, when run from trunk, now requires autotools, libtool and the openssl dev libraries.
MAPREDUCE-2258. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (task)
IFile reader closes stream and compressor in wrong order
MAPREDUCE-2254. Major improvement reported by Ahmed Radwan and fixed by Ahmed Radwan
Allow setting of end-of-record delimiter for TextInputFormat

TextInputFormat can now split records on delimiters other than newline when the configuration parameter "textinputformat.record.delimiter" is set.
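The effect of a custom end-of-record delimiter can be sketched in a few lines of Python. This is only an in-memory illustration of the splitting rule, not Hadoop's reader (which works on byte streams and input splits); the function name split_records is hypothetical.

```python
def split_records(text, delimiter="\n"):
    """Split text into records on an arbitrary end-of-record delimiter."""
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    records = text.split(delimiter)
    # A delimiter at the very end terminates the last record; it does not
    # start a new, empty one.
    if records and records[-1] == "":
        records.pop()
    return records
```

With the delimiter set to "|", a record may itself contain newlines, which is the case the new parameter is meant to support.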
MAPREDUCE-2250. Trivial improvement reported by Ramkumar Vadali and fixed by Ramkumar Vadali (contrib/raid)
Fix logging in raid code.
MAPREDUCE-2249. Major improvement reported by Bhallamudi Venkata Siva Kamesh and fixed by Devaraj K
Better to check the reflexive property of the object while overriding equals method of it
MAPREDUCE-2248. Major improvement reported by Ramkumar Vadali and fixed by Ramkumar Vadali
DistributedRaidFileSystem should unraid only the corrupt block
MAPREDUCE-2243. Minor improvement reported by Bhallamudi Venkata Siva Kamesh and fixed by Devaraj K (jobtracker , tasktracker)
Close all the file streams properly in a finally block to avoid their leakage.
MAPREDUCE-2239. Major improvement reported by Scott Chen and fixed by Scott Chen (contrib/raid)
BlockPlacementPolicyRaid should call getBlockLocations only when necessary
MAPREDUCE-2225. Blocker improvement reported by Harsh J and fixed by Harsh J (job submission)
MultipleOutputs should not require the use of 'Writable'

MultipleOutputs should not require the use/check of 'Writable' interfaces in key and value classes.
MAPREDUCE-2215. Major bug reported by Patrick Kling and fixed by Patrick Kling (contrib/raid)
A more elegant FileSystem#listCorruptFileBlocks API (RAID changes)
MAPREDUCE-2207. Major improvement reported by Scott Chen and fixed by Liyin Liang (jobtracker)
Task-cleanup task should not be scheduled on the node that the task just failed
MAPREDUCE-2206. Major improvement reported by Scott Chen and fixed by Scott Chen (jobtracker)
The task-cleanup tasks should be optional
MAPREDUCE-2203. Trivial improvement reported by Jingguo Yao and fixed by Jingguo Yao
Wrong javadoc for TaskRunner's appendJobJarClasspaths method
MAPREDUCE-2202. Major improvement reported by Konstantin Boudnik and fixed by Konstantin Boudnik
Generalize CLITest structure and interfaces to facilitate upstream adoption (e.g. for web or system testing)
MAPREDUCE-2199. Major bug reported by Konstantin Boudnik and fixed by Konstantin Boudnik (build)
build is broken 0.22 branch creation
MAPREDUCE-2185. Major bug reported by Hairong Kuang and fixed by Ramkumar Vadali (job submission)
Infinite loop at creating splits using CombineFileInputFormat
MAPREDUCE-2172. Major bug reported by Patrick Kling and fixed by Nigel Daley
test-patch.properties contains incorrect/version-dependent values of OK_FINDBUGS_WARNINGS and OK_RELEASEAUDIT_WARNINGS
MAPREDUCE-2156. Major improvement reported by Patrick Kling and fixed by Patrick Kling (contrib/raid)
Raid-aware FSCK
MAPREDUCE-2155. Major improvement reported by Patrick Kling and fixed by Patrick Kling (contrib/raid)
RaidNode should optionally dispatch map reduce jobs to fix corrupt blocks (instead of fixing locally)
MAPREDUCE-2153. Major improvement reported by Ravi Gummadi and fixed by Rajesh Balamohan (tools/rumen)
Bring in more job configuration properties in to the trace file

Adds job configuration parameters to the job trace. The configuration parameters are stored under the 'jobProperties' field as key-value pairs.
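A minimal sketch of reading these parameters back from a trace record, assuming the record is a JSON object. The sample record, its "jobID" value and the property values are made up for illustration; only the 'jobProperties' field name comes from the note.

```python
import json

# Hypothetical, heavily trimmed trace record; only the 'jobProperties'
# field name comes from the release note.
SAMPLE_RECORD = """
{
  "jobID": "job_0001",
  "jobProperties": {
    "mapred.child.java.opts": "-Xmx512m",
    "mapred.output.compress": "true"
  }
}
"""


def job_properties(record_json):
    """Return the job configuration stored in a trace record, or {}."""
    record = json.loads(record_json)
    return record.get("jobProperties") or {}
```

Traces built before this change would simply lack the field, which is why the helper falls back to an empty dict.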
MAPREDUCE-2137. Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (contrib/gridmix)
Mapping between Gridmix jobs and the corresponding original MR jobs is needed

Exposes and documents two new configuration properties, gridmix.job.original-job-id and gridmix.job.original-job-name, in each simulated job's configuration so that Gridmix users can map simulated jobs to the original cluster's jobs.
MAPREDUCE-2127. Major bug reported by Giridharan Kesavan and fixed by Bruno Mahé (build , pipes)
mapreduce trunk builds are failing on hudson
MAPREDUCE-2107. Major improvement reported by Ranjit Mathew and fixed by Amar Kamat (contrib/gridmix)
Emulate Memory Usage of Tasks in GridMix3

Adds total heap usage emulation to Gridmix. Also, Gridmix can configure the simulated task's JVM heap options with max heap options obtained from the original task (via Rumen). Use 'gridmix.task.jvm-options.enable' to disable the task max heap options configuration.
MAPREDUCE-2106. Major improvement reported by Ranjit Mathew and fixed by Amar Kamat (contrib/gridmix)
Emulate CPU Usage of Tasks in GridMix3

Adds cumulative CPU usage emulation to Gridmix.
MAPREDUCE-2105. Major improvement reported by Ranjit Mathew and fixed by Amar Kamat (contrib/gridmix)
Simulate Load Incrementally and Adaptively in GridMix3
MAPREDUCE-2104. Major bug reported by Ranjit Mathew and fixed by Amar Kamat (tools/rumen)
Rumen TraceBuilder Does Not Emit CPU/Memory Usage Details in Traces

Adds cpu, physical memory, virtual memory and heap usages to TraceBuilder's output.
MAPREDUCE-2081. Major test reported by Vinay Kumar Thota and fixed by Vinay Kumar Thota (contrib/gridmix)
[GridMix3] Implement functionality to get the list of job traces which have different intervals.
MAPREDUCE-2074. Minor bug reported by Koji Noguchi and fixed by Priyo Mustafi (distributed-cache)
Task should fail when symlink creation fails
MAPREDUCE-2053. Major task reported by Vinay Kumar Thota and fixed by Vinay Kumar Thota (contrib/gridmix)
[Herriot] Test Gridmix file pool for different input file sizes based on pool minimum size.
MAPREDUCE-2037. Major new feature reported by Dick King and fixed by Dick King
Capturing interim progress times, CPU usage, and memory usage, when tasks reach certain progress thresholds

Capture intermediate task resource consumption information:
* Time taken so far
* CPU load [either at the time the data are taken, or exponentially smoothed]
* Memory load [also either at the time the data are taken, or exponentially smoothed]
This would be taken at intervals that depend on the task progress plateaus. For example, reducers have three progress ranges - [0-1/3], (1/3-2/3], and (2/3-3/3] - where fundamentally different activities happen. Mappers have different boundaries that are not symmetrically placed [0-9/10], (9/10-1]. Data capture boundaries should coincide with activity boundaries. For the state information capture [CPU and memory] we should average over the covered interval.
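The interval boundaries described above can be sketched as follows. This is an illustrative Python model, not the task runtime's code: the names REDUCE_BOUNDARIES, MAP_BOUNDARIES, interval_index and average_by_interval are hypothetical, and the samples are assumed to be (progress, value) pairs with progress in [0, 1].

```python
from bisect import bisect_left
from collections import defaultdict

# Right-closed interval boundaries taken from the note:
# reducers: [0, 1/3], (1/3, 2/3], (2/3, 1]; mappers: [0, 9/10], (9/10, 1].
REDUCE_BOUNDARIES = [1 / 3, 2 / 3]
MAP_BOUNDARIES = [9 / 10]


def interval_index(progress, boundaries):
    """Return the index of the progress range that contains progress."""
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be within [0, 1]")
    # bisect_left keeps a value equal to a boundary in the lower range,
    # which matches the right-closed intervals above.
    return bisect_left(boundaries, progress)


def average_by_interval(samples, boundaries):
    """Average sampled values (e.g. CPU load) over each covered interval."""
    buckets = defaultdict(list)
    for progress, value in samples:
        buckets[interval_index(progress, boundaries)].append(value)
    return {i: sum(vals) / len(vals) for i, vals in sorted(buckets.items())}
```

Averaging per bucket corresponds to the last sentence of the note: state information such as CPU and memory is averaged over the covered interval rather than reported as a single point sample.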
MAPREDUCE-2033. Major task reported by Vinay Kumar Thota and fixed by Vinay Kumar Thota (contrib/gridmix)
[Herriot] Gridmix generate data tests with various submission policies and different user resolvers.
MAPREDUCE-2026. Major improvement reported by Scott Chen and fixed by Joydeep Sen Sarma
JobTracker.getJobCounters() should not hold JobTracker lock while calling JobInProgress.getCounters()
MAPREDUCE-1996. Trivial bug reported by Glynn Durham and fixed by Harsh J (documentation)
API: Reducer.reduce() method detail misstatement

Fix a misleading documentation note about the usage of Reporter objects in Reducers.
MAPREDUCE-1978. Major improvement reported by Amar Kamat and fixed by Ravi Gummadi (tools/rumen)
[Rumen] TraceBuilder should provide recursive input folder scanning

Adds -recursive option to TraceBuilder for scanning the input directories recursively.
MAPREDUCE-1938. Blocker new feature reported by Devaraj Das and fixed by Krishna Ramachandran (job submission , task , tasktracker)
Ability for having user's classes take precedence over the system classes for tasks' classpath
MAPREDUCE-1927. Minor test reported by Greg Roelofs and fixed by Greg Roelofs (test)
unit test for HADOOP-6835 (concatenated gzip support)
MAPREDUCE-1906. Major improvement reported by Scott Carey and fixed by Todd Lipcon (jobtracker , performance , tasktracker)
Lower default minimum heartbeat interval for tasktracker > Jobtracker

The default minimum heartbeat interval has been dropped from 3 seconds to 300ms to increase scheduling throughput on small clusters. Users may tune mapreduce.jobtracker.heartbeats.in.second to adjust this value.
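The throughput effect on a small cluster is simple arithmetic, sketched below in Python. The sketch assumes every tasktracker heartbeats exactly at the minimum interval and that task assignment happens only in heartbeat responses; the function name heartbeats_per_second is hypothetical.

```python
def heartbeats_per_second(num_tasktrackers, interval_ms):
    """Upper bound on heartbeats per second at a fixed heartbeat interval."""
    if num_tasktrackers < 0 or interval_ms <= 0:
        raise ValueError("need a non-negative tracker count and a positive interval")
    return num_tasktrackers * 1000.0 / interval_ms
```

For a 10-node cluster this gives roughly 3.3 scheduling opportunities per second at the old 3-second minimum and roughly 33.3 at 300ms, a tenfold increase.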
MAPREDUCE-1831. Major improvement reported by Scott Chen and fixed by Scott Chen (contrib/raid)
BlockPlacement policy for RAID
MAPREDUCE-1811. Minor bug reported by Amareshwari Sriramadasu and fixed by Harsh J (client)
Job.monitorAndPrintJob() should print status of the job at completion

Print the resultant status of a Job on completion instead of simply saying 'Complete'.
MAPREDUCE-1788. Major bug reported by Arun C Murthy and fixed by Arun C Murthy (client)
o.a.h.mapreduce.Job shouldn't make a copy of the JobConf
MAPREDUCE-1783. Major improvement reported by Ramkumar Vadali and fixed by Ramkumar Vadali (contrib/fair-share)
Task Initialization should be delayed till when a job can be run
MAPREDUCE-1752. Major improvement reported by Dmytro Molkov and fixed by Dmytro Molkov (harchive)
Implement getFileBlockLocations in HarFilesystem
MAPREDUCE-1738. Major improvement reported by Luke Lu and fixed by Luke Lu
MapReduce portion of HADOOP-6728 (overhaul metrics framework)
MAPREDUCE-1706. Major improvement reported by Rodrigo Schmidt and fixed by Scott Chen (contrib/raid)
Log RAID recoveries on HDFS
MAPREDUCE-1702. Minor improvement reported by Jaideep and fixed by (contrib/gridmix)
CPU/Memory emulation for GridMix3
MAPREDUCE-1624. Major improvement reported by Devaraj Das and fixed by Devaraj Das (documentation)
Document the job credentials and associated details to do with delegation tokens (on the client side)
MAPREDUCE-1461. Major improvement reported by Rajesh Balamohan and fixed by Rajesh Balamohan (tools/rumen)
Feature to instruct rumen-folder utility to skip jobs worth of specific duration

Added a '-starts-after' option to Rumen's Folder utility. The time duration specified after the '-starts-after' option is an offset with respect to the submit time of the first job in the input trace. Jobs in the input trace having a submit time (relative to the first job's submit time) less than the specified offset will be ignored.
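The filtering rule can be sketched in Python as follows. This is not the Folder utility's code: the function name apply_starts_after and the (job_id, submit_time_ms) tuple layout are hypothetical, and the first job is assumed to be the one with the earliest submit time.

```python
def apply_starts_after(jobs, offset_ms):
    """Drop jobs submitted less than offset_ms after the first job.

    jobs is a list of (job_id, submit_time_ms) tuples; this layout is
    hypothetical and chosen only for the sketch.
    """
    if not jobs:
        return []
    first_submit = min(submit for _, submit in jobs)
    return [
        (job_id, submit)
        for job_id, submit in jobs
        if submit - first_submit >= offset_ms
    ]
```

Note that the offset is relative, so an offset of 0 keeps every job regardless of the absolute timestamps in the trace.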
MAPREDUCE-1334. Major bug reported by Karthik K and fixed by Karthik K (contrib/index)
contrib/index - test - TestIndexUpdater fails due to an additional presence of file _SUCCESS in hdfs
MAPREDUCE-1242. Trivial bug reported by Amogh Vasekar and fixed by Harsh J
Chain APIs error misleading

Fixes a misleading exception message raised when chained Mappers have mismatched input/output key/value pairs between them.
MAPREDUCE-1207. Blocker improvement reported by Arun C Murthy and fixed by Arun C Murthy (client , mrv2)
Allow admins to set java options for map/reduce tasks
MAPREDUCE-1159. Trivial improvement reported by Zheng Shao and fixed by Harsh J
Limit Job name on jobtracker.jsp to be 80 char long

Job names on jobtracker.jsp should be 80 characters long at most.
MAPREDUCE-993. Minor bug reported by Iyappan Srinivasan and fixed by Harsh J (jobtracker)
bin/hadoop job -events <jobid> <from-event-#> <#-of-events> help message is confusing

Added a helpful description message to the `mapred job -events` command.
MAPREDUCE-901. Major improvement reported by Owen O'Malley and fixed by Luke Lu (task)
Move Framework Counters into a TaskMetric structure

Efficient implementation of MapReduce framework counters.
MAPREDUCE-587. Minor bug reported by Steve Loughran and fixed by Amar Kamat (contrib/streaming)
Stream test TestStreamingExitStatus fails with Out of Memory

Fixed the streaming test TestStreamingExitStatus's failure due to an OutOfMemory error by reducing the testcase's io.sort.mb.
MAPREDUCE-517. Critical bug reported by Arun C Murthy and fixed by Arun C Murthy
The capacity-scheduler should assign multiple tasks per heartbeat
MAPREDUCE-461. Minor new feature reported by Fredrik Hedberg and fixed by Fredrik Hedberg
Enable ServicePlugins for the JobTracker
MAPREDUCE-279. Major improvement reported by Arun C Murthy and fixed by (mrv2)
Map-Reduce 2.0

MapReduce has undergone a complete overhaul in hadoop-0.23, resulting in what we call MapReduce 2.0 (MRv2). The fundamental idea of MRv2 is to split the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons: a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs. The ResourceManager and the per-node slave, the NodeManager (NM), form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system. The per-application ApplicationMaster is, in effect, a framework-specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.
The ResourceManager has two main components:
* Scheduler (S)
* ApplicationsManager (ASM)
The Scheduler is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc. It is a pure scheduler in the sense that it performs no monitoring or tracking of application status, and it offers no guarantees about restarting tasks that fail due to either application failure or hardware failure. The Scheduler performs its scheduling function based on the resource requirements of the applications; it does so using the abstract notion of a Resource Container, which incorporates elements such as memory, cpu, disk, network etc. The Scheduler has a pluggable policy plug-in, which is responsible for partitioning the cluster resources among the various queues, applications etc. The current Map-Reduce schedulers such as the CapacityScheduler and the FairScheduler would be some examples of the plug-in. The CapacityScheduler supports hierarchical queues to allow for more predictable sharing of cluster resources.
The ApplicationsManager is responsible for accepting job submissions, negotiating the first container for executing the application-specific ApplicationMaster, and providing the service for restarting the ApplicationMaster container on failure.
The NodeManager is the per-machine framework agent that is responsible for launching the applications' containers, monitoring their resource usage (cpu, memory, disk, network) and reporting the same to the Scheduler.
The per-application ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler, tracking their status and monitoring for progress.
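The division of responsibilities described above can be sketched as a toy in-process model. This is not the YARN API: there is no RPC, no ApplicationsManager, no queues, and memory is the only resource. The class names mirror the roles in the text, but every method, field and the first-fit placement policy is a hypothetical simplification.

```python
class NodeManager:
    """Per-machine agent: launches containers on behalf of applications."""

    def __init__(self, name, memory_mb):
        self.name = name
        self.free_mb = memory_mb
        self.running = []  # (app_id, task) pairs launched on this node

    def launch(self, app_id, task):
        self.running.append((app_id, task))


class ResourceManager:
    """Global arbiter: hands out containers but never tracks task status."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, memory_mb):
        # First-fit placement; a real policy plug-in would also weigh
        # queues and capacities.
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                return node
        return None  # no capacity available right now


class ApplicationMaster:
    """Per-application: negotiates containers and tracks its own tasks."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm

    def run_tasks(self, tasks, memory_mb):
        placements = {}
        for task in tasks:
            node = self.rm.allocate(memory_mb)
            if node is not None:
                node.launch(self.app_id, task)
            placements[task] = node.name if node is not None else None
        return placements
```

The point of the sketch is the split: the ResourceManager only decides where a container may run, while the ApplicationMaster owns the task list and works with the NodeManager to start each task.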
HDFS-2540. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE
Change WebHdfsFileSystem to two-step create/append
HDFS-2539. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
Support doAs and GETHOMEDIRECTORY in webhdfs
HDFS-2528. Major sub-task reported by Arpit Gupta and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
webhdfs rest call to a secure dn fails when a token is sent
HDFS-2527. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
Remove the use of Range header from webhdfs
HDFS-2522. Minor test reported by Suresh Srinivas and fixed by Suresh Srinivas
Disable TestDfsOverAvroRpc in 0.23
HDFS-2521. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (data-node , hdfs client)
Remove custom checksum headers from data transfer protocol
HDFS-2512. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (data-node , hdfs client)
Add textual error message to data transfer protocol responses
HDFS-2501. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
add version prefix and root methods to webhdfs
HDFS-2500. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (data-node)
Avoid file system operations in BPOfferService thread while processing deletes
HDFS-2494. Major sub-task reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (webhdfs)
[webhdfs] When Getting the file using OP=OPEN with DN http address, ESTABLISHED sockets are growing.
HDFS-2493. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Remove reference to FSNamesystem in blockmanagement classes
HDFS-2485. Trivial improvement reported by Steve Loughran and fixed by Steve Loughran (data-node)
Improve code layout and constants in UnderReplicatedBlocks
HDFS-2471. Major new feature reported by Suresh Srinivas and fixed by Suresh Srinivas (documentation)
Add Federation feature, configuration and tools documentation
HDFS-2467. Major bug reported by Owen O'Malley and fixed by Owen O'Malley
HftpFileSystem uses incorrect compare for finding delegation tokens
HDFS-2465. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (data-node , performance)
Add HDFS support for fadvise readahead and drop-behind

HDFS now has the ability to use the posix_fadvise and sync_file_range syscalls to manage the OS buffer cache. This support is currently considered experimental, and may be enabled by configuring the following keys:
* dfs.datanode.drop.cache.behind.writes - set to true to drop data out of the buffer cache after writing
* dfs.datanode.drop.cache.behind.reads - set to true to drop data out of the buffer cache when performing sequential reads
* dfs.datanode.sync.behind.writes - set to true to trigger dirty page writeback immediately after writing data
* dfs.datanode.readahead.bytes - set to a non-zero value to trigger readahead for sequential reads
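The drop-behind-writes idea can be illustrated with Python's os.posix_fadvise, which wraps the same posix_fadvise call on Unix systems. This is a sketch of the technique, not the DataNode's code: the function name write_and_drop_behind is hypothetical, and os.fsync on the whole file stands in for the ranged writeback a DataNode would issue.

```python
import os


def write_and_drop_behind(path, data):
    """Write data, force writeback, then advise the kernel to drop the pages."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            written += os.write(fd, data[written:])
        # Dirty pages cannot be dropped, so flush them to disk first.
        os.fsync(fd)
        if hasattr(os, "posix_fadvise"):
            # Offset 0 with length 0 covers the whole file.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return written
    finally:
        os.close(fd)
```

The advice is only a hint: the data stays readable afterwards, but later reads may have to go to disk instead of being served from the buffer cache.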
HDFS-2453. Major sub-task reported by Arpit Gupta and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
tail using a webhdfs uri throws an error
HDFS-2452. Major bug reported by Konstantin Shvachko and fixed by Uma Maheswara Rao G (data-node)
OutOfMemoryError in DataXceiverServer takes down the DataNode
HDFS-2445. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles (test)
Incorrect exit code for hadoop-hdfs-test tests when exception thrown
HDFS-2441. Major sub-task reported by Arpit Gupta and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
webhdfs returns two content-type headers
HDFS-2439. Major sub-task reported by Arpit Gupta and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
webhdfs open an invalid path leads to a 500 which states a npe, we should return a 404 with appropriate error message
HDFS-2436. Major bug reported by Arpit Gupta and fixed by Uma Maheswara Rao G
FSNamesystem.setTimes(..) expects the path is a file.
HDFS-2432. Major sub-task reported by Arpit Gupta and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
webhdfs setreplication api should return a 403 when called on a directory
HDFS-2428. Major sub-task reported by Arpit Gupta and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
webhdfs api parameter validation should be better
HDFS-2427. Major sub-task reported by Arpit Gupta and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
webhdfs mkdirs api call creates path with 777 permission, we should default it to 755
HDFS-2422. Major bug reported by Jeff Bean and fixed by Aaron T. Myers (name-node)
The NN should tolerate the same number of low-resource volumes as failed volumes
HDFS-2416. Major sub-task reported by Arpit Gupta and fixed by Jitendra Nath Pandey (webhdfs)
distcp with a webhdfs uri on a secure cluster fails
HDFS-2414. Critical bug reported by Robert Joseph Evans and fixed by Todd Lipcon (name-node , test)
TestDFSRollback fails intermittently
HDFS-2412. Blocker bug reported by Todd Lipcon and fixed by Todd Lipcon
Add backwards-compatibility layer for FSConstants
HDFS-2411. Major bug reported by Arpit Gupta and fixed by Jitendra Nath Pandey (webhdfs)
with webhdfs enabled in secure mode the auth to local mappings are not being respected.
HDFS-2409. Major bug reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
_HOST in dfs.web.authentication.kerberos.principal.
HDFS-2404. Major sub-task reported by Arpit Gupta and fixed by Suresh Srinivas (webhdfs)
webhdfs liststatus json response is not correct
HDFS-2403. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE
The renewer in NamenodeWebHdfsMethods.generateDelegationToken(..) is not used
HDFS-2401. Major improvement reported by Jonathan Eagles and fixed by Jonathan Eagles (build)
Running a set of methods in a Single Test Class
HDFS-2395. Critical sub-task reported by Arpit Gupta and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
webhdfs api's should return a root element in the json response
HDFS-2385. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
Support delegation token renewal in webhdfs
HDFS-2371. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (data-node)
Refactor BlockSender.java for better readability
HDFS-2368. Major bug reported by Arpit Gupta and fixed by Tsz Wo (Nicholas), SZE
defaults created for web keytab and principal, these properties should not have defaults
HDFS-2366. Major sub-task reported by Arpit Gupta and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
webhdfs throws a npe when ugi is null from getDelegationToken
HDFS-2363. Minor sub-task reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (name-node)
Move datanodes size printing to BlockManager from FSNameSystem's metasave API
HDFS-2361. Critical bug reported by Rajit Saha and fixed by Jitendra Nath Pandey (name-node)
hftp is broken
HDFS-2356. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
webhdfs: support case insensitive query parameter names
HDFS-2355. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (name-node)
Federation: enable using the same configuration file across all the nodes in the cluster.

When running multiple namenodes on different hosts, this change allows the same configuration file to be shared across all the nodes in the cluster (DataNodes, NameNode, BackupNode, SecondaryNameNode) without the need to define the dfs.federation.nameservice.id parameter.
HDFS-2348. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
Support getContentSummary and getFileChecksum in webhdfs
HDFS-2347. Trivial bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (name-node)
checkpointTxnCount's comment still saying about editlog size
HDFS-2346. Blocker bug reported by Uma Maheswara Rao G and fixed by Laxman (test)
TestHost2NodesMap & TestReplicasMap will fail depending upon execution order of test methods
HDFS-2344. Major bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (test)
Fix the TestOfflineEditsViewer test failure in 0.23 branch
HDFS-2340. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
Support getFileBlockLocations and getDelegationToken in webhdfs
HDFS-2338. Major sub-task reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey (webhdfs)
Configuration option to enable/disable webhdfs.

Added a conf property dfs.webhdfs.enabled for enabling/disabling webhdfs.
HDFS-2333. Major bug reported by Ivan Kelly and fixed by Tsz Wo (Nicholas), SZE
HDFS-2284 introduced 2 findbugs warnings on trunk
HDFS-2332. Major test reported by Todd Lipcon and fixed by Todd Lipcon (test)
Add test for HADOOP-7629: using an immutable FsPermission as an IPC parameter
HDFS-2331. Major bug reported by Abhijit Suresh Shingate and fixed by Abhijit Suresh Shingate (hdfs client)
Hdfs compilation fails
HDFS-2323. Major bug reported by Tom White and fixed by Tom White
start-dfs.sh script fails for tarball install
HDFS-2322. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
the build fails in Windows because commons-daemon TAR cannot be fetched
HDFS-2318. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)
Provide authentication to webhdfs using SPNEGO

Added two new conf properties dfs.web.authentication.kerberos.principal and dfs.web.authentication.kerberos.keytab for the SPNEGO servlet filter.
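A small Python sketch of how these two properties might be rendered in Hadoop's configuration XML layout. The helper name render_properties is hypothetical, and the principal and keytab path are placeholders to be replaced with site-specific values; only the two property names come from the note.

```python
import xml.etree.ElementTree as ET


def render_properties(properties):
    """Render a dict as Hadoop-style <configuration> XML."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder principal and keytab path; substitute site-specific values.
SPNEGO_SETTINGS = {
    "dfs.web.authentication.kerberos.principal": "HTTP/_HOST@EXAMPLE.COM",
    "dfs.web.authentication.kerberos.keytab": "/path/to/http.keytab",
}
```

Calling render_properties(SPNEGO_SETTINGS) returns one configuration element with a property entry per key, which can be merged into the site configuration by hand.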
HDFS-2317. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE
Read access to HDFS using HTTP REST
HDFS-2314. Major bug reported by Vinod Kumar Vavilapalli and fixed by Todd Lipcon (test)
MRV1 test compilation broken after HDFS-2197
HDFS-2294. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
Download of commons-daemon TAR should not be under target
HDFS-2290. Major bug reported by Konstantin Shvachko and fixed by Benoy Antony (name-node)
Block with corrupt replica is not getting replicated
HDFS-2289. Blocker bug reported by Arun C Murthy and fixed by Alejandro Abdelnur
jsvc isn't part of the artifact
HDFS-2286. Trivial improvement reported by Todd Lipcon and fixed by Todd Lipcon (data-node)
DataXceiverServer logs AsynchronousCloseException at shutdown
HDFS-2284. Major sub-task reported by Sanjay Radia and fixed by Tsz Wo (Nicholas), SZE
Write Http access to HDFS
HDFS-2273. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Refactor BlockManager.recentInvalidateSets to a new class
HDFS-2267. Trivial bug reported by Todd Lipcon and fixed by Todd Lipcon (data-node)
DataXceiver thread name incorrect while waiting on op during keepalive
HDFS-2266. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Add a Namesystem interface to avoid directly referring to FSNamesystem
HDFS-2265. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Remove unnecessary BlockTokenSecretManager fields/methods from BlockManager
HDFS-2260. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (hdfs client)
Refactor BlockReader into an interface and implementation
HDFS-2258. Major bug reported by Konstantin Shvachko and fixed by Konstantin Shvachko (name-node , test)
TestLeaseRecovery2 fails as lease hard limit is not reset to default
HDFS-2245. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
BlockManager.chooseTarget(..) throws NPE
HDFS-2241. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas
Remove implementing FSConstants interface just to access the constants defined in the interface
HDFS-2240. Critical bug reported by Todd Lipcon and fixed by Tsz Wo (Nicholas), SZE (hdfs client)
Possible deadlock between LeaseRenewer and its factory
HDFS-2239. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Reduce access levels of the fields and methods in FSNamesystem
HDFS-2238. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Uma Maheswara Rao G (name-node)
NamenodeFsck.toString() uses StringBuilder with + operator
HDFS-2237. Minor sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Change UnderReplicatedBlocks from public to package private
HDFS-2235. Major bug reported by Eli Collins and fixed by Eli Collins (name-node)
Encode servlet paths
HDFS-2233. Major test reported by Eli Collins and fixed by Eli Collins (name-node)
Add WebUI tests with URI reserved chars in the path and filename
HDFS-2232. Blocker bug reported by Konstantin Shvachko and fixed by Plamen Jeliazkov (test)
TestHDFSCLI fails on 0.22 branch
HDFS-2230. Major improvement reported by Giridharan Kesavan and fixed by Giridharan Kesavan (build)
hdfs is not resolving the latest common test jars published post common mavenization
HDFS-2229. Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Tsz Wo (Nicholas), SZE (name-node)
Deadlock in NameNode
HDFS-2228. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Move block and datanode code from FSNamesystem to BlockManager and DatanodeManager
HDFS-2227. Major improvement reported by Ivan Kelly and fixed by Ivan Kelly
HDFS-2018 Part 2 : getRemoteEditLogManifest should pull its information from FileJournalManager
HDFS-2226. Trivial improvement reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Clean up counting of operations in FSEditLogLoader
HDFS-2225. Major improvement reported by Ivan Kelly and fixed by Ivan Kelly
HDFS-2018 Part 1 : Refactor file management so it's not in classes which should be generic
HDFS-2212. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Refactor double-buffering code out of EditLogOutputStreams
HDFS-2210. Major task reported by Eli Collins and fixed by Eli Collins (contrib/hdfsproxy)
Remove hdfsproxy

The hdfsproxy contrib component is no longer supported.
HDFS-2209. Minor improvement reported by Steve Loughran and fixed by Steve Loughran (test)
Make MiniDFS easier to embed in other apps
HDFS-2205. Major improvement reported by Ravi Prakash and fixed by Ravi Prakash (hdfs client)
Log message for failed connection to datanode is not followed by a success message.
HDFS-2202. Major new feature reported by Eric Payne and fixed by Eric Payne (balancer , data-node)
Changes to balancer bandwidth should not require datanode restart.

New dfsadmin command added: [-setBalancerBandwidth <bandwidth>], where bandwidth is the maximum network bandwidth, in bytes per second, that the balancer is allowed to use on each datanode during balancing. This is an incompatible change in 0.23. The versions of ClientProtocol and DatanodeProtocol are changed.
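The idea of a limit that can change without a restart can be illustrated with a minimal sketch. The class and method names below (AdjustableBandwidth, set, minMillisFor) are hypothetical and are not taken from the HDFS source; the sketch only assumes that the limit is held in a field that a running transfer re-reads each time it throttles.

```java
// Hypothetical illustration, not the HDFS implementation: a bandwidth limit
// that can be replaced while transfers are running.
class AdjustableBandwidth {
    // volatile so that a new limit set by an admin thread is visible to transfer threads
    private volatile long bytesPerSecond;

    AdjustableBandwidth(long bytesPerSecond) {
        set(bytesPerSecond);
    }

    /** Replaces the limit at runtime; no restart of the owning process is needed. */
    void set(long newBytesPerSecond) {
        if (newBytesPerSecond <= 0) {
            throw new IllegalArgumentException("bandwidth must be positive");
        }
        this.bytesPerSecond = newBytesPerSecond;
    }

    long get() {
        return bytesPerSecond;
    }

    /** Minimum time, in milliseconds, that sending the given bytes should take at the current limit. */
    long minMillisFor(long bytes) {
        long limit = bytesPerSecond;              // read once for a consistent result
        return (bytes * 1000 + limit - 1) / limit; // ceiling division
    }

    public static void main(String[] args) {
        AdjustableBandwidth bw = new AdjustableBandwidth(1_048_576L); // 1 MiB/s
        System.out.println(bw.minMillisFor(1_048_576L));
        bw.set(2_097_152L);                                           // raised at runtime
        System.out.println(bw.minMillisFor(1_048_576L));
    }
}
```

A throttler built this way picks up the new value on its next check, which is the behavior the title of this issue asks for.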
HDFS-2200. Minor sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Set FSNamesystem.LOG to package private
HDFS-2199. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Uma Maheswara Rao G (name-node)
Move blockTokenSecretManager from FSNamesystem to BlockManager
HDFS-2198. Minor improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (data-node , hdfs client , name-node)
Remove hardcoded configuration keys
HDFS-2197. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Refactor RPC call implementations out of NameNode class
HDFS-2196. Major task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
Make ant build system work with hadoop-common JAR generated by Maven
HDFS-2191. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Move datanodeMap from FSNamesystem to DatanodeManager
HDFS-2187. Major improvement reported by Ivan Kelly and fixed by Ivan Kelly
HDFS-1580: Make EditLogInputStream act like an iterator over FSEditLogOps
HDFS-2186. Major bug reported by Eli Collins and fixed by Eli Collins (data-node)
DN volume failures on startup are not counted
HDFS-2180. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon
Refactor NameNode HTTP server into new class
HDFS-2167. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Move dnsToSwitchMapping and hostsReader from FSNamesystem to DatanodeManager
HDFS-2161. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (balancer , data-node , hdfs client , name-node , security)
Move utilities to DFSUtil
HDFS-2159. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (hdfs client)
Deprecate DistributedFileSystem.getClient()
HDFS-2157. Major improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (documentation , name-node)
Improve header comment in o.a.h.hdfs.server.namenode.NameNode
HDFS-2156. Major bug reported by Owen O'Malley and fixed by Eric Yang
rpm should only require the same major version as common
HDFS-2154. Minor test reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (test)
TestDFSShell should use test dir
HDFS-2153. Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (test)
DFSClientAdapter should be put under test
HDFS-2149. Major sub-task reported by Ivan Kelly and fixed by Ivan Kelly (name-node)
Move EditLogOp serialization formats into FsEditLogOp implementations
HDFS-2147. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Move cluster network topology to block management
HDFS-2144. Major improvement reported by Ravi Prakash and fixed by Ravi Prakash (name-node)
If SNN shuts down during initialization it does not log the cause
HDFS-2143. Major improvement reported by Ravi Prakash and fixed by Ravi Prakash
Federation: we should link to the live nodes and dead nodes to cluster web console
HDFS-2141. Major sub-task reported by Suresh Srinivas and fixed by Suresh Srinivas (ha , name-node)
Remove NameNode roles Active and Standby (they become states)
HDFS-2140. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Move Host2NodesMap to block management
HDFS-2134. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Move DecommissionManager to block management
HDFS-2132. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers
Potential resource leak in EditLogFileOutputStream.close
HDFS-2131. Major test reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (test)
Tests for HADOOP-7361
HDFS-2118. Minor improvement reported by Eli Collins and fixed by Eli Collins (data-node)
Couple dfs data dir improvements
HDFS-2116. Minor improvement reported by Eli Collins and fixed by Plamen Jeliazkov (test)
Cleanup TestStreamFile and TestByteRangeInputStream
HDFS-2114. Major bug reported by John George and fixed by John George
re-commission of a decommissioned node does not delete excess replica
HDFS-2112. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Uma Maheswara Rao G (name-node)
Move ReplicationMonitor to block management
HDFS-2111. Major test reported by Harsh J and fixed by Harsh J (data-node , test)
Add tests for ensuring that the DN will start with a few bad data directories (Part 1 of testing DiskChecker)
HDFS-2110. Minor improvement reported by Eli Collins and fixed by Eli Collins (name-node)
Some StreamFile and ByteRangeInputStream cleanup
HDFS-2109. Major bug reported by Bharath Mundlapudi and fixed by Bharath Mundlapudi (hdfs client)
Store uMask as member variable to DFSClient.Conf
HDFS-2108. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Move datanode heartbeat handling to BlockManager
HDFS-2107. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Move block management code to a package

Moved block management code to a new package, org.apache.hadoop.hdfs.server.blockmanagement.
HDFS-2100. Minor test reported by Aaron T. Myers and fixed by Aaron T. Myers (test)
Improve TestStorageRestore
HDFS-2096. Major task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
Mavenization of hadoop-hdfs
HDFS-2092. Major bug reported by Bharath Mundlapudi and fixed by Bharath Mundlapudi (hdfs client)
Create a light inner conf class in DFSClient
HDFS-2086. Major bug reported by Tanping Wang and fixed by Tanping Wang (name-node)
If the include hosts list contains a host name, datanode registration is denied after restarting the namenode
HDFS-2083. Major new feature reported by Tanping Wang and fixed by Tanping Wang
Adopt JMXJsonServlet into HDFS in order to query statistics
HDFS-2082. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers
SecondaryNameNode web interface doesn't show the right info
HDFS-2073. Minor improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (name-node)
Namenode is missing @Override annotations
HDFS-2069. Trivial sub-task reported by Ravi Phulari and fixed by Harsh J (documentation)
Incorrect default trash interval value in the docs
HDFS-2067. Major bug reported by Todd Lipcon and fixed by Tsz Wo (Nicholas), SZE (data-node , hdfs client)
Bump DATA_TRANSFER_VERSION in trunk for protobufs
HDFS-2066. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node , hdfs client , name-node)
Create a package and individual class files for DataTransferProtocol
HDFS-2065. Major bug reported by Bharath Mundlapudi and fixed by Uma Maheswara Rao G
Fix NPE in DFSClient.getFileChecksum
HDFS-2061. Minor bug reported by Matt Foley and fixed by Matt Foley (name-node)
two minor bugs in BlockManager block report processing
HDFS-2058. Major new feature reported by Todd Lipcon and fixed by Todd Lipcon
DataTransfer Protocol using protobufs
HDFS-2056. Minor improvement reported by Tanping Wang and fixed by Tanping Wang (documentation , tools)
Update fetchdt usage
HDFS-2055. Major new feature reported by Travis Crawford and fixed by Travis Crawford (libhdfs)
Add hflush support to libhdfs

Add hdfsHFlush to libhdfs.
HDFS-2054. Minor improvement reported by Kihwal Lee and fixed by Kihwal Lee (data-node)
BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()
HDFS-2053. Minor bug reported by Michael Noll and fixed by Michael Noll (name-node)
Bug in INodeDirectory#computeContentSummary warning
HDFS-2046. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (build , test)
Force entropy to come from non-true random for tests
HDFS-2041. Major bug reported by Todd Lipcon and fixed by Todd Lipcon
Some mtimes and atimes are lost when edit logs are replayed
HDFS-2040. Minor improvement reported by Eli Collins and fixed by Eli Collins
Only build libhdfs if a flag is passed
HDFS-2034. Minor bug reported by John George and fixed by John George (hdfs client)
length in getBlockRange becomes negative when reading only from the block currently being written
HDFS-2030. Minor bug reported by Bharath Mundlapudi and fixed by Bharath Mundlapudi
Fix the usability of namenode upgrade command
HDFS-2029. Trivial improvement reported by Tsz Wo (Nicholas), SZE and fixed by John George (test)
Improve TestWriteRead
HDFS-2024. Trivial improvement reported by CW Chung and fixed by CW Chung (test)
Eclipse format HDFS Junit test hdfs/TestWriteRead.java
HDFS-2022. Major bug reported by Eli Collins and fixed by Eric Yang (build)
ant binary should build libhdfs
HDFS-2021. Major bug reported by CW Chung and fixed by John George (data-node)
TestWriteRead failed with inconsistent visible length of a file
HDFS-2020. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (data-node , test)
TestDFSUpgradeFromImage fails
HDFS-2019. Minor bug reported by Bharath Mundlapudi and fixed by Bharath Mundlapudi (data-node)
Fix all the places where Java method File.list is used with FileUtil.list API
HDFS-2014. Critical bug reported by Todd Lipcon and fixed by Eric Yang (scripts)
bin/hdfs no longer works from a source checkout
HDFS-2011. Major bug reported by Ravi Prakash and fixed by Ravi Prakash (name-node)
Removal and restoration of storage directories on checkpointing failure doesn't work properly
HDFS-2003. Major improvement reported by Ivan Kelly and fixed by Ivan Kelly
Separate FSEditLog reading logic from editLog memory state building logic
HDFS-2002. Major bug reported by Konstantin Shvachko and fixed by Plamen Jeliazkov (name-node)
Incorrect computation of needed blocks in getTurnOffTip()
HDFS-1999. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (test)
Tests use deprecated configs
HDFS-1998. Minor bug reported by Tanping Wang and fixed by Tanping Wang (scripts)
Make refresh-namenodes.sh refresh all namenodes
HDFS-1996. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Eric Yang (build)
ivy: hdfs test jar should be independent to common test jar
HDFS-1995. Minor improvement reported by Tanping Wang and fixed by Tanping Wang
Minor modification to both dfsclusterhealth and dfshealth pages for Web UI
HDFS-1990. Minor bug reported by ramkrishna.s.vasudevan and fixed by Uma Maheswara Rao G (data-node)
Resource leaks in HDFS
HDFS-1986. Minor bug reported by Tanping Wang and fixed by Tanping Wang (tools)
Add an option for the user to return http or https ports regardless of whether security is on or off in DFSUtil.getInfoServer()
HDFS-1983. Major test reported by Daryn Sharp and fixed by Daryn Sharp (test)
Fix path display for copy & rm
HDFS-1968. Minor test reported by CW Chung and fixed by CW Chung (test)
Enhance TestWriteRead to support File Append and Position Read
HDFS-1966. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node , hdfs client)
Encapsulate individual DataTransferProtocol op header

Added header classes for individual DataTransferProtocol op headers.
HDFS-1964. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers
Incorrect HTML unescaping in DatanodeJspHelper.java
HDFS-1963. Major new feature reported by Eric Yang and fixed by Eric Yang (build)
HDFS rpm integration project

Create HDFS RPM package
HDFS-1959. Minor improvement reported by Eli Collins and fixed by Eli Collins
Better error message for missing namenode directory
HDFS-1958. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Format confirmation prompt should be more lenient of its input
HDFS-1955. Major bug reported by Matt Foley and fixed by Matt Foley (name-node)
FSImage.doUpgrade() was made too fault-tolerant by HDFS-1826
HDFS-1953. Minor bug reported by Tanping Wang and fixed by Tanping Wang
Change name node mxbean name in cluster web console
HDFS-1952. Major bug reported by Matt Foley and fixed by Andrew Wang
FSEditLog.open() appears to succeed even if all EDITS directories fail
HDFS-1945. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node , hdfs client)
Removed deprecated fields in DataTransferProtocol

Removed the deprecated fields in DataTransferProtocol.
HDFS-1943. Blocker bug reported by Wei Yongjun and fixed by Matt Foley (scripts)
fail to start datanode while start-dfs.sh is executed by root user
HDFS-1939. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Eric Yang (build)
ivy: test conf should not extend common conf

* Removed duplicated jars in test class path.
HDFS-1938. Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Eric Yang (build)
Reference ivy-hdfs.classpath not found.
HDFS-1937. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node , hdfs client)
Umbrella JIRA for improving DataTransferProtocol
HDFS-1936. Blocker bug reported by Suresh Srinivas and fixed by Suresh Srinivas (name-node)
Updating the layout version from HDFS-1822 causes upgrade problems.
HDFS-1934. Major bug reported by Bharath Mundlapudi and fixed by Bharath Mundlapudi
Fix NullPointerException when File.listFiles() API returns null
HDFS-1933. Major test reported by Daryn Sharp and fixed by Daryn Sharp (test)
Update tests for FsShell's "test"
HDFS-1931. Major test reported by Daryn Sharp and fixed by Daryn Sharp
Update tests for du/dus/df
HDFS-1928. Major test reported by Daryn Sharp and fixed by Daryn Sharp (test)
Fix path display for touchz
HDFS-1927. Major bug reported by John George and fixed by John George (name-node)
audit logs could ignore certain transactions and also could contain "ip=null"
HDFS-1923. Major sub-task reported by Matt Foley and fixed by Tsz Wo (Nicholas), SZE (test)
Intermittent recurring failure in TestFiDataTransferProtocol2.pipeline_Fi_29
HDFS-1922. Major sub-task reported by Matt Foley and fixed by Luke Lu (test)
Recurring failure in TestJMXGet.testNameNode since build 477 on May 11
HDFS-1921. Blocker bug reported by Aaron T. Myers and fixed by Matt Foley
Save namespace can cause NN to be unable to come up on restart
HDFS-1920. Major bug reported by Trevor Robinson and fixed by Trevor Robinson (libhdfs)
libhdfs does not build for ARM processors
HDFS-1917. Major bug reported by Eric Yang and fixed by Eric Yang (build)
Clean up duplication of dependent jar files

Remove packaging of duplicated third party jar files
HDFS-1914. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (name-node)
Federation: namenode storage directory must be configurable specific to a namenode
HDFS-1912. Major test reported by Daryn Sharp and fixed by Daryn Sharp (test)
Update tests for FsShell standardized error messages
HDFS-1911. Major test reported by Sanjay Radia and fixed by Sanjay Radia
HDFS tests for viewfs
HDFS-1908. Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (test)
DataTransferTestUtil$CountdownDoosAction.run(..) throws NullPointerException
HDFS-1907. Major bug reported by CW Chung and fixed by John George (hdfs client)
BlockMissingException upon concurrent read and write: reader was doing file position read while writer is doing write without hflush
HDFS-1906. Minor improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (hdfs client)
Remove logging exception stack trace when one of the datanode targets to read from is not reachable
HDFS-1905. Minor bug reported by Bharath Mundlapudi and fixed by Bharath Mundlapudi (name-node)
Improve the usability of namenode -format
HDFS-1903. Major test reported by Daryn Sharp and fixed by Daryn Sharp (test)
Fix path display for rm/rmr
HDFS-1902. Major test reported by Daryn Sharp and fixed by Daryn Sharp (test)
Fix path display for setrep
HDFS-1899. Major improvement reported by Todd Lipcon and fixed by Ted Yu
GenericTestUtils.formatNamenode is misplaced
HDFS-1898. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon
Tests failing on trunk due to use of NameNode.format
HDFS-1890. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (hdfs client)
A few improvements on the LeaseRenewer.pendingCreates map
HDFS-1889. Major bug reported by John George and fixed by John George
incorrect path in start/stop dfs script
HDFS-1888. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas
MiniDFSCluster#corruptBlockOnDatanodes() access must be public for MapReduce contrib raid
HDFS-1884. Major sub-task reported by Matt Foley and fixed by Aaron T. Myers (test)
Improve TestDFSStorageStateRecovery
HDFS-1883. Major sub-task reported by Matt Foley and fixed by (test)
Recurring failures in TestBackupNode since HDFS-1052
HDFS-1881. Major bug reported by Tanping Wang and fixed by Tanping Wang (data-node)
Federation: after taking snapshot the current directory of datanode is empty
HDFS-1877. Minor test reported by CW Chung and fixed by CW Chung (test)
Create a functional test for file read/write
HDFS-1876. Blocker bug reported by Todd Lipcon and fixed by Todd Lipcon
One MiniDFSCluster ignores numDataNodes parameter
HDFS-1875. Major bug reported by Eric Payne and fixed by Eric Payne (test)
MiniDFSCluster hard-codes dfs.datanode.address to localhost
HDFS-1873. Major new feature reported by Tanping Wang and fixed by Tanping Wang
Federation Cluster Management Web Console
HDFS-1871. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (test)
Tests using MiniDFSCluster fail to compile due to HDFS-1052 changes
HDFS-1870. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (hdfs client)
Refactor DFSClient.LeaseChecker
HDFS-1869. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (name-node)
mkdirs should use the supplied permission for all of the created directories

A multi-level mkdir is now POSIX compliant. Instead of creating intermediate directories with the permissions of the parent directory, intermediate directories are created with permission bits of rwxrwxrwx (0777) as modified by the current umask, plus write and search permission for the owner.
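The permission rule in this note can be written out as bit arithmetic. The following is a sketch only: IntermediateDirMode and intermediateMode are hypothetical names, not HDFS APIs, and the code simply encodes the sentence above (0777 as modified by the umask, plus owner write and search).

```java
// Hypothetical helper, not part of HDFS: computes the mode described in the
// note for intermediate directories created by a multi-level mkdir.
class IntermediateDirMode {
    /** 0777 filtered through the umask, then owner write (0200) and search (0100) added. */
    static int intermediateMode(int umask) {
        int base = 0777 & ~umask; // rwxrwxrwx as modified by the current umask
        return base | 0300;       // plus write and search permission for the owner
    }

    public static void main(String[] args) {
        // A umask of 022 yields 755.
        System.out.println(Integer.toOctalString(intermediateMode(0022)));
        // Even a umask of 777 leaves the owner with write and search (300),
        // so the remaining path components can still be created.
        System.out.println(Integer.toOctalString(intermediateMode(0777)));
    }
}
```

The owner bits are forced on so that a restrictive umask cannot prevent the creation of the deeper directories in the same call.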
HDFS-1865. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (hdfs client)
Share LeaseChecker thread among DFSClients
HDFS-1862. Major test reported by Aaron T. Myers and fixed by Aaron T. Myers (test)
Improve test reliability of HDFS-1594
HDFS-1861. Major improvement reported by Eli Collins and fixed by Eli Collins (data-node)
Rename dfs.datanode.max.xcievers and bump its default value
HDFS-1856. Major sub-task reported by Matt Foley and fixed by Matt Foley (test)
TestDatanodeBlockScanner waits forever, errs without giving information
HDFS-1855. Major test reported by Matt Foley and fixed by Matt Foley (test)
TestDatanodeBlockScanner.testBlockCorruptionRecoveryPolicy() part 2 fails in two different ways
HDFS-1854. Major sub-task reported by Matt Foley and fixed by Matt Foley (test)
make failure message more useful in DFSTestUtil.waitReplication()
HDFS-1846. Major improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (name-node)
Don't fill preallocated portion of edits log with 0x00
HDFS-1845. Major bug reported by John George and fixed by John George
symlink comes up as directory after namenode restart
HDFS-1844. Major test reported by Daryn Sharp and fixed by Daryn Sharp (test)
Move -fs usage tests from hdfs into common
HDFS-1843. Minor improvement reported by Bharath Mundlapudi and fixed by Bharath Mundlapudi
Discover file not found early for file append
HDFS-1840. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (hdfs client)
Terminate LeaseChecker when all writing files are closed.
HDFS-1835. Major bug reported by John Carrino and fixed by John Carrino (data-node)
DataNode.setNewStorageID pulls entropy from /dev/random
HDFS-1833. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node)
Refactor BlockReceiver
HDFS-1831. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (name-node)
HDFS equivalent of HADOOP-7223 changes to handle FileContext createFlag combinations
HDFS-1829. Major bug reported by Matt Foley and fixed by Matt Foley (name-node)
TestNodeCount waits forever, errs without giving information
HDFS-1827. Major bug reported by Matt Foley and fixed by Matt Foley (name-node)
TestBlockReplacement waits forever, errs without giving information
HDFS-1826. Major sub-task reported by Hairong Kuang and fixed by Matt Foley (name-node)
NameNode should save image to name directories in parallel during upgrade
HDFS-1823. Blocker bug reported by Tom White and fixed by Tom White (scripts)
start-dfs.sh script fails if HADOOP_HOME is not set
HDFS-1822. Blocker bug reported by Suresh Srinivas and fixed by Suresh Srinivas (name-node)
Editlog opcodes overlap between 20 security and later releases
HDFS-1821. Major bug reported by John George and fixed by John George
FileContext.createSymlink with kerberos enabled sets wrong owner
HDFS-1818. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (test)
TestHDFSCLI is failing on trunk
HDFS-1817. Trivial improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (test)
Split TestFiDataTransferProtocol.java into two files
HDFS-1814. Major new feature reported by Aaron T. Myers and fixed by Aaron T. Myers (hdfs client , name-node)
HDFS portion of HADOOP-7214 - Hadoop /usr/bin/groups equivalent

Introduces a new command, "hdfs groups", which displays what groups are associated with a user as seen by the NameNode.
HDFS-1812. Minor bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (test)
Address the cleanup issues in TestHDFSCLI.java
HDFS-1808. Major bug reported by Matt Foley and fixed by Matt Foley (data-node , name-node)
TestBalancer waits forever, errs without giving information
HDFS-1806. Major bug reported by Matt Foley and fixed by Matt Foley (data-node , name-node)
TestBlockReport.blockReport_08() and _09() are timing-dependent and likely to fail on fast servers
HDFS-1797. Major bug reported by Todd Lipcon and fixed by Todd Lipcon
New findbugs warning introduced by HDFS-1120
HDFS-1789. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node , hdfs client)
Refactor frequently used code from DFSOutputStream, BlockReceiver and DataXceiver
HDFS-1786. Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Uma Maheswara Rao G (test)
Some cli test cases expect a "null" message
HDFS-1785. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node)
Cleanup BlockReceiver and DataXceiver
HDFS-1782. Major bug reported by John George and fixed by John George (name-node)
FSNamesystem.startFileInternal(..) throws NullPointerException
HDFS-1781. Major bug reported by John George and fixed by John George (scripts)
jsvc executable delivered into wrong package...
HDFS-1776. Major bug reported by Dmytro Molkov and fixed by Bharath Mundlapudi
Bug in Concat code
HDFS-1774. Minor improvement reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (data-node)
Small optimization to FSDataset
HDFS-1773. Minor improvement reported by Tanping Wang and fixed by Tanping Wang (name-node)
Remove a datanode from cluster if include list is not empty and this datanode is removed from both include and exclude lists
HDFS-1770. Minor test reported by Eli Collins and fixed by Eli Collins
TestFiRename fails due to invalid block size
HDFS-1767. Major sub-task reported by Matt Foley and fixed by Matt Foley (data-node)
Namenode should ignore non-initial block reports from datanodes when in safemode during startup
HDFS-1763. Minor improvement reported by Eli Collins and fixed by Eli Collins
Replace hard-coded option strings with variables from DFSConfigKeys
HDFS-1761. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node)
Add a new DataTransferProtocol operation, Op.TRANSFER_BLOCK, instead of using RPC

Add a new DataTransferProtocol operation, Op.TRANSFER_BLOCK, for transferring RBW/Finalized with acknowledgement and without using RPC.
HDFS-1760. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (name-node)
problems with getFullPathName
HDFS-1757. Major improvement reported by Eli Collins and fixed by Eli Collins (fuse-dfs)
Don't compile fuse-dfs by default
HDFS-1751. Major new feature reported by Daryn Sharp and fixed by Daryn Sharp (data-node)
Intrinsic limits for HDFS files, directories
HDFS-1750. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE
fs -ls hftp://file not working
HDFS-1748. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (balancer)
Balancer utilization classification is incomplete
HDFS-1741. Major improvement reported by Konstantin Boudnik and fixed by Konstantin Boudnik (build)
Provide a minimal pom file to allow integration of HDFS into Sonar analysis
HDFS-1739. Minor improvement reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (data-node)
When DataNode throws DiskOutOfSpaceException, it will be helpful to the user if we log the available volume size and configured block size.
HDFS-1734. Major bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (name-node)
'Chunk size to view' option is not working in Name Node UI.
HDFS-1731. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon
Allow using a file to exclude certain tests from build
HDFS-1728. Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
SecondaryNameNode.checkpointSize is in bytes, not MB.
HDFS-1727. Minor bug reported by Uma Maheswara Rao G and fixed by sravankorumilli
fsck command can display command usage if user passes any illegal argument
HDFS-1723. Minor improvement reported by Allen Wittenauer and fixed by Jim Plush
quota error messages should use the same scale

Updated the Quota exceptions to now use human readable output.
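As an illustration of what "human readable output" can mean for a byte count, here is a minimal sketch. The note does not say which format the quota exceptions use, so the unit labels, the one-decimal rounding, and the HumanReadableBytes class itself are assumptions made for this example.

```java
// Hypothetical formatter, not the HDFS code: renders a byte count with a
// binary unit suffix so that values of very different sizes share one scale.
class HumanReadableBytes {
    private static final String[] UNITS = {"B", "KB", "MB", "GB", "TB", "PB", "EB"};

    static String format(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("byte count must be non-negative");
        }
        double value = bytes;
        int unit = 0;
        // Divide by 1024 until the value fits the current unit.
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        if (unit == 0) {
            return bytes + " B"; // small values are printed exactly
        }
        return String.format(java.util.Locale.ROOT, "%.1f %s", value, UNITS[unit]);
    }

    public static void main(String[] args) {
        System.out.println(format(1536L));        // 1.5 KB
        System.out.println(format(1073741824L));  // 1.0 GB
    }
}
```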
HDFS-1703. Minor sub-task reported by Tanping Wang and fixed by Tanping Wang (scripts)
HDFS federation: Improve start/stop scripts and add script to decommission datanodes

The masters file is no longer used to indicate which hosts to start the 2NN on. The 2NN is now started on hosts when dfs.namenode.secondary.http-address is configured with a non-wildcard IP.
HDFS-1692. Major bug reported by Bharath Mundlapudi and fixed by Bharath Mundlapudi (data-node)
In secure mode, Datanode process doesn't exit when disks fail.
HDFS-1691. Minor bug reported by Alexey Diomin and fixed by Alexey Diomin (tools)
double static declaration in Configuration.addDefaultResource("hdfs-default.xml");
HDFS-1675. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node)
Transfer RBW between datanodes

Added a new stage TRANSFER_RBW to DataTransferProtocol
HDFS-1665. Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (balancer)
Balancer sleeps inadequately
HDFS-1656. Major bug reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
getDelegationToken in HftpFileSystem should renew TGT if needed.
HDFS-1636. Minor improvement reported by Todd Lipcon and fixed by Harsh J (name-node)
If dfs.name.dir points to an empty dir, namenode format shouldn't require confirmation

If dfs.name.dir points to an empty dir, namenode -format no longer requires confirmation.
HDFS-1630. Major improvement reported by Hairong Kuang and fixed by Hairong Kuang (name-node)
Checksum fsedits
HDFS-1629. Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
Add a method to BlockPlacementPolicy for not removing the chosen nodes
HDFS-1628. Minor improvement reported by Ramya Sunil and fixed by John George (name-node)
AccessControlException should display the full path
HDFS-1627. Major bug reported by Hairong Kuang and fixed by Hairong Kuang (name-node)
Fix NullPointerException in Secondary NameNode
HDFS-1626. Minor improvement reported by Arun C Murthy and fixed by Tsz Wo (Nicholas), SZE (name-node)
Make BLOCK_INVALIDATE_LIMIT configurable

Added a new configuration property dfs.block.invalidate.limit for FSNamesystem.blockInvalidateLimit.
HDFS-1625. Minor bug reported by Todd Lipcon and fixed by Tsz Wo (Nicholas), SZE (test)
TestDataNodeMXBean fails if disk space usage changes during test run
HDFS-1620. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Harsh J
Rename HdfsConstants -> HdfsServerConstants, FSConstants -> HdfsConstants

Rename HdfsConstants interface to HdfsServerConstants, FSConstants interface to HdfsConstants
HDFS-1612. Minor bug reported by Joe Crobak and fixed by Joe Crobak (documentation)
HDFS Design Documentation is outdated
HDFS-1611. Minor bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (hdfs client , name-node)
Some logical issues need to be addressed.
HDFS-1606. Major new feature reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node , hdfs client , name-node)
Provide a stronger data guarantee in the write pipeline

Added two configuration properties, dfs.client.block.write.replace-datanode-on-failure.enable and dfs.client.block.write.replace-datanode-on-failure.policy. Added a new feature to replace datanode on failure in DataTransferProtocol. Added getAdditionalDatanode(..) in ClientProtocol.
HDFS-1602. Major bug reported by Konstantin Boudnik and fixed by Boris Shkolnik (name-node)
NameNode storage failed replica restoration is broken
HDFS-1601. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (data-node)
Pipeline ACKs are sent as lots of tiny TCP packets
HDFS-1600. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Todd Lipcon (build , test)
editsStored.xml cause release audit warning
HDFS-1598. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (name-node)
ListPathsServlet excludes .*.crc files
HDFS-1596. Major improvement reported by Patrick Angeles and fixed by Harsh J (documentation , name-node)
Move secondary namenode checkpoint configs from core-default.xml to hdfs-default.xml

Removed references to the older fs.checkpoint.* properties that resided in core-site.xml
HDFS-1594. Major bug reported by Devaraj K and fixed by Aaron T. Myers (name-node)
When the disk becomes full Namenode is getting shutdown and not able to recover

Implemented a daemon thread that periodically monitors disk usage. If the disk usage reaches the threshold value, the name node is put into safe mode so that no modifications to the file system occur. Once the disk usage drops below the threshold, the name node is taken out of safe mode. Both the threshold value and the check interval are configurable.
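
The monitoring behaviour described in this note can be illustrated with a minimal sketch. It is not the NameNode implementation: the class name, the `LongSupplier` used to read disk usage, and the method names are all hypothetical, chosen only to show a daemon thread that toggles a safe-mode flag around a configurable threshold and interval.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: a periodic disk-usage check that toggles a safe-mode flag.
public class DiskSpaceMonitorSketch {
    private final LongSupplier usedBytes;   // assumed source of the current disk usage
    private final long thresholdBytes;      // configurable threshold
    private volatile boolean safeMode = false;

    public DiskSpaceMonitorSketch(LongSupplier usedBytes, long thresholdBytes) {
        this.usedBytes = usedBytes;
        this.thresholdBytes = thresholdBytes;
    }

    // One check: enter safe mode at or above the threshold, leave it below.
    public void checkOnce() {
        safeMode = usedBytes.getAsLong() >= thresholdBytes;
    }

    public boolean isInSafeMode() {
        return safeMode;
    }

    // Runs checkOnce() every intervalMillis on a daemon thread until interrupted.
    public Thread startDaemon(long intervalMillis) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    checkOnce();
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "disk-space-monitor");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Passing the usage reading in as a supplier keeps the check independent of any particular file system call, which also makes the threshold logic easy to exercise in isolation.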
HDFS-1592. Major bug reported by Bharath Mundlapudi and fixed by Bharath Mundlapudi
Datanode startup doesn't honor volumes.tolerated
HDFS-1588. Major improvement reported by Erik Steffl and fixed by Erik Steffl
Add dfs.hosts.exclude to DFSConfigKeys and use constant instead of hardcoded string
HDFS-1585. Blocker bug reported by Todd Lipcon and fixed by Todd Lipcon (test)
HDFS-1547 broke MR build
HDFS-1583. Major improvement reported by Liyin Liang and fixed by Liyin Liang (name-node)
Improve backup-node sync performance by wrapping RPC parameters
HDFS-1582. Major improvement reported by Roman Shaposhnik and fixed by Roman Shaposhnik (libhdfs)
Remove auto-generated native build files

The native build, when run from trunk, now requires autotools, libtool, and the openssl dev libraries.
HDFS-1573. Trivial improvement reported by Todd Lipcon and fixed by Todd Lipcon (hdfs client)
LeaseChecker thread name trace not that useful
HDFS-1568. Minor improvement reported by Todd Lipcon and fixed by Joey Echeverria (data-node)
Improve DataXceiver error logging
HDFS-1560. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (data-node)
dfs.data.dir permissions should default to 700

The permissions on datanode data directories (configured by dfs.datanode.data.dir.perm) now default to 0700. Upon startup, the datanode will automatically change the permissions to match the configured value.
HDFS-1557. Major sub-task reported by Ivan Kelly and fixed by Ivan Kelly (name-node)
Separate Storage from FSImage
HDFS-1551. Major bug reported by Giridharan Kesavan and fixed by Giridharan Kesavan (build)
fix the pom template's version
HDFS-1547. Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (name-node)
Improve decommission mechanism

Summary of changes to the decommissioning process:
1. After nodes are decommissioned, they are not shut down. Decommissioned nodes are not used for writes. For reads, decommissioned nodes are given as the last location to read from.
2. The numbers of live and dead decommissioned nodes are displayed in the namenode web UI.
3. The free capacity of decommissioned nodes is not counted towards the cluster free capacity.
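
The read-ordering rule in this note (decommissioned nodes are offered as the last location) can be sketched as a stable sort. This is only an illustration under stated assumptions: the `Node` type and `orderForRead` method are hypothetical and do not correspond to actual NameNode classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: order replica locations so decommissioned nodes come last.
public class ReadOrderSketch {
    public static final class Node {
        public final String name;
        public final boolean decommissioned;

        public Node(String name, boolean decommissioned) {
            this.name = name;
            this.decommissioned = decommissioned;
        }
    }

    // Returns a copy in which live nodes keep their relative order and
    // decommissioned nodes are moved to the end (List.sort is stable).
    public static List<Node> orderForRead(List<Node> replicas) {
        List<Node> ordered = new ArrayList<>(replicas);
        ordered.sort(Comparator.comparing((Node n) -> n.decommissioned));
        return ordered;
    }
}
```

Because `false` sorts before `true` for `Boolean`, the comparator only demotes decommissioned nodes and leaves any existing preference among live nodes untouched.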
HDFS-1541. Major sub-task reported by Hairong Kuang and fixed by Hairong Kuang (name-node)
Not marking datanodes dead When namenode in safemode
HDFS-1540. Major bug reported by dhruba borthakur and fixed by dhruba borthakur (data-node)
Make Datanode handle errors to namenode.register call more elegantly
HDFS-1539. Major improvement reported by dhruba borthakur and fixed by dhruba borthakur (data-node , hdfs client , name-node)
prevent data loss when a cluster suffers a power loss
HDFS-1536. Major improvement reported by Hairong Kuang and fixed by Hairong Kuang
Improve HDFS WebUI

On the web UI, the missing block count is now accurate, and under-replicated blocks no longer include missing blocks.
HDFS-1534. Minor improvement reported by Eli Collins and fixed by Eli Collins (name-node)
Fix some incorrect logs in FSDirectory
HDFS-1533. Major bug reported by Patrick Kling and fixed by Patrick Kling (hdfs client)
A more elegant FileSystem#listCorruptFileBlocks API (HDFS portion)
HDFS-1526. Major bug reported by Hairong Kuang and fixed by Hairong Kuang (hdfs client)
Dfs client name for a map/reduce task should have some randomness

The client name now has the format DFSClient_applicationid_randomint_threadid, where applicationid is the value of mapred.task.id if it is set, or "NONMAPREDUCE" otherwise.
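
As a rough illustration of the naming format in this note, the following sketch assembles such a name. The class and method are hypothetical, and the `taskId` parameter stands in for the value of mapred.task.id (null when it is not set); the real client may derive these parts differently.

```java
import java.util.Random;

// Hypothetical sketch of the DFSClient_<applicationid>_<randomint>_<threadid> format.
public class DfsClientNameSketch {
    // taskId is assumed to be the value of mapred.task.id, or null if unset.
    public static String clientName(String taskId, Random random, long threadId) {
        String applicationId = (taskId != null) ? taskId : "NONMAPREDUCE";
        return "DFSClient_" + applicationId + "_" + random.nextInt() + "_" + threadId;
    }

    public static void main(String[] args) {
        // Outside MapReduce there is no task id, so the fallback label is used.
        System.out.println(clientName(null, new Random(), Thread.currentThread().getId()));
    }
}
```

The random component is what keeps two clients with the same task id and thread id from colliding on the same name.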
HDFS-1524. Blocker bug reported by Hairong Kuang and fixed by Hairong Kuang (name-node)
Image loader should make sure to read every byte in image file
HDFS-1523. Major bug reported by Konstantin Boudnik and fixed by Konstantin Boudnik (test)
TestLargeBlock is failing on trunk
HDFS-1518. Minor improvement reported by Jingguo Yao and fixed by Jingguo Yao (name-node)
Wrong description in FSNamesystem's javadoc
HDFS-1516. Major bug reported by Konstantin Boudnik and fixed by Konstantin Boudnik (build)
mvn-install is broken after 0.22 branch creation
HDFS-1513. Minor improvement reported by Eli Collins and fixed by Eli Collins
Fix a number of warnings
HDFS-1511. Blocker bug reported by Nigel Daley and fixed by Jakob Homan
98 Release Audit warnings on trunk and branch-0.22
HDFS-1510. Minor improvement reported by Nigel Daley and fixed by Nigel Daley
Add test-patch.properties required by test-patch.sh
HDFS-1509. Major improvement reported by dhruba borthakur and fixed by dhruba borthakur (name-node)
Resync discarded directories in fs.name.dir during saveNamespace command
HDFS-1506. Major improvement reported by Hairong Kuang and fixed by Hairong Kuang (name-node)
Refactor fsimage loading code
HDFS-1505. Blocker bug reported by Todd Lipcon and fixed by Aaron T. Myers
saveNamespace appears to succeed even if all directories fail to save
HDFS-1503. Minor bug reported by Eli Collins and fixed by Todd Lipcon (test)
TestSaveNamespace fails
HDFS-1502. Minor bug reported by Eli Collins and fixed by Hairong Kuang
TestBlockRecovery triggers NPE in assert
HDFS-1486. Major improvement reported by Konstantin Boudnik and fixed by Konstantin Boudnik (test)
Generalize CLITest structure and interfaces to facilitate upstream adoption (e.g. for web testing)
HDFS-1481. Major improvement reported by Hairong Kuang and fixed by Hairong Kuang (name-node)
NameNode should validate fsimage before rolling
HDFS-1480. Major bug reported by T Meyarivan and fixed by Todd Lipcon (name-node)
All replicas of a block can end up on the same rack when some datanodes are decommissioning.
HDFS-1476. Major improvement reported by Patrick Kling and fixed by Patrick Kling (name-node)
listCorruptFileBlocks should be functional while the name node is still in safe mode
HDFS-1473. Major sub-task reported by Todd Lipcon and fixed by Todd Lipcon (name-node)
Refactor storage management into separate classes than fsimage file reading/writing
HDFS-1467. Blocker bug reported by Todd Lipcon and fixed by Todd Lipcon (data-node)
Append pipeline never succeeds with more than one replica
HDFS-1463. Major bug reported by dhruba borthakur and fixed by dhruba borthakur (name-node)
accessTime updates should not occur in safeMode
HDFS-1458. Major improvement reported by Hairong Kuang and fixed by Hairong Kuang (name-node)
Improve checkpoint performance by avoiding unnecessary image downloads
HDFS-1448. Major new feature reported by Erik Steffl and fixed by Erik Steffl (tools)
Create multi-format parser for edits logs file, support binary and XML formats initially

Offline edits viewer feature adds oev tool to hdfs script. Oev makes it possible to convert edits logs to/from native binary and XML formats. It uses the same framework as Offline image viewer. Example usage: $HADOOP_HOME/bin/hdfs oev -i edits -o output.xml
HDFS-1445. Major sub-task reported by Matt Foley and fixed by Matt Foley (data-node)
Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file

Batch hardlinking during "upgrade" snapshots, cutting time from approximately 8 minutes per volume to approximately 8 seconds. Validated on both Linux and Windows. Depends on prior integration of the patch for HADOOP-7133.
HDFS-1442. Major improvement reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
Api to get delegation token in Hdfs
HDFS-1381. Major bug reported by Jakob Homan and fixed by Jim Plush (test)
HDFS javadocs hard-code references to dfs.namenode.name.dir and dfs.datanode.data.dir parameters

Updated the JavaDocs to appropriately represent the new Configuration Keys that are used in the code. The docs did not match the code.
HDFS-1378. Major improvement reported by Todd Lipcon and fixed by Colin Patrick McCabe (name-node)
Edit log replay should track and report file offsets in case of errors
HDFS-1377. Blocker bug reported by Eli Collins and fixed by Eli Collins (name-node)
Quota bug for partial blocks allows quotas to be violated
HDFS-1371. Major bug reported by Koji Noguchi and fixed by Tanping Wang (hdfs client , name-node)
One bad node can incorrectly flag many files as corrupt
HDFS-1360. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (test)
TestBlockRecovery should bind ephemeral ports
HDFS-1335. Major improvement reported by Hairong Kuang and fixed by Hairong Kuang (hdfs client , name-node)
HDFS side of HADOOP-6904: first step towards inter-version communications between dfs client and NameNode
HDFS-1332. Minor improvement reported by Todd Lipcon and fixed by Ted Yu (name-node)
When unable to place replicas, BlockPlacementPolicy should log reasons nodes were excluded
HDFS-1330. Major new feature reported by Hairong Kuang and fixed by John George (data-node)
Make RPCs to DataNodes timeout
HDFS-1321. Minor bug reported by gary murry and fixed by Jim Plush (name-node)
If service port and main port are the same, there is no clear log message explaining the issue.

Added a check to make sure the RPC and HTTP ports on the NameNode are not set to the same value; otherwise an IOException is thrown with the appropriate message.
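
The kind of check this note describes might look like the sketch below. The class name, method name, and message text are assumptions made for illustration; only the behaviour (an IOException when both ports are equal) comes from the note.

```java
import java.io.IOException;

// Hypothetical sketch: reject a configuration where RPC and HTTP ports collide.
public class PortConflictSketch {
    public static void checkPorts(int rpcPort, int httpPort) throws IOException {
        if (rpcPort == httpPort) {
            // The wording of this message is illustrative, not the actual NameNode text.
            throw new IOException(
                "RPC and HTTP ports must differ, but both are set to " + rpcPort);
        }
    }
}
```

Failing at startup with an explicit message replaces the previous situation, in which the conflict produced no clear log entry.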
HDFS-1295. Major sub-task reported by dhruba borthakur and fixed by Matt Foley (name-node)
Improve namenode restart times by short-circuiting the first block reports from datanodes
HDFS-1257. Major bug reported by Ramkumar Vadali and fixed by Eric Payne (name-node)
Race condition on FSNamesystem#recentInvalidateSets introduced by HADOOP-5124
HDFS-1206. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Konstantin Boudnik (test)
TestFiHFlush fails intermittently
HDFS-1189. Major bug reported by Kang Xiao and fixed by John George (name-node)
Quota counts missed between clear quota and set quota
HDFS-1149. Major bug reported by Todd Lipcon and fixed by Aaron T. Myers (name-node)
Lease reassignment is not persisted to edit log
HDFS-1120. Major improvement reported by Jeff Hammerbacher and fixed by Harsh J (data-node)
Make DataNode's block-to-device placement policy pluggable

Make the DataNode's block-volume choosing policy pluggable.
HDFS-1117. Major improvement reported by Luke Lu and fixed by Luke Lu
HDFS portion of HADOOP-6728 (overhaul metrics framework)

Metrics names are standardized to use CapitalizedCamelCase. Some examples:
1. Metrics names using "_" are changed to the new naming scheme. Example: bytes_written changes to BytesWritten.
2. All metrics names start with a capital letter. Example: threadsBlocked changes to ThreadsBlocked.
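
For anyone updating dashboards or scripts that refer to the old names, the renaming rule can be approximated with a small helper. This is a sketch inferred from the two examples in the note, not the code used by the metrics framework; the class and method names are hypothetical.

```java
// Hypothetical sketch: map an old-style metric name to CapitalizedCamelCase.
public class MetricNameSketch {
    public static String toCapitalizedCamelCase(String name) {
        StringBuilder out = new StringBuilder();
        boolean upperNext = true;           // the first character is always capitalized
        for (char c : name.toCharArray()) {
            if (c == '_') {
                upperNext = true;           // drop the underscore, capitalize what follows
                continue;
            }
            out.append(upperNext ? Character.toUpperCase(c) : c);
            upperNext = false;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCapitalizedCamelCase("bytes_written"));  // BytesWritten
        System.out.println(toCapitalizedCamelCase("threadsBlocked")); // ThreadsBlocked
    }
}
```

Names that do not follow either pattern in the note may have been renamed by hand, so the output of such a helper should be checked against the metrics actually published.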
HDFS-1073. Major improvement reported by Sanjay Radia and fixed by Todd Lipcon
Simpler model for Namenode's fs Image and edit Logs

The NameNode's storage layout for its name directories has been reorganized to be more robust. Each edit now has a unique transaction ID, and each file is associated with a transaction ID (for checkpoints) or a range of transaction IDs (for edit logs).
HDFS-1070. Major sub-task reported by Hairong Kuang and fixed by Hairong Kuang (name-node)
Speedup NameNode image loading and saving by storing local file names

This changes the fsimage format to be root directory-1 directory-2 ... directory-n. Each directory stores all its children in the following format: Directory_full_path_name num_of_children child-1 ... child-n. Each inode stores only the last component of its path name in the fsimage. This change requires an upgrade at deployment.
HDFS-1052. Major new feature reported by Suresh Srinivas and fixed by Suresh Srinivas (name-node)
HDFS scalability with multiple namenodes
HDFS-1001. Minor bug reported by bc Wong and fixed by bc Wong (data-node)
DataXceiver and BlockReader disagree on when to send/recv CHECKSUM_OK
HDFS-863. Major bug reported by Todd Lipcon and fixed by Ken Goodhope (test)
Potential deadlock in TestOverReplicatedBlocks
HDFS-780. Major test reported by Eli Collins and fixed by Eli Collins (fuse-dfs)
Revive TestFuseDFS
HDFS-560. Minor improvement reported by Steve Loughran and fixed by Steve Loughran (build)
Proposed enhancements/tuning to hadoop-hdfs/build.xml
HDFS-420. Major improvement reported by Dima Brodsky and fixed by Brian Bockelman (fuse-dfs)
Fuse-dfs should cache fs handles
HDFS-73. Blocker bug reported by Raghu Angadi and fixed by Uma Maheswara Rao G (hdfs client)
DFSOutputStream does not close all the sockets
HADOOP-8619. Major improvement reported by Radim Kolar and fixed by Chris Douglas (io)
WritableComparator must implement no-arg constructor
HADOOP-7798. Blocker bug reported by Arun C Murthy and fixed by Doug Cutting (build)
Release artifacts need to be signed for Nexus
HADOOP-7797. Major bug reported by Owen O'Malley and fixed by Owen O'Malley (build)
Fix the repository name to support pushing to the staging area of Nexus
HADOOP-7792. Major improvement reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
Common component for HDFS-2416: Add verifyToken method to AbstractDelegationTokenSecretManager
HADOOP-7789. Major improvement reported by Arun C Murthy and fixed by Arun C Murthy
Minor edits to top-level site
HADOOP-7785. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (io , util)
Add equals, hashcode, toString to DataChecksum
HADOOP-7782. Critical bug reported by Arun C Murthy and fixed by Tom White (build)
Aggregate project javadocs
HADOOP-7778. Major bug reported by Tom White and fixed by Tom White
FindBugs warning in Token.getKind()
HADOOP-7772. Trivial improvement reported by Steve Loughran and fixed by Steve Loughran
javadoc the topology classes
HADOOP-7771. Blocker bug reported by John George and fixed by John George
NPE when running hdfs dfs -copyToLocal, -get etc
HADOOP-7770. Blocker bug reported by Ravi Prakash and fixed by Ravi Prakash (viewfs)
ViewFS getFileChecksum throws FileNotFoundException for files in /tmp and /user
HADOOP-7768. Blocker bug reported by Jonathan Eagles and fixed by Tom White (build)
PreCommit-HADOOP-Build is failing on hadoop-auth-examples
HADOOP-7766. Major bug reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
The auth to local mappings are not being respected, with webhdfs and security enabled.
HADOOP-7764. Blocker bug reported by Jonathan Eagles and fixed by Jonathan Eagles
Allow both ACL list and global path spec filters to HttpServer
HADOOP-7763. Major improvement reported by Tom White and fixed by Tom White (documentation)
Add top-level navigation to APT docs
HADOOP-7762. Major task reported by Eli Collins and fixed by Eli Collins (scripts)
Common side of MR-2736 (MR1 removal)
HADOOP-7755. Blocker bug reported by Jonathan Eagles and fixed by Jonathan Eagles (build)
Detect MapReduce PreCommit Trunk builds silently failing when running test-patch.sh
HADOOP-7753. Major sub-task reported by Todd Lipcon and fixed by Todd Lipcon (io , native , performance)
Support fadvise and sync_data_range in NativeIO, add ReadaheadPool class
HADOOP-7749. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (util)
Add NetUtils call which provides more help in exception messages
HADOOP-7745. Major bug reported by Ravi Prakash and fixed by Ravi Prakash
I switched variable names in HADOOP-7509
HADOOP-7744. Major bug reported by Jonathan Eagles and fixed by Jonathan Eagles (test)
Incorrect exit code for hadoop-core-test tests when exception thrown
HADOOP-7743. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
Add Maven profile to create a full source tarball
HADOOP-7740. Minor bug reported by Arpit Gupta and fixed by Arpit Gupta (conf)
security audit logger is not on by default, fix the log4j properties to enable the logger

Fixed security audit logger configuration. (Arpit Gupta via Eric Yang)
HADOOP-7737. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
normalize hadoop-mapreduce & hadoop-dist dist/tar build with common/hdfs
HADOOP-7728. Major bug reported by Ramya Sunil and fixed by Ramya Sunil (conf)
hadoop-setup-conf.sh should be modified to enable task memory manager

Enable task memory management to be configurable via hadoop config setup script.
HADOOP-7724. Major bug reported by Giridharan Kesavan and fixed by Arpit Gupta
hadoop-setup-conf.sh should put proxy user info into the core-site.xml

Fixed hadoop-setup-conf.sh to put proxy user in core-site.xml. (Arpit Gupta via Eric Yang)
HADOOP-7721. Major bug reported by Arpit Gupta and fixed by Jitendra Nath Pandey
dfs.web.authentication.kerberos.principal expects the full hostname and does not replace _HOST with the hostname
HADOOP-7720. Major improvement reported by Arpit Gupta and fixed by Arpit Gupta (conf)
improve the hadoop-setup-conf.sh to read in the hbase user and setup the configs

Added parameter for HBase user to setup config script. (Arpit Gupta via Eric Yang)
HADOOP-7715. Major bug reported by Arpit Gupta and fixed by Eric Yang (conf)
see log4j Error when running mr jobs and certain dfs calls

Removed unnecessary security logger configuration. (Eric Yang)
HADOOP-7711. Major bug reported by Arpit Gupta and fixed by Arpit Gupta (conf)
hadoop-env.sh generated from templates has duplicate info

Fixed recursive sourcing of HADOOP_OPTS environment variables (Arpit Gupta via Eric Yang)
HADOOP-7710. Major improvement reported by Arpit Gupta and fixed by Arpit Gupta
create a script to setup application in order to create root directories for application such hbase, hcat, hive etc
HADOOP-7709. Major improvement reported by Jonathan Eagles and fixed by Jonathan Eagles
Running a set of methods in a Single Test Class
HADOOP-7708. Critical bug reported by Arpit Gupta and fixed by Eric Yang (conf)
config generator does not update the properties file if one exists already

Fixed hadoop-setup-conf.sh to handle config file consistently. (Eric Yang)
HADOOP-7707. Major improvement reported by Arpit Gupta and fixed by Arpit Gupta (conf)
improve config generator to allow users to specify proxy user, turn append on or off, turn webhdfs on or off

Added toggle for dfs.support.append, webhdfs and hadoop proxy user to setup config script. (Arpit Gupta via Eric Yang)
HADOOP-7705. Minor new feature reported by Steve Loughran and fixed by Steve Loughran (util)
Add a log4j back end that can push out JSON data, one per line
HADOOP-7691. Major bug reported by Giridharan Kesavan and fixed by Eric Yang
hadoop deb pkg should take a diff group id

Fixed conflicting uid for install packages. (Eric Yang)
HADOOP-7684. Major bug reported by Eric Yang and fixed by Eric Yang (scripts)
jobhistory server and secondarynamenode should have init.d script

Added init.d script for jobhistory server and secondary namenode. (Eric Yang)
HADOOP-7681. Minor bug reported by Arpit Gupta and fixed by Arpit Gupta (conf)
log4j.properties is missing properties for security audit and hdfs audit should be changed to info

Fixed security and hdfs audit log4j properties. (Arpit Gupta via Eric Yang)
HADOOP-7671. Major bug reported by Ravi Prakash and fixed by Ravi Prakash
Add license headers to hadoop-common/src/main/packages/templates/conf/
HADOOP-7668. Minor improvement reported by Suresh Srinivas and fixed by Steve Loughran (util)
Add a NetUtils method that can tell if an InetAddress belongs to local host
HADOOP-7664. Minor improvement reported by Ravi Prakash and fixed by Ravi Prakash (conf)
o.a.h.conf.Configuration complains of overriding final parameter even if the value with which it is attempting to override is the same.
HADOOP-7663. Major bug reported by Mayank Bansal and fixed by Mayank Bansal (test)
TestHDFSTrash failing on 22
HADOOP-7662. Major bug reported by Thomas Graves and fixed by Thomas Graves
logs servlet should use pathspec of /*
HADOOP-7658. Major bug reported by Giridharan Kesavan and fixed by Eric Yang
to fix hadoop config template
HADOOP-7655. Major improvement reported by Arpit Gupta and fixed by Arpit Gupta
provide a small validation script that smoke tests the installed cluster

Committed to trunk and v23, since code reviewed by Eric.
HADOOP-7642. Major improvement reported by Alejandro Abdelnur and fixed by Tom White (build)
create hadoop-dist module where TAR stitching would happen
HADOOP-7639. Major bug reported by Thomas Graves and fixed by Thomas Graves
yarn ui not properly filtered in HttpServer
HADOOP-7637. Major bug reported by Eric Yang and fixed by Eric Yang (build)
Fair scheduler configuration file is not bundled in RPM
HADOOP-7633. Major bug reported by Arpit Gupta and fixed by Eric Yang (conf)
log4j.properties should be added to the hadoop conf on deploy
HADOOP-7631. Major bug reported by Ramya Sunil and fixed by Eric Yang (conf)
In mapred-site.xml, stream.tmpdir is mapped to ${mapred.temp.dir} which is undeclared.
HADOOP-7630. Major bug reported by Arpit Gupta and fixed by Eric Yang (conf)
hadoop-metrics2.properties should have a property *.period set to a default value for metrics
HADOOP-7629. Major bug reported by Patrick Hunt and fixed by Todd Lipcon
regression with MAPREDUCE-2289 - setPermission passed immutable FsPermission (rpc failure)
HADOOP-7627. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (metrics , test)
Improve MetricsAsserts to give more understandable output on failure
HADOOP-7626. Major bug reported by Eric Yang and fixed by Eric Yang (scripts)
Allow overwrite of HADOOP_CLASSPATH and HADOOP_OPTS
HADOOP-7624. Major improvement reported by Vinod Kumar Vavilapalli and fixed by Alejandro Abdelnur (build)
Set things up for a top level hadoop-tools module
HADOOP-7612. Major improvement reported by Tom White and fixed by Tom White (build)
Change test-patch to run tests for all nested modules
HADOOP-7610. Major bug reported by Eric Yang and fixed by Eric Yang (scripts)
/etc/profile.d does not exist on Debian
HADOOP-7608. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (io)
SnappyCodec check for Hadoop native lib is wrong
HADOOP-7606. Major bug reported by Aaron T. Myers and fixed by Alejandro Abdelnur (test)
Upgrade Jackson to version 1.7.1 to match the version required by Jersey
HADOOP-7604. Critical bug reported by Mahadev konar and fixed by Mahadev konar
Hadoop Auth examples pom in 0.23 point to 0.24 versions.
HADOOP-7603. Major bug reported by Eric Yang and fixed by Eric Yang
Set default hdfs, mapred uid, and hadoop group gid for RPM packages

Set hdfs uid, mapred uid, and hadoop gid to fixed numbers (201, 202, and 123, respectively).
HADOOP-7599. Major bug reported by Eric Yang and fixed by Eric Yang (scripts)
Improve hadoop setup conf script to setup secure Hadoop cluster
HADOOP-7598. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (build)
smart-apply-patch.sh does not handle patching from a sub directory correctly.
HADOOP-7595. Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
Upgrade dependency to Avro 1.5.3
HADOOP-7594. Major new feature reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE
Support HTTP REST in HttpServer
HADOOP-7593. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Uma Maheswara Rao G (test)
AssertionError in TestHttpServer.testMaxThreads()
HADOOP-7589. Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (build)
Prefer mvn test -DskipTests over mvn compile in test-patch.sh
HADOOP-7580. Major bug reported by Siddharth Seth and fixed by Siddharth Seth
Add a version of getLocalPathForWrite to LocalDirAllocator which doesn't create dirs
HADOOP-7579. Major task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
Rename package names from alfredo to auth
HADOOP-7578. Major bug reported by Mahadev konar and fixed by Mahadev konar
Fix test-patch to be able to run on MR patches.
HADOOP-7576. Major bug reported by Tom White and fixed by Tsz Wo (Nicholas), SZE (security)
Fix findbugs warnings in Hadoop Auth (Alfredo)
HADOOP-7575. Minor bug reported by Jonathan Eagles and fixed by Jonathan Eagles (fs)
Support fully qualified paths as part of LocalDirAllocator
HADOOP-7568. Major bug reported by Konstantin Shvachko and fixed by Plamen Jeliazkov (io)
SequenceFile should not print into stdout
HADOOP-7566. Major bug reported by Mahadev konar and fixed by Alejandro Abdelnur
MR tests are failing webapps/hdfs not found in CLASSPATH
HADOOP-7564. Major sub-task reported by Tom White and fixed by Tom White
Remove test-patch SVN externals
HADOOP-7561. Major sub-task reported by Tom White and fixed by Tom White
Make test-patch only run tests for changed modules
HADOOP-7560. Major sub-task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur
Make hadoop-common a POM module with sub-modules (common & alfredo)
HADOOP-7555. Trivial improvement reported by Aaron T. Myers and fixed by Aaron T. Myers (build)
Add Eclipse-generated files to .gitignore
HADOOP-7552. Minor improvement reported by Eli Collins and fixed by Eli Collins (fs)
FileUtil#fullyDelete doesn't throw IOE but lists it in the throws clause
HADOOP-7547. Minor bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (io)
Fix the warning in writable classes.[ WritableComparable is a raw type. References to generic type WritableComparable<T> should be parameterized ]
HADOOP-7545. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (build , test)
common -tests jar should not include properties and configs
HADOOP-7536. Major bug reported by Kihwal Lee and fixed by Alejandro Abdelnur (build)
Correct the dependency version regressions introduced in HADOOP-6671
HADOOP-7533. Major sub-task reported by Tom White and fixed by Tom White
Allow test-patch to be run from any subproject directory
HADOOP-7531. Major improvement reported by Eli Collins and fixed by Eli Collins (util)
Add servlet util methods for handling paths in requests
HADOOP-7529. Critical bug reported by Todd Lipcon and fixed by Luke Lu (metrics)
Possible deadlock in metrics2
HADOOP-7528. Major sub-task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
Maven build fails in Windows
HADOOP-7526. Minor test reported by Eli Collins and fixed by Eli Collins (fs)
Add TestPath tests for URI conversion and reserved characters
HADOOP-7525. Major sub-task reported by Tom White and fixed by Tom White (scripts)
Make arguments to test-patch optional
HADOOP-7523. Blocker bug reported by John Lee and fixed by John Lee (test)
Test org.apache.hadoop.fs.TestFilterFileSystem fails due to java.lang.NoSuchMethodException
HADOOP-7520. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
hadoop-main fails to deploy
HADOOP-7515. Major sub-task reported by Tom White and fixed by Tom White (build)
test-patch reports the wrong number of javadoc warnings
HADOOP-7512. Trivial task reported by Harsh J and fixed by Harsh J (documentation)
Fix example mistake in WritableComparable javadocs
HADOOP-7509. Trivial improvement reported by Ravi Prakash and fixed by Ravi Prakash
Improve message when Authentication is required
HADOOP-7508. Major sub-task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
compiled nativelib is in wrong directory and it is not picked up by surefire setup
HADOOP-7507. Major bug reported by Jeff Bean and fixed by Alejandro Abdelnur (metrics)
jvm metrics all use the same namespace

JVM metrics published to Ganglia now include the process name as part of the gmetric name.
HADOOP-7502. Major sub-task reported by Luke Lu and fixed by Luke Lu
Use canonical (IDE friendly) generated-sources directory for generated sources
HADOOP-7501. Major sub-task reported by Alejandro Abdelnur and fixed by Tom White (build)
publish Hadoop Common artifacts (post HADOOP-6671) to Apache SNAPSHOTs repo
HADOOP-7499. Major bug reported by Jeffrey Naisbitt and fixed by Jeffrey Naisbitt (util)
Add method for doing a sanity check on hostnames in NetUtils
HADOOP-7498. Major sub-task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
Remove legacy TAR layout creation
HADOOP-7496. Major sub-task reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
break Maven TAR & bintar profiles into just LAYOUT & TAR proper
HADOOP-7493. Major new feature reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (io)
[HDFS-362] Provide ShortWritable class in hadoop.
HADOOP-7491. Major improvement reported by Eli Collins and fixed by Eli Collins (scripts)
hadoop command should respect HADOOP_OPTS when given a class name
HADOOP-7474. Major improvement reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
Refactor ClientCache out of WritableRpcEngine.
HADOOP-7472. Minor improvement reported by Kihwal Lee and fixed by Kihwal Lee (ipc)
RPC client should deal with the IP address changes
HADOOP-7471. Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)
the saveVersion.sh script sometimes fails to extract SVN URL
HADOOP-7469. Minor sub-task reported by Steve Loughran and fixed by Steve Loughran (util)
add a standard handler for socket connection problems which improves diagnostics
HADOOP-7465. Trivial sub-task reported by XieXianshan and fixed by XieXianshan (fs , ipc)
Several tiny improvements for the LOG format
HADOOP-7463. Minor improvement reported by Mahadev konar and fixed by Mahadev konar
Adding a configuration parameter to SecurityInfo interface.
HADOOP-7460. Major improvement reported by dhruba borthakur and fixed by Usman Masood (fs)
Support for pluggable Trash policies
HADOOP-7457. Blocker improvement reported by Jakob Homan and fixed by Jakob Homan (documentation)
Remove out-of-date Chinese language documentation
HADOOP-7451. Major improvement reported by Matt Foley and fixed by Matt Foley
merge for MR-279: Generalize StringUtils#join
HADOOP-7449. Major improvement reported by Matt Foley and fixed by Matt Foley
merge for MR-279: add Data(In,Out)putByteBuffer to work with ByteBuffer similar to Data(In,Out)putBuffer for byte[]
HADOOP-7448. Major improvement reported by Matt Foley and fixed by Matt Foley
merge for MR-279: HttpServer /stacks servlet should use plain text content type
HADOOP-7446. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (native , performance)
Implement CRC32C native code using SSE4.2 instructions
HADOOP-7445. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (native , util)
Implement bulk checksum verification using efficient native code
HADOOP-7444. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon
Add Checksum API to verify and calculate checksums "in bulk"
HADOOP-7443. Major new feature reported by Todd Lipcon and fixed by Todd Lipcon (io , util)
Add CRC32C as another DataChecksum implementation
HADOOP-7442. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (conf , documentation)
Docs in core-default.xml still reference deprecated config "topology.script.file.name"
HADOOP-7440. Major bug reported by Todd Lipcon and fixed by Todd Lipcon
HttpServer.getParameterValues throws NPE for missing parameters
HADOOP-7438. Major improvement reported by Ravi Prakash and fixed by Ravi Prakash
Using the hadoop-daemon.sh script to start nodes leads to a deprecated warning
HADOOP-7437. Major bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (io)
IOUtils.copyBytes will suppress the stream closure exceptions.
HADOOP-7434. Minor improvement reported by 严金双 and fixed by 严金双
Display error when using "daemonlog -setlevel" with illegal level
HADOOP-7430. Minor improvement reported by Ravi Prakash and fixed by Ravi Prakash (fs)
Improve error message when moving to trash fails due to quota issue
HADOOP-7428. Major bug reported by Todd Lipcon and fixed by Todd Lipcon (ipc)
IPC connection is orphaned with null 'out' member
HADOOP-7419. Major bug reported by Todd Lipcon and fixed by Bing Zheng
new hadoop-config.sh doesn't manage classpath for HADOOP_CONF_DIR correctly
HADOOP-7402. Trivial bug reported by Aaron T. Myers and fixed by Aaron T. Myers (test)
TestConfiguration doesn't clean up after itself
HADOOP-7392. Major improvement reported by Tanping Wang and fixed by Tanping Wang
Implement capability of querying individual property of a mbean using JMXProxyServlet
HADOOP-7389. Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (test)
Use of TestingGroups by tests causes subsequent tests to fail
HADOOP-7385. Minor bug reported by Bharath Mundlapudi and fixed by Bharath Mundlapudi
Remove StringUtils.stringifyException(ie) in logger functions
HADOOP-7384. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon
Allow test-patch to be more flexible about patch format
HADOOP-7383. Blocker bug reported by Todd Lipcon and fixed by Todd Lipcon (build)
HDFS needs to export protobuf library dependency in pom
HADOOP-7380. Major sub-task reported by Aaron T. Myers and fixed by Aaron T. Myers (ha , ipc)
Add client failover functionality to o.a.h.io.(ipc|retry)
HADOOP-7379. Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (io , ipc)
Add ability to include Protobufs in ObjectWritable

Protocol buffer-generated types may now be used as arguments or return values for Hadoop RPC.
HADOOP-7377. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Fix command name handling affecting DFSAdmin
HADOOP-7375. Major improvement reported by Sanjay Radia and fixed by Sanjay Radia
Add resolvePath method to FileContext
HADOOP-7374. Major improvement reported by Eli Collins and fixed by Eli Collins (scripts)
Don't add tools.jar to the classpath when running Hadoop

The scripts that run Hadoop no longer automatically add tools.jar from the JDK to the classpath (if it is present). If your job depends on tools.jar in the JDK you will need to add this dependency in your job.
HADOOP-7361. Minor improvement reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (fs)
Provide overwrite option (-overwrite/-f) in put and copyFromLocal command line options
HADOOP-7360. Major improvement reported by Daryn Sharp and fixed by Kihwal Lee (fs)
FsShell does not preserve relative paths with globs
HADOOP-7357. Trivial bug reported by Philip Zeyliger and fixed by Philip Zeyliger (test)
hadoop.io.compress.TestCodec#main() should exit with non-zero exit code if test failed
HADOOP-7356. Blocker bug reported by Eric Yang and fixed by Eric Yang
RPM packages broke bin/hadoop script for hadoop 0.20.205
HADOOP-7353. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Cleanup FsShell and prevent masking of RTE stacktraces
HADOOP-7342. Minor bug reported by Bharath Mundlapudi and fixed by Bharath Mundlapudi
Add a utility API in FileUtil for JDK File.list
HADOOP-7341. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Fix option parsing in CommandFormat
HADOOP-7337. Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (util)
Annotate PureJavaCrc32 as a public API
HADOOP-7336. Minor bug reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
TestFileContextResolveAfs will fail with default test.build.data property.
HADOOP-7333. Minor improvement reported by Eric Caspole and fixed by Eric Caspole (performance , util)
Performance improvement in PureJavaCrc32
HADOOP-7331. Trivial improvement reported by Tanping Wang and fixed by Tanping Wang (scripts)
Make hadoop-daemon.sh return 1 if daemon processes did not get started

hadoop-daemon.sh now returns a non-zero exit code if it detects that the daemon is no longer running after 3 seconds.
HADOOP-7329. Minor improvement reported by XieXianshan and fixed by XieXianshan (fs)
incomplete help message is displayed for df -h option
HADOOP-7328. Major improvement reported by Harsh J and fixed by Harsh J (io)
When a serializer class is missing, return null, not throw an NPE.
HADOOP-7327. Minor bug reported by Matt Foley and fixed by Matt Foley (fs)
FileSystem.listStatus() throws NullPointerException instead of IOException upon access permission failure
HADOOP-7324. Blocker bug reported by Luke Lu and fixed by Priyo Mustafi (metrics)
Ganglia plugins for metrics v2
HADOOP-7322. Minor bug reported by Bharath Mundlapudi and fixed by Bharath Mundlapudi
Adding a util method in FileUtil for JDK File.listFiles

Use of this new utility method avoids null result from File.listFiles(), and consequent NPEs.
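The null-avoidance idea in this note can be illustrated with a minimal sketch. It assumes the wrapper reports a failed listing by throwing an IOException; the class name ListFilesSketch is hypothetical and this is not the FileUtil source itself.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch of a null-safe directory listing; not the Hadoop implementation.
final class ListFilesSketch {

    // Returns the entries of dir, never null.
    // java.io.File.listFiles() returns null when dir is not a directory
    // or an I/O error occurs; here that case is turned into an IOException
    // (an assumption about the failure policy) so callers cannot hit an NPE.
    static File[] listFiles(File dir) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) {
            throw new IOException("Invalid directory or I/O error occurred for dir: " + dir);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Lists the current working directory; any readable directory would do.
        for (File f : listFiles(new File("."))) {
            System.out.println(f.getName());
        }
    }
}
```

Callers can then iterate over the result directly and handle the failure case in one place, instead of null-checking every listing.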
HADOOP-7320. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp
Refactor FsShell's copy & move commands
HADOOP-7316. Major improvement reported by Jonathan Hsieh and fixed by Eli Collins (documentation)
Add public javadocs to FSDataInputStream and FSDataOutputStream
HADOOP-7314. Major improvement reported by Jeffrey Naisbitt and fixed by Jeffrey Naisbitt
Add support for throwing UnknownHostException when a host doesn't resolve
HADOOP-7306. Major improvement reported by Luke Lu and fixed by Luke Lu (metrics)
Start metrics system even if config files are missing
HADOOP-7305. Minor improvement reported by Niels Basjes and fixed by Niels Basjes (build)
Eclipse project files are incomplete

Added missing library during creation of the eclipse project files.
HADOOP-7301. Major improvement reported by Jonathan Hsieh and fixed by Jonathan Hsieh
FSDataInputStream should expose a getWrappedStream method
HADOOP-7298. Major test reported by Todd Lipcon and fixed by Todd Lipcon (test)
Add test utility for writing multi-threaded tests
HADOOP-7292. Minor bug reported by Luke Lu and fixed by Luke Lu (metrics)
Metrics 2 TestSinkQueue is racy
HADOOP-7289. Major improvement reported by Tsz Wo (Nicholas), SZE and fixed by Eric Yang (build)
ivy: test conf should not extend common conf
HADOOP-7287. Blocker bug reported by Todd Lipcon and fixed by Aaron T. Myers (conf)
Configuration deprecation mechanism doesn't work properly for GenericOptionsParser/Tools
HADOOP-7286. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Refactor FsShell's du/dus/df

The "Found X items" header on the output of the "du" command has been removed to more closely match unix. The displayed paths now correspond to the command line arguments instead of always being a fully qualified URI. For example, the output will have relative paths if the command line arguments are relative paths.
HADOOP-7285. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Refactor FsShell's test
HADOOP-7284. Major bug reported by Sanjay Radia and fixed by Sanjay Radia (viewfs)
Trash and shell's rm does not work for viewfs
HADOOP-7282. Major bug reported by John George and fixed by John George (ipc)
getRemoteIp could return null in cases where the call is ongoing but the ip went away.
HADOOP-7276. Major bug reported by Trevor Robinson and fixed by Trevor Robinson (native)
Hadoop native builds fail on ARM due to -m32
HADOOP-7275. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Refactor FsShell's stat
HADOOP-7271. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Standardize error messages
HADOOP-7268. Major bug reported by Devaraj Das and fixed by Jitendra Nath Pandey (fs , security)
FileContext.getLocalFSFileContext() behavior needs to be fixed w.r.t tokens
HADOOP-7267. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Refactor FsShell's rm/rmr/expunge
HADOOP-7265. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Keep track of relative paths
HADOOP-7264. Major improvement reported by Luke Lu and fixed by Luke Lu (io)
Bump avro version to at least 1.4.1
HADOOP-7261. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (test)
Disable IPV6 for junit tests
HADOOP-7259. Major bug reported by Owen O'Malley and fixed by Owen O'Malley (build)
contrib modules should include build.properties from parent.
HADOOP-7257. Major new feature reported by Sanjay Radia and fixed by Sanjay Radia
A client side mount table to give per-application/per-job file system view

viewfs - client-side mount table.
HADOOP-7251. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Refactor FsShell's getmerge
HADOOP-7250. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Refactor FsShell's setrep
HADOOP-7249. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Refactor FsShell's chmod/chown/chgrp
HADOOP-7241. Minor improvement reported by Wei Yongjun and fixed by Wei Yongjun (fs , test)
fix typo of command 'hadoop fs -help tail'
HADOOP-7238. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Refactor FsShell's cat & text
HADOOP-7237. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Refactor FsShell's touchz
HADOOP-7236. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Refactor FsShell's mkdir
HADOOP-7235. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp
Refactor FsShell's tail
HADOOP-7233. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Refactor FsShell's ls
HADOOP-7231. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (util)
Fix synopsis for -count
HADOOP-7230. Major test reported by Daryn Sharp and fixed by Daryn Sharp (test)
Move -fs usage tests from hdfs into common
HADOOP-7227. Major improvement reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey (ipc)
Remove protocol version check at proxy creation in Hadoop RPC.

1. The protocol version check is removed from proxy creation; instead, the version check is performed at the server on every RPC call. 2. This change is backward incompatible because the format of the RPC messages is changed to include the client version, client method hash and RPC version. 3. An RPC version is introduced, which should change whenever the format of the RPC messages changes.
HADOOP-7223. Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (fs)
FileContext createFlag combinations during create are not clearly defined
HADOOP-7216. Major bug reported by Aaron T. Myers and fixed by Daryn Sharp (test)
HADOOP-7202 broke TestDFSShell in HDFS
HADOOP-7215. Blocker bug reported by Suresh Srinivas and fixed by Suresh Srinivas (security)
RPC clients must connect over a network interface corresponding to the host name in the client's kerberos principal key
HADOOP-7214. Major new feature reported by Aaron T. Myers and fixed by Aaron T. Myers
Hadoop /usr/bin/groups equivalent
HADOOP-7210. Major bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (fs)
Chown command is not working from FSShell.
HADOOP-7209. Major improvement reported by Olga Natkovich and fixed by Daryn Sharp
Extensions to FsShell
HADOOP-7208. Major bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G
equals() and hashCode() implementation need to change in StandardSocketFactory
HADOOP-7206. Major new feature reported by Eli Collins and fixed by Alejandro Abdelnur
Integrate Snappy compression
HADOOP-7205. Trivial improvement reported by Daryn Sharp and fixed by Daryn Sharp
automatically determine JAVA_HOME on OS X
HADOOP-7202. Major improvement reported by Daryn Sharp and fixed by Daryn Sharp
Improve Command base class
HADOOP-7194. Major bug reported by Devaraj K and fixed by Devaraj K (io)
Potential Resource leak in IOUtils.java
HADOOP-7193. Minor improvement reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (fs)
Help message is wrong for touchz command.

Updated the help for the touchz command.
HADOOP-7187. Major bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (metrics)
Socket Leak in org.apache.hadoop.metrics.ganglia.GangliaContext
HADOOP-7180. Minor improvement reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Improve CommandFormat
HADOOP-7178. Major bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (fs)
FileSystem should have an option to control the .crc file creations at Local.
HADOOP-7177. Trivial improvement reported by Allen Wittenauer and fixed by Allen Wittenauer (native)
CodecPool should report which compressor it is using
HADOOP-7176. Major bug reported by Daryn Sharp and fixed by Daryn Sharp
Redesign FsShell
HADOOP-7175. Major bug reported by Daryn Sharp and fixed by Daryn Sharp (fs)
Add isEnabled() to Trash
HADOOP-7174. Minor bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (fs)
null is displayed in the console if the src path is invalid while doing a copyToLocal operation from the command line
HADOOP-7172. Critical bug reported by Todd Lipcon and fixed by Todd Lipcon (io , security)
SecureIO should not check owner on non-secure clusters that have no native support
HADOOP-7171. Major bug reported by Owen O'Malley and fixed by Jitendra Nath Pandey (security)
Support UGI in FileContext API
HADOOP-7167. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon
Allow using a file to exclude certain tests from build
HADOOP-7162. Minor bug reported by Alexey Diomin and fixed by Alexey Diomin (fs)
FsShell: call srcFs.listStatus(src) twice
HADOOP-7159. Trivial improvement reported by Scott Chen and fixed by Scott Chen (ipc)
RPC server should log the client hostname when read exception happened
HADOOP-7153. Minor improvement reported by Nicholas Telford and fixed by Nicholas Telford (io)
MapWritable violates contract of Map interface for equals() and hashCode()

MapWritable now implements equals() and hashCode() based on the map contents rather than object identity in order to correctly implement the Map interface.
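As a rough illustration of content-based equality, the sketch below delegates equals() and hashCode() to the wrapped entries, which is what the java.util.Map contract requires. ContentMapSketch is a hypothetical stand-in with String keys and values, not the MapWritable class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a map-like wrapper; not the Hadoop MapWritable class.
final class ContentMapSketch {
    private final Map<String, String> entries = new HashMap<>();

    void put(String key, String value) {
        entries.put(key, value);
    }

    // Two instances are equal when they hold the same key/value pairs,
    // regardless of object identity.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ContentMapSketch)) {
            return false;
        }
        return entries.equals(((ContentMapSketch) other).entries);
    }

    // Must agree with equals(): equal contents give equal hash codes.
    @Override
    public int hashCode() {
        return entries.hashCode();
    }

    public static void main(String[] args) {
        ContentMapSketch a = new ContentMapSketch();
        ContentMapSketch b = new ContentMapSketch();
        a.put("k", "v");
        b.put("k", "v");
        System.out.println(a.equals(b));
    }
}
```

With identity-based equality, two separately built instances holding the same entries would compare unequal and could not reliably serve as keys or set members.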
HADOOP-7151. Minor bug reported by Dmitriy V. Ryaboy and fixed by Dmitriy V. Ryaboy
Document need for stable hashCode() in WritableComparable
HADOOP-7144. Major new feature reported by Luke Lu and fixed by Robert Joseph Evans
Expose JMX with something like JMXProxyServlet
HADOOP-7136. Major task reported by Nigel Daley and fixed by Nigel Daley
Remove failmon contrib

Failmon removed from contrib codebase.
HADOOP-7133. Major improvement reported by Matt Foley and fixed by Matt Foley (util)
CLONE to COMMON - HDFS-1445 Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file

This is the COMMON portion of a fix requiring coordinated change of COMMON and HDFS. Please see HDFS-1445 for HDFS portion and release note.
HADOOP-7131. Minor improvement reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (io)
set() and toString() methods of the org.apache.hadoop.io.Text class do not include the root exception in the wrapping RuntimeException.
HADOOP-7120. Major bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (test)
200 new Findbugs warnings
HADOOP-7119. Major new feature reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (security)
add Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles

Adding support for Kerberos HTTP SPNEGO authentication to the Hadoop web-consoles
HADOOP-7117. Major improvement reported by Patrick Angeles and fixed by Harsh J (conf)
Move secondary namenode checkpoint configs from core-default.xml to hdfs-default.xml

Removed references to the older fs.checkpoint.* properties that resided in core-site.xml
HADOOP-7114. Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (fs)
FsShell should dump all exceptions at DEBUG level
HADOOP-7112. Major improvement reported by Tom White and fixed by Tom White (conf , filecache)
Issue a warning when GenericOptionsParser libjars are not on local filesystem
HADOOP-7111. Critical bug reported by Todd Lipcon and fixed by Aaron T. Myers (io)
Several TFile tests failing when native libraries are present
HADOOP-7098. Major bug reported by Bernd Fondermann and fixed by Bernd Fondermann (conf)
tasktracker property not set in conf/hadoop-env.sh
HADOOP-7096. Major improvement reported by Ahmed Radwan and fixed by Ahmed Radwan
Allow setting of end-of-record delimiter for TextInputFormat
HADOOP-7090. Major bug reported by Gokul and fixed by Uma Maheswara Rao G (fs/s3 , io)
Possible resource leaks in hadoop core code
HADOOP-7089. Minor bug reported by Eli Collins and fixed by Eli Collins (scripts)
Fix link resolution logic in hadoop-config.sh

Updates hadoop-config.sh to always resolve symlinks when determining HADOOP_HOME. Bash built-ins or POSIX:2001 compliant commands are now required.
HADOOP-7078. Trivial improvement reported by Todd Lipcon and fixed by Harsh J
Add better javadocs for RawComparator interface
HADOOP-7071. Minor bug reported by Nigel Daley and fixed by Nigel Daley (build)
test-patch.sh has bad ps arg
HADOOP-7061. Minor improvement reported by Jingguo Yao and fixed by Jingguo Yao (io)
Imprecise javadoc for CompressionCodec
HADOOP-7060. Major improvement reported by Hairong Kuang and fixed by Patrick Kling (fs)
A more elegant FileSystem#listCorruptFileBlocks API
HADOOP-7059. Major improvement reported by Noah Watkins and fixed by Noah Watkins (native)
Remove "unused" warning in native code

Adds __attribute__ ((unused))
HADOOP-7058. Trivial improvement reported by Todd Lipcon and fixed by Todd Lipcon
Expose number of bytes in FSOutputSummer buffer to implementations
HADOOP-7057. Minor bug reported by Konstantin Boudnik and fixed by Konstantin Boudnik (util)
IOUtils.readFully and IOUtils.skipFully have typo in exception creation's message
HADOOP-7055. Major bug reported by Jingguo Yao and fixed by Jingguo Yao (metrics)
Update of commons logging libraries causes EventCounter to count logging events incorrectly
HADOOP-7053. Minor bug reported by Jingguo Yao and fixed by Jingguo Yao (conf)
wrong FSNamesystem Audit logging setting in conf/log4j.properties
HADOOP-7052. Major bug reported by Jingguo Yao and fixed by Jingguo Yao (conf)
misspelling of threshold in conf/log4j.properties
HADOOP-7049. Trivial improvement reported by Patrick Kling and fixed by Patrick Kling (conf)
TestReconfiguration should be junit v4
HADOOP-7048. Minor improvement reported by Jingguo Yao and fixed by Jingguo Yao (io)
Wrong description of Block-Compressed SequenceFile Format in SequenceFile's javadoc
HADOOP-7046. Blocker bug reported by Nigel Daley and fixed by Po Cheung (security)
1 Findbugs warning on trunk and branch-0.22
HADOOP-7045. Minor bug reported by Eli Collins and fixed by Eli Collins (fs)
TestDU fails on systems with local file systems with extended attributes
HADOOP-7042. Minor improvement reported by Nigel Daley and fixed by Nigel Daley (test)
Update test-patch.sh to include failed test names and move test-patch.properties
HADOOP-7023. Major improvement reported by Patrick Kling and fixed by Patrick Kling
Add listCorruptFileBlocks to FileSystem

Add a new API listCorruptFileBlocks to FileContext that returns a list of files that have corrupt blocks.
HADOOP-7015. Minor bug reported by Sanjay Radia and fixed by Sanjay Radia
RawLocalFileSystem#listStatus does not deal with a directory whose entries are changing (e.g. in a multi-thread or multi-process environment)
HADOOP-7014. Major improvement reported by Konstantin Boudnik and fixed by Konstantin Boudnik (test)
Generalize CLITest structure and interfaces to facilitate upstream adoption (e.g. for web testing)
HADOOP-7001. Major task reported by Patrick Kling and fixed by Patrick Kling (conf)
Allow configuration changes without restarting configured nodes
HADOOP-6994. Major improvement reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
Api to get delegation token in AbstractFileSystem
HADOOP-6949. Major improvement reported by Navis and fixed by Matt Foley (io)
Reduces RPC packet size for primitive arrays, especially long[], which is used in block reporting

Increments the RPC protocol version in org.apache.hadoop.ipc.Server from 4 to 5. Introduces ArrayPrimitiveWritable for a much more efficient wire format to transmit arrays of primitives over RPC. ObjectWritable uses the new writable for array of primitives for RPC and continues to use existing format for on-disk data.
HADOOP-6939. Minor bug reported by Todd Lipcon and fixed by Todd Lipcon
Inconsistent lock ordering in AbstractDelegationTokenSecretManager
HADOOP-6929. Major improvement reported by Sharad Agarwal and fixed by Sharad Agarwal (ipc , security)
RPC should have a way to pass Security information other than protocol annotations
HADOOP-6921. Major sub-task reported by Luke Lu and fixed by Luke Lu
metrics2: metrics plugins

Metrics names are standardized to CapitalizedCamelCase. See release note of HADOOP-6918 and HADOOP-6920.
HADOOP-6920. Major sub-task reported by Luke Lu and fixed by Luke Lu
Metrics2: metrics instrumentation

Metrics names are standardized to use CapitalizedCamelCase. For example: 1. Metrics names using "_" are changed to the new naming scheme, e.g. bytes_written changes to BytesWritten. 2. All metrics names start with a capital letter, e.g. threadsBlocked changes to ThreadsBlocked.
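The renaming rule can be sketched as a small conversion function. This is only an illustration of the two examples given in the note; the class and method names are hypothetical and not part of the metrics2 API.

```java
// Hypothetical helper illustrating the CapitalizedCamelCase rule; not part of metrics2.
final class MetricNameSketch {

    // Drops each '_' and upper-cases the character that follows it,
    // and upper-cases the first character of the name.
    static String toCapitalizedCamelCase(String name) {
        StringBuilder out = new StringBuilder(name.length());
        boolean upperNext = true;
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '_') {
                upperNext = true;
                continue;
            }
            out.append(upperNext ? Character.toUpperCase(c) : c);
            upperNext = false;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCapitalizedCamelCase("bytes_written"));  // BytesWritten
        System.out.println(toCapitalizedCamelCase("threadsBlocked")); // ThreadsBlocked
    }
}
```

A mapping like this may be useful when updating dashboards or alert rules that still refer to the old metric names.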
HADOOP-6919. Major sub-task reported by Luke Lu and fixed by Luke Lu (metrics)
Metrics2: metrics framework

New metrics2 framework for Hadoop.
HADOOP-6912. Major bug reported by Kan Zhang and fixed by Kan Zhang (security)
Guard against NPE when calling UGI.isLoginKeytabBased()
HADOOP-6904. Major new feature reported by Hairong Kuang and fixed by Hairong Kuang (ipc)
A baby step towards inter-version RPC communications
HADOOP-6889. Major new feature reported by Hairong Kuang and fixed by John George (ipc)
Make RPC have an option to time out
HADOOP-6887. Major improvement reported by Bharath Mundlapudi and fixed by Luke Lu (metrics)
Need a separate metrics per garbage collector
HADOOP-6864. Major improvement reported by Erik Steffl and fixed by Boris Shkolnik (security)
Provide a JNI-based implementation of ShellBasedUnixGroupsNetgroupMapping (implementation of GroupMappingServiceProvider)
HADOOP-6764. Major improvement reported by Dmytro Molkov and fixed by Dmytro Molkov (ipc)
Add number of reader threads and queue length as configuration parameters in RPC.getServer
HADOOP-6754. Major bug reported by Aaron Kimball and fixed by Aaron Kimball (io)
DefaultCodec.createOutputStream() leaks memory
HADOOP-6683. Minor sub-task reported by Kang Xiao and fixed by Kang Xiao (io)
the first optimization: ZlibCompressor does not fully utilize the buffer

Improve the buffer utilization of ZlibCompressor to avoid invoking a JNI per write request.
HADOOP-6671. Major sub-task reported by Giridharan Kesavan and fixed by Alejandro Abdelnur (build)
To use maven for hadoop common builds
HADOOP-6622. Major bug reported by Jitendra Nath Pandey and fixed by Eli Collins (security)
Token should not print the password in toString.
HADOOP-6578. Minor improvement reported by Todd Lipcon and fixed by Michele Catasta (conf)
Configuration should trim whitespace around a lot of value types
HADOOP-6508. Major bug reported by Amareshwari Sriramadasu and fixed by Luke Lu (metrics)
Incorrect values for metrics with CompositeContext
HADOOP-6436. Major improvement reported by Eli Collins and fixed by Roman Shaposhnik
Remove auto-generated native build files

The native build, when run from trunk, now requires autotools, libtool and openssl dev libraries.
HADOOP-6432. Major new feature reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey
Statistics support in FileContext
HADOOP-6385. Minor new feature reported by Scott Phillips and fixed by Daryn Sharp (fs)
dfs does not support -rmdir (was HDFS-639)

The "rm" family of FsShell commands now supports -rmdir and -f options.
HADOOP-6376. Minor improvement reported by Karthik K and fixed by Karthik K (conf)
slaves file to have a header specifying the format of conf/slaves file
HADOOP-6255. Major new feature reported by Owen O'Malley and fixed by Eric Yang
Create an rpm integration project

Added RPM/DEB packages to build system.
HADOOP-6158. Minor task reported by Owen O'Malley and fixed by Eli Collins (util)
Move CyclicIteration to HDFS
HADOOP-5647. Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (test)
TestJobHistory fails if /tmp/_logs is not writable. Testcase should not depend on /tmp

Removed the testcase's dependency on /tmp and made it use the test.build.data directory instead.
HADOOP-2081. Major bug reported by Owen O'Malley and fixed by Harsh J (conf)
Configuration getInt, getLong, and getFloat replace invalid numbers with the default value

Invalid configuration values now result in a number format exception rather than the default value being used.
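The behavior change can be sketched with a tiny stand-in for a configuration object. StrictConfSketch and the property names are hypothetical; the sketch only assumes that an unset property still falls back to the default while a malformed value raises NumberFormatException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration object; not the Hadoop Configuration class.
final class StrictConfSketch {
    private final Map<String, String> props = new HashMap<>();

    void set(String name, String value) {
        props.put(name, value);
    }

    // Returns the default only when the property is absent.
    // A present but malformed value propagates NumberFormatException
    // instead of being silently replaced by the default.
    int getInt(String name, int defaultValue) {
        String raw = props.get(name);
        if (raw == null) {
            return defaultValue;
        }
        return Integer.parseInt(raw);
    }

    public static void main(String[] args) {
        StrictConfSketch conf = new StrictConfSketch();
        conf.set("example.size", "4096");
        conf.set("example.bad", "4k");
        System.out.println(conf.getInt("example.size", 1));    // 4096
        System.out.println(conf.getInt("example.missing", 1)); // 1
        try {
            conf.getInt("example.bad", 1);
        } catch (NumberFormatException e) {
            System.out.println("invalid value rejected");
        }
    }
}
```

Code that relied on the old fallback may need to catch NumberFormatException or validate values before they are stored.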
HADOOP-1886. Trivial improvement reported by Konstantin Shvachko and fixed by Frank Conrad (fs)
Undocumented parameters in FileSystem