Hadoop 2.1.1-beta Release Notes
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
Changes since Hadoop 2.1.0-beta
- YARN-1194.
Minor bug reported by Roman Shaposhnik and fixed by Roman Shaposhnik (nodemanager)
TestContainerLogsPage fails with native builds
Running TestContainerLogsPage on trunk with Native IO enabled causes it to fail
- YARN-1189.
Blocker bug reported by Jason Lowe and fixed by Omkar Vinit Joshi
NMTokenSecretManagerInNM is not being told when applications have finished
The {{appFinished}} method is not being called when applications have finished. This causes a couple of leaks as {{oldMasterKeys}} and {{appToAppAttemptMap}} are never being pruned.
- YARN-1184.
Major bug reported by J.Andreina and fixed by Chris Douglas (capacityscheduler , resourcemanager)
ClassCastException is thrown during preemption when a large job is submitted to queue B whose resources are in use by a job in queue A
Preemption is enabled.
Queues = a, b
a capacity = 30%
b capacity = 70%
Step 1: Assign a big job to queue a (so that job_a will utilize some resources from queue b).
Step 2: Assign a big job to queue b.
The following exception is thrown at the Resource Manager:
{noformat}
2013-09-12 10:42:32,535 ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw an Exception.
java.lang.ClassCastException: java.util.Collections$UnmodifiableSet cannot be cast to java.util.NavigableSet
at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getContainersToPreempt(ProportionalCapacityPreemptionPolicy.java:403)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:202)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:173)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82)
at java.lang.Thread.run(Thread.java:662)
{noformat}
- YARN-1176.
Critical bug reported by Thomas Graves and fixed by Jonathan Eagles (resourcemanager)
RM web services ClusterMetricsInfo total nodes doesn't include unhealthy nodes
In the web services api for the cluster/metrics, the totalNodes reported doesn't include the unhealthy nodes.
{code}
this.totalNodes = activeNodes + lostNodes + decommissionedNodes
    + rebootedNodes;
{code}
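A minimal sketch of the presumably corrected computation, assuming the fix simply folds the unhealthy count into the total; names here are illustrative, not the actual ClusterMetricsInfo fields:
{code}
// Illustrative only: assumes the fix adds unhealthyNodes to the existing sum.
public class TotalNodesSketch {
  static int totalNodes(int activeNodes, int lostNodes, int decommissionedNodes,
                        int rebootedNodes, int unhealthyNodes) {
    return activeNodes + lostNodes + decommissionedNodes
        + rebootedNodes + unhealthyNodes;
  }

  public static void main(String[] args) {
    // 4 active, 1 lost, 0 decommissioned, 0 rebooted, 2 unhealthy -> 7 total
    System.out.println(totalNodes(4, 1, 0, 0, 2));
  }
}
{code}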
- YARN-1170.
Blocker bug reported by Arun C Murthy and fixed by Binglin Chang
yarn proto definitions should specify package as 'hadoop.yarn'
The yarn proto definitions should specify the package as 'hadoop.yarn', similar to the 'hadoop.common' and 'hadoop.hdfs' packages used by the protos in Common and HDFS respectively.
- YARN-1152.
Blocker bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
Invalid key to HMAC computation error when getting application report for completed app attempt
On a secure cluster, an invalid key to HMAC error is thrown when trying to get an application report for an application with an attempt that has unregistered.
- YARN-1144.
Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (resourcemanager)
Unmanaged AMs registering a tracking URI should not be proxy-fied
Unmanaged AMs do not run in the cluster, so their tracking URL should not be proxy-fied.
- YARN-1137.
Major improvement reported by Alejandro Abdelnur and fixed by Roman Shaposhnik (nodemanager)
Add whitelist support for system users to Yarn container-executor.c
Currently container-executor.c has a banned set of users (mapred, hdfs & bin) and configurable min.user.id (defaulting to 1000).
This presents a problem for systems that run as system users (below 1000) if these systems want to start containers.
Systems like Impala fit in this category. A (local) 'impala' system user is created when installing Impala on the nodes.
Note that the same thing happens when installing system like HDFS, Yarn, Oozie, from packages (Bigtop); local system users are created.
For Impala to be able to run containers in a secure cluster, the 'impala' system user must be whitelisted.
To support this, an 'allowed.system.users' option would be added to container-executor.cfg, and logic in container-executor.c would allow the usernames in that list (see the sketch below).
Because system users are not guaranteed to have the same UID in different machines, the 'allowed.system.users' property should use usernames and not UIDs.
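The real change lives in container-executor.c and container-executor.cfg; the Java sketch below only illustrates the intended check with hypothetical names: a user whose UID is below min.user.id is allowed only if its username appears in allowed.system.users.
{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; the actual logic is implemented in C in
// container-executor.c and driven by container-executor.cfg.
public class SystemUserWhitelistSketch {
  static boolean isUserAllowed(String user, long uid, long minUserId,
                               Set<String> bannedUsers, Set<String> allowedSystemUsers) {
    if (bannedUsers.contains(user)) {
      return false;                        // e.g. hdfs, mapred, bin
    }
    if (uid >= minUserId) {
      return true;                         // ordinary (non-system) user
    }
    // System user: compare by name, not UID, since UIDs can differ per machine.
    return allowedSystemUsers.contains(user);
  }

  public static void main(String[] args) {
    Set<String> banned = new HashSet<>(Arrays.asList("hdfs", "mapred", "bin"));
    Set<String> allowed = new HashSet<>(Arrays.asList("impala"));
    System.out.println(isUserAllowed("impala", 480, 1000, banned, allowed)); // true
    System.out.println(isUserAllowed("nobody", 99, 1000, banned, allowed));  // false
  }
}
{code}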
- YARN-1124.
Blocker bug reported by Omkar Vinit Joshi and fixed by Xuan Gong
By default yarn application -list should display all the applications in a state other than FINISHED / FAILED
Today we only list applications in the RUNNING state by default for "yarn application -list". Instead we should show all the applications which are either submitted, accepted, or running.
- YARN-1120.
Minor bug reported by Chuan Liu and fixed by Chuan Liu
Make ApplicationConstants.Environment.USER definition OS neutral
In YARN-557, we added some code to give {{ApplicationConstants.Environment.USER}} an OS-specific definition in order to fix the unit test TestUnmanagedAMLauncher. In YARN-571, the relevant test code was corrected. In YARN-602, we will now explicitly set the environment variables for the child containers. With these changes, I think we can revert the YARN-557 change and make {{ApplicationConstants.Environment.USER}} OS neutral. The main benefit is that we can use the same method over the Enum constants. This should also fix the TestContainerLaunch#testContainerEnvVariables failure on Windows.
- YARN-1117.
Major improvement reported by Tassapol Athiapinya and fixed by Xuan Gong (client)
Improve help message for $ yarn applications and $ yarn node
There is standardization of the help message in YARN-1080. It would be nice to have similar changes for $ yarn applications and $ yarn node.
- YARN-1116.
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)
Populate AMRMTokens back to AMRMTokenSecretManager after RM restarts
The AMRMTokens are currently only saved in the RMStateStore and not populated back to the AMRMTokenSecretManager after the RM restarts. This is more important now since the AMRMToken is also used in non-secure environments.
- YARN-1107.
Blocker bug reported by Arpit Gupta and fixed by Omkar Vinit Joshi (resourcemanager)
Job submitted with Delegation token in secured environment causes RM to fail during RM restart
If a secure RM with recovery enabled is restarted while Oozie jobs are running, the RM fails to come up.
- YARN-1101.
Major bug reported by Robert Parker and fixed by Robert Parker (resourcemanager)
Active nodes can be decremented below 0
The issue is in RMNodeImpl where both the RUNNING and UNHEALTHY states transition to a deactive state (LOST, DECOMMISSIONED, REBOOTED) using the same DeactivateNodeTransition class. The DeactivateNodeTransition class naturally decrements the active node count; however, in cases where the node has transitioned to UNHEALTHY, the active count has already been decremented.
- YARN-1094.
Blocker bug reported by Yesha Vora and fixed by Vinod Kumar Vavilapalli
RM restart throws Null pointer Exception in Secure Env
Enable the RM restart feature and restart the Resource Manager while a job is running.
The Resource Manager fails to start with the error below:
{noformat}
2013-08-23 17:57:40,705 INFO resourcemanager.RMAppManager (RMAppManager.java:recover(370)) - Recovering application application_1377280618693_0001
2013-08-23 17:57:40,763 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(617)) - Failed to load/recover state
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.setTimerForTokenRenewal(DelegationTokenRenewer.java:371)
at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addApplication(DelegationTokenRenewer.java:307)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:291)
at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:371)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:819)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:613)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:832)
2013-08-23 17:57:40,766 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
{noformat}
- YARN-1093.
Major bug reported by Wing Yew Poon and fixed by (documentation)
Corrections to Fair Scheduler documentation
The fair scheduler is still evolving, but the current documentation contains some inaccuracies.
- YARN-1085.
Blocker task reported by Jaimin D Jetly and fixed by Omkar Vinit Joshi (nodemanager , resourcemanager)
Yarn and MRv2 should do HTTP client authentication in kerberos setup.
In a Kerberos setup, an HTTP client is expected to authenticate to Kerberos before the user is allowed to browse any information.
- YARN-1083.
Major bug reported by Yesha Vora and fixed by Zhijie Shen (resourcemanager)
ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval
If 'yarn.nm.liveness-monitor.expiry-interval-ms' is set to less than the heartbeat interval, all the node managers will be added to 'Lost Nodes'.
Instead, the Resource Manager should validate these properties and fail to start if the combination is invalid (see the sketch below).
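A minimal sketch of the kind of fail-fast validation being asked for; the class and method names are hypothetical, not the RM's actual startup code:
{code}
// Hypothetical sketch of the proposed validation; only the property name
// yarn.nm.liveness-monitor.expiry-interval-ms comes from the report above.
public class LivenessConfigCheckSketch {
  static void validate(long expiryIntervalMs, long heartbeatIntervalMs) {
    if (expiryIntervalMs <= heartbeatIntervalMs) {
      throw new IllegalArgumentException(
          "yarn.nm.liveness-monitor.expiry-interval-ms (" + expiryIntervalMs
              + " ms) must be greater than the NM heartbeat interval ("
              + heartbeatIntervalMs + " ms)");
    }
  }

  public static void main(String[] args) {
    validate(600000L, 1000L); // valid combination
    validate(500L, 1000L);    // fails fast instead of marking every NM as lost
  }
}
{code}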
- YARN-1082.
Blocker bug reported by Arpit Gupta and fixed by Vinod Kumar Vavilapalli (resourcemanager)
Secure RM with recovery enabled and rm state store on hdfs fails with gss exception
- YARN-1081.
Minor improvement reported by Tassapol Athiapinya and fixed by Akira AJISAKA (client)
Minor improvement to output header for $ yarn node -list
The output of $ yarn node -list shows the number of running containers at each node. I found a case where a new YARN user thought this was a container ID, used it later in other YARN commands, and hit an error due to the misunderstanding.
{code:title=current output}
2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list
2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1
2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Running-Containers
2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2
{code}
{code:title=proposed output}
2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list
2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1
2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Number-of-Running-Containers
2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2
{code}
- YARN-1080.
Major improvement reported by Tassapol Athiapinya and fixed by Xuan Gong (client)
Improve help message for $ yarn logs
There are 2 parts I am proposing in this jira. They can be fixed together in one patch.
1. Standardize help message for required parameter of $ yarn logs
YARN CLI has a command "logs" ($ yarn logs). The command always requires a "-applicationId <arg>" parameter. However, the help message of the command does not make this clear; it lists -applicationId as an optional parameter. If I don't set it, the YARN CLI will complain that it is missing. It is better to use the standard required-argument notation used by other Linux commands in the help message, so any user familiar with that convention can more easily see that this parameter is needed.
{code:title=current help message}
-bash-4.1$ yarn logs
usage: general options are:
-applicationId <arg> ApplicationId (required)
-appOwner <arg> AppOwner (assumed to be current user if not
specified)
-containerId <arg> ContainerId (must be specified if node address is
specified)
-nodeAddress <arg> NodeAddress in the format nodename:port (must be
specified if container id is specified)
{code}
{code:title=proposed help message}
-bash-4.1$ yarn logs
usage: yarn logs -applicationId <application ID> [OPTIONS]
general options are:
-appOwner <arg> AppOwner (assumed to be current user if not
specified)
-containerId <arg> ContainerId (must be specified if node address is
specified)
-nodeAddress <arg> NodeAddress in the format nodename:port (must be
specified if container id is specified)
{code}
2. Add a description to the help command. As far as I know, a user cannot get logs for a running job. Since I spent some time trying to get logs of running applications, it would be nice to say this in the command description.
{code:title=proposed help}
Retrieve logs for completed/killed YARN application
usage: general options are...
{code}
- YARN-1078.
Minor bug reported by Chuan Liu and fixed by Chuan Liu
TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows
The three unit tests fail on Windows due to host name resolution differences on Windows, i.e. 127.0.0.1 does not resolve to host name "localhost".
{noformat}
org.apache.hadoop.security.token.SecretManager$InvalidToken: Given Container container_0_0000_01_000000 identifier is not valid for current Node manager. Expected : 127.0.0.1:12345 Found : localhost:12345
{noformat}
{noformat}
testNMConnectionToRM(org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater) Time elapsed: 8343 sec <<< FAILURE!
org.junit.ComparisonFailure: expected:<[localhost]:12345> but was:<[127.0.0.1]:12345>
at org.junit.Assert.assertEquals(Assert.java:125)
at org.junit.Assert.assertEquals(Assert.java:147)
at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker6.registerNodeManager(TestNodeStatusUpdater.java:712)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
at $Proxy26.registerNodeManager(Unknown Source)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:212)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:149)
at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyNodeStatusUpdater4.serviceStart(TestNodeStatusUpdater.java:369)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:213)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNMConnectionToRM(TestNodeStatusUpdater.java:985)
{noformat}
- YARN-1077.
Minor bug reported by Chuan Liu and fixed by Chuan Liu
TestContainerLaunch fails on Windows
Several cases in this unit test fail on Windows. (Error log appended at the end.)
testInvalidEnvSyntaxDiagnostics fails because of the difference between cmd and bash script error handling. If some command fails in a cmd script, cmd will continue executing the rest of the script commands. Error handling needs to be carried out explicitly in the script file. The error code of the last command will be returned as the error code of the whole script. In this test, some error happened in the middle of the cmd script; the test expects an exception and a non-zero error code. In the cmd script, the intermediate errors are ignored, the last command "call" succeeded, and there is no exception.
testContainerLaunchStdoutAndStderrDiagnostics fails due to wrong cmd commands used by the test.
testContainerEnvVariables and testDelayedKill fail due to a regression from YARN-906.
{noformat}
-------------------------------------------------------------------------------
Test set: org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
-------------------------------------------------------------------------------
Tests run: 7, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 11.526 sec <<< FAILURE!
testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 583 sec <<< FAILURE!
junit.framework.AssertionFailedError: Should catch exception
at junit.framework.Assert.fail(Assert.java:50)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:269)
...
testContainerLaunchStdoutAndStderrDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 561 sec <<< FAILURE!
junit.framework.AssertionFailedError: Should catch exception
at junit.framework.Assert.fail(Assert.java:50)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerLaunchStdoutAndStderrDiagnostics(TestContainerLaunch.java:314)
...
testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 4136 sec <<< FAILURE!
junit.framework.AssertionFailedError: expected:<137> but was:<143>
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.failNotEquals(Assert.java:287)
at junit.framework.Assert.assertEquals(Assert.java:67)
at junit.framework.Assert.assertEquals(Assert.java:199)
at junit.framework.Assert.assertEquals(Assert.java:205)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:500)
...
testDelayedKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 2744 sec <<< FAILURE!
junit.framework.AssertionFailedError: expected:<137> but was:<143>
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.failNotEquals(Assert.java:287)
at junit.framework.Assert.assertEquals(Assert.java:67)
at junit.framework.Assert.assertEquals(Assert.java:199)
at junit.framework.Assert.assertEquals(Assert.java:205)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testDelayedKill(TestContainerLaunch.java:601)
...
{noformat}
- YARN-1074.
Major improvement reported by Tassapol Athiapinya and fixed by Xuan Gong (client)
Clean up YARN CLI app list to show only running apps.
Once a user brings up the YARN daemons and runs jobs, the jobs will stay in the output returned by $ yarn application -list even after they have completed. We want the YARN command line to clean up this list. Specifically, we want to remove applications in the FINISHED state (not Final-State) or KILLED state from the result.
{code}
[user1@host1 ~]$ yarn application -list
Total Applications:150
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1374638600275_0109 Sleep job MAPREDUCE user1 default KILLED KILLED 100% host1:54059
application_1374638600275_0121 Sleep job MAPREDUCE user1 default FINISHED SUCCEEDED 100% host1:19888/jobhistory/job/job_1374638600275_0121
application_1374638600275_0020 Sleep job MAPREDUCE user1 default FINISHED SUCCEEDED 100% host1:19888/jobhistory/job/job_1374638600275_0020
application_1374638600275_0038 Sleep job MAPREDUCE user1 default
....
{code}
- YARN-1049.
Blocker bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (api)
ContainerExistStatus should define a status for preempted containers
With the current behavior it is impossible to determine whether a container has been preempted or lost due to a NM crash.
Adding a PREEMPTED exit status (-102) will help an AM determine that a container has been preempted (see the sketch below).
Note the change of scope from the original summary/description. The original scope proposed API/behavior changes. Because we are past 2.1.0-beta I'm reducing the scope of this JIRA.
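A minimal sketch of how an AM might consume the new status, assuming PREEMPTED is exposed as the -102 exit status described above:
{code}
// Sketch only: an AM distinguishing preemption from other abnormal exits.
public class PreemptionCheckSketch {
  static final int PREEMPTED = -102; // value described in this issue

  static boolean wasPreempted(int containerExitStatus) {
    return containerExitStatus == PREEMPTED;
  }

  public static void main(String[] args) {
    System.out.println(wasPreempted(-102)); // true  -> e.g. re-request the container
    System.out.println(wasPreempted(-100)); // false -> some other abnormal exit
  }
}
{code}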
- YARN-1034.
Trivial task reported by Sandy Ryza and fixed by Karthik Kambatla (documentation , scheduler)
Remove "experimental" in the Fair Scheduler documentation
The YARN Fair Scheduler is largely stable now, and should no longer be declared experimental.
- YARN-1025.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager , resourcemanager)
ResourceManager and NodeManager do not load native libraries on Windows.
ResourceManager and NodeManager do not have the correct setting for java.library.path when launched on Windows. This prevents the processes from loading native code from hadoop.dll. The native code is required for correct functioning on Windows (not optional), so this ultimately can cause failures.
- YARN-1008.
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (nodemanager)
MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations
While the NMs are keyed using the NodeId, the allocation is done based on the hostname.
This makes the different nodes indistinguishable to the scheduler.
There should be an option to enable using host:port instead of just the host for allocations. The nodes reported to the AM should report the 'key' (host or host:port).
- YARN-1006.
Major bug reported by Jian He and fixed by Xuan Gong
Nodes list web page on the RM web UI is broken
The nodes web page, which lists all the connected nodes of the cluster, is broken.
1. The page is not showing in the correct format/style.
2. If we restart the NM, the node list is not refreshed; the newly started NM is just added to the list and the old NM's information still remains.
- YARN-1001.
Blocker task reported by Srimanth Gunturi and fixed by Zhijie Shen (api)
YARN should provide per application-type and state statistics
In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts.
- YARN-994.
Major bug reported by Xuan Gong and fixed by Xuan Gong
HeartBeat thread in AMRMClientAsync does not handle runtime exception correctly
YARN-654 performs sanity checks for parameters of public methods in AMRMClient. Those may throw runtime exceptions.
Currently, the heartbeat thread in AMRMClientAsync only catches IOException and YarnException, and will not handle runtime exceptions properly.
A possible solution: the heartbeat thread catches Throwable and notifies the callback handler thread via the existing savedException (see the sketch below).
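A minimal sketch of the proposed pattern, assuming a simple loop and an AtomicReference standing in for AMRMClientAsyncImpl's savedException; this is not the actual client code:
{code}
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the proposed fix only; field and method names are stand-ins for
// the real AMRMClientAsyncImpl internals.
public class HeartbeatThreadSketch {
  private final AtomicReference<Throwable> savedException = new AtomicReference<>();
  private volatile boolean keepRunning = true;

  Runnable heartbeatLoop(Runnable doHeartbeat) {
    return () -> {
      while (keepRunning) {
        try {
          doHeartbeat.run();
        } catch (Throwable t) {
          // Catch everything, including RuntimeExceptions from argument sanity
          // checks, so the callback handler thread can surface the failure.
          savedException.compareAndSet(null, t);
          keepRunning = false;
        }
      }
    };
  }

  Throwable getSavedException() {
    return savedException.get();
  }
}
{code}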
- YARN-981.
Major bug reported by Xuan Gong and fixed by Jian He
YARN/MR2/Job-history /logs link does not have correct content
- YARN-966.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen
The thread of ContainerLaunch#call will fail without any signal if getLocalizedResources() is called when the container is not at LOCALIZED
In ContainerImpl.getLocalizedResources(), there's:
{code}
assert ContainerState.LOCALIZED == getContainerState(); // TODO: FIXME!!
{code}
ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertionError will be thrown and will fail the thread without notifying the NM. Therefore, the container cannot receive more events, which are supposed to be sent from ContainerLaunch.call(), and move towards completion.
- YARN-957.
Blocker bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
Capacity Scheduler tries to reserve more memory than what the node manager reports.
I have 2 node managers.
* one with 1024 MB memory.(nm1)
* second with 2048 MB memory.(nm2)
I am submitting a simple map reduce application with 1 mapper and 1 reducer of 1024 MB each. The steps to reproduce this are:
* stop nm2 with 2048 MB memory (this is done to make sure that this node's heartbeat doesn't reach the RM first).
* now submit the application. As soon as the RM receives the first node's (nm1) heartbeat it will try to reserve memory for the AM container (2048 MB). However, it has only 1024 MB of memory.
* now start nm2 with 2048 MB memory.
It hangs forever. This exposes two potential issues:
* The scheduler should not try to reserve memory on a node manager which is never going to provide the requested memory, i.e. the node manager's max capability is 1024 MB but 2048 MB is reserved on it. Yet it still does that.
* Say 2048 MB is reserved on nm1 but nm2 comes back with 2048 MB of available memory. In this case, if the original request was made without any locality, then the scheduler should unreserve the memory on nm1 and allocate the requested 2048 MB container on nm2.
- YARN-948.
Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
RM should validate the release container list before actually releasing them
At present we are blindly passing the allocate request, containing containers to be released, to the scheduler. This may result in one application releasing another application's container.
{code}
@Override
@Lock(Lock.NoLock.class)
public Allocation allocate(ApplicationAttemptId applicationAttemptId,
List<ResourceRequest> ask, List<ContainerId> release,
List<String> blacklistAdditions, List<String> blacklistRemovals) {
FiCaSchedulerApp application = getApplication(applicationAttemptId);
....
....
// Release containers
for (ContainerId releasedContainerId : release) {
RMContainer rmContainer = getRMContainer(releasedContainerId);
if (rmContainer == null) {
RMAuditLogger.logFailure(application.getUser(),
AuditConstants.RELEASE_CONTAINER,
"Unauthorized access or invalid container", "CapacityScheduler",
"Trying to release container not owned by app or with invalid id",
application.getApplicationId(), releasedContainerId);
}
completedContainer(rmContainer,
SchedulerUtils.createAbnormalContainerStatus(
releasedContainerId,
SchedulerUtils.RELEASED_CONTAINER),
RMContainerEventType.RELEASED);
}
{code}
The current checks are not sufficient and we should prevent this; a sketch of the validation idea follows.
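A hypothetical sketch of the validation idea using stand-in types (not the real RMContainer/scheduler classes): release a container only when its owning attempt matches the caller, otherwise audit-log and skip it.
{code}
import java.util.ArrayList;
import java.util.List;

// Stand-in types only; illustrates the ownership check, not the actual fix.
public class ReleaseValidationSketch {
  static class ContainerRef {
    final String containerId;
    final String ownerAttemptId;
    ContainerRef(String containerId, String ownerAttemptId) {
      this.containerId = containerId;
      this.ownerAttemptId = ownerAttemptId;
    }
  }

  static List<ContainerRef> filterReleasable(String callerAttemptId,
                                             List<ContainerRef> release) {
    List<ContainerRef> releasable = new ArrayList<>();
    for (ContainerRef c : release) {
      if (c == null || !callerAttemptId.equals(c.ownerAttemptId)) {
        // audit-log an unauthorized/invalid release and skip it, instead of
        // completing a container owned by another application
        continue;
      }
      releasable.add(c);
    }
    return releasable;
  }
}
{code}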
- YARN-942.
Major bug reported by Sandy Ryza and fixed by Akira AJISAKA (scheduler)
In Fair Scheduler documentation, inconsistency on which properties have prefix
locality.threshold.node and locality.threshold.rack should have the yarn.scheduler.fair prefix like the items before them
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
- YARN-910.
Major improvement reported by Sandy Ryza and fixed by Alejandro Abdelnur (nodemanager)
Allow auxiliary services to listen for container starts and completions
Making container start and completion events available to auxiliary services would allow them to be resource-aware. An auxiliary service would be able to notify a co-located service, which is opportunistically using free capacity, of allocation changes.
- YARN-906.
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen
Cancelling ContainerLaunch#call at KILLING causes that the container cannot be completed
See https://builds.apache.org/job/PreCommit-YARN-Build/1435//testReport/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClientNoCleanupOnStop/
- YARN-903.
Major bug reported by Abhishek Kapoor and fixed by Omkar Vinit Joshi (applications/distributed-shell)
DistributedShell throwing errors in logs after successful completion
I have tried running DistributedShell and also used its ApplicationMaster for my test.
The application runs successfully but logs some errors which would be useful to fix.
Below are the logs from the NodeManager and the ApplicationMaster.
Log Snippet for NodeManager
=============================
{noformat}
2013-07-07 13:39:18,787 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1
2013-07-07 13:39:19,050 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -325382586
2013-07-07 13:39:19,052 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for nm-tokens, got key with id :1005046570
2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as sunny-Inspiron:9993 with total resource of <memory:10240, vCores:8>
2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests
2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_000001 (auth:SIMPLE)
2013-07-07 13:39:35,492 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1373184544832_0001_01_000001 by user sunny
2013-07-07 13:39:35,507 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1373184544832_0001
2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny IP=127.0.0.1 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_000001
2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from NEW to INITING
2013-07-07 13:39:35,512 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1373184544832_0001_01_000001 to application application_1373184544832_0001
2013-07-07 13:39:35,518 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from INITING to RUNNING
2013-07-07 13:39:35,528 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from NEW to LOCALIZING
2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/application/test.jar transitioned from INIT to DOWNLOADING
2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1373184544832_0001_01_000001
2013-07-07 13:39:35,675 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_000001.tokens. Credentials list:
2013-07-07 13:39:35,694 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user sunny
2013-07-07 13:39:35,803 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_000001.tokens to /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001/container_1373184544832_0001_01_000001.tokens
2013-07-07 13:39:35,803 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set to /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001 = file:/home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001
2013-07-07 13:39:36,136 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:36,406 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/application/test.jar transitioned from DOWNLOADING to LOCALIZED
2013-07-07 13:39:36,409 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from LOCALIZING to LOCALIZED
2013-07-07 13:39:36,524 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from LOCALIZED to RUNNING
2013-07-07 13:39:36,692 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, -c, /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001/container_1373184544832_0001_01_000001/default_container_executor.sh]
2013-07-07 13:39:37,144 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:38,147 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:39,151 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:39,209 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1373184544832_0001_01_000001
2013-07-07 13:39:39,259 WARN org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Unexpected: procfs stat file is not in the expected format for process with pid 11552
2013-07-07 13:39:39,264 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 29524 for container-id container_1373184544832_0001_01_000001: 79.9 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used
2013-07-07 13:39:39,645 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_000001 (auth:SIMPLE)
2013-07-07 13:39:39,651 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1373184544832_0001_01_000002 by user sunny
2013-07-07 13:39:39,651 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny IP=127.0.0.1 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_000002
2013-07-07 13:39:39,651 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1373184544832_0001_01_000002 to application application_1373184544832_0001
2013-07-07 13:39:39,652 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000002 transitioned from NEW to LOCALIZED
2013-07-07 13:39:39,660 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Getting container-status for container_1373184544832_0001_01_000002
2013-07-07 13:39:39,661 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Returning container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 2, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:39,728 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000002 transitioned from LOCALIZED to RUNNING
2013-07-07 13:39:39,873 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, -c, /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001/container_1373184544832_0001_01_000002/default_container_executor.sh]
2013-07-07 13:39:39,898 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1373184544832_0001_01_000002 succeeded
2013-07-07 13:39:39,899 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000002 transitioned from RUNNING to EXITED_WITH_SUCCESS
2013-07-07 13:39:39,900 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1373184544832_0001_01_000002
2013-07-07 13:39:39,942 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny OPERATION=Container Finished - Succeeded TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_000002
2013-07-07 13:39:39,943 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000002 transitioned from EXITED_WITH_SUCCESS to DONE
2013-07-07 13:39:39,944 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1373184544832_0001_01_000002 from application application_1373184544832_0001
2013-07-07 13:39:40,155 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:40,157 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 2, }, state: C_COMPLETE, diagnostics: "", exit_status: 0,
2013-07-07 13:39:40,158 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed container container_1373184544832_0001_01_000002
2013-07-07 13:39:40,683 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Getting container-status for container_1373184544832_0001_01_000002
2013-07-07 13:39:40,686 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:appattempt_1373184544832_0001_000001 (auth:TOKEN) cause:org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000002 is not handled by this NodeManager
2013-07-07 13:39:40,687 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9993, call org.apache.hadoop.yarn.api.ContainerManagementProtocolPB.stopContainer from 127.0.0.1:51085: error: org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000002 is not handled by this NodeManager
org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000002 is not handled by this NodeManager
at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeGetAndStopContainerRequest(ContainerManagerImpl.java:614)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.stopContainer(ContainerManagerImpl.java:538)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.stopContainer(ContainerManagementProtocolPBServiceImpl.java:88)
at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:85)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1033)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1868)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1864)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1862)
2013-07-07 13:39:41,162 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:41,691 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1373184544832_0001_01_000001 succeeded
2013-07-07 13:39:41,692 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from RUNNING to EXITED_WITH_SUCCESS
2013-07-07 13:39:41,692 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1373184544832_0001_01_000001
2013-07-07 13:39:41,714 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny OPERATION=Container Finished - Succeeded TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_000001
2013-07-07 13:39:41,714 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from EXITED_WITH_SUCCESS to DONE
2013-07-07 13:39:41,714 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1373184544832_0001_01_000001 from application application_1373184544832_0001
2013-07-07 13:39:42,166 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_COMPLETE, diagnostics: "", exit_status: 0,
2013-07-07 13:39:42,166 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed container container_1373184544832_0001_01_000001
2013-07-07 13:39:42,191 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_000001 (auth:SIMPLE)
2013-07-07 13:39:42,195 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Getting container-status for container_1373184544832_0001_01_000001
2013-07-07 13:39:42,196 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:appattempt_1373184544832_0001_000001 (auth:TOKEN) cause:org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000001 is not handled by this NodeManager
2013-07-07 13:39:42,196 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9993, call org.apache.hadoop.yarn.api.ContainerManagementProtocolPB.stopContainer from 127.0.0.1:51086: error: org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000001 is not handled by this NodeManager
org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000001 is not handled by this NodeManager
at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeGetAndStopContainerRequest(ContainerManagerImpl.java:614)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.stopContainer(ContainerManagerImpl.java:538)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.stopContainer(ContainerManagementProtocolPBServiceImpl.java:88)
at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:85)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1033)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1868)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1864)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1862)
2013-07-07 13:39:42,264 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1373184544832_0001_01_000002
2013-07-07 13:39:42,265 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1373184544832_0001_01_000002
2013-07-07 13:39:42,265 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1373184544832_0001_01_000001
2013-07-07 13:39:43,173 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
2013-07-07 13:39:43,174 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1373184544832_0001
2013-07-07 13:39:43,180 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
2013-07-07 13:39:43,180 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler: Scheduling Log Deletion for application: application_1373184544832_0001, with delay of 10800 seconds
{noformat}
Log Snippet for Application Manager
==================================
{noformat}
13/07/07 13:39:36 INFO client.SimpleApplicationMaster: Initializing ApplicationMaster
13/07/07 13:39:37 INFO client.SimpleApplicationMaster: Application master for app, appId=1, clustertimestamp=1373184544832, attemptId=1
13/07/07 13:39:37 INFO client.SimpleApplicationMaster: Starting ApplicationMaster
13/07/07 13:39:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/07/07 13:39:37 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool size is 500
13/07/07 13:39:37 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
13/07/07 13:39:37 INFO client.SimpleApplicationMaster: Max mem capabililty of resources in this cluster 8192
13/07/07 13:39:37 INFO client.SimpleApplicationMaster: Requested container ask: Capability[<memory:100, vCores:0>]Priority[0]ContainerCount[1]
13/07/07 13:39:39 INFO client.SimpleApplicationMaster: Got response from RM for container ask, allocatedCnt=1
13/07/07 13:39:39 INFO client.SimpleApplicationMaster: Launching shell command on a new container., containerId=container_1373184544832_0001_01_000002, containerNode=sunny-Inspiron:9993, containerNodeURI=sunny-Inspiron:8042, containerResourceMemory1024
13/07/07 13:39:39 INFO client.SimpleApplicationMaster: Setting up container launch container for containerid=container_1373184544832_0001_01_000002
13/07/07 13:39:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1373184544832_0001_01_000002
13/07/07 13:39:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : sunny-Inspiron:9993
13/07/07 13:39:39 INFO client.SimpleApplicationMaster: Succeeded to start Container container_1373184544832_0001_01_000002
13/07/07 13:39:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1373184544832_0001_01_000002
13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Got response from RM for container ask, completedCnt=1
13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Got container status for containerID=container_1373184544832_0001_01_000002, state=COMPLETE, exitStatus=0, diagnostics=
13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Container completed successfully., containerId=container_1373184544832_0001_01_000002
13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Application completed. Stopping running containers
13/07/07 13:39:40 ERROR impl.NMClientImpl: Failed to stop Container container_1373184544832_0001_01_000002when stopping NMClientImpl
13/07/07 13:39:40 INFO impl.ContainerManagementProtocolProxy: Closing proxy : sunny-Inspiron:9993
13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Application completed. Signalling finish to RM
13/07/07 13:39:41 INFO impl.AMRMClientAsyncImpl: Interrupted while waiting for queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:281)
13/07/07 13:39:41 INFO client.SimpleApplicationMaster: Application Master completed successfully. exiting
{noformat}
- YARN-881.
Major bug reported by Jian He and fixed by Jian He
Priority#compareTo method seems to be wrong.
If a lower int value means higher priority, shouldn't we "return other.getPriority() - this.getPriority()"? A sketch of that comparison follows.
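A minimal sketch mirroring the comparison the reporter suggests, with Integer.compare swapped in to avoid the overflow risk of plain subtraction (that substitution is an assumption, not necessarily the committed fix):
{code}
// Sketch only: mirrors the reporter's suggested ordering for Priority.
public class PrioritySketch implements Comparable<PrioritySketch> {
  private final int priority;

  public PrioritySketch(int priority) {
    this.priority = priority;
  }

  public int getPriority() {
    return priority;
  }

  @Override
  public int compareTo(PrioritySketch other) {
    // Equivalent in intent to "other.getPriority() - this.getPriority()",
    // but safe against integer overflow.
    return Integer.compare(other.getPriority(), this.getPriority());
  }
}
{code}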
- YARN-771.
Major sub-task reported by Bikas Saha and fixed by Junping Du
AMRMClient support for resource blacklisting
After YARN-750, AMRMClient should support blacklisting via the new YARN APIs.
- YARN-758.
Minor improvement reported by Bikas Saha and fixed by Karthik Kambatla
Augment MockNM to use multiple cores
YARN-757 got fixed by changing the scheduler from Fair to default (which is capacity).
- YARN-707.
Blocker improvement reported by Bikas Saha and fixed by Jason Lowe
Add user info in the YARN ClientToken
If user info is present in the client token then it can be used to do limited authz in the AM.
- YARN-696.
Major improvement reported by Trevor Lorimer and fixed by Trevor Lorimer (resourcemanager)
Enable multiple states to be specified in Resource Manager apps REST call
Within the YARN Resource Manager REST API the GET call which returns all Applications can be filtered by a single State query parameter (http://<rm http address:port>/ws/v1/cluster/apps).
There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed). If no state parameter is specified, all states are returned; however, if a subset of states is required, then multiple REST calls are needed (a maximum of 7).
The proposal is to be able to specify multiple states in a single REST call, as in the sketch below.
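A hedged example of the kind of single call the proposal enables; the RM host/port and the exact query parameter name ('states') are assumptions for illustration:
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative client only: one GET asking for apps in several states at once.
public class MultiStateAppsQuery {
  public static void main(String[] args) throws Exception {
    URL url = new URL(
        "http://rmhost:8088/ws/v1/cluster/apps?states=ACCEPTED,RUNNING");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // JSON listing apps in either state
      }
    } finally {
      conn.disconnect();
    }
  }
}
{code}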
- YARN-643.
Major bug reported by Jian He and fixed by Xuan Gong
Why appToken is removed both in BaseFinalTransition and AMUnregisteredTransition while clientToken is removed in FinalTransition and not BaseFinalTransition
The jira tracks why appToken and clientToAMToken are removed separately, and why they are distributed across different transitions; ideally there may be a common place where these two tokens can be removed at the same time.
- YARN-602.
Major bug reported by Xuan Gong and fixed by Kenji Kikushima
NodeManager should mandatorily set some Environment variables into every containers that it launches
NodeManager should mandatorily set some environment variables into every container that it launches, such as Environment.user and Environment.pwd. If both the user and the NodeManager set those variables, the value set by the NM should be used (see the sketch below).
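A minimal sketch of the intended precedence, using a plain Map rather than the NM's launch code; the variable values are only examples:
{code}
import java.util.HashMap;
import java.util.Map;

// Sketch only: the NodeManager's mandatory values (e.g. USER, PWD) override
// anything the submitter placed in the container environment.
public class ContainerEnvSketch {
  static Map<String, String> buildEnv(Map<String, String> userEnv,
                                      Map<String, String> nmMandatoryEnv) {
    Map<String, String> env = new HashMap<>(userEnv);
    env.putAll(nmMandatoryEnv); // NM-set values win on conflicts
    return env;
  }

  public static void main(String[] args) {
    Map<String, String> userEnv = new HashMap<>();
    userEnv.put("USER", "spoofed-user");
    Map<String, String> nmEnv = new HashMap<>();
    nmEnv.put("USER", "actual-user");
    nmEnv.put("PWD", "/local/dirs/container_01_000001");
    System.out.println(buildEnv(userEnv, nmEnv).get("USER")); // actual-user
  }
}
{code}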
- YARN-589.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)
Expose a REST API for monitoring the fair scheduler
The fair scheduler should have an HTTP interface that exposes information such as applications per queue, fair shares, demands, current allocations.
- YARN-573.
Critical sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
Shared data structures in Public Localizer and Private Localizer are not Thread safe.
PublicLocalizer
1) pending is accessed by addResource (part of event handling) and the run method (as part of PublicLocalizer.run()).
PrivateLocalizer
1) pending is accessed by addResource (part of event handling) and findNextResource (i.remove()). Also, the update method should be fixed; it too shares the pending list.
- YARN-540.
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)
Race condition causing RM to potentially relaunch already unregistered AMs on RM restart
When a job succeeds and successfully calls finishApplicationMaster, the RM shuts down and restarts; the dispatcher is stopped before it can process the REMOVE_APP event. The next time the RM comes back, it will reload the existing state files even though the job has succeeded.
- YARN-502.
Major sub-task reported by Lohit Vijayarenu and fixed by Mayank Bansal
RM crash with NPE on NODE_REMOVED event with FairScheduler
While running some tests and adding/removing nodes, we saw the RM crash with the exception below. We are testing with the fair scheduler and running hadoop-2.0.3-alpha.
{noformat}
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node YYYY:55680 as it is now LOST
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: YYYY:55680 Node Transitioned from UNHEALTHY to LOST
2013-03-22 18:54:27,015 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_REMOVED to the scheduler
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
at java.lang.Thread.run(Thread.java:662)
2013-03-22 18:54:27,016 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped SelectChannelConnector@XXXX:50030
{noformat}
- YARN-337.
Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
RM handles killed application tracking URL poorly
When the ResourceManager kills an application, it leaves the proxy URL redirecting to the original tracking URL for the application even though the ApplicationMaster is no longer there to service it. It should redirect it somewhere more useful, like the RM's web page for the application, where the user can find that the application was killed and links to the AM logs.
In addition, sometimes the AM during teardown from the kill can attempt to unregister and provide an updated tracking URL, but unfortunately the RM has "forgotten" the AM due to the kill and refuses to process the unregistration. Instead it logs:
{noformat}
2013-01-09 17:37:49,671 [IPC Server handler 2 on 8030] ERROR
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AppAttemptId doesnt exist in cache appattempt_1357575694478_28614_000001
{noformat}
It should go ahead and process the unregistration to update the tracking URL since the application offered it.
- YARN-292.
Major sub-task reported by Devaraj K and fixed by Zhijie Shen (resourcemanager)
ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
{code:xml}
2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_000001
2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525
java.lang.ArrayIndexOutOfBoundsException: 0
at java.util.Arrays$ArrayList.get(Arrays.java:3381)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
{noformat}
- YARN-107.
Major bug reported by Devaraj K and fixed by Xuan Gong (resourcemanager)
ClientRMService.forceKillApplication() should handle the non-RUNNING applications properly
- MAPREDUCE-5497.
Major bug reported by Jian He and fixed by Jian He
'5s sleep' in MRAppMaster.shutDownJob is only needed before stopping ClientService
- MAPREDUCE-5493.
Blocker bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
In-memory map outputs can be leaked after shuffle completes
- MAPREDUCE-5483.
Major bug reported by Alejandro Abdelnur and fixed by Robert Kanter (distcp)
revert MAPREDUCE-5357
- MAPREDUCE-5478.
Minor improvement reported by Sandy Ryza and fixed by Sandy Ryza (examples)
TeraInputFormat unnecessarily defines its own FileSplit subclass
- MAPREDUCE-5476.
Blocker bug reported by Jian He and fixed by Jian He
Job can fail when RM restarts after staging dir is cleaned but before MR successfully unregister with RM
- MAPREDUCE-5475.
Blocker bug reported by Jason Lowe and fixed by Jason Lowe (mr-am , mrv2)
MRClientService does not verify ACLs properly
- MAPREDUCE-5470.
Major bug reported by Chris Nauroth and fixed by Sandy Ryza
LocalJobRunner does not work on Windows.
- MAPREDUCE-5468.
Blocker bug reported by Yesha Vora and fixed by Vinod Kumar Vavilapalli
AM recovery does not work for map only jobs
- MAPREDUCE-5466.
Blocker bug reported by Yesha Vora and fixed by Jian He
Historyserver does not refresh the result of restarted jobs after RM restart
- MAPREDUCE-5462.
Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (performance , task)
In map-side sort, swap entire meta entries instead of indexes for better cache performance
- MAPREDUCE-5454.
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (test)
TestDFSIO fails intermittently on JDK7
- MAPREDUCE-5446.
Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2 , test)
TestJobHistoryEvents and TestJobHistoryParsing have race conditions
- MAPREDUCE-5441.
Major bug reported by Rohith Sharma K S and fixed by Jian He (applicationmaster , client)
JobClient exits whenever the RM issues a Reboot command to the first-attempt App Master.
- MAPREDUCE-5440.
Major bug reported by Robert Parker and fixed by Robert Parker (mrv2)
TestCopyCommitter Fails on JDK7
- MAPREDUCE-5428.
Major bug reported by Jason Lowe and fixed by Karthik Kambatla (jobhistoryserver , mrv2)
HistoryFileManager doesn't stop threads when service is stopped
- MAPREDUCE-5425.
Major bug reported by Ashwin Shankar and fixed by Robert Parker (jobhistoryserver)
JUnit tests in TestJobHistoryServer failing on JDK 7
- MAPREDUCE-5414.
Major bug reported by Nemon Lou and fixed by Nemon Lou (test)
TestTaskAttempt fails on JDK 7 with NullPointerException
- MAPREDUCE-5385.
Blocker bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi
JobContext cache files APIs are broken
- MAPREDUCE-5379.
Major improvement reported by Sandy Ryza and fixed by Karthik Kambatla (job submission , security)
Include token tracking ids in jobconf
- MAPREDUCE-5367.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza
Local jobs all use the same local working directory
- MAPREDUCE-5358.
Major bug reported by Devaraj K and fixed by Devaraj K (mr-am)
MRAppMaster throws invalid transitions for JobImpl
- MAPREDUCE-5317.
Major bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2)
Stale files left behind for failed jobs
- MAPREDUCE-5251.
Major bug reported by Jason Lowe and fixed by Ashwin Shankar (mrv2)
Reducer should not implicate map attempt if it has insufficient space to fetch map output
- MAPREDUCE-5164.
Major bug reported by Nemon Lou and fixed by Nemon Lou
command "mapred job" and "mapred queue" omit HADOOP_CLIENT_OPTS
- MAPREDUCE-5020.
Major bug reported by Trevor Robinson and fixed by Trevor Robinson (client)
Compile failure with JDK8
- MAPREDUCE-5001.
Major bug reported by Brock Noland and fixed by Sandy Ryza
LocalJobRunner has race condition resulting in job failures
- MAPREDUCE-3193.
Major bug reported by Ramgopal N and fixed by Devaraj K (mrv1 , mrv2)
FileInputFormat doesn't read files recursively in the input path dir
- MAPREDUCE-1981.
Major improvement reported by Hairong Kuang and fixed by Hairong Kuang (job submission)
Improve getSplits performance by using listLocatedStatus
- HDFS-5199.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Add more debug trace for NFS READ and WRITE
- HDFS-5192.
Minor bug reported by Jing Zhao and fixed by Jing Zhao
NameNode may fail to start when dfs.client.test.drop.namenode.response.number is set
- HDFS-5159.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (namenode)
Secondary NameNode fails to checkpoint if error occurs downloading edits on first checkpoint
- HDFS-5150.
Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee
Allow per NN SPN for internal SPNEGO.
- HDFS-5140.
Blocker bug reported by Arpit Gupta and fixed by Jing Zhao (ha)
Too many safemode monitor threads being created in the standby NameNode, causing it to fail with an out-of-memory error
- HDFS-5136.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
MNT EXPORT should give the full list of groups that can mount the exports
- HDFS-5132.
Blocker bug reported by Arpit Gupta and fixed by Kihwal Lee (namenode)
Deadlock in NameNode between SafeModeMonitor#run and DatanodeManager#handleHeartbeat
- HDFS-5128.
Critical improvement reported by Kihwal Lee and fixed by Kihwal Lee
Allow multiple net interfaces to be used with HA namenode RPC server
- HDFS-5124.
Blocker bug reported by Deepesh Khandelwal and fixed by Daryn Sharp (namenode)
DelegationTokenSecretManager#retrievePassword can cause deadlock in NameNode
- HDFS-5118.
Major new feature reported by Jing Zhao and fixed by Jing Zhao
Provide testing support for DFSClient to drop RPC responses
Used for testing when NameNode HA is enabled. Users can use the new configuration property "dfs.client.test.drop.namenode.response.number" to specify the number of responses that the DFSClient will drop in each RPC call. This feature can help test functionality such as the NameNode retry cache.
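As a usage sketch (assuming a hypothetical HA nameservice named "mycluster" and an arbitrary drop count of 3), a test client might set the property programmatically:
{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DropResponseSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Ask the DFSClient to drop this many NameNode responses per RPC call,
    // forcing retries so the NameNode retry cache gets exercised.
    conf.setInt("dfs.client.test.drop.namenode.response.number", 3);
    // "hdfs://mycluster" is a hypothetical HA nameservice URI.
    FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
    fs.mkdirs(new Path("/tmp/retry-cache-test"));
    fs.close();
  }
}
{code}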
- HDFS-5111.
Minor bug reported by Jing Zhao and fixed by Jing Zhao (snapshots)
Remove duplicated error message for snapshot commands when processing invalid arguments
- HDFS-5110.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Change FSDataOutputStream to HdfsDataOutputStream for opened streams to fix type cast error
- HDFS-5107.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Fix array copy error in Readdir and Readdirplus responses
- HDFS-5106.
Minor bug reported by Chuan Liu and fixed by Chuan Liu (test)
TestDatanodeBlockScanner fails on Windows due to incorrect path format
- HDFS-5105.
Minor bug reported by Chuan Liu and fixed by Chuan Liu
TestFsck fails on Windows
- HDFS-5104.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Support dotdot name in NFS LOOKUP operation
- HDFS-5103.
Minor bug reported by Chuan Liu and fixed by Chuan Liu (test)
TestDirectoryScanner fails on Windows
- HDFS-5102.
Major bug reported by Aaron T. Myers and fixed by Jing Zhao (snapshots)
Snapshot names should not be allowed to contain slash characters
- HDFS-5100.
Minor bug reported by Chuan Liu and fixed by Chuan Liu (test)
TestNamenodeRetryCache fails on Windows due to incorrect cleanup
- HDFS-5099.
Major bug reported by Chuan Liu and fixed by Chuan Liu (namenode)
Namenode#copyEditLogSegmentsToSharedDir should close EditLogInputStreams upon finishing
- HDFS-5091.
Minor bug reported by Jing Zhao and fixed by Jing Zhao
Support for a SPNEGO keytab separate from the JournalNode keytab for secure HA
- HDFS-5085.
Major sub-task reported by Brandon Li and fixed by Jing Zhao (nfs)
Refactor o.a.h.nfs to support different types of authentications
- HDFS-5080.
Major bug reported by Jing Zhao and fixed by Jing Zhao (ha , qjm)
BootstrapStandby not working with QJM when the existing NN is active
- HDFS-5078.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Support file append in NFSv3 gateway to enable data streaming to HDFS
- HDFS-5076.
Minor new feature reported by Jing Zhao and fixed by Jing Zhao
Add MXBean methods to query NN's transaction information and JournalNode's journal status
- HDFS-5071.
Major sub-task reported by Kihwal Lee and fixed by Brandon Li (nfs)
Change hdfs-nfs parent project to hadoop-project
- HDFS-5069.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Include hadoop-nfs and hadoop-hdfs-nfs into hadoop dist for NFS deployment
- HDFS-5067.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Support symlink operations
- HDFS-5061.
Major improvement reported by Arpit Agarwal and fixed by Arpit Agarwal (namenode)
Make FSNameSystem#auditLoggers an unmodifiable list
- HDFS-5055.
Blocker bug reported by Allen Wittenauer and fixed by Vinay (namenode)
NN fails to download the checkpointed image from the SNN in some setups
- HDFS-5047.
Major bug reported by Kihwal Lee and fixed by Robert Parker (namenode)
Suppress logging of the full stack trace of quota and lease exceptions
- HDFS-5045.
Minor improvement reported by Jing Zhao and fixed by Jing Zhao
Add more unit tests for retry cache to cover all AtMostOnce methods
- HDFS-5043.
Major bug reported by Brandon Li and fixed by Brandon Li
For HdfsFileStatus, set default value of childrenNum to -1 instead of 0 to avoid confusing applications
- HDFS-5028.
Major bug reported by zhaoyunjiong and fixed by zhaoyunjiong
LeaseRenewer throws java.util.ConcurrentModificationException on timeout
- HDFS-4993.
Major bug reported by Kihwal Lee and fixed by Robert Parker
fsck can fail if a file is renamed or deleted
- HDFS-4962.
Minor sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (nfs)
Use enum for nfs constants
- HDFS-4947.
Major sub-task reported by Brandon Li and fixed by Jing Zhao (nfs)
Add NFS server export table to control export by hostname or IP range
- HDFS-4926.
Trivial improvement reported by Joseph Lorenzini and fixed by Vivek Ganesan (namenode)
The NameNode web server's page has a tooltip that is inconsistent with the DataNode HTML link
- HDFS-4905.
Minor improvement reported by Arpit Agarwal and fixed by Arpit Agarwal (tools)
Add appendToFile command to "hdfs dfs"
- HDFS-4898.
Minor bug reported by Eric Sirianni and fixed by Tsz Wo (Nicholas), SZE (namenode)
BlockPlacementPolicyWithNodeGroup.chooseRemoteRack() fails to properly fallback to local rack
- HDFS-4763.
Major sub-task reported by Brandon Li and fixed by Brandon Li (nfs)
Add script changes/utility for starting NFS gateway
- HDFS-4680.
Major bug reported by Andrew Wang and fixed by Andrew Wang (namenode , security)
Audit logging of delegation tokens for MR tracing
- HDFS-4632.
Major bug reported by Chris Nauroth and fixed by Chuan Liu (test)
globStatus using backslash for escaping does not work on Windows
- HDFS-4594.
Minor bug reported by Arpit Gupta and fixed by Chris Nauroth (webhdfs)
WebHDFS open sets the Content-Length header to the value specified by the length parameter rather than the amount of data actually returned.
- HDFS-4329.
Major bug reported by Andy Isaacson and fixed by Cristina L. Abad (hdfs-client)
DFSShell issues with directories that have spaces in their names
- HDFS-3245.
Major improvement reported by Todd Lipcon and fixed by Ravi Prakash (namenode)
Add metrics and web UI for cluster version summary
- HDFS-2933.
Major improvement reported by Philip Zeyliger and fixed by Vivek Ganesan (datanode)
Improve DataNode Web UI Index Page
- HADOOP-9962.
Major improvement reported by Roman Shaposhnik and fixed by Roman Shaposhnik (build)
Enable DependencyConvergence in order to avoid dependency divergence within Hadoop itself
- HADOOP-9961.
Minor bug reported by Roman Shaposhnik and fixed by Roman Shaposhnik (build)
Versions of a few transitive dependencies have diverged between Hadoop subprojects
- HADOOP-9960.
Blocker bug reported by Brock Noland and fixed by Karthik Kambatla
Upgrade Jersey version to 1.9
- HADOOP-9958.
Major bug reported by Andrew Wang and fixed by Andrew Wang
Add old constructor back to DelegationTokenInformation to unbreak downstream builds
- HADOOP-9945.
Minor improvement reported by Karthik Kambatla and fixed by Karthik Kambatla (ha)
HAServiceState should have a state for stopped services
- HADOOP-9944.
Blocker bug reported by Arun C Murthy and fixed by Arun C Murthy
RpcRequestHeaderProto defines callId as uint32 while ipc.Client.CONNECTION_CONTEXT_CALL_ID is signed (-3)
- HADOOP-9932.
Blocker bug reported by Kihwal Lee and fixed by Kihwal Lee
Improper synchronization in RetryCache
- HADOOP-9924.
Major bug reported by shanyu zhao and fixed by shanyu zhao (fs)
FileUtil.createJarWithClassPath() does not generate relative classpath correctly
- HADOOP-9918.
Minor improvement reported by Karthik Kambatla and fixed by Karthik Kambatla
Add addIfService() to CompositeService
- HADOOP-9916.
Minor bug reported by Binglin Chang and fixed by Binglin Chang
Race condition in ipc.Client causes TestIPC timeout
- HADOOP-9910.
Minor bug reported by André Kelpe and fixed by
Proxy server start and stop documentation is wrong
- HADOOP-9906.
Minor bug reported by Karthik Kambatla and fixed by Karthik Kambatla (ha)
Move HAZKUtil to o.a.h.util.ZKUtil and make inner-classes public
- HADOOP-9899.
Minor bug reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (security)
Remove the debug message added by HADOOP-8855
- HADOOP-9886.
Minor improvement reported by Arpit Gupta and fixed by Arpit Gupta
Turn warning message in RetryInvocationHandler to debug
- HADOOP-9880.
Blocker bug reported by Kihwal Lee and fixed by Daryn Sharp
SASL changes from HADOOP-9421 breaks Secure HA NN
- HADOOP-9879.
Minor improvement reported by Karthik Kambatla and fixed by Karthik Kambatla (build)
Move the version info of zookeeper dependencies to hadoop-project/pom
- HADOOP-9868.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (ipc)
Server must not advertise kerberos realm
- HADOOP-9858.
Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (fs)
Remove unused private RawLocalFileSystem#execCommand method from branch-2.
- HADOOP-9857.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (build , test)
Tests block and sometimes timeout on Windows due to invalid entropy source.
- HADOOP-9833.
Minor improvement reported by Steve Loughran and fixed by Kousuke Saruta (build)
move slf4j to version 1.7.5
- HADOOP-9831.
Minor improvement reported by Chris Nauroth and fixed by Chris Nauroth (bin)
Make checknative shell command accessible on Windows.
- HADOOP-9821.
Minor improvement reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA
ClientId should have getMsb/getLsb methods
- HADOOP-9820.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)
RPCv9 wire protocol is insufficient to support multiplexing
- HADOOP-9806.
Major bug reported by Brandon Li and fixed by Brandon Li (nfs)
PortmapInterface should check if the procedure is out-of-range
- HADOOP-9803.
Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (ipc)
Add generic type parameter to RetryInvocationHandler
- HADOOP-9802.
Major improvement reported by Chris Nauroth and fixed by Chris Nauroth (io)
Support Snappy codec on Windows.
- HADOOP-9801.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (conf)
Configuration#writeXml uses the platform default encoding, which may mishandle multi-byte characters.
- HADOOP-9789.
Critical new feature reported by Daryn Sharp and fixed by Daryn Sharp (ipc , security)
Support server advertised kerberos principals
- HADOOP-9774.
Major bug reported by shanyu zhao and fixed by shanyu zhao (fs)
RawLocalFileSystem.listStatus() returns absolute paths when the input path is relative on Windows
- HADOOP-9768.
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (fs)
chown and chgrp reject users and groups with spaces on platforms where spaces are otherwise acceptable
- HADOOP-9757.
Major bug reported by Jason Lowe and fixed by Cristina L. Abad (fs)
Har metadata cache can grow without limit
- HADOOP-9686.
Major improvement reported by Jason Lowe and fixed by Jason Lowe (conf)
Easy access to final parameters in Configuration
- HADOOP-9672.
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza
Upgrade Avro dependency to 1.7.4
- HADOOP-9557.
Major bug reported by Lohit Vijayarenu and fixed by Lohit Vijayarenu (build)
hadoop-client excludes commons-httpclient
- HADOOP-9446.
Major improvement reported by Yu Gao and fixed by Yu Gao (security)
Support Kerberos HTTP SPNEGO authentication for non-SUN JDK
- HADOOP-9435.
Major bug reported by Tian Hong Wang and fixed by Tian Hong Wang (build)
Support building the JNI code against the IBM JVM
- HADOOP-9381.
Trivial bug reported by Keegan Witt and fixed by Keegan Witt
Document dfs cp -f option
- HADOOP-9315.
Major bug reported by Dennis Y and fixed by Chris Nauroth (build)
Port HADOOP-9249 hadoop-maven-plugins Clover fix to branch-2 to fix build failures
- HADOOP-8814.
Minor improvement reported by Brandon Li and fixed by Brandon Li (conf , fs , fs/s3 , ha , io , metrics , performance , record , security , util)
Inefficient comparison with the empty string. Use isEmpty() instead