Hadoop 2.7.1 Release Notes
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
Changes since Hadoop 2.7.0
- YARN-3850.
Blocker bug reported by Varun Saxena and fixed by Varun Saxena (log-aggregation , nodemanager)
NM fails to read files from full disks which can lead to container logs being lost and other issues
- YARN-3842.
Critical bug reported by Karthik Kambatla and fixed by Robert Kanter
NMProxy should retry on NMNotYetReadyException
- YARN-3832.
Critical bug reported by Ranga Swamy and fixed by Brahma Reddy Battula (nodemanager)
Resource Localization fails on a cluster due to existing cache directories
- YARN-3809.
Major bug reported by Jun Gong and fixed by Jun Gong (resourcemanager)
Failed to launch new attempts because ApplicationMasterLauncher's threads all hang
- YARN-3804.
Critical bug reported by Bibin A Chundatt and fixed by Varun Saxena (resourcemanager)
Both RMs are in standby state when the Kerberos user is not in yarn.admin.acl
- YARN-3764.
Blocker bug reported by Wangda Tan and fixed by Wangda Tan
CapacityScheduler should forbid moving LeafQueue from one parent to another
- YARN-3753.
Critical bug reported by Sumana Sathish and fixed by Jian He (yarn)
RM failed to come up with "java.io.IOException: Wait for ZKClient creation timed out"
- YARN-3733.
Blocker bug reported by Bibin A Chundatt and fixed by Rohith Sharma K S (resourcemanager)
Fix DominantRC#compare() does not work as expected if cluster resource is empty
- YARN-3725.
Blocker bug reported by Zhijie Shen and fixed by Zhijie Shen (resourcemanager , timelineserver)
App submission via REST API is broken in secure mode because the Timeline DT service address is empty
- YARN-3723.
Critical bug reported by Zhijie Shen and fixed by Zhijie Shen (timelineserver)
Need to clearly document primaryFilter and otherInfo value type
- YARN-3711.
Minor sub-task reported by Masatake Iwasaki and fixed by Masatake Iwasaki (documentation)
Documentation of ResourceManager HA should explain configurations about listen addresses
- YARN-3701.
Blocker bug reported by Zhijie Shen and fixed by Zhijie Shen (timelineserver)
Isolating the error of generating a single app report when getting all apps from generic history service
- YARN-3694.
Minor bug reported by Akira AJISAKA and fixed by Jagadesh Kiran N (documentation)
Fix dead link for TimelineServer REST API
- YARN-3686.
Critical sub-task reported by Wangda Tan and fixed by Sunil G (api , client , resourcemanager)
CapacityScheduler should trim default_node_label_expression
- YARN-3681.
Blocker bug reported by Sumana Sathish and fixed by Varun Saxena (yarn)
yarn cmd says "could not find main class 'queue'" in windows
- YARN-3677.
Minor bug reported by Akira AJISAKA and fixed by Vinod Kumar Vavilapalli (resourcemanager)
Fix findbugs warnings in yarn-server-resourcemanager
- YARN-3675.
Critical bug reported by Anubhav Dhoot and fixed by Anubhav Dhoot (fairscheduler)
FairScheduler: RM quits when node removal races with continuous scheduling on the same node
- YARN-3646.
Major bug reported by Raju Bairishetti and fixed by Raju Bairishetti (client)
Applications sometimes get stuck when the retry policy is set to retry forever
- YARN-3626.
Major bug reported by Craig Welch and fixed by Craig Welch (yarn)
On Windows localized resources are not moved to the front of the classpath when they should be
- YARN-3614.
Critical bug reported by lachisis and fixed by (resourcemanager)
FileSystemRMStateStore throws an exception when it fails to remove an application, which causes the ResourceManager to crash
- YARN-3609.
Major sub-task reported by Wangda Tan and fixed by Wangda Tan (resourcemanager)
Move loading of labels from storage from serviceInit to serviceStart so that it works in the RM HA case.
- YARN-3601.
Critical bug reported by Weiwei Yang and fixed by Weiwei Yang (resourcemanager , webapp)
Fix UT TestRMFailover.testRMWebAppRedirect
- YARN-3585.
Critical bug reported by Peng Zhang and fixed by Rohith Sharma K S
NodeManager cannot exit when a SHUTDOWN event is triggered and NM recovery is enabled
- YARN-3554.
Major bug reported by Jason Lowe and fixed by Naganarasimha G R
Default value for maximum nodemanager connect wait time is too high
- YARN-3544.
Blocker sub-task reported by Hitesh Shah and fixed by Xuan Gong
AM logs link missing in the RM UI for a completed app
- YARN-3539.
Major improvement reported by Steve Loughran and fixed by Steve Loughran (documentation)
Compatibility doc to state that ATS v1 is a stable REST API
- YARN-3537.
Major bug reported by Brahma Reddy Battula and fixed by Brahma Reddy Battula (nodemanager)
NPE when NodeManager.serviceInit fails and stopRecoveryStore invoked
- YARN-3526.
Major bug reported by Weiwei Yang and fixed by Weiwei Yang (resourcemanager , webapp)
ApplicationMaster tracking URL is incorrectly redirected on a QJM cluster
- YARN-3522.
Blocker bug reported by Zhijie Shen and fixed by Zhijie Shen (timelineserver)
DistributedShell uses the wrong user to put timeline data
- YARN-3516.
Minor bug reported by zhihai xu and fixed by zhihai xu (nodemanager)
killing ContainerLocalizer action doesn't take effect when private localizer receives FETCH_FAILURE status.
- YARN-3497.
Major bug reported by Jason Lowe and fixed by Jason Lowe (client)
ContainerManagementProtocolProxy modifies IPC timeout conf without making a copy
- YARN-3493.
Critical bug reported by Sumana Sathish and fixed by Jian He (yarn)
RM fails to come up with error "Failed to load/recover state" when mem settings are changed
- YARN-3487.
Critical sub-task reported by Jason Lowe and fixed by Jason Lowe (capacityscheduler)
CapacityScheduler scheduler lock obtained unnecessarily when calling getQueue
- YARN-3485.
Critical bug reported by Karthik Kambatla and fixed by Karthik Kambatla (fairscheduler)
FairScheduler headroom calculation doesn't consider maxResources for Fifo and FairShare policies
- YARN-3476.
Major bug reported by Jason Lowe and fixed by Rohith Sharma K S (log-aggregation , nodemanager)
Nodemanager can fail to delete local logs if log aggregation fails
- YARN-3472.
Major bug reported by Jian He and fixed by Rohith Sharma K S
Possible leak in DelegationTokenRenewer#allTokens
- YARN-3469.
Minor improvement reported by Jun Gong and fixed by Jun Gong
ZKRMStateStore: Avoid setting watches that are not required
- YARN-3466.
Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager , webapp)
Fix RM nodes web page to sort by node HTTP-address, #containers and node-label column
- YARN-3465.
Major bug reported by zhihai xu and fixed by zhihai xu (nodemanager)
Use LinkedHashMap to preserve order of resource requests
- YARN-3464.
Critical bug reported by zhihai xu and fixed by zhihai xu (nodemanager)
Race condition in LocalizerRunner kills localizer before localizing all resources
- YARN-3462.
Major bug reported by Sidharta Seethana and fixed by Naganarasimha G R
Patches applied for YARN-2424 are inconsistent between trunk and branch-2
- YARN-3457.
Minor bug reported by Bibin A Chundatt and fixed by Bibin A Chundatt (nodemanager)
NPE when NodeManager.serviceInit fails and stopRecoveryStore called
- YARN-3434.
Major bug reported by Thomas Graves and fixed by Thomas Graves (capacityscheduler)
Interaction between reservations and userlimit can result in significant ULF violation
- YARN-3385.
Critical bug reported by zhihai xu and fixed by zhihai xu (resourcemanager)
Race condition: KeeperException$NoNodeException will cause RM shutdown during ZK node deletion.
- YARN-3382.
Major bug reported by Rohit Agarwal and fixed by Rohit Agarwal (webapp)
Some of UserMetricsInfo metrics are incorrectly set to root queue metrics
- YARN-3358.
Minor bug reported by Varun Saxena and fixed by Varun Saxena (resourcemanager)
Audit log not present while refreshing Service ACLs
- YARN-3351.
Major bug reported by Anubhav Dhoot and fixed by Anubhav Dhoot (webapp)
AppMaster tracking URL is broken in HA
- YARN-3243.
Major bug reported by Wangda Tan and fixed by Wangda Tan (capacityscheduler , resourcemanager)
CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
- YARN-3193.
Minor improvement reported by Japs_123 and fixed by Steve Loughran (webapp)
Visiting the standby RM web UI redirects to the active RM web UI slowly.
- YARN-3006.
Minor sub-task reported by Akira AJISAKA and fixed by Akira AJISAKA
Improve the error message when attempting manual failover with auto-failover enabled
- YARN-2918.
Major sub-task reported by Rohith Sharma K S and fixed by Wangda Tan (resourcemanager)
Don't fail RM if a queue's configured labels do not exist in cluster-node-labels
- YARN-2900.
Major sub-task reported by Jonathan Eagles and fixed by Mit Desai (timelineserver)
Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)
- YARN-2605.
Major sub-task reported by bc Wong and fixed by Xuan Gong (resourcemanager)
[RM HA] Rest api endpoints doing redirect incorrectly
- YARN-2238.
Major bug reported by Sangjin Lee and fixed by Jian He (webapp)
filtering on UI sticks even if I move away from the page
- MAPREDUCE-6410.
Critical bug reported by Zhang Wei and fixed by Varun Saxena
Aggregated Logs Deletion doesn't work after refreshing Log Retention Settings in secure cluster
- MAPREDUCE-6387.
Minor bug reported by Arun Suresh and fixed by Arun Suresh
Serialize the recently added Task#encryptedSpillKey field at the end
- MAPREDUCE-6361.
Critical bug reported by Junping Du and fixed by Junping Du
NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
- MAPREDUCE-6339.
Critical bug reported by zhihai xu and fixed by zhihai xu (mrv2)
Job history file is not flushed correctly because isTimerActive flag is not set true when flushTimerTask is scheduled.
- MAPREDUCE-6334.
Blocker bug reported by Eric Payne and fixed by Eric Payne
Fetcher#copyMapOutput is leaking usedMemory upon IOException during InMemoryMapOutput shuffle handler
- MAPREDUCE-6324.
Blocker bug reported by Jason Lowe and fixed by Jason Lowe (mr-am)
Uber jobs fail to update AMRM token when it rolls over
- MAPREDUCE-6300.
Minor bug reported by Siqi Li and fixed by Siqi Li
Task list sort by task id broken
- MAPREDUCE-6259.
Major bug reported by zhihai xu and fixed by zhihai xu (jobhistoryserver)
IllegalArgumentException due to missing job submit time
- MAPREDUCE-6252.
Major bug reported by Craig Welch and fixed by Craig Welch (jobhistoryserver)
JobHistoryServer should not fail when encountering a missing directory
- MAPREDUCE-6251.
Major bug reported by Craig Welch and fixed by Craig Welch (jobhistoryserver , mrv2)
JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases
- MAPREDUCE-6238.
Critical bug reported by zhihai xu and fixed by zhihai xu (mrv2)
MR2 can't run local jobs with -libjars command options which is a regression from MR1
- HDFS-8681.
Blocker bug reported by Andrew Wang and fixed by Arpit Agarwal (datanode)
BlockScanner is incorrectly disabled by default
- HDFS-8633.
Minor bug reported by Ray Chiang and fixed by Ray Chiang (HDFS)
Fix setting of dfs.datanode.readahead.bytes in hdfs-default.xml to match DFSConfigKeys
- HDFS-8626.
Blocker bug reported by kanaka kumar avvaru and fixed by kanaka kumar avvaru
Reserved RBW space is not released if creation of RBW File fails
- HDFS-8600.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (webhdfs)
TestWebHdfsFileSystemContract.testGetFileBlockLocations fails in branch-2.7
- HDFS-8597.
Major sub-task reported by Xiaoyu Yao and fixed by Xiaoyu Yao (datanode , test)
Fix TestFSImage#testZeroBlockSize on Windows
- HDFS-8596.
Blocker bug reported by Yongjun Zhang and fixed by Yongjun Zhang (HDFS)
TestDistributedFileSystem et al tests are broken in branch-2 due to incorrect setting of "datanode" attribute
- HDFS-8595.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (test)
TestCommitBlockSynchronization fails in branch-2.7
- HDFS-8583.
Major bug reported by Arpit Agarwal and fixed by Arpit Agarwal (documentation)
Document that NFS gateway does not work with rpcbind on SLES 11
- HDFS-8576.
Major bug reported by J.Andreina and fixed by J.Andreina (namenode)
Lease recovery should return true if the lease can be released and the file can be closed
- HDFS-8572.
Blocker bug reported by Haohui Mai and fixed by Haohui Mai
DN always uses HTTP/localhost@REALM principals in SPNEGO
- HDFS-8566.
Major bug reported by Surendra Singh Lilhore and fixed by Surendra Singh Lilhore (documentation)
HDFS documentation about debug commands wrongly identifies them as "hdfs dfs" commands
- HDFS-8544.
Major bug reported by Brahma Reddy Battula and fixed by Brahma Reddy Battula (documentation)
Incorrect port specified in HFTP Guide document in branch-2
- HDFS-8523.
Major bug reported by J.Andreina and fixed by J.Andreina (documentation)
Remove usage information on unsupported operation "fsck -showprogress" from branch-2
- HDFS-8522.
Major bug reported by Xiaoyu Yao and fixed by Xiaoyu Yao (namenode)
Change heavily recorded NN logs from INFO to DEBUG level
- HDFS-8521.
Trivial improvement reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
Add @VisibleForTesting annotation to BlockPoolSlice#selectReplicaToDelete
- HDFS-8486.
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (datanode)
DN startup may cause severe data loss
- HDFS-8480.
Critical bug reported by Zhe Zhang and fixed by Zhe Zhang
Fix performance and timeout issues in HDFS-7929 by using hard-links to preserve old edit logs instead of copying them
- HDFS-8451.
Blocker bug reported by Steve Loughran and fixed by Steve Loughran (encryption)
DFSClient probe for encryption testing interprets empty URI property as "enabled"
- HDFS-8405.
Minor bug reported by Tsz Wo Nicholas Sze and fixed by Takanobu Asanuma (namenode)
Fix a typo in NamenodeFsck
- HDFS-8404.
Major bug reported by Nathan Roberts and fixed by Nathan Roberts (namenode)
Pending block replication can get stuck using older genstamp
- HDFS-8361.
Major improvement reported by Tsz Wo Nicholas Sze and fixed by Tsz Wo Nicholas Sze (namenode)
Choose SSD over DISK in block placement
- HDFS-8305.
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe
HDFS INotify: the destination field of RenameOp should always end with the file name
- HDFS-8273.
Blocker bug reported by Jing Zhao and fixed by Haohui Mai (namenode)
FSNamesystem#Delete() should not call logSync() when holding the lock
- HDFS-8270.
Major bug reported by Andrey Stepachev and fixed by J.Andreina (hdfs-client)
create() always retried with hardcoded timeout when file already exists with open lease
Proxy-level retries are no longer performed on AlreadyBeingCreatedException for the create() operation.
- HDFS-8269.
Blocker bug reported by Yesha Vora and fixed by Haohui Mai
getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime
- HDFS-8245.
Major bug reported by Rushabh S Shah and fixed by Rushabh S Shah
Standby namenode doesn't process DELETED_BLOCK if the add block request is in edit log.
- HDFS-8226.
Blocker bug reported by J.Andreina and fixed by J.Andreina
Non-HA rollback compatibility broken
The non-HA rollback steps have changed. Run the rollback command on the namenode (`bin/hdfs namenode -rollback`) before starting the cluster with the `-rollback` option (`sbin/start-dfs.sh -rollback`).
- HDFS-8213.
Critical bug reported by Billie Rinaldi and fixed by Colin Patrick McCabe
DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
- HDFS-8204.
Minor improvement reported by Walter Su and fixed by Walter Su (balancer & mover)
Mover/Balancer should not schedule two replicas to the same DN
- HDFS-8179.
Blocker bug reported by Xiaoyu Yao and fixed by Xiaoyu Yao
DFSClient#getServerDefaults returns null within 1 hour of system start
- HDFS-8163.
Blocker bug reported by Arpit Agarwal and fixed by Arpit Agarwal (datanode)
Using monotonicNow for block report scheduling causes test failures on recently restarted systems
- HDFS-8153.
Major bug reported by Anu Engineer and fixed by Anu Engineer (namenode)
Error Message points to wrong parent directory in case of path component name length error
- HDFS-8151.
Minor bug reported by Sushmitha Sreenivasan and fixed by Jing Zhao (distcp)
Always use snapshot path as source when invalid snapshot names are used for diff based distcp
- HDFS-8149.
Major bug reported by Akira AJISAKA and fixed by Brahma Reddy Battula
The footer of the Web UI "Hadoop, 2014" is old
- HDFS-8147.
Major bug reported by Surendra Singh Lilhore and fixed by Surendra Singh Lilhore (balancer & mover)
Mover should not schedule two replicas to the same DN storage
- HDFS-8127.
Blocker bug reported by Jing Zhao and fixed by Jing Zhao (ha)
NameNode Failover during HA upgrade can cause DataNode to finalize upgrade
- HDFS-8091.
Major bug reported by Arun Suresh and fixed by Arun Suresh (HDFS)
ACLStatus and XAttributes not properly presented to INodeAttributesProvider before returning to client
- HDFS-8081.
Major bug reported by Konstantin Shvachko and fixed by Konstantin Shvachko
Split getAdditionalBlock() into two methods.
- HDFS-8070.
Blocker bug reported by Gopal V and fixed by Colin Patrick McCabe (caching)
Pre-HDFS-7915 DFSClient cannot use short circuit on post-HDFS-7915 DataNode
- HDFS-7980.
Major bug reported by Hui Zheng and fixed by Walter Su
Incremental BlockReport will dramatically slow down the startup of a namenode
- HDFS-7934.
Critical bug reported by J.Andreina and fixed by J.Andreina (documentation)
Update RollingUpgrade rollback documentation: should use bootstrapstandby for standby NN
- HDFS-7931.
Minor bug reported by Arun Suresh and fixed by Arun Suresh (hdfs-client)
DistributedFileSystem should not look for keyProvider in cache if Encryption is disabled
- HDFS-7916.
Critical bug reported by Vinayakumar B and fixed by Rushabh S Shah (datanode)
'reportBadBlocks' from datanodes to standby Node BPServiceActor goes into an infinite loop
- HDFS-7894.
Critical bug reported by Kihwal Lee and fixed by Brahma Reddy Battula
Rolling upgrade readiness is not updated in jmx until query command is issued.
- HDFS-7770.
Major improvement reported by Xiaoyu Yao and fixed by Xiaoyu Yao (documentation)
Need document for storage type label of data node storage locations under dfs.datanode.data.dir
- HDFS-7546.
Minor improvement reported by Harsh J and fixed by Harsh J (security)
Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern
- HDFS-7164.
Major sub-task reported by Arpit Agarwal and fixed by Arpit Agarwal (documentation)
Feature documentation for HDFS-6581
- HDFS-6300.
Critical bug reported by Rakesh R and fixed by Rakesh R (balancer & mover)
Prevent multiple balancers from running simultaneously
- HDFS-5215.
Major bug reported by Brahma Reddy Battula and fixed by Brahma Reddy Battula (datanode)
dfs.datanode.du.reserved is not considered while computing available space
- HDFS-4660.
Blocker bug reported by Peng Zhang and fixed by Kihwal Lee (datanode)
Block corruption can happen during pipeline recovery
- HADOOP-12103.
Minor bug reported by Yongjun Zhang and fixed by Yongjun Zhang (security)
Small refactoring of DelegationTokenAuthenticationFilter to allow code sharing
- HADOOP-12100.
Major bug reported by Robert Kanter and fixed by Bibin A Chundatt
ImmutableFsPermission should not override applyUmask since that method doesn't modify the FsPermission
- HADOOP-12078.
Critical bug reported by Arpit Agarwal and fixed by Arpit Agarwal (ipc)
The default retry policy does not handle RetriableException correctly
- HADOOP-12058.
Minor bug reported by Kazuho Fujii and fixed by Kazuho Fujii (documentation , site)
Fix dead links to DistCp and Hadoop Archives pages.
- HADOOP-11973.
Major bug reported by Gregory Chanan and fixed by Gregory Chanan (security)
Ensure ZkDelegationTokenSecretManager namespace znodes get created with ACLs
- HADOOP-11966.
Critical bug reported by Chris Nauroth and fixed by Chris Nauroth (scripts)
Variable cygwin is undefined in hadoop-config.sh when executed through hadoop-daemon.sh.
- HADOOP-11934.
Blocker bug reported by Mike Yoder and fixed by Larry McCay (security)
Use of JavaKeyStoreProvider in LdapGroupsMapping causes infinite loop
- HADOOP-11891.
Major bug reported by Arun Suresh and fixed by Arun Suresh (security)
OsSecureRandom should lazily fill its reservoir
- HADOOP-11872.
Minor bug reported by Varun Vasudev and fixed by Varun Vasudev (scripts)
"hadoop dfs" command prints message about using "yarn jar" on Windows(branch-2 only)
- HADOOP-11868.
Major bug reported by Chang Li and fixed by Chang Li
Invalid user logins trigger large backtraces in server log
- HADOOP-11851.
Minor improvement reported by Steve Loughran and fixed by Takenori Sato (fs/s3)
s3n to swallow IOEs on inner stream close
- HADOOP-11802.
Major bug reported by Eric Payne and fixed by Colin Patrick McCabe
DomainSocketWatcher thread terminates sometimes after there is an I/O error during requestShortCircuitShm
- HADOOP-11730.
Major bug reported by Takenori Sato and fixed by Takenori Sato (fs/s3)
Regression: s3n read failure recovery broken
- HADOOP-11663.
Minor bug reported by Masatake Iwasaki and fixed by Masatake Iwasaki (documentation)
Remove description about Java 6 from docs
- HADOOP-9658.
Major bug reported by Zhijie Shen and fixed by Zhijie Shen
SnappyCodec#checkNativeCodeLoaded may unexpectedly fail when native code is not loaded