Hadoop 1.2.1 Release Notes
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
Changes since Hadoop 1.2.0
Jiras with Release Notes (describe major or incompatible changes)
- MAPREDUCE-3859.
Major bug reported by sergeant and fixed by sergeant (capacity-sched)
CapacityScheduler incorrectly utilizes extra-resources of queue for high-memory jobs
Fixed wrong CapacityScheduler resource allocation for high memory consumption jobs
Other Jiras (describe bug fixes and minor changes)
- HADOOP-9504.
Critical bug reported by xieliang007 and fixed by xieliang007 (metrics)
MetricsDynamicMBeanBase has concurrency issues in createMBeanInfo
Please see HBASE-8416 for detailed information.
We need to synchronize calls to HashMap put(); otherwise they may lead to a spin loop.
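A minimal sketch of the kind of guard described above, assuming a registry that shares a plain HashMap across threads. The names AttributeRegistry, register and runConcurrently are hypothetical and are not taken from MetricsDynamicMBeanBase; the actual patch may take a different approach.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a class that caches attributes in a HashMap.
class AttributeRegistry {
    private final Map<String, Integer> attributes = new HashMap<>();

    // Every access to the shared HashMap goes through the same lock, so
    // concurrent put() calls cannot interleave while the table is resized.
    synchronized void register(String name, int value) {
        attributes.put(name, value);
    }

    synchronized int size() {
        return attributes.size();
    }

    // Registers perThread distinct keys from each of `threads` threads
    // and returns the final map size.
    static int runConcurrently(int threads, int perThread) {
        AttributeRegistry registry = new AttributeRegistry();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    registry.register("attr-" + id + "-" + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return registry.size();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 1000));
    }
}
```

Without the synchronized keyword on register, the same test can lose entries or, on older JDKs, hang inside put().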
- HADOOP-9665.
Critical bug reported by zjshen and fixed by zjshen
BlockDecompressorStream#decompress will throw EOFException instead of return -1 when EOF
BlockDecompressorStream#decompress ultimately calls rawReadInt, which throws EOFException instead of returning -1 when it encounters the end of a stream. decompress is in turn called by read. However, InputStream#read is supposed to return -1 rather than throw EOFException to indicate the end of a stream. This explains why in LineReader,
{code}
if (bufferPosn >= bufferLength) {
startPosn = bufferPosn = 0;
if (prevCharCR)
++bytesConsumed; //account for CR from ...
- HADOOP-9730.
Major bug reported by gkesavan and fixed by gkesavan (build)
fix hadoop.spec to add task-log4j.properties
- HDFS-4261.
Major bug reported by szetszwo and fixed by djp (balancer)
TestBalancerWithNodeGroup times out
When I manually ran TestBalancerWithNodeGroup, it always timed out on my machine. Looking at the Jenkins report [build #3573|https://builds.apache.org/job/PreCommit-HDFS-Build/3573//testReport/org.apache.hadoop.hdfs.server.balancer/], TestBalancerWithNodeGroup was somehow skipped, so the problem was not detected.
- HDFS-4581.
Major bug reported by rohit_kochar and fixed by rohit_kochar (datanode)
DataNode#checkDiskError should not be called on network errors
- HDFS-4699.
Major bug reported by cnauroth and fixed by cnauroth (test)
TestPipelinesFailover#testPipelineRecoveryStress fails sporadically
I have seen {{TestPipelinesFailover#testPipelineRecoveryStress}} fail sporadically due to timeout during {{loopRecoverLease}}, which waits for up to 30 seconds before timing out.
- HDFS-4880.
Major bug reported by arpitagarwal and fixed by sureshms (namenode)
Diagnostic logging while loading name/edits files
Add some minimal diagnostic logging to help determine location of the files being loaded.
- MAPREDUCE-4838.
Major improvement reported by acmurthy and fixed by zjshen
Add extra info to JH files
It will be useful to add more task-info to JH for analytics.
- MAPREDUCE-5148.
Major bug reported by yeshavora and fixed by acmurthy (tasktracker)
Syslog missing from Map/Reduce tasks
MAPREDUCE-4970 introduced an incompatible change that causes syslog to be missing from the tasktracker on old clusters that have only log4j.properties configured.
- MAPREDUCE-5206.
Minor bug reported by acmurthy and fixed by acmurthy
JT can show the same job multiple times in Retired Jobs section
JT can show the same job multiple times in Retired Jobs section since the RetireJobs thread has a bug which adds the same job multiple times to collection of retired jobs.
- MAPREDUCE-5256.
Major bug reported by vinodkv and fixed by vinodkv
CombineInputFormat isn't thread safe affecting HiveServer
This was originally fixed as part of MAPREDUCE-5038, but that change has since been reverted, which uncovers this issue and breaks HiveServer. Originally reported by [~thejas].
- MAPREDUCE-5260.
Major bug reported by zhaoyunjiong and fixed by zhaoyunjiong (tasktracker)
Job failed because of JvmManager running into inconsistent state
In our cluster, jobs failed because task initialization randomly failed when JvmManager ran into an inconsistent state, and the TaskTracker failed to exit:
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.getDetails(JvmManager.java:402)
at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:387)
at org.apache.hadoop....
- MAPREDUCE-5318.
Minor bug reported by bohou and fixed by bohou (jobtracker)
Ampersand in JSPUtil.java is not escaped
Malformed URLs cause Hue to crash. The malformed URLs are caused by the unescaped ampersand "&".
- MAPREDUCE-5351.
Critical bug reported by sandyr and fixed by sandyr (jobtracker)
JobTracker memory leak caused by CleanupQueue reopening FileSystem
When a job is completed, closeAllForUGI is called to close all the cached FileSystems in the FileSystem cache. However, the CleanupQueue may run after this occurs and call FileSystem.get() to delete the staging directory, adding a FileSystem to the cache that will never be closed.
People on the user-list have reported this causing their JobTrackers to OOME every two weeks.
- MAPREDUCE-5364.
Major bug reported by kkambatl and fixed by kkambatl
Deadlock between RenewalTimerTask methods cancel() and run()
MAPREDUCE-4860 introduced a local variable {{cancelled}} in {{RenewalTimerTask}} to fix the race where {{DelegationTokenRenewal}} attempts to renew a token even after the job is removed. However, the patch also makes {{run()}} and {{cancel()}} synchronized methods leading to a potential deadlock against {{run()}}'s catch-block (error-path).
The deadlock stacks below:
{noformat}
- org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$RenewalTimerTask.cancel() @bci=0, line=240 (I...
- MAPREDUCE-5368.
Major improvement reported by zhaoyunjiong and fixed by zhaoyunjiong (mrv1)
Save memory by set capacity, load factor and concurrency level for ConcurrentHashMap in TaskInProgress
Below is the heap histogram from our JobTracker:
num #instances #bytes class name
----------------------------------------------
1: 136048824 11347237456 [C
2: 124156992 5959535616 java.util.concurrent.locks.ReentrantLock$NonfairSync
3: 124156973 5959534704 java.util.concurrent.ConcurrentHashMap$Segment
4: 135887753 5435510120 java.lang.String
5: 124213692 3975044400 [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
6: 637...
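The title refers to the three-argument ConcurrentHashMap constructor. The sketch below shows that constructor only; the values 1, 0.75f and 1 are illustrative assumptions rather than the settings chosen in the patch, and SmallMapFactory is a hypothetical name. On the JDKs of the Hadoop 1.x era the default concurrency level is 16, and the map is split into lock-bearing segments, which is consistent with the Segment and NonfairSync counts in the histogram above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper; not part of TaskInProgress.
class SmallMapFactory {
    // Illustrative values only; the actual patch may use different ones.
    static final int INITIAL_CAPACITY = 1;
    static final float LOAD_FACTOR = 0.75f;
    static final int CONCURRENCY_LEVEL = 1;

    // Creates a map sized for a handful of entries and few concurrent writers.
    // The map still grows on demand if more entries are added.
    static <K, V> ConcurrentMap<K, V> newSmallMap() {
        return new ConcurrentHashMap<>(INITIAL_CAPACITY, LOAD_FACTOR, CONCURRENCY_LEVEL);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Integer> attempts = newSmallMap();
        attempts.put("attempt_0", 0);
        System.out.println(attempts);
    }
}
```

The trade-off is that a lower concurrency level reduces per-map overhead on segmented implementations but allows fewer simultaneous writers.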
- MAPREDUCE-5375.
Critical bug reported by venkatnrangan and fixed by venkatnrangan
Delegation Token renewal exception in jobtracker logs
Filing on behalf of [~venkatnrangan] who found this originally and provided a patch.
Saw this in the JT logs while oozie tests were running with Hadoop.
When Oozie java action is executed, the following shows up in the job tracker log.
{code}
ERROR org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal: Exception renewing tokenIdent: 00 07 68 64 70 75 73 65 72 06 6d 61 70 72 65 64 26 6f 6f 7a 69 65 2f 63 6f 6e 64 6f 72 2d 73 65 63 2e 76 65 6e 6b 61 74 2e 6f 72 67 40 76 65 6e 6b ...
Changes since Hadoop 1.1.2
Jiras with Release Notes (describe major or incompatible changes)
- HADOOP-7698.
Critical bug reported by daryn and fixed by daryn (build)
jsvc target fails on x86_64
The jsvc build target is now supported for Mac OSX and other platforms as well.
- HADOOP-8164.
Major sub-task reported by sureshms and fixed by daryn (fs)
Handle paths using back slash as path separator for windows only
This jira only allows providing paths using back slash as separator on Windows. The back slash on *nix system will be used as escape character. The support for paths using back slash as path separator will be removed in HADOOP-8139 in release 23.3.
- HADOOP-8817.
Major sub-task reported by djp and fixed by djp
Backport Network Topology Extension for Virtualization (HADOOP-8468) to branch-1
A new 4-layer network topology NetworkTopologyWithNodeGroup is available to make Hadoop more robust and efficient in virtualized environments.
- HADOOP-8971.
Major improvement reported by gopalv and fixed by gopalv (util)
Backport: hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data (HADOOP-8926)
Backport cache-aware improvements for PureJavaCrc32 from trunk (HADOOP-8926)
- HDFS-385.
Major improvement reported by dhruba and fixed by dhruba
Design a pluggable interface to place replicas of blocks in HDFS
New experimental API BlockPlacementPolicy allows investigating alternate rules for locating block replicas.
- HDFS-3697.
Minor improvement reported by tlipcon and fixed by tlipcon (datanode, performance)
Enable fadvise readahead by default
The datanode now performs 4MB readahead by default when reading data from its disks, if the native libraries are present. This has been shown to improve performance in many workloads. The feature may be disabled by setting dfs.datanode.readahead.bytes to "0".
- HDFS-4071.
Minor sub-task reported by jingzhao and fixed by jingzhao (datanode, namenode)
Add number of stale DataNodes to metrics for Branch-1
This jira adds a new metric named "StaleDataNodes", of type Gauge, under the metrics context "dfs". It tracks the number of DataNodes marked as stale. A DataNode is marked stale when the heartbeat message from the DataNode is not received within the configured time "dfs.namenode.stale.datanode.interval".
Please see the hdfs-default.xml documentation for "dfs.namenode.stale.datanode.interval" for more details on how to configure this feature. When this feature is not configured, this metric returns zero.
- HDFS-4122.
Major bug reported by sureshms and fixed by sureshms (datanode, hdfs-client, namenode)
Cleanup HDFS logs and reduce the size of logged messages
This jira changes the content of some of the log messages. No log messages are removed; only their content is changed to reduce the size. If you have a tool that depends on the exact content of the log, please look at the patch and make appropriate updates to the tool.
- HDFS-4320.
Major improvement reported by mostafae and fixed by mostafae (datanode, namenode)
Add a separate configuration for namenode rpc address instead of only using fs.default.name
The namenode RPC address is currently identified from the configuration "fs.default.name". In setups where the default FS is not HDFS, "fs.default.name" cannot be used to get the namenode address. With this change, when such a setup co-exists with HDFS, the namenode can be identified using a separate configuration parameter, "dfs.namenode.rpc-address".
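The lookup order can be sketched as follows. This is an illustration only: a plain Map stands in for the Hadoop Configuration object, NameNodeAddressLookup is a hypothetical class, and the bucket and host names are made up. Parsing the returned value into a host and port is omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; a Map stands in for the Hadoop Configuration object.
class NameNodeAddressLookup {
    static final String RPC_ADDRESS_KEY = "dfs.namenode.rpc-address";
    static final String DEFAULT_FS_KEY = "fs.default.name";

    // Prefers the dedicated RPC address key and falls back to the default FS.
    // Returns the raw configured value, or null when neither key is set.
    static String lookup(Map<String, String> conf) {
        String rpcAddress = conf.get(RPC_ADDRESS_KEY);
        if (rpcAddress != null && !rpcAddress.trim().isEmpty()) {
            return rpcAddress;
        }
        return conf.get(DEFAULT_FS_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DEFAULT_FS_KEY, "s3n://example-bucket");   // hypothetical non-HDFS default FS
        conf.put(RPC_ADDRESS_KEY, "nn1.example.com:8020");  // hypothetical namenode address
        System.out.println(lookup(conf));
    }
}
```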
"dfs.namenode.rpc-address", when configured, overrides fs.default.name for identifying namenode RPC address.
- HDFS-4337.
Major bug reported by djp and fixed by mgong@vmware.com (namenode)
Backport HDFS-4240 to branch-1: Make sure nodes are avoided to place replica if some replica are already under the same nodegroup.
Backport HDFS-4240 to branch-1
- HDFS-4350.
Major bug reported by andrew.wang and fixed by andrew.wang
Make enabling of stale marking on read and write paths independent
This patch makes an incompatible configuration change, as described below:
In releases 1.1.0 and other point releases 1.1.x, the configuration parameter "dfs.namenode.check.stale.datanode" could be used to turn on checking for the stale nodes. This configuration is no longer supported in release 1.2.0 onwards and is renamed as "dfs.namenode.avoid.read.stale.datanode".
How the feature works and how to configure it:
As described in the HDFS-3703 release notes, the datanode stale period can be configured using the parameter "dfs.namenode.stale.datanode.interval" in seconds (default value is 30 seconds). The NameNode can be configured to use this staleness information for reads using the configuration "dfs.namenode.avoid.read.stale.datanode". When this parameter is set to true, the namenode picks a stale datanode as the last target to read from when returning block locations for reads. Using staleness information for writes is described in the release notes of HDFS-3912.
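The "stale datanode as the last target" rule can be pictured as a stable sort on a staleness flag. The sketch below is a self-contained illustration, not NameNode code: SketchDataNode and StaleLastOrdering are hypothetical, and passing the interval in milliseconds is a choice made for this example only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a datanode; not a Hadoop class.
final class SketchDataNode {
    final String name;
    final long lastHeartbeatMillis;

    SketchDataNode(String name, long lastHeartbeatMillis) {
        this.name = name;
        this.lastHeartbeatMillis = lastHeartbeatMillis;
    }
}

class StaleLastOrdering {
    // A node is stale when no heartbeat arrived within the interval.
    static boolean isStale(SketchDataNode node, long nowMillis, long staleIntervalMillis) {
        return nowMillis - node.lastHeartbeatMillis > staleIntervalMillis;
    }

    // Returns node names with stale nodes moved to the end. List.sort is
    // stable, so the relative order within each group is preserved.
    static List<String> orderForRead(List<SketchDataNode> nodes, long nowMillis,
                                     long staleIntervalMillis) {
        List<SketchDataNode> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparing(
                (SketchDataNode n) -> isStale(n, nowMillis, staleIntervalMillis)));
        List<String> names = new ArrayList<>();
        for (SketchDataNode n : sorted) {
            names.add(n.name);
        }
        return names;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        List<SketchDataNode> nodes = new ArrayList<>();
        nodes.add(new SketchDataNode("dn1", now - 45_000L)); // stale with a 30 s interval
        nodes.add(new SketchDataNode("dn2", now - 1_000L));
        nodes.add(new SketchDataNode("dn3", now - 2_000L));
        System.out.println(orderForRead(nodes, now, 30_000L));
    }
}
```

In this example dn1 is moved behind dn2 and dn3, which keep their original relative order.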
- HDFS-4519.
Major bug reported by cnauroth and fixed by cnauroth (datanode, scripts)
Support override of jsvc binary and log file locations when launching secure datanode.
With this improvement the following options are available in release 1.2.0 and later on 1.x release stream:
1. jsvc location can be overridden by setting environment variable JSVC_HOME. Defaults to jsvc binary packaged within the Hadoop distro.
2. jsvc log output is directed to the file defined by JSVC_OUTFILE. Defaults to $HADOOP_LOG_DIR/jsvc.out.
3. jsvc error output is directed to the file defined by JSVC_ERRFILE. Defaults to $HADOOP_LOG_DIR/jsvc.err.
With this improvement the following options are available in release 2.0.4 and later on 2.x release stream:
1. jsvc log output is directed to the file defined by JSVC_OUTFILE. Defaults to $HADOOP_LOG_DIR/jsvc.out.
2. jsvc error output is directed to the file defined by JSVC_ERRFILE. Defaults to $HADOOP_LOG_DIR/jsvc.err.
For overriding the jsvc location on 2.x releases, here is the release note from HDFS-2303:
To run secure Datanodes users must install jsvc for their platform and set JSVC_HOME to point to the location of jsvc in their environment.
- MAPREDUCE-3678.
Major new feature reported by bejoyks and fixed by qwertymaniac (mrv1, mrv2)
The Map tasks logs should have the value of input split it processed
A map task's syslog now carries basic info on the InputSplit it processed.
- MAPREDUCE-4415.
Major improvement reported by qwertymaniac and fixed by qwertymaniac (mrv1)
Backport the Job.getInstance methods from MAPREDUCE-1505 to branch-1
Backported new APIs to get a Job object to 1.2.0 from 2.0.0. Job API static methods Job.getInstance(), Job.getInstance(Configuration) and Job.getInstance(Configuration, jobName) are now available across both releases to avoid porting pain.
- MAPREDUCE-4451.
Major bug reported by erik.fang and fixed by erik.fang (contrib/fair-share)
fairscheduler fail to init job with kerberos authentication configured
When using FairScheduler with security configured, job initialization fails. The problem is that threads in JobInitializer run as the RPC user instead of the jobtracker; pre-starting all the threads fixes this bug.
- MAPREDUCE-4565.
Major improvement reported by kkambatl and fixed by kkambatl
Backport MR-2855 to branch-1: ResourceBundle lookup during counter name resolution takes a lot of time
Passing a cached class-loader to ResourceBundle creator to minimize counter names lookup time.
- MAPREDUCE-4737.
Major bug reported by daijy and fixed by acmurthy
Hadoop does not close output file / does not call Mapper.cleanup if exception in map
Ensure that mapreduce APIs are semantically consistent with mapred API w.r.t Mapper.cleanup and Reducer.cleanup; in the sense that cleanup is now called even if there is an error. The old mapred API already ensures that Mapper.close and Reducer.close are invoked during error handling. Note that it is an incompatible change, however end-users can override Mapper.run and Reducer.run to get the old (inconsistent) behaviour.
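The behaviour change amounts to running cleanup in a finally block. The sketch below uses hypothetical stand-in classes (SketchMapper, LegacySketchMapper) rather than the Hadoop Mapper API, and shows how overriding run brings back the old behaviour in which cleanup is skipped after an error.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a mapper; not the Hadoop Mapper class.
abstract class SketchMapper {
    final List<String> events = new ArrayList<>();

    void setup() { events.add("setup"); }

    abstract void map(String record);

    void cleanup() { events.add("cleanup"); }

    // New behaviour: cleanup runs even when map() throws.
    void run(List<String> records) {
        setup();
        try {
            for (String record : records) {
                map(record);
            }
        } finally {
            cleanup();
        }
    }
}

// Overrides run() to restore the old behaviour: cleanup is skipped on error.
abstract class LegacySketchMapper extends SketchMapper {
    @Override
    void run(List<String> records) {
        setup();
        for (String record : records) {
            map(record);
        }
        cleanup();
    }
}

class CleanupDemo {
    public static void main(String[] args) {
        SketchMapper mapper = new SketchMapper() {
            @Override
            void map(String record) {
                if (record.equals("bad")) {
                    throw new IllegalStateException("cannot map " + record);
                }
                events.add("map:" + record);
            }
        };
        try {
            mapper.run(Arrays.asList("a", "bad", "b"));
        } catch (IllegalStateException e) {
            System.out.println(mapper.events); // cleanup is recorded despite the error
        }
    }
}
```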
Other Jiras (describe bug fixes and minor changes)
- HADOOP-6496.
Minor bug reported by lars_francke and fixed by ivanmi
HttpServer sends wrong content-type for CSS files (and others)
CSS files are sent as text/html, causing problems if the HTML page is rendered in standards mode. The HDFS interface, for example, still works because it is rendered in quirks mode; the HBase interface doesn't work because it is rendered in standards mode. See HBASE-2110 for more details.
I've had a quick look at HttpServer but I'm too unfamiliar with it to see the problem. I think this started happening with HADOOP-6441 which would lead me to believe that the filter is called for every request...
- HADOOP-7096.
Major improvement reported by ahmed.radwan and fixed by ahmed.radwan
Allow setting of end-of-record delimiter for TextInputFormat
The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:
It will be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). This is also a problem for other ...
- HADOOP-7101.
Blocker bug reported by tlipcon and fixed by tlipcon (security)
UserGroupInformation.getCurrentUser() fails when called from non-Hadoop JAAS context
If a Hadoop client is run from inside a container like Tomcat, and the current AccessControlContext has a Subject associated with it that is not created by Hadoop, then UserGroupInformation.getCurrentUser() will throw NoSuchElementException, since it assumes that any Subject will have a hadoop User principal.
- HADOOP-7688.
Major improvement reported by szetszwo and fixed by umamaheswararao
When a servlet filter throws an exception in init(..), the Jetty server failed silently.
When a servlet filter throws a ServletException in init(..), the exception is logged by Jetty but not re-thrown to the caller. As a result, the Jetty server fails silently.
- HADOOP-7754.
Major sub-task reported by tlipcon and fixed by tlipcon (native, performance)
Expose file descriptors from Hadoop-wrapped local FileSystems
In HADOOP-7714, we determined that using fadvise inside of the MapReduce shuffle can yield very good performance improvements. But many parts of the shuffle are FileSystem-agnostic and thus operate on FSDataInputStreams and RawLocalFileSystems. This JIRA is to figure out how to allow RawLocalFileSystem to expose its FileDescriptor object without unnecessarily polluting the public APIs.
- HADOOP-7827.
Trivial bug reported by davevr and fixed by davevr
jsp pages missing DOCTYPE
The various jsp pages in the UI are all missing a DOCTYPE declaration. This causes the pages to render incorrectly on some browsers, such as IE9. Every UI page should have a valid tag, such as <!DOCTYPE HTML>, as their first line. There are 31 files that need to be changed, all in the core\src\webapps tree.
- HADOOP-7836.
Minor bug reported by eli and fixed by daryn (ipc, test)
TestSaslRPC#testDigestAuthMethodHostBasedToken fails with hostname localhost.localdomain
TestSaslRPC#testDigestAuthMethodHostBasedToken fails on branch-1 on some hosts.
null expected:<localhost[]> but was:<localhost[.localdomain]>
junit.framework.ComparisonFailure: null expected:<localhost[]> but was:<localhost[.localdomain]>
null expected:<[localhost]> but was:<[eli-thinkpad]>
junit.framework.ComparisonFailure: null expected:<[localhost]> but was:<[eli-thinkpad]>
- HADOOP-7868.
Major bug reported by javacruft and fixed by scurrilous (native)
Hadoop native fails to compile when default linker option is -Wl,--as-needed
Recent releases of Ubuntu and Debian have switched to using --as-needed as default when linking binaries.
As a result the AC_COMPUTE_NEEDED_DSO fails to find the required DSO names during execution of configure resulting in a build failure.
Explicitly using "-Wl,--no-as-needed" in this macro when required resolves this issue.
See http://wiki.debian.org/ToolChain/DSOLinking for a few more details
- HADOOP-8023.
Critical new feature reported by tucu00 and fixed by tucu00 (conf)
Add unset() method to Configuration
HADOOP-7001 introduced the *Configuration.unset(String)* method.
MAPREDUCE-3727 requires that method in order to be back-ported.
This is required to fix an issue manifested when running MR/Hive/Sqoop jobs from Oozie, details are in MAPREDUCE-3727.
- HADOOP-8249.
Major bug reported by bcwalrus and fixed by tucu00 (security)
invalid hadoop-auth cookies should trigger authentication if info is avail before returning HTTP 401
WebHdfs gives out cookies. But when the client passes them back, it'd sometimes reject them and return a HTTP 401 instead. ("Sometimes" as in after a restart.) The interesting thing is that if the client doesn't pass the cookie back, WebHdfs will be totally happy.
The correct behaviour should be to ignore the cookie if it looks invalid, and attempt to proceed with the request handling.
I haven't tried HttpFs to see whether it handles restart better.
Reproducing it with curl:
{noformat}
###...
- HADOOP-8355.
Minor bug reported by tucu00 and fixed by tucu00 (security)
SPNEGO filter throws/logs exception when authentication fails
If the auth-token is NULL, the authenticator has not authenticated the request and has already issued an UNAUTHORIZED response, so there is no need to throw an exception and then immediately catch it and log it. The 'else throw' can be removed.
- HADOOP-8386.
Major bug reported by cberner and fixed by cberner (scripts)
hadoop script doesn't work if 'cd' prints to stdout (default behavior in Ubuntu)
If the 'hadoop' script is run as 'bin/hadoop' on a distro where the 'cd' command prints to stdout, the script will fail due to this line: 'bin=`cd "$bin"; pwd`'
Workaround: execute from the bin/ directory as './hadoop'
Fix: change that line to 'bin=`cd "$bin" > /dev/null; pwd`'
- HADOOP-8423.
Major bug reported by jason98 and fixed by tlipcon (io)
MapFile.Reader.get() crashes jvm or throws EOFException on Snappy or LZO block-compressed data
I am using Cloudera distribution cdh3u1.
When trying to check native codecs for better decompression performance such as Snappy or LZO, I ran into issues with random access using MapFile.Reader.get(key, value) method.
First call of MapFile.Reader.get() works but a second call fails.
Also I am getting different exceptions depending on number of entries in a map file.
With LzoCodec and 10 record file, jvm gets aborted.
At the same time the DefaultCodec works fine for all cases, as well as r...
- HADOOP-8460.
Major bug reported by revans2 and fixed by revans2 (documentation)
Document proper setting of HADOOP_PID_DIR and HADOOP_SECURE_DN_PID_DIR
We should document that in a properly set up cluster HADOOP_PID_DIR and HADOOP_SECURE_DN_PID_DIR should not point to /tmp, but should point to a directory that normal users do not have access to.
- HADOOP-8512.
Minor bug reported by tucu00 and fixed by tucu00 (security)
AuthenticatedURL should reset the Token when the server returns other than OK on authentication
Currently the token is not being reset, so AuthenticatedURL will keep sending the invalid token as a Cookie. There is no security concern with this; the main inconvenience is the logging being generated on the server side.
- HADOOP-8580.
Major bug reported by ekoontz and fixed by
ant compile-native fails with automake version 1.11.3
The following:
{code}
ant -d -v -DskipTests -Dcompile.native=true clean compile-native
{code}
works with GNU automake version 1.11.1, but fails with automake version 1.11.3.
Relevant lines of failure seem to be these:
{code}
[exec] make[1]: Leaving directory `/tmp/hadoop-common/build/native/Linux-amd64-64'
[exec] Current OS is Linux
[exec] Executing 'sh' with arguments:
[exec] '/tmp/hadoop-common/build/native/Linux-amd64-64/libtool'
[exec] '--mode=install'
[exec]...
- HADOOP-8586.
Major bug reported by eli and fixed by eli
Fixup a bunch of SPNEGO misspellings
SPNEGO is misspelled as "SPENGO" in a bunch of places.
- HADOOP-8587.
Minor bug reported by eli and fixed by eli (fs)
HarFileSystem access of harMetaCache isn't threadsafe
HarFileSystem's use of the static harMetaCache map is not threadsafe. Credit to Todd for pointing this out.
- HADOOP-8606.
Major bug reported by daryn and fixed by daryn (fs)
FileSystem.get may return the wrong filesystem
{{FileSystem.get(URI, conf)}} will return the default fs if the scheme is null, regardless of whether the authority is null too. This causes URIs of "//authority/path" to _always_ refer to "/path" on the default fs. To the user, this appears to "work" if the authority in the null-scheme URI matches the authority of the default fs. When the authorities don't match, the user is very surprised that the default fs is used.
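The distinction can be checked with java.net.URI alone. The sketch below is an assumption about the intended rule (fall back to the default fs only when both scheme and authority are absent); DefaultFsCheck is a hypothetical helper, the host names are made up, and the real FileSystem.get logic may handle further cases.

```java
import java.net.URI;

// Hypothetical helper; not part of FileSystem.
class DefaultFsCheck {
    // True only when the URI names neither a scheme nor an authority,
    // which is the one case where falling back to the default fs is safe.
    static boolean shouldUseDefaultFs(URI uri) {
        return uri.getScheme() == null && uri.getAuthority() == null;
    }

    public static void main(String[] args) {
        // "//authority/path" has a null scheme but a non-null authority.
        URI networkPath = URI.create("//nn2.example.com:8020/path");
        System.out.println(networkPath.getScheme());
        System.out.println(networkPath.getAuthority());
        System.out.println(shouldUseDefaultFs(networkPath));
        System.out.println(shouldUseDefaultFs(URI.create("/path")));
    }
}
```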
- HADOOP-8611.
Major bug reported by kihwal and fixed by robsparker (security)
Allow fall-back to the shell-based implementation when JNI-based users-group mapping fails
When the JNI-based users-group mapping is enabled, the process/command will fail if the native library, libhadoop.so, cannot be found. This mostly happens at client-side where users may use hadoop programmatically. Instead of failing, falling back to the shell-based implementation will be desirable. Depending on how the cluster is configured, use of the native netgroup mapping cannot be substituted by the shell-based default. For this reason, this behavior must be configurable with the default bein...
- HADOOP-8612.
Major bug reported by mattf and fixed by eli (fs)
Backport HADOOP-8599 to branch-1 (Non empty response when read beyond eof)
When FileSystem.getFileBlockLocations(file,start,len) is called with "start" argument equal to the file size, the response is not empty. See HADOOP-8599 for details and tiny patch.
- HADOOP-8613.
Critical bug reported by daryn and fixed by daryn
AbstractDelegationTokenIdentifier#getUser() should set token auth type
{{AbstractDelegationTokenIdentifier#getUser()}} returns the UGI associated with a token. The UGI's auth type will either be SIMPLE for non-proxy tokens, or PROXY (effective user) and SIMPLE (real user). Instead of SIMPLE, it needs to be TOKEN.
- HADOOP-8711.
Major improvement reported by brandonli and fixed by brandonli (ipc)
provide an option for IPC server users to avoid printing stack information for certain exceptions
Currently it's hard coded in the server that it doesn't print the exception stack for StandbyException.
Similarly, other components may have their own exceptions which don't need to save the stack trace in log. One example is HDFS-3817.
- HADOOP-8767.
Minor bug reported by surfercrs4 and fixed by surfercrs4 (bin)
secondary namenode on slave machines
When the default value for HADOOP_SLAVES is changed in hadoop-env.sh, starting HDFS (with start-dfs.sh) creates secondary namenodes on all the machines listed in conf/slaves instead of those in conf/masters.
- HADOOP-8781.
Major bug reported by tucu00 and fixed by tucu00 (scripts)
hadoop-config.sh should add JAVA_LIBRARY_PATH to LD_LIBRARY_PATH
Snappy SO fails to load properly if LD_LIBRARY_PATH does not include the path where snappy SO is. This is observed in setups that don't have an independent snappy installation (not installed by Hadoop)
- HADOOP-8786.
Major bug reported by tlipcon and fixed by tlipcon
HttpServer continues to start even if AuthenticationFilter fails to init
As seen in HDFS-3904, if the AuthenticationFilter fails to initialize, the web server will continue to start up. We need to check for context initialization errors after starting the server.
- HADOOP-8791.
Major bug reported by bdechoux and fixed by jingzhao (documentation)
rm "Only deletes non empty directory and files."
The documentation (1.0.3) is describing the opposite of what rm does.
It should be "Only delete files and empty directories."
With regards to file, the size of the file should not matter, should it?
OR I am totally misunderstanding the semantic of this command and I am not the only one.
- HADOOP-8819.
Major bug reported by brandonli and fixed by brandonli (fs)
Should use && instead of & in a few places in FTPFileSystem,FTPInputStream,S3InputStream,ViewFileSystem,ViewFs
Should use && instead of & in a few places in FTPFileSystem,FTPInputStream,S3InputStream,ViewFileSystem,ViewFs.
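A small illustration of why the operator matters for boolean conditions: & always evaluates both operands, while && skips the right-hand side when the left-hand side is false. ShortCircuitDemo is a hypothetical class and is not taken from the files named above.

```java
// Hypothetical demo class; not taken from the Hadoop sources.
class ShortCircuitDemo {
    // Non-short-circuit AND: both operands are always evaluated, so a null
    // argument still reaches s.isEmpty() and throws NullPointerException.
    static boolean hasTextEager(String s) {
        return s != null & !s.isEmpty();
    }

    // Short-circuit AND: the right-hand side is skipped when s is null.
    static boolean hasTextShortCircuit(String s) {
        return s != null && !s.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(hasTextShortCircuit(null));
        try {
            hasTextEager(null);
        } catch (NullPointerException e) {
            System.out.println("eager form threw NullPointerException");
        }
    }
}
```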
- HADOOP-8820.
Major new feature reported by djp and fixed by djp (net)
Backport HADOOP-8469 and HADOOP-8470: add "NodeGroup" layer in new NetworkTopology (also known as NetworkTopologyWithNodeGroup)
This patch backports HADOOP-8469 and HADOOP-8470 to branch-1 and includes:
1. Make NetworkTopology class pluggable for extension.
2. Implement a 4-layer NetworkTopology class (named NetworkTopologyWithNodeGroup) for use in virtualized environments (or other situations with an additional layer between host and rack).
- HADOOP-8832.
Major bug reported by brandonli and fixed by brandonli
backport serviceplugin to branch-1
The original patch was only partially back ported to branch-1. This JIRA is to back port the rest of it.
- HADOOP-8861.
Major bug reported by amareshwari and fixed by amareshwari (fs)
FSDataOutputStream.sync should call flush() if the underlying wrapped stream is not Syncable
Currently FSDataOutputStream.sync is a no-op if the wrapped stream is not Syncable. Instead it should call flush() if the wrapped stream is not syncable.
This behavior is already present in trunk, but branch-1 does not have this.
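The fallback can be sketched with JDK streams only. SketchSyncable is a hypothetical stand-in for Hadoop's Syncable interface and SyncOrFlushStream is not FSDataOutputStream; the sketch only shows the "sync if possible, otherwise flush" decision.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Hadoop's Syncable interface.
interface SketchSyncable {
    void sync() throws IOException;
}

// Hypothetical wrapper; not FSDataOutputStream.
class SyncOrFlushStream extends OutputStream {
    private final OutputStream wrapped;

    SyncOrFlushStream(OutputStream wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public void write(int b) throws IOException {
        wrapped.write(b);
    }

    @Override
    public void flush() throws IOException {
        wrapped.flush();
    }

    // Delegates to sync() when the wrapped stream supports it; otherwise
    // falls back to flush() instead of silently doing nothing.
    public void sync() throws IOException {
        if (wrapped instanceof SketchSyncable) {
            ((SketchSyncable) wrapped).sync();
        } else {
            wrapped.flush();
        }
    }

    // Writes data through a buffered, non-syncable stream, calls sync(),
    // and reports how many bytes have reached the underlying sink.
    static int bytesInSinkAfterSync(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        SyncOrFlushStream out = new SyncOrFlushStream(new BufferedOutputStream(sink));
        try {
            out.write(data);
            out.sync();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println(bytesInSinkAfterSync(new byte[] {1, 2, 3}));
    }
}
```

If sync() were a no-op for the buffered stream, the three bytes would stay in the buffer and the sink would remain empty.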
- HADOOP-8900.
Major bug reported by slavik_krassovsky and fixed by adi2
BuiltInGzipDecompressor throws IOException - stored gzip size doesn't match decompressed size
Encountered failure when processing large GZIP file
Gz: Failed in 1hrs, 13mins, 57sec with the error:
java.io.IOException: IO error in map input file hdfs://localhost:9000/Halo4/json_m/gz/NewFileCat.txt.gz
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:242)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.j...
- HADOOP-8917.
Major bug reported by arpitgupta and fixed by arpitgupta
add LOCALE.US to toLowerCase in SecurityUtil.replacePattern
Webhdfs and fsck use Locale.US in toLowerCase when getting the kerberos principal. We should do the same in replacePattern, as this method is used when service principals log in.
see https://issues.apache.org/jira/browse/HADOOP-8878?focusedCommentId=13472245&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13472245 for more details
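The reason a fixed locale matters: String.toLowerCase() without an argument uses the JVM default locale, and under a Turkish locale the letter "I" lowercases to the dotless "ı" (U+0131). The sketch below demonstrates this with the JDK only; LocaleLowerCaseDemo and the principal string are hypothetical.

```java
import java.util.Locale;

// Hypothetical demo class; not SecurityUtil.
class LocaleLowerCaseDemo {
    // Locale-independent result for ASCII names such as Kerberos principals.
    static String lowerUs(String s) {
        return s.toLowerCase(Locale.US);
    }

    // What toLowerCase() would produce on a JVM whose default locale is Turkish.
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr-TR"));
    }

    public static void main(String[] args) {
        String principal = "HIVE/HOST.EXAMPLE.COM"; // hypothetical principal
        System.out.println(lowerUs(principal));
        System.out.println(lowerUs(principal).equals(lowerTurkish(principal)));
    }
}
```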
- HADOOP-8931.
Trivial improvement reported by eli and fixed by eli
Add Java version to startup message
I often look at logs and have to track down the Java version they were run with; it would be useful if we logged this as part of the startup message.
- HADOOP-8951.
Minor improvement reported by stevel@apache.org and fixed by stevel@apache.org (util)
RunJar to fail with user-comprehensible error message if jar missing
When the RunJar JAR is missing or not a file, exit with a meaningful message.
- HADOOP-8963.
Trivial bug reported by billie.rinaldi and fixed by arpitgupta
CopyFromLocal doesn't always create user directory
When you use the command "hadoop fs -copyFromLocal filename ." before the /user/username directory has been created, the file is created with name /user/username instead of a directory being created with file /user/username/filename. The command "hadoop fs -copyFromLocal filename filename" works as expected, creating /user/username and /user/username/filename, and "hadoop fs -copyFromLocal filename ." works as expected if the /user/username directory already exists.
- HADOOP-8968.
Major improvement reported by tucu00 and fixed by tucu00
Add a flag to completely disable the worker version check
The current logic in the TaskTracker and the DataNode that allows a relaxed version check with the JobTracker and NameNode works only if the versions of Hadoop are exactly the same.
We should add a switch to disable version checking completely, to enable rolling upgrades between compatible versions (typically patch versions).
- HADOOP-8988.
Major new feature reported by jingzhao and fixed by jingzhao (conf)
Backport HADOOP-8343 to branch-1
Backport HADOOP-8343 to branch-1 so as to specifically control the authorization requirements for accessing /jmx, /metrics, and /conf in branch-1.
- HADOOP-9036.
Major bug reported by ivanmi and fixed by sureshms
TestSinkQueue.testConcurrentConsumers fails intermittently (Backports HADOOP-7292)
org.apache.hadoop.metrics2.impl.TestSinkQueue.testConcurrentConsumers
Error Message
should've thrown
Stacktrace
junit.framework.AssertionFailedError: should've thrown
at org.apache.hadoop.metrics2.impl.TestSinkQueue.shouldThrowCME(TestSinkQueue.java:229)
at org.apache.hadoop.metrics2.impl.TestSinkQueue.testConcurrentConsumers(TestSinkQueue.java:195)
Standard Output
2012-10-03 16:51:31,694 INFO impl.TestSinkQueue (TestSinkQueue.java:consume(243)) - sleeping
- HADOOP-9071.
Major improvement reported by gkesavan and fixed by gkesavan (build)
configure ivy log levels for resolve/retrieve
- HADOOP-9090.
Minor new feature reported by mostafae and fixed by mostafae (metrics)
Support on-demand publish of metrics
Updated description based on feedback:
We need to publish metrics from some short-lived processes, which is not well suited to the current metrics system implementation, which periodically publishes metrics asynchronously (a behavior that works great for long-lived processes). Of course I could write my own metrics system, but it seems like such a waste to rewrite all the awesome code currently in the MetricsSystemImpl and supporting classes.
The way this JIRA solves this pr...
- HADOOP-9095.
Minor bug reported by szetszwo and fixed by jingzhao (net)
TestNNThroughputBenchmark fails in branch-1
{noformat}
java.lang.StringIndexOutOfBoundsException: String index out of range: 0
at java.lang.String.charAt(String.java:686)
at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:539)
at org.apache.hadoop.net.NetUtils.normalizeHostNames(NetUtils.java:562)
at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:88)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1047)
...
at org...
- HADOOP-9098.
Blocker bug reported by tomwhite and fixed by arpitagarwal (build)
Add missing license headers
There are missing license headers in some source files (e.g. TestUnderReplicatedBlocks.java is one) according to the RAT report.
- HADOOP-9099.
Minor bug reported by ivanmi and fixed by ivanmi (test)
NetUtils.normalizeHostName fails on domains where UnknownHost resolves to an IP address
I just hit this failure. We should use some more unique string for "UnknownHost":
Testcase: testNormalizeHostName took 0.007 sec
FAILED
expected:<[65.53.5.181]> but was:<[UnknownHost]>
junit.framework.AssertionFailedError: expected:<[65.53.5.181]> but was:<[UnknownHost]>
at org.apache.hadoop.net.TestNetUtils.testNormalizeHostName(TestNetUtils.java:347)
Will post a patch in a bit.
- HADOOP-9124.
Minor bug reported by phunt and fixed by snihalani (io)
SortedMapWritable violates contract of Map interface for equals() and hashCode()
This issue is similar to HADOOP-7153. It was found when using MRUnit - see MRUNIT-158, specifically https://issues.apache.org/jira/browse/MRUNIT-158?focusedCommentId=13501985&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13501985
--
o.a.h.io.SortedMapWritable implements the java.util.Map interface, however it does not define an implementation of the equals() or hashCode() methods; instead the default implementations in java.lang.Object are used.
This violates...
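For illustration, content-based equals() and hashCode() can be sketched by delegating to the backing sorted map. The wrapper below is a hypothetical stand-in with String/Integer entries, not the actual SortedMapWritable code or the committed fix.

```java
import java.util.TreeMap;

/** Hypothetical wrapper around a sorted map; not the real SortedMapWritable. */
final class SortedWrapper {
    private final TreeMap<String, Integer> instance = new TreeMap<String, Integer>();

    Integer put(String key, Integer value) {
        return instance.put(key, value);
    }

    /** Two wrappers are equal when their backing maps hold the same entries. */
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof SortedWrapper)) {
            return false;
        }
        return instance.equals(((SortedWrapper) obj).instance);
    }

    /** Must agree with equals(): equal wrappers produce equal hash codes. */
    @Override
    public int hashCode() {
        return instance.hashCode();
    }
}
```

Without these overrides, two wrappers holding identical entries compare as unequal because Object#equals compares references.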
- HADOOP-9154.
Major bug reported by kkambatl and fixed by kkambatl (io)
SortedMapWritable#putAll() doesn't add key/value classes to the map
In the following code from {{SortedMapWritable}}, #putAll() doesn't add key/value classes to the class-id maps.
{code}
@Override
public Writable put(WritableComparable key, Writable value) {
addToMap(key.getClass());
addToMap(value.getClass());
return instance.put(key, value);
}
@Override
public void putAll(Map<? extends WritableComparable, ? extends Writable> t){
for (Map.Entry<? extends WritableComparable, ? extends Writable> e:
t.entrySet()) {
...
- HADOOP-9174.
Major test reported by arpitagarwal and fixed by arpitagarwal (test)
TestSecurityUtil fails on Open JDK 7
TestSecurityUtil.TestBuildTokenServiceSockAddr fails due to implicit dependency on the test case execution order.
Testcase: testBuildTokenServiceSockAddr took 0.003 sec
Caused an ERROR
expected:<[127.0.0.1]:123> but was:<[localhost]:123>
at org.apache.hadoop.security.TestSecurityUtil.testBuildTokenServiceSockAddr(TestSecurityUtil.java:133)
Similar bug exists in TestSecurityUtil.testBuildDTServiceName.
The root cause is that a helper routine (verifyAddress) used by some test cases has a ...
- HADOOP-9175.
Major test reported by arpitagarwal and fixed by arpitagarwal (test)
TestWritableName fails with Open JDK 7
TestWritableName.testAddName fails due to a test order execution dependency on testSetName.
java.io.IOException: WritableName can't load class: mystring
at org.apache.hadoop.io.WritableName.getClass(WritableName.java:73)
at org.apache.hadoop.io.TestWritableName.testAddName(TestWritableName.java:92)
Caused by: java.lang.ClassNotFoundException: mystring
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessCon...
- HADOOP-9179.
Major bug reported by brandonli and fixed by brandonli
TestFileSystem fails with open JDK7
This is a test order-dependency bug as pointed out in HADOOP-8390. This JIRA is to track the fix in branch-1.
- HADOOP-9191.
Major bug reported by arpitagarwal and fixed by arpitagarwal (test)
TestAccessControlList and TestJobHistoryConfig fail with JDK7
Individual test cases have dependencies on a specific order of execution and fail when the order is changed.
TestAccessControlList.testNetGroups relies on Groups being initialized with a hard-coded test class that subsequent test cases depend on.
TestJobHistoryConfig.testJobHistoryLogging fails to shutdown the MiniDFSCluster on exit.
- HADOOP-9253.
Major improvement reported by arpitgupta and fixed by arpitgupta
Capture ulimit info in the logs at service start time
The output of ulimit -a is helpful when debugging issues on the system.
- HADOOP-9349.
Major bug reported by sandyr and fixed by sandyr (tools)
Confusing output when running hadoop version from one hadoop installation when HADOOP_HOME points to another
Hadoop version X is downloaded to ~/hadoop-x, and Hadoop version Y is downloaded to ~/hadoop-y. HADOOP_HOME is set to hadoop-x. A user running hadoop-y/bin/hadoop might expect to be running the hadoop-y jars, but, because of HADOOP_HOME, will actually be running hadoop-x jars.
"hadoop version" could help clear this up a little by reporting the current HADOOP_HOME.
- HADOOP-9369.
Major bug reported by kkambatl and fixed by kkambatl (net)
DNS#reverseDns() can return hostname with . appended at the end
DNS#reverseDns uses javax.naming.InitialDirContext to do a reverse DNS lookup. This can sometimes return hostnames with a . at the end.
Saw this happen on hadoop-1: two nodes with tasktracker.dns.interface set to eth0
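One way to normalize such a result is to strip a single trailing dot before using the hostname. This is a minimal sketch; the helper name is hypothetical and this note does not describe the actual patch.

```java
/** Hypothetical helper for cleaning up reverse DNS results. */
final class HostNames {
    private HostNames() {}

    /** Returns the host without a trailing '.', or the input unchanged. */
    static String stripTrailingDot(String host) {
        if (host != null && host.endsWith(".")) {
            return host.substring(0, host.length() - 1);
        }
        return host;
    }
}
```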
- HADOOP-9375.
Trivial bug reported by teledriver and fixed by sureshms (test)
Port HADOOP-7290 to branch-1 to fix TestUserGroupInformation failure
Unit test failure in TestUserGroupInformation.testGetServerSideGroups. Port HADOOP-7290 to branch-1.1.
- HADOOP-9379.
Trivial improvement reported by arpitgupta and fixed by arpitgupta
capture the ulimit info after printing the log to the console
Based on the discussions in HADOOP-9253, people prefer that we don't print the ulimit info to the console but still keep it in the logs.
We just need to move the head statement before the code that captures the ulimit info.
- HADOOP-9434.
Minor improvement reported by carp84 and fixed by carp84 (bin)
Backport HADOOP-9267 to branch-1
Currently in Hadoop 1.1.2, if a user issues "bin/hadoop help" on the command line, it throws the exception below. We can improve this to print the usage message.
===============================================
Exception in thread "main" java.lang.NoClassDefFoundError: help
===============================================
This issue is already resolved in HADOOP-9267 in trunk, so we only need to backport it into branch-1
- HADOOP-9451.
Major bug reported by djp and fixed by djp (net)
Node with one topology layer should be handled as fault topology when NodeGroup layer is enabled
Currently, nodes with a one-layer topology are allowed to join a cluster that has the NodeGroup layer enabled, which causes some exception cases.
When the NodeGroup layer is enabled, the cluster should assume that a valid topology for each node has at least two layers (Rack/NodeGroup), so it should throw an exception when a one-layer node tries to join.
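The check described above can be sketched as counting the layers in a node's network location. The class name and the "/rack/nodegroup" path format are assumptions for illustration, not the code from the patch.

```java
/** Hypothetical validator for network locations such as "/rack1/nodegroup1". */
final class TopologyCheck {
    private TopologyCheck() {}

    /** Counts the non-empty path components of a network location. */
    static int layers(String networkLocation) {
        int count = 0;
        for (String part : networkLocation.split("/")) {
            if (!part.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    /** Rejects locations with fewer than two layers (Rack/NodeGroup). */
    static void validate(String networkLocation) {
        if (layers(networkLocation) < 2) {
            throw new IllegalArgumentException(
                "Invalid topology for NodeGroup layer: " + networkLocation);
        }
    }
}
```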
- HADOOP-9458.
Critical bug reported by szetszwo and fixed by szetszwo (ipc)
In branch-1, RPC.getProxy(..) may call proxy.getProtocolVersion(..) without retry
RPC.getProxy(..) may call proxy.getProtocolVersion(..) without retry even when client has specified retry in the conf.
- HADOOP-9467.
Major bug reported by cnauroth and fixed by cnauroth (metrics)
Metrics2 record filtering (.record.filter.include/exclude) does not filter by name
Filtering by record considers only the record's tag for filtering and not the record's name.
- HADOOP-9473.
Trivial bug reported by gmazza and fixed by (fs)
typo in FileUtil copy() method
typo:
{code}
Index: src/core/org/apache/hadoop/fs/FileUtil.java
===================================================================
--- src/core/org/apache/hadoop/fs/FileUtil.java (revision 1467295)
+++ src/core/org/apache/hadoop/fs/FileUtil.java (working copy)
@@ -178,7 +178,7 @@
// Check if dest is directory
if (!dstFS.exists(dst)) {
throw new IOException("`" + dst +"': specified destination directory " +
- "doest not exist");
+ ...
- HADOOP-9492.
Trivial bug reported by jingzhao and fixed by jingzhao (test)
Fix the typo in testConf.xml to make it consistent with FileUtil#copy()
HADOOP-9473 fixed a typo in FileUtil#copy(). We need to fix the same typo in testConf.xml accordingly. Otherwise TestCLI will fail in branch-1.
- HADOOP-9502.
Minor bug reported by rramya and fixed by szetszwo (fs)
chmod does not return error exit codes for some exceptions
When some dfs operations fail due to SnapshotAccessControlException, valid exit codes are not returned.
E.g:
{noformat}
-bash-4.1$ hadoop dfs -chmod -R 755 /user/foo/hdfs-snapshots/test0/.snapshot/s0
chmod: changing permissions of 'hdfs://<namenode>:8020/user/foo/hdfs-snapshots/test0/.snapshot/s0':org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotAccessControlException: Modification on read-only snapshot is disallowed
-bash-4.1$ echo $?
0
-bash-4.1$ hadoop dfs -chown -R hdfs:users ...
- HADOOP-9537.
Major bug reported by arpitagarwal and fixed by arpitagarwal (security)
Backport AIX patches to branch-1
Backport a couple of trivial JIRAs to branch-1.
HADOOP-9305 Add support for running the Hadoop client on 64-bit AIX
HADOOP-9283 Add support for running the Hadoop client on AIX
- HADOOP-9543.
Minor bug reported by szetszwo and fixed by szetszwo (test)
TestFsShellReturnCode may fail in branch-1
There is a hardcoded username "admin" in TestFsShellReturnCode. If "admin" does not exist in the local fs, the test may fail. Before HADOOP-9502, the failure of the command is ignored silently, i.e. the command returns success even if it indeed failed.
- HADOOP-9544.
Major bug reported by cnauroth and fixed by cnauroth (io)
backport UTF8 encoding fixes to branch-1
The trunk code has received numerous bug fixes related to UTF8 encoding. I recently observed a branch-1-based cluster fail to load its fsimage due to these bugs. I've confirmed that the bug fixes existing on trunk will resolve this, so I'd like to backport to branch-1.
- HDFS-1957.
Minor improvement reported by asrabkin and fixed by asrabkin (documentation)
Documentation for HFTP
There should be some documentation for HFTP.
- HDFS-2533.
Minor improvement reported by tlipcon and fixed by tlipcon (datanode, performance)
Remove needless synchronization on FSDataSet.getBlockFile
HDFS-1148 discusses lock contention issues in FSDataset. It provides a more comprehensive fix, converting it all to RWLocks, etc. This JIRA is for one very specific fix which gives a decent performance improvement for TestParallelRead: getBlockFile() currently holds the lock which is completely unnecessary.
- HDFS-2757.
Major bug reported by jdcryans and fixed by jdcryans
Cannot read a local block that's being written to when using the local read short circuit
When testing the tail'ing of a local file with the read short circuit on, I get:
{noformat}
2012-01-06 00:17:31,598 WARN org.apache.hadoop.hdfs.DFSClient: BlockReaderLocal requested with incorrect offset: Offset 0 and length 8230400 don't match block blk_-2842916025951313698_454072 ( blockLen 124 )
2012-01-06 00:17:31,598 WARN org.apache.hadoop.hdfs.DFSClient: BlockReaderLocal: Removing blk_-2842916025951313698_454072 from cache because local file /export4/jdcryans/dfs/data/blocksBeingWritt...
- HDFS-2827.
Major bug reported by umamaheswararao and fixed by umamaheswararao (namenode)
Cannot save namespace after renaming a directory above a file with an open lease
When I execute the following operations and wait for the checkpoint to complete:
fs.mkdirs(new Path("/test1"));
FSDataOutputStream create = fs.create(new Path("/test/abc.txt")); // don't close
fs.rename(new Path("/test/"), new Path("/test1/"));
Checkpointing fails with the following exception:
2012-01-23 15:03:14,204 ERROR namenode.FSImage (FSImage.java:run(795)) - Unable to save image for E:\HDFS-1623\hadoop-hdfs-project\hadoop-hdfs\build\test\data\dfs\name3
java.io.IOException: saveLease...
- HDFS-3163.
Trivial improvement reported by brandonli and fixed by brandonli (test)
TestHDFSCLI.testAll fails if the user name is not all lowercase
In the test resource file testHDFSConf.xml, the test comparators expect user name to be all lowercase.
If the user running the test has an uppercase letter in the username (e.g., Brandon instead of brandon), many RegexpComparator tests will fail. The following is one example:
{noformat}
<comparator>
<type>RegexpComparator</type>
<expected-output>^-rw-r--r--( )*1( )*[a-z]*( )*supergroup( )*0( )*[0-9]{4,}-[0-9]{2,}-[0-9]{2,} [0-9]{2,}:[0-9]{2,}( )*/file1</expected-output>
...
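The failure can be reproduced with a shortened form of the expected-output pattern. This is a sketch only: the sample listing line is made up, and the relaxed character class is one possible fix, not necessarily the one that was committed.

```java
import java.util.regex.Pattern;

/** Demonstrates why a lowercase-only user name class rejects "Brandon". */
final class UserNameRegexDemo {
    private UserNameRegexDemo() {}

    /** Prefix of the comparator pattern quoted above: user name must be [a-z]*. */
    static final String STRICT = "^-rw-r--r--( )*1( )*[a-z]*( )*supergroup";

    /** Assumed relaxation that also accepts uppercase letters and digits. */
    static final String RELAXED = "^-rw-r--r--( )*1( )*[a-zA-Z0-9]*( )*supergroup";

    /** Returns true if the regex is found in the given listing line. */
    static boolean matches(String regex, String line) {
        return Pattern.compile(regex).matcher(line).find();
    }
}
```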
- HDFS-3402.
Minor bug reported by benoyantony and fixed by benoyantony (scripts, security)
Fix hdfs scripts for secure datanodes
Starting a secure datanode produces the following error:
09/04/2012 12:09:30 2524 jsvc error: Invalid option -server
09/04/2012 12:09:30 2524 jsvc error: Cannot parse command line arguments
- HDFS-3479.
Major improvement reported by cmccabe and fixed by cmccabe
backport HDFS-3335 (check for edit log corruption at the end of the log) to branch-1
backport HDFS-3335 (check for edit log corruption at the end of the log) to branch-1
- HDFS-3515.
Major new feature reported by eli2 and fixed by eli (namenode)
Port HDFS-1457 to branch-1
Let's port HDFS-1457 (configuration option to enable limiting the transfer rate used when sending the image and edits for checkpointing) to branch-1.
- HDFS-3521.
Major improvement reported by szetszwo and fixed by szetszwo (namenode)
Allow namenode to tolerate edit log corruption
HDFS-3479 adds checking for edit log corruption. It uses a fixed UNCHECKED_REGION_LENGTH (=PREALLOCATION_LENGTH) so that the bytes at the end within that length are not checked. Instead of skipping those bytes, we should check everything and allow toleration.
- HDFS-3540.
Major bug reported by szetszwo and fixed by szetszwo (namenode)
Further improvement on recovery mode and edit log toleration in branch-1
*Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1. However, the recovery mode feature in branch-1 is dramatically different from the recovery mode in trunk since the edit log implementations in these two branches are different. For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not in trunk.
*Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy UNCHECKED_REGION_LENGTH and to tolerate edit log corruption.
There are overlaps between these two features....
- HDFS-3595.
Major bug reported by cmccabe and fixed by cmccabe (namenode)
TestEditLogLoading fails in branch-1
TestEditLogLoading currently fails in branch-1, with this error message:
{code}
Testcase: testDisplayRecentEditLogOpCodes took 1.965 sec
FAILED
error message contains opcodes message
junit.framework.AssertionFailedError: error message contains opcodes message
at org.apache.hadoop.hdfs.server.namenode.TestEditLogLoading.testDisplayRecentEditLogOpCodes(TestEditLogLoading.java:75)
{code}
- HDFS-3596.
Minor improvement reported by cmccabe and fixed by cmccabe
Improve FSEditLog pre-allocation in branch-1
Implement HDFS-3510 in branch-1. This will improve FSEditLog preallocation to decrease the incidence of corrupted logs after disk full conditions. (See HDFS-3510 for a longer description.)
- HDFS-3604.
Minor improvement reported by eli and fixed by eli
Add dfs.webhdfs.enabled to hdfs-default.xml
Let's add {{dfs.webhdfs.enabled}} to hdfs-default.xml.
- HDFS-3628.
Blocker bug reported by qwertymaniac and fixed by qwertymaniac (datanode, namenode)
The dfsadmin -setBalancerBandwidth command on branch-1 does not check for superuser privileges
The changes from HDFS-2202 for 0.20.x/1.x failed to add a checkSuperuserPrivilege() call, and hence any user (not only admins) can reset the balancer bandwidth across the cluster.
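The shape of the missing guard can be sketched as follows. The class and its fields are hypothetical stand-ins; only the idea of calling a superuser check before changing the bandwidth comes from the description above.

```java
/** Hypothetical admin endpoint; not the actual NameNode code. */
final class BalancerBandwidthAdmin {
    private final String superuser;
    private long bandwidth;

    BalancerBandwidthAdmin(String superuser, long initialBandwidth) {
        this.superuser = superuser;
        this.bandwidth = initialBandwidth;
    }

    /** Changes the bandwidth only after the caller passes the privilege check. */
    void setBalancerBandwidth(String caller, long bytesPerSecond) {
        checkSuperuserPrivilege(caller);
        this.bandwidth = bytesPerSecond;
    }

    long getBandwidth() {
        return bandwidth;
    }

    /** Throws if the caller is not the configured superuser. */
    private void checkSuperuserPrivilege(String caller) {
        if (!superuser.equals(caller)) {
            throw new SecurityException(
                "Superuser privilege is required, but caller is " + caller);
        }
    }
}
```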
- HDFS-3647.
Major improvement reported by hoffman60613 and fixed by qwertymaniac (datanode)
Backport HDFS-2868 (Add number of active transfer threads to the DataNode status) to branch-1
Not sure if this is in a newer version of Hadoop, but in CDH3u3 it isn't there.
There is a lot of mystery surrounding how large to set dfs.datanode.max.xcievers. Most people say to just up it to 4096, but given that exceeding this will cause an HBase RegionServer shutdown (see Lars' blog post here: http://www.larsgeorge.com/2012/03/hadoop-hbase-and-xceivers.html), it would be nice if we could expose the current count via the built-in metrics framework (most likely under dfs). In this way w...
- HDFS-3679.
Minor bug reported by cmeyerisi and fixed by cmeyerisi (fuse-dfs)
fuse_dfs notrash option sets usetrash
fuse_dfs sets usetrash option when the "notrash" flag is given. This is the exact opposite of the desired behavior. The "usetrash" flag sets usetrash as well, but this is correct. Here are the relevant lines from fuse_options.c, in latest HDFS HEAD[0]:
case KEY_USETRASH:
options.usetrash = 1;
break;
case KEY_NOTRASH:
options.usetrash = 1;
break;
This is a pretty trivial bug to fix. I'm not familiar with the process here, but I can attach a patch i...
- HDFS-3698.
Major bug reported by atm and fixed by atm (security)
TestHftpFileSystem is failing in branch-1 due to changed default secure port
This test is failing since the default secure port changed to the HTTP port upon the commit of HDFS-2617.
- HDFS-3754.
Major bug reported by eli and fixed by eli (datanode)
BlockSender doesn't shutdown ReadaheadPool threads
The BlockSender doesn't shut down the ReadaheadPool threads, so when tests are run with native libraries some tests fail (time out) because shutdown hangs waiting for the outstanding threads to exit.
- HDFS-3817.
Major improvement reported by brandonli and fixed by brandonli (namenode)
avoid printing stack information for SafeModeException
When NN is in safemode, any namespace change request could cause a SafeModeException to be thrown and logged in the server log, which can make the server side log grow very quickly.
The server side log can be more concise if only the exception and error message are printed, without the stack trace.
- HDFS-3819.
Minor improvement reported by jingzhao and fixed by jingzhao
Should check whether invalidate work percentage default value is not greater than 1.0f
In DFSUtil#getInvalidateWorkPctPerIteration we should also check that the configured value is not greater than 1.0f.
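A minimal sketch of such a range check. The class and method names are hypothetical, and the lower bound (greater than 0) is an assumption; the description above only states the upper bound of 1.0f.

```java
/** Hypothetical validator for the invalidate work percentage. */
final class InvalidateWorkPct {
    private InvalidateWorkPct() {}

    /** Returns pct if it lies in (0, 1.0f]; otherwise throws. NaN is rejected too. */
    static float validate(float pct) {
        if (!(pct > 0.0f && pct <= 1.0f)) {
            throw new IllegalArgumentException(
                "Invalidate work percentage must be in (0, 1.0f], got " + pct);
        }
        return pct;
    }
}
```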
- HDFS-3838.
Trivial improvement reported by brandonli and fixed by brandonli (namenode)
fix the typo in FSEditLog.java: isToterationEnabled should be isTolerationEnabled
- HDFS-3912.
Major sub-task reported by jingzhao and fixed by jingzhao
Detecting and avoiding stale datanodes for writing
1. Make stale timeout adaptive to the number of nodes marked stale in the cluster.
2. Consider having a separate configuration for write skipping the stale nodes.
- HDFS-3940.
Minor improvement reported by eli and fixed by sureshms
Add Gset#clear method and clear the block map when namenode is shutdown
Per HDFS-3936 it would be useful if GSet has a clear method so BM#close could clear out the LightWeightGSet.
- HDFS-3941.
Major new feature reported by djp and fixed by djp (namenode)
Backport HDFS-3498 and HDFS-3601: update replica placement policy for new added "NodeGroup" layer topology
With the additional "NodeGroup" layer enabled, the replica placement policy used in BlockPlacementPolicyWithNodeGroup is updated to the following rules:
0. No more than one replica is placed within a NodeGroup (*)
1. First replica on the local node.
2. Second and third replicas are placed together in the same rack, which is remote from the rack of the first replica.
3. Other replicas are placed on random nodes, with the restriction that no more than two replicas are placed in the same rack if there are enough racks.
Also, this patch abstract...
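Two of the rules above (at most one replica per NodeGroup, at most two per rack) can be checked with a short sketch. The "/rack/nodegroup/node" location format and the class name are assumptions for illustration; the check ignores the "if there are enough racks" condition and does not cover rules 1 and 2.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical checker for a simplified subset of the placement rules. */
final class NodeGroupPlacementCheck {
    private NodeGroupPlacementCheck() {}

    /** Increments the counter for key and returns the new value. */
    private static int increment(Map<String, Integer> counts, String key) {
        Integer old = counts.get(key);
        int updated = (old == null) ? 1 : old + 1;
        counts.put(key, updated);
        return updated;
    }

    /** Locations are assumed to look like "/rack/nodegroup/node". */
    static boolean satisfiesRules(List<String> replicaLocations) {
        Map<String, Integer> perRack = new HashMap<>();
        Map<String, Integer> perGroup = new HashMap<>();
        for (String location : replicaLocations) {
            String[] parts = location.split("/"); // ["", rack, nodegroup, node]
            String rack = parts[1];
            String group = rack + "/" + parts[2];
            if (increment(perGroup, group) > 1) {
                return false; // rule 0: one replica per NodeGroup
            }
            if (increment(perRack, rack) > 2) {
                return false; // rule 3: at most two replicas per rack
            }
        }
        return true;
    }
}
```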
- HDFS-3942.
Major new feature reported by djp and fixed by djp (balancer)
Backport HDFS-3495: Update balancer policy for Network Topology with additional 'NodeGroup' layer
This is the backport work for HDFS-3495 and HDFS-4234.
- HDFS-3961.
Major bug reported by jingzhao and fixed by jingzhao
FSEditLog preallocate() needs to reset the position of PREALLOCATE_BUFFER when more than 1MB size is needed
In the new preallocate() function, when the required size is larger than 1MB, we need to reset the position of PREALLOCATION_BUFFER each time 1MB has been allocated. Otherwise, it seems only 1MB can be allocated even if more than 1MB is needed.
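The need for the reset can be sketched with a reusable fill buffer: after one full write the buffer has no remaining bytes, so its position must be set back to 0 before it is written again. The class and method below are hypothetical and use a generic channel rather than the FSEditLog code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

/** Hypothetical sketch of preallocating by repeatedly writing one fill buffer. */
final class PreallocateSketch {
    private PreallocateSketch() {}

    /** Writes whole copies of fill until at least need bytes have been written. */
    static long preallocate(WritableByteChannel channel, ByteBuffer fill, long need) {
        long written = 0;
        while (written < need) {
            fill.position(0); // reset, otherwise the buffer stays drained
            while (fill.hasRemaining()) {
                try {
                    written += channel.write(fill);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        }
        return written;
    }
}
```

Because whole buffers are written, the result is rounded up to a multiple of the buffer size.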
- HDFS-3963.
Major bug reported by brandonli and fixed by brandonli
backport namenode/datanode serviceplugin to branch-1
backport namenode/datanode serviceplugin to branch-1
- HDFS-4057.
Minor improvement reported by brandonli and fixed by brandonli (namenode)
NameNode.namesystem should be private. Use getNamesystem() instead.
NameNode.namesystem should be private. One should use NameNode.getNamesystem() to get it instead.
- HDFS-4062.
Minor improvement reported by jingzhao and fixed by jingzhao
In branch-1, FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock should print logs outside of the namesystem lock
Similar to HDFS-4052 for trunk, both FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock in branch-1 should print long info-level log messages outside of the namesystem lock. We created this separate JIRA because the description and code are different for 1.x.
- HDFS-4072.
Minor bug reported by jingzhao and fixed by jingzhao (namenode)
On file deletion remove corresponding blocks pending replication
Currently when deleting a file, blockManager does not remove the records corresponding to the file's blocks from pendingReplications. These records can only be removed after a timeout (5~10 min).
- HDFS-4168.
Major bug reported by szetszwo and fixed by jingzhao (namenode)
TestDFSUpgradeFromImage fails in branch-1
{noformat}
java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:2212)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removePathAndBlocks(FSNamesystem.java:2225)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedDelete(FSDirectory.java:645)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:833)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1024)
...
- HDFS-4180.
Minor bug reported by szetszwo and fixed by jingzhao (test)
TestFileCreation fails in branch-1 but not branch-1.1
{noformat}
Testcase: testFileCreation took 3.419 sec
Caused an ERROR
java.io.IOException: Cannot create /test_dir; already exists as a directory
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1374)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1334)
...
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot create /test_dir; already e...
- HDFS-4207.
Minor bug reported by stevel@apache.org and fixed by jingzhao (hdfs-client)
All hadoop fs operations fail if the default fs is down even if a different file system is specified in the command
you can't do any {{hadoop fs}} commands against any hadoop filesystem (e.g, s3://, a remote hdfs://, webhdfs://) if the default FS of the client is offline. Only operations that need the local fs should be expected to fail in this situation
- HDFS-4219.
Major new feature reported by arpitgupta and fixed by arpitgupta
Port slive to branch-1
Originally it was committed in HDFS-708 and MAPREDUCE-1804
- HDFS-4222.
Minor bug reported by teledriver and fixed by teledriver (namenode)
NN is unresponsive and loses heartbeats of DNs when Hadoop is configured to use LDAP and LDAP has issues
For Hadoop clusters configured to access directory information via LDAP, the FSNamesystem calls made on behalf of DFS clients might hang due to LDAP issues (including LDAP access issues caused by networking issues) while holding the single FSNamesystem lock. That results in the NN becoming unresponsive and losing heartbeats from DNs.
LDAP is accessed by FSNamesystem calls during the instantiation of FSPermissionChecker, which could be moved out of the lock scope since the instantiation...
- HDFS-4256.
Major test reported by sureshms and fixed by sanjay.radia (namenode)
Backport concatenation of files into a single file to branch-1
HDFS-222 added support for concatenating multiple files in a directory into a single file. This helps several use cases where writes can be parallelized, and several folks have expressed interest in this functionality.
This JIRA intends to make changes equivalent to HDFS-222 in branch-1 so that they are available in release 1.2.0.
- HDFS-4351.
Major bug reported by andrew.wang and fixed by andrew.wang (namenode)
Fix BlockPlacementPolicyDefault#chooseTarget when avoiding stale nodes
There's a bug in {{BlockPlacementPolicyDefault#chooseTarget}} with stale node avoidance enabled (HDFS-3912). If a NotEnoughReplicasException is thrown in the call to {{chooseRandom()}}, {{numOfReplicas}} is not updated together with the partial result in {{result}} since it is passed by value. The retry call to {{chooseTarget}} then uses this incorrect value.
This can be seen if you enable stale node detection for {{TestReplicationPolicy#testChooseTargetWithMoreThanAvaiableNodes()}}.
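The pass-by-value pitfall can be shown in isolation. The methods below are hypothetical and only mirror the shape of the problem: the list argument is shared with the caller, while the int counter is a copy. Returning the updated count is one possible remedy, not necessarily the committed fix.

```java
import java.util.List;

/** Hypothetical illustration of a counter lost through pass-by-value. */
final class PassByValueDemo {
    private PassByValueDemo() {}

    /** Buggy shape: the decrement is applied to a local copy only. */
    static void chooseBuggy(int numOfReplicas, List<String> result) {
        result.add("node-1"); // visible to the caller (shared list)
        numOfReplicas--;      // lost when the method returns
    }

    /** One possible remedy: hand the updated count back to the caller. */
    static int chooseFixed(int numOfReplicas, List<String> result) {
        result.add("node-1");
        return numOfReplicas - 1;
    }
}
```

After chooseBuggy, the caller sees one more entry in the list but an unchanged counter, so a retry would ask for too many replicas.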
- HDFS-4355.
Major bug reported by brandonli and fixed by brandonli (test)
TestNameNodeMetrics.testCorruptBlock fails with open JDK7
Argument(s) are different! Wanted:
metricsRecordBuilder.addGauge(
"CorruptBlocks",
<any>,
1
);
-> at org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:96)
Actual invocation has different arguments:
metricsRecordBuilder.addGauge(
"FilesTotal",
"",
4
);
-> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getMetrics(FSNamesystem.java:5818)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsse...
- HDFS-4358.
Major bug reported by arpitagarwal and fixed by arpitagarwal (test)
TestCheckpoint failure with JDK7
testMultipleSecondaryNameNodes doesn't shut down the SecondaryNameNode, which causes testCheckpoint to fail.
Testcase: testCheckpoint took 2.736 sec
Caused an ERROR
Cannot lock storage C:\hdp1-2\build\test\data\dfs\namesecondary1. The directory is already locked.
java.io.IOException: Cannot lock storage C:\hdp1-2\build\test\data\dfs\namesecondary1. The directory is already locked.
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:602)
at org.apache.hadoop.hd...
- HDFS-4413.
Major bug reported by mostafae and fixed by mostafae (namenode)
Secondary namenode won't start if HDFS isn't the default file system
If HDFS is not the default file system (fs.default.name is something other than hdfs://...), then secondary namenode throws early on in its initialization. This is a needless check as far as I can tell, and blocks scenarios where HDFS services are up but HDFS is not the default file system.
- HDFS-4444.
Trivial bug reported by schu and fixed by schu
Add space between total transaction time and number of transactions in FSEditLog#printStatistics
Currently, when we log statistics, we see something like
{code}
13/01/25 23:16:59 INFO namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
{code}
Notice that a space is needed between the value for total transaction time and "Number of transactions batched in Syncs".
FSEditLog#printStatistics:
{code}
private void printStatistics(boolean force) {
long now = now();
if (...
- HDFS-4466.
Major bug reported by brandonli and fixed by brandonli (namenode, security)
Remove the deadlock from AbstractDelegationTokenSecretManager
In HDFS-3374, new synchronization in AbstractDelegationTokenSecretManager.ExpiredTokenRemover was added to make sure the ExpiredTokenRemover thread can be interrupted in time. Otherwise TestDelegation fails intermittently because the MiniDFSCluster thread could be shut down before the tokenRemover thread.
However, as Todd pointed out in HDFS-3374, a potential deadlock was introduced by its patch:
{quote}
* FSNamesystem.saveNamespace (holding FSN lock) calls DTSM.saveSecretManagerState (which ...
- HDFS-4479.
Major bug reported by jingzhao and fixed by jingzhao
logSync() with the FSNamesystem lock held in commitBlockSynchronization
In FSNamesystem#commitBlockSynchronization of branch-1, logSync() may be called while the FSNamesystem lock is held. Similar to HDFS-4186, this may cause performance issues.
The following issue was observed in a cluster that was running a Hive job and was writing to 100,000 temporary files (each task is writing to 1000s of files). When this job is killed, a large number of files are left open for write. Eventually when the lease for open files expires, lease recovery is started for all th...
- HDFS-4518.
Major bug reported by arpitagarwal and fixed by arpitagarwal
Finer grained metrics for HDFS capacity
Namenode should export disk usage metrics in bytes via FSNamesystemMetrics.
- HDFS-4544.
Major bug reported by amareshwari and fixed by arpitagarwal
Error in deleting blocks should not do check disk, for all types of errors
The following code in Datanode.java
{noformat}
try {
if (blockScanner != null) {
blockScanner.deleteBlocks(toDelete);
}
data.invalidate(toDelete);
} catch(IOException e) {
checkDiskError();
throw e;
}
{noformat}
causes check disk to happen in case of any errors during invalidate.
We have seen errors like :
2013-03-02 00:08:28,849 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected error trying to delete bloc...
- HDFS-4551.
Major improvement reported by mwagner and fixed by mwagner (webhdfs)
Change WebHDFS buffersize behavior to improve default performance
Currently on 1.X branch, the buffer size used to copy bytes to network defaults to io.file.buffer.size. This causes performance problems if that buffersize is large.
- HDFS-4558.
Critical bug reported by gujilangzi and fixed by djp (balancer)
start balancer failed with NPE
start balancer failed with NPE
Filing this issue so that QE and dev can track it and take a look.
balancer.log:
2013-03-06 00:19:55,174 ERROR org.apache.hadoop.hdfs.server.balancer.Balancer: java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.BlockPlacementPolicy.getInstance(BlockPlacementPolicy.java:165)
at org.apache.hadoop.hdfs.server.balancer.Balancer.checkReplicationPolicyCompatibility(Balancer.java:799)
at org.apache.hadoop.hdfs.server.balancer.Balancer.<init>(Balancer.java:...
- HDFS-4597.
Major new feature reported by szetszwo and fixed by szetszwo (webhdfs)
Backport WebHDFS concat to branch-1
HDFS-3598 adds concat to WebHDFS. Let's also add it to branch-1.
- HDFS-4635.
Major improvement reported by sureshms and fixed by sureshms (namenode)
Move BlockManager#computeCapacity to LightWeightGSet
The computeCapacity in BlockManager that calculates the LightWeightGSet capacity as the percentage of total JVM memory should be moved to LightWeightGSet. This helps in other maps that are based on the GSet to make use of the same functionality.
- HDFS-4651.
Major improvement reported by cnauroth and fixed by cnauroth (tools)
Offline Image Viewer backport to branch-1
This issue tracks backporting the Offline Image Viewer tool to branch-1.
- HDFS-4715.
Major bug reported by szetszwo and fixed by mwagner (webhdfs)
Backport HDFS-3577 and other related WebHDFS issue to branch-1
The related JIRAs are HDFS-3577, HDFS-3318, and HDFS-3788. Backporting them can fix some WebHDFS performance issues in branch-1.
- HDFS-4774.
Major new feature reported by yuzhihong@gmail.com and fixed by yuzhihong@gmail.com (hdfs-client, namenode)
Backport HDFS-4525 'Provide an API for knowing whether file is closed or not' to branch-1
HDFS-4525 complements the lease recovery API by allowing users to know whether the recovery has completed.
This JIRA backports the API to branch-1.
- HDFS-4776.
Minor new feature reported by szetszwo and fixed by szetszwo (namenode)
Backport SecondaryNameNode web ui to branch-1
The related JIRAs are
- HADOOP-3741: SecondaryNameNode has http server on dfs.secondary.http.address but without any contents
- HDFS-1728: SecondaryNameNode.checkpointSize is in byte but not MB.
- MAPREDUCE-461.
Minor new feature reported by fhedberg and fixed by fhedberg
Enable ServicePlugins for the JobTracker
Allow ServicePlugins (see HADOOP-5257) for the JobTracker.
- MAPREDUCE-987.
Minor new feature reported by philip and fixed by ahmed.radwan (build, test)
Exposing MiniDFS and MiniMR clusters as a single process command-line
It's hard to test non-Java programs that rely on significant mapreduce functionality. The patch I'm proposing shortly will let you just type "bin/hadoop jar hadoop-hdfs-hdfswithmr-test.jar minicluster" to start a cluster (internally, it's using Mini{MR,HDFS}Cluster) with a specified number of daemons, etc. A test that checks how some external process interacts with Hadoop might start minicluster as a subprocess, run through its thing, and then simply kill the java subprocess.
I've been usi...
- MAPREDUCE-1684.
Major bug reported by amareshwari and fixed by knoguchi (capacity-sched)
ClusterStatus can be cached in CapacityTaskScheduler.assignTasks()
Currently, CapacityTaskScheduler.assignTasks() calls getClusterStatus() thrice: once in assignTasks(), once in MapTaskScheduler and once in ReduceTaskScheduler. It can be cached in assignTasks() and re-used.
- MAPREDUCE-1806.
Major bug reported by pauly and fixed by jira.shegalov (harchive)
CombineFileInputFormat does not work with paths not on default FS
In generating the splits in CombineFileInputFormat, the scheme and authority are stripped out. This creates problems when trying to access the files while generating the splits, as without the har:/, the file won't be accessed through the HarFileSystem.
- MAPREDUCE-2217.
Major bug reported by schen and fixed by kkambatl (jobtracker)
The expire launching task should cover the UNASSIGNED task
The ExpireLaunchingTask thread kills tasks that have been scheduled but have not responded.
Currently, if a task is scheduled on a tasktracker and for some reason the tasktracker cannot move it to RUNNING,
the task will just hang in the UNASSIGNED status and the JobTracker will keep waiting for it.
JobTracker.ExpireLaunchingTask should be able to kill this task.
- MAPREDUCE-2264.
Major bug reported by akramer and fixed by devaraj.k (jobtracker)
Job status exceeds 100% in some cases
I'm looking now at my jobtracker's list of running reduce tasks. One of them is 120.05% complete, the other is 107.28% complete.
I understand that these numbers are estimates, but there is no case in which an estimate of 100% for a non-complete task is better than an estimate of 99.99%, nor is there any case in which an estimate greater than 100% is valid.
I suggest that whatever logic is computing these set 99.99% as a hard maximum.
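The hard maximum suggested in this report can be sketched as a small clamp. The class and method names below are hypothetical and are not taken from the Hadoop source; this illustrates the proposal only, not necessarily the change that was committed.

```java
// Hypothetical helper illustrating the proposed cap; not part of Hadoop.
final class ProgressClamp {
    // Highest value reported while a task is still running (99.99%).
    static final float MAX_RUNNING_PROGRESS = 0.9999f;

    // Returns a progress estimate in [0, 1]; only a completed task reports 1.0.
    static float clamp(float estimate, boolean complete) {
        if (complete) {
            return 1.0f;
        }
        if (Float.isNaN(estimate) || estimate < 0.0f) {
            return 0.0f;
        }
        return Math.min(estimate, MAX_RUNNING_PROGRESS);
    }
}
```

With this sketch, an estimate of 1.2005 for a running reduce task would be reported as 0.9999, and 1.0 is reserved for tasks that have actually finished.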
- MAPREDUCE-2289.
Major bug reported by tlipcon and fixed by ahmed.radwan (job submission)
Permissions race can make getStagingDir fail on local filesystem
I've observed the following race condition in TestFairSchedulerSystem which uses a MiniMRCluster on top of RawLocalFileSystem:
- two threads call getStagingDir at the same time
- Thread A checks fs.exists(stagingArea) and sees false
-- Calls mkdirs(stagingArea, JOB_DIR_PERMISSIONS)
--- mkdirs calls the Java mkdir API which makes the file with umask-based permissions
- Thread B runs, checks fs.exists(stagingArea) and sees true
-- checks permissions, sees the default permissions, and throws IOE...
- MAPREDUCE-2770.
Trivial improvement reported by eli and fixed by sandyr (documentation)
Improve hadoop.job.history.location doc in mapred-default.xml
The documentation for hadoop.job.history.location in mapred-default.xml should indicate that this parameter can be a URI on any file system that Hadoop supports (e.g., hdfs and file).
- MAPREDUCE-2931.
Major improvement reported by forest520 and fixed by sandyr
CLONE - LocalJobRunner should support parallel mapper execution
The LocalJobRunner currently supports only a single execution thread. Given the prevalence of multi-core CPUs, it makes sense to allow users to run multiple tasks in parallel for improved performance on small (local-only) jobs.
MAPREDUCE-1367 needs to be backported to the Hadoop 0.20.x line, and MAPREDUCE-434 should be submitted along with it.
- MAPREDUCE-3727.
Critical bug reported by tucu00 and fixed by tucu00 (security)
jobtoken location property in jobconf refers to wrong jobtoken file
Oozie launcher job (for MR/Pig/Hive/Sqoop action) reads the location of the jobtoken file from the *HADOOP_TOKEN_FILE_LOCATION* ENV var and seeds it as the *mapreduce.job.credentials.binary* property in the jobconf that will be used to launch the real (MR/Pig/Hive/Sqoop) job.
The MR/Pig/Hive/Sqoop submission code (via Hadoop job submission) correctly uses the injected *mapreduce.job.credentials.binary* property to load the credentials and submit their MR jobs.
The problem is that the *mapre...
- MAPREDUCE-3993.
Major bug reported by tlipcon and fixed by kkambatl (mrv1, mrv2)
Graceful handling of codec errors during decompression
When using a compression codec for intermediate compression, some cases of corrupt data can cause the codec to throw exceptions other than IOException (eg java.lang.InternalError). This will currently cause the whole reduce task to fail, instead of simply treating it like another case of a failed fetch.
- MAPREDUCE-4036.
Major bug reported by tucu00 and fixed by tucu00 (test)
Streaming TestUlimit fails on CentOS 6
CentOS 6 seems to have higher memory requirements than other distros, which, together with the new MALLOC library, makes TestUlimit fail with exit status 134.
- MAPREDUCE-4195.
Critical bug reported by jira.shegalov and fixed by (jobtracker)
With invalid queueName request param, jobqueue_details.jsp shows NPE
When you access /jobqueue_details.jsp manually, instead of via a link, it has queueName set to null internally and this goes for a lookup into the scheduling info maps as well.
As a result, if using FairScheduler, a Pool with String name = null gets created and this brings the scheduler down. I have not tested what happens to the CapacityScheduler, but ideally if no queueName is set in that jsp, it should fall back to 'default'. Otherwise, this brings down the JobTracker completely.
FairSch...
- MAPREDUCE-4278.
Major bug reported by araceli and fixed by sandyr
cannot run two local jobs in parallel from the same gateway.
I cannot run two local mode jobs from Pig in parallel from the same gateway, even though this is a typical use case. If I re-run the tests sequentially, the tests pass. This seems to be a problem in Hadoop.
Additionally, the Pig harness expects to be able to run Pig-version-undertest against Pig-version-stable from the same gateway.
To replicate the error:
I have two clusters running from the same gateway.
If I run the Pig regression suites nightly.conf in local mode in parallel - once on each...
- MAPREDUCE-4315.
Major bug reported by alo.alt and fixed by sandyr (jobhistoryserver)
jobhistory.jsp throws 500 when a .txt file is found in /done
If a .txt file is located in /done, the parser throws a 500 error.
Trace:
java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.mapred.jobhistory_jsp$2.compare(jobhistory_jsp.java:295)
at org.apache.hadoop.mapred.jobhistory_jsp$2.compare(jobhistory_jsp.java:279)
at java.util.Arrays.mergeSort(Arrays.java:1270)
at java.util.Arrays.mergeSort(Arrays.java:1282)
at java.util.Arrays.mergeSort(Arrays.java:1282)
at java.util.Arrays.mergeSort(Arra...
- MAPREDUCE-4317.
Major bug reported by qwertymaniac and fixed by kkambatl (mrv1)
Job view ACL checks are too permissive
The class that does view-based checks, JSPUtil.JobWithViewAccessCheck, has the following internal member:
{code}private boolean isViewAllowed = true;{code}
Note that it is true.
Now, the method that sets the proper view-allowed rights has:
{code}
if (user != null && job != null && jt.areACLsEnabled()) {
final UserGroupInformation ugi =
UserGroupInformation.createRemoteUser(user);
try {
ugi.doAs(new PrivilegedExceptionAction<Void>() {
public Void run() t...
- MAPREDUCE-4355.
Major new feature reported by kkambatl and fixed by kkambatl (mrv1, mrv2)
Add RunningJob.getJobStatus()
Use case: Read the start/end-time of a particular job.
Currently, one has to call JobClient.getAllJobStatuses() and iterate through the results. JobClient.getJob(JobID) returns RunningJob, which doesn't hold the job's start time.
Adding RunningJob.getJobStatus() solves the issue.
- MAPREDUCE-4359.
Major bug reported by tlipcon and fixed by tomwhite
Potential deadlock in Counters
jcarder identified this deadlock in branch-1 (though it may also be present in trunk):
- Counters.size() is synchronized and locks Counters before Group
- Counters.Group.getCounterForName() is synchronized and calls through to Counters.size()
This creates a potential cycle which could cause a deadlock (though probably quite rare in practice)
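One common way to remove a cycle like the one described in this report is to make every code path acquire the two locks in the same order. The sketch below uses hypothetical stand-in classes (CounterRegistry and CounterGroup are not the Hadoop Counters classes), and it is not claimed to be the change that was committed for this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Counters; not the Hadoop class.
final class CounterRegistry {
    private final List<CounterGroup> groups = new ArrayList<CounterGroup>();

    synchronized CounterGroup addGroup() {
        CounterGroup g = new CounterGroup(this);
        groups.add(g);
        return g;
    }

    // Lock order: registry first, then each group.
    synchronized int size() {
        int total = 0;
        for (CounterGroup g : groups) {
            total += g.size();
        }
        return total;
    }
}

// Hypothetical stand-in for Counters.Group; not the Hadoop class.
final class CounterGroup {
    private final CounterRegistry parent;
    private final List<String> names = new ArrayList<String>();

    CounterGroup(CounterRegistry parent) {
        this.parent = parent;
    }

    synchronized int size() {
        return names.size();
    }

    // Takes the registry lock before the group lock, matching size() above,
    // so no thread holds a group lock while waiting for the registry lock.
    boolean addCounter(String name, int limit) {
        synchronized (parent) {
            synchronized (this) {
                if (names.contains(name)) {
                    return true;
                }
                if (parent.size() >= limit) {
                    return false;
                }
                names.add(name);
                return true;
            }
        }
    }
}
```

Because both paths now lock the registry before any group, the registry-then-group and group-then-registry orderings can no longer interleave.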
- MAPREDUCE-4385.
Major bug reported by kkambatl and fixed by kkambatl
FairScheduler.maxTasksToAssign() should check for fairscheduler.assignmultiple.maps < TaskTracker.availableSlots
FairScheduler.maxTasksToAssign() can potentially return a value greater than the available slots. Currently, we rely on canAssignMaps()/canAssignReduces() to reject such requests.
These additional calls can be avoided by checking against the available slots in maxTasksToAssign().
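The check described above amounts to taking the minimum of the configured cap and the free slots. The following is a minimal sketch under stated assumptions: the class name, the parameter names, and the convention that a non-positive cap means "no cap" are all hypothetical and not taken from FairScheduler.

```java
// Hypothetical sketch; not the FairScheduler implementation.
final class AssignCap {
    // configuredCap: per-heartbeat cap from configuration (assumed: <= 0 means no cap).
    // availableSlots: free slots reported by the TaskTracker.
    static int maxTasksToAssign(int configuredCap, int availableSlots) {
        if (availableSlots <= 0) {
            return 0;
        }
        if (configuredCap <= 0) {
            return availableSlots;
        }
        return Math.min(configuredCap, availableSlots);
    }
}
```

Bounding the result up front means the caller never asks for more tasks than the tracker can accept, so the later canAssign-style rejections become unnecessary for this case.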
- MAPREDUCE-4408.
Major improvement reported by tucu00 and fixed by rkanter (mrv1, mrv2)
allow jobs to set a JAR that is in the distributed cached
Setting a job JAR with JobConf.setJar(String) and Job.setJar(String) assumes that the JAR is local to the client submitting the job, so it triggers copying the JAR to HDFS and injecting it into the distributed cache.
AFAIK, this is the only way to use uber JARs (JARs with JARs inside) in MR jobs.
For jobs launched by Oozie, all JARs are already in HDFS. In order for Oozie to support uber JARs (OOZIE-654), there should be a way to specify as the job JAR a JAR that is already in HDFS.
- MAPREDUCE-4434.
Major bug reported by kkambatl and fixed by kkambatl (mrv1)
Backport MR-2779 (JobSplitWriter.java can't handle large job.split file) to branch-1
- MAPREDUCE-4463.
Blocker bug reported by tomwhite and fixed by tomwhite (mrv1)
JobTracker recovery fails with HDFS permission issue
Recovery fails when the job user is different to the JT owner (i.e. on anything bigger than a pseudo-distributed cluster).
- MAPREDUCE-4464.
Minor improvement reported by heathcd and fixed by heathcd (task)
Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()
If DNS does not resolve hostnames properly, reduce tasks can fail with a very misleading exception.
As per my peer Ahmed's diagnosis:
In ReduceTask, it seems that event.getTaskTrackerHttp() returns a malformed URI, and so host from:
{code}
String host = u.getHost();
{code}
evaluates to null, and the NullPointerException is thrown afterwards in the ConcurrentHashMap.
I have written a patch to check for a null hostname condition when getHost is called in the getMapCompletionEvents method a...
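The failure mode can be reproduced with plain JDK classes: java.net.URI returns a null host when the authority is not a valid server name (for example, a hostname containing an underscore), and ConcurrentHashMap rejects null keys. The sketch below is a hedged illustration of a null-host check; the class name, method name, and exception message are hypothetical and are not taken from the actual patch.

```java
import java.net.URI;

// Hypothetical helper showing a null-host check; not the ReduceTask code.
final class HostLookup {
    // Returns the host part of a tracker HTTP location, or fails with a
    // descriptive message instead of letting a null reach a ConcurrentHashMap.
    static String hostOf(String trackerHttp) {
        URI u = URI.create(trackerHttp);
        String host = u.getHost();
        if (host == null) {
            throw new IllegalArgumentException(
                "Invalid hostname in tracker location: '" + trackerHttp + "'");
        }
        return host;
    }
}
```

Failing early with the offending location in the message points the operator at the DNS or hostname problem rather than at an opaque NullPointerException inside the map.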
- MAPREDUCE-4499.
Major improvement reported by nroberts and fixed by knoguchi (mrv1, performance)
Looking for speculative tasks is very expensive in 1.x
When there are lots of jobs and tasks active in a cluster, the process of figuring out whether or not to launch a speculative task becomes very expensive.
I could be missing something, but it certainly looks like on every heartbeat we could be scanning tens of thousands of tasks looking for something which might need to be speculatively executed. In most cases, nothing gets chosen, so we completely trash our data cache and don't even find a task to schedule, just to do it all over again on...
- MAPREDUCE-4556.
Minor improvement reported by kkambatl and fixed by kkambatl (contrib/fair-share)
FairScheduler: PoolSchedulable#updateDemand() has potential redundant computation
- MAPREDUCE-4572.
Major bug reported by ahmed.radwan and fixed by ahmed.radwan (tasktracker, webapps)
Can not access user logs - Jetty is not configured by default to serve aliases/symlinks
The task log servlet can no longer access user logs because MAPREDUCE-2415 introduced symlinks to the logs and Jetty is not configured by default to serve symlinks.
- MAPREDUCE-4576.
Major bug reported by revans2 and fixed by revans2
Large dist cache can block tasktracker heartbeat
- MAPREDUCE-4595.
Critical bug reported by kkambatl and fixed by kkambatl
TestLostTracker failing - possibly due to a race in JobHistory.JobHistoryFilesManager#run()
The source of the occasional TestLostTracker failure seems to be the following:
On job completion, JobHistoryFilesManager#run() spawns another thread to move history files to the done folder. TestLostTracker waits for job completion before checking the file format of the history file. However, the move of the history files might still be in progress, or might not have started at all.
The attachment (force-TestLostTracker-failure.patch) helps reproduce the error locally, by increasing the cha...
- MAPREDUCE-4629.
Major bug reported by kkambatl and fixed by kkambatl
Remove JobHistory.DEBUG_MODE
Remove JobHistory.DEBUG_MODE for the following reasons:
1. No one seems to be using it - the config parameter corresponding to enabling it does not even exist in mapred-default.xml
2. The logging being done in DEBUG_MODE needs to move to LOG.debug() and LOG.trace()
3. Buggy handling of helper methods in DEBUG_MODE; e.g. directoryTime() and timestampDirectoryComponent().
- MAPREDUCE-4643.
Major bug reported by kkambatl and fixed by sandyr (jobhistoryserver)
Make job-history cleanup-period configurable
Job history cleanup should be made configurable. Currently, it is set to 1 month by default. The DEBUG_MODE (to be removed, see MAPREDUCE-4629) sets it to 20 minutes, but it should be configurable.
- MAPREDUCE-4652.
Major bug reported by ahmed.radwan and fixed by ahmed.radwan (examples, mrv1)
ValueAggregatorJob sets the wrong job jar
Using branch-1 tarball, if the user tries to submit an example aggregatewordcount, the job fails with the following error:
{code}
ahmed@ubuntu:~/demo/deploy/hadoop-1.2.0-SNAPSHOT$ bin/hadoop jar hadoop-examples-1.2.0-SNAPSHOT.jar aggregatewordcount input examples-output/aggregatewordcount 2 textinputformat
12/09/12 17:09:46 INFO mapred.JobClient: originalJarPath: /home/ahmed/demo/deploy/hadoop-1.2.0-SNAPSHOT/hadoop-core-1.2.0-SNAPSHOT.jar
12/09/12 17:09:48 INFO mapred.JobClient: submitJarFil...
- MAPREDUCE-4660.
Major new feature reported by djp and fixed by djp (jobtracker, mrv1, scheduler)
Update task placement policy for NetworkTopology with 'NodeGroup' layer
- MAPREDUCE-4662.
Major bug reported by tgraves and fixed by kihwal (jobhistoryserver)
JobHistoryFilesManager thread pool never expands
The job history file manager creates a thread pool with a core size of 1 thread and a max pool size of 3. It never goes beyond 1 thread, though, because it's using a LinkedBlockingQueue, which doesn't have a max size.
void start() {
executor = new ThreadPoolExecutor(1, 3, 1,
TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
}
According to the ThreadPoolExecutor java doc page it only increases the number of threads when the queue is full. Since the queue we are using has no max ...
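The behavior described in this report can be observed with a small standalone program. The sketch below builds the same (1, 3, unbounded queue) executor and, for comparison, one with the core size raised to 3. The class and method names are hypothetical, and the second configuration is only one possible alternative, not necessarily the fix that was committed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Hadoop.
final class PoolGrowthDemo {
    // Same shape as the executor in the report: core 1, max 3, unbounded queue.
    static ThreadPoolExecutor unboundedQueuePool() {
        return new ThreadPoolExecutor(1, 3, 1, TimeUnit.HOURS,
            new LinkedBlockingQueue<Runnable>());
    }

    // Alternative for comparison: core size equal to the max size.
    static ThreadPoolExecutor fixedCorePool() {
        return new ThreadPoolExecutor(3, 3, 1, TimeUnit.HOURS,
            new LinkedBlockingQueue<Runnable>());
    }

    // Submits blocking tasks and returns how many worker threads were created.
    static int poolSizeUnderLoad(ThreadPoolExecutor executor, int tasks) {
        final CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            executor.execute(new Runnable() {
                public void run() {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
        int size = executor.getPoolSize();
        release.countDown();
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return size;
    }
}
```

With five blocking tasks, the first pool queues four of them behind a single worker, while the second starts three workers, since threads beyond the core size are only created when the queue rejects a task.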
- MAPREDUCE-4703.
Major improvement reported by ahmed.radwan and fixed by ahmed.radwan (mrv1, mrv2, test)
Add the ability to start the MiniMRClientCluster using the configurations used before it is being stopped.
The objective here is to enable restarting the cluster after it has been stopped, using the same configurations/port numbers that were in use before stopping.
- MAPREDUCE-4706.
Critical bug reported by kkambatl and fixed by kkambatl (contrib/fair-share)
FairScheduler#dump(): Computing of # running maps and reduces is commented out
In FairScheduler#dump(), the code that updates the number of running maps and reduces is commented out. It needs to be fixed for the dump to produce meaningful information.
- MAPREDUCE-4765.
Minor bug reported by rkanter and fixed by rkanter (jobtracker, mrv1)
Restarting the JobTracker programmatically can cause DelegationTokenRenewal to throw an exception
The DelegationTokenRenewal class has a global Timer; when you stop the JobTracker by calling {{stopTracker()}} on it (or {{stopJobTracker()}} in MiniMRCluster), the JobTracker will call {{close()}} on DelegationTokenRenewal, which cancels the Timer. If you then start up the JobTracker again by calling {{startTracker()}} on it (or {{startJobTracker()}} in MiniMRCluster), the Timer won't necessarily be re-created; and DelegationTokenRenewal will later throw an exception when it tries to use th...
- MAPREDUCE-4778.
Major bug reported by sandyr and fixed by sandyr (jobtracker, scheduler)
Fair scheduler event log is only written if directory exists on HDFS
The fair scheduler event log is supposed to be written to the local filesystem, at {hadoop.log.dir}/fairscheduler. The event log will not be written unless this directory exists on HDFS.
- MAPREDUCE-4806.
Major bug reported by kkambatl and fixed by kkambatl (mrv1)
Cleanup: Some (5) private methods in JobTracker.RecoveryManager are not used anymore after MAPREDUCE-3837
MAPREDUCE-3837 re-organized the job recovery code, moving out the code that was using the methods in RecoveryManager.
Now, the following methods in {{JobTracker.RecoveryManager}} seem to be unused:
# {{updateJob()}}
# {{updateTip()}}
# {{createTaskAttempt()}}
# {{addSuccessfulAttempt()}}
# {{addUnsuccessfulAttempt()}}
- MAPREDUCE-4824.
Major new feature reported by tomwhite and fixed by tomwhite (mrv1)
Provide a mechanism for jobs to indicate they should not be recovered on restart
Some jobs (like Sqoop or HBase jobs) are not idempotent, so they should not be recovered on jobtracker restart. MAPREDUCE-2702 solves this problem for MR2; however, the approach there is not applicable to MR1, since even if we only use the job-level part of the patch and add an isRecoverySupported method to OutputCommitter, there is no way to use that information from the JT (which initiates recovery), since the JT does not instantiate OutputCommitters - and it shouldn't since they are user-level c...
- MAPREDUCE-4837.
Major improvement reported by acmurthy and fixed by acmurthy
Add webservices for jobtracker
Add MR-AM web-services to branch-1
- MAPREDUCE-4838.
Major improvement reported by acmurthy and fixed by zjshen
Add extra info to JH files
It will be useful to add more task-info to JH for analytics.
- MAPREDUCE-4843.
Critical bug reported by zhaoyunjiong and fixed by kkambatl (tasktracker)
When using DefaultTaskController, JobLocalizer not thread safe
In our cluster, jobs sometimes fail with the exception below:
2012-12-03 23:11:54,811 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_201212031626_1115_r_000023_0:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/$username/jobcache/job_201212031626_1115/job.xml in any of the configured local directories
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:424)
at org.apache.hadoop....
- MAPREDUCE-4845.
Major improvement reported by sandyr and fixed by sandyr (client)
ClusterStatus.getMaxMemory() and getUsedMemory() exist in MR1 but not MR2
For backwards compatibility, these methods should exist in both MR1 and MR2.
Confusingly, these methods return the max memory and used memory of the jobtracker, not the entire cluster.
I'd propose to add them to MR2 and return -1, and deprecate them in both MR1 and MR2. Alternatively, I could add plumbing to get the resource manager memory stats.
- MAPREDUCE-4850.
Major bug reported by tomwhite and fixed by tomwhite (mrv1)
Job recovery may fail if staging directory has been deleted
The job staging directory is deleted in the job cleanup task, which happens before the job-info file is deleted from the system directory (by the JobInProgress garbageCollect() method). If the JT shuts down between these two operations, then when the JT restarts and tries to recover the job, it fails since the job.xml and splits are no longer available.
- MAPREDUCE-4860.
Major bug reported by kkambatl and fixed by kkambatl (security)
DelegationTokenRenewal attempts to renew token even after a job is removed
mapreduce.security.token.DelegationTokenRenewal synchronizes on removeDelegationToken, but fails to synchronize on addToken and on token renewal in run().
This inconsistency is exposed by frequent failures of TestDelegationTokenRenewal:
{noformat}
Error Message
renew wasn't called as many times as expected expected:<4> but was:<5>
Stacktrace
junit.framework.AssertionFailedError: renew wasn't called as many times as expected expected:<4> but was:<5>
at org.apache.hadoop.mapreduce.security....
- MAPREDUCE-4904.
Major bug reported by mgong@vmware.com and fixed by djp (test)
TestMultipleLevelCaching failed in branch-1
TestMultipleLevelCaching fails:
{noformat}
Testcase: testMultiLevelCaching took 30.406 sec
FAILED
Number of local maps expected:<0> but was:<1>
junit.framework.AssertionFailedError: Number of local maps expected:<0> but was:<1>
at org.apache.hadoop.mapred.TestRackAwareTaskPlacement.launchJobAndTestCounters(TestRackAwareTaskPlacement.java:78)
at org.apache.hadoop.mapred.TestMultipleLevelCaching.testCachingAtLevel(TestMultipleLevelCaching.java:113)
at org.a...
- MAPREDUCE-4907.
Major improvement reported by sandyr and fixed by sandyr (mrv1, tasktracker)
TrackerDistributedCacheManager issues too many getFileStatus calls
TrackerDistributedCacheManager issues a number of redundant getFileStatus calls when determining the timestamps and visibilities of files in the distributed cache. 300 distributed cache files deep in the directory structure can hammer HDFS with a couple thousand requests.
A couple of optimizations can reduce this load:
1. determineTimestamps and determineCacheVisibilities both call getFileStatus on every file. We could cache the results of the former and use them for the latter.
2. determineC...
- MAPREDUCE-4909.
Major bug reported by arpitagarwal and fixed by arpitagarwal (test)
TestKeyValueTextInputFormat fails with Open JDK 7 on Windows
TestKeyValueTextInputFormat.testFormat fails with Open JDK 7. The root cause appears to be a failure to delete in-use files via LocalFileSystem.delete (RawLocalFileSystem.delete).
- MAPREDUCE-4914.
Major bug reported by brandonli and fixed by brandonli (test)
TestMiniMRDFSSort fails with openJDK7
{noformat}
Testcase: testJvmReuse took 0.063 sec
Caused an ERROR
Input path does not exist: hdfs://127.0.0.1:62473/sort/input
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://127.0.0.1:62473/sort/input
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:40)
at org.apache.hadoop.mapred.FileInputFormat.getSplit...
- MAPREDUCE-4915.
Major bug reported by brandonli and fixed by brandonli (test)
TestShuffleExceptionCount fails with open JDK7
{noformat}
Testcase: testShuffleExceptionTrailingSize took 0.203 sec
Testcase: testExceptionCount took 0 sec
Testcase: testShuffleExceptionTrailing took 0 sec
Testcase: testCheckException took 0 sec
FAILED
abort called when set to off
junit.framework.AssertionFailedError: abort called when set to off
at org.apache.hadoop.mapred.TestShuffleExceptionCount.testCheckException(TestShuffleExceptionCount.java:57)
{noformat}
This is a test order-dependency bug. The static variable ab...
- MAPREDUCE-4916.
Major bug reported by acmurthy and fixed by xgong
TestTrackerDistributedCacheManager is flaky due to other badly written tests in branch-1
Credit to Xuan for figuring this out: TestTrackerDistributedCacheManager is flaky due to other badly written tests, since it checks upfront for the existence of a directory which might have bad perms.
- MAPREDUCE-4923.
Minor bug reported by sandyr and fixed by sandyr (mrv1, mrv2, task)
Add toString method to TaggedInputSplit
Per MAPREDUCE-3678, map task logs now contain information about the input split being processed. Because TaggedInputSplit has no overridden toString method, nothing useful gets printed out.
- MAPREDUCE-4924.
Trivial bug reported by rkanter and fixed by rkanter (mrv1)
flakey test: org.apache.hadoop.mapred.TestClusterMRNotification.testMR
I occasionally get a failure like this on {{org.apache.hadoop.mapred.TestClusterMRNotification.testMR}}
{code}
junit.framework.AssertionFailedError: expected:<6> but was:<4>
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.failNotEquals(Assert.java:283)
at junit.framework.Assert.assertEquals(Assert.java:64)
at junit.framework.Assert.assertEquals(Assert.java:195)
at junit.framework.Assert.assertEquals(Assert.java:201)
at org.apache.hadoop.mapred.NotificationTestC...
- MAPREDUCE-4929.
Major bug reported by sandyr and fixed by sandyr (mrv1)
mapreduce.task.timeout is ignored
In MR1, only mapred.task.timeout works. Both should be made to work.
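Making both property names work can be sketched as a lookup with a fallback. The example below uses java.util.Properties as a stand-in for the Hadoop configuration object. The class name, method name, and the precedence (new name first, then old name) are assumptions for illustration and are not taken from the actual patch.

```java
import java.util.Properties;

// Hypothetical sketch; java.util.Properties stands in for the job configuration.
final class TaskTimeoutLookup {
    static final String NEW_KEY = "mapreduce.task.timeout";
    static final String OLD_KEY = "mapred.task.timeout";

    // Assumed precedence: NEW_KEY, then OLD_KEY, then the supplied default.
    static long taskTimeoutMs(Properties conf, long defaultMs) {
        String value = conf.getProperty(NEW_KEY);
        if (value == null) {
            value = conf.getProperty(OLD_KEY);
        }
        return value == null ? defaultMs : Long.parseLong(value.trim());
    }
}
```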
- MAPREDUCE-4930.
Major bug reported by kkambatl and fixed by kkambatl (examples)
Backport MAPREDUCE-4678 and MAPREDUCE-4925 to branch-1
MAPREDUCE-4678 adds convenient arguments to Pentomino, which would be nice to have in other branches as well.
However, MR-4678 introduces a bug - MR-4925 addresses this bug for all branches.
- MAPREDUCE-4933.
Major bug reported by sandyr and fixed by sandyr (mrv1, task)
MR1 final merge asks for length of file it just wrote before flushing it
createKVIterator in ReduceTask contains the following code:
{code}
try {
Merger.writeFile(rIter, writer, reporter, job);
addToMapOutputFilesOnDisk(fs.getFileStatus(outputPath));
} catch (Exception e) {
if (null != outputPath) {
fs.delete(outputPath, true);
}
throw new IOException("Final merge failed", e);
} finally {
if (null != writer) {
writer.close();
...
- MAPREDUCE-4962.
Major bug reported by sandyr and fixed by sandyr (jobtracker, mrv1)
jobdetails.jsp uses display name instead of real name to get counters
jobdetails.jsp displays details for a job including its counters. Counters may have different real names and display names, but the display names are used to look the counter values up, so counter values can incorrectly show up as 0.
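The effect of looking a counter up by the wrong name can be shown with a small map-based sketch. The class below is hypothetical and is not the Hadoop Counters API, and the counter names in the usage note are example values chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the Hadoop Counters API.
final class CounterLookupDemo {
    // Counter values keyed by the counter's real (internal) name.
    private final Map<String, Long> valuesByName = new HashMap<String, Long>();
    // Human-readable display names keyed by the real name.
    private final Map<String, String> displayNames = new HashMap<String, String>();

    void record(String name, String displayName, long value) {
        valuesByName.put(name, Long.valueOf(value));
        displayNames.put(name, displayName);
    }

    String displayNameOf(String name) {
        return displayNames.get(name);
    }

    // Unknown keys read as 0, which is how a wrong lookup key hides the real value.
    long valueOf(String key) {
        Long v = valuesByName.get(key);
        return v == null ? 0L : v.longValue();
    }
}
```

If a counter is recorded under the real name MAP_INPUT_RECORDS with the display name "Map input records", a lookup by the display name returns 0, while a lookup by the real name returns the recorded value. The display name should be used only for rendering.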
- MAPREDUCE-4963.
Major bug reported by rkanter and fixed by rkanter (mrv1)
StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" statistics for new TaskTrackers
The StatisticsCollector keeps track of updates to the "Total Tasks Last Day", "Succeeded Tasks Last Day", "Total Tasks Last Hour", and "Succeeded Tasks Last Hour" per Task Tracker, which are displayed on the JobTracker web UI. It uses buckets to manage when to shift task counts from "Last Hour" to "Last Day" and out of "Last Day". After the JT has been running for a while, the connected TTs will have the max number of buckets and will keep shifting them at each update. If a new TT connects (or...
- MAPREDUCE-4967.
Major bug reported by cnauroth and fixed by kkambatl (tasktracker, test)
TestJvmReuse fails on assertion
{{TestJvmReuse}} on branch-1 consistently fails on an assertion.
- MAPREDUCE-4969.
Major bug reported by arpitagarwal and fixed by arpitagarwal (test)
TestKeyValueTextInputFormat test fails with Open JDK 7
RawLocalFileSystem.delete fails on Windows even when the files are not expected to be in use. It does not reproduce with Sun JDK 6.
- MAPREDUCE-4970.
Major bug reported by sandyr and fixed by sandyr
Child tasks (try to) create security audit log files
After HADOOP-8552, MR child tasks will attempt to create security audit log files with their user names. On an insecure cluster, this has no effect, but on a secure cluster, log4j will try to create log files for tasks with names like SecurityAuth-joeuser.log.
- MAPREDUCE-5008.
Major bug reported by sandyr and fixed by sandyr
Merger progress miscounts with respect to EOF_MARKER
After MAPREDUCE-2264, a segment's raw data length is calculated without the EOF_MARKER bytes. However, when the merge is counting how many bytes it processed, it includes the marker. This can cause the merge progress to go above 100%.
Whether these EOF_MARKER bytes count should be consistent between the two.
This is a JIRA instead of an amendment because MAPREDUCE-2264 already went into 2.0.3.
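The miscount is simple arithmetic: if the denominator excludes the marker bytes and the numerator includes them, the ratio ends above 1. The sketch below uses hypothetical names, and the segment sizes and marker size in the usage note are made-up values, not the actual on-disk sizes.

```java
// Hypothetical sketch of the progress arithmetic; not the Merger code.
final class MergeProgressDemo {
    // Sums segment lengths, optionally counting a per-segment EOF marker.
    static long totalBytes(long[] segmentDataBytes, long markerBytes, boolean countMarker) {
        long total = 0;
        for (long dataBytes : segmentDataBytes) {
            total += dataBytes;
            if (countMarker) {
                total += markerBytes;
            }
        }
        return total;
    }

    // Fraction of the merge done; deliberately not clamped so the mismatch is visible.
    static double progress(long bytesProcessed, long totalBytes) {
        return (double) bytesProcessed / totalBytes;
    }
}
```

For three segments of 100 data bytes each and an assumed 2-byte marker, the expected total without markers is 300 while the bytes processed with markers reach 306, giving a progress of 1.02. Counting the marker on both sides (306 over 306) or on neither (300 over 300) yields exactly 1.0.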
- MAPREDUCE-5028.
Critical bug reported by kkambatl and fixed by kkambatl
Maps fail when io.sort.mb is set to high value
Verified the problem exists on branch-1 with the following configuration:
Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, io.sort.mb=1280, dfs.block.size=2147483648
Run teragen to generate 4 GB data
Maps fail when you run wordcount on this configuration with the following error:
{noformat}
java.io.IOException: Spill failed
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTa...
- MAPREDUCE-5035.
Major bug reported by tomwhite and fixed by tomwhite (mrv1)
Update MR1 memory configuration docs
The pmem/vmem settings in the docs (http://hadoop.apache.org/docs/r1.1.1/cluster_setup.html#Memory+monitoring) have not been supported for a long time. The docs should be updated to reflect the new settings (mapred.cluster.map.memory.mb etc).
- MAPREDUCE-5049.
Major bug reported by sandyr and fixed by sandyr
CombineFileInputFormat counts all compressed files non-splitable
In branch-1, CombineFileInputFormat doesn't take SplittableCompressionCodec into account and thinks that all compressed input files aren't splittable. This is a regression from when handling for non-splittable compression codecs was originally added in MAPREDUCE-1597, and seems to have somehow gotten in when the code was pulled from 0.22 to branch-1.
- MAPREDUCE-5066.
Major bug reported by ivanmi and fixed by ivanmi
JobTracker should set a timeout when calling into job.end.notification.url
In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows).
I propose we introduce a configurable timeout option and set a defaul...
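Setting a timeout on the notification call can be sketched with the JDK's HttpURLConnection, which exposes separate connect and read timeouts. The class name, method name, and the single shared timeout value are assumptions for illustration. This is not the JobEndNotifier code, and the configuration property that would supply the value is not shown.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical sketch; not the JobEndNotifier implementation.
final class NotificationTimeoutDemo {
    // Prepares (but does not open) a connection that gives up after timeoutMs
    // both while connecting and while waiting for the response.
    static HttpURLConnection prepare(String notificationUrl, int timeoutMs) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(notificationUrl).openConnection();
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs);
            return conn;
        } catch (IOException e) {
            throw new IllegalArgumentException("Bad notification URL: " + notificationUrl, e);
        }
    }
}
```

With both timeouts set, a server that never responds causes a java.net.SocketTimeoutException after the configured interval instead of blocking the single notification thread indefinitely.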
- MAPREDUCE-5081.
Major new feature reported by szetszwo and fixed by szetszwo (distcp)
Backport DistCpV2 and the related JIRAs to branch-1
Here is a list of DistCpV2 JIRAs:
- MAPREDUCE-2765: DistCpV2 main jira
- HADOOP-8703: turn CRC checking off for 0 byte size
- HDFS-3054: distcp -skipcrccheck has no effect.
- HADOOP-8431: Running distcp without args throws IllegalArgumentException
- HADOOP-8775: non-positive value to -bandwidth
- MAPREDUCE-4654: TestDistCp is ignored
- HADOOP-9022: distcp fails to copy file if -m 0 specified
- HADOOP-9025: TestCopyListing failing
- MAPREDUCE-5075: DistCp leaks input file handles
- distcp par...
- MAPREDUCE-5129.
Minor new feature reported by billie.rinaldi and fixed by billie.rinaldi
Add tag info to JH files
It will be useful to add tags to the existing workflow info logged by JH. This will allow jobs to be filtered/grouped for analysis more easily.
- MAPREDUCE-5131.
Major bug reported by acmurthy and fixed by acmurthy
Provide better handling of job status related apis during JT restart
I've seen pig/hive applications fail during JT restart because they get NPEs. This is due to the fact that the jobs have been submitted but are not really inited.
- MAPREDUCE-5154.
Major bug reported by sandyr and fixed by sandyr (jobtracker)
staging directory deletion fails because delegation tokens have been cancelled
In a secure setup, the jobtracker needs the job's delegation tokens to delete the staging directory. MAPREDUCE-4850 made it so that job cleanup staging directory deletion occurs asynchronously, so that it could order it with system directory deletion. This introduced the issue that a job's delegation tokens could be cancelled before the cleanup thread got around to deleting it, causing the deletion to fail.
- MAPREDUCE-5158.
Major bug reported by yeshavora and fixed by mayank_bansal (jobtracker)
Cleanup required when mapreduce.job.restart.recover is set to false
When mapred.jobtracker.restart.recover is set to true and mapreduce.job.restart.recover is set to false for an MR job, job cleanup never happens for that job if the JT restarts while the job is running.
The .staging directory and job-info file for that job remain on HDFS forever.
- MAPREDUCE-5166.
Blocker bug reported by hagleitn and fixed by sandyr
ConcurrentModificationException in LocalJobRunner
With the latest version hive unit tests fail in various places with the following stack trace. The problem seems related to: MAPREDUCE-2931
{noformat}
[junit] java.util.ConcurrentModificationException
[junit] at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
[junit] at java.util.HashMap$ValueIterator.next(HashMap.java:822)
[junit] at org.apache.hadoop.mapred.Counters.incrAllCounters(Counters.java:505)
[junit] at org.apache.hadoop.mapred.Counters.sum(Counte...
- MAPREDUCE-5169.
Major bug reported by arpitgupta and fixed by acmurthy
Job recovery fails if job tracker is restarted after the job is submitted but before its initialized
This was noticed when within 5 seconds of submitting a word count job, the job tracker was restarted. Upon restart the job failed to recover
- MAPREDUCE-5198.
Major bug reported by arpitgupta and fixed by arpitgupta (tasktracker)
Race condition in cleanup during task tracker reinit with LinuxTaskController
This was noticed when job tracker would be restarted while jobs were running and would ask the task tracker to reinitialize.
Tasktracker would fail with an error like
{code}
2013-04-27 20:19:09,627 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /grid/0/hdp/mapred/local,/grid/1/hdp/mapred/local,/grid/2/hdp/mapred/local,/grid/3/hdp/mapred/local,/grid/4/hdp/mapred/local,/grid/5/hdp/mapred/local
2013-04-27 20:19:09,628 INFO org.apache.hadoop.ipc.Server: IPC Server...
- MAPREDUCE-5202.
Major bug reported by owen.omalley and fixed by owen.omalley
Revert MAPREDUCE-4397 to avoid using incorrect config files
MAPREDUCE-4397 added the capability to switch the location of the taskcontroller.cfg file, which weakens security.
Changes since Hadoop 1.1.1
Jiras with Release Notes (describe major or incompatible changes)
- HADOOP-8567.
Major new feature reported by djp and fixed by jingzhao (conf)
Port conf servlet to dump running configuration to branch 1.x
Users can use the conf servlet to get the server-side configuration. Users can
1) connect to http_server_url/conf or http_server_url/conf?format=xml and get XML-based configuration description;
2) connect to http_server_url/conf?format=json and get JSON-based configuration description.
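The two access patterns above can be sketched as a small URL builder. The class and method names are hypothetical, and any host or port passed in is a placeholder; only the /conf path and the format parameter come from the note.

```java
// Hypothetical helper for illustration; only "/conf" and "format" come from the note.
final class ConfServletUrlSketch {
    /** Builds the conf servlet URL; format may be null, "xml", or "json". */
    static String confUrl(String httpServerUrl, String format) {
        String base = httpServerUrl.endsWith("/")
                ? httpServerUrl.substring(0, httpServerUrl.length() - 1)
                : httpServerUrl;
        if (format == null) {
            return base + "/conf"; // no parameter: XML description, per the note
        }
        if (!"xml".equals(format) && !"json".equals(format)) {
            throw new IllegalArgumentException("unsupported format: " + format);
        }
        return base + "/conf?format=" + format;
    }
}
```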
- HADOOP-9115.
Blocker bug reported by arpitgupta and fixed by jingzhao
Deadlock in configuration when writing configuration to hdfs
This fixes a bug where Hive could trigger a deadlock condition in the Hadoop configuration management code.
- MAPREDUCE-4478.
Major bug reported by liangly and fixed by liangly
TaskTracker's heartbeat is out of control
Fixed a bug in TaskTracker's heartbeat to keep it under control.
Other Jiras (describe bug fixes and minor changes)
- HADOOP-8418.
Major bug reported by vicaya and fixed by crystal_gaoyu (security)
Fix UGI for IBM JDK running on Windows
The login module and user principal classes are different for 32 and 64-bit Windows in IBM J9 JDK 6 SR10. Hadoop 1.0.3 does not run on either because it uses the 32 bit login module and the 64-bit user principal class.
- HADOOP-8419.
Major bug reported by vicaya and fixed by carp84 (io)
GzipCodec NPE upon reset with IBM JDK
The GzipCodec will NPE upon reset after finish when the native zlib codec is not loaded. When the native zlib is loaded the codec creates a CompressorOutputStream that doesn't have the problem, otherwise, the GZipCodec uses GZIPOutputStream which is extended to provide the resetState method. Since IBM JDK 6 SR9 FP2 including the current JDK 6 SR10, GZIPOutputStream#finish will release the underlying deflater, which causes NPE upon reset. This seems to be an IBM JDK quirk as Sun JDK and OpenJD...
- HADOOP-8561.
Major improvement reported by vicaya and fixed by crystal_gaoyu (security)
Introduce HADOOP_PROXY_USER for secure impersonation in child hadoop client processes
To let an authenticated user run hadoop shell commands from a web console, we can introduce a HADOOP_PROXY_USER environment variable that allows proper impersonation in the child hadoop client processes.
- HADOOP-8880.
Major bug reported by gkesavan and fixed by gkesavan
Missing jersey jars as dependency in the pom causes hive tests to fail
ivy.xml includes the dependency, whereas the pom template has not been updated with it.
- HADOOP-9051.
Minor test reported by mgong@vmware.com and fixed by vicaya (test)
"ant test" will build failed for trying to delete a file
Run "ant test" on branch-1 of hadoop-common.
When the test process reaches "test-core-excluding-commit-and-smoke",
it invokes the "macro-test-runner" to clear and rebuild the test environment.
Then the ant task command <delete dir="@{test.dir}/logs" />
fails while trying to delete a non-existent file.
The following are the test result logs:
test-core-excluding-commit-and-smoke:
[delete] Deleting: /home/jdu/bdc/hadoop-topology-branch1-new/hadoop-common/build/test/testsfailed
[delete] Dele...
- HADOOP-9111.
Minor improvement reported by jingzhao and fixed by jingzhao (test)
Fix failed testcases with @ignore annotation In branch-1
Currently in branch-1, several failed testcases have the @Ignore annotation, which does not take effect because these testcases are still using JUnit3. This jira plans to change these testcases to JUnit4 so that @Ignore works.
- HDFS-3727.
Major bug reported by atm and fixed by atm (namenode)
When using SPNEGO, NN should not try to log in using KSSL principal
When performing a checkpoint with security enabled, the NN will attempt to relogin from its keytab before making an HTTP request back to the 2NN to fetch the newly-merged image. However, it always attempts to log in using the KSSL principal, even if SPNEGO is configured to be used.
This issue was discovered by Stephen Chu.
- HDFS-4208.
Critical bug reported by brandonli and fixed by brandonli (namenode)
NameNode could be stuck in SafeMode due to never-created blocks
In one test case, NameNode allocated a block and then was killed before the client got the addBlock response. After NameNode restarted, it couldn't get out of SafeMode waiting for the block which was never created. In trunk, NameNode can get out of SafeMode since it only counts complete blocks. However, branch-1 doesn't have a clear notion of under-construction blocks in the NameNode.
JIRA HDFS-4212 is to track the never-created-block issue and this JIRA is to fix NameNode in branch-1 so it c...
- HDFS-4252.
Major improvement reported by sureshms and fixed by jingzhao (namenode)
Improve confusing log message that prints exception when editlog read is completed
Namenode prints a log with an exception to indicate successful completion of reading of logs. This causes misunderstanding where people have interpreted it as failure to load editlog. The log message could be better.
- HDFS-4423.
Blocker bug reported by chenfolin and fixed by cnauroth (namenode)
Checkpoint exception causes fatal damage to fsimage.
The affected class is org.apache.hadoop.hdfs.server.namenode.FSImage.java
{code}
boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
...
latestNameSD.read();
needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
LOG.info("Image file of size " + imageSize + " loaded in "
+ (FSNamesystem.now() - startTime)/1000 + " seconds.");
// Load latest edits
if (latestNameCheckpointTime > latestEditsCheckpointTime)
// the image i...
- MAPREDUCE-2374.
Major bug reported by tlipcon and fixed by adi2
"Text File Busy" errors launching MR tasks
Some very small percentage of tasks fail with a "Text file busy" error.
The following was the original diagnosis:
{quote}
Our use of PrintWriter in TaskController.writeCommand is unsafe, since that class swallows all IO exceptions. We're not currently checking for errors, which I'm seeing result in occasional task failures with the message "Text file busy" - assumedly because the close() call is failing silently for some reason.
{quote}
.. but turned out to be another issue as well (see below)
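The behaviour described in the original diagnosis can be reproduced with the standard library alone: PrintWriter never throws on a failed write, so the caller has to ask for the error flag through checkError(). The class names in this sketch are hypothetical and it is not the TaskController code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;

// Hypothetical demonstration; not the TaskController.writeCommand implementation.
final class PrintWriterErrorSketch {
    /** A stream that always fails, standing in for a broken command file. */
    static final class FailingStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("simulated write failure");
        }
    }

    /** Writes the command and reports whether PrintWriter recorded an I/O error. */
    static boolean writeAndCheck(OutputStream out, String command) {
        PrintWriter writer = new PrintWriter(out);
        writer.println(command);              // does not throw, even if the stream fails
        boolean failed = writer.checkError(); // flushes, then returns the error flag
        writer.close();                       // close() also swallows IOException
        return failed;
    }
}
```

A caller that ignores the returned flag sees no failure at all, which matches the silent behaviour the quote describes.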
- MAPREDUCE-4272.
Major bug reported by vicaya and fixed by crystal_gaoyu (task)
SortedRanges.Range#compareTo is not spec compliant
SortedRanges.Range#compareTo does not satisfy the requirement of Comparable#compareTo, where "the implementor must ensure {noformat}sgn(x.compareTo(y)) == -sgn(y.compareTo(x)){noformat} for all x and y."
This is manifested as TestStreamingBadRecords failures in alternative JDKs.
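For reference, a compareTo that satisfies the sign-symmetry requirement can be sketched as follows. The RangeSketch class and its two fields are assumptions for illustration; this is not the SortedRanges.Range source or the committed fix.

```java
// Hypothetical range type for illustration; not the SortedRanges.Range source.
final class RangeSketch implements Comparable<RangeSketch> {
    final long startIndex;
    final long length;

    RangeSketch(long startIndex, long length) {
        this.startIndex = startIndex;
        this.length = length;
    }

    @Override
    public int compareTo(RangeSketch other) {
        // Compare field by field; Long.compare avoids overflow from subtraction.
        int byStart = Long.compare(startIndex, other.startIndex);
        return byStart != 0 ? byStart : Long.compare(length, other.length);
    }

    /** Checks sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) for one pair. */
    static boolean isSignSymmetric(RangeSketch x, RangeSketch y) {
        return Integer.signum(x.compareTo(y)) == -Integer.signum(y.compareTo(x));
    }
}
```

Comparing every field in a fixed order means swapping the operands can only flip the sign of the result, which is exactly what the contract asks for.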
- MAPREDUCE-4396.
Minor bug reported by vicaya and fixed by crystal_gaoyu (client)
Make LocalJobRunner work with private distributed cache
Some LocalJobRunner-related unit tests fail if the user directory permissions and/or umask are too restrictive.
- MAPREDUCE-4397.
Major improvement reported by vicaya and fixed by crystal_gaoyu (task-controller)
Introduce HADOOP_SECURITY_CONF_DIR for task-controller
The linux task controller currently hard codes the directory in which to look for its config file at compile time (via the HADOOP_CONF_DIR macro). Adding a new environment variable to look for task-controller's conf dir (with strict permission checks) would make installation much more flexible.
- MAPREDUCE-4696.
Minor bug reported by gopalv and fixed by gopalv
TestMRServerPorts throws NullReferenceException
TestMRServerPorts throws
{code}
java.lang.NullPointerException
at org.apache.hadoop.mapred.TestMRServerPorts.canStartJobTracker(TestMRServerPorts.java:99)
at org.apache.hadoop.mapred.TestMRServerPorts.testJobTrackerPorts(TestMRServerPorts.java:152)
{code}
Use the JobTracker.startTracker(string, string, boolean initialize) factory method to get a pre-initialized JobTracker for the test.
- MAPREDUCE-4697.
Minor bug reported by gopalv and fixed by gopalv
TestMapredHeartbeat fails assertion on HeartbeatInterval
TestMapredHeartbeat fails test on heart beat interval
{code}
FAILED
expected:<300> but was:<500>
junit.framework.AssertionFailedError: expected:<300> but was:<500>
at org.apache.hadoop.mapred.TestMapredHeartbeat.testJobDirCleanup(TestMapredHeartbeat.java:68)
{code}
Replicate math for getNextHeartbeatInterval() in the test-case to ensure MRConstants changes do not break test-case.
- MAPREDUCE-4699.
Minor bug reported by gopalv and fixed by gopalv
TestFairScheduler & TestCapacityScheduler fails due to JobHistory exception
TestFairScheduler fails due to exception from mapred.JobHistory
{code}
null
java.lang.NullPointerException
at org.apache.hadoop.mapred.JobHistory$JobInfo.logJobPriority(JobHistory.java:1975)
at org.apache.hadoop.mapred.JobInProgress.setPriority(JobInProgress.java:895)
at org.apache.hadoop.mapred.TestFairScheduler.testFifoPool(TestFairScheduler.java:2617)
{code}
TestCapacityScheduler fails due to
{code}
java.lang.NullPointerException
at org.apache.hadoop.mapred.JobHistory$JobInfo.log...
- MAPREDUCE-4798.
Minor bug reported by sam liu and fixed by (jobhistoryserver, test)
TestJobHistoryServer fails some times with 'java.lang.AssertionError: Address already in use'
UT Failure in IHC 1.0.3: org.apache.hadoop.mapred.TestJobHistoryServer. This UT fails sometimes.
The error message is:
'Testcase: testHistoryServerStandalone took 5.376 sec
Caused an ERROR
Address already in use
java.lang.AssertionError: Address already in use
at org.apache.hadoop.mapred.TestJobHistoryServer.testHistoryServerStandalone(TestJobHistoryServer.java:113)'
- MAPREDUCE-4858.
Major bug reported by acmurthy and fixed by acmurthy
TestWebUIAuthorization fails on branch-1
TestWebUIAuthorization fails on branch-1
- MAPREDUCE-4859.
Major bug reported by acmurthy and fixed by acmurthy
TestRecoveryManager fails on branch-1
Looks like the tests are extremely flaky and just hang.
- MAPREDUCE-4888.
Blocker bug reported by revans2 and fixed by vinodkv (mrv1)
NLineInputFormat drops data in 1.1 and beyond
When trying to root cause why MAPREDUCE-4782 did not cause us issues on 1.0.2, I found out that HADOOP-7823 introduced essentially the exact same error into org.apache.hadoop.mapred.lib.NLineInputFormat.
In 1.X org.apache.hadoop.mapred.lib.NLineInputFormat and org.apache.hadoop.mapreduce.lib.input.NLineInputFormat are separate implementations. The latter had an off by one error in it until MAPREDUCE-4782 fixed it. The former had no error in it until HADOOP-7823 introduced it in 1.1 and MAPR...
Changes since Hadoop 1.1.0
Jiras with Release Notes (describe major or incompatible changes)
Other Jiras (describe bug fixes and minor changes)
- HADOOP-8745.
Minor bug reported by mafr and fixed by mafr
Incorrect version numbers in hadoop-core POM
The hadoop-core POM as published to Maven central has different dependency versions than Hadoop actually has on its runtime classpath. This can lead to client code working in unit tests but failing on the cluster and vice versa.
The following version numbers are incorrect: jackson-mapper-asl, kfs, and jets3t. There's also a duplicate dependency to commons-net.
- HADOOP-8823.
Major improvement reported by szetszwo and fixed by szetszwo (build)
ant package target should not depend on cn-docs
In branch-1, the package target depends on cn-docs but the doc is already outdated.
- HADOOP-8878.
Major bug reported by arpitgupta and fixed by arpitgupta
uppercase namenode hostname causes hadoop dfs calls with webhdfs filesystem and fsck to fail when security is on
This was noticed on a secure cluster where the namenode had an upper case hostname and the following command was issued:
hadoop dfs -ls webhdfs://NN:PORT/PATH
The above command failed because delegation token retrieval failed.
Upon looking at the kerberos logs it was determined that we tried to get the ticket for kerberos principal with upper case hostnames and that host did not exist in kerberos. We should convert the hostnames to lower case. Take a look at HADOOP-7988 where the same fix wa...
- HADOOP-8882.
Major bug reported by arpitgupta and fixed by arpitgupta
uppercase namenode host name causes fsck to fail when useKsslAuth is on
{code}
public static void fetchServiceTicket(URL remoteHost) throws IOException {
if(!UserGroupInformation.isSecurityEnabled())
return;
String serviceName = "host/" + remoteHost.getHost();
{code}
the hostname should be converted to lower case. Saw this in branch 1, will look at trunk and update the bug accordingly.
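A minimal sketch of the suggested change, assuming the service name is built exactly as in the snippet above. The class name is hypothetical, and the use of Locale.ROOT is a choice made for this example (to avoid locale-specific case rules); the committed patch may differ.

```java
import java.net.URL;
import java.util.Locale;

// Hypothetical helper for illustration; the committed patch may differ.
final class ServiceNameSketch {
    /** Builds the "host/<hostname>" service name with a lower-cased hostname. */
    static String serviceName(URL remoteHost) {
        // Locale.ROOT keeps the conversion independent of the JVM default locale.
        return "host/" + remoteHost.getHost().toLowerCase(Locale.ROOT);
    }
}
```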
- HADOOP-8995.
Minor bug reported by jingzhao and fixed by jingzhao
Remove unnecessary bogus exception log from Configuration
In Configuration#Configuration(boolean) and Configuration#Configuration(Configuration), bogus exceptions are thrown when Log level is DEBUG.
- HADOOP-9017.
Major bug reported by gkesavan and fixed by gkesavan (build)
fix hadoop-client-pom-template.xml and hadoop-client-pom-template.xml for version
hadoop-client-pom-template.xml and hadoop-client-pom-template.xml refer to the project.version variable; instead, they should refer to the @version token.
- HDFS-528.
Major new feature reported by tlipcon and fixed by tlipcon (scripts)
Add ability for safemode to wait for a minimum number of live datanodes
When starting up a fresh cluster programmatically, users often want to wait until DFS is "writable" before continuing in a script. "dfsadmin -safemode wait" doesn't quite work for this on a completely fresh cluster, since when there are 0 blocks on the system, 100% of them are accounted for before any DNs have reported.
This JIRA is to add a command which waits until a certain number of DNs have reported as alive to the NN.
- HDFS-1108.
Major sub-task reported by dhruba and fixed by tlipcon (ha, name-node)
Log newly allocated blocks
The current HDFS design says that newly allocated blocks for a file are not persisted in the NN transaction log when the block is allocated. Instead, a hflush() or a close() on the file persists the blocks into the transaction log. It would be nice if we can immediately persist newly allocated blocks (as soon as they are allocated) for specific files.
- HDFS-1539.
Major improvement reported by dhruba and fixed by dhruba (data-node, hdfs client, name-node)
prevent data loss when a cluster suffers a power loss
We have seen an instance where an external outage caused many datanodes to reboot at around the same time. This resulted in many corrupted blocks. These were recently written blocks; the current implementation of HDFS Datanodes does not sync the data of a block file when the block is closed.
1. Have a cluster-wide config setting that causes the datanode to sync a block file when a block is finalized.
2. Introduce a new parameter to the FileSystem.create() to trigger the new behaviour, i.e. cau...
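The idea behind both options is to force file data to stable storage before a block is treated as complete. A rough standard-library sketch follows; the class name, method name, and the syncOnClose flag are hypothetical stand-ins and do not reflect the DataNode code or its configuration keys.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Path;

// Hypothetical illustration; not the DataNode block-finalization code.
final class BlockSyncSketch {
    /** Writes data and, if requested, syncs it to disk before the file is closed. */
    static void writeBlock(Path blockFile, byte[] data, boolean syncOnClose) throws IOException {
        try (FileOutputStream out = new FileOutputStream(blockFile.toFile())) {
            out.write(data);
            if (syncOnClose) {
                // force(true) asks the OS to flush both content and metadata to the device.
                out.getChannel().force(true);
            }
        }
    }
}
```

Without the force call, recently written data may still sit only in the OS buffer cache when power is lost, which is the failure mode described above.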
- HDFS-2815.
Critical bug reported by umamaheswararao and fixed by umamaheswararao (name-node)
Namenode is not coming out of safemode when we perform ( NN crash + restart ) . Also FSCK report shows blocks missed.
When testing HA (internal) with continuous switchover at roughly 5-minute intervals, we found some *blocks missing*, and the namenode went into safemode after the next switch.
After analysis, I found that these files had already been deleted by clients, but I did not see any delete command logs in the namenode log files. The namenode nevertheless added those blocks to invalidateSets, and the DNs deleted the blocks.
On restart, the namenode went into safemode and expected more blocks before it could come out of safemode.
Here the reaso...
- HDFS-3658.
Major bug reported by eli and fixed by szetszwo
TestDFSClientRetries#testNamenodeRestart failed
Saw the following fail on a jenkins run:
{noformat}
Error Message
expected:<MD5-of-0MD5-of-512CRC32:f397fb3d9133d0a8f55854ea2bb268b0> but was:<MD5-of-0MD5-of-0CRC32:70bc8f4b72a86921468bf8e8441dce51>
Stacktrace
junit.framework.AssertionFailedError: expected:<MD5-of-0MD5-of-512CRC32:f397fb3d9133d0a8f55854ea2bb268b0> but was:<MD5-of-0MD5-of-0CRC32:70bc8f4b72a86921468bf8e8441dce51>
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.failNotEquals(Assert.java:283)
at jun...
- HDFS-3791.
Major bug reported by umamaheswararao and fixed by umamaheswararao (name-node)
Backport HDFS-173 to Branch-1 : Recursively deleting a directory with millions of files makes NameNode unresponsive for other commands until the deletion completes
Backport HDFS-173.
see the [comment|https://issues.apache.org/jira/browse/HDFS-2815?focusedCommentId=13422007&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422007] for more details
- HDFS-3846.
Major bug reported by szetszwo and fixed by brandonli (name-node)
Namenode deadlock in branch-1
Jitendra found out the following problem:
1. Handler : Acquires the namesystem lock, then waits on the SafeModeInfo lock at SafeModeInfo.isOn()
2. SafemodeMonitor : Calls SafeModeInfo.canLeave(), which is synchronized, so the SafeModeInfo lock is acquired; this method also triggers the call sequence needEnter() -> getNumLiveDataNodes() -> getNumberOfDatanodes() -> getDatanodeListForReport() -> getDatanodeListForReport(). getDatanodeListForReport is synchronized on the FSNamesystem lock.
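The two steps describe a classic lock-order inversion: one thread takes the namesystem lock and then the safe-mode lock, while the other takes them in the opposite order. One common remedy, sketched here with hypothetical names, is to make every path acquire the locks in the same order. This illustrates the general technique only; it is not necessarily how the issue was fixed in branch-1.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of consistent lock ordering; not the FSNamesystem code.
final class LockOrderSketch {
    private final ReentrantLock namesystemLock = new ReentrantLock();
    private final ReentrantLock safeModeLock = new ReentrantLock();
    private final int liveDataNodes;
    private boolean safeModeOn = true;

    LockOrderSketch(int liveDataNodes) {
        this.liveDataNodes = liveDataNodes;
    }

    /** Handler path: namesystem lock first, then safe-mode lock. */
    boolean isInSafeMode() {
        namesystemLock.lock();
        try {
            safeModeLock.lock();
            try {
                return safeModeOn;
            } finally {
                safeModeLock.unlock();
            }
        } finally {
            namesystemLock.unlock();
        }
    }

    /** Monitor path: uses the same order, so the two paths cannot deadlock. */
    boolean tryLeaveSafeMode(int minLiveDataNodes) {
        namesystemLock.lock();
        try {
            int live = liveDataNodes; // read under the namesystem lock
            safeModeLock.lock();
            try {
                if (live >= minLiveDataNodes) {
                    safeModeOn = false;
                }
                return !safeModeOn;
            } finally {
                safeModeLock.unlock();
            }
        } finally {
            namesystemLock.unlock();
        }
    }
}
```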
- HDFS-4105.
Major bug reported by arpitgupta and fixed by arpitgupta
the SPNEGO user for secondary namenode should use the web keytab
This is similar to HDFS-3466 where we made sure the namenode checks for the web keytab before it uses the namenode keytab.
The same needs to be done for secondary namenode as well.
{code}
String httpKeytab =
conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY);
if (httpKeytab != null && !httpKeytab.isEmpty()) {
params.put("kerberos.keytab", httpKeytab);
}
{code}
- HDFS-4134.
Minor bug reported by stevel@apache.org and fixed by (name-node)
hadoop namenode & datanode entry points should return negative exit code on bad arguments
When you run {{hadoop namenode start}} (or pass some other bad argument to the namenode), a usage message is generated, but the script returns 0.
This stops it from being a robust command to invoke from other scripts, and is inconsistent with the JT & TT entry points, which do return -1 on a usage message.
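The requested behaviour can be sketched as an entry point that returns a non-zero code whenever it prints usage. The class name and the small option set are illustrative assumptions; the real entry points accept more options and are structured differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical entry point for illustration; not the NameNode or DataNode main().
final class EntryPointSketch {
    static final int EXIT_OK = 0;
    static final int EXIT_USAGE = -1;

    // Illustrative subset only; the real commands accept more options.
    private static final Set<String> KNOWN_OPTIONS =
            new HashSet<>(Arrays.asList("-format", "-upgrade", "-rollback"));

    /** Returns the exit code a caller would pass to System.exit. */
    static int run(String[] args) {
        for (String arg : args) {
            if (!KNOWN_OPTIONS.contains(arg)) {
                System.err.println("Usage: [-format] | [-upgrade] | [-rollback]");
                return EXIT_USAGE; // bad argument: report failure to the calling script
            }
        }
        return EXIT_OK;
    }
}
```

A main method would call System.exit(run(args)). Note that most shells report an exit status of -1 as 255.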
- HDFS-4161.
Major bug reported by sureshms and fixed by szetszwo (hdfs client)
HDFS keeps a thread open for every file writer
In the 1.0 release, DFSClient uses a thread per file writer. In some use cases (dynamic partitions in hive) that use a large number of file writers, a large number of threads are created. The file writer thread has the following stack:
{noformat}
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1462)
at java.lang.Thread.run(Thread.java:662)
{noformat}
This problem has been fixed in later releases. This jira will post a consolidated patch fr...
- HDFS-4174.
Major improvement reported by jingzhao and fixed by jingzhao
Backport HDFS-1031 to branch-1: to list a few of the corrupted files in WebUI
1. Add getCorruptFiles method to FSNamesystem (the getCorruptFiles method is in branch-0.21 but not in branch-1).
2. Backport HDFS-1031: display corrupt files in WebUI.
- MAPREDUCE-4749.
Major bug reported by arpitgupta and fixed by arpitgupta
Killing multiple attempts of a task takes longer as more attempts are killed
The following was noticed on a mr job running on hadoop 1.1.0
1. Start an mr job with 1 mapper
2. Wait for a min
3. Kill the first attempt of the mapper and then subsequently kill the other 3 attempts in order to fail the job
The time taken to kill the task grew exponentially.
1st attempt was killed immediately.
2nd attempt took a little over a min
3rd attempt took approx. 20 mins
4th attempt took around 3 hrs.
The command used to kill the attempt was "hadoop job -fail-task"
Note that ...
- MAPREDUCE-4782.
Blocker bug reported by mark.fuhs and fixed by mark.fuhs (client)
NLineInputFormat skips first line of last InputSplit
NLineInputFormat creates FileSplits that are then used by LineRecordReader to generate Text values. To deal with an idiosyncrasy of LineRecordReader, the begin and length fields of the FileSplit are constructed differently for the first FileSplit vs. the rest.
After looping through all lines of a file, the final FileSplit is created, but the creation does not respect the difference of how the first vs. the rest of the FileSplits are created.
This results in the first line of the final Input...
- MAPREDUCE-4792.
Major bug reported by asanjar and fixed by asanjar (test)
Unit Test TestJobTrackerRestartWithLostTracker fails with ant-1.8.4
Problem:
JUnit tag @Ignore is not recognized since the testcase is JUnit3 and not JUnit4:
Solution:
Migrate the testcase to JUnit4, including:
* Remove "extends TestCase"
* Remove import junit.framework.TestCase;
* Add import org.junit.*;
* Use appropriate annotations such as @After, @Before, @Test.
Changes since Hadoop 1.0.3
Jiras with Release Notes (describe major or incompatible changes)
- HADOOP-5464.
Major bug reported by rangadi and fixed by rangadi
DFSClient does not treat write timeout of 0 properly
Zero values for dfs.socket.timeout and dfs.datanode.socket.write.timeout are now respected. Previously zero values for these parameters resulted in a 5 second timeout.
- HADOOP-6995.
Minor improvement reported by tlipcon and fixed by tlipcon (security)
Allow wildcards to be used in ProxyUsers configurations
When configuring proxy users and hosts, the special wildcard value "*" may be specified to match any host or any user.
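The matching rule can be pictured with a short sketch. The class name and the comma-separated input format are assumptions for this example and do not reproduce the ProxyUsers implementation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical matcher for illustration; not the ProxyUsers implementation.
final class ProxyWildcardSketch {
    /** Returns true if the configured list contains the candidate or the "*" wildcard. */
    static boolean matches(String configuredList, String candidate) {
        Set<String> allowed = new HashSet<>();
        for (String entry : configuredList.split(",")) {
            allowed.add(entry.trim());
        }
        return allowed.contains("*") || allowed.contains(candidate);
    }
}
```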
- HADOOP-8230.
Major improvement reported by eli2 and fixed by eli
Enable sync by default and disable append
Append is not supported in Hadoop 1.x. Please upgrade to 2.x if you need append. If you enabled dfs.support.append for HBase, you're OK, as durable sync (why HBase required dfs.support.append) is now enabled by default. If you really need the previous functionality, to turn on the append functionality set the flag "dfs.support.broken.append" to true.
- HADOOP-8365.
Blocker improvement reported by eli2 and fixed by eli
Add flag to disable durable sync
This patch enables durable sync by default. Installations where HBase was not used, which previously ran without setting "dfs.support.append" or with it explicitly set to false in the configuration, must add the new flag "dfs.durable.sync" and set it to false to preserve the previous semantics.
- HDFS-2465.
Major improvement reported by tlipcon and fixed by tlipcon (data-node, performance)
Add HDFS support for fadvise readahead and drop-behind
HDFS now has the ability to use posix_fadvise and sync_file_range syscalls to manage the OS buffer cache. This support is currently considered experimental, and may be enabled by configuring the following keys:
dfs.datanode.drop.cache.behind.writes - set to true to drop data out of the buffer cache after writing
dfs.datanode.drop.cache.behind.reads - set to true to drop data out of the buffer cache when performing sequential reads
dfs.datanode.sync.behind.writes - set to true to trigger dirty page writeback immediately after writing data
dfs.datanode.readahead.bytes - set to a non-zero value to trigger readahead for sequential reads
- HDFS-2617.
Major improvement reported by jghoman and fixed by jghoman (security)
Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution
Due to the requirement that KSSL use weak encryption types for Kerberos tickets, HTTP authentication to the NameNode will now use SPNEGO by default. This will require users of previous branch-1 releases with security enabled to modify their configurations and create new Kerberos principals in order to use SPNEGO. The old behavior of using KSSL can optionally be enabled by setting the configuration option "hadoop.security.use-weak-http-crypto" to "true".
- HDFS-2741.
Minor bug reported by markus17 and fixed by
dfs.datanode.max.xcievers missing in 0.20.205.0
Document and raise the maximum allowed transfer threads on a DataNode to 4096. This helps Apache HBase in particular.
- HDFS-3044.
Major improvement reported by eli2 and fixed by cmccabe (name-node)
fsck move should be non-destructive by default
The fsck "move" option is no longer destructive. It copies the accessible blocks of corrupt files to lost and found as before, but no longer deletes the corrupt files after copying the blocks. The original, destructive behavior can be enabled by specifying both the "move" and "delete" options.
- HDFS-3055.
Minor new feature reported by cmccabe and fixed by cmccabe
Implement recovery mode for branch-1
This is a new feature. It is documented in hdfs_user_guide.xml.
- HDFS-3094.
Major improvement reported by arpitgupta and fixed by arpitgupta
add -nonInteractive and -force option to namenode -format command
The 'namenode -format' command now supports the flags '-nonInteractive' and '-force' to improve usefulness without user input.
- HDFS-3518.
Major bug reported by bikassaha and fixed by szetszwo (hdfs client)
Provide API to check HDFS operational state
Add a utility method HdfsUtils.isHealthy(uri) for checking if the given HDFS is healthy.
- HDFS-3522.
Major bug reported by brandonli and fixed by brandonli (name-node)
If NN is in safemode, it should throw SafeModeException when getBlockLocations has zero locations
getBlockLocations(), and hence open() for read, will now throw SafeModeException if the NameNode is still in safe mode and there are no replicas reported yet for one of the blocks in the file.
- HDFS-3703.
Major improvement reported by nkeywal and fixed by jingzhao (data-node, name-node)
Decrease the datanode failure detection time
This jira adds a new DataNode state called "stale" at the NameNode. A DataNode is marked as stale if it does not send a heartbeat message to the NameNode within the timeout configured using the configuration parameter "dfs.namenode.stale.datanode.interval" in seconds (default value is 30 seconds). The NameNode picks a stale DataNode as the last target to read from when returning block locations for reads.
This feature is turned *off* by default. To turn it on, set the HDFS configuration "dfs.namenode.check.stale.datanode" to true.
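The stale check and the read ordering can be sketched as follows. All class, field, and method names are hypothetical, and the example works in milliseconds purely for convenience; it does not reproduce the NameNode code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of stale-node ordering; not the NameNode code.
final class StaleNodeSketch {
    static final class Node {
        final String name;
        final long lastHeartbeatMs;

        Node(String name, long lastHeartbeatMs) {
            this.name = name;
            this.lastHeartbeatMs = lastHeartbeatMs;
        }
    }

    /** A node is stale when its last heartbeat is older than the interval. */
    static boolean isStale(Node node, long nowMs, long staleIntervalMs) {
        return nowMs - node.lastHeartbeatMs > staleIntervalMs;
    }

    /** Returns node names with fresh nodes first and stale nodes last. */
    static List<String> orderForRead(List<Node> replicas, long nowMs, long staleIntervalMs) {
        List<String> fresh = new ArrayList<>();
        List<String> stale = new ArrayList<>();
        for (Node node : replicas) {
            (isStale(node, nowMs, staleIntervalMs) ? stale : fresh).add(node.name);
        }
        fresh.addAll(stale); // stale replicas become the last read targets
        return fresh;
    }
}
```

Stale nodes are kept in the list instead of being dropped, so a reader can still fall back to them if every fresh replica fails.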
- HDFS-3814.
Major improvement reported by sureshms and fixed by jingzhao (name-node)
Make the replication monitor multipliers configurable in 1.x
This change adds two new configuration parameters.
# {{dfs.namenode.invalidate.work.pct.per.iteration}} for controlling deletion rate of blocks.
# {{dfs.namenode.replication.work.multiplier.per.iteration}} for controlling replication rate. This in turn allows controlling the time it takes for decommissioning.
Please see hdfs-default.xml for detailed description.
- MAPREDUCE-1906.
Major improvement reported by scott_carey and fixed by tlipcon (jobtracker, performance, tasktracker)
Lower default minimum heartbeat interval for tasktracker > Jobtracker
The default minimum heartbeat interval has been dropped from 3 seconds to 300ms to increase scheduling throughput on small clusters. Users may tune mapreduce.jobtracker.heartbeats.in.second to adjust this value.
- MAPREDUCE-2517.
Major task reported by vinaythota and fixed by vinaythota (contrib/gridmix)
Porting Gridmix v3 system tests into trunk branch.
Adds system tests to Gridmix. These system tests cover various features like job types (load and sleep), user resolvers (round-robin, submitter-user, echo) and submission modes (stress, replay and serial).
- MAPREDUCE-3008.
Major sub-task reported by amar_kamat and fixed by amar_kamat (contrib/gridmix)
[Gridmix] Improve cumulative CPU usage emulation for short running tasks
Improves cumulative CPU emulation for short running tasks.
- MAPREDUCE-3118.
Major new feature reported by ravidotg and fixed by ravidotg (contrib/gridmix, tools/rumen)
Backport Gridmix and Rumen features from trunk to Hadoop 0.20 security branch
Backports latest features from trunk to 0.20.206 branch.
- MAPREDUCE-3597.
Major improvement reported by ravidotg and fixed by ravidotg (tools/rumen)
Provide a way to access other info of history file from Rumentool
Rumen now provides {{Parsed*}} objects. These objects provide extra information that is not provided by {{Logged*}} objects.
- MAPREDUCE-4087.
Major bug reported by ravidotg and fixed by ravidotg
[Gridmix] GenerateDistCacheData job of Gridmix can become slow in some cases
Fixes the issue of GenerateDistCacheData job slowness.
- MAPREDUCE-4673.
Major bug reported by arpitgupta and fixed by arpitgupta (test)
make TestRawHistoryFile and TestJobHistoryServer more robust
Fixed TestRawHistoryFile and TestJobHistoryServer to not write to /tmp.
- MAPREDUCE-4675.
Major bug reported by arpitgupta and fixed by bikassaha (test)
TestKillSubProcesses fails as the process is still alive after the job is done
Fixed a race condition in TestKillSubProcesses caused by a recent commit.
- MAPREDUCE-4698.
Minor bug reported by gopalv and fixed by gopalv
TestJobHistoryConfig throws Exception in testJobHistoryLogging
Optionally call initialize/initializeFileSystem in JobTracker::startTracker() to allow for proper initialization when offerService is not being called.
Other Jiras (describe bug fixes and minor changes)
- HADOOP-5836.
Major bug reported by nowland and fixed by nowland (fs/s3)
Bug in S3N handling of directory markers using an object with a trailing "/" causes jobs to fail
Some tools which upload to S3 use an object terminated with a "/" as a directory marker, for instance "s3n://mybucket/mydir/". If asked to iterate that "directory" via listStatus(), the current code returns an empty file "", which the InputFormatter happily assigns to a split, and which later causes a task to fail, and probably the job to fail.
- HADOOP-6527.
Major bug reported by jghoman and fixed by ivanmi (security)
UserGroupInformation::createUserForTesting clobbers already defined group mappings
In UserGroupInformation::createUserForTesting the following code creates a new groups instance, obliterating any groups that have been previously defined in the static groups field.
{code}
if (!(groups instanceof TestingGroups)) {
  groups = new TestingGroups();
}
{code}
This becomes a problem in tests that start a Mini{DFS,MR}Cluster and then create a testing user. The user that started the cluster (generally the real user running the test) immediately has their groups wiped out and is...
- HADOOP-6546.
Major bug reported by cjjefcoat and fixed by cjjefcoat (io)
BloomMapFile can return false negatives
BloomMapFile can return false negatives when using keys of varying sizes. If the amount of data written by the write() method of your key class differs between instances of your key, your BloomMapFile may return false negatives.
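One plausible way a size-dependent false negative can arise is sketched below. It assumes (this is not stated in the note above) that keys are serialized into a reusable buffer and that the whole backing array, rather than only the bytes written for the current key, is fed to the filter. The class {{ReusableKeyBuffer}} and its methods are hypothetical and are not part of Hadoop.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical reusable buffer: bytes beyond the current length keep older content. */
final class ReusableKeyBuffer {
    private byte[] data = new byte[16];
    private int length;

    /** Overwrites the start of the buffer with a new key; stale bytes may remain after it. */
    void set(byte[] key) {
        if (key.length > data.length) {
            data = new byte[key.length];
        }
        System.arraycopy(key, 0, data, 0, key.length);
        length = key.length;
    }

    /** Risky: returns a copy of the whole backing array, including stale bytes. */
    static byte[] wholeArray(ReusableKeyBuffer b) {
        return b.data.clone();
    }

    /** Safer: returns only the bytes that belong to the current key. */
    static byte[] validBytes(ReusableKeyBuffer b) {
        return Arrays.copyOf(b.data, b.length);
    }

    public static void main(String[] args) {
        ReusableKeyBuffer buf = new ReusableKeyBuffer();
        byte[] shortKey = "short".getBytes(StandardCharsets.UTF_8);

        buf.set(shortKey);
        byte[] firstWhole = wholeArray(buf);
        byte[] firstValid = validBytes(buf);

        // A longer key passes through the buffer before the short key is seen again.
        buf.set("a-longer-key".getBytes(StandardCharsets.UTF_8));
        buf.set(shortKey);

        System.out.println("whole arrays equal: " + Arrays.equals(firstWhole, wholeArray(buf)));
        System.out.println("valid bytes equal:  " + Arrays.equals(firstValid, validBytes(buf)));
    }
}
```

If the filter hashes the whole array, the same logical key can hash differently at write time and at lookup time, which is exactly what a false negative looks like.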
- HADOOP-6947.
Major bug reported by tlipcon and fixed by tlipcon (security)
Kerberos relogin should set refreshKrb5Config to true
In working on securing a daemon that uses two different principals from different threads, I found that I wasn't able to login from a second keytab after I'd logged in from the first. This is because we don't set the refreshKrb5Config in the Configuration for the Krb5LoginModule - hence it won't switch over to the correct keytab file if it's different than the first.
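As a rough illustration of the option discussed above, the sketch below builds a JAAS entry for the JDK's Krb5LoginModule with {{refreshKrb5Config}} set to true. The class name {{KeytabLoginConfig}} and the particular set of other options are assumptions for illustration; this is not the code Hadoop itself uses.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;

/** Hypothetical helper that builds a keytab-based JAAS entry. */
final class KeytabLoginConfig {
    static AppConfigurationEntry keytabEntry(String keytabPath, String principal) {
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");
        options.put("storeKey", "true");
        options.put("doNotPrompt", "true");
        options.put("keyTab", keytabPath);
        options.put("principal", principal);
        // The option this issue is about: ask the module to refresh its
        // Kerberos configuration before login.
        options.put("refreshKrb5Config", "true");
        return new AppConfigurationEntry(
            "com.sun.security.auth.module.Krb5LoginModule",
            LoginModuleControlFlag.REQUIRED,
            options);
    }
}
```

Each login from a different keytab would build its own entry with the matching {{keyTab}} and {{principal}} values.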
- HADOOP-7154.
Minor improvement reported by tlipcon and fixed by tlipcon (scripts)
Should set MALLOC_ARENA_MAX in hadoop-config.sh
New versions of glibc present in RHEL6 include a new arena allocator design. In several clusters we've seen this new allocator cause huge amounts of virtual memory to be used, since when multiple threads perform allocations, they each get their own memory arena. On a 64-bit system, these arenas are 64M mappings, and the maximum number of arenas is 8 times the number of cores. We've observed a DN process using 14GB of vmem for only 300M of resident set. This causes all kinds of nasty issues fo...
- HADOOP-7297.
Trivial bug reported by nonop92 and fixed by qwertymaniac (documentation)
Error in the documentation regarding Checkpoint/Backup Node
On http://hadoop.apache.org/common/docs/r0.20.203.0/hdfs_user_guide.html#Checkpoint+Node: the command bin/hdfs namenode -checkpoint required to launch the backup/checkpoint node does not exist.
I have removed this from the docs.
- HADOOP-7509.
Trivial improvement reported by raviprak and fixed by raviprak
Improve message when Authentication is required
The message when security is enabled and authentication is configured to be simple is not explicit enough. It simply prints out "Authentication is required" and prints out a stack trace. The message should be "Authorization (hadoop.security.authorization) is enabled but authentication (hadoop.security.authentication) is configured as simple. Please configure another method."
- HADOOP-7621.
Critical bug reported by tucu00 and fixed by atm (security)
alfredo config should be in a file not readable by users
[thanks ATM for pointing this one out]
Alfredo configuration is currently stored in the core-site.xml file, which is readable by users (it must be, as Configuration defaults must be loaded).
One of the Alfredo config values is a secret which is used by all nodes to sign/verify the authentication cookie.
A user could get hold of this secret and forge authentication cookies for other users.
Because of this, the Alfredo configuration should be moved to a file that is not readable by users.
- HADOOP-7629.
Major bug reported by phunt and fixed by tlipcon
regression with MAPREDUCE-2289 - setPermission passed immutable FsPermission (rpc failure)
MAPREDUCE-2289 introduced the following change:
{noformat}
+ fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
{noformat}
JOB_DIR_PERMISSION is an immutable FsPermission which cannot be used in RPC calls; it results in the following exception:
{noformat}
2011-09-08 16:31:45,187 WARN org.apache.hadoop.ipc.Server: Unable to read call parameters for client 127.0.0.1
java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.fs.permission.FsPermission$2.<init>()
...
- HADOOP-7634.
Minor bug reported by eli and fixed by eli (documentation, security)
Cluster setup docs specify wrong owner for task-controller.cfg
The cluster setup docs indicate task-controller.cfg must be owned by the user running TaskTracker but the code checks for root. We should update the docs to reflect the real requirement.
- HADOOP-7653.
Minor bug reported by natty and fixed by natty (build)
tarball doesn't include .eclipse.templates
The hadoop tarball doesn't include .eclipse.templates. This results in a failure to successfully run ant eclipse-files:
eclipse-files:
BUILD FAILED
/home/natty/Downloads/hadoop-0.20.2/build.xml:1606: /home/natty/Downloads/hadoop-0.20.2/.eclipse.templates not found.
- HADOOP-7665.
Major bug reported by atm and fixed by atm (security)
branch-0.20-security doesn't include SPNEGO settings in core-default.xml
Looks like back-port of HADOOP-7119 to branch-0.20-security missed the changes to {{core-default.xml}}.
- HADOOP-7666.
Major bug reported by atm and fixed by atm (security)
branch-0.20-security doesn't include o.a.h.security.TestAuthenticationFilter
Looks like the back-port of HADOOP-7119 to branch-0.20-security missed {{o.a.h.security.TestAuthenticationFilter}}.
- HADOOP-7745.
Major bug reported by raviprak and fixed by raviprak
I switched variable names in HADOOP-7509
As Aaron pointed out on https://issues.apache.org/jira/browse/HADOOP-7509?focusedCommentId=13126725&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13126725 I stupidly swapped CommonConfigurationKeys.HADOOP_SECURITY_AUTHENTICATION with CommonConfigurationKeys.HADOOP_SECURITY_AUTHORIZATION.
- HADOOP-7753.
Major sub-task reported by tlipcon and fixed by tlipcon (io, native, performance)
Support fadvise and sync_data_range in NativeIO, add ReadaheadPool class
This JIRA adds JNI wrappers for sync_file_range (the Linux call the title refers to as sync_data_range) and posix_fadvise. It also implements a ReadaheadPool class for future use from HDFS and MapReduce.
- HADOOP-7806.
Major new feature reported by qwertymaniac and fixed by qwertymaniac (util)
Support binding to sub-interfaces
Right now, with the {{DNS}} class, we can look up IPs of provided interface names ({{eth0}}, {{vm1}}, etc.). However, it would be useful if the I/F -> IP lookup also took a look at subinterfaces ({{eth0:1}}, etc.) and allowed binding to only a specified subinterface / virtual interface.
This should be fairly easy to add, by matching against all available interfaces' subinterfaces via Java.
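A minimal sketch of the sub-interface matching described above, using only the JDK's {{java.net.NetworkInterface}} API. The class and method names ({{SubInterfaceLookup}}, {{addressesOf}}) are hypothetical and this is not the patch that was committed.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical lookup that matches top-level interfaces and their sub-interfaces. */
final class SubInterfaceLookup {
    /** Returns the addresses bound to the named interface (e.g. eth0 or eth0:1), or an empty list. */
    static List<InetAddress> addressesOf(String name) throws SocketException {
        Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
        if (all == null) {
            return Collections.emptyList();
        }
        for (NetworkInterface nif : Collections.list(all)) {
            if (nif.getName().equals(name)) {
                return Collections.list(nif.getInetAddresses());
            }
            // Also check the virtual interfaces attached to this one.
            for (NetworkInterface sub : Collections.list(nif.getSubInterfaces())) {
                if (sub.getName().equals(name)) {
                    return Collections.list(sub.getInetAddresses());
                }
            }
        }
        return Collections.emptyList();
    }
}
```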
- HADOOP-7823.
Major new feature reported by tbroberg and fixed by apurtell
port HADOOP-4012 to branch-1 (splitting support for bzip2)
Please see HADOOP-4012 - Providing splitting support for bzip2 compressed files.
- HADOOP-7870.
Major bug reported by jmhsieh and fixed by jmhsieh
fix SequenceFile#createWriter with boolean createParent arg to respect createParent.
After HADOOP-6840, one set of calls to createNonRecursive(...) seems fishy - the new boolean createParent variable from the signature isn't used at all.
{code}
+ public static Writer
+ createWriter(FileSystem fs, Configuration conf, Path name,
+ Class keyClass, Class valClass, int bufferSize,
+ short replication, long blockSize, boolean createParent,
+ CompressionType compressionType, CompressionCodec codec,
+ Metadata meta...
- HADOOP-7879.
Trivial bug reported by jmhsieh and fixed by jmhsieh
DistributedFileSystem#createNonRecursive should also incrementWriteOps statistics.
This method:
{code}
public FSDataOutputStream createNonRecursive(Path f, FsPermission permission,
boolean overwrite,
int bufferSize, short replication, long blockSize,
Progressable progress) throws IOException {
return new FSDataOutputStream
(dfs.create(getPathName(f), permission,
overwrite, false, replication, blockSize, progress, bufferSize),
statistics);
}
{code}
Needs a statistics.incrementWriteOps(1);
- HADOOP-7898.
Minor bug reported by sureshms and fixed by sureshms (security)
Fix javadoc warnings in AuthenticationToken.java
Fix the following javadoc warning:
[WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HADOOP-Build/trunk/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/AuthenticationToken.java:33: warning - Tag @link: reference not found: HttpServletRequest
[WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HADOOP-Build/trunk/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/AuthenticationToken.java...
- HADOOP-7908.
Trivial bug reported by eli and fixed by eli (documentation)
Fix three javadoc warnings on branch-1
Fix 3 javadoc warnings on branch-1:
[javadoc] /home/eli/src/hadoop-branch-1/src/core/org/apache/hadoop/io/SequenceFile.java:428: warning - @param argument "progress" is not a parameter name.
[javadoc] /home/eli/src/hadoop-branch-1/src/core/org/apache/hadoop/util/ChecksumUtil.java:32: warning - @param argument "chunkOff" is not a parameter name.
[javadoc] /home/eli/src/hadoop-branch-1/src/mapred/org/apache/hadoop/mapred/QueueAclsInfo.java:52: warning - @param argument "queue" is not ...
- HADOOP-7942.
Major test reported by gkesavan and fixed by jnp
enabling clover coverage reports fails hadoop unit test compilation
Enabling Clover reports causes compilation of the following JUnit tests to fail.
Link to the Jenkins console output:
https://builds.apache.org/view/G-L/view/Hadoop/job/Hadoop-1-Code-Coverage/13/console
{noformat}
[javac] /tmp/clover50695626838999169.tmp/org/apache/hadoop/security/TestUserGroupInformation.java:224: cannot find symbol
......
[javac] /tmp/clover50695626838999169.tmp/org/apache/hadoop/security/TestUserGroupInformation.java:225: cannot find symbol
......
[javac] /tmp/clover50695626...
- HADOOP-7982.
Major bug reported by tlipcon and fixed by tlipcon (security)
UserGroupInformation fails to login if thread's context classloader can't load HadoopLoginModule
In a few hard-to-reproduce situations, we've seen a problem where the UGI login call causes a failure to login exception with the following cause:
Caused by: javax.security.auth.login.LoginException: unable to find
LoginModule class: org.apache.hadoop.security.UserGroupInformation
$HadoopLoginModule
After a bunch of debugging, I determined that this happens when the login occurs in a thread whose Context ClassLoader has been set to null.
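One common way to guard against a null context ClassLoader is to install a known-good loader for the duration of the call and restore the original afterwards. The sketch below shows that pattern with hypothetical names ({{ContextClassLoaderGuard}}, {{callWithLoader}}); it is an illustration of the technique, not necessarily the change made for this issue.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: runs an action with a usable context ClassLoader. */
final class ContextClassLoaderGuard {
    static <T> T callWithLoader(Class<?> owner, Callable<T> action) throws Exception {
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        if (original == null) {
            // Fall back to the loader that loaded the owning class.
            ClassLoader fallback = owner.getClassLoader();
            if (fallback == null) {
                fallback = ClassLoader.getSystemClassLoader();
            }
            thread.setContextClassLoader(fallback);
        }
        try {
            return action.call();
        } finally {
            // Always restore whatever the caller had, even if it was null.
            thread.setContextClassLoader(original);
        }
    }
}
```

A login call would be passed as the {{action}}, with {{owner}} being the class whose loader can see the login module.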
- HADOOP-7988.
Major bug reported by jnp and fixed by jnp
Upper case in hostname part of the principals doesn't work with kerberos.
Kerberos doesn't like upper case in the hostname part of the principals.
This issue has been seen in 0.23 as well as 1.0.
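A straightforward way to avoid the problem is to lower-case the hostname before it is placed into the principal. The sketch below assumes a principal pattern with a {{_HOST}} placeholder; the placeholder convention and the names {{PrincipalNames}} and {{withLowerCaseHost}} are assumptions for illustration.

```java
import java.util.Locale;

/** Hypothetical helper for building principals with a lower-cased host part. */
final class PrincipalNames {
    static String withLowerCaseHost(String principalPattern, String hostname) {
        // Locale.ROOT keeps the conversion independent of the JVM's default locale.
        return principalPattern.replace("_HOST", hostname.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // prints nn/nn1.example.com@EXAMPLE.COM
        System.out.println(withLowerCaseHost("nn/_HOST@EXAMPLE.COM", "NN1.Example.COM"));
    }
}
```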
- HADOOP-8154.
Major bug reported by eli2 and fixed by eli (conf)
DNS#getIPs shouldn't silently return the local host IP for bogus interface names
DNS#getIPs silently returns the local host IP for bogus interface names. In this case let's throw an UnknownHostException. This is technically an incompatible change. I suspect the current behavior was originally introduced so the interface name "default" works w/o explicitly checking for it. It may also be used in cases where someone is using a shared config file and an option like "dfs.datanode.dns.interface" or "hbase.master.dns.interface" and eg interface "eth3" that some hosts don't ha...
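The proposed behavior can be sketched as follows: handle the name "default" explicitly and throw {{UnknownHostException}} for any interface name that does not exist. The class {{StrictInterfaceLookup}} and method {{ipsOf}} are hypothetical, not the actual {{DNS#getIPs}} implementation.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.List;

/** Hypothetical strict lookup that refuses unknown interface names. */
final class StrictInterfaceLookup {
    static List<InetAddress> ipsOf(String ifName)
            throws UnknownHostException, SocketException {
        if ("default".equals(ifName)) {
            // Explicit check, so "default" no longer relies on a silent fallback.
            return Collections.singletonList(InetAddress.getLocalHost());
        }
        NetworkInterface nif = NetworkInterface.getByName(ifName);
        if (nif == null) {
            throw new UnknownHostException("No such interface " + ifName);
        }
        return Collections.list(nif.getInetAddresses());
    }
}
```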
- HADOOP-8159.
Major bug reported by cmccabe and fixed by cmccabe
NetworkTopology: getLeaf should check for invalid topologies
Currently, in NetworkTopology, getLeaf doesn't do too much validation on the InnerNode object itself. This results in us getting ClassCastException sometimes when the network topology is invalid. We should have a less confusing exception message for this case.
- HADOOP-8209.
Major improvement reported by eli2 and fixed by eli
Add option to relax build-version check for branch-1
In 1.x DNs currently refuse to connect to NNs if their build *revision* (ie svn revision) do not match. TTs refuse to connect to JTs if their build *version* (version, revision, user, and source checksum) do not match.
This prevents rolling upgrades, which is intentional, see the discussion in HADOOP-5203. The primary motivation in that jira was (1) it's difficult to guarantee every build on a large cluster got deployed correctly, builds don't get rolled back to old versions by accident etc,...
- HADOOP-8269.
Trivial bug reported by eli2 and fixed by eli (documentation)
Fix some javadoc warnings on branch-1
There are some javadoc warnings on branch-1, let's fix them.
- HADOOP-8314.
Major bug reported by tucu00 and fixed by tucu00 (security)
HttpServer#hasAdminAccess should return false if authorization is enabled but user is not authenticated
If the user is not authenticated (request.getRemoteUser() returns NULL) or there is no authentication filter configured (thus returning also NULL), hasAdminAccess should return false. Note that a filter could allow anonymous access, thus the first case.
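The decision described above can be sketched as a small pure function. The parameters stand in for values the real {{HttpServer#hasAdminAccess}} reads from the servlet context and request; the class name {{AdminAccessCheck}} and the use of a plain set of admin users are assumptions made for illustration.

```java
import java.util.Set;

/** Hypothetical, simplified version of the admin-access decision. */
final class AdminAccessCheck {
    static boolean hasAdminAccess(boolean authorizationEnabled,
                                  String remoteUser,
                                  Set<String> adminUsers) {
        if (!authorizationEnabled) {
            return true;   // nothing to enforce
        }
        if (remoteUser == null) {
            return false;  // unauthenticated, or no authentication filter configured
        }
        return adminUsers.contains(remoteUser);
    }
}
```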
- HADOOP-8329.
Major bug reported by kumarr and fixed by eli (build)
Build fails with Java 7
I am seeing the following message when building branch-1.0 code with IBM Java 7.
compile:
[echo] contrib: gridmix
[javac] Compiling 31 source files to /home/hadoop/branch-1.0_0427/build/contrib/gridmix/classes
[javac] /home/hadoop/branch-1.0_0427/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/Gridmix.java:396: error: type argument ? extends T is not within bounds of type-variable E
[javac] private <T> String getEnumValues(Enum<? extends T>[] e) {
[javac] ^
[javac] where T,E are ty...
- HADOOP-8399.
Major bug reported by cos and fixed by cos (build)
Remove JDK5 dependency from Hadoop 1.0+ line
This issue has been fixed in Hadoop starting from 0.21 (see HDFS-1552).
I propose to make the same fix for the 1.0 line and get rid of the JDK5 dependency altogether.
- HADOOP-8417.
Major bug reported by zhihyu@ebaysf.com and fixed by zhihyu@ebaysf.com
HADOOP-6963 didn't update hadoop-core-pom-template.xml
HADOOP-6963 introduced commons-io 2.1 in ivy.xml but forgot to update the hadoop-core-pom-template.xml.
This has caused map reduce jobs in downstream projects to fail with:
{code}
Caused by: java.lang.ClassNotFoundException: org.apache.commons.io.FileUtils
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:3...
- HADOOP-8430.
Major improvement reported by eli2 and fixed by eli
Backport new FileSystem methods introduced by HADOOP-8014 to branch-1
Per HADOOP-8422 let's backport the new FileSystem methods from HADOOP-8014 to branch-1 so users can transition over in Hadoop 1.x releases, which helps upstream projects like HBase work against federation (see HBASE-6067).
- HADOOP-8445.
Major bug reported by raviprak and fixed by raviprak (security)
Token should not print the password in toString
This JIRA is for porting HADOOP-6622 to branch-1 since 6622 is already closed.
- HADOOP-8552.
Major bug reported by kkambatl and fixed by kkambatl (conf, security)
Conflict: Same security.log.file for multiple users.
In log4j.properties, hadoop.security.log.file is set to SecurityAuth.audit. In the presence of multiple users, this can lead to a potential conflict.
Adding username to the log file would avoid this scenario.
- HADOOP-8617.
Major bug reported by brandonli and fixed by brandonli (performance)
backport pure Java CRC32 calculator changes to branch-1
Multiple efforts have been made gradually to improve the CRC performance in Hadoop. This JIRA is to back port these changes to branch-1, which include HADOOP-6166, HADOOP-6148, HADOOP-7333.
The related HDFS and MAPREDUCE patches are uploaded to their original JIRAs HDFS-496 and MAPREDUCE-782.
- HADOOP-8656.
Minor improvement reported by stevel@apache.org and fixed by rvs (bin)
backport forced daemon shutdown of HADOOP-8353 into branch-1
The init.d service shutdown code doesn't work if the daemon is hung. Backporting the portion of HADOOP-8353 that edits bin/hadoop-daemon.sh corrects this.
- HADOOP-8748.
Minor improvement reported by acmurthy and fixed by acmurthy (io)
Move dfsclient retry to a util class
HDFS-3504 introduced mechanisms to retry RPCs. I want to move that to common to allow MAPREDUCE-4603 to share it too. Should be a trivial patch.
- HDFS-496.
Minor improvement reported by tlipcon and fixed by tlipcon (data-node, hdfs client, performance)
Use PureJavaCrc32 in HDFS
Common now has a pure java CRC32 implementation which is more efficient than java.util.zip.CRC32. This issue is to make use of it.
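Code written against the {{java.util.zip.Checksum}} interface can switch implementations without other changes. The sketch below uses the JDK's {{CRC32}} so that it runs without Hadoop on the classpath; it assumes that PureJavaCrc32 implements the same interface and could be passed in its place. The class name {{ChecksumDemo}} is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Hypothetical demo: computes a checksum through the Checksum interface. */
final class ChecksumDemo {
    static long checksumOf(Checksum sum, byte[] data) {
        sum.reset();
        sum.update(data, 0, data.length);
        return sum.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        // The standard CRC-32 check value for this input is cbf43926.
        System.out.printf("%08x%n", checksumOf(new CRC32(), data));
    }
}
```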
- HDFS-1378.
Major improvement reported by tlipcon and fixed by cmccabe (name-node)
Edit log replay should track and report file offsets in case of errors
Occasionally there are bugs or operational mistakes that result in corrupt edit logs which I end up having to repair by hand. In these cases it would be very handy to have the error message also print out the file offsets of the last several edit log opcodes so it's easier to find the right place to edit in the OP_INVALID marker. We could also use this facility to provide a rough estimate of how far along edit log replay the NN is during startup (handy when a 2NN has died and replay takes a w...
- HDFS-1910.
Minor bug reported by slukog and fixed by (name-node)
when dfs.name.dir and dfs.name.edits.dir are same fsimage will be saved twice every time
When the image and edits directories are configured to be the same, the fsimage is flushed from memory to disk twice whenever saveNamespace runs. This may impact the performance of the backupnode/SNN, which does a saveNamespace at every checkpoint.
- HDFS-2305.
Major bug reported by atm and fixed by atm (name-node)
Running multiple 2NNs can result in corrupt file system
Here's the scenario:
* You run the NN and 2NN (2NN A) on the same machine.
* You don't have the address of the 2NN configured, so it's defaulting to 127.0.0.1.
* There's another 2NN (2NN B) running on a second machine.
* When a 2NN is done checkpointing, it says "hey NN, I have an updated fsimage for you. You can download it from this URL, which includes my IP address, which is x"
And here are the steps that occur to cause this issue:
# Some edits happen.
# 2NN A (on the NN machine) does a c...
- HDFS-2332.
Major test reported by tlipcon and fixed by tlipcon (test)
Add test for HADOOP-7629: using an immutable FsPermission as an IPC parameter
HADOOP-7629 fixes a bug where an immutable FsPermission would throw an error if used as the argument to fs.setPermission(). This JIRA is to add a test case for the common bugfix.
- HDFS-2541.
Major bug reported by qwertymaniac and fixed by qwertymaniac (data-node)
For a sufficiently large value of blocks, the DN Scanner may request a random number with a negative seed value.
Running off 0.20-security, I noticed that one could get the following exception when scanners are used:
{code}
DataXceiver
java.lang.IllegalArgumentException: n must be positive
at java.util.Random.nextInt(Random.java:250)
at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.getNewBlockScanTime(DataBlockScanner.java:251)
at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.addBlock(DataBlockScanner.java:268)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(Da...
- HDFS-2547.
Trivial bug reported by qwertymaniac and fixed by qwertymaniac (name-node)
ReplicationTargetChooser has incorrect block placement comments
{code}
/** The class is responsible for choosing the desired number of targets
* for placing block replicas.
* The replica placement strategy is that if the writer is on a datanode,
* the 1st replica is placed on the local machine,
* otherwise a random datanode. The 2nd replica is placed on a datanode
* that is on a different rack. The 3rd replica is placed on a datanode
* which is on the same rack as the **first replca**.
*/
{code}
That should read "second replica". The test cases c...
- HDFS-2637.
Major bug reported by eli and fixed by eli (hdfs client)
The rpc timeout for block recovery is too low
The RPC timeout for block recovery does not take into account that it issues multiple RPCs itself. This can cause recovery to fail if the network is congested or DNs are busy.
- HDFS-2638.
Minor improvement reported by eli and fixed by eli (name-node)
Improve a block recovery log
It would be useful to know whether an attempt to recover a block is failing because the block was already recovered (has a new GS) or the block is missing.
- HDFS-2653.
Major improvement reported by eli and fixed by eli (data-node)
DFSClient should cache whether addrs are non-local when short-circuiting is enabled
Something Todd mentioned to me off-line.. currently DFSClient doesn't cache the fact that non-local reads are non-local, so if short-circuiting is enabled every time we create a block reader we'll go through the isLocalAddress code path. We should cache the fact that an addr is non-local as well.
- HDFS-2654.
Major improvement reported by eli and fixed by eli (data-node)
Make BlockReaderLocal not extend RemoteBlockReader2
The BlockReaderLocal code paths are easier to understand (especially true on branch-1 where BlockReaderLocal inherits code from BlockReader and FSInputChecker) if the local and remote block reader implementations are independent, and they're not really sharing much code anyway. If for some reason they start to share significant code we can make the BlockReader interface an abstract class.
- HDFS-2728.
Minor bug reported by qwertymaniac and fixed by qwertymaniac (name-node)
Remove dfsadmin -printTopology from branch-1 docs since it does not exist
It is documented we have -printTopology but we do not really have it in this branch. Possible docs mixup from somewhere in security branch pre-merge?
{code}
? branch-1 grep printTopology -R .
./src/docs/src/documentation/content/xdocs/.svn/text-base/hdfs_user_guide.xml.svn-base: <code>-printTopology</code>
./src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml: <code>-printTopology</code>
{code}
Let's remove the reference.
- HDFS-2751.
Major bug reported by tlipcon and fixed by tlipcon (data-node)
Datanode drops OS cache behind reads even for short reads
HDFS-2465 has some code which attempts to disable the "drop cache behind reads" functionality when the reads are <256KB (eg HBase random access). But this check was missing in the {{close()}} function, so it always drops cache behind reads regardless of the size of the read. This hurts HBase random read performance when this patch is enabled.
- HDFS-2790.
Minor bug reported by arpitgupta and fixed by arpitgupta
FSNamesystem.setTimes throws exception with wrong configuration name in the message
The API throws this message when HDFS is not configured for access times:
"Access time for hdfs is not configured. Please set dfs.support.accessTime configuration parameter."
The property name should be dfs.access.time.precision.
- HDFS-2869.
Minor bug reported by qwertymaniac and fixed by qwertymaniac (webhdfs)
Error in Webhdfs documentation for mkdir
Reported over the lists by user Stuti Awasthi:
{quote}
I have tried the webhdfs functionality of Hadoop-1.0.0 and it is working fine.
Just a small change is required in the documentation :
Make a Directory declaration in documentation:
curl -i -X PUT "http://<HOST>:<PORT>/<PATH>?op=MKDIRS[&permission=<OCTAL>]"
Gives following error :
HTTP/1.1 405 HTTP method PUT is not supported by this URL
Content-Length: 0
Server: Jetty(6.1.26)
Correction Required : This works for me
curl -i -X PUT "ht...
- HDFS-2872.
Major improvement reported by tlipcon and fixed by cmccabe (name-node)
Add sanity checks during edits loading that generation stamps are non-decreasing
In 0.23 and later versions, we have a txid per edit, and the loading process verifies that there are no gaps. Lacking this in 1.0, we can use generation stamps as a proxy - the OP_SET_GENERATION_STAMP opcode should never result in a decreased genstamp. If it does, that would indicate that the edits are corrupt, or older edits are being applied to a newer checkpoint, for example.
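The sanity check described above amounts to rejecting any generation stamp that is lower than the one already seen. A minimal sketch, with a hypothetical class {{GenStampValidator}} standing in for the edit log loader's state:

```java
import java.io.IOException;

/** Hypothetical tracker that enforces non-decreasing generation stamps. */
final class GenStampValidator {
    private long current;

    GenStampValidator(long initial) {
        this.current = initial;
    }

    /** Applies a new generation stamp, failing if it would move backwards. */
    void apply(long newStamp) throws IOException {
        if (newStamp < current) {
            throw new IOException("Generation stamp decreased from " + current
                + " to " + newStamp
                + "; edits may be corrupt or older than the checkpoint");
        }
        current = newStamp;
    }

    long current() {
        return current;
    }
}
```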
- HDFS-2877.
Major bug reported by tlipcon and fixed by tlipcon (name-node)
If locking of a storage dir fails, it will remove the other NN's lock file on exit
In {{Storage.tryLock()}}, we call {{lockF.deleteOnExit()}} regardless of whether we successfully lock the directory. So, if another NN has the directory locked, then we'll fail to lock it the first time we start another NN. But our failed start attempt will still remove the other NN's lockfile, and a second attempt will erroneously start.
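The intended ordering can be sketched with plain JDK file locking: register the lock file for deletion only after the lock has actually been acquired. The class {{StorageLock}} below is a hypothetical stand-in, not the real {{Storage.tryLock()}}.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

/** Hypothetical lock helper that only cleans up lock files it owns. */
final class StorageLock {
    /** Returns the lock, or null if someone else already holds it. */
    static FileLock tryLock(File lockFile) throws IOException {
        RandomAccessFile file = new RandomAccessFile(lockFile, "rws");
        FileLock lock;
        try {
            lock = file.getChannel().tryLock();
        } catch (OverlappingFileLockException heldInThisJvm) {
            lock = null;
        } catch (IOException e) {
            file.close();
            throw e;
        }
        if (lock == null) {
            // Not our lock: close our handle and leave the other holder's file alone.
            file.close();
            return null;
        }
        // Only the successful locker schedules the file for deletion.
        lockFile.deleteOnExit();
        return lock;
    }
}
```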
- HDFS-3008.
Major bug reported by eli2 and fixed by eli (hdfs client)
Negative caching of local addrs doesn't work
HDFS-2653 added negative caching of local addrs, however it still goes through the fall through path every time if the address is non-local.
- HDFS-3078.
Major bug reported by eli2 and fixed by eli
2NN https port setting is broken
The code in SecondaryNameNode.java to set the https port is broken: if the port is set, it sets the bind addr to "addr:addr:port", which is bogus. Even if it did work, it uses port 0 instead of port 50490 (the default listed in ./src/packages/templates/conf/hdfs-site.xml).
- HDFS-3129.
Minor test reported by cmccabe and fixed by cmccabe
NetworkTopology: add test that getLeaf should check for invalid topologies
- HDFS-3131.
Minor improvement reported by szetszwo and fixed by brandonli
Improve TestStorageRestore
Aaron has the following comments on TestStorageRestore in HDFS-3127.
# removeStorageAccess, restoreAccess, and numStorageDirs can all be made private
# numStorageDirs can be made static
# Rather than do set(Readable/Executable/Writable), use FileUtil.chmod(...).
# Please put the contents of the test in a try/finally, with the calls to shutdown the cluster and the 2NN in the finally block.
# Some lines are over 80 chars.
# No need for the numDatanodes variable - it's only used in one place.
#...
- HDFS-3148.
Major new feature reported by eli2 and fixed by eli (hdfs client, performance)
The client should be able to use multiple local interfaces for data transfer
HDFS-3147 covers using multiple interfaces on the server (Datanode) side. Clients should also be able to utilize multiple *local* interfaces for outbound connections instead of always using the interface for the local hostname. This can be accomplished with a new configuration parameter ({{dfs.client.local.interfaces}}) that accepts a list of interfaces the client should use. Acceptable configuration values are the same as the {{dfs.datanode.available.interfaces}} parameter. The client binds ...
- HDFS-3150.
Major new feature reported by eli2 and fixed by eli (data-node, hdfs client)
Add option for clients to contact DNs via hostname
The DN listens on multiple IP addresses (the default {{dfs.datanode.address}} is the wildcard) however per HADOOP-6867 only the source address (IP) of the registration is given to clients. HADOOP-985 made clients access datanodes by IP primarily to avoid the latency of a DNS lookup, this had the side effect of breaking DN multihoming (the client can not route the IP exposed by the NN if the DN registers with an interface that has a cluster-private IP). To fix this let's add back the option fo...
- HDFS-3176.
Major bug reported by kihwal and fixed by kihwal (hdfs client)
JsonUtil should not parse the MD5MD5CRC32FileChecksum bytes on its own.
Currently JsonUtil used by webhdfs parses MD5MD5CRC32FileChecksum binary bytes on its own and constructs a MD5MD5CRC32FileChecksum. It should instead call MD5MD5CRC32FileChecksum.readFields().
- HDFS-3330.
Critical bug reported by tlipcon and fixed by tlipcon (name-node)
If GetImageServlet throws an Error or RTE, response has HTTP "OK" status
Currently in GetImageServlet, we catch Exception but not other Errors or RTEs. So, if the code ends up throwing one of these exceptions, the "response.sendError()" code doesn't run, but the finally clause does run. This results in the servlet returning HTTP 200 OK and an empty response, which causes the client to think it got a successful image transfer.
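The failure mode can be reproduced in miniature: a {{java.lang.Error}} is not an {{Exception}}, so a handler that only catches {{Exception}} never reports it. The sketch below catches {{Throwable}} so that an error status is always sent. The names {{TransferStatus}}, {{Response}}, and {{serve}} are hypothetical and only mimic the shape of a servlet.

```java
/** Hypothetical miniature of a servlet-style handler. */
final class TransferStatus {
    /** Stand-in for the part of an HTTP response used here. */
    interface Response {
        void sendError(int status, String message);
    }

    static void serve(Runnable transfer, Response response) {
        try {
            transfer.run();
        } catch (Throwable t) {
            // Throwable covers Error as well as Exception, so the client
            // never sees a success status for a failed transfer.
            response.sendError(500, String.valueOf(t));
        }
    }
}
```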
- HDFS-3453.
Major bug reported by kihwal and fixed by kihwal (hdfs client)
HDFS does not use ClientProtocol in a backward-compatible way
HDFS-617 was brought into branch-0.20-security/branch-1 to support non-recursive create, along with HADOOP-6840 and HADOOP-6886. However, the changes in HDFS were made in an incompatible way, making the client unusable against older clusters, even when plain old create() is called. This is because DFS now internally calls create() through the newly introduced method. By simply changing how the methods are wired internally, we can remove this limitation. We may eventually switch back to the app...
- HDFS-3461.
Major bug reported by owen.omalley and fixed by owen.omalley
HFTP should use the same port & protocol for getting the delegation token
Currently, hftp uses http to the Namenode's https port, which doesn't work.
- HDFS-3466.
Major bug reported by owen.omalley and fixed by owen.omalley (name-node, security)
The SPNEGO filter for the NameNode should come out of the web keytab file
Currently, the spnego filter uses the DFS_NAMENODE_KEYTAB_FILE_KEY to find the keytab. It should use the DFS_WEB_AUTHENTICATION_KERBEROS_KEYTAB_KEY to do it.
- HDFS-3504.
Major improvement reported by sseth and fixed by szetszwo
Configurable retry in DFSClient
When NN maintenance is performed on a large cluster, jobs end up failing. This is particularly bad for long running jobs. The client retry policy could be made configurable so that jobs don't need to be restarted.
- HDFS-3516.
Major improvement reported by szetszwo and fixed by szetszwo (hdfs client)
Check content-type in WebHdfsFileSystem
WebHdfsFileSystem currently tries to parse the response as json. It may be a good idea to check the content-type before parsing it.
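A content-type check of the kind suggested above could look like the following sketch, which compares only the media type and ignores parameters such as charset. The class {{JsonContentType}} is hypothetical and is not the check that WebHdfsFileSystem implements.

```java
import java.util.Locale;

/** Hypothetical helper that decides whether a response claims to be JSON. */
final class JsonContentType {
    static boolean isJson(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=utf-8" before comparing.
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.equals("application/json");
    }
}
```

A caller would parse the body as JSON only when this returns true and report the raw response otherwise.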
- HDFS-3551.
Major bug reported by szetszwo and fixed by szetszwo (webhdfs)
WebHDFS CREATE does not use client location for redirection
CREATE currently redirects the client to a random datanode without using the client location information.
- HDFS-3596.
Minor improvement reported by cmccabe and fixed by cmccabe
Improve FSEditLog pre-allocation in branch-1
Implement HDFS-3510 in branch-1. This will improve FSEditLog preallocation to decrease the incidence of corrupted logs after disk full conditions. (See HDFS-3510 for a longer description.)
- HDFS-3617.
Major improvement reported by mattf and fixed by qwertymaniac
Port HDFS-96 to branch-1 (support blocks greater than 2GB)
Please see HDFS-96.
- HDFS-3652.
Blocker bug reported by tlipcon and fixed by tlipcon (name-node)
1.x: FSEditLog failure removes the wrong edit stream when storage dirs have same name
In {{FSEditLog.removeEditsForStorageDir}}, we iterate over the edits streams trying to find the stream corresponding to a given dir. To check equality, we currently use the following condition:
{code}
File parentDir = getStorageDirForStream(idx);
if (parentDir.getName().equals(sd.getRoot().getName())) {
{code}
... which is horribly incorrect. If two or more storage dirs happen to have the same terminal path component (eg /data/1/nn and /data/2/nn) then it will pick the wrong strea...
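The name collision is easy to reproduce with plain java.io.File. The sketch below contrasts the name-only comparison with a full-path comparison; the class StorageDirMatch and the sameByPath check are illustrative assumptions, not necessarily the committed fix.

```java
import java.io.File;

/** Shows why comparing only the last path component can match the wrong directory. */
class StorageDirMatch {
    /** The flawed check quoted above: compares terminal path components only. */
    static boolean sameByName(File a, File b) {
        return a.getName().equals(b.getName());
    }

    /** One possible stricter check (an assumption, not necessarily the committed fix). */
    static boolean sameByPath(File a, File b) {
        return a.getAbsoluteFile().equals(b.getAbsoluteFile());
    }

    public static void main(String[] args) {
        File first = new File("/data/1/nn");
        File second = new File("/data/2/nn");
        System.out.println(sameByName(first, second)); // true: the wrong stream could be chosen
        System.out.println(sameByPath(first, second)); // false: the directories differ
    }
}
```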
- HDFS-3667.
Major improvement reported by szetszwo and fixed by szetszwo (webhdfs)
Add retry support to WebHdfsFileSystem
DFSClient (i.e. DistributedFileSystem) has a configurable retry policy and it retries on exceptions such as connection failure, safemode. WebHdfsFileSystem should have similar retry support.
- HDFS-3696.
Critical bug reported by kihwal and fixed by szetszwo
Create files with WebHdfsFileSystem goes OOM when file size is big
When doing "fs -put" to a WebHdfsFileSystem (webhdfs://), the FsShell goes OOM if the file size is large. When I tested, 20MB files were fine, but 200MB didn't work.
I also tried reading a large file by issuing "-cat" and piping to a slow sink in order to force buffering. The read path didn't have this problem. The memory consumption stayed the same regardless of progress.
- HDFS-3698.
Major bug reported by atm and fixed by atm (security)
TestHftpFileSystem is failing in branch-1 due to changed default secure port
This test is failing since the default secure port changed to the HTTP port upon the commit of HDFS-2617.
- HDFS-3701.
Critical bug reported by nkeywal and fixed by nkeywal (hdfs client)
HDFS may miss the final block when reading a file opened for writing if one of the datanode is dead
When the file is opened for writing, the DFSClient calls one of the datanodes owning the last block to get its size. If this datanode is dead, the socket exception is swallowed and the size of this last block is set to zero. This seems to be fixed on trunk, but I didn't find a related Jira. On 1.0.3, it's not fixed. It's in the same area as HDFS-1950 or HDFS-3222.
- HDFS-3871.
Minor improvement reported by acmurthy and fixed by acmurthy (hdfs client)
Change NameNodeProxies to use HADOOP-8748
Change NameNodeProxies to use util method introduced via HADOOP-8748.
- HDFS-3966.
Minor bug reported by jingzhao and fixed by jingzhao
For branch-1, TestFileCreation should use JUnit4 to make assumeTrue work
Currently in TestFileCreation for branch-1, assumeTrue() is used by two test cases in order to check if the OS is Linux. Thus JUnit 4 should be used to enable assumeTrue.
- MAPREDUCE-782.
Minor improvement reported by tlipcon and fixed by tlipcon (performance)
Use PureJavaCrc32 in mapreduce spills
HADOOP-6148 added a pure Java implementation of CRC32 which performs better than the built-in one. This issue is to make use of it in the mapred package.
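A swap like this is simple when callers depend only on the java.util.zip.Checksum interface. The sketch below computes a CRC32 through that interface using the JDK's built-in CRC32; it assumes the pure Java class also implements Checksum, in which case it could be passed to crcOf unchanged. ChecksumDemo and crcOf are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

class ChecksumDemo {
    /** Computes a checksum through the Checksum interface, so the implementation is swappable. */
    static long crcOf(Checksum sum, byte[] data) {
        sum.reset();
        sum.update(data, 0, data.length);
        return sum.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        // java.util.zip.CRC32 is the built-in implementation; any other Checksum
        // implementation of the same polynomial must produce the same value.
        System.out.println(Long.toHexString(crcOf(new CRC32(), data))); // cbf43926
    }
}
```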
- MAPREDUCE-1740.
Major bug reported by tlipcon and fixed by ahmed.radwan (jobtracker)
NPE in getMatchingLevelForNodes when node locations are variable depth
In getMatchingLevelForNodes, we assume that both nodes have the same "depth" (ie number of path components). If the user provides a topology script that assigns one node a path like /foo/bar/baz and another node a path like /foo/blah, this function will throw an NPE.
I'm not sure if there are other places where we assume that all node locations have a constant number of paths. If so we should check the output of the topology script aggressively to be sure this is the case. Otherwise I think ...
- MAPREDUCE-2073.
Trivial test reported by tlipcon and fixed by tlipcon (distributed-cache, test)
TestTrackerDistributedCacheManager should be up-front about requirements on build environment
TestTrackerDistributedCacheManager will fail on a system where the build directory is in any path where an ancestor doesn't have a+x permissions. On one of our hudson boxes, for example, hudson's workspace had 700 permissions and caused this test to fail reliably, but not in an obvious manner. It would be helpful if the test failed with a more obvious error message during setUp() when the build environment is misconfigured.
- MAPREDUCE-2103.
Trivial improvement reported by tlipcon and fixed by tlipcon (task-controller)
task-controller shouldn't require o-r permissions
The task-controller currently checks that "other" users don't have read permissions. This is unnecessary - we just need to make sure it's not executable. The Debian policy manual explains it well:
{quote}
Setuid and setgid executables should be mode 4755 or 2755 respectively, and owned by the appropriate user or group. They should not be made unreadable (modes like 4711 or 2711 or even 4111); doing so achieves no extra security, because anyone can find the binary in the freely available Debian pa...
- MAPREDUCE-2129.
Major bug reported by xiaokang and fixed by subrotosanyal (jobtracker)
Job may hang if mapreduce.job.committer.setup.cleanup.needed=false and mapreduce.map/reduce.failures.maxpercent>0
Job may hang in the RUNNING state if mapreduce.job.committer.setup.cleanup.needed=false and mapreduce.map/reduce.failures.maxpercent>0. It happens when some tasks fail but haven't reached failures.maxpercent.
- MAPREDUCE-2376.
Major bug reported by tlipcon and fixed by tlipcon (task-controller, test)
test-task-controller fails if run as a userid < 1000
test-task-controller tries to verify that the task-controller won't run on behalf of users with uid < 1000. This makes the test fail when running in some test environments - eg our hudson jobs internally run as a system user with uid 101.
- MAPREDUCE-2377.
Major bug reported by tlipcon and fixed by benoyantony (task-controller)
task-controller fails to parse configuration if it doesn't end in \n
If the task-controller.cfg file doesn't end in a newline, it fails to parse properly.
- MAPREDUCE-2835.
Major improvement reported by tomwhite and fixed by tomwhite
Make per-job counter limits configurable
The per-job counter limits introduced in MAPREDUCE-1943 are fixed, except for the total number allowed per job (mapreduce.job.counters.limit). It would be useful to make them all configurable.
- MAPREDUCE-2836.
Minor improvement reported by jwfbean and fixed by ahmed.radwan (contrib/fair-share)
Provide option to fail jobs when submitted to non-existent pools.
In some environments, it might be desirable to explicitly specify the fair scheduler pools and to explicitly fail jobs that are not submitted to any of the pools.
Current behavior of the fair scheduler is to submit jobs to a default pool if a pool name isn't specified or to create a pool with the new name if the pool name doesn't already exist. There should be a configuration option for the fair scheduler that causes it to noisily fail the job if it's submitted to a pool that isn't pre-spec...
- MAPREDUCE-2850.
Major sub-task reported by eli and fixed by ravidotg (tasktracker)
Add test for TaskTracker disk failure handling (MR-2413)
MR-2413 doesn't have any test coverage, e.g. a test that the TT can survive disk failure.
- MAPREDUCE-2903.
Major bug reported by devaraj.k and fixed by devaraj.k (jobtracker)
Map Tasks graph is throwing XML Parse error when Job is executed with 0 maps
{code:xml}
XML Parsing Error: no element found
Location: http://10.18.52.170:50030/taskgraph?type=map&jobid=job_201108291536_0001
Line Number 1, Column 1:
^
{code}
- MAPREDUCE-2905.
Major bug reported by jwfbean and fixed by jwfbean (contrib/fair-share)
CapBasedLoadManager incorrectly allows assignment when assignMultiple is true (was: assignmultiple per job)
We encountered a situation where in the same cluster, large jobs benefit from mapred.fairscheduler.assignmultiple, but small jobs with small numbers of mappers do not: the mappers all clump to fully occupy just a few nodes, which causes those nodes to saturate and bottleneck. The desired behavior is to spread the job across more nodes so that a relatively small job doesn't saturate any node in the cluster.
Testing has shown that setting mapred.fairscheduler.assignmultiple to false gives the ...
- MAPREDUCE-2919.
Minor improvement reported by eli and fixed by qwertymaniac (jobtracker)
The JT web UI should show job start times
It would be helpful if the list of jobs in the main JT web UI (running, completed, failed..) had a column with the start time. Clicking into each job detail can get tedious.
- MAPREDUCE-2932.
Trivial bug reported by qwertymaniac and fixed by qwertymaniac (tasktracker)
Missing instrumentation plugin class shouldn't crash the TT startup per design
Per the design of the TaskTracker instrumentation plugin (from 2008), a ClassNotFoundException while loading a configured TaskTracker instrumentation class should not hamper TT startup at all.
But there is one class-fetching call outside the try/catch, which makes the TT fall down with a RuntimeException if the class is not found. It would be good to move this line into the try/catch itself.
The stack trace would appear as:
{code}
2011-08-25 11:45:38,470 ERROR org....
- MAPREDUCE-2957.
Major sub-task reported by eli and fixed by eli (tasktracker)
The TT should not re-init if it has no good local dirs
The TT will currently try to re-init itself on disk failure even if it has no good local dirs. It should shut down instead.
- MAPREDUCE-3015.
Major sub-task reported by eli and fixed by eli (tasktracker)
Add local dir failure info to metrics and the web UI
Like HDFS-811/HDFS-1850 but for the TT.
- MAPREDUCE-3278.
Major improvement reported by tlipcon and fixed by tlipcon (mrv1, performance, task)
0.20: avoid a busy-loop in ReduceTask scheduling
Looking at profiling results, it became clear that the ReduceTask has the following busy-loop which was causing it to suck up 100% of CPU in the fetch phase in some configurations:
- the number of reduce fetcher threads is configured to more than the number of hosts
- therefore "busyEnough()" never returns true
- the "scheduling" portion of the code can't schedule any new fetches, since all of the pending fetches in the mapLocations buffer correspond to hosts that are already being fetched (t...
- MAPREDUCE-3289.
Major improvement reported by tlipcon and fixed by tlipcon (mrv2, nodemanager, performance)
Make use of fadvise in the NM's shuffle handler
Using the new NativeIO fadvise functions, we can make the NodeManager prefetch map output before it's sent over the socket, and drop it out of the fs cache once it's been sent (since it's very rare for an output to have to be re-sent). This improves IO efficiency and reduces cache pollution.
- MAPREDUCE-3365.
Trivial improvement reported by sho.shimauchi and fixed by sho.shimauchi (contrib/fair-share)
Uncomment eventlog settings from the documentation
Two fair scheduler debug options, "mapred.fairscheduler.eventlog.enabled" and "mapred.fairscheduler.dump.interval", are commented out in the fair scheduler doc file.
They are useful for debugging.
- MAPREDUCE-3394.
Trivial improvement reported by tlipcon and fixed by tlipcon (task)
Add log guard for a debug message in ReduceTask
There's a LOG.debug message in ReduceTask that stringifies a task ID and uses a non-negligible amount of CPU in some cases. We should guard it with {{isDebugEnabled}}
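The guard pattern can be sketched as follows. To keep the example self-contained it uses java.util.logging, where isLoggable(Level.FINE) plays the role of {{isDebugEnabled}}; that substitution, the LogGuardDemo class, and the TaskId stand-in are assumptions for illustration only.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LogGuardDemo {
    static int stringifyCalls = 0;

    /** Stand-in for a task ID whose toString() is costly. */
    static class TaskId {
        @Override
        public String toString() {
            stringifyCalls++;
            return "attempt_0001_r_000000_0"; // made-up ID for the demo
        }
    }

    static void logFetch(Logger log, TaskId id) {
        // Guard: the string concatenation (and toString()) only runs when FINE is enabled.
        if (log.isLoggable(Level.FINE)) {
            log.fine("Fetching output for " + id);
        }
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("LogGuardDemo");
        log.setLevel(Level.INFO); // debug-level (FINE) output disabled
        logFetch(log, new TaskId());
        System.out.println(stringifyCalls); // 0
    }
}
```

Without the guard, the message string would be built on every call even when debug logging is off.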
- MAPREDUCE-3395.
Trivial improvement reported by eli and fixed by eli (documentation)
Add mapred.disk.healthChecker.interval to mapred-default.xml
Let's add mapred.disk.healthChecker.interval to mapred-default.xml.
- MAPREDUCE-3405.
Critical bug reported by tlipcon and fixed by tlipcon (capacity-sched, contrib/fair-share)
MAPREDUCE-3015 broke compilation of contrib scheduler tests
MAPREDUCE-3015 added a new argument to the TaskTrackerStatus constructor, which is used by a few of the scheduler tests, but didn't update those tests. So, the contrib test build is now failing on 0.20-security
- MAPREDUCE-3419.
Major bug reported by eli and fixed by eli (tasktracker, test)
Don't mark exited TT threads as dead in MiniMRCluster
MAPREDUCE-2850 flagged all TT threads that exited in the MiniMRCluster as dead; this breaks a number of the other tests that use MiniMRCluster across restart.
- MAPREDUCE-3424.
Minor sub-task reported by eli and fixed by eli (tasktracker)
Some LinuxTaskController cleanup
MR-2415 had some tabs and weird indenting and spacing. Also would be more clear if LTC explicitly overrides createLogDir. Let's clean this up.
- MAPREDUCE-3674.
Critical bug reported by qwertymaniac and fixed by qwertymaniac (jobtracker)
If invoked with no queueName request param, jobqueue_details.jsp injects a null queue name into schedulers.
When you access /jobqueue_details.jsp manually, instead of via a link, it has queueName set to null internally and this goes for a lookup into the scheduling info maps as well.
As a result, if using FairScheduler, a Pool with String name = null gets created and this brings the scheduler down. I have not tested what happens to the CapacityScheduler, but ideally if no queueName is set in that jsp, it should fall back to 'default'. Otherwise, this brings down the JobTracker completely.
FairSch...
- MAPREDUCE-3789.
Critical bug reported by qwertymaniac and fixed by qwertymaniac (capacity-sched, scheduler)
CapacityTaskScheduler may perform unnecessary reservations in heterogenous tracker environments
Briefly, to reproduce:
* Run JT with CapacityTaskScheduler [Say, Cluster max map = 8G, Cluster map = 2G]
* Run two TTs but with varied capacity, say, one with 4 map slot, another with 3 map slots.
* Run a job with two tasks, each demanding mem worth 4 slots at least (Map mem = 7G or so).
* Job will begin running on TT #1, but will also end up reserving the 3 slots on TT #2 because it does not check for the maximum limit of slots when reserving (as it goes greedy, and hopes to gain more slots i...
- MAPREDUCE-3837.
Major new feature reported by mayank_bansal and fixed by mayank_bansal
Job tracker is not able to recover job in case of crash and after that no user can submit job.
If the job tracker crashes while jobs are running and its mapreduce.jobtracker.restart.recover property is true, it should recover those jobs on restart.
However, the current behavior is as follows:
The jobtracker tries to restore the jobs but cannot. After that, the jobtracker closes its handle to HDFS and nobody else can submit jobs.
- MAPREDUCE-3992.
Major bug reported by tlipcon and fixed by tlipcon (mrv1)
Reduce fetcher doesn't verify HTTP status code of response
Currently, the reduce fetch code doesn't check the HTTP status code of the response. This can lead to the following situation:
- the map output servlet gets an IOException after setting the headers but before the first call to flush()
- this causes it to send a response with a non-OK result code, including the exception text as the response body (response.sendError() does this if the response isn't committed)
- it will still include the response headers indicating it's a valid response
In th...
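A minimal sketch of the missing check, assuming the fetcher reads map output over an HttpURLConnection: reject any non-200 response before treating the body as map output. The class FetchStatusCheck and its method names are hypothetical and do not reflect the actual ReduceTask code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;

class FetchStatusCheck {
    /** Rejects any response whose status is not 200 OK before the body is read. */
    static void requireOk(int statusCode, String url) throws IOException {
        if (statusCode != HttpURLConnection.HTTP_OK) {
            throw new IOException("Unexpected HTTP status " + statusCode + " from " + url);
        }
    }

    /** How a fetcher might apply the check to a live connection. */
    static InputStream openChecked(HttpURLConnection conn) throws IOException {
        requireOk(conn.getResponseCode(), conn.getURL().toString());
        return conn.getInputStream();
    }

    public static void main(String[] args) {
        try {
            requireOk(500, "http://tracker.example/mapOutput"); // made-up URL
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```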
- MAPREDUCE-4001.
Minor improvement reported by qwertymaniac and fixed by qwertymaniac (capacity-sched)
Improve MAPREDUCE-3789's fix logic by looking at job's slot demands instead
In MAPREDUCE-3789, the fix had unfortunately only covered the first time assignment scenario, and the test had not really caught the mistake of using the condition of looking at available TT slots (instead of looking for how many slots a job's task demands).
We should change the condition of reservation in such a manner:
{code}
if ((getPendingTasks(j) != 0 &&
!hasSufficientReservedTaskTrackers(j)) &&
- (taskTracker.getAvailableSlots(type) !=
+ ...
- MAPREDUCE-4088.
Critical bug reported by raviprak and fixed by raviprak (mrv1)
Task stuck in JobLocalizer prevented other tasks on the same node from committing
We saw that as a result of HADOOP-6963, one task was stuck in this
Thread 23668: (state = IN_NATIVE)
- java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 (Compiled frame; information may be imprecise)
- java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 (Compiled frame)
- java.io.File.exists() @bci=20, line=733 (Compiled frame)
- org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=3, line=446 (Compiled frame)
- org.apache.hadoop.fs.FileUtil.getD...
- MAPREDUCE-4095.
Major bug reported by eli2 and fixed by cmccabe
TestJobInProgress#testLocality uses a bogus topology
The following in TestJobInProgress#testLocality:
{code}
Node r2n4 = new NodeBase("/default/rack2/s1/node4");
nt.add(r2n4);
{code}
violates the check introduced by HADOOP-8159:
{noformat}
Testcase: testLocality took 0.005 sec
Caused an ERROR
Invalid network topology. You cannot have a rack and a non-rack node at the same level of the network topology.
org.apache.hadoop.net.NetworkTopology$InvalidTopologyException: Invalid network topology. You cannot have a rack and a non-ra...
- MAPREDUCE-4241.
Major bug reported by abayer and fixed by abayer (build, examples)
Pipes examples do not compile on Ubuntu 12.04
-lssl alone won't work for compiling the pipes examples on 12.04. -lcrypto needs to be added explicitly.
- MAPREDUCE-4328.
Major improvement reported by acmurthy and fixed by acmurthy (mrv1)
Add the option to quiesce the JobTracker
In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.
- MAPREDUCE-4399.
Major bug reported by vicaya and fixed by vicaya (performance, tasktracker)
Fix performance regression in shuffle
There is a significant (up to 3x) performance regression in shuffle (vs 0.20.2) in the Hadoop 1.x series. Most noticeable with high-end switches.
- MAPREDUCE-4400.
Major bug reported by vicaya and fixed by vicaya (performance, task)
Fix performance regression for small jobs/workflows
There is a significant performance regression for small jobs/workflows (vs 0.20.2) in the Hadoop 1.x series. Most noticeable with Hive and Pig jobs. PigMix has an average 40% regression against 0.20.2.
- MAPREDUCE-4511.
Major improvement reported by ahmed.radwan and fixed by ahmed.radwan (mrv1, mrv2, performance)
Add IFile readahead
This ticket is to add IFile readahead as part of HADOOP-7714.
- MAPREDUCE-4558.
Major bug reported by sseth and fixed by sseth
TestJobTrackerSafeMode is failing
MAPREDUCE-1906 exposed an issue with this unit test. It has 3 TTs running, but has a check for the TT count to reach exactly 2 (which would be reached with a higher heartbeat interval).
The test ends up getting stuck, with the following message repeated multiple times.
{code}
[junit] 2012-08-15 11:26:46,299 INFO mapred.TestJobTrackerSafeMode (TestJobTrackerSafeMode.java:checkTrackers(201)) - Waiting for Initialize all Task Trackers
[junit] 2012-08-15 11:26:47,301 INFO mapred.TestJo...
- MAPREDUCE-4603.
Major improvement reported by acmurthy and fixed by acmurthy
Allow JobClient to retry job-submission when JT is in safemode
Similar to HDFS-3504, it would be useful to allow JobClient to retry job-submission when JT is in safemode (via MAPREDUCE-4328).
This way applications like Pig/Hive don't fail midway when the NN/JT are not operational.
Changes since Hadoop 1.0.2
Jiras with Release Notes (describe major or incompatible changes)
- HADOOP-5528.
Major new feature reported by klbostee and fixed by klbostee
Binary partitioner
New BinaryPartitioner that partitions BinaryComparable keys by hashing a configurable part of the bytes array corresponding to the key.
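The idea can be sketched in a few lines: hash only a chosen slice of the key bytes and map the hash onto the partition count. The class BytesSlicePartitioner, the offset parameters, and the use of Arrays.hashCode are illustrative assumptions; the real BinaryPartitioner may use a different hash function and different offset semantics.

```java
import java.util.Arrays;

class BytesSlicePartitioner {
    /**
     * Picks a partition from the bytes in [from, to) of the key.
     * The offsets play the role of the configurable part of the key.
     */
    static int partition(byte[] key, int from, int to, int numPartitions) {
        int hash = Arrays.hashCode(Arrays.copyOfRange(key, from, to));
        return (hash & Integer.MAX_VALUE) % numPartitions; // mask keeps the result non-negative
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 9};
        byte[] b = {1, 2, 3, 7};
        // Same first three bytes, so both keys land in the same partition.
        System.out.println(partition(a, 0, 3, 4) == partition(b, 0, 3, 4)); // true
    }
}
```

Partitioning on a prefix like this lets keys that share the prefix reach the same reducer even when their remaining bytes differ.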
- HADOOP-8352.
Major improvement reported by owen.omalley and fixed by owen.omalley
We should always generate a new configure script for the c++ code
If you are compiling c++, the configure script will now be automatically regenerated as it should be.
This requires autoconf version 2.61 or greater.
- MAPREDUCE-4017.
Trivial improvement reported by knoguchi and fixed by tgraves (jobhistoryserver, jobtracker)
Add jobname to jobsummary log
The Job Summary log may contain commas in values that are escaped by a '\' character. This was true before, but is more likely to be exposed now.
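Consumers of the log therefore need to split on unescaped commas only. The sketch below is one way to do that; the class JobSummaryFields, the sample field names, and the rule that a backslash escapes whatever character follows it are assumptions, not the documented log format.

```java
import java.util.ArrayList;
import java.util.List;

class JobSummaryFields {
    /** Splits a line on commas that are not preceded by a backslash, then unescapes. */
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // keep the escaped character literally
            } else if (c == ',') {
                fields.add(current.toString());   // an unescaped comma ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample line: the job name itself contains a comma.
        System.out.println(split("jobName=word\\,count,user=alice")); // [jobName=word,count, user=alice]
    }
}
```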
Other Jiras (describe bug fixes and minor changes)
- HADOOP-6924.
Major bug reported by wattsteve and fixed by devaraj
Build fails with non-Sun JREs due to different pathing to the operating system architecture shared libraries
The src/native/configure script used to build the native libraries has an environment variable called JNI_LDFLAGS which is set as follows:
JNI_LDFLAGS="-L$JAVA_HOME/jre/lib/$OS_ARCH/server"
This pathing convention to the shared libraries for the operating system architecture is unique to Oracle/Sun Java and thus on other flavors of Java the path will not exist and will result in a build failure with the following exception:
[exec] gcc -shared ../src/org/apache/hadoop/io/compress/zlib...
- HADOOP-6941.
Major bug reported by wattsteve and fixed by devaraj
Support non-SUN JREs in UserGroupInformation
Attempting to format the namenode or attempting to start Hadoop using Apache Harmony or the IBM Java JREs results in the following exception:
10/09/07 16:35:05 ERROR namenode.NameNode: java.lang.NoClassDefFoundError: com.sun.security.auth.UnixPrincipal
at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:223)
at java.lang.J9VMInternals.initializeImpl(Native Method)
at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
at org.apache.hadoop.hdfs.ser...
- HADOOP-6963.
Critical bug reported by owen.omalley and fixed by raviprak (fs)
Fix FileUtil.getDU. It should not include the size of the directory or follow symbolic links
The getDU method should not include the size of the directory. The Java interface says that the value is undefined and in Linux/Sun it gets the 4096 for the inode. Clearly this isn't useful.
It also recursively calls itself. In case the directory has a symbolic link forming a cycle, getDU keeps spinning in the cycle. In our case, we saw this in the org.apache.hadoop.mapred.JobLocalizer.downloadPrivateCacheObjects call. This prevented other tasks on the same node from committing, causing the T...
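Both requirements (skip directory sizes, do not follow symbolic links) can be sketched with java.nio.file, which is an assumption here since it requires Java 7 and is not what FileUtil.getDU itself uses. The DiskUsage class is hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

class DiskUsage {
    /** Sums regular-file sizes under root; directories and symlinks contribute nothing. */
    static long sizeOf(Path root) throws IOException {
        final long[] total = {0};
        // walkFileTree does not follow symbolic links unless FOLLOW_LINKS is passed,
        // so a link that forms a cycle is seen as a link and never descended into.
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    total[0] += attrs.size();
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return total[0];
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("du-demo");
        Files.write(dir.resolve("a.bin"), new byte[10]);
        Path sub = Files.createDirectory(dir.resolve("sub"));
        Files.write(sub.resolve("b.bin"), new byte[5]);
        System.out.println(sizeOf(dir)); // 15
    }
}
```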
- HADOOP-7381.
Major bug reported by jrottinghuis and fixed by jrottinghuis (build)
FindBugs OutOfMemoryError
When running the findbugs target from Jenkins, I get an OutOfMemory error.
The "effort" in FindBugs is set to Max which ends up using a lot of memory to go through all the classes. The jvmargs passed to FindBugs is hardcoded to 512 MB max.
We can leave the default to 512M, as long as we pass this as an ant parameter which can be overwritten in individual cases through -D, or in the build.properties file (either basedir, or user's home directory).
- HADOOP-8027.
Minor improvement reported by qwertymaniac and fixed by atm (metrics)
Visiting /jmx on the daemon web interfaces may print unnecessary error in logs
Logs that follow a {{/jmx}} servlet visit:
{code}
11/11/22 12:09:52 ERROR jmx.JMXJsonServlet: getting attribute UsageThreshold of java.lang:type=MemoryPool,name=Par Eden Space threw an exception
javax.management.RuntimeMBeanException: java.lang.UnsupportedOperationException: Usage threshold is not supported
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:856)
...
{code}
- HADOOP-8151.
Major bug reported by tlipcon and fixed by mattf (io, native)
Error handling in snappy decompressor throws invalid exceptions
SnappyDecompressor.c has the following code in a few places:
{code}
THROW(env, "Ljava/lang/InternalError", "Could not decompress data. Buffer length is too small.");
{code}
This is incorrect, though, since the THROW macro doesn't need the "L" before the class name. This results in a ClassNotFoundException for Ljava.lang.InternalError being thrown, instead of the intended exception.
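The actual bug is in C/JNI code, but the effect of the stray "L" can be reproduced from Java as a rough analogue: a class lookup by name fails when the name carries the descriptor-style prefix. The class ClassNameLookup is a hypothetical helper for the demonstration.

```java
class ClassNameLookup {
    /** Returns true if the given binary class name can be loaded. */
    static boolean resolves(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolves("java.lang.InternalError"));  // true
        System.out.println(resolves("Ljava.lang.InternalError")); // false: no such class
    }
}
```

The "L...;" form belongs to JNI type descriptors, whereas a class lookup takes the plain class name.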
- HADOOP-8188.
Major improvement reported by devaraj and fixed by devaraj
Fix the build process to do with jsvc, with IBM's JDK as the underlying jdk
When IBM JDK is used as the underlying JDK for the build process, the build of jsvc fails. I just needed to add an extra "os arch" expression in the condition that sets os-arch.
- HADOOP-8251.
Blocker bug reported by tlipcon and fixed by tlipcon (security)
SecurityUtil.fetchServiceTicket broken after HADOOP-6941
HADOOP-6941 replaced direct references to some classes with reflective access so as to support other JDKs. Unfortunately there was a mistake in the name of the Krb5Util class, which broke fetchServiceTicket. This manifests itself as the inability to run checkpoints or other krb5-SSL HTTP-based transfers:
java.lang.ClassNotFoundException: sun.security.jgss.krb5
- HADOOP-8293.
Major bug reported by owen.omalley and fixed by owen.omalley (build)
The native library's Makefile.am doesn't include JNI path
When compiling on centos 6, I get the following error when compiling the native library:
{code}
[exec] /usr/bin/ld: cannot find -ljvm
{code}
The problem is simply that the Makefile.am libhadoop_la_LDFLAGS doesn't include AM_LDFLAGS.
- HADOOP-8294.
Critical bug reported by kihwal and fixed by kihwal (ipc)
IPC Connection becomes unusable even if server address was temporarily unresolvable
This is the same as HADOOP-7428, but was observed on 1.x data nodes. This can happen more frequently after HADOOP-7472, which allows IPC Connection to re-resolve the name. HADOOP-7428 needs to be back-ported.
- HADOOP-8338.
Major bug reported by owen.omalley and fixed by owen.omalley (security)
Can't renew or cancel HDFS delegation tokens over secure RPC
The fetchdt tool is failing for secure deployments when given --renew or --cancel on tokens fetched using RPC. (The tokens fetched over HTTP can be renewed and canceled fine.)
- HADOOP-8346.
Blocker bug reported by tucu00 and fixed by devaraj (security)
Changes to support Kerberos with non Sun JVM (HADOOP-6941) broke SPNEGO
Before HADOOP-6941, the hadoop-auth testcases with Kerberos ON pass (*mvn test -PtestKerberos*).
After HADOOP-6941, the tests fail with the error below.
Doing some IDE debugging, I've found that the changes in HADOOP-6941 make the JVM Kerberos libraries append an extra element to the Kerberos principal of the server (on the client side when creating the token), so *HTTP/localhost* ends up being *HTTP/localhost/localhost*. Then, when contacting the KDC to get the granting ticket, the serv...
- HDFS-119.
Major bug reported by shv and fixed by sureshms (name-node)
logSync() may block NameNode forever.
# {{FSEditLog.logSync()}} first waits until {{isSyncRunning}} is false and then performs syncing to file streams by calling {{EditLogOutputStream.flush()}}.
If an exception is thrown after {{isSyncRunning}} is set to {{true}} all threads will always wait on this condition.
An {{IOException}} may be thrown by {{EditLogOutputStream.setReadyToFlush()}} or a {{RuntimeException}} may be thrown by {{EditLogOutputStream.flush()}} or by {{processIOError()}}.
# The loop that calls {{eStream.flush()}} ...
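The first problem is the classic "flag set but never cleared on failure" pattern. A minimal sketch of one way to avoid it is to reset the flag in a finally block and wake the waiters; the class SyncGate is hypothetical and is not the actual FSEditLog patch.

```java
class SyncGate {
    private boolean isSyncRunning = false;

    /** Runs flush while holding the "sync running" flag; always clears the flag afterwards. */
    void sync(Runnable flush) throws InterruptedException {
        synchronized (this) {
            while (isSyncRunning) {
                wait(); // another thread is syncing
            }
            isSyncRunning = true;
        }
        try {
            flush.run(); // may throw a RuntimeException
        } finally {
            synchronized (this) {
                isSyncRunning = false; // reset even on failure so waiters are not stuck forever
                notifyAll();
            }
        }
    }

    synchronized boolean isRunning() {
        return isSyncRunning;
    }

    public static void main(String[] args) throws InterruptedException {
        SyncGate gate = new SyncGate();
        try {
            gate.sync(() -> { throw new IllegalStateException("simulated flush failure"); });
        } catch (IllegalStateException expected) {
            // the failure propagates, but the flag has already been cleared
        }
        System.out.println(gate.isRunning()); // false
    }
}
```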
- HDFS-1041.
Major bug reported by szetszwo and fixed by szetszwo (hdfs client)
DFSClient does not retry in getFileChecksum(..)
If connection to the first datanode fails, DFSClient does not retry in getFileChecksum(..).
- HDFS-3061.
Blocker bug reported by alex.holmes and fixed by kihwal (name-node)
Cached directory size in INodeDirectory can get permanently out of sync with computed size, causing quota issues
It appears that there's a condition under which a HDFS directory with a space quota set can get to a point where the cached size for the directory can permanently differ from the computed value. When this happens the following command:
{code}
hadoop fs -count -q /tmp/quota-test
{code}
results in the following output in the NameNode logs:
{code}
WARN org.apache.hadoop.hdfs.server.namenode.NameNode: Inconsistent diskspace for directory quota-test. Cached: 6000 Computed: 6072
{code}
I've ob...
- HDFS-3127.
Major bug reported by brandonli and fixed by brandonli (name-node)
failure in recovering removed storage directories should not stop checkpoint process
When a restore fails, rollEditLog() also fails even if there are healthy directories. Exceptions from recovering the removed directories should not fail the checkpoint process.
- HDFS-3265.
Major bug reported by kumarr and fixed by kumarr (build)
PowerPc Build error.
When attempting to build branch-1, the following error is seen and ant exits.
[exec] configure: error: Unsupported CPU architecture "powerpc64"
The following command was used to build hadoop-common
ant -Dlibhdfs=true -Dcompile.native=true -Dfusedfs=true -Dcompile.c++=true -Dforrest.home=$FORREST_HOME compile-core-native compile-c++ compile-c++-examples task-controller tar record-parser compile-hdfs-classes package -Djava5.home=/opt/ibm/ibm-java2-ppc64-50/
- HDFS-3310.
Major bug reported by cmccabe and fixed by cmccabe
Make sure that we abort when no edit log directories are left
We should make sure to abort when there are no edit log directories left to write to. It seems that there is at least one case that is slipping through the cracks right now in branch-1.
- HDFS-3374.
Major bug reported by owen.omalley and fixed by owen.omalley (name-node)
hdfs' TestDelegationToken fails intermittently with a race condition
The testcase is failing because the MiniDFSCluster is shut down before the secret manager can change the key, which calls system.exit with no edit streams available.
{code}
[junit] 2012-05-04 15:03:51,521 WARN common.Storage (FSImage.java:updateRemovedDirs(224)) - Removing storage dir /home/horton/src/hadoop/build/test/data/dfs/name1
[junit] 2012-05-04 15:03:51,522 FATAL namenode.FSNamesystem (FSEditLog.java:fatalExit(388)) - No edit streams are accessible
[junit] java.lang.Exce...
- MAPREDUCE-1238.
Major bug reported by rramya and fixed by tgraves (jobtracker)
mapred metrics shows negative count of waiting maps and reduces
Negative waiting_maps and waiting_reduces count is observed in the mapred metrics
- MAPREDUCE-3377.
Major bug reported by jxchen and fixed by jxchen
Compatibility issue with 0.20.203.
I have an OutputFormat which implements Configurable. I set new config entries to a job configuration during checkOutputSpec() so that the tasks will get the config entries through the job configuration. This works fine in 0.20.2, but stopped working starting from 0.20.203. With 0.20.203, my OutputFormat still has the configuration set, but the copy a task gets does not have the new entries that are set as part of checkOutputSpec().
I believe that the problem is with JobClient. The job...
- MAPREDUCE-3857.
Major bug reported by jeagles and fixed by jeagles (examples)
Grep example ignores mapred.job.queue.name
Grep example creates two jobs as part of its implementation. The first job correctly uses the configuration settings. The second job ignores configuration settings.
- MAPREDUCE-4003.
Major bug reported by zaozaowang and fixed by knoguchi (task-controller, tasktracker)
log.index (No such file or directory) AND Task process exit with nonzero status of 126
I have been stuck on this Hadoop (cdhu3) problem for 2 days and have tried every method I could find on Google. The issue: when running the Hadoop example "wordcount", the tasktracker's log on one slave node showed the following errors:
1.WARN org.apache.hadoop.mapred.DefaultTaskController: Task wrapper stderr: bash: /var/tmp/mapred/local/ttprivate/taskTracker/hdfs/jobcache/job_201203131751_0003/attempt_201203131751_0003_m_000006_0/taskjvm.sh: Permission denied
2.WARN org.apache.hadoop.mapred.TaskRunner: attempt_...
- MAPREDUCE-4012.
Minor bug reported by knoguchi and fixed by tgraves
Hadoop Job setup error leaves no useful info to users (when LinuxTaskController is used)
When the distributed cache pull fails on the TaskTracker, the job web UI only shows
{noformat}
Job initialization failed (255)
{noformat}
leaving users confused.
The TaskTracker log, however, contains an entry with useful info:
{noformat}
2012-03-14 21:44:17,083 INFO org.apache.hadoop.mapred.TaskController: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException:
Permission denied: user=user1, access=READ, inode="testfile":user3:users:rw-------
...
2012-03-14 21...
- MAPREDUCE-4154.
Major bug reported by thejas and fixed by devaraj
streaming MR job succeeds even if the streaming command fails
Hadoop 1.0.1 behaves as expected: the task fails for a streaming MR job if the streaming command fails. In Hadoop 1.0.2, however, the job succeeds.
- MAPREDUCE-4207.
Major bug reported by kihwal and fixed by kihwal (mrv1)
Remove System.out.println() in FileInputFormat
MAPREDUCE-3607 accidentally left the println statement.
Changes since Hadoop 1.0.1
Jiras with Release Notes (describe major or incompatible changes)
- HADOOP-1722.
Major improvement reported by runping and fixed by klbostee
Make streaming to handle non-utf8 byte array
Streaming allows binary (or other non-UTF8) streams.
- MAPREDUCE-3851.
Major bug reported by kihwal and fixed by tgraves (tasktracker)
Allow more aggressive action on detection of the jetty issue
Added new configuration variables to control when the TT aborts if it sees a certain number of exceptions:
// Percent of shuffle exceptions (out of sample size) seen before it's
// fatal - acceptable values are from 0 to 1.0, 0 disables the check.
// ie. 0.3 = 30% of the last X number of requests matched the exception,
// so abort.
conf.getFloat(
"mapreduce.reduce.shuffle.catch.exception.percent.limit.fatal", 0);
// The number of trailing requests we track, used for the fatal
// limit calculation
conf.getInt("mapreduce.reduce.shuffle.catch.exception.sample.size", 1000);
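The interaction between the sample size and the fatal limit can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and both the "wait for a full window" rule and the ">=" comparison are guesses rather than the TaskTracker's actual logic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch; class and method names are illustrative, not Hadoop's.
final class ShuffleExceptionWindow {
    private final int sampleSize;        // trailing requests tracked
    private final float fatalFraction;   // 0 to 1.0; 0 disables the check
    private final Deque<Boolean> window = new ArrayDeque<>();
    private int matches;                 // matching exceptions currently in the window

    ShuffleExceptionWindow(int sampleSize, float fatalFraction) {
        this.sampleSize = sampleSize;
        this.fatalFraction = fatalFraction;
    }

    /** Records one request and returns true if the caller should abort. */
    boolean record(boolean matchedException) {
        if (fatalFraction <= 0f) {
            return false;                // check disabled
        }
        window.addLast(matchedException);
        if (matchedException) {
            matches++;
        }
        // Drop the oldest sample once the window exceeds its size.
        if (window.size() > sampleSize && window.removeFirst()) {
            matches--;
        }
        // Assumption: only judge once a full sample has been collected.
        return window.size() == sampleSize
                && (float) matches / sampleSize >= fatalFraction;
    }
}
```

With a sample size of 4 and a limit of 0.5, two matching requests among the last four would trigger the abort in this sketch.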
Other Jiras (describe bug fixes and minor changes)
- HADOOP-5450.
Blocker improvement reported by klbostee and fixed by klbostee
Add support for application-specific typecodes to typed bytes
For serializing objects of types that are not supported by typed bytes serialization, applications might want to use a custom serialization format. Right now, typecode 0 has to be used for the bytes resulting from this custom serialization, which could lead to problems when deserializing the objects because the application cannot know if a byte sequence following typecode 0 is a customly serialized object or just a raw sequence of bytes. Therefore, a range of typecodes that are treated as ali...
- HADOOP-7206.
Major new feature reported by eli and fixed by tucu00
Integrate Snappy compression
Google released Zippy as an open source (APLv2) project called Snappy (http://code.google.com/p/snappy). This tracks integrating it into Hadoop.
{quote}
Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed ...
- HADOOP-8050.
Major bug reported by kihwal and fixed by kihwal (metrics)
Deadlock in metrics
The metrics serving thread and the periodic snapshot thread can deadlock.
It happened a few times on one of the namenodes we have. When it happens, RPC works but the web UI and hftp stop working. I haven't looked at the trunk too closely, but it might happen there too.
- HADOOP-8088.
Major bug reported by kihwal and fixed by (security)
User-group mapping cache incorrectly does negative caching on transient failures
We've seen a case where some getGroups() calls fail when the ldap server or the network is having transient failures. Looking at the code, the shell-based and the JNI-based implementations swallow exceptions and return an empty or partial list. The caller, Groups#getGroups() adds this likely empty list into the mapping cache for the user. This will function as negative caching until the cache expires. I don't think we want negative caching here, but even if we do, it should be intelligent eno...
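One way to avoid the negative caching described above is to skip the cache when a lookup returns nothing. The sketch below is a hedged illustration only: GroupCacheSketch and its lookup function are hypothetical names, it does not expire entries, and it does not address partial lists. The source does not state how the issue was actually fixed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; not the Groups class from Hadoop.
final class GroupCacheSketch {
    private final Map<String, List<String>> cache = new HashMap<>();
    private final Function<String, List<String>> lookup; // stand-in for the shell/JNI mapping

    GroupCacheSketch(Function<String, List<String>> lookup) {
        this.lookup = lookup;
    }

    synchronized List<String> getGroups(String user) {
        List<String> cached = cache.get(user);
        if (cached != null) {
            return cached;
        }
        List<String> groups = lookup.apply(user);
        // Assumption: an empty result is treated as a possible transient
        // failure, so it is returned but never cached.
        if (!groups.isEmpty()) {
            cache.put(user, groups);
        }
        return groups;
    }
}
```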
- HADOOP-8090.
Major improvement reported by gkesavan and fixed by gkesavan
rename hadoop 64 bit rpm/deb package name
Change the hadoop rpm/deb name from hadoop-<version>.amd64.rpm/deb to hadoop-<version>.x86_64.rpm/deb
- HADOOP-8132.
Major bug reported by arpitgupta and fixed by arpitgupta
64bit secure datanodes do not start as the jsvc path is wrong
64bit secure datanodes were looking for /usr/libexec/../libexec/jsvc. instead of /usr/libexec/../libexec/jsvc.amd64
- HADOOP-8201.
Blocker bug reported by gkesavan and fixed by gkesavan
create the configure script for native compilation as part of the build
The configure script is checked into svn and is not regenerated during the build. Ideally the configure script should not be checked into svn and should instead be generated during the build using autoreconf.
- HDFS-2701.
Major improvement reported by eli and fixed by eli (name-node)
Cleanup FS* processIOError methods
Let's rename the various "processIOError" methods to be more descriptive. The current code makes it difficult to identify and reason about bug fixes. While we're at it let's remove "Fatal" from the "Unable to sync the edit log" log since it's not actually a fatal error (this is confusing to users). And 2NN "Checkpoint done" should be info, not a warning (also confusing to users).
Thanks to HDFS-1073 these issues don't exist on trunk or 23.
- HDFS-2702.
Critical bug reported by eli and fixed by eli (name-node)
A single failed name dir can cause the NN to exit
There's a bug in FSEditLog#rollEditLog which results in the NN process exiting if a single name dir has failed. Here's the relevant code:
{code}
close() // So editStreams.size() is 0
foreach edits dir {
..
eStream = new ... // Might get an IOE here
editStreams.add(eStream);
} catch (IOException ioe) {
removeEditsForStorageDir(sd); // exits if editStreams.size() <= 1
}
{code}
If we get an IOException before we've added two edits streams to the list we'll exit, eg if there's an ...
- HDFS-2703.
Major bug reported by eli and fixed by eli (name-node)
removedStorageDirs is not updated everywhere we remove a storage dir
There are a number of places (FSEditLog#open, purgeEditLog, and rollEditLog) where we remove a storage directory but don't add it to the removedStorageDirs list. This means a storage dir may have been removed but we don't see it in the log or Web UI. This doesn't affect trunk/23 since the code there is totally different.
- HDFS-2978.
Major new feature reported by atm and fixed by atm (name-node)
The NameNode should expose name dir statuses via JMX
We currently display this info on the NN web UI, so users who wish to monitor this must either do it manually or parse HTML. We should publish this information via JMX.
- HDFS-3006.
Major bug reported by bcwalrus and fixed by szetszwo (name-node)
Webhdfs "SETOWNER" call returns incorrect content-type
The SETOWNER call returns an empty body. But the header has "Content-Type: application/json", which is a contradiction (empty string is not valid json). This appears to happen for SETTIMES and SETPERMISSION as well.
- HDFS-3075.
Major improvement reported by brandonli and fixed by brandonli (name-node)
Backport HADOOP-4885 to branch-1
When a storage directory is inaccessible, the namenode moves it from the valid storage dir list to a removedStorageDirs list. Those storage directories will not be restored when they become healthy again.
The proposed solution is to restore the previous failed directories at the beginning of checkpointing, say, rollEdits, by copying necessary metadata files from healthy directory to unhealthy ones. In this way, whenever a failed storage directory is recovered by the administrator, he/she can ...
- HDFS-3101.
Major bug reported by wangzw and fixed by szetszwo (hdfs client)
cannot read empty file using webhdfs
STEP:
1, create a new EMPTY file
2, read it using webhdfs.
RESULT:
expected: get an empty file
I got: {"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException","message":"Offset=0 out of the range [0, 0); OPEN, path=/testFile"}}
First of all, [0, 0) is not a valid range, and I think reading an empty file should be OK.
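The expected behavior can be expressed as a small offset check. This is a minimal sketch, assuming that an empty file may be opened at offset 0 and simply yields zero bytes; OffsetCheck and isValidOpenOffset are hypothetical names, not the webhdfs implementation.

```java
// Hypothetical sketch of the offset rule discussed in this report.
final class OffsetCheck {
    /** Returns true if an OPEN at the given offset is acceptable for a file of the given length. */
    static boolean isValidOpenOffset(long offset, long length) {
        if (offset < 0) {
            return false;
        }
        // Assumption: an empty file has no valid byte range, but offset 0
        // is still accepted so that the read returns an empty body.
        if (length == 0) {
            return offset == 0;
        }
        return offset < length;
    }
}
```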
- MAPREDUCE-764.
Blocker bug reported by klbostee and fixed by klbostee (contrib/streaming)
TypedBytesInput's readRaw() does not preserve custom type codes
The typed bytes format supports byte sequences of the form {{<custom type code> <length> <bytes>}}. When reading such a sequence via {{TypedBytesInput}}'s {{readRaw()}} method, however, the returned sequence currently is {{0 <length> <bytes>}} (0 is the type code for a bytes array), which leads to bugs such as the one described [here|http://dumbo.assembla.com/spaces/dumbo/tickets/54].
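A raw read that preserves the type code could look like the sketch below. It is an illustration, not the TypedBytesInput code: the class name is hypothetical, and it assumes the length field is a 4-byte big-endian integer.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; not the TypedBytesInput implementation.
final class RawTypedBytes {
    /** Reads one <type code> <length> <bytes> sequence and returns it unchanged. */
    static byte[] readRaw(ByteBuffer in) {
        int start = in.position();
        in.get();                        // type code, possibly a custom one
        int length = in.getInt();        // assumed 4-byte big-endian length
        byte[] raw = new byte[1 + 4 + length];
        in.position(start);
        in.get(raw);                     // copy code, length, and payload as-is
        return raw;
    }
}
```

The point of the sketch is that the first byte of the returned array is the code that was read, rather than a hard-coded 0.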
- MAPREDUCE-3583.
Critical bug reported by zhihyu@ebaysf.com and fixed by zhihyu@ebaysf.com
ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
HBase PreCommit builds frequently gave us NumberFormatException.
From https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
{code}
2011-12-20 01:44:01,180 WARN [main] mapred.JobClient(784): No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
java.lang.NumberFormatException: For input string: "18446743988060683582"
at java.lang.NumberFormatException.fo...
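The input string in the exception above is larger than Long.MAX_VALUE (9223372036854775807) but still fits in an unsigned 64-bit value. The following sketch shows the failure and one possible way to parse such a field; the class and method names are hypothetical, and the source does not say how the issue was actually fixed.

```java
import java.math.BigInteger;

// Hypothetical sketch; not the ProcfsBasedProcessTree code.
final class UnsignedFieldParsing {
    /** Returns true if the decimal string fits in a signed 64-bit long. */
    static boolean fitsInSignedLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Parses a decimal string of arbitrary width. */
    static BigInteger parseWide(String s) {
        return new BigInteger(s);
    }
}
```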
- MAPREDUCE-3773.
Major new feature reported by owen.omalley and fixed by owen.omalley (jobtracker)
Add queue metrics with buckets for job run times
It would be nice to have queue metrics that reflect the number of jobs in each queue that have been running for different ranges of time.
Reasonable time ranges are probably 0-1 hr, 1-5 hr, 5-24 hr, 24+ hrs; but they should be configurable.
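The bucketing described above can be sketched as follows, with the boundaries passed in so that they stay configurable. RunTimeBuckets is a hypothetical name and minutes are an arbitrary unit chosen for the example; this is not the JobTracker's metrics code.

```java
// Hypothetical sketch of run-time buckets with configurable boundaries.
final class RunTimeBuckets {
    private final long[] upperBoundsMinutes; // ascending, exclusive upper bounds

    RunTimeBuckets(long... upperBoundsMinutes) {
        this.upperBoundsMinutes = upperBoundsMinutes.clone();
    }

    /** Returns the bucket index for a job that has run for the given time. */
    int bucketFor(long runningMinutes) {
        for (int i = 0; i < upperBoundsMinutes.length; i++) {
            if (runningMinutes < upperBoundsMinutes[i]) {
                return i;
            }
        }
        return upperBoundsMinutes.length;    // the open-ended last bucket
    }

    /** Counts how many jobs fall into each bucket. */
    int[] count(long[] runningMinutes) {
        int[] counts = new int[upperBoundsMinutes.length + 1];
        for (long m : runningMinutes) {
            counts[bucketFor(m)]++;
        }
        return counts;
    }
}
```

Constructing it as new RunTimeBuckets(60, 300, 1440) gives the four ranges suggested above: 0-1 hr, 1-5 hr, 5-24 hr, and 24+ hrs.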
- MAPREDUCE-3824.
Critical bug reported by aw and fixed by tgraves (distributed-cache)
Distributed caches are not removed properly
Distributed caches are not being properly removed by the TaskTracker when they are expected to be expired.
Changes since Hadoop 1.0.0
Jiras with Release Notes (describe major or incompatible changes)
- HADOOP-8009.
Critical improvement reported by tucu00 and fixed by tucu00 (build)
Create hadoop-client and hadoop-minicluster artifacts for downstream projects
Generate integration artifacts "org.apache.hadoop:hadoop-client" and "org.apache.hadoop:hadoop-minicluster" containing all the jars needed to use Hadoop client APIs, and to run Hadoop MiniClusters, respectively. Push these artifacts to the maven repository when mvn-deploy, along with existing artifacts.
- HADOOP-8037.
Blocker bug reported by mattf and fixed by gkesavan (build)
Binary tarball does not preserve platform info for native builds, and RPMs fail to provide needed symlinks for libhadoop.so
This fix is marked "incompatible" only because it changes the bin-tarball directory structure to be consistent with the source tarball directory structure. The source tarball is unchanged. RPMs and DEBs now use an intermediate bin-tarball with an "${os.arch}" tag (like the packages themselves). The un-tagged bin-tarball is now multi-platform and retains the structure of the source tarball; it is in fact generated by target "tar", not by target "binary". Finally, in the 64-bit RPMs and DEBs, the native libs go in the "lib64" directory instead of "lib".
- MAPREDUCE-3184.
Major improvement reported by tlipcon and fixed by tlipcon (jobtracker)
Improve handling of fetch failures when a tasktracker is not responding on HTTP
The TaskTracker now has a thread which monitors for a known Jetty bug in which the selector thread starts spinning and map output can no longer be served. If the bug is detected, the TaskTracker will shut itself down. This feature can be disabled by setting mapred.tasktracker.jetty.cpu.check.enabled to false.
Other Jiras (describe bug fixes and minor changes)
- HADOOP-7470.
Minor improvement reported by stevel@apache.org and fixed by enis (util)
move up to Jackson 1.8.8
I see that hadoop-core still depends on Jackson 1.0.1, but that project is now up to 1.8.2 in releases. Upgrading will make it easier for other Jackson-using apps that are more up to date to keep their classpath consistent.
The patch would be updating the ivy file to pull in the later version; no test
- HADOOP-7960.
Major bug reported by gkesavan and fixed by mattf
Port HADOOP-5203 to branch-1, build version comparison is too restrictive
hadoop services should not be using the build timestamp to verify version difference in the cluster installation. Instead it should use the source checksum as in HADOOP-5203.
- HADOOP-7964.
Blocker bug reported by kihwal and fixed by daryn (security, util)
Deadlock in class init.
After HADOOP-7808, client-side commands hang occasionally. There are cyclic dependencies in NetUtils and SecurityUtil class initialization. On an initial look at the stack trace, two threads deadlock when they hit either class initializer at the same time.
- HADOOP-7987.
Major improvement reported by devaraj and fixed by jnp (security)
Support setting the run-as user in unsecure mode
Some applications need to be able to perform actions (such as launch MR jobs) from map or reduce tasks. In earlier unsecure versions of hadoop (20.x), it was possible to do this by setting user.name in the configuration. But in 20.205 and 1.0, when running in unsecure mode, this does not work. (In secure mode, you can do this using the kerberos credentials).
- HADOOP-7988.
Major bug reported by jnp and fixed by jnp
Upper case in hostname part of the principals doesn't work with kerberos.
Kerberos doesn't like upper case in the hostname part of the principals.
This issue has been seen in 23 as well as 1.0.
- HADOOP-8010.
Minor bug reported by rvs and fixed by rvs (scripts)
hadoop-config.sh spews error message when HADOOP_HOME_WARN_SUPPRESS is set to true and HADOOP_HOME is present
Running hadoop daemon commands when HADOOP_HOME_WARN_SUPPRESS is set to true and HADOOP_HOME is present produces:
{noformat}
[: 76: true: unexpected operator
{noformat}
- HADOOP-8052.
Major bug reported by reznor and fixed by reznor (metrics)
Hadoop Metrics2 should emit Float.MAX_VALUE (instead of Double.MAX_VALUE) to avoid making Ganglia's gmetad core
Ganglia's gmetad converts the doubles emitted by Hadoop's Metrics2 system to strings, and the buffer it uses is 256 bytes wide.
When the SampleStat.MinMax class (in org.apache.hadoop.metrics2.util) emits its default min value (currently initialized to Double.MAX_VALUE), it ends up causing a buffer overflow in gmetad, which causes it to core, effectively rendering Ganglia useless (for some, the core is continuous; for others who are more fortunate, it's only a one-time Hadoop-startup-time thi...
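To see why the choice of constant matters for a 256-byte buffer, the sketch below compares the width of the two values when written out as plain decimal digits. How gmetad actually formats the number is not stated in the source, so the plain-decimal rendering is an assumption, and PlainWidth is a hypothetical helper.

```java
import java.math.BigDecimal;

// Hypothetical helper; the formatting used by gmetad is an assumption.
final class PlainWidth {
    /** Number of characters needed to write v without an exponent. */
    static int plainDigits(double v) {
        return new BigDecimal(v).toPlainString().length();
    }
}
```

Under that assumption, PlainWidth.plainDigits(Double.MAX_VALUE) exceeds 256 characters, while PlainWidth.plainDigits(Float.MAX_VALUE) stays well below it.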
- HDFS-2379.
Critical bug reported by tlipcon and fixed by tlipcon (data-node)
0.20: Allow block reports to proceed without holding FSDataset lock
As disks are getting larger and more plentiful, we're seeing DNs with multiple millions of blocks on a single machine. When page cache space is tight, block reports can take multiple minutes to generate. Currently, during the scanning of the data directories to generate a report, the FSVolumeSet lock is held. This causes writes and reads to block, timeout, etc, causing big problems especially for clients like HBase.
This JIRA is to explore some of the ideas originally discussed in HADOOP-458...
- HDFS-2814.
Minor improvement reported by hitesh and fixed by hitesh
NamenodeMXBean does not account for svn revision in the version information
Unlike the jobtracker, where both the UI and the jmx information report the version as "x.y.z, r<svn revision>", the namenode UI displays x.y.z and the svn revision info but the jmx output contains only the x.y.z version.
- MAPREDUCE-3343.
Major bug reported by ahmed.radwan and fixed by zhaoyunjiong (mrv1)
TaskTracker Out of Memory because of distributed cache
This Out of Memory happens when you run a large number of jobs (using the distributed cache) on a TaskTracker.
Seems the basic issue is with the distributedCacheManager (instance of TrackerDistributedCacheManager in TaskTracker.java), this gets created during TaskTracker.initialize(), and it keeps references to TaskDistributedCacheManager for every submitted job via the jobArchives Map, also references to CacheStatus via cachedArchives map. I am not seeing these cleaned up between jobs, so th...
- MAPREDUCE-3607.
Major improvement reported by tomwhite and fixed by tomwhite (client)
Port missing new API mapreduce lib classes to 1.x
There are a number of classes under mapreduce.lib that are not present in the 1.x series. Including these would help users and downstream projects using the new MapReduce API migrate to later versions of Hadoop in the future.
A few examples of where this would help:
* Sqoop uses mapreduce.lib.db.DBWritable and mapreduce.lib.input.CombineFileInputFormat (SQOOP-384).
* Mahout uses mapreduce.lib.output.MultipleOutputs (MAHOUT-822).
* HBase has a backport of mapreduce.lib.partition.InputSampler ...
Changes since Hadoop 0.20.205.0
Jiras with Release Notes (describe major or incompatible changes)
- HADOOP-7728.
Major bug reported by rramya and fixed by rramya (conf)
hadoop-setup-conf.sh should be modified to enable task memory manager
Enable task memory management to be configurable via hadoop config setup script.
- HADOOP-7740.
Minor bug reported by arpitgupta and fixed by arpitgupta (conf)
security audit logger is not on by default, fix the log4j properties to enable the logger
Fixed security audit logger configuration. (Arpit Gupta via Eric Yang)
- HADOOP-7923.
Major task reported by szetszwo and fixed by szetszwo (build, documentation)
Update doc versions from 0.20 to 1.0
Docs version number is now automatically updated by reference to the build number.
- HDFS-617.
Major improvement reported by kzhang and fixed by kzhang (hdfs client, name-node)
Support for non-recursive create() in HDFS
New DFSClient.create(...) allows option of not creating missing parent(s).
- HDFS-2246.
Major improvement reported by sanjay.radia and fixed by jnp
Shortcut a local client reads to a Datanodes files directly
1. New configurations
a. dfs.block.local-path-access.user is the key in datanode configuration to specify the user allowed to do short circuit read.
b. dfs.client.read.shortcircuit is the key to enable short circuit read at the client side configuration.
c. dfs.client.read.shortcircuit.skip.checksum is the key to bypass checksum check at the client side.
2. By default none of the above are enabled and short circuit read will not kick in.
3. If security is on, the feature can be used only by users that have kerberos credentials at the client; therefore map reduce tasks cannot benefit from it in general.
- HDFS-2316.
Major new feature reported by szetszwo and fixed by szetszwo
[umbrella] webhdfs: a complete FileSystem implementation for accessing HDFS over HTTP
Provide webhdfs as a complete FileSystem implementation for accessing HDFS over HTTP.
The previous hftp feature was a read-only FileSystem and did not provide "write" access.
Other Jiras (describe bug fixes and minor changes)
- HADOOP-5124.
Major improvement reported by hairong and fixed by hairong
A few optimizations to FsNamesystem#RecentInvalidateSets
This jira proposes a few optimizations to FsNamesystem#RecentInvalidateSets:
1. When removing all replicas of a block, it does not traverse all nodes in the map. Instead it traverses only the nodes on which the block is located.
2. When dispatching blocks to datanodes in ReplicationMonitor, it randomly chooses a predefined number of datanodes and dispatches blocks to those datanodes. This strategy provides fairness to all datanodes. The current strategy always starts from the first datanode.
- HADOOP-6840.
Minor improvement reported by nspiegelberg and fixed by jnp (fs, io)
Support non-recursive create() in FileSystem & SequenceFile.Writer
The proposed solution for HBASE-2312 requires the sequence file to handle a non-recursive create. This is already supported by HDFS, but needs to have an equivalent FileSystem & SequenceFile.Writer API.
- HADOOP-6886.
Minor improvement reported by nspiegelberg and fixed by (fs)
LocalFileSystem Needs createNonRecursive API
While running sanity check tests for HBASE-2312, I noticed that HDFS-617 did not include createNonRecursive() support for the LocalFileSystem. This is a problem for HBase, which allows the user to run over the LocalFS instead of HDFS for local cluster testing. I think this only affects 0.20-append, but may affect the trunk based upon how exactly FileContext handles non-recursive creates.
- HADOOP-7461.
Major bug reported by rbodkin and fixed by gkesavan (build)
Jackson Dependency Not Declared in Hadoop POM
(COMMENT: This bug still affects 0.20.205.0, four months after the bug was filed. This causes total failure, and the fix is trivial for whoever manages the POM -- just add the missing dependency! --ben)
This issue was identified and the fix & workaround was documented at
https://issues.cloudera.org/browse/DISTRO-44
The issue affects use of Hadoop 0.20.203.0 from the Maven central repo. I built a job using that maven repo and ran it, resulting in this failure:
Exception in thread "main" ...
- HADOOP-7664.
Minor improvement reported by raviprak and fixed by raviprak (conf)
o.a.h.conf.Configuration complains of overriding final parameter even if the value with which its attempting to override is the same.
o.a.h.conf.Configuration complains about overriding a final parameter even if the value with which it is attempting to override is the same.
- HADOOP-7765.
Major bug reported by eyang and fixed by eyang (build)
Debian package contain both system and tar ball layout
When packaging is invoked as "ant clean tar deb", the system creates both the system layout and the tarball layout in the same build directory. The Debian packaging target then picks up files from both layouts. As a result, a Debian package built this way ends up with README.txt, LICENSE.txt, and jar files in /usr.
- HADOOP-7784.
Major bug reported by arpitgupta and fixed by eyang
secure datanodes fail to come up stating jsvc not found
Building 205.1 and trying to start up a secure dn leads to the following:
/usr/libexec/../bin/hadoop: line 386: /usr/libexec/../libexec/jsvc.amd64: No such file or directory
/usr/libexec/../bin/hadoop: line 386: exec: /usr/libexec/../libexec/jsvc.amd64: cannot execute: No such file or directory
- HADOOP-7804.
Major improvement reported by arpitgupta and fixed by arpitgupta (conf)
enable hadoop config generator to set dfs.block.local-path-access.user to enable short circuit read
We have a new config that selects which user can have access for short circuit read. We should make that configurable through the config generator scripts.
- HADOOP-7815.
Minor bug reported by rramya and fixed by rramya (conf)
Map memory mb is being incorrectly set by hadoop-setup-conf.sh
HADOOP-7728 enabled task memory management to be configurable in the hadoop-setup-conf.sh. However, the default value for mapred.job.map.memory.mb is being set incorrectly.
- HADOOP-7816.
Major bug reported by davet and fixed by davet
Allow HADOOP_HOME deprecated warning suppression based on config specified in hadoop-env.sh
Move suppression check for "Warning: $HADOOP_HOME is deprecated" to after sourcing of hadoop-env.sh so that people can set HADOOP_HOME_WARN_SUPPRESS inside the config.
- HADOOP-7853.
Blocker bug reported by daryn and fixed by daryn (security)
multiple javax security configurations cause conflicts
Both UGI and the SPNEGO KerberosAuthenticator set the global javax security configuration. SPNEGO stomps on UGI's security config which leads to kerberos/SASL authentication errors.
- HADOOP-7854.
Critical bug reported by daryn and fixed by daryn (security)
UGI getCurrentUser is not synchronized
Sporadic {{ConcurrentModificationExceptions}} are originating from {{UGI.getCurrentUser}} when it needs to create a new instance. The problem was specifically observed in a JT under heavy load when a post-job cleanup is accessing the UGI while a new job is being processed.
- HADOOP-7865.
Major bug reported by jnp and fixed by jnp
Test Failures in 1.0.0 hdfs/common
The following tests in hdfs and common are failing:
1. TestFileAppend2
2. TestFileConcurrentReader
3. TestDoAsEffectiveUser
- HADOOP-7869.
Critical bug reported by owen.omalley and fixed by owen.omalley (scripts)
HADOOP_HOME warning happens all of the time
With HADOOP-7816, the check for HADOOP_HOME has moved after it is set by hadoop-config so that it always happens unless HADOOP_HOME_WARN_SUPPRESS is set in hadoop-env or the environment.
- HDFS-611.
Major bug reported by dhruba and fixed by zshao (data-node)
Heartbeats times from Datanodes increase when there are plenty of blocks to delete
I am seeing that when we delete a large directory that has plenty of blocks, the heartbeat times from datanodes increase significantly from the normal value of 3 seconds to as large as 50 seconds or so. The heartbeat thread in the Datanode deletes a bunch of blocks sequentially, which causes the heartbeat times to increase.
- HDFS-1257.
Major bug reported by rvadali and fixed by eepayne (name-node)
Race condition on FSNamesystem#recentInvalidateSets introduced by HADOOP-5124
HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets. But it introduced unprotected access to the data structure recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork accesses recentInvalidateSets without read-lock protection. If there is concurrent activity (like reducing replication on a file) that adds to recentInvalidateSets, the name-node crashes with a ConcurrentModificationException.
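The failure mode can be reproduced in miniature. The sketch below is hypothetical and much simpler than FSNamesystem: it uses a plain TreeMap, triggers the exception from a single thread by modifying the map during iteration (the report describes the same exception arising from concurrent threads), and guards access with synchronized methods as a stand-in for the missing lock protection.

```java
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the FSNamesystem code.
final class InvalidateSetsSketch {
    private final Map<String, Integer> recentInvalidateSets = new TreeMap<>();

    /** Adds blocks to invalidate for a node; guarded by the object lock. */
    synchronized void add(String node, int blocks) {
        recentInvalidateSets.merge(node, blocks, Integer::sum);
    }

    /** Iterates under the same lock, so writers cannot interleave. */
    synchronized int computeInvalidateWork() {
        int total = 0;
        for (int n : recentInvalidateSets.values()) {
            total += n;
        }
        return total;
    }

    /** Shows what happens when the map changes structurally during iteration. */
    static boolean unguardedIterationFails() {
        Map<String, Integer> m = new TreeMap<>();
        m.put("dn1", 1);
        m.put("dn2", 2);
        try {
            for (String k : m.keySet()) {
                m.put(k + "-new", 0);    // structural change while iterating
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```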
- HDFS-1943.
Blocker bug reported by weiyj and fixed by mattf (scripts)
fail to start datanode while start-dfs.sh is executed by root user
When start-dfs.sh is run by root user, we got the following error message:
# start-dfs.sh
Starting namenodes on [localhost ]
localhost: namenode running as process 2556. Stop it first.
localhost: starting datanode, logging to /usr/hadoop/hadoop-common-0.23.0-SNAPSHOT/bin/../logs/hadoop-root-datanode-cspf01.out
localhost: Unrecognized option: -jvm
localhost: Could not create the Java virtual machine.
The -jvm options should be passed to jsvc when we are starting a secure
datanode, but it still pa...
- HDFS-2065.
Major bug reported by bharathm and fixed by umamaheswararao
Fix NPE in DFSClient.getFileChecksum
The following code can throw an NPE if the server returns null from callGetBlockLocations:
{code}
List<LocatedBlock> locatedblocks
= callGetBlockLocations(namenode, src, 0, Long.MAX_VALUE).getLocatedBlocks();
{code}
The right fix is for the server to throw the appropriate exception.
- HDFS-2346.
Blocker bug reported by umamaheswararao and fixed by lakshman (test)
TestHost2NodesMap & TestReplicasMap will fail depending upon execution order of test methods
- HDFS-2416.
Major sub-task reported by arpitgupta and fixed by jnp
distcp with a webhdfs uri on a secure cluster fails
- HDFS-2424.
Major sub-task reported by arpitgupta and fixed by szetszwo
webhdfs liststatus json does not convert to a valid xml document
- HDFS-2427.
Major sub-task reported by arpitgupta and fixed by szetszwo
webhdfs mkdirs api call creates path with 777 permission, we should default it to 755
- HDFS-2428.
Major sub-task reported by arpitgupta and fixed by szetszwo
webhdfs api parameter validation should be better
PUT Request: http://localhost:50070/webhdfs/some_path?op=MKDIRS&permission=955
Exception returned
HTTP/1.1 500 Internal Server Error
{"RemoteException":{"className":"com.sun.jersey.api.ParamException$QueryParamException","message":"java.lang.NumberFormatException: For input string: \"955\""}}
We should return a 400 with appropriate error message
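A parameter check of the kind requested here might look like the sketch below. It is only an illustration: PermissionParam is a hypothetical class, and the accepted pattern (one to three octal digits, optionally preceded by 0 or 1) is an assumption rather than the rule webhdfs actually applies.

```java
// Hypothetical sketch; not the webhdfs parameter code.
final class PermissionParam {
    /** Assumed rule: up to three octal digits, optionally preceded by 0 or 1. */
    static boolean isValidOctalPermission(String s) {
        return s != null && s.matches("[01]?[0-7]{1,3}");
    }

    /** Status code a server could return for the given permission string. */
    static int statusFor(String permission) {
        return isValidOctalPermission(permission) ? 200 : 400;
    }
}
```

In this sketch "955" is rejected up front because 9 is not an octal digit, so the caller gets a 400 instead of an internal NumberFormatException.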
- HDFS-2432.
Major sub-task reported by arpitgupta and fixed by szetszwo
webhdfs setreplication api should return a 403 when called on a directory
Currently the set replication api on a directory leads to a 200.
Request URI http://NN:50070/webhdfs/tmp/webhdfs_data/dir_replication_tests?op=SETREPLICATION&replication=5
Request Method: PUT
Status Line: HTTP/1.1 200 OK
Response Content: {"boolean":false}
Since we can determine that this call did not succeed (boolean=false) we should rather just return a 403
- HDFS-2439.
Major sub-task reported by arpitgupta and fixed by szetszwo
webhdfs open an invalid path leads to a 500 which states a npe, we should return a 404 with appropriate error message
- HDFS-2441.
Major sub-task reported by arpitgupta and fixed by szetszwo
webhdfs returns two content-type headers
$ curl -i "http://localhost:50070/webhdfs/path?op=GETFILESTATUS"
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Expires: Thu, 01-Jan-1970 00:00:00 GMT
........
Content-Type: application/json
Transfer-Encoding: chunked
Server: Jetty(6.1.26)
It should only return one content type header = application/json
- HDFS-2450.
Major bug reported by rajsaha and fixed by daryn
Only complete hostname is supported to access data via hdfs://
If my complete hostname is host1.abc.xyz.com, only the complete hostname can be used to access data via hdfs://
I am running the following on a .20.205 client to get data from a .20.205 NN (host1):
$hadoop dfs -copyFromLocal /etc/passwd hdfs://host1/tmp
copyFromLocal: Wrong FS: hdfs://host1/tmp, expected: hdfs://host1.abc.xyz.com
Usage: java FsShell [-copyFromLocal <localsrc> ... <dst>]
$hadoop dfs -copyFromLocal /etc/passwd hdfs://host1.abc/tmp/
copyFromLocal: Wrong FS: hdfs://host1.blue/tmp/1, exp...
- HDFS-2453.
Major sub-task reported by arpitgupta and fixed by szetszwo
tail using a webhdfs uri throws an error
/usr//bin/hadoop --config /etc/hadoop dfs -tail webhdfs://NN:50070/file
tail: HTTP_PARTIAL expected, received 200
- HDFS-2494.
Major sub-task reported by umamaheswararao and fixed by umamaheswararao (data-node)
[webhdfs] When Getting the file using OP=OPEN with DN http address, ESTABLISHED sockets are growing.
As part of the reliability test,
Scenario:
Initially check the socket count. There are around 42 sockets.
Open the file with the DataNode http address using the op=OPEN request parameter about 500 times in a loop.
Wait for some time and check the socket count. The number of ESTABLISHED sockets has grown into the thousands (~2052).
Here is the netstat result:
C:\Users\uma>netstat | grep 127.0.0.1 | grep ESTABLISHED |wc -l
2042
C:\Users\uma>netstat | grep 127.0.0.1 | grep ESTABLISHED |wc -l
2042
C:\...
- HDFS-2501.
Major sub-task reported by szetszwo and fixed by szetszwo
add version prefix and root methods to webhdfs
- HDFS-2527.
Major sub-task reported by szetszwo and fixed by szetszwo
Remove the use of Range header from webhdfs
- HDFS-2528.
Major sub-task reported by arpitgupta and fixed by szetszwo
webhdfs rest call to a secure dn fails when a token is sent
curl -L -u : --negotiate -i "http://NN:50070/webhdfs/v1/tmp/webhdfs_data/file_small_data.txt?op=OPEN"
the following exception is thrown by the datanode when the redirect happens.
{"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException","message":"Call to failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]"}}
...
- HDFS-2539.
Major sub-task reported by szetszwo and fixed by szetszwo
Support doAs and GETHOMEDIRECTORY in webhdfs
- HDFS-2540.
Major sub-task reported by szetszwo and fixed by szetszwo
Change WebHdfsFileSystem to two-step create/append
- HDFS-2552.
Major task reported by szetszwo and fixed by szetszwo (documentation)
Add WebHdfs Forrest doc
- HDFS-2589.
Major bug reported by daryn and fixed by daryn (security)
unnecessary hftp token fetch and renewal thread
Instantiation of the hftp filesystem is causing a token to be implicitly created and added to a custom token renewal thread. With the new token renewal feature in the JT, this causes the mapreduce {{obtainTokensForNamenodes}} to fetch two tokens (an implicit and uncancelled token, and an explicit token) and leave a spurious renewal thread running. This thread should not be running in the JT.
After speaking with Owen, the quick solution is to lazy fetch the token, and to lazy start the rene...
- HDFS-2590.
Major bug reported by szetszwo and fixed by szetszwo (documentation)
Some links in WebHDFS forrest doc do not work
Some links are pointing to DistributedFileSystem javadoc but the javadoc of DistributedFileSystem is not generated by default.
- HDFS-2604.
Minor improvement reported by szetszwo and fixed by szetszwo (data-node, documentation, name-node)
Add a log message to show if WebHDFS is enabled
WebHDFS can be enabled/disabled by the conf key {{dfs.webhdfs.enabled}}. Let's add a log message to show if it is enabled.
- HDFS-2673.
Trivial bug reported by umamaheswararao and fixed by umamaheswararao (name-node)
While Namenode processing the blocksBeingWrittenReport, it will log incorrect number blocks count
In NameNode#blocksBeingWrittenReport
we have the following stateChangeLog
{code}
stateChangeLog.info("*BLOCK* NameNode.blocksBeingWrittenReport: "
+"from "+nodeReg.getName()+" "+blocks.length +" blocks");
{code}
Here blocks is a long array. Every 3 consecutive elements represent a block (length, blockid, genstamp).
So, in the log message, blocks.length should be blocks.length/3.
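A minimal sketch of the corrected count, assuming the three-longs-per-block layout described above. The class and method names are illustrative only and are not taken from the NameNode source.

```java
// Hypothetical helper, not part of the NameNode source.
final class BlockReportCount {
    // Each block occupies three consecutive longs in the report array.
    static final int LONGS_PER_BLOCK = 3;

    // Number of blocks described by a flattened report array.
    static int numberOfBlocks(long[] blocks) {
        return blocks.length / LONGS_PER_BLOCK;
    }

    public static void main(String[] args) {
        long[] report = new long[6]; // two blocks' worth of entries
        System.out.println(report.length + " longs, "
            + numberOfBlocks(report) + " blocks");
    }
}
```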
- MAPREDUCE-3169.
Major improvement reported by tlipcon and fixed by ahmed.radwan (mrv1, mrv2, test)
Create a new MiniMRCluster equivalent which only provides client APIs cross MR1 and MR2
Many dependent projects like HBase, Hive, Pig, etc, depend on MiniMRCluster for writing tests. Many users do as well. MiniMRCluster, however, exposes MR implementation details like the existence of TaskTrackers, JobTrackers, etc, since it was used by MR1 for testing the server implementations as well.
This JIRA is to create a new interface which could be implemented either by MR1 or MR2 that exposes only the client-side portions of the MR framework. Ideally it would be "recompile-compatible"...
- MAPREDUCE-3319.
Blocker bug reported by rvs and fixed by subrotosanyal (examples)
multifilewc from hadoop examples seems to be broken in 0.20.205.0
{noformat}
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples-0.20.205.0.22.jar multifilewc examples/text examples-output/multifilewc
11/10/31 16:50:26 INFO mapred.FileInputFormat: Total input paths to process : 2
11/10/31 16:50:26 INFO mapred.JobClient: Running job: job_201110311350_0220
11/10/31 16:50:27 INFO mapred.JobClient: map 0% reduce 0%
11/10/31 16:50:42 INFO mapred.JobClient: Task Id : attempt_201110311350_0220_m_000000_0, Status : FAILED
java.lang.ClassCastException: ...
- MAPREDUCE-3374.
Major bug reported by rvs and fixed by (task-controller)
src/c++/task-controller/configure is not set executable in the tarball and that prevents task-controller from rebuilding
ant task-controller fails because src/c++/task-controller/configure is not set executable
- MAPREDUCE-3475.
Major bug reported by daryn and fixed by daryn (jobtracker)
JT can't renew its own tokens
When external systems submit jobs whose tasks need to submit additional jobs (such as oozie/pig), they include their own MR token used to submit the job. The token's renewer may not allow the JT to renew the token. The JT log will include very long SASL/GSSAPI exceptions when the job is submitted. It is also dubious for the JT to renew its own token, because doing so renders the expiry meaningless: the JT will keep renewing the token until the max lifetime is exceeded.
After speaking with Owen &...
- MAPREDUCE-3480.
Major bug reported by jnp and fixed by jnp
TestJvmReuse fails in 1.0
TestJvmReuse is failing in apache builds, although it passes on my local machine.
Changes since Hadoop 0.20.204.0
- HADOOP-6722.
Major bug reported by tlipcon and fixed by tlipcon (util)
NetUtils.connect should check that it hasn't connected a socket to itself
I had no idea this was possible, but it turns out that a TCP connection will be established in the rare case that the local side of the socket binds to the ephemeral port that you later try to connect to. This can present itself on very rare occasions when an RPC client is trying to connect to a daemon running on the same node, but that daemon is down. To see what I'm talking about, run "while true ; do telnet localhost 60020 ; done" on a multicore box and wait several minutes.
This can ...
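The condition described above can be detected by comparing the two endpoints of a connected socket. The following is only a sketch under that assumption; the class and method names are hypothetical and do not reproduce the actual NetUtils change.

```java
import java.net.InetSocketAddress;

// Hypothetical check, not the actual NetUtils code.
final class SelfConnectCheck {
    // True when the local and remote endpoints share the same address and
    // port, i.e. the socket has been connected to itself.
    static boolean isSelfConnection(InetSocketAddress local,
                                    InetSocketAddress remote) {
        return local.getPort() == remote.getPort()
            && local.getAddress() != null
            && local.getAddress().equals(remote.getAddress());
    }
}
```

For a connected `java.net.Socket`, the two arguments would come from `getLocalSocketAddress()` and `getRemoteSocketAddress()`; a caller that sees `true` could close the socket and report a connection failure.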
- HADOOP-6833.
Blocker bug reported by tlipcon and fixed by tlipcon
IPC leaks call parameters when exceptions thrown
HADOOP-6498 moved the calls.remove() call lower into the SUCCESS clause of receiveResponse(), but didn't put a similar calls.remove into the ERROR clause. So, any RPC call that throws an exception ends up orphaning the Call object in the connection's "calls" hashtable. This prevents cleanup of the connection and is a memory leak for the call parameters.
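A simplified sketch of the leak and its remedy: the call entry is removed in the error branch as well as the success branch. `CallTable`, `Status`, and the method names are hypothetical stand-ins, not the real IPC client classes.

```java
import java.util.Hashtable;

// Hypothetical stand-in for a connection's call bookkeeping.
final class CallTable {
    enum Status { SUCCESS, ERROR }

    private final Hashtable<Integer, Object> calls =
        new Hashtable<Integer, Object>();

    void register(int id, Object params) {
        calls.put(id, params);
    }

    // Remove the call in both branches, so that an error response cannot
    // leave the call parameters orphaned in the table.
    void receiveResponse(int id, Status status) {
        if (status == Status.SUCCESS) {
            calls.remove(id);
            // ... deliver the return value to the caller
        } else {
            calls.remove(id);
            // ... deliver the remote exception to the caller
        }
    }

    int pending() {
        return calls.size();
    }
}
```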
- HADOOP-6889.
Major new feature reported by hairong and fixed by johnvijoe (ipc)
Make RPC to have an option to timeout
Currently Hadoop RPC does not time out when the RPC server is alive. Instead, an RPC client sends a ping to the server whenever a socket timeout happens. If the server is still alive, the client continues to wait instead of throwing a SocketTimeoutException. This prevents a client from retrying when a server is busy, which would make the server even busier. This works great if the RPC server is the NameNode.
But Hadoop RPC is also used for some of client to DataNode communications, for e...
- HADOOP-7119.
Major new feature reported by tucu00 and fixed by tucu00 (security)
add Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles
Adding support for Kerberos HTTP SPNEGO authentication to the Hadoop web-consoles
- HADOOP-7314.
Major improvement reported by naisbitt and fixed by naisbitt
Add support for throwing UnknownHostException when a host doesn't resolve
As part of MAPREDUCE-2489, we need support for having the resolve methods (for DNS mapping) throw UnknownHostExceptions. (Currently, they hide the exception). Since the existing 'resolve' method is ultimately used by several other locations/components, I propose we add a new 'resolveValidHosts' method.
- HADOOP-7343.
Minor improvement reported by tgraves and fixed by tgraves (test)
backport HADOOP-7008 and HADOOP-7042 to branch-0.20-security
backport HADOOP-7008 and HADOOP-7042 to branch-0.20-security so that we can enable test-patch.sh to have a configured number of acceptable findbugs and javadoc warnings
- HADOOP-7400.
Major bug reported by gkesavan and fixed by gkesavan (build)
HdfsProxyTests fails when the -Dtest.build.dir and -Dbuild.test is set
HdfsProxyTests fails when -Dtest.build.dir and -Dbuild.test are set to a dir other than the build dir
test-junit:
[copy] Copying 1 file to /home/y/var/builds/thread2/workspace/Cloud-Hadoop-0.20.1xx-Secondary/src/contrib/hdfsproxy/src/test/resources/proxy-config
[junit] Running org.apache.hadoop.hdfsproxy.TestHdfsProxy
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
[junit] Test org.apache.hadoop.hdfsproxy.TestHdfsProxy FAILED
- HADOOP-7432.
Major improvement reported by sherri_chen and fixed by sherri_chen
Back-port HADOOP-7110 to 0.20-security
HADOOP-7110 implemented chmod in the NativeIO library so we can have good performance (ie not fork) and still not be prone to races. This should fix build failures (and probably task failures too).
- HADOOP-7472.
Minor improvement reported by kihwal and fixed by kihwal (ipc)
RPC client should deal with the IP address changes
The current RPC client implementation and the client-side callers assume that the hostname-address mappings of servers never change. The resolved address is stored in an immutable InetSocketAddress object above/outside RPC, and the reconnect logic in the RPC Connection implementation also trusts the resolved address that was passed down.
If the NN suffers a failure that requires migration, it may be started on a different node with a different IP address. In this case, even if the name-addre...
- HADOOP-7510.
Major improvement reported by daryn and fixed by daryn (security)
Tokens should use original hostname provided instead of ip
Tokens currently store the ip:port of the remote server. This precludes tokens from being used after a host's ip is changed. Tokens should store the hostname used to make the RPC connection. This will enable new processes to use their existing tokens.
- HADOOP-7539.
Major bug reported by johnvijoe and fixed by johnvijoe
merge hadoop archive goodness from trunk to .20
hadoop archive in branch-0.20-security is outdated. When run recently, it produced some bugs which were all fixed in trunk. This JIRA aims to bring in all these JIRAs to branch-0.20-security.
- HADOOP-7594.
Major new feature reported by szetszwo and fixed by szetszwo
Support HTTP REST in HttpServer
Provide an API in HttpServer for supporting HTTP REST.
This is a part of HDFS-2284.
- HADOOP-7596.
Major bug reported by eyang and fixed by eyang (build)
Enable jsvc to work with Hadoop RPM package
For a secure Hadoop 0.20.2xx cluster, the datanode can only run with a 32 bit jvm because Hadoop only packages 32 bit jsvc. The build process should download the proper jsvc version based on the build architecture. In addition, the shell script should be enhanced to locate hadoop jar files in the proper location.
- HADOOP-7599.
Major bug reported by eyang and fixed by eyang (scripts)
Improve hadoop setup conf script to setup secure Hadoop cluster
Setting up a secure Hadoop cluster requires a lot of manual setup. The motivation of this jira is to provide setup scripts that automate setting up a secure Hadoop cluster.
- HADOOP-7602.
Major bug reported by johnvijoe and fixed by johnvijoe
wordcount, sort etc on har files fails with NPE
wordcount, sort etc on har files fails with NPE@createSocketAddr(NetUtils.java:137).
- HADOOP-7603.
Major bug reported by eyang and fixed by eyang
Set default hdfs, mapred uid, and hadoop group gid for RPM packages
Set hdfs, mapred uid, and hadoop uid to fixed numbers. (Eric Yang)
- HADOOP-7610.
Major bug reported by eyang and fixed by eyang (scripts)
/etc/profile.d does not exist on Debian
As part of post installation script, there is a symlink created in /etc/profile.d/hadoop-env.sh to source /etc/hadoop/hadoop-env.sh. Therefore, users do not need to configure HADOOP_* environment. Unfortunately, /etc/profile.d only exists in Ubuntu. [Section 9.9 of the Debian Policy|http://www.debian.org/doc/debian-policy/ch-opersys.html#s9.9] states:
{quote}
A program must not depend on environment variables to get reasonable defaults. (That's because these environment variables would ha...
- HADOOP-7615.
Major bug reported by eyang and fixed by eyang (scripts)
Binary layout does not put share/hadoop/contrib/*.jar into the class path
For contrib projects, contrib jar files are not included in HADOOP_CLASSPATH in the binary layout. Several projects jar files should be copied to $HADOOP_PREFIX/share/hadoop/lib for binary deployment. The interesting jar files to include in $HADOOP_PREFIX/share/hadoop/lib are: capacity-scheduler, thriftfs, fairscheduler.
- HADOOP-7625.
Major bug reported by owen.omalley and fixed by owen.omalley
TestDelegationToken is failing in 205
After the patches on Friday, org.apache.hadoop.hdfs.security.TestDelegationToken is failing.
- HADOOP-7626.
Major bug reported by eyang and fixed by eyang (scripts)
Allow overwrite of HADOOP_CLASSPATH and HADOOP_OPTS
Quote email from Ashutosh Chauhan:
bq. There is a bug in hadoop-env.sh which prevents hcatalog server to start in secure settings. Instead of adding classpath, it overrides them. I was not able to verify where the bug belongs to, in HMS or in hadoop scripts. Looks like hadoop-env.sh is generated from hadoop-env.sh.template in installation process by HMS. Hand crafted patch follows:
bq. - export HADOOP_CLASSPATH=$f
bq. +export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
bq. -export HADOOP_OPTS=...
- HADOOP-7630.
Major bug reported by arpitgupta and fixed by eyang (conf)
hadoop-metrics2.properties should have a property *.period set to a default value for metrics
Currently the hadoop-metrics2.properties file does not have a value set for *.period
This property tells the metrics system how often to refresh. We should set it to a default of 60
- HADOOP-7631.
Major bug reported by rramya and fixed by eyang (conf)
In mapred-site.xml, stream.tmpdir is mapped to ${mapred.temp.dir} which is undeclared.
Streaming jobs seem to fail with the following exception:
{noformat}
Exception in thread "main" java.io.IOException: No such file or directory
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.checkAndCreate(File.java:1704)
at java.io.File.createTempFile(File.java:1792)
at org.apache.hadoop.streaming.StreamJob.packageJobJar(StreamJob.java:603)
at org.apache.hadoop.streaming.StreamJob.setJobConf(StreamJob.java:798)
a...
- HADOOP-7633.
Major bug reported by arpitgupta and fixed by eyang (conf)
log4j.properties should be added to the hadoop conf on deploy
Currently the log4j properties are not present in the hadoop conf dir. We should add them so that log rotation happens appropriately, and also define the other logs that hadoop can generate, for example the audit and auth logs as well as the mapred summary logs.
- HADOOP-7637.
Major bug reported by eyang and fixed by eyang (build)
Fair scheduler configuration file is not bundled in RPM
205 build of tar is fine, but rpm failed with:
{noformat}
[rpm] Processing files: hadoop-0.20.205.0-1
[rpm] warning: File listed twice: /usr/libexec
[rpm] warning: File listed twice: /usr/libexec/hadoop-config.sh
[rpm] warning: File listed twice: /usr/libexec/jsvc.i386
[rpm] Checking for unpackaged file(s): /usr/lib/rpm/check-files /tmp/hadoop_package_build_hortonfo/BUILD
[rpm] error: Installed (but unpackaged) file(s) found:
[rpm] /etc/hadoop/fai...
- HADOOP-7644.
Blocker bug reported by owen.omalley and fixed by owen.omalley (security)
Fix the delegation token tests to use the new style renewers
Currently, TestDelegationTokenRenewal and TestDelegationTokenFetcher use the old style renewal and fail.
- HADOOP-7645.
Blocker bug reported by atm and fixed by jnp (security)
HTTP auth tests requiring Kerberos infrastructure are not disabled on branch-0.20-security
The back-port of HADOOP-7119 to branch-0.20-security included tests which require Kerberos infrastructure in order to run. In trunk and 0.23, these are disabled unless one enables the {{testKerberos}} maven profile. In branch-0.20-security, these tests are always run regardless, and so fail most of the time.
See this Jenkins build for an example: https://builds.apache.org/view/G-L/view/Hadoop/job/Hadoop-0.20-security/26/
- HADOOP-7649.
Blocker bug reported by kihwal and fixed by jnp (security, test)
TestMapredGroupMappingServiceRefresh and TestRefreshUserMappings fail after HADOOP-7625
TestMapredGroupMappingServiceRefresh and TestRefreshUserMappings fail after HADOOP-7625.
The classpath has been changed, so they try to create the rsrc file in a jar and fail.
- HADOOP-7655.
Major improvement reported by arpitgupta and fixed by arpitgupta
provide a small validation script that smoke tests the installed cluster
Currently we have scripts that will set up a hadoop cluster, create users etc. We should add a script that will smoke test the installed cluster. The script could run 3 small mr jobs (teragen, terasort and teravalidate) and clean up once it's done.
- HADOOP-7658.
Major bug reported by gkesavan and fixed by eyang
to fix hadoop config template
The hadoop rpm config template by default sets HADOOP_SECURE_DN_USER, HADOOP_SECURE_DN_LOG_DIR & HADOOP_SECURE_DN_PID_DIR.
The above values should only be set for secure deployments:
# On secure datanodes, user to run the datanode as after dropping privileges
export HADOOP_SECURE_DN_USER=${HADOOP_HDFS_USER}
# Where log files are stored. $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER
# Where log files are stored in the secure data environment.
export HADOOP_SE...
- HADOOP-7661.
Major bug reported by jnp and fixed by jnp
FileSystem.getCanonicalServiceName throws NPE for any file system uri that doesn't have an authority.
FileSystem.getCanonicalServiceName throws NPE for any file system uri that doesn't have an authority.
....
java.lang.NullPointerException
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:138)
at org.apache.hadoop.security.SecurityUtil.buildDTServiceName(SecurityUtil.java:261)
at org.apache.hadoop.fs.FileSystem.getCanonicalServiceName(FileSystem.java:174)
....
- HADOOP-7674.
Major bug reported by jnp and fixed by jnp
TestKerberosName fails in 20 branch.
TestKerberosName fails in 20 branch. In fact this test has got duplicated in 20, with a little change to the rules.
- HADOOP-7676.
Major bug reported by gkesavan and fixed by gkesavan
add rules to the core-site.xml template
add rules for master and region in core-site.xml template.
- HADOOP-7679.
Major bug reported by rramya and fixed by rramya (conf)
log4j.properties templates does not define mapred.jobsummary.logger
In templates/conf/hadoop-env.sh, HADOOP_JOBTRACKER_OPTS is defined as -Dsecurity.audit.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dmapred.jobsummary.logger=INFO,JSA ${HADOOP_JOBTRACKER_OPTS}
However, in templates/conf/log4j.properties, instead of mapred.jobsummary.logger, hadoop.mapreduce.jobsummary.logger is defined as follows:
hadoop.mapreduce.jobsummary.logger=${hadoop.root.logger}
This is preventing collection of jobsummary logs.
We have to consistently use mapred.jobsummary.logg...
- HADOOP-7681.
Minor bug reported by arpitgupta and fixed by arpitgupta (conf)
log4j.properties is missing properties for security audit and hdfs audit should be changed to info
(Arpit Gupta via Eric Yang)
- HADOOP-7683.
Minor bug reported by arpitgupta and fixed by arpitgupta
hdfs-site.xml template has properties that are not used in 20
properties dfs.namenode.http-address and dfs.namenode.https-address should be removed
- HADOOP-7684.
Major bug reported by eyang and fixed by eyang (scripts)
jobhistory server and secondarynamenode should have init.d script
Added init.d script for jobhistory server and secondary namenode. (Eric Yang)
- HADOOP-7685.
Major bug reported by devaraj.k and fixed by eyang (scripts)
Issues with hadoop-common-project\hadoop-common\src\main\packages\hadoop-setup-conf.sh file
hadoop-common-project\hadoop-common\src\main\packages\hadoop-setup-conf.sh has following issues
1. check_permission does not work as expected if there are two folders with $NAME as part of their name inside $PARENT
e.g. /home/hadoop/conf, /home/hadoop/someconf,
The result of `ls -ln $PARENT | grep -w $NAME| awk '{print $3}'` is not a single zero; it is "0 0", and hence the following if check becomes true.
{code:xml}
if [ "$OWNER" != "0" ]; then
RESULT=1
break
fi
{code}
2. Spelling mistake
{code:xml}
H...
- HADOOP-7691.
Major bug reported by gkesavan and fixed by eyang
hadoop deb pkg should take a diff group id
Fixed conflict uid for install packages. (Eric Yang)
- HADOOP-7707.
Major improvement reported by arpitgupta and fixed by arpitgupta (conf)
improve config generator to allow users to specify proxy user, turn append on or off, turn webhdfs on or off
Added toggle for dfs.support.append, webhdfs and hadoop proxy user to setup config script. (Arpit Gupta via Eric Yang)
- HADOOP-7708.
Critical bug reported by arpitgupta and fixed by eyang (conf)
config generator does not update the properties file if one exists already
Fixed hadoop-setup-conf.sh to handle config file consistently. (Eric Yang)
- HADOOP-7710.
Major improvement reported by arpitgupta and fixed by arpitgupta
create a script to setup application in order to create root directories for application such hbase, hcat, hive etc
- HADOOP-7711.
Major bug reported by arpitgupta and fixed by arpitgupta (conf)
hadoop-env.sh generated from templates has duplicate info
Fixed recursive sourcing of HADOOP_OPTS environment variables (Arpit Gupta via Eric Yang)
- HADOOP-7715.
Major bug reported by arpitgupta and fixed by eyang (conf)
see log4j Error when running mr jobs and certain dfs calls
Removed unnecessary security logger configuration. (Eric Yang)
- HADOOP-7720.
Major improvement reported by arpitgupta and fixed by arpitgupta (conf)
improve the hadoop-setup-conf.sh to read in the hbase user and setup the configs
Added parameter for HBase user to setup config script. (Arpit Gupta via Eric Yang)
- HADOOP-7721.
Major bug reported by arpitgupta and fixed by jnp
dfs.web.authentication.kerberos.principal expects the full hostname and does not replace _HOST with the hostname
- HADOOP-7724.
Major bug reported by gkesavan and fixed by arpitgupta
hadoop-setup-conf.sh should put proxy user info into the core-site.xml
Fixed hadoop-setup-conf.sh to put proxy user in core-site.xml. (Arpit Gupta via Eric Yang)
- HDFS-142.
Blocker bug reported by rangadi and fixed by dhruba
In 0.20, move blocks being written into a blocksBeingWritten directory
Before 0.18, when a Datanode restarted, it deleted files under the data-dir/tmp directory since these files were no longer valid. But in 0.18 it incorrectly moves these files to the normal directory, making them valid blocks. One of the following would work:
- remove the tmp files during upgrade, or
- if the files under /tmp are in pre-18 format (i.e. no generation), delete them.
Currently, the effect of this bug is that these files end up failing block verification and eventually get deleted. But cause...
- HDFS-200.
Blocker new feature reported by szetszwo and fixed by dhruba
In HDFS, sync() not yet guarantees data available to the new readers
In the append design doc (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it says
* A reader is guaranteed to be able to read data that was 'flushed' before the reader opened the file
However, this feature is not yet implemented. Note that the operation 'flushed' is now called "sync".
- HDFS-561.
Major sub-task reported by kzhang and fixed by kzhang (data-node, hdfs client)
Fix write pipeline READ_TIMEOUT
When writing a file, the pipeline status read timeouts for datanodes are not set up properly.
- HDFS-606.
Major bug reported by shv and fixed by shv (name-node)
ConcurrentModificationException in invalidateCorruptReplicas()
{{BlockManager.invalidateCorruptReplicas()}} iterates over DatanodeDescriptor-s while removing corrupt replicas from the descriptors. This causes {{ConcurrentModificationException}} if there is more than one replica of the block. I ran into this exception debugging different scenarios in append, but it should be fixed in the trunk too.
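One common way to avoid this kind of exception is to iterate over a snapshot of the collection while removing from the live one. The sketch below illustrates that general technique with plain strings; it is not the patch applied for this issue, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical illustration; the real code works on DatanodeDescriptor objects.
final class CorruptReplicaCleanup {
    // Iterate over a copy so that removals from the live collection
    // cannot invalidate the iterator in use.
    static int invalidateAll(Collection<String> nodesWithCorruptReplica) {
        List<String> snapshot = new ArrayList<String>(nodesWithCorruptReplica);
        int removed = 0;
        for (String node : snapshot) {
            if (nodesWithCorruptReplica.remove(node)) {
                removed++;
            }
        }
        return removed;
    }
}
```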
- HDFS-630.
Major improvement reported by mry.maillist and fixed by clehene (hdfs client, name-node)
In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.
created from hdfs-200.
If during a write, the dfsclient sees that a block replica location for a newly allocated block is not-connectable, it re-requests the NN to get a fresh set of replica locations of the block. It tries this dfs.client.block.write.retries times (default 3), sleeping 6 seconds between each retry ( see DFSClient.nextBlockOutputStream).
This setting works well when you have a reasonable size cluster; if you have few datanodes in the cluster, every retry may pick the dead-d...
- HDFS-724.
Blocker bug reported by szetszwo and fixed by hairong (data-node, hdfs client)
Pipeline close hangs if one of the datanode is not responsive.
In the new pipeline design, pipeline close is implemented by sending an additional empty packet. If one of the datanodes does not respond to this empty packet, the pipeline hangs. It seems that there is no timeout.
- HDFS-826.
Major improvement reported by dhruba and fixed by dhruba (hdfs client)
Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline
HDFS does not replicate the last block of a file that is currently being written to by an application. Every datanode death in the write pipeline decreases the reliability of the last block of the currently-being-written file. This situation can be improved if the application can be notified of a datanode death in the write pipeline. Then, the application can decide what the right course of action is for this event.
In our use-case, the application can close the file on the fir...
- HDFS-895.
Major improvement reported by dhruba and fixed by tlipcon (hdfs client)
Allow hflush/sync to occur in parallel with new writes to the file
In the current trunk, the HDFS client methods writeChunk() and hflush/sync are synchronized. This means that if a hflush/sync is in progress, an application cannot write data to the HDFS client buffer. This reduces the write throughput of the transaction log in HBase.
The hflush/sync should allow new writes to happen to the HDFS client even when a hflush/sync is in progress. It can record the seqno of the message for which it should receive the ack, indicate to the DataStream thread to sta...
- HDFS-988.
Blocker bug reported by dhruba and fixed by eli (name-node)
saveNamespace race can corrupt the edits log
The administrator puts the namenode in safemode and then issues the savenamespace command. This can corrupt the edits log. The problem is that when the NN enters safemode, there could still be pending logSyncs occurring from other threads. Now, the saveNamespace command, when executed, would save an edits log with partial writes. I have seen this happen on 0.20.
https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-...
- HDFS-1054.
Major improvement reported by tlipcon and fixed by tlipcon (hdfs client)
Remove unnecessary sleep after failure in nextBlockOutputStream
If DFSOutputStream fails to create a pipeline, it currently sleeps 6 seconds before retrying. I don't see a great reason to wait at all, much less 6 seconds (especially now that HDFS-630 ensures that a retry won't go back to the bad node). We should at least make it configurable, and perhaps something like backoff makes some sense.
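As an illustration of the "configurable, with backoff" idea raised above, here is a small sketch of a capped exponential delay. The parameter names are assumptions made for this example; no corresponding Hadoop configuration keys are implied.

```java
// Hypothetical helper; not the DFSClient implementation.
final class RetryDelay {
    // Delay before the given retry (0-based): baseMillis doubled once per
    // earlier retry, never exceeding maxMillis.
    static long delayMillis(long baseMillis, int retry, long maxMillis) {
        long delay = baseMillis;
        for (int i = 0; i < retry && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }
}
```

A base of 0 disables the wait entirely, which matches the suggestion that no sleep may be needed at all.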
- HDFS-1057.
Blocker sub-task reported by tlipcon and fixed by rash37 (data-node)
Concurrent readers hit ChecksumExceptions if following a writer to very end of file
In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before calling flush(). Therefore, if there is a concurrent reader, it's possible to race here - the reader will see the new length while those bytes are still in the buffers of BlockReceiver. Thus the client will potentially see checksum errors or EOFs. Additionally, the last checksum chunk of the file is made accessible to readers even though it is not stable.
- HDFS-1118.
Major bug reported by zshao and fixed by zshao
DFSOutputStream socket leak when cannot connect to DataNode
The offending code is in {{DFSOutputStream.nextBlockOutputStream}}
This function retries several times to call {{createBlockOutputStream}}. Each time when it fails, it leaves a {{Socket}} object in {{DFSOutputStream.s}}.
That object is never closed, but overwritten the next time {{createBlockOutputStream}} is called.
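The leak pattern and one possible remedy can be sketched as follows: the resource left behind by a failed attempt is closed before the next attempt overwrites the field. `TrackedSocket` is a hypothetical stand-in for the real socket, used here only so that open handles can be counted.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the retry loop; not the DFSOutputStream code.
final class PipelineSetup {
    // Stand-in for a socket that counts how many instances are still open.
    static final class TrackedSocket implements Closeable {
        static final AtomicInteger open = new AtomicInteger();
        TrackedSocket() { open.incrementAndGet(); }
        @Override public void close() { open.decrementAndGet(); }
    }

    private TrackedSocket s;

    // Each attempt allocates a socket; attempts before succeedOn fail.
    private boolean createStream(int attempt, int succeedOn) {
        s = new TrackedSocket();
        return attempt >= succeedOn;
    }

    boolean connect(int retries, int succeedOn) {
        for (int attempt = 1; attempt <= retries; attempt++) {
            if (createStream(attempt, succeedOn)) {
                return true;
            }
            // Close the failed attempt's socket before it is overwritten.
            s.close();
            s = null;
        }
        return false;
    }
}
```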
- HDFS-1122.
Major sub-task reported by rash37 and fixed by rash37
client block verification may result in blocks in DataBlockScanner prematurely
found that when the DN uses client verification of a block that is open for writing, it will add it to the DataBlockScanner prematurely.
- HDFS-1141.
Blocker bug reported by tlipcon and fixed by tlipcon (name-node)
completeFile does not check lease ownership
completeFile should check that the caller still owns the lease of the file that it's completing. This is for the 'testCompleteOtherLeaseHoldersFile' case in HDFS-1139.
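A minimal sketch of such an ownership check, using a plain map from path to lease holder. The types and the boolean return value are simplifications chosen for this example; the real NameNode uses its own lease classes and would be expected to signal the failure with an exception.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lease bookkeeping; the real NameNode types differ.
final class LeaseTable {
    private final Map<String, String> holderByPath =
        new HashMap<String, String>();

    void grant(String path, String holder) {
        holderByPath.put(path, holder);
    }

    // Completes the file only if the caller is the current lease holder.
    boolean completeFile(String path, String caller) {
        String holder = holderByPath.get(path);
        if (holder == null || !holder.equals(caller)) {
            return false; // a real implementation would raise an error here
        }
        holderByPath.remove(path);
        return true;
    }
}
```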
- HDFS-1164.
Major bug reported by eli and fixed by tlipcon (contrib/hdfsproxy)
TestHdfsProxy is failing
TestHdfsProxy is failing on trunk, seen in HDFS-1132 and HDFS-1143. It doesn't look like hudson posts test results for contrib and it's hard to see what's going on from the raw console output. Can someone with access to hudson upload the individual test output for TestHdfsProxy so we can see what the issue is?
- HDFS-1186.
Blocker bug reported by tlipcon and fixed by tlipcon (data-node)
0.20: DNs should interrupt writers at start of recovery
When block recovery starts (eg due to NN recovering lease) it needs to interrupt any writers currently writing to those blocks. Otherwise, an old writer (who hasn't realized he lost his lease) can continue to write+sync to the blocks, and thus recovery ends up truncating data that has been sync()ed.
- HDFS-1197.
Major bug reported by tlipcon and fixed by (data-node, hdfs client, name-node)
Blocks are considered "complete" prematurely after commitBlockSynchronization or DN restart
I saw this failure once on my internal Hudson job that runs the append tests 48 times a day:
junit.framework.AssertionFailedError: expected:<114688> but was:<98304>
at org.apache.hadoop.hdfs.AppendTestUtil.check(AppendTestUtil.java:112)
at org.apache.hadoop.hdfs.TestFileAppend3.testTC2(TestFileAppend3.java:116)
- HDFS-1202.
Major bug reported by tlipcon and fixed by tlipcon (data-node)
DataBlockScanner throws NPE when updated before initialized
Missing an isInitialized() check in updateScanStatusInternal
- HDFS-1204.
Major bug reported by tlipcon and fixed by rash37
0.20: Lease expiration should recover single files, not entire lease holder
This was brought up in HDFS-200 but didn't make it into the branch on Apache.
- HDFS-1207.
Major bug reported by tlipcon and fixed by tlipcon (name-node)
0.20-append: stallReplicationWork should be volatile
the stallReplicationWork member in FSNamesystem is accessed by multiple threads without synchronization, but isn't marked volatile. I believe this is responsible for about 1% failure rate on TestFileAppend4.testAppendSyncChecksum* on my 8-core test boxes (looking at logs I see replication happening even though we've supposedly disabled it)
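The fix named in the title amounts to declaring the flag volatile so that a write by one thread becomes visible to later reads by other threads. A simplified sketch follows; the surrounding class and methods are hypothetical and only the field name comes from the description.

```java
// Hypothetical holder class; only the field name is taken from the issue.
final class ReplicationControl {
    // volatile guarantees that a write by one thread is visible to
    // subsequent reads of the field by other threads.
    private volatile boolean stallReplicationWork = false;

    void setStall(boolean stall) {
        stallReplicationWork = stall;
    }

    // Returns the number of work items to process; zero while stalled.
    int computeReplicationWork(int pending) {
        if (stallReplicationWork) {
            return 0;
        }
        return pending;
    }
}
```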
- HDFS-1210.
Trivial improvement reported by tlipcon and fixed by tlipcon (hdfs client)
DFSClient should log exception when block recovery fails
Right now we just retry without necessarily showing the exception. It can be useful to see what the error was that prevented the recovery RPC from succeeding.
(I believe this only applies in 0.20 style of block recovery)
- HDFS-1211.
Minor improvement reported by tlipcon and fixed by tlipcon (data-node)
0.20 append: Block receiver should not log "rewind" packets at INFO level
In the 0.20 append implementation, it logs an INFO level message for every packet that "rewinds" the end of the block file. This is really noisy for applications like HBase which sync every edit.
- HDFS-1218.
Critical bug reported by tlipcon and fixed by tlipcon (data-node)
20 append: Blocks recovered on startup should be treated with lower priority during block synchronization
When a datanode experiences power loss, it can come back up with truncated replicas (due to local FS journal replay). Those replicas should not be allowed to truncate the block during block synchronization if there are other replicas from DNs that have _not_ restarted.
- HDFS-1242.
Major test reported by tlipcon and fixed by tlipcon
0.20 append: Add test for appendFile() race solved in HDFS-142
This is a unit test that didn't make it into branch-0.20-append, but worth having in TestFileAppend4.
- HDFS-1252.
Major test reported by tlipcon and fixed by tlipcon (test)
TestDFSConcurrentFileOperations broken in 0.20-append
This test currently has several flaws:
- It calls DN.updateBlock with a BlockInfo instance, which then causes java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.hdfs.server.namenode.BlocksMap$BlockInfo.<init>() in the logs when the DN tries to send blockReceived for the block
- It assumes that getBlockLocations returns an up-to-date length block after a sync, which is false. It happens to work because it calls getBlockLocations directly on the NN, and thus gets a...
- HDFS-1260.
Critical bug reported by tlipcon and fixed by tlipcon
0.20: Block lost when multiple DNs trying to recover it to different genstamps
Saw this issue on a cluster where some ops people were doing network changes without shutting down DNs first. So, recovery ended up getting started at multiple different DNs at the same time, and some race condition occurred that caused a block to get permanently stuck in recovery mode. What seems to have happened is the following:
- FSDataset.tryUpdateBlock called with old genstamp 7091, new genstamp 7094, while the block in the volumeMap (and on filesystem) was genstamp 7093
- we find the b...
- HDFS-1346.
Major bug reported by hairong and fixed by hairong (data-node, hdfs client)
DFSClient receives out of order packet ack
When running 0.20 patched with HDFS-101, we sometimes see an error as follows:
WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-2871223654872350746_21421120java.io.IOException: Responseprocessor: Expecting seqno for block blk_-2871223654872350746_21421120 10280 but received 10281
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2570)
This indicates that DFS client expects an ack for packet N, but receives an ack for packe...
- HDFS-1520.
Major new feature reported by hairong and fixed by hairong (name-node)
HDFS 20 append: Lightweight NameNode operation to trigger lease recovery
Currently HBase uses append to trigger the close of an HLog during HLog split. Append is a very expensive operation, which involves not only NameNode operations but also creating a write pipeline. If one of the datanodes on the pipeline has a problem, this recovery may take minutes. I'd like to implement a lightweight NameNode operation to trigger lease recovery and make HBase use this instead.
- HDFS-1554.
Major improvement reported by hairong and fixed by hairong
Append 0.20: New semantics for recoverLease
Change the recoverLease API to return whether the file is closed or not. It also changes the semantics of recoverLease to start lease recovery immediately.
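A caller could use the boolean result to poll until the file is closed. The sketch below assumes a hypothetical LeaseClient interface standing in for the real client call; the retry count, sleep interval, and example path are illustrative and not taken from the patch.

```java
class RecoverLeaseSketch {
    /** Hypothetical stand-in for the client call; returns true once the file is closed. */
    interface LeaseClient {
        boolean recoverLease(String path);
    }

    /** Calls recoverLease until it reports the file closed or the attempts run out. */
    static boolean waitUntilClosed(LeaseClient client, String path,
                                   int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (client.recoverLease(path)) {
                return true; // lease recovered and file closed
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fake client that reports "closed" on the third call.
        LeaseClient fake = path -> ++calls[0] >= 3;
        System.out.println(waitUntilClosed(fake, "/example/log", 5, 10L));
    }
}
```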
- HDFS-1555.
Major improvement reported by hairong and fixed by hairong
HDFS 20 append: Disallow pipeline recovery if a file is already being lease recovered
When a file is under lease recovery and the writer is still alive, the write pipeline will be killed and then the writer will start a pipeline recovery. Sometimes the pipeline recovery may race before the lease recovery and as a result fail the lease recovery. This is very bad if we want to support the strong recoverLease semantics in HDFS-1554. So it would be nice if we could disallow a file's pipeline recovery while its lease recovery is in progress.
- HDFS-1779.
Major bug reported by umamaheswararao and fixed by umamaheswararao (data-node, name-node)
After NameNode restart, clients cannot read partial files even after the client invokes sync.
In the append work of HDFS-200:
If a file has 10 blocks and the client invokes the sync method after writing 5 blocks, then the NN will persist the block information in edits.
If we restart the NN after this, all the DataNodes will re-register with the NN. But the DataNodes do not send the information about blocks being written to the NN; DNs send the blocksBeingWritten information only at DN startup. So the NameNode cannot determine which datanodes the 5 persisted blocks belong to. This information can build based o...
- HDFS-1836.
Major bug reported by hkdennis2k and fixed by bharathm (hdfs client)
Thousands of CLOSE_WAIT sockets
$ /usr/sbin/lsof -i TCP:50010 | grep -c CLOSE_WAIT
4471
Things are fine as long as everything runs normally.
However, from time to time "DataStreamer Exception: java.net.SocketTimeoutException" and "DFSClient.processDatanodeError(2507) | Error Recovery for" appear in the log file, and the number of CLOSE_WAIT sockets just keeps increasing.
The CLOSE_WAIT handles may remain for hours or days, eventually leading to "Too many open files".
- HDFS-2053.
Minor bug reported by miguno and fixed by miguno (name-node)
Bug in INodeDirectory#computeContentSummary warning
*How to reproduce*
{code}
# create test directories
$ hadoop fs -mkdir /hdfs-1377/A
$ hadoop fs -mkdir /hdfs-1377/B
$ hadoop fs -mkdir /hdfs-1377/C
# ...add some test data (few kB or MB) to all three dirs...
# set space quota for subdir C only
$ hadoop dfsadmin -setSpaceQuota 1g /hdfs-1377/C
# the following two commands _on the parent dir_ trigger the warning
$ hadoop fs -dus /hdfs-1377
$ hadoop fs -count -q /hdfs-1377
{code}
Warning message in the namenode logs:
{code}
2011-06-09 09:42...
- HDFS-2117.
Minor bug reported by eli and fixed by eli (data-node)
DiskChecker#mkdirsWithExistsAndPermissionCheck may return true even when the dir is not created
In branch-0.20-security as part of HADOOP-6566, DiskChecker#mkdirsWithExistsAndPermissionCheck will return true even if it wasn't able to create the directory, which means instead of throwing a DiskErrorException the code will proceed to getFileStatus and throw a FNF exception. Post HADOOP-7040, which modified makeInstance to catch not just DiskErrorExceptions but IOExceptions as well, this is not an issue since now the exception is caught either way. But for future modifications we should st...
- HDFS-2190.
Major bug reported by atm and fixed by atm (name-node)
NN fails to start if it encounters an empty or malformed fstime file
On startup, the NN reads the fstime file of all the configured dfs.name.dirs to determine which one to load. However, if any of the searched directories contain an empty or malformed fstime file, the NN will fail to start. The NN should be able to just proceed with starting and ignore the directory containing the bad fstime file.
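The tolerant behavior described above can be sketched as follows. This is a hedged illustration, not the NameNode code: it assumes that an fstime file holds a single 8-byte big-endian long and that the directory with the most recent time wins. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Map;

class FsTimeSketch {
    /** Returns the stored time, or -1 if the content is empty or malformed. */
    static long readCheckpointTime(byte[] content) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(content))) {
            long time = in.readLong(); // throws EOFException on fewer than 8 bytes
            return time >= 0 ? time : -1L;
        } catch (IOException e) {
            return -1L;
        }
    }

    /** Picks the directory with the latest valid time, skipping bad files; null if none is valid. */
    static String latestDir(Map<String, byte[]> fstimeByDir) {
        String best = null;
        long bestTime = -1L;
        for (Map.Entry<String, byte[]> entry : fstimeByDir.entrySet()) {
            long time = readCheckpointTime(entry.getValue());
            if (time > bestTime) {
                bestTime = time;
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

A directory whose fstime cannot be parsed is ignored rather than aborting the search, which mirrors the intent of the fix.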
- HDFS-2202.
Major new feature reported by eepayne and fixed by eepayne (balancer, data-node)
Changes to balancer bandwidth should not require datanode restart.
New dfsadmin command added: [-setBalancerBandwidth <bandwidth>] where bandwidth is the maximum network bandwidth, in bytes per second, that the balancer is allowed to use on each datanode during balancing.
This is an incompatible change in 0.23. The versions of ClientProtocol and DatanodeProtocol are changed.
- HDFS-2259.
Minor bug reported by eli and fixed by eli (data-node)
DN web-UI doesn't work with paths that contain html
The 20-based DN web UI doesn't work with paths that contain html. The paths need to be unescaped when used to access the file and escaped when printed for navigation.
- HDFS-2284.
Major sub-task reported by sanjay.radia and fixed by szetszwo
Write Http access to HDFS
HFTP allows only read access to HDFS via HTTP. Add write HTTP access to HDFS.
- HDFS-2300.
Major bug reported by jnp and fixed by jnp
TestFileAppend4 and TestMultiThreadedSync fail on 20.append and 20-security.
TestFileAppend4 and TestMultiThreadedSync fail on the 20.append and 20-security branch.
- HDFS-2309.
Major bug reported by jnp and fixed by jnp
TestRenameWhileOpen fails in branch-0.20-security
TestRenameWhileOpen is failing in branch-0.20-security.
- HDFS-2317.
Major sub-task reported by szetszwo and fixed by szetszwo
Read access to HDFS using HTTP REST
- HDFS-2318.
Major sub-task reported by szetszwo and fixed by szetszwo
Provide authentication to webhdfs using SPNEGO
Added two new conf properties dfs.web.authentication.kerberos.principal and dfs.web.authentication.kerberos.keytab for the SPNEGO servlet filter.
- HDFS-2320.
Major bug reported by sureshms and fixed by sureshms (data-node, hdfs client, name-node)
Make merged protocol changes from 0.20-append to 0.20-security compatible with previous releases.
0.20-append changes have been merged to 0.20-security. The merge changes the version numbers of several protocols. This jira makes the protocol changes compatible with older releases, allowing clients running an older version to talk to servers running version 205, and clients running version 205 to talk to older servers running 203 or 204.
- HDFS-2325.
Blocker bug reported by charlescearl and fixed by kihwal (contrib/fuse-dfs, libhdfs)
Fuse-DFS fails to build on Hadoop 20.203.0
In building fuse-dfs, the compile fails due to an argument mismatch between call to hdfsConnectAsUser on line 40 of src/contrib/fuse-dfs/src/fuse_connect.c and an earlier definition of hdfsConnectAsUser given in src/c++/libhdfs/hdfs.h.
I suggest changing hdfs.h. I made the following change in hdfs.h in my local copy:
106c106,107
< hdfsFS hdfsConnectAsUser(const char* host, tPort port, const char *user);
---
> // hdfsFS hdfsConnectAsUser(const char* host, tPort port, const char *us...
- HDFS-2328.
Critical bug reported by daryn and fixed by owen.omalley
hftp throws NPE if security is not enabled on remote cluster
If hftp cannot locate either a hdfs or hftp token in the ugi, it will call {{getDelegationToken}} to acquire one from the remote nn. This method may return a null {{Token}} if security is disabled(*) on the remote nn. Hftp will internally call its {{setDelegationToken}} which will throw a NPE when the token is {{null}}.
(*) Actually, if any problem happens while acquiring the token it assumes security is disabled! However, it's a pre-existing issue beyond the scope of the token renewal c...
- HDFS-2331.
Major bug reported by abhijit.shingate and fixed by abhijit.shingate (hdfs client)
Hdfs compilation fails
I am trying to perform complete build from trunk folder but the compilation fails.
*Commandline:*
mvn clean install
*Error Message:*
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) on project hadoop-hdfs: Compilation failure
[ERROR] \Hadoop\SVN\trunk\hadoop-hdfs-project\hadoop-hdfs\src\main\java\org\apache\hadoop\hdfs\web\WebHdfsFileSystem.java:[209,21] type parameters of <T>T cannot be determined; no unique maximal instance...
- HDFS-2333.
Major bug reported by ikelly and fixed by szetszwo
HDFS-2284 introduced 2 findbugs warnings on trunk
When HDFS-2284 was submitted it made DFSOutputStream public, which triggered two SC_START_IN_CTOR findbugs warnings.
- HDFS-2338.
Major sub-task reported by jnp and fixed by jnp
Configuration option to enable/disable webhdfs.
Added a conf property dfs.webhdfs.enabled for enabling/disabling webhdfs.
- HDFS-2340.
Major sub-task reported by szetszwo and fixed by szetszwo
Support getFileBlockLocations and getDelegationToken in webhdfs
- HDFS-2342.
Blocker bug reported by kihwal and fixed by szetszwo (build)
TestSleepJob and TestHdfsProxy broken after HDFS-2284
After HDFS-2284, TestSleepJob and TestHdfsProxy are failing.
Both work in rev 1167444 and fail in rev 1167663.
It would be great if they could be fixed for 205.
- HDFS-2348.
Major sub-task reported by szetszwo and fixed by szetszwo
Support getContentSummary and getFileChecksum in webhdfs
- HDFS-2356.
Major sub-task reported by szetszwo and fixed by szetszwo
webhdfs: support case insensitive query parameter names
- HDFS-2358.
Major bug reported by rajsaha and fixed by daryn (name-node)
NPE when the default filesystem's uri has no authority
Give meaningful error message instead of NPE.
- HDFS-2359.
Major bug reported by rajsaha and fixed by jeagles (data-node)
NPE found in Datanode log while Disk failed during different HDFS operation
Scenario:
I have a cluster of 4 DNs, each with 12 disks.
In hdfs-site.xml I have "dfs.datanode.failed.volumes.tolerated=3".
During the execution of distcp (hdfs->hdfs), I fail 3 disks in one DataNode by setting the data directory permissions to 000. The distcp job succeeds, but I get some NullPointerExceptions in the DataNode log.
In one thread
$hadoop distcp /user/$HADOOPQA_USER/data1 /user/$HADOOPQA_USER/data3
In another thread in a datanode
$ chmod 000 /xyz/{0,1,2}/hadoop/v...
- HDFS-2361.
Critical bug reported by rajsaha and fixed by jnp (name-node)
hftp is broken
Distcp with hftp is failing.
{noformat}
$hadoop distcp hftp://<NNhostname>:50070/user/hadoopqa/1316814737/newtemp 1316814737/as
11/09/23 21:52:33 INFO tools.DistCp: srcPaths=[hftp://<NNhostname>:50070/user/hadoopqa/1316814737/newtemp]
11/09/23 21:52:33 INFO tools.DistCp: destPath=1316814737/as
Retrieving token from: https://<NN IP>:50470/getDelegationToken
Retrieving token from: https://<NN IP>:50470/getDelegationToken?renewer=mapred
11/09/23 21:52:34 INFO security.TokenCache: Got dt for h...
- HDFS-2366.
Major bug reported by arpitgupta and fixed by szetszwo
webhdfs throws a npe when ugi is null from getDelegationToken
- HDFS-2368.
Major bug reported by arpitgupta and fixed by szetszwo
defaults created for web keytab and principal, these properties should not have defaults
the following defaults are set in hdfs-defaults.xml
<property>
<name>dfs.web.authentication.kerberos.principal</name>
<value>HTTP/${dfs.web.hostname}@${kerberos.realm}</value>
<description>
The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.
The HTTP Kerberos principal MUST start with 'HTTP/' per Kerberos
HTTP SPNEGO specification.
</description>
</property>
<property>
<name>dfs.web.authentication.kerberos.keytab</name>
<value>${user.home}/dfs.web....
- HDFS-2373.
Major bug reported by arpitgupta and fixed by arpitgupta
Commands using webhdfs and hftp print unnecessary debug information on the console with security enabled
Running an hdfs command using either hftp or webhdfs prints the following line to the console (system out):
Retrieving token from: https://NN_HOST:50470/getDelegationToken
Probably in the code where we get the delegation token. This should be removed as people using the dfs commands to get a handle to the content such as dfs -cat will now get an extra line that is not part of the actual content. This should either be only in the log or not logged at all.
- HDFS-2375.
Blocker bug reported by sureshms and fixed by sureshms (hdfs client)
TestFileAppend4 fails in 0.20.205 branch
TestFileAppend4 fails due to a change from HDFS-2333. The test uses reflection to get the method DFSOutputStream#getNumCurrentReplicas(). Since the HDFS-2333 patch changed this method from public to private, the reflective lookup of the method fails, resulting in test failures.
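The failure mode is easy to reproduce in isolation, because Class.getMethod only returns public methods. In the sketch below, the nested Stream class is a hypothetical stand-in and is not DFSOutputStream; only the method name getNumCurrentReplicas comes from the note.

```java
class ReflectionVisibilitySketch {
    /** Hypothetical stand-in for a stream class whose method was made private. */
    static class Stream {
        private int getNumCurrentReplicas() {
            return 3;
        }

        public void sync() {
        }
    }

    /** True if Class.getMethod can see the named no-argument method (public methods only). */
    static boolean visibleViaGetMethod(Class<?> type, String name) {
        try {
            type.getMethod(name);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("sync: " + visibleViaGetMethod(Stream.class, "sync"));
        System.out.println("getNumCurrentReplicas: "
                + visibleViaGetMethod(Stream.class, "getNumCurrentReplicas"));
    }
}
```

A caller that must reach a non-public method has to use getDeclaredMethod together with setAccessible(true), which is why a visibility change can silently break reflective callers.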
- HDFS-2385.
Major sub-task reported by szetszwo and fixed by szetszwo
Support delegation token renewal in webhdfs
- HDFS-2392.
Critical bug reported by rajsaha and fixed by daryn (name-node)
Dist with hftp is failing again
$ hadoop distcp hftp://<NN Hostname>:50070/user/hadoopqa/input1/part-00000 /user/hadoopqa/out3
11/09/30 18:57:59 INFO tools.DistCp: srcPaths=[hftp://<NN Hostname>:50070/user/hadoopqa/input1/part-00000]
11/09/30 18:57:59 INFO tools.DistCp: destPath=/user/hadoopqa/out3
11/09/30 18:58:00 INFO security.TokenCache: Got dt for
hftp://<NN Hostname>:50070/user/hadoopqa/input1/part-00000;uri=<NN IP>:50470;t.service=<NN IP>:50470
11/09/30 18:58:00 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN toke...
- HDFS-2395.
Critical bug reported by arpitgupta and fixed by szetszwo
webhdfs api's should return a root element in the json response
- HDFS-2403.
Major bug reported by szetszwo and fixed by szetszwo
The renewer in NamenodeWebHdfsMethods.generateDelegationToken(..) is not used
Below are some suggestions from Suresh.
# renewer not used in #generateDelegationToken
# put() does not use InputStream in and should not throw URISyntaxException
# post() does not use InputStream in and should not throw URISyntaxException
# get() should not throw URISyntaxException
- HDFS-2404.
Major bug reported by arpitgupta and fixed by sureshms
webhdfs liststatus json response is not correct
- HDFS-2408.
Blocker bug reported by stack and fixed by stack (hdfs client)
DFSClient#getNumCurrentReplicas is package private in 205 but public in branch-0.20-append
The commit below broke hdfs-826 for hbase in 205 rc1. It changes the accessibility of getNumCurrentReplicas from public to package private, so currently shipping hbase releases, at least, cannot get at this method.
{code}
Revision 1174483 - (view) (download) (annotate) - [select for diffs]
Modified Fri Sep 23 01:30:18 2011 UTC (13 days, 4 hours ago) by szetszwo
File length: 136876 byte(s)
Diff to previous 1174479 (colored)
svn merge -c 1171137 from branch-0.20-security for HDFS-2333.
{code}
Her...
- HDFS-2411.
Major bug reported by arpitgupta and fixed by jnp
with webhdfs enabled in secure mode the auth to local mappings are not being respected.
- MAPREDUCE-1734.
Blocker improvement reported by tomwhite and fixed by tlipcon (documentation)
Un-deprecate the old MapReduce API in the 0.20 branch
This issue is to un-deprecate the "old" MapReduce API (in o.a.h.mapred) in the next 0.20 release, as discussed at http://www.mail-archive.com/mapreduce-dev@hadoop.apache.org/msg01833.html
- MAPREDUCE-2187.
Major bug reported by azaroth and fixed by anupamseth
map tasks timeout during sorting
I just committed this. Thanks Anupam!
- MAPREDUCE-2324.
Major bug reported by tlipcon and fixed by revans2
Job should fail if a reduce task can't be scheduled anywhere
If there's a reduce task that needs more disk space than is available on any mapred.local.dir in the cluster, that task will stay pending forever. For example, we produced this in a QA cluster by accidentally running terasort with one reducer - since no mapred.local.dir had 1T free, the job remained in pending state for several days. The reason for the "stuck" task wasn't clear from a user perspective until we looked at the JT logs.
Probably better to just fail the job if a reduce task goes ...
- MAPREDUCE-2489.
Major bug reported by naisbitt and fixed by naisbitt (jobtracker)
Jobsplits with random hostnames can make the queue unusable
We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else.
I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number...
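One possible form of the proposed hostname verification is a purely syntactic check that runs before any DNS lookup. The sketch below is an assumption about how such a check could look, not the committed patch; the class name, the RFC 1123 style label rule, and the 253-character limit are choices made here for illustration.

```java
import java.util.regex.Pattern;

class HostnameCheckSketch {
    // One DNS label: 1 to 63 letters, digits or hyphens, not starting or ending with a hyphen.
    private static final Pattern LABEL =
            Pattern.compile("[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?");

    /** Syntactic check only; it performs no DNS lookup. */
    static boolean isPlausibleHostname(String host) {
        if (host == null || host.isEmpty() || host.length() > 253) {
            return false;
        }
        // The -1 limit keeps empty trailing labels, so "host." and "a..b" are rejected.
        for (String label : host.split("\\.", -1)) {
            if (!LABEL.matcher(label).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPlausibleHostname("node1.example.com"));
        System.out.println(isPlausibleHostname("not a host!"));
    }
}
```

Split locations that fail such a check could be skipped or counted toward a failure threshold without ever reaching the resolver.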
- MAPREDUCE-2494.
Major improvement reported by revans2 and fixed by revans2 (distributed-cache)
Make the distributed cache delete entries using LRU priority
Added config option mapreduce.tasktracker.cache.local.keep.pct to the TaskTracker. It is the target percentage of the local distributed cache that should be kept in between garbage collection runs. In practice it will delete unused distributed cache entries in LRU order until the size of the cache is less than mapreduce.tasktracker.cache.local.keep.pct of the maximum cache size. This is a floating point value between 0.0 and 1.0. The default is 0.95.
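The trimming rule described above can be sketched with an access-ordered map. This is a simplified illustration under stated assumptions, not the TaskTracker code: the class name is hypothetical, every entry is treated as unused (the real cache only deletes entries that are not in use), and sizes are plain byte counts.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LruCacheTrimSketch {
    // An access-ordered LinkedHashMap iterates from least to most recently used.
    private final LinkedHashMap<String, Long> sizes = new LinkedHashMap<>(16, 0.75f, true);
    private final long maxBytes;
    private final double keepPct; // plays the role of the keep.pct option
    private long totalBytes;

    LruCacheTrimSketch(long maxBytes, double keepPct) {
        this.maxBytes = maxBytes;
        this.keepPct = keepPct;
    }

    void put(String name, long bytes) {
        Long old = sizes.put(name, bytes);
        totalBytes += bytes - (old == null ? 0L : old);
    }

    /** Marks an entry as recently used. */
    void touch(String name) {
        sizes.get(name);
    }

    /** Deletes entries in LRU order until the total is below keepPct of the maximum. */
    List<String> trim() {
        List<String> deleted = new ArrayList<>();
        long target = (long) (maxBytes * keepPct);
        Iterator<Map.Entry<String, Long>> it = sizes.entrySet().iterator();
        while (totalBytes >= target && it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            totalBytes -= entry.getValue();
            deleted.add(entry.getKey());
            it.remove();
        }
        return deleted;
    }
}
```

With a maximum of 100 bytes and a keep percentage of 0.5, three 30-byte entries are trimmed oldest-first until less than 50 bytes remain.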
- MAPREDUCE-2549.
Major bug reported by devaraj.k and fixed by devaraj.k (contrib/eclipse-plugin, contrib/streaming)
Potential resource leaks in HadoopServer.java, RunOnHadoopWizard.java and Environment.java
- MAPREDUCE-2610.
Major bug reported by jrottinghuis and fixed by jrottinghuis (client)
Inconsistent API JobClient.getQueueAclsForCurrentUser
Client needs access to the current user's queue name.
Public method JobClient.getQueueAclsForCurrentUser() returns QueueAclsInfo[].
The QueueAclsInfo class has default access. A public method should not return a package-private class.
The QueueAclsInfo class, its two constructors, getQueueName, and getOperations methods should be public.
- MAPREDUCE-2650.
Major bug reported by sherri_chen and fixed by sherri_chen
back-port MAPREDUCE-2238 to 0.20-security
Dev had seen the attempt directory permission getting set to 000 or 111 in the CI builds and tests run on dev desktops with 0.20-security.
MAPREDUCE-2238 reported and fixed the issue for 0.22.0, back-port to 0.20-security is needed.
- MAPREDUCE-2705.
Major bug reported by tgraves and fixed by tgraves (tasktracker)
tasks localized and launched serially by TaskLauncher - causing other tasks to be delayed
The current TaskLauncher serially launches new tasks one at a time. During the launch it does the localization and then starts the map/reduce task. This can cause any other tasks to be blocked waiting for the current task to be localized and started. In some instances we have seen a task that has a large file to localize (1.2MB) block another task for about 40 minutes. This particular task being blocked was a cleanup task which caused the job to be delayed finishing for the 40 minutes.
- MAPREDUCE-2729.
Major improvement reported by sherri_chen and fixed by sherri_chen
Reducers are always counted having "pending tasks" even if they can't be scheduled yet because not enough of their mappers have completed
In the capacity scheduler, the number of users in a queue needing slots is calculated based on whether the users' jobs have any pending tasks.
This works fine for map tasks. However, for reduce tasks, jobs do not need reduce slots until the minimum number of map tasks has been completed.
Here, we add a check for whether a reduce is ready to be scheduled (i.e. whether the job has completed enough map tasks) when we increment the number of users in a queue needing reduce slots.
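The counting rule above can be sketched as follows. The Job class and its fields are hypothetical simplifications made for this example and do not reflect the capacity scheduler's actual data structures.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReduceDemandSketch {
    /** Hypothetical, simplified view of a job for illustration only. */
    static final class Job {
        final String user;
        final int pendingReduces;
        final int finishedMaps;
        final int mapsNeededBeforeReduces;

        Job(String user, int pendingReduces, int finishedMaps, int mapsNeededBeforeReduces) {
            this.user = user;
            this.pendingReduces = pendingReduces;
            this.finishedMaps = finishedMaps;
            this.mapsNeededBeforeReduces = mapsNeededBeforeReduces;
        }
    }

    /** A job's reduces are schedulable only after enough of its maps have completed. */
    static boolean reduceReady(Job job) {
        return job.finishedMaps >= job.mapsNeededBeforeReduces;
    }

    /** Counts distinct users that have pending reduces which are ready to be scheduled. */
    static int usersNeedingReduceSlots(List<Job> jobs) {
        Set<String> users = new HashSet<>();
        for (Job job : jobs) {
            if (job.pendingReduces > 0 && reduceReady(job)) {
                users.add(job.user);
            }
        }
        return users.size();
    }
}
```

A user whose only job still has too few completed maps no longer inflates the count of users needing reduce slots.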
- MAPREDUCE-2764.
Major bug reported by daryn and fixed by owen.omalley
Fix renewal of dfs delegation tokens
Generalizes token renewal and canceling to a common interface and provides a plugin interface for adding renewers for new kinds of tokens. Hftp changed to store the tokens as HFTP and renew them over http.
- MAPREDUCE-2777.
Major new feature reported by jeagles and fixed by amar_kamat
Backport MAPREDUCE-220 to Hadoop 20 security branch
Adds cumulative cpu usage and total heap usage to task counters. This is a backport of MAPREDUCE-220 (Collecting cpu and memory usage for MapReduce tasks) and MAPREDUCE-2469 (Task counters should also report the total heap usage of the task).
- MAPREDUCE-2780.
Major sub-task reported by daryn and fixed by daryn
Standardize the value of token service
The token's service field must (currently) be set to "ip:port". All the producers of a token are independently building the service string. This should be done via a common method to reduce the chance of error, and to facilitate the field value being easily changed in the (near) future.
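A single shared helper for the "ip:port" form might look like the sketch below. The class and method names are hypothetical and chosen for this example; the code is an illustration of the idea, not the Hadoop utility itself.

```java
import java.net.InetSocketAddress;

class TokenServiceSketch {
    /** Builds the "ip:port" service string from a resolved socket address. */
    static String buildTokenService(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException("unresolved address: " + addr);
        }
        return addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }

    public static void main(String[] args) {
        // A literal IP address is parsed without a DNS lookup.
        System.out.println(buildTokenService(new InetSocketAddress("10.0.0.1", 8020)));
    }
}
```

Routing every token producer through one such method means a later change of the field format touches a single place.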
- MAPREDUCE-2852.
Major bug reported by eli and fixed by kihwal (tasktracker)
Jira for YDH bug 2854624
The DefaultTaskController and LinuxTaskController reference Yahoo! internal bug 2854624:
{code}
FileSystem rawFs = FileSystem.getLocal(getConf()).getRaw();
long logSize = 0; //TODO: Ref BUG:2854624
{code}
This jira tracks this TODO. If someone w/ access to Yahoo's bugzilla could update this jira with what the bug is that would be great.
- MAPREDUCE-2915.
Major bug reported by kihwal and fixed by kihwal (task-controller)
LinuxTaskController does not work when JniBasedUnixGroupsNetgroupMapping or JniBasedUnixGroupsMapping is enabled
When a job is submitted, LinuxTaskController launches the native task-controller binary for job initialization. The native program does a series of preparatory steps and calls execv() to run JobLocalizer. It was observed that JobLocalizer fails to run when JniBasedUnixGroupsNetgroupMapping or JniBasedUnixGroupsMapping is enabled, resulting in 100% job failures.
JobLocalizer normally does not need the native library (libhadoop) for its functioning, but enabling a JNI user-to-group mapping functi...
- MAPREDUCE-2928.
Major sub-task reported by eli and fixed by eli (tasktracker)
MR-2413 improvements
Tracks improvements to MR-2413. See [this comment|https://issues.apache.org/jira/browse/MAPREDUCE-2413?focusedCommentId=13095073&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13095073].
- MAPREDUCE-2981.
Major improvement reported by matei and fixed by matei (contrib/fair-share)
Backport trunk fairscheduler to 0.20-security branch
A lot of improvements have been made to the fair scheduler in 0.21, 0.22 and trunk, but have not been ported back to the new 0.20.20X releases that are currently considered the stable branch of Hadoop.
- MAPREDUCE-3076.
Blocker bug reported by acmurthy and fixed by acmurthy (test)
TestSleepJob fails
TestSleepJob fails, it was intended to be used in other tests for MAPREDUCE-2981.
- MAPREDUCE-3081.
Major bug reported by vitthal_gogate and fixed by (contrib/vaidya)
Change the name format for hadoop core and vaidya jar to be hadoop-{core/vaidya}-{version}.jar in vaidya.sh
contrib/vaidya/bin/vaidya.sh script fixed to use appropriate jars and classpath
- MAPREDUCE-3112.
Major bug reported by eyang and fixed by eyang (contrib/streaming)
Calling hadoop cli inside mapreduce job leads to errors
Removed inheritance of certain server environment variables (HADOOP_OPTS and HADOOP_ROOT_LOGGER) in task attempt process.
Changes since Hadoop 0.20.203.0
- MAPREDUCE-2846.
Blocker bug reported by aw and fixed by owen.omalley (task, task-controller, tasktracker)
a small % of all tasks fail with DefaultTaskController
Fixed a race condition in writing the log index file that caused tasks to 'fail'.
- MAPREDUCE-2804.
Blocker bug reported by aw and fixed by owen.omalley
"Creation of symlink to attempt log dir failed." message is not useful
Removed duplicate chmods of job log dir that were vulnerable to race conditions between tasks. Also improved the messages when the symlinks failed to be created.
- MAPREDUCE-2651.
Major bug reported by bharathm and fixed by bharathm (task-controller)
Race condition in Linux Task Controller for job log directory creation
There is a rare race condition in the Linux task controller when concurrent task processes try to create the job log directory at the same time.
- MAPREDUCE-2621.
Minor bug reported by sherri_chen and fixed by sherri_chen
TestCapacityScheduler fails with "Queue "q1" does not exist"
{quote}
Error Message
Queue "q1" does not exist
Stacktrace
java.io.IOException: Queue "q1" does not exist
at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:354)
at org.apache.hadoop.mapred.TestCapacityScheduler$FakeJobInProgress.<init>(TestCapacityScheduler.java:172)
at org.apache.hadoop.mapred.TestCapacityScheduler.submitJob(TestCapacityScheduler.java:794)
at org.apache.hadoop.mapred.TestCapacityScheduler.submitJob(TestCapacityScheduler.java:818)
at org.apache.hadoop.mapred.TestCapacityScheduler.submitJobAndInit(TestCapacityScheduler.java:825)
at org.apache.hadoop.mapred.TestCapacityScheduler.testMultiTaskAssignmentInMultipleQueues(TestCapacityScheduler.java:1109)
{quote}
When queue name is invalid, an exception is thrown now.
- MAPREDUCE-2558.
Major new feature reported by naisbitt and fixed by naisbitt (jobtracker)
Add queue-level metrics 0.20-security branch
We would like to record and present the jobtracker metrics on a per-queue basis.
- MAPREDUCE-2555.
Minor bug reported by tgraves and fixed by tgraves (tasktracker)
JvmInvalidate errors in the gridmix TT logs
Observing a lot of jvmValidate exceptions in TT logs for grid mix run
************************
2011-04-28 02:00:37,578 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 46121, call
statusUpdate(attempt_201104270735_5993_m_003305_0, org.apache.hadoop.mapred.MapTaskStatus@1840a9c,
org.apache.hadoop.mapred.JvmContext@1d4ab6b) from 127.0.0.1:50864: error: java.io.IOException: JvmValidate Failed.
Ignoring request from task: attempt_201104270735_5993_m_003305_0, with JvmId:
jvm_201104270735_5993_m_103399012gsbl20430: java.io.IOException: JvmValidate Failed. Ignoring request from task:
attempt_201104270735_5993_m_003305_0, with JvmId: jvm_201104270735_5993_m_103399012gsbl20430: --
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1386)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1384)
*********************
- MAPREDUCE-2529.
Major bug reported by tgraves and fixed by tgraves (tasktracker)
Recognize Jetty bug 1342 and handle it
Added 2 new config parameters:
mapreduce.reduce.shuffle.catch.exception.stack.regex
mapreduce.reduce.shuffle.catch.exception.message.regex
- MAPREDUCE-2524.
Minor improvement reported by tgraves and fixed by tgraves (tasktracker)
Backport trunk heuristics for failing maps when we get fetch failures retrieving map output during shuffle
Added a new configuration option: mapreduce.reduce.shuffle.maxfetchfailures, and removed a no longer used option: mapred.reduce.copy.backoff.
- MAPREDUCE-2514.
Trivial bug reported by jeagles and fixed by jeagles (tasktracker)
ReinitTrackerAction class name misspelled RenitTrackerAction in task tracker log
- MAPREDUCE-2495.
Minor improvement reported by revans2 and fixed by revans2 (distributed-cache)
The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason
The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node.
- MAPREDUCE-2490.
Trivial improvement reported by jeagles and fixed by jeagles (jobtracker)
Log blacklist debug count
Gain some insight into blacklist increments/decrements by enhancing the debug logging
- MAPREDUCE-2479.
Major improvement reported by revans2 and fixed by revans2 (tasktracker)
Backport MAPREDUCE-1568 to hadoop security branch
Added mapreduce.tasktracker.distributedcache.checkperiod to the task tracker, which defines the period to wait between cleanups of the distributed cache. The default is 1 min.
- MAPREDUCE-2456.
Trivial improvement reported by naisbitt and fixed by naisbitt (jobtracker)
Show the reducer taskid and map/reduce tasktrackers for "Failed fetch notification #_ for task attempt..." log messages
This jira is to provide more useful log information for debugging the "Too many fetch-failures" error.
Looking at the JobTracker node, we see messages like this:
"2010-12-14 00:00:06,911 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #8 for task
attempt_201011300729_189729_m_007458_0".
It would be useful to see which reducer is reporting the error here.
So, I propose we add the following to these log messages:
1. reduce task ID
2. TaskTracker nodenames for both the mapper and the reducer
- MAPREDUCE-2451.
Trivial bug reported by tgraves and fixed by tgraves (jobtracker)
Log the reason string of healthcheck script
The information on why a specific TaskTracker got blacklisted is not stored anywhere. The jobtracker web ui will show the detailed reason string until the TT gets unblacklisted. After that it is lost.
- MAPREDUCE-2447.
Minor bug reported by sseth and fixed by sseth
Set JvmContext sooner for a task - MR2429
TaskTracker.validateJVM() is throwing NPE when setupWorkDir() throws IOException. This is because
taskFinal.setJvmContext() is not executed yet
- MAPREDUCE-2443.
Minor bug reported by sseth and fixed by sseth (test)
Fix FI build - broken after MR-2429
src/test/system/aop/org/apache/hadoop/mapred/TaskAspect.aj:72 [warning] advice defined in org.apache.hadoop.mapred.TaskAspect has not been applied [Xlint:adviceDidNotMatch]
After the fix in MR-2429, the call to ping in TaskAspect needs to be fixed.
- MAPREDUCE-2429.
Major bug reported by acmurthy and fixed by sseth (tasktracker)
Check jvmid during task status report
Currently the TT doesn't check to ensure the jvmid is relevant during communication with the Child via TaskUmbilicalProtocol.
- MAPREDUCE-2418.
Minor bug reported by sseth and fixed by sseth
Errors not shown in the JobHistory servlet (specifically Counter Limit Exceeded)
Job error details are not displayed in the JobHistory servlet. e.g. Errors like 'Counter limit exceeded for a job'.
jobdetails.jsp has 'Failure Info', but this is missing in jobdetailshistory.jsp
- MAPREDUCE-2415.
Major sub-task reported by bharathm and fixed by bharathm (task-controller, tasktracker)
Distribute TaskTracker userlogs onto multiple disks
Currently, userlogs directory in TaskTracker is placed under hadoop.log.dir like <hadoop.log.dir>/userlogs. I am proposing to spread these userlogs onto multiple configured mapred.local.dirs to strengthen TaskTracker reliability w.r.t disk failures.
- MAPREDUCE-2413.
Major sub-task reported by bharathm and fixed by ravidotg (task-controller, tasktracker)
TaskTracker should handle disk failures at both startup and runtime
At present, TaskTracker doesn't handle disk failures properly both at startup and runtime.
(1) Currently TaskTracker doesn't come up if any of the mapred-local-dirs is on a bad disk. TaskTracker should ignore that particular mapred-local-dir and start up and use only the remaining good mapred-local-dirs.
(2) If a disk goes bad while TaskTracker is running, currently TaskTracker doesn't do anything special. This results in either
(a) TaskTracker continues to "try to use that bad disk", which results in lots of task failures and possibly job failures (because multiple TTs have bad disks), and eventually these TTs get graylisted for all jobs. This requires a manual restart of the TT with a modified mapred-local-dirs configuration that avoids the bad disk. OR
(b) The health check script identifies the disk as bad and the TT gets blacklisted. This also requires a manual restart of the TT with a modified mapred-local-dirs configuration that avoids the bad disk.
This JIRA is to make TaskTracker more fault-tolerant to disk failures, solving (1) and (2): the TT should start as long as at least one of the mapred-local-dirs is on a good disk, and the TT should adjust its in-memory list of mapred-local-dirs to avoid using the bad ones.
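The startup check in (1) can be sketched roughly as follows. This is a minimal illustration, not the TaskTracker's actual code; `LocalDirChecker` and `usableDirs` are hypothetical names. It keeps every configured directory that exists (or can be created) and is readable and writable, and fails only when none is left.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: filters the configured local dirs.
class LocalDirChecker {
    /** Returns the subset of dirs that look usable; throws if none are. */
    static List<File> usableDirs(List<File> configured) throws IOException {
        List<File> good = new ArrayList<>();
        for (File dir : configured) {
            // mkdirs() returns false when the directory already exists, so test isDirectory() first.
            boolean present = dir.isDirectory() || dir.mkdirs();
            if (present && dir.canRead() && dir.canWrite()) {
                good.add(dir);
            }
        }
        if (good.isEmpty()) {
            throw new IOException("No usable local directories among " + configured);
        }
        return good;
    }
}
```

Running the same filter periodically, rather than only at startup, would be one way to approach (2); the notes above do not say how the actual patch does it.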
- MAPREDUCE-2411.
Minor bug reported by dking and fixed by dking
When you submit a job to a queue with no ACLs you get an inscrutable NPE
With this patch we'll check for that, and print a message in the logs. Then at submission time you find out about it.
- MAPREDUCE-2409.
Major bug reported by sseth and fixed by sseth (distributed-cache)
Distributed Cache does not differentiate between file/archive for files with the same path
If a 'global' file is specified as a 'file' by one job, subsequent jobs cannot override this source file to be an 'archive' (until the TT cleans up its cache or the TT restarts).
The same applies the other way around: 'archive' to 'file'.
In case of an accidental submission using the wrong type - some of the tasks for the second job will end up seeing the source file as an archive, others as a file.
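One way to avoid this kind of collision, assuming the cache is keyed by path alone, is to make the resource type part of the key. The sketch below is illustrative only; `CacheKey` is a hypothetical class, and the notes above do not describe how the actual fix is implemented.

```java
import java.util.Objects;

// Hypothetical cache key: the same path localized as FILE and as ARCHIVE
// maps to two distinct cache entries.
final class CacheKey {
    enum Type { FILE, ARCHIVE }

    private final String path;
    private final Type type;

    CacheKey(String path, Type type) {
        this.path = Objects.requireNonNull(path);
        this.type = Objects.requireNonNull(type);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CacheKey)) {
            return false;
        }
        CacheKey other = (CacheKey) o;
        return path.equals(other.path) && type == other.type;
    }

    @Override
    public int hashCode() {
        return Objects.hash(path, type);
    }
}
```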
- MAPREDUCE-2366.
Major bug reported by owen.omalley and fixed by dking (tasktracker)
TaskTracker can't retrieve stdout and stderr from web UI
Problem where the task browser UI can't retrieve the stdxxx printouts of streaming jobs that abend in the unix code, in the common case where the containing job doesn't reuse JVMs.
- MAPREDUCE-2364.
Major bug reported by owen.omalley and fixed by devaraj (tasktracker)
Shouldn't hold lock on rjob while localizing resources.
There is a deadlock while localizing resources on the TaskTracker.
- MAPREDUCE-2362.
Major bug reported by owen.omalley and fixed by roelofs (test)
Unit test failures: TestBadRecords and TestTaskTrackerMemoryManager
Fix unit-test failures: TestBadRecords (NPE due to rearranged MapTask code) and TestTaskTrackerMemoryManager (need hostname in output-string pattern).
- MAPREDUCE-2360.
Major bug reported by owen.omalley and fixed by (client)
Pig fails when using non-default FileSystem
The job client strips the file system from the user's job jar, which causes breakage when it isn't the default file system.
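The effect of stripping the file system can be illustrated with plain `java.net.URI`. The paths and host names below are made up for the example, and the code does not use Hadoop's own Path or FileSystem classes; it only shows that a path stripped of its scheme and authority is re-qualified against whatever the default file system happens to be.

```java
import java.net.URI;

// Illustration only: shows what is lost when scheme and authority are stripped.
class SchemeStripDemo {
    /** Resolves a scheme-less path against a default file system URI. */
    static URI qualify(URI defaultFs, String path) {
        return defaultFs.resolve(path);
    }

    public static void main(String[] args) {
        URI jar = URI.create("hdfs://nn/user/alice/job.jar");  // hypothetical job jar
        URI defaultFs = URI.create("viewfs:///");               // a non-HDFS default
        String stripped = jar.getPath();                        // "/user/alice/job.jar"
        System.out.println(qualify(defaultFs, stripped));       // now refers to the viewfs scheme
        System.out.println(jar);                                // the full URI still names hdfs://nn
    }
}
```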
- MAPREDUCE-2359.
Major bug reported by owen.omalley and fixed by ramach
Distributed cache doesn't use non-default FileSystems correctly
We are passing fs.default.name as viewfs:/// in core-site.xml on the oozie server.
The default name node in our configuration is also viewfs:///.
We are using hdfs://path in the path for our application.
It gives the following error:
IllegalArgumentException: Wrong FS:
hdfs://nn/user/strat_ci/oozie-oozi/0000002-110217014830452-oozie-oozi-W/hadoop1--map-reduce/map-reduce-launcher.jar,
expected: viewfs:/
- MAPREDUCE-2358.
Major bug reported by owen.omalley and fixed by ramach
MapReduce assumes HDFS as the default filesystem
Mapred assumes hdfs as the default fs even when defined otherwise.
- MAPREDUCE-2357.
Major bug reported by owen.omalley and fixed by vicaya (task)
When extending inputsplit (non-FileSplit), all exceptions are ignored
If you're using a custom RecordReader/InputFormat setup with an InputSplit that does NOT extend FileSplit, then any exceptions you throw in your RecordReader.nextKeyValue() function are silently ignored.
- MAPREDUCE-2356.
Major bug reported by owen.omalley and fixed by vicaya
A task succeeded even though there were errors on all attempts.
From Luke Lu:
Here is a summary of why the failed map task was considered "successful" (Thanks to Mahadev, Arun and Devaraj
for insightful discussions).
1. The map task was hanging BEFORE being initialized (probably in localization, but it doesn't matter in this case).
Its state is UNASSIGNED.
2. The jt decided to kill it due to timeout and scheduled a cleanup task on the same node.
3. The cleanup task has the same attempt id (by design.) but runs in a different JVM. Its initial state is
FAILED_UNCLEAN.
4. The JVM of the original attempt, while being killed, proceeded to setupWorkDir and threw an IllegalStateException in FileSystem.getLocal, which caused taskFinal.taskCleanup to be called in Child and triggered the NPE because the task was not yet initialized (committer is null). Before the NPE, however, it sent a statusUpdate to the TT, and tip.reportProgress changed the task state (currently FAILED_UNCLEAN) to UNASSIGNED.
5. The cleanup attempt succeeded and reported done to the TT. In tip.reportDone, the isCleanup() check returned false because of the UNASSIGNED state, and the task state was set to SUCCEEDED.
- MAPREDUCE-517.
Critical bug reported by acmurthy and fixed by acmurthy
The capacity-scheduler should assign multiple tasks per heartbeat
HADOOP-3136 changed the default o.a.h.mapred.JobQueueTaskScheduler to assign multiple tasks per TaskTracker heartbeat, the capacity-scheduler should do the same.
- MAPREDUCE-118.
Blocker bug reported by amar_kamat and fixed by amareshwari (client)
Job.getJobID() will always return null
JobContext is used for a read-only view of a job's info. Hence all the read-only fields in JobContext are set in the constructor. Job extends JobContext. When a Job is created, the jobid is not known, and there is no way to set the JobID once the Job is created. The JobID is obtained only when the JobClient queries the JobTracker for a job-id, which happens later, i.e. upon job submission.
- HDFS-2218.
Blocker test reported by mattf and fixed by mattf (contrib/hdfsproxy, test)
Disable TestHdfsProxy.testHdfsProxyInterface in automated test suite for 0.20-security-204 release
Test case TestHdfsProxy.testHdfsProxyInterface has been temporarily disabled for this release, due to failure in the Hudson automated test environment.
- HDFS-2057.
Major bug reported by bharathm and fixed by bharathm (data-node)
Wait time to terminate the threads causing unit tests to take longer time
As a part of datanode process hang, this part of code was introduced in 0.20.204 to clean up all the waiting threads.
- try {
- readPool.awaitTermination(10, TimeUnit.SECONDS);
- } catch (InterruptedException e) {
- LOG.info("Exception occured in doStop:" + e.getMessage());
- }
- readPool.shutdownNow();
This was clearly meant for production, but all the unit tests use MiniDFSCluster and MiniMRCluster for shutdown, which waits on this part of the code. Because of this, we saw an increase in unit test run times. So this code is being removed.
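For context, `awaitTermination` can only return early once a shutdown has been requested; called on a pool that is still accepting work, it blocks for the full timeout. The sketch below shows the usual `java.util.concurrent` idiom, which requests shutdown first. It is not the change made in this JIRA (which simply removed the wait), and `PoolShutdown` is a hypothetical name.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper showing the shutdown-then-wait idiom.
class PoolShutdown {
    /** Stops the pool, waiting at most timeoutMillis for running tasks. Returns true if it terminated. */
    static boolean stop(ExecutorService pool, long timeoutMillis) {
        pool.shutdown();                         // reject new tasks so awaitTermination can return early
        try {
            if (pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;                     // idle or finished pools get here almost immediately
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt for the caller
        }
        pool.shutdownNow();                      // interrupt whatever is still running
        return pool.isTerminated();
    }
}
```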
- HDFS-2044.
Major test reported by mattf and fixed by mattf (test)
TestQueueProcessingStatistics failing automatic test due to timing issues
The test makes assumptions about timing issues that hold true in workstation environments but not in Hudson auto-test.
- HDFS-2023.
Major bug reported by bharathm and fixed by bharathm (data-node)
Backport of NPE for File.list and File.listFiles
Since we have multiple JIRAs in trunk for common and hdfs, I am creating another jira for this issue.
This patch addresses the following:
1. Provides a FileUtil API for list and listFiles that throws IOException for null cases.
2. Replaces most uses of the JDK file API with the FileUtil API.
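The idea in item 1 can be sketched as below. `java.io.File.list()` and `listFiles()` return null when the path is not a directory or an I/O error occurs, which is what leads to the NPEs. The class name `SafeFileList` and the exception message are assumptions for illustration; the real FileUtil methods may differ in detail.

```java
import java.io.File;
import java.io.IOException;

// Sketch of null-safe wrappers around the JDK listing calls.
class SafeFileList {
    /** Like File.list(), but throws instead of returning null. */
    static String[] list(File dir) throws IOException {
        String[] names = dir.list();   // null if dir is not a directory or an I/O error occurs
        if (names == null) {
            throw new IOException("Invalid directory or I/O error occurred for dir: " + dir);
        }
        return names;
    }

    /** Like File.listFiles(), but throws instead of returning null. */
    static File[] listFiles(File dir) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) {
            throw new IOException("Invalid directory or I/O error occurred for dir: " + dir);
        }
        return files;
    }
}
```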
- HDFS-1878.
Minor bug reported by mattf and fixed by mattf (name-node)
TestHDFSServerPorts unit test failure - race condition in FSNamesystem.close() causes NullPointerException without serious consequence
In 20.204, TestHDFSServerPorts was observed to intermittently throw a NullPointerException. This only happens when FSNamesystem.close() is called, which means system termination for the Namenode, so this is not a serious bug for .204. TestHDFSServerPorts is more likely than normal execution to stimulate the race, because it runs two Namenodes in the same JVM, causing more interleaving and more potential to see a race condition.
The race is in FSNamesystem.close(), line 566, we have:
if (replthread != null) replthread.interrupt();
if (replmon != null) replmon = null;
Since the interrupted replthread is not waited on, there is a potential race condition with replmon being nulled before replthread is dead, but replthread references replmon in computeDatanodeWork() where the NullPointerException occurs.
The solution is either to wait on replthread or just don't null replmon. The latter is preferred, since none of the sibling Namenode processing threads are waited on in close().
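Both remedies can be illustrated with a small stand-alone class. The names here are hypothetical and the code is not FSNamesystem's: the shared object is held in a final field, so it can never become null under the worker (the preferred fix), and `close()` also joins the worker with a bounded wait to show the alternative.

```java
// Illustration of the two remedies discussed above, with hypothetical names.
class Service {
    private final Object monitor = new Object();   // stands in for replmon; final, so never nulled
    private volatile boolean running = true;
    private final Thread worker = new Thread(() -> {
        while (running) {
            synchronized (monitor) {               // safe: monitor is non-null for the worker's lifetime
                try {
                    monitor.wait(50);
                } catch (InterruptedException e) {
                    return;                        // interrupted by close()
                }
            }
        }
    });

    void start() {
        worker.start();
    }

    /** Interrupts the worker and waits up to one second for it to exit. */
    void close() throws InterruptedException {
        running = false;
        worker.interrupt();
        worker.join(1000);                         // the alternative remedy: wait on the thread
    }

    boolean workerAlive() {
        return worker.isAlive();
    }
}
```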
I'll attach a patch for .205.
- HDFS-1822.
Blocker bug reported by sureshms and fixed by sureshms (name-node)
Editlog opcodes overlap between 20 security and later releases
The same opcodes are used for different operations in 0.20.security, 0.22 and 0.23. This results in failure to load editlogs on later releases, especially during upgrades.
- HDFS-1773.
Minor improvement reported by tanping and fixed by tanping (name-node)
Remove a datanode from cluster if include list is not empty and this datanode is removed from both include and exclude lists
Our service engineering team, who operate the clusters on a daily basis, find it confusing that after a data node is decommissioned, there is no way to make the cluster forget about this data node, and it always remains in the dead node list.
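The rule in the title of this JIRA can be written as a simple predicate. This is a sketch with hypothetical names (`HostLists`, `shouldForget`), not the namenode's actual code.

```java
import java.util.Set;

// Hypothetical predicate for the rule stated in this JIRA's title.
class HostLists {
    /** True when the include list is in use and the host appears in neither list. */
    static boolean shouldForget(String host, Set<String> includes, Set<String> excludes) {
        return !includes.isEmpty()
            && !includes.contains(host)
            && !excludes.contains(host);
    }
}
```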
- HDFS-1767.
Major sub-task reported by mattf and fixed by mattf (data-node)
Namenode should ignore non-initial block reports from datanodes when in safemode during startup
Consider a large cluster that takes 40 minutes to start up. The datanodes compete to register and send their Initial Block Reports (IBRs) as fast as they can after startup (subject to a small sub-two-minute random delay, which isn't relevant to this discussion).
As each datanode succeeds in sending its IBR, it schedules the starting time for its regular cycle of reports, every hour (or other configured value of dfs.blockreport.intervalMsec). In order to spread the reports evenly across the block report interval, each datanode picks a random fraction of that interval, for the starting point of its regular report cycle. For example, if a particular datanode ends up randomly selecting 18 minutes after the hour, then that datanode will send a Block Report at 18 minutes after the hour every hour as long as it remains up. Other datanodes will start their cycles at other randomly selected times. This code is in DataNode.blockReport() and DataNode.scheduleBlockReport().
The "second Block Report" (2BR), is the start of these hourly reports. The problem is that some of these 2BRs get scheduled sooner rather than later, and actually occur within the startup period. For example, if the cluster takes 40 minutes (2/3 of an hour) to start up, then out of the datanodes that succeed in sending their IBRs during the first 10 minutes, between 1/2 and 2/3 of them will send their 2BR before the 40-minute startup time has completed!
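The "between 1/2 and 2/3" figure can be checked with a few lines of arithmetic. The sketch assumes a simple model that is not spelled out above: a datanode that sends its IBR at minute t sends its second report at t + U, where U is uniformly distributed over one report interval. The class and method names are hypothetical.

```java
// Worked arithmetic for the 40-minute example, under the assumed uniform-offset model.
class SecondReportOdds {
    /** Probability that the second report lands before startup ends. */
    static double fractionBeforeStartupEnds(double ibrMinute, double startupMinutes, double intervalMinutes) {
        double window = startupMinutes - ibrMinute;   // time left in startup after the IBR
        double p = window / intervalMinutes;
        return Math.max(0.0, Math.min(1.0, p));       // clamp to a valid probability
    }

    public static void main(String[] args) {
        // IBR at minute 0 of a 40-minute startup, hourly reports: 40/60
        System.out.println(fractionBeforeStartupEnds(0, 40, 60));
        // IBR at minute 10: 30/60
        System.out.println(fractionBeforeStartupEnds(10, 40, 60));
    }
}
```

Datanodes that report in the first 10 minutes therefore fall between these two bounds, 2/3 and 1/2.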
2BRs sent within the startup time actually compete with the remaining IBRs, and thereby slow down the overall startup process. This can be seen in the following data, which shows the startup process for a 3700-node cluster that took about 17 minutes to finish startup:
{noformat}
min  time        starts  sum   regs  sum   IBR  sum   2nd_BR  sum   total_BRs/min
0    1299799498  3042    3042  1969  1969  151  151   -       0     151
1    1299799558  665     3707  1470  3439  248  399   -       0     248
2    1299799618  -       3707  224   3663  270  669   -       0     270
3    1299799678  -       3707  14    3677  261  930   3       3     264
4    1299799738  -       3707  23    3700  288  1218  1       4     289
5    1299799798  -       3707  7     3707  258  1476  3       7     261
6    1299799858  -       3707  -     3707  317  1793  4       11    321
7    1299799918  -       3707  -     3707  292  2085  6       17    298
8    1299799978  -       3707  -     3707  292  2377  8       25    300
9    1299800038  -       3707  -     3707  272  2649  -       25    272
10   1299800098  -       3707  -     3707  280  2929  15      40    295
11   1299800158  -       3707  -     3707  223  3152  14      54    237
12   1299800218  -       3707  -     3707  143  3295  -       54    143
13   1299800278  -       3707  -     3707  141  3436  20      74    161
14   1299800338  -       3707  -     3707  195  3631  78      152   273
15   1299800398  -       3707  -     3707  51   3682  209     361   260
16   1299800458  -       3707  -     3707  25   3707  369     730   394
17   1299800518  -       3707  -     3707  -    3707  166     896   166
18   1299800578  -       3707  -     3707  -    3707  72      968   72
19   1299800638  -       3707  -     3707  -    3707  67      1035  67
20   1299800698  -       3707  -     3707  -    3707  75      1110  75
21   1299800758  -       3707  -     3707  -    3707  71      1181  71
22   1299800818  -       3707  -     3707  -    3707  67      1248  67
23   1299800878  -       3707  -     3707  -    3707  62      1310  62
24   1299800938  -       3707  -     3707  -    3707  56      1366  56
25   1299800998  -       3707  -     3707  -    3707  60      1426  60
{noformat}
This data was harvested from the startup logs of all the datanodes, and correlated into one-minute buckets. Each row of the table represents the progress during one elapsed minute of clock time. It seems that every cluster startup is different, but this one showed the effect fairly well.
The "starts" column shows that all the nodes started up within the first 2 minutes, and the "regs" column shows that all succeeded in registering by minute 6. The IBR column shows a sustained rate of Initial Block Report processing of 250-300/minute for the first 10 minutes.
The question is why, during minutes 11 through 16, the rate of IBR processing slowed down. Why didn't the startup just finish? In the "2nd_BR" column, we see the rate of 2BRs ramping up as more datanodes complete their IBRs. As the rate increases, they become more effective at competing with the IBRs, and slow down the IBR processing even more. After the IBRs finally finish in minute 16, the rate of 2BRs settles down to a steady ~60-70/minute.
In order to decrease competition for locks and other resources, to speed up IBR processing during startup, we propose to delay 2BRs until later into the cycle.
- HDFS-1758.
Minor bug reported by tanping and fixed by tanping (tools)
Web UI JSP pages thread safety issue
The set of JSP pages that the web UI uses is not thread safe. We have observed that when requesting the Live/Dead/Decommissioning pages from the web UI, an incorrect page is sometimes displayed. Specifically, a request for the Dead node list page sometimes returns the Live node page, and a request for the Decommissioning page sometimes returns the Dead page.
The root cause of this problem is that a JSP page is not thread safe by default. When multiple requests come in, each request is assigned to a different thread, and multiple threads access the same instance of the servlet class generated from a JSP page. A class variable is shared by multiple threads. The JSP code in the 20 branch, for example dfsnodelist.jsp, has
{code}
<%!
int rowNum = 0;
int colNum = 0;
String sorterField = null;
String sorterOrder = null;
String whatNodes = "LIVE";
...
%>
{code}
declared as class variables. (This set of variables is declared within <%! code %> directives, which makes them class members.) Multiple threads share the same set of class member variables, so one request can step on another's toes.
However, due to the JSP code refactor in HADOOP-5857, all of these class member variables were moved to become function local variables. So this bug does not appear in Apache trunk. Hence, we have proposed a simple fix for this bug on the 20 branch alone, specifically branch-0.20-security.
The simple fix is to add the JSP page directive isThreadSafe="false" to the related JSP pages, dfshealth.jsp and dfsnodelist.jsp, to make them thread safe, i.e. only one request is processed at a time.
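The difference between class members and local variables can be shown with a plain-Java analogy, in which one page object serves every request. `NodeListPage` and its methods are hypothetical; this is not the generated servlet code.

```java
// Plain-Java analogy for the JSP issue: one page object serves every request.
class NodeListPage {
    // Like a <%! ... %> declaration: a single copy shared by all request threads.
    private String sharedWhatNodes = "LIVE";

    /** Unsafe: another thread may overwrite the field between the write and the read. */
    String renderUnsafe(String requested) {
        sharedWhatNodes = requested;
        return "Showing " + sharedWhatNodes + " nodes";
    }

    /** Safe: the value lives in a local variable, so each request has its own copy. */
    String renderSafe(String requested) {
        String whatNodes = requested;
        return "Showing " + whatNodes + " nodes";
    }
}
```

Moving the state into locals, as in `renderSafe`, corresponds to the trunk refactor; serializing requests, as the branch fix does, keeps the shared field but ensures only one thread touches it at a time.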
We did evaluate the thread safety issue for other JSP pages on trunk. We noticed a potential problem when retrieving some statistics from the namenode; for example, we make the call to
{code}
NamenodeJspHelper.getInodeLimitText(fsn);
{code}
in dfshealth.jsp, which eventually is
{code}
static String getInodeLimitText(FSNamesystem fsn) {
long inodes = fsn.dir.totalInodes();
long blocks = fsn.getBlocksTotal();
long maxobjects = fsn.getMaxObjects();
....
{code}
Some of the function calls are already guarded by a readwritelock, e.g. dir.totalInodes, but others are not. As a result, the web UI results are not 100% thread safe. But after evaluating the pros and cons of adding a giant lock into the JSP pages, we decided not to add FSNamesystem ReadWrite locks to the JSPs.
- HDFS-1750.
Major bug reported by szetszwo and fixed by szetszwo
fs -ls hftp://file not working
{noformat}
hadoop dfs -touchz /tmp/file1 # create file. OK
hadoop dfs -ls /tmp/file1 # OK
hadoop dfs -ls hftp://namenode:50070/tmp/file1 # FAILED: not seeing the file
{noformat}
- HDFS-1692.
Major bug reported by bharathm and fixed by bharathm (data-node)
In secure mode, Datanode process doesn't exit when disks fail.
In secure mode, when more disks fail than the number of volumes tolerated, the datanode process doesn't exit properly and just hangs, even though the shutdown method is called.
- HDFS-1592.
Major bug reported by bharathm and fixed by bharathm
Datanode startup doesn't honor volumes.tolerated
Datanode startup doesn't honor volumes.tolerated for hadoop 20 version.
- HDFS-1541.
Major sub-task reported by hairong and fixed by hairong (name-node)
Not marking datanodes dead When namenode in safemode
In a big cluster, when the namenode starts up, it takes a long time for the namenode to process block reports from all datanodes. Because heartbeat processing gets delayed, some datanodes are erroneously marked as dead; later they have to register again, which wastes time.
It would speed up startup if the checking of dead nodes were disabled while the namenode is in safemode.
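The proposal amounts to a guard at the top of the dead-node check. A minimal sketch follows, using hypothetical names and a plain map of last-heartbeat timestamps rather than the namenode's real data structures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a heartbeat check that is skipped during safemode.
class HeartbeatMonitor {
    /** Returns the nodes whose last heartbeat is older than expiryMillis; none while in safemode. */
    static List<String> findDeadNodes(Map<String, Long> lastHeartbeat, long now,
                                      long expiryMillis, boolean inSafeMode) {
        List<String> dead = new ArrayList<>();
        if (inSafeMode) {
            return dead;   // heartbeats may only be delayed behind block-report processing
        }
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (now - e.getValue() > expiryMillis) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }
}
```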
- HDFS-1445.
Major sub-task reported by mattf and fixed by mattf (data-node)
Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file
Batch hardlinking during "upgrade" snapshots, cutting time from approx. 8 minutes per volume to approx. 8 seconds. Validated on both Linux and Windows. Depends on prior integration with the patch for HADOOP-7133.
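The batching idea itself can be sketched as grouping files by their parent directory, so that a caller can issue one link operation per directory instead of one per file. `LinkBatcher` is a hypothetical name, and how the actual patch performs the batched call is not described above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the batching step only: one entry per directory, listing its file names.
class LinkBatcher {
    static Map<File, List<String>> groupByDirectory(List<File> files) {
        Map<File, List<String>> batches = new LinkedHashMap<>();
        for (File f : files) {
            batches.computeIfAbsent(f.getParentFile(), d -> new ArrayList<>()).add(f.getName());
        }
        return batches;
    }
}
```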
- HDFS-1377.
Blocker bug reported by eli and fixed by eli (name-node)
Quota bug for partial blocks allows quotas to be violated
There's a bug in the quota code that causes them not to be respected when a file is not an exact multiple of the block size. Here's an example:
{code}
$ hadoop fs -mkdir /test
$ hadoop dfsadmin -setSpaceQuota 384M /test
$ ls dir/ | wc -l # dir contains 101 files
101
$ du -ms dir # each is 3mb
304 dir
$ hadoop fs -put dir /test
$ hadoop fs -count -q /test
none inf 402653184 -550502400 2 101 317718528 hdfs://haus01.sf.cloudera.com:10020/test
$ hadoop fs -stat "%o %r" /test/dir/f30
134217728 3 # three 128mb blocks
{code}
INodeDirectoryWithQuota caches the number of bytes consumed by its children in {{diskspace}}. The quota adjustment code has a bug that causes {{diskspace}} to get updated incorrectly when a file is not an exact multiple of the block size (the value ends up being negative).
This causes the quota checking code to think that the files in the directory consume less space than they actually do, so verifyQuota does not throw a QuotaExceededException even when the directory is over quota. However, the bug isn't visible to users because {{fs count -q}} reports the numbers generated by INode#getContentSummary, which adds up the sizes of the blocks rather than using the cached INodeDirectoryWithQuota#diskspace value.
In FSDirectory#addBlock the disk space consumed is set conservatively to the full block size * the number of replicas:
{code}
updateCount(inodes, inodes.length-1, 0,
fileNode.getPreferredBlockSize()*fileNode.getReplication(), true);
{code}
In FSNameSystem#addStoredBlock we adjust for this conservative estimate by subtracting out the difference between the conservative estimate and what the number of bytes actually stored was:
{code}
//Updated space consumed if required.
INodeFile file = (storedBlock != null) ? storedBlock.getINode() : null;
long diff = (file == null) ? 0 :
(file.getPreferredBlockSize() - storedBlock.getNumBytes());
if (diff > 0 && file.isUnderConstruction() &&
cursize < storedBlock.getNumBytes()) {
...
dir.updateSpaceConsumed(path, 0, -diff*file.getReplication());
{code}
We do the same in FSDirectory#replaceNode when completing the file, but at file granularity (I believe the intent here is to correct for cases where there's a failure replicating blocks and recovery). Since oldnode is under construction, INodeFile#diskspaceConsumed will use the preferred block size (vs. the Block#getNumBytes used by newnode), so we will again subtract out the difference between the full block size and the number of bytes actually stored:
{code}
long dsOld = oldnode.diskspaceConsumed();
...
//check if disk space needs to be updated.
long dsNew = 0;
if (updateDiskspace && (dsNew = newnode.diskspaceConsumed()) != dsOld) {
try {
updateSpaceConsumed(path, 0, dsNew-dsOld);
...
{code}
So in the above example we started with diskspace at 384mb (3 * 128mb) and then subtracted 375mb (to reflect that only 9mb raw was actually used) twice, so for each file the diskspace for the directory is -366mb (384mb minus 2 * 375mb). This is why the quota goes negative and yet we can still write more files.
So a directory with lots of single-block files ends up having a quota that's way off (if a file has multiple blocks, only the final partial block ends up subtracting from the diskspace used).
I think the fix is for FSDirectory#replaceNode not to let the diskspaceConsumed calculations differ when the old and new INode have the same blocks. I'll work on a patch which also adds a quota test for blocks that are not multiples of the block size and warns in INodeDirectory#computeContentSummary if the computed size does not reflect the cached value.
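The per-file arithmetic in the example (128mb blocks, 3mb files, replication 3) can be reproduced directly. The sketch below works in megabytes and uses hypothetical names; it models only the reserve-then-adjust bookkeeping, not the namenode code.

```java
// Reproduces the per-file arithmetic from the quota example, in megabytes.
class QuotaArithmetic {
    /** Cached diskspace for one single-block file after the adjustment is applied `adjustments` times. */
    static long cachedDiskspaceMb(long blockMb, long fileMb, int replication, int adjustments) {
        long reserved = blockMb * replication;           // conservative charge made in addBlock
        long diff = (blockMb - fileMb) * replication;    // over-reservation to give back
        return reserved - (long) adjustments * diff;
    }

    public static void main(String[] args) {
        System.out.println(cachedDiskspaceMb(128, 3, 3, 1));  // adjusted once: 9
        System.out.println(cachedDiskspaceMb(128, 3, 3, 2));  // adjusted twice (the bug): -366
    }
}
```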
- HDFS-1258.
Blocker bug reported by atm and fixed by atm (name-node)
Clearing namespace quota on "/" corrupts FS image
The HDFS root directory starts out with a default namespace quota of Integer.MAX_VALUE. If you clear this quota (using "hadoop dfsadmin -clrQuota /"), the fsimage gets corrupted immediately. Subsequent 2NN rolls will fail, and the NN will not come back up from a restart.
- HDFS-1189.
Major bug reported by xiaokang and fixed by johnvijoe (name-node)
Quota counts missed between clear quota and set quota
HDFS Quota counts will be missed between a clear quota operation and a set quota.
When setting quota for a dir, the INodeDirectory will be replaced by INodeDirectoryWithQuota and dir.isQuotaSet() becomes true. When INodeDirectoryWithQuota is newly created, quota counting will be performed. However, when clearing quota, the quota conf is set to -1 and dir.isQuotaSet() becomes false while INodeDirectoryWithQuota will NOT be replaced back to INodeDirectory.
FSDirectory.updateCount only updates the quota count for inodes for which isQuotaSet() is true. So after clearing the quota for a dir, its quota counts will not be updated, which is reasonable. But when the quota is set again for this dir, quota counting will not be performed and some counts will be missed.
- HADOOP-7475.
Blocker bug reported by eyang and fixed by eyang
hadoop-setup-single-node.sh is broken
When running hadoop-setup-single-node.sh, the system can not find the templates configuration directory:
{noformat}
cat: /usr/libexec/../templates/conf/core-site.xml: No such file or directory
cat: /usr/libexec/../templates/conf/hdfs-site.xml: No such file or directory
cat: /usr/libexec/../templates/conf/mapred-site.xml: No such file or directory
cat: /usr/libexec/../templates/conf/hadoop-env.sh: No such file or directory
chown: cannot access `hadoop-env.sh': No such file or directory
chmod: cannot access `hadoop-env.sh': No such file or directory
cp: cannot stat `*.xml': No such file or directory
cp: cannot stat `hadoop-env.sh': No such file or directory
{noformat}
- HADOOP-7398.
Major new feature reported by owen.omalley and fixed by owen.omalley
create a mechanism to suppress the HADOOP_HOME deprecated warning
Create a new mechanism to suppress the warning about HADOOP_HOME deprecation.
I'll create a HADOOP_HOME_WARN_SUPPRESS environment variable that suppresses the warning.
- HADOOP-7373.
Major bug reported by owen.omalley and fixed by owen.omalley
Tarball deployment doesn't work with {start,stop}-{dfs,mapred}
The hadoop-config.sh overrides the variable "bin", which makes the scripts use libexec for hadoop-daemon(s).
- HADOOP-7364.
Major bug reported by tgraves and fixed by tgraves (test)
TestMiniMRDFSCaching fails if test.build.dir is set to something other than build/test
TestMiniMRDFSCaching fails if test.build.dir is set to something other than build/test.
- HADOOP-7356.
Blocker bug reported by eyang and fixed by eyang
RPM packages broke bin/hadoop script for hadoop 0.20.205
hadoop-config.sh has been moved to libexec for the binary package, but developers prefer to have hadoop-config.sh in bin. Hadoop shell scripts should be modified to support both scenarios.
- HADOOP-7330.
Major bug reported by vicaya and fixed by vicaya (metrics)
The metrics source mbean implementation should return the attribute value instead of the object
The MetricsSourceAdapter#getAttribute in 0.20.203 is returning the attribute object instead of the value.
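The distinction is between returning a `javax.management.Attribute` wrapper and returning the value it holds. A minimal sketch, with hypothetical class and field names rather than the real MetricsSourceAdapter:

```java
import java.util.HashMap;
import java.util.Map;
import javax.management.Attribute;
import javax.management.AttributeNotFoundException;

// Sketch only: a tiny attribute cache that hands out values, not wrappers.
class AttributeCache {
    private final Map<String, Attribute> cache = new HashMap<>();

    void put(String name, Object value) {
        cache.put(name, new Attribute(name, value));
    }

    /** Returns the attribute's value, as DynamicMBean.getAttribute is expected to. */
    Object getAttribute(String name) throws AttributeNotFoundException {
        Attribute a = cache.get(name);
        if (a == null) {
            throw new AttributeNotFoundException("Metric " + name + " not found");
        }
        return a.getValue();   // returning `a` itself would hand JMX clients the wrapper object
    }
}
```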
- HADOOP-7324.
Blocker bug reported by vicaya and fixed by priyomustafi (metrics)
Ganglia plugins for metrics v2
Although all metrics in metrics v2 are exposed via the standard JMX mechanisms, most users are using Ganglia to collect metrics.
- HADOOP-7277.
Minor improvement reported by naisbitt and fixed by naisbitt (build)
Add Eclipse launch tasks for the 0.20-security branch
This is to add the eclipse launchers from HADOOP-5911 to the 0.20 security branch.
Eclipse has a notion of "run configuration", which encapsulates what's needed to run or debug an application. I use this quite a bit to start various Hadoop daemons in debug mode, with breakpoints set, to inspect state and what not.
This is simply configuration, so no tests are provided. After running "ant eclipse" and refreshing your project, you should see entries in the Run Configurations and Debug Configurations for launching the various hadoop daemons from within eclipse. There's a template for testing a specific test, and also templates to run all the tests, the job tracker, and a task tracker. It's likely that some parameters need to be further tweaked to have the same behavior as "ant test", but for most tests, this works.
This also requires a small change to build.xml for the eclipse classpath.
- HADOOP-7274.
Minor bug reported by jeagles and fixed by jeagles (util)
CLONE - IOUtils.readFully and IOUtils.skipFully have typo in exception creation's message
Same fix as for HADOOP-7057 for the Hadoop security branch
{noformat}
throw new IOException( "Premeture EOF from inputStream");
{noformat}
- HADOOP-7248.
Minor improvement reported by cos and fixed by tgraves (build)
Have a way to automatically update Eclipse .classpath file when new libs are added to the classpath through Ivy for 0.20-* based sources
Backport HADOOP-6407 into 0.20 based source trees
- HADOOP-7232.
Blocker bug reported by owen.omalley and fixed by owen.omalley (documentation)
Fix javadoc warnings
The javadoc is currently generating 31 warnings.
- HADOOP-7144.
Major new feature reported by vicaya and fixed by revans2
Expose JMX with something like JMXProxyServlet
Much of the Hadoop metrics and status info is available via JMX, especially since 0.20.100 and 0.22+ (HDFS-1318, HADOOP-6728 etc.). For operations staff not familiar with JMX setup, especially JMX with SSL and firewall tunnelling, the usage can be daunting. Using a JMXProxyServlet (a la Tomcat) to translate JMX attributes into JSON output would make a lot of non-Java admins happy.
We could probably use Tomcat's JMXProxyServlet code directly, if it already outputs some standard format (JSON or XML etc.). The code is simple enough to port over and can probably integrate with the common HttpServer as one of the default servlets (maybe /jmx) for the pluggable security.
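The core of such a servlet, reading MBean attributes and emitting JSON, can be sketched with the standard `javax.management` API. This is a minimal illustration with a hypothetical class name; it is not Tomcat's code or the servlet that was eventually added, it does no JSON escaping, and it leaves out the HTTP layer entirely.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Minimal sketch: reads named attributes of one MBean and formats them as a JSON object.
// No escaping is done, so it only suits simple string and numeric values.
class JmxToJson {
    static String toJson(String mbeanName, String... attributes) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName(mbeanName);
        StringBuilder json = new StringBuilder("{");
        for (int i = 0; i < attributes.length; i++) {
            Object value = server.getAttribute(name, attributes[i]);
            if (i > 0) {
                json.append(", ");
            }
            json.append('"').append(attributes[i]).append("\": ");
            if (value instanceof Number) {
                json.append(value);                           // numbers are emitted bare
            } else {
                json.append('"').append(value).append('"');   // everything else as a string
            }
        }
        return json.append("}").toString();
    }

    public static void main(String[] args) throws Exception {
        // The JVM's own Runtime MBean is used here because it is always registered.
        System.out.println(toJson("java.lang:type=Runtime", "VmName", "Uptime"));
    }
}
```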
- HADOOP-6255.
Major new feature reported by owen.omalley and fixed by eyang
Create an rpm integration project
Added RPM/DEB packages to build system.
Changes Since Hadoop 0.20.2
- HADOOP-7190. Add metrics v1 back for backwards compatibility. (omalley)
- MAPREDUCE-2360. Remove stripping of scheme, authority from submit dir in
support of viewfs. (cdouglas)
- MAPREDUCE-2359. Use correct file system to access distributed cache objects.
(Krishna Ramachandran)
- MAPREDUCE-2361. "Fix Distributed Cache is not adding files to class paths
correctly" - Drop the host/scheme/fragment from URI (cdouglas)
- MAPREDUCE-2362. Fix unit-test failures: TestBadRecords (NPE due to
rearranged MapTask code) and TestTaskTrackerMemoryManager
(need hostname in output-string pattern). (Greg Roelofs, Krishna
Ramachandran)
- HDFS-1729. Add statistics logging for better visibility into
startup time costs. (Matt Foley)
- MAPREDUCE-2363. When a queue is built without any access rights we
explain the problem. (Richard King)
- MAPREDUCE-1563. TaskDiagnosticInfo may be missed sometime. (Krishna
Ramachandran)
- MAPREDUCE-2364. Don't hold the rjob lock while localizing resources. (ddas
via omalley)
- HDFS-1598. Directory listing on hftp:// does not show
.*.crc files. (szetszwo)
- MAPREDUCE-2365. New counters for FileInputFormat (BYTES_READ) and
FileOutputFormat (BYTES_WRITTEN).
New counter MAP_OUTPUT_MATERIALIZED_BYTES for compressed MapOutputSize.
(Siddharth Seth)
- HADOOP-7040. Change DiskErrorException to IOException (boryas)
- HADOOP-7104. Remove unnecessary DNS reverse lookups from RPC layer
(kzhang)
- MAPREDUCE-2366. Fix a problem where the task browser UI can't retrieve the
stdxxx printouts of streaming jobs that abend in the unix code, in
the common case where the containing job doesn't reuse JVM's.
(Richard King)
- HADOOP-6977. Herriot daemon clients should vend statistics (cos)
- HADOOP-6971. Clover build doesn't generate per-test coverage (cos)
- HADOOP-6879. Provide SSH based (Jsch) remote execution API for system
tests. (cos)
- MAPREDUCE-2355. Add a configuration knob
mapreduce.tasktracker.outofband.heartbeat.damper that limits out of band
heartbeats (acmurthy)
- MAPREDUCE-2356. Fix a race-condition that corrupted a task's state on the
JobTracker. (Luke Lu)
- MAPREDUCE-2357. Always propagate IOExceptions that are thrown by
non-FileInputFormat. (Luke Lu)
- HADOOP-7163. RPC handles SocketTimeOutException during SASL negotiation.
(ddas)
- MAPREDUCE-2358. MapReduce assumes the default FileSystem is HDFS.
(Krishna Ramachandran)
- MAPREDUCE-1904. Reducing locking contention in TaskTracker's
MapOutputServlet LocalDirAllocator. (Rajesh Balamohan via acmurthy)
- HDFS-1626. Make BLOCK_INVALIDATE_LIMIT configurable. (szetszwo)
- HDFS-1584. Adds a check for whether relogin is needed to
getDelegationToken in HftpFileSystem. (Kan Zhang via ddas)
- HADOOP-7115. Reduces the number of calls to getpwuid_r and
getpwgid_r, by implementing a cache in NativeIO. (ddas)
- HADOOP-6882. An XSS security exploit in jetty-6.1.14. jetty upgraded to
6.1.26. (ddas)
- MAPREDUCE-2278. Fixes a memory leak in the TaskTracker. (cdouglas)
- HDFS-1353 redux. Modulate original 1353 to not bump RPC version.
(jhoman)
- MAPREDUCE-2082. Race condition in writing the jobtoken password file when
launching pipes jobs (jitendra and ddas)
- HADOOP-6978. Fixes task log servlet vulnerabilities via symlinks.
(Todd Lipcon and Devaraj Das)
- MAPREDUCE-2178. Write task initialization to avoid race
conditions leading to privilege escalation and resource leakage by
performing more actions as the user. (Owen O'Malley, Devaraj Das,
Chris Douglas via cdouglas)
- HDFS-1364. HFTP client should support relogin from keytab
- HADOOP-6907. Make the RPC client use per-proxy configuration.
(Kan Zhang via ddas)
- MAPREDUCE-2055. Fix JobTracker to decouple job retirement from copy of
job-history file to HDFS and enhance RetiredJobInfo to carry aggregated
job-counters to prevent a disk roundtrip on job-completion to fetch
counters for the JobClient. (Krishna Ramachandran via acmurthy)
- HDFS-1353. Remove most of getBlockLocation optimization (jghoman)
- MAPREDUCE-2023. TestDFSIO read test may not read specified bytes. (htang)
- HDFS-1340. A null delegation token is appended to the url if security is
disabled when browsing the filesystem. (boryas)
- HDFS-1352. Fix jsvc.location. (jghoman)
- HADOOP-6860. 'compile-fault-inject' should never be called directly. (cos)
- MAPREDUCE-2005. TestDelegationTokenRenewal fails (boryas)
- MAPREDUCE-2000. Rumen is not able to extract counters for Job history logs
from Hadoop 0.20. (htang)
- MAPREDUCE-1961. ConcurrentModificationException when shutting down Gridmix.
(htang)
- HADOOP-6899. RawLocalFileSystem set working directory does
not work for relative names. (suresh)
- HDFS-495. New clients should be able to take over a file's lease if the old
client died. (shv)
- HADOOP-6728. Re-design and overhaul of the Metrics framework. (Luke Lu via
acmurthy)
- MAPREDUCE-1966. Change blacklisting of tasktrackers on task failures to be
a simple graylist to fingerpoint bad tasktrackers. (Greg Roelofs via
acmurthy)
- HADOOP-6864. Add ability to get netgroups (as returned by getent
netgroup command) using native code (JNI) instead of forking. (Erik Steffl)
- HDFS-1318. HDFS Namenode and Datanode WebUI information needs to be
accessible programmatically for scripts. (Tanping Wang via suresh)
- HDFS-1315. Add fsck event to audit log and remove other audit log events
corresponding to FSCK listStatus and open calls. (suresh)
- MAPREDUCE-1941. Provides access to JobHistory file (raw) with job user/acl
permission. (Srikanth Sundarrajan via ddas)
- MAPREDUCE-291. Optionally a separate daemon should serve JobHistory.
(Srikanth Sundarrajan via ddas)
- MAPREDUCE-1936. Make Gridmix3 more customizable (sync changes from trunk).
(htang)
- HADOOP-5981. Fix variable substitution during parsing of child environment
variables. (Krishna Ramachandran via acmurthy)
- MAPREDUCE-339. Greedily schedule failed tasks to cause early job failure.
(cdouglas)
- MAPREDUCE-1872. Hardened CapacityScheduler to have comprehensive, coherent
limits on tasks/jobs for jobs/users/queues. Also, added the ability to
refresh queue definitions without the need to restart the JobTracker.
(acmurthy)
- HDFS-1161. Make DN minimum valid volumes configurable. (shv)
- HDFS-457. Reintroduce volume failure tolerance for DataNodes. (shv)
- HDFS-1307. Add start time, end time and total time taken for FSCK
to FSCK report. (suresh)
- MAPREDUCE-1207. Sanitize user environment of map/reduce tasks and allow
admins to set environment and java options. (Krishna Ramachandran via
acmurthy)
- HDFS-1298. Add support in HDFS for new statistics added in FileSystem
to track the file system operations (suresh)
- HDFS-1301. TestHDFSProxy needs to use server side conf for ProxyUser
stuff. (boryas)
- HADOOP-6859. Introduce additional statistics to FileSystem to track
file system operations (suresh)
- HADOOP-6818. Provides a JNI implementation of Unix Group resolution. The
config hadoop.security.group.mapping should be set to
org.apache.hadoop.security.JniBasedUnixGroupsMapping to enable this
implementation. (ddas)
- MAPREDUCE-1938. Introduces a configuration for putting user classes before
the system classes during job submission and in task launches. Two things
need to be done in order to use this feature -
(1) mapreduce.user.classpath.first : this should be set to true in the
jobconf, and, (2) HADOOP_USER_CLASSPATH_FIRST : this is relevant for job
submissions done using bin/hadoop shell script. HADOOP_USER_CLASSPATH_FIRST
should be defined in the environment with some non-empty value
(like "true"), and then bin/hadoop should be executed. (ddas)
- HADOOP-6669. Respect compression configuration when creating DefaultCodec
compressors. (Koji Noguchi via cdouglas)
- HADOOP-6855. Add support for netgroups, as returned by command
getent netgroup. (Erik Steffl)
- HDFS-599. Allow NameNode to have a separate port for service requests from
client requests. (Dmytro Molkov via hairong)
- HDFS-132. Fix namenode to not report files deleted metrics for deletions
done while replaying edits during startup. (shv)
- MAPREDUCE-1521. Protection against incorrectly configured reduces
(mahadev)
- MAPREDUCE-1936. Make Gridmix3 more customizable. (htang)
- MAPREDUCE-517. Enhance the CapacityScheduler to assign multiple tasks
per-heartbeat. (acmurthy)
- MAPREDUCE-323. Re-factor layout of JobHistory files on HDFS to improve
operability. (Dick King via acmurthy)
- MAPREDUCE-1921. Ensure exceptions during reading of input data in map
tasks are augmented by information about actual input file which caused
the exception. (Krishna Ramachandran via acmurthy)
- MAPREDUCE-1118. Enhance the JobTracker web-ui to ensure tabular columns
are sortable, also added a /scheduler servlet to CapacityScheduler for
enhanced UI for queue information. (Krishna Ramachandran via acmurthy)
- HADOOP-5913. Add support for starting/stopping queues. (cdouglas)
- HADOOP-6835. Add decode support for concatenated gzip files. (Greg Roelofs)
- HDFS-1158. Revert HDFS-457. (shv)
- MAPREDUCE-1699. Ensure JobHistory isn't disabled for any reason. (Krishna
Ramachandran via acmurthy)
- MAPREDUCE-1682. Fix speculative execution to ensure tasks are not
scheduled after job failure. (acmurthy)
- MAPREDUCE-1914. Ensure unique sub-directories for artifacts in the
DistributedCache are cleaned up. (Dick King via acmurthy)
- HADOOP-6713. Multiple RPC Reader Threads (Bharathm)
- HDFS-1250. Namenode should reject block reports and block received
requests from dead datanodes (suresh)
- MAPREDUCE-1863. [Rumen] Null failedMapAttemptCDFs in job traces generated
by Rumen. (htang)
- MAPREDUCE-1309. Rumen refactory. (htang)
- HDFS-1114. Implement LightWeightGSet for BlocksMap in order to reduce
NameNode memory footprint. (szetszwo)
- MAPREDUCE-572. Fixes DistributedCache.checkURIs to throw error if link is
missing for uri in cache archives. (amareshwari)
- MAPREDUCE-787. Fix JobSubmitter to honor user given symlink in the path.
(amareshwari)
- HADOOP-6815. refreshSuperUserGroupsConfiguration should use
server side configuration for the refresh. (boryas)
- MAPREDUCE-1868. Add a read and connection timeout to JobClient while
pulling tasklogs. (Krishna Ramachandran via acmurthy)
- HDFS-1119. Introduce a GSet interface to BlocksMap. (szetszwo)
- MAPREDUCE-1778. Ensure failure to setup CompletedJobStatusStore is not
silently ignored by the JobTracker. (Krishna Ramachandran via acmurthy)
- MAPREDUCE-1538. Add a limit on the number of artifacts in the
DistributedCache to ensure we cleanup aggressively. (Dick King via
acmurthy)
- MAPREDUCE-1850. Add information about the host from which a job is
submitted. (Krishna Ramachandran via acmurthy)
- HDFS-1110. Reuses objects for commonly used file names in namenode to
reduce the heap usage. (suresh)
- HADOOP-6810. Extract a subset of tests for smoke (DOA) validation. (cos)
- HADOOP-6642. Remove debug stmt left from original patch. (cdouglas)
- HADOOP-6808. Add comments on how to setup File/Ganglia Context for
kerberos metrics (Erik Steffl)
- HDFS-1061. INodeFile memory optimization. (bharathm)
- HDFS-1109. HFTP supports filenames that contain the character "+".
(Dmytro Molkov via dhruba, backported by szetszwo)
- HDFS-1085. Check file length and bytes read when reading a file through
hftp in order to detect failure. (szetszwo)
- HDFS-1311. Running tests with 'testcase' causes triple execution of the
same test case (cos)
- HDFS-1150.FIX. Verify datanodes' identities to clients in secure clusters.
Update to patch to improve handling of jsvc source in build.xml (jghoman)
- HADOOP-6752. Remote cluster control functionality needs JavaDocs
improvement. (Balaji Rajagopalan via cos)
- MAPREDUCE-1288. Fixes TrackerDistributedCacheManager to take into account
the owner of the localized file in the mapping from cache URIs to
CacheStatus objects. (ddas)
- MAPREDUCE-1900. Fixes an FS leak that I missed in the earlier patch.
(ddas)
- MAPREDUCE-1900. Makes JobTracker/TaskTracker close filesystems, created
on behalf of users, when they are no longer needed. (ddas)
- HADOOP-6832. Add a static user plugin for web auth for external users.
(omalley)
- HDFS-1007. Fixes a bug in SecurityUtil.buildDTServiceName to do
with handling of null hostname. (omalley)
- HDFS-1007. makes long running servers using hftp work. Also has some
refactoring in the MR code to do with handling of delegation tokens.
(omalley & ddas)
- HDFS-1178. The NameNode servlets should not use RPC to connect to the
NameNode. (omalley)
- MAPREDUCE-1807. Re-factor TestQueueManager. (Richard King via acmurthy)
- HDFS-1150. Fixes the earlier patch to do logging in the right directory
and also adds facility for monitoring processes (via -Dprocname in the
command line). (Jakob Homan via ddas)
- HADOOP-6781. security audit log shouldn't have exception in it. (boryas)
- HADOOP-6776. Fixes the javadoc in UGI.createProxyUser. (ddas)
- HDFS-1150. building jsvc from source tar. source tar is also checked in.
(jitendra)
- HDFS-1150. Bugfix in the hadoop shell script. (ddas)
- HDFS-1153. The navigation to /dfsnodelist.jsp with invalid input
parameters produces NPE and HTTP 500 error (rphulari)
- MAPREDUCE-1664. Bugfix to enable queue administrators of a queue to
view job details of jobs submitted to that queue even though they
are not part of acl-view-job.
- HDFS-1150. Bugfix to add more knobs to secure datanode starter.
- HDFS-1157. Modifications introduced by HDFS-1150 are breaking aspect's
bindings (cos)
- HDFS-1130. Adds a configuration dfs.cluster.administrators for
controlling access to the default servlets in hdfs. (ddas)
- HADOOP-6706.FIX. Relogin behavior for RPC clients could be improved
(boryas)
- HDFS-1150. Verify datanodes' identities to clients in secure clusters.
(jghoman)
- MAPREDUCE-1442. Fixed regex in job-history related to parsing Counter
values. (Luke Lu via acmurthy)
- HADOOP-6760. WebServer shouldn't increase port number in case of negative
port setting caused by Jetty's race. (cos)
- HDFS-1146. Javadoc for getDelegationTokenSecretManager in FSNamesystem.
(jitendra)
- HADOOP-6706. Fix on top of the earlier patch. Closes the connection
on a SASL connection failure, and retries again with a new
connection. (ddas)
- MAPREDUCE-1716. Fix on top of earlier patch for logs truncation a.k.a
MAPREDUCE-1100. Addresses log truncation issues when binary data is
written to log files and adds a header to a truncated log file to
inform users of the truncation.
- HDFS-1383. Improve the error messages when using hftp://.
- MAPREDUCE-1744. Fixed DistributedCache apis to take a user-supplied
FileSystem to allow for better proxy behaviour for Oozie. (Richard King)
- MAPREDUCE-1733. Authentication between pipes processes and java
counterparts. (jitendra)
- MAPREDUCE-1664. Bugfix on top of the previous patch. (ddas)
- HDFS-1136. FileChecksumServlets.RedirectServlet doesn't carry forward
the delegation token (boryas)
- HADOOP-6756. Change value of FS_DEFAULT_NAME_KEY from fs.defaultFS
to fs.default.name which is a correct name for 0.20 (steffl)
- HADOOP-6756. Document (javadoc comments) and cleanup configuration
keys in CommonConfigurationKeys.java (steffl)
- MAPREDUCE-1759. Exception message for unauthorized user doing killJob,
killTask, setJobPriority needs to be improved. (gravi via vinodkv)
- HADOOP-6715. AccessControlList.toString() returns empty string when
we set acl to "*". (gravi via vinodkv)
- HADOOP-6757. NullPointerException for hadoop clients launched from
streaming tasks. (amarrk via vinodkv)
- HADOOP-6631. FileUtil.fullyDelete() should continue to delete other files
despite failure at any level. (vinodkv)
- MAPREDUCE-1317. NPE in setHostName in Rumen. (rksingh)
- MAPREDUCE-1754. Replace mapred.permissions.supergroup with an acl:
mapreduce.cluster.administrators and HADOOP-6748: Remove
hadoop.cluster.administrators. Contributed by Amareshwari Sriramadasu.
- HADOOP-6701. Incorrect exit codes for "dfs -chown", "dfs -chgrp"
(rphulari)
- HADOOP-6640. FileSystem.get() does RPC retries within a static
synchronized block. (hairong)
- HDFS-1006. Removes unnecessary logins from the previous patch. (ddas)
- HADOOP-6745. adding some java doc to Server.RpcMetrics, UGI (boryas)
- MAPREDUCE-1707. TaskRunner can get NPE in getting ugi from TaskTracker.
(vinodkv)
- HDFS-1104. Fsck triggers full GC on NameNode. (hairong)
- HADOOP-6332. Large-scale Automated Test Framework (sharad, Sreekanth
Ramakrishnan, et al. via cos)
- HADOOP-6526. Additional fix for test context on top of existing one. (cos)
- HADOOP-6710. Symbolic umask for file creation is not conformant with posix.
(suresh)
- HADOOP-6693. Added metrics to track kerberos login success and failure.
(suresh)
- MAPREDUCE-1711. Gridmix should provide an option to submit jobs to the same
queues as specified in the trace. (rksingh via htang)
- MAPREDUCE-1687. Stress submission policy does not always stress the
cluster. (htang)
- MAPREDUCE-1641. Bug-fix to ensure command line options such as
-files/-archives are checked for duplicate artifacts in the
DistributedCache. (Amareshwari Sreeramadasu via acmurthy)
- MAPREDUCE-1641. Fix DistributedCache to ensure same files cannot be put in
both the archives and files sections. (Richard King via acmurthy)
- HADOOP-6670. Fixes a testcase issue introduced by the earlier commit
of the HADOOP-6670 patch. (ddas)
- MAPREDUCE-1718. Fixes a problem to do with correctly constructing
service name for the delegation token lookup in HftpFileSystem
(borya via ddas)
- HADOOP-6674. Fixes the earlier patch to handle pings correctly (ddas).
- MAPREDUCE-1664. Job Acls affect when Queue Acls are set.
(Ravi Gummadi via vinodkv)
- HADOOP-6718. Fixes a problem to do with clients not closing RPC
connections on a SASL failure. (ddas)
- MAPREDUCE-1397. NullPointerException observed during task failures.
(Amareshwari Sriramadasu via vinodkv)
- HADOOP-6670. Use the UserGroupInformation's Subject as the criteria for
equals and hashCode. (omalley)
- HADOOP-6716. System won't start in non-secure mode when kerb5.conf
(edu.mit.kerberos on Mac) is not present. (boryas)
- MAPREDUCE-1607. Task controller may not set permissions for a
task cleanup attempt's log directory. (Amareshwari Sreeramadasu via
vinodkv)
- MAPREDUCE-1533. JobTracker performance enhancements. (Amar Kamat via
vinodkv)
- MAPREDUCE-1701. AccessControlException while renewing a delegation token
is not correctly handled in the JobTracker. (boryas)
- HDFS-481. Incremental patch to fix broken unit test in contrib/hdfsproxy
- HADOOP-6706. Fixes a bug in the earlier version of the same patch (ddas)
- HDFS-1096. allow dfsadmin/mradmin refresh of superuser proxy group
mappings. (boryas)
- HDFS-1012. Support for cluster specific path entries in ldap for hdfsproxy
(Srikanth Sundarrajan via Nicholas)
- HDFS-1011. Improve Logging in HDFSProxy to include cluster name associated
with the request (Srikanth Sundarrajan via Nicholas)
- HDFS-1010. Retrieve group information from UnixUserGroupInformation
instead of LdapEntry (Srikanth Sundarrajan via Nicholas)
- HDFS-481. Bug fix - hdfsproxy: Stack overflow + Race conditions
(Srikanth Sundarrajan via Nicholas)
- MAPREDUCE-1657. After task logs directory is deleted, tasklog servlet
displays wrong error message about job ACLs. (Ravi Gummadi via vinodkv)
- MAPREDUCE-1692. Remove TestStreamedMerge from the streaming tests.
(Amareshwari Sriramadasu and Sreekanth Ramakrishnan via vinodkv)
- HDFS-1081. Performance regression in
DistributedFileSystem::getFileBlockLocations in secure systems (jhoman)
- MAPREDUCE-1656. JobStory should provide queue info. (htang)
- MAPREDUCE-1317. Reducing memory consumption of rumen objects. (htang)
- MAPREDUCE-1317. Reverting the patch since it caused build failures. (htang)
- MAPREDUCE-1683. Fixed jobtracker web-ui to correctly display heap-usage.
(acmurthy)
- HADOOP-6706. Fixes exception handling for saslConnect. The ideal
solution is to use the Refreshable interface but as Owen noted in
HADOOP-6656, it doesn't seem to work as expected. (ddas)
- MAPREDUCE-1617. TestBadRecords failed once in our test runs. (Amar
Kamat via vinodkv).
- MAPREDUCE-587. Stream test TestStreamingExitStatus fails with Out of
Memory. (Amar Kamat via vinodkv).
- HDFS-1096. Reverting the patch since it caused build failures. (ddas)
- MAPREDUCE-1317. Reducing memory consumption of rumen objects. (htang)
- MAPREDUCE-1680. Add a metric to track number of heartbeats processed by the
JobTracker. (Richard King via acmurthy)
- MAPREDUCE-1683. Removes JNI calls to get jvm current/max heap usage in
ClusterStatus by default. (acmurthy)
- HADOOP-6687. user object in the subject in UGI should be reused in case
of a relogin. (jitendra)
- HADOOP-5647. TestJobHistory fails if /tmp/_logs is not writable to.
Testcase should not depend on /tmp. (Ravi Gummadi via vinodkv)
- MAPREDUCE-181. Bug fix for Secure job submission. (Ravi Gummadi via
vinodkv)
- MAPREDUCE-1635. ResourceEstimator does not work after MAPREDUCE-842.
(Amareshwari Sriramadasu via vinodkv)
- MAPREDUCE-1526. Cache the job related information while submitting the
job. (rksingh)
- HADOOP-6674. Turn off SASL checksums for RPCs. (jitendra via omalley)
- HADOOP-5958. Replace fork of DF with library call. (cdouglas via omalley)
- HDFS-999. Secondary namenode should login using kerberos if security
is configured. Bugfix to original patch. (jhoman)
- MAPREDUCE-1594. Support for SleepJobs in Gridmix (rksingh)
- HDFS-1007. Fix. ServiceName for delegation token for Hftp has hftp
port and not RPC port.
- MAPREDUCE-1376. Support for varied user submissions in Gridmix (rksingh)
- HDFS-1080. SecondaryNameNode image transfer should use the defined
http address rather than local ip address (jhoman)
- HADOOP-6661. User document for UserGroupInformation.doAs for secure
impersonation. (jitendra)
- MAPREDUCE-1624. Documents the job credentials and associated details
to do with delegation tokens (ddas)
- HDFS-1036. Documentation for fetchdt for forrest (boryas)
- HDFS-1039. New patch on top of previous patch. Gets namenode address
from conf. (jitendra)
- HADOOP-6656. Renew Kerberos TGT when 80% of the renew lifetime has been
used up. (omalley)
- HADOOP-6653. Protect against NPE in setupSaslConnection when real user is
null. (omalley)
- HADOOP-6649. An error in the previous committed patch. (jitendra)
- HADOOP-6652. ShellBasedUnixGroupsMapping shouldn't have a cache.
(ddas)
- HADOOP-6649. login object in UGI should be inside the subject
(jitendra)
- HADOOP-6637. Benchmark overhead of RPC session establishment
(shv via jitendra)
- HADOOP-6648. Credentials must ignore null tokens that can be generated
when using HFTP to talk to insecure clusters. (omalley)
- HADOOP-6632. Fix on JobTracker to reuse filesystem handles if possible.
(ddas)
- HADOOP-6647. balancer fails with "is not authorized for protocol
interface NamenodeProtocol" in secure environment (boryas)
- MAPREDUCE-1612. job conf file is not accessible from job history
web page. (Ravi Gummadi via vinodkv)
- MAPREDUCE-1611. Refresh nodes and refresh queues do not work with
service authorization enabled. (Amar Kamat via vinodkv)
- HADOOP-6644. util.Shell getGROUPS_FOR_USER_COMMAND method
name - should use common naming convention (boryas)
- MAPREDUCE-1609. Fixes a problem with localization of job log
directories when tasktracker is re-initialized that can result
in failed tasks. (Amareshwari Sriramadasu via yhemanth)
- MAPREDUCE-1610. Update forrest documentation for directory
structure of localized files. (Ravi Gummadi via yhemanth)
- MAPREDUCE-1532. Fixes a javadoc and an exception message in JobInProgress
when the authenticated user is different from the user in conf. (ddas)
- MAPREDUCE-1417. Update forrest documentation for private
and public distributed cache files. (Ravi Gummadi via yhemanth)
- HADOOP-6634. AccessControlList uses full-principal names to verify acls
causing queue-acls to fail (vinodkv)
- HADOOP-6642. Fix javac, javadoc, findbugs warnings. (chrisdo via acmurthy)
- HDFS-1044. Cannot submit mapreduce job from secure client to
unsecure server. (boryas)
- HADOOP-6638. Try to relogin in case of a failed RPC connection
(expired tgt) only in case the subject is loginUser or
proxyUgi.realUser. (boryas)
- HADOOP-6632. Support for using different Kerberos keys for different
instances of Hadoop services. (jitendra)
- HADOOP-6526. Need mapping from long principal names to local OS
user names. (jitendra)
- MAPREDUCE-1604. Update Forrest documentation for job authorization
ACLs. (Amareshwari Sriramadasu via yhemanth)
- HDFS-1045. In secure clusters, re-login is necessary for https
clients before opening connections (jhoman)
- HADOOP-6603. Addition to original patch to be explicit
about new method not being for general use. (jhoman)
- MAPREDUCE-1543. Add audit log messages for job and queue
access control checks. (Amar Kamat via yhemanth)
- MAPREDUCE-1606. Fixed occasional timeout in TestJobACL. (Ravi Gummadi via
acmurthy)
- HADOOP-6633. normalize property names for JT/NN kerberos principal
names in configuration. (boryas)
- HADOOP-6613. Changes the RPC server so that version is checked first
on an incoming connection. (Kan Zhang via ddas)
- HADOOP-5592. Fix typo in Streaming doc in reference to GzipCodec.
(Corinne Chandel via tomwhite)
- MAPREDUCE-813. Updates Streaming and M/R tutorial documents.
(Corinne Chandel via ddas)
- MAPREDUCE-927. Cleanup of task-logs should happen in TaskTracker instead
of the Child. (Amareshwari Sriramadasu via vinodkv)
- HDFS-1039. Service should be set in the token in JspHelper.getUGI.
(jitendra)
- MAPREDUCE-1599. MRBench reuses jobConf and credentials therein.
(jitendra)
- MAPREDUCE-1522. FileInputFormat may use the default FileSystem for the
input path. (Tsz Wo (Nicholas), SZE via cdouglas)
- HDFS-1036. In DelegationTokenFetch pass Configuration object so
getDefaultUri will work correctly.
- HDFS-1038. In nn_browsedfscontent.jsp fetch delegation token only if
security is enabled. (jitendra)
- HDFS-1036. in DelegationTokenFetch dfs.getURI returns no port (boryas)
- HADOOP-6598. Verbose logging from the Group class (one more case)
(boryas)
- HADOOP-6627. "Bad Connection to FS" message in FSShell should print
message from the exception (boryas)
- HDFS-1033. In secure clusters, NN and SNN should verify the remote
principal during image and edits transfer (jhoman)
- HDFS-1005. Fixes a bug to do with calling the cross-realm API in Fsck
client. (ddas)
- MAPREDUCE-1422. Fix cleanup of localized job directory to work if files
with non-deletable permissions are created within it.
(Amar Kamat via yhemanth)
- HDFS-1007. Fixes bugs to do with 20S cluster talking to 20 over
hftp (borya)
- MAPREDUCE-1566. Fixes bugs in the earlier patch. (ddas)
- HDFS-992. A bug in backport for HDFS-992. (jitendra)
- HADOOP-6598. Remove verbose logging from the Groups class. (borya)
- HADOOP-6620. NPE if renewer is passed as null in getDelegationToken.
(jitendra)
- HDFS-1023. Second Update to original patch to fix username (jhoman)
- MAPREDUCE-1435. Add test cases to already committed patch for this
jira, synchronizing changes with trunk. (yhemanth)
- HADOOP-6612. Protocols RefreshUserToGroupMappingsProtocol and
RefreshAuthorizationPolicyProtocol authorization settings thru
KerberosInfo (boryas)
- MAPREDUCE-1566. Bugfix for tests on top of the earlier patch. (ddas)
- MAPREDUCE-1566. Mechanism to import tokens and secrets from a file in to
the submitted job. (omalley)
- HADOOP-6603. Provide workaround for issue with Kerberos not
resolving cross-realm principal. (kan via jhoman)
- HDFS-1023. Update to original patch to fix username (jhoman)
- HDFS-814. Add an api to get the visible length of a
DFSDataInputStream. (hairong)
- HDFS-1023. Allow http server to start as regular user if https
principal is not defined. (jhoman)
- HDFS-1022. Merge all three test specs files (common, hdfs, mapred)
into one. (steffl)
- HDFS-101. DFS write pipeline: DFSClient sometimes does not detect
second datanode failure. (hairong)
- HDFS-1015. Intermittent failure in TestSecurityTokenEditLog. (jitendra)
- MAPREDUCE-1550. A bugfix on top of what was committed earlier (ddas).
- MAPREDUCE-1155. DISABLING THE TestStreamingExitStatus temporarily. (ddas)
- HDFS-1020. Changes the check for renewer from short name to long name
in the cancel/renew delegation token methods. (jitendra via ddas)
- HDFS-1019. Fixes values of delegation token parameters in
hdfs-default.xml. (jitendra via ddas)
- MAPREDUCE-1430. Fixes a backport issue with the earlier patch. (ddas)
- MAPREDUCE-1559. Fixes a problem in DelegationTokenRenewal class to
do with using the right credentials when talking to the NameNode. (ddas)
- MAPREDUCE-1550. Fixes a problem to do with creating a filesystem using
the user's UGI in the JobHistory browsing. (ddas)
- HADOOP-6609. Fix UTF8 to use a thread local DataOutputBuffer instead of
a static that was causing a deadlock in RPC. (omalley)
- HADOOP-6584. Fix javadoc warnings introduced by original HADOOP-6584
patch (jhoman)
- HDFS-1017. browsedfs jsp should call JspHelper.getUGI rather than using
createRemoteUser(). (jhoman)
- MAPREDUCE-899. Modified LinuxTaskController to check that task-controller
has right permissions and ownership before performing any actions.
(Amareshwari Sriramadasu via yhemanth)
- HDFS-204. Revive number of files listed metrics. (hairong)
- HADOOP-6569. FsShell#cat should avoid calling unnecessary getFileStatus
before opening a file to read. (hairong)
- HDFS-1014. Error in reading delegation tokens from edit logs. (jitendra)
- HDFS-458. Add under-10-min tests from 0.22 to 0.20.1xx, only the tests
that already exist in 0.20.1xx (steffl)
- MAPREDUCE-1155. Just pulls out the TestStreamingExitStatus part of the
patch from jira (that went to 0.22). (ddas)
- HADOOP-6600. Fix for branch backport only. Comparing of user should use
equals. (boryas).
- HDFS-1006. Fixes NameNode and SecondaryNameNode to use kerberizedSSL for
the http communication. (Jakob Homan via ddas)
- HDFS-1007. Fixes a bug on top of the earlier patch. (ddas)
- HDFS-1005. Fsck security. Makes it work over kerberized SSL (boryas and
jhoman)
- HDFS-1007. Makes HFTP and Distcp use kerberized SSL. (ddas)
- MAPREDUCE-1455. Fixes a testcase in the earlier patch.
(Ravi Gummadi via ddas)
- HDFS-992. Refactors block access token implementation to conform to the
generic Token interface. (Kan Zhang via ddas)
- HADOOP-6584. Adds KrbSSL connector for jetty. (Jakob Homan via ddas)
- HADOOP-6589. Add a framework for better error messages when rpc connections
fail to authenticate. (Kan Zhang via omalley)
- HADOOP-6600, HDFS-1003, MAPREDUCE-1539. Mechanism for authorization check
for inter-server protocols. (boryas)
- HADOOP-6580, HDFS-993, MAPREDUCE-1516. UGI should contain authentication
method.
- Namenode and JT should issue a delegation token only for kerberos
authenticated clients. (jitendra)
- HDFS-984, HADOOP-6573, MAPREDUCE-1537. Delegation Tokens should be persisted
in Namenode, and corresponding changes in common and mr. (jitendra)
- HDFS-994. Provide methods for obtaining delegation token from Namenode for
hftp and other uses. Incorporates HADOOP-6594: Update hdfs script to
provide fetchdt tool. (jitendra)
- HADOOP-6586. Log authentication and authorization failures and successes
(boryas)
- HDFS-991. Allow use of delegation tokens to authenticate to the
HDFS servlets. (omalley)
- HADOOP-1849. Add undocumented configuration parameter for per handler
call queue size in IPC Server. (shv)
- HADOOP-6599. Split existing RpcMetrics with summary in RpcMetrics and
details information in RpcDetailedMetrics. (suresh)
- HDFS-985. HDFS should issue multiple RPCs for listing a large directory.
(hairong)
- HDFS-1000. Updates libhdfs to use the new UGI. (ddas)
- MAPREDUCE-1532. Ensures all filesystem operations at the client is done
as the job submitter. Also, changes the renewal to maintain list of tokens
to renew. (ddas)
- HADOOP-6596. Add a version field to the serialization of the
AbstractDelegationTokenIdentifier. (omalley)
- HADOOP-5561. Add javadoc.maxmemory to build.xml to allow larger memory.
(jkhoman via omalley)
- HADOOP-6579. Add a mechanism for encoding and decoding Tokens in to
url-safe strings. (omalley)
- MAPREDUCE-1354. Make incremental changes in jobtracker for
improving scalability (acmurthy)
- HDFS-999. Secondary namenode should login using kerberos if security
is configured. (boryas)
- MAPREDUCE-1466. Added a private configuration variable
mapreduce.input.num.files, to store number of input files
being processed by M/R job. (Arun Murthy via yhemanth)
- MAPREDUCE-1403. Save file-sizes of each of the artifacts in
DistributedCache in the JobConf (Arun Murthy via yhemanth)
- HADOOP-6543. Fixes a compilation problem in the original commit. (ddas)
- MAPREDUCE-1520. Moves a call to setWorkingDirectory in Child to within
a doAs block. (Amareshwari Sriramadasu via ddas)
- HADOOP-6543. Allows secure clients to talk to unsecure clusters.
(Kan Zhang via ddas)
- MAPREDUCE-1505. Delays construction of the job client until it is really
required. (Arun C Murthy via ddas)
- HADOOP-6549. TestDoAsEffectiveUser should use ip address of the host
for superuser ip check. (jitendra)
- HDFS-464. Fix memory leaks in libhdfs. (Christian Kunz via suresh)
- HDFS-946. NameNode should not return full path name when listing a
directory or getting the status of a file. (hairong)
- MAPREDUCE-1398. Fix TaskLauncher to stop waiting for slots on a TIP
that is killed / failed. (Amareshwari Sriramadasu via yhemanth)
- MAPREDUCE-1476. Fix the M/R framework to not call commit for special
tasks like job setup/cleanup and task cleanup.
(Amareshwari Sriramadasu via yhemanth)
- HADOOP-6467. Performance improvement for liststatus on directories in
hadoop archives. (mahadev)
- HADOOP-6558. archive does not work with distcp -update. (nicholas via
mahadev)
- HADOOP-6583. Captures authentication and authorization metrics. (ddas)
- MAPREDUCE-1316. Fixes a memory leak of TaskInProgress instances in
the jobtracker. (Amar Kamat via yhemanth)
- MAPREDUCE-670. Creates ant target for 10 mins patch test build.
(Jothi Padmanabhan via gkesavan)
- MAPREDUCE-1430. JobTracker should be able to renew delegation tokens
for the jobs. (boryas)
- HADOOP-6551, HDFS-986, MAPREDUCE-1503. Change API for tokens to throw
exceptions instead of returning booleans. (omalley)
- HADOOP-6545. Changes the Key for the FileSystem to be UGI. (ddas)
- HADOOP-6572. Makes sure that SASL encryption and push to responder queue
for the RPC response happens atomically. (Kan Zhang via ddas)
- HDFS-965. Split the HDFS TestDelegationToken into two tests, one for
proxy users and the other for normal users. (jitendra via omalley)
- HADOOP-6560. HarFileSystem throws NPE for har://hdfs-/foo (nicholas via
mahadev)
- MAPREDUCE-686. Move TestSpeculativeExecution.Fake* into a separate class
so that it can be used by other tests. (Jothi Padmanabhan via sharad)
- MAPREDUCE-181. Fixes an issue in the use of the right config. (ddas)
- MAPREDUCE-1026. Fixes a bug in the backport. (ddas)
- HADOOP-6559. Makes the RPC client automatically re-login when the SASL
connection setup fails. This is applicable to only keytab based logins.
(ddas)
- HADOOP-2141. Backport changes made in the original JIRA to aid
fast unit tests in Map/Reduce. (Amar Kamat via yhemanth)
- HADOOP-6382. Import the mavenizable pom file structure and adjust
the build targets and bin scripts. (gkesavan via ltucker)
- MAPREDUCE-1425. archive throws OutOfMemoryError (mahadev)
- MAPREDUCE-1399. The archive command shows a null error message. (nicholas)
- HADOOP-6552. Puts renewTGT=true and useTicketCache=true for the keytab
kerberos options. (ddas)
- MAPREDUCE-1433. Adds delegation token for MapReduce (ddas)
- HADOOP-4359. Fixes a bug in the earlier backport. (ddas)
- HADOOP-6547, HDFS-949, MAPREDUCE-1470. Move Delegation token into Common
so that we can use it for MapReduce also. It is a combined patch for
common, hdfs and mr. (jitendra)
- HADOOP-6510, HDFS-935, MAPREDUCE-1464. Support for doAs to allow
authenticated superuser to impersonate proxy users. It is a combined
patch with compatible fixes in HDFS and MR. (jitendra)
- MAPREDUCE-1435. Fixes the way symlinks are handled when cleaning up
work directory files. (Ravi Gummadi via yhemanth)
- MAPREDUCE-6419. Fixes a bug in the backported patch. (ddas)
- MAPREDUCE-1457. Fixes JobTracker to get the FileSystem object within
getStagingAreaDir within a privileged block. Fixes Child.java to use the
appropriate UGIs while getting the TaskUmbilicalProtocol proxy and while
executing the task. Contributed by Jakob Homan. (ddas)
- MAPREDUCE-1440. Replace the long user name in MapReduce with the local
name. (ddas)
- HADOOP-6419. Adds SASL based authentication to RPC. Also includes the
MAPREDUCE-1335 and HDFS-933 patches. Contributed by Kan Zhang.
(ddas)
- HADOOP-6538. Sets hadoop.security.authentication to simple by default.
(ddas)
- HDFS-938. Replace calls to UGI.getUserName() with
UGI.getShortUserName(). (boryas)
- HADOOP-6544. Fix ivy settings to include JSON jackson.codehaus.org
libs for .20. (boryas)
- HDFS-907. Add tests for getBlockLocations and totalLoad metrics. (rphulari)
- HADOOP-6204. Implementing aspects development and fault injection
framework for Hadoop. (cos)
- MAPREDUCE-1432. Adds hooks in the jobtracker and tasktracker
for loading the tokens in the user's ugi. This is required for
the copying of files from the hdfs. (Devaraj Das via boryas)
- MAPREDUCE-1383. Automates fetching of delegation tokens in File*Formats
Distributed Cache and Distcp. Also, provides a config
mapreduce.job.hdfs-servers that the jobs can populate with a comma
separated list of namenodes. The job client automatically fetches
delegation tokens from those namenodes.
- HADOOP-6337. Update FilterInitializer class to be more visible
and take a conf for further development. (jhoman)
- HADOOP-6520. UGI should load tokens from the environment. (jitendra)
- HADOOP-6517, HADOOP-6518. Ability to add/get tokens from
UserGroupInformation & Kerberos login in UGI should honor KRB5CCNAME
(jitendra)
- HADOOP-6299. Reimplement the UserGroupInformation to use the OS
specific and Kerberos JAAS login. (jhoman, ddas, oom)
- HADOOP-6524. Contrib tests are failing Clover'ed build. (cos)
- MAPREDUCE-842. Fixing a bug in the earlier version of the patch
related to improper localization of the job token file.
(Ravi Gummadi via yhemanth)
- HDFS-919. Create test to validate the BlocksVerified metric (Gary Murry
via cos)
- MAPREDUCE-1186. Modified code in distributed cache to set
permissions only on required set of localized paths.
(Amareshwari Sriramadasu via yhemanth)
- HDFS-899. Delegation Token Implementation. (Jitendra Nath Pandey)
- MAPREDUCE-896. Enhance tasktracker to cleanup files that might have
been created by user tasks with non-writable permissions.
(Ravi Gummadi via yhemanth)
- HADOOP-5879. Read compression level and strategy from Configuration for
gzip compression. (He Yongqiang via cdouglas)
- HADOOP-6161. Add get/setEnum methods to Configuration. (cdouglas)
- HADOOP-6382 Mavenize the build.xml targets and update the bin scripts
in preparation for publishing POM files (giri kesavan via ltucker)
- HDFS-737. Add full path name of the file to the block information and
summary of total number of files, blocks, live and deadnodes to
metasave output. (Jitendra Nath Pandey via suresh)
- HADOOP-6577. Add hidden configuration option "ipc.server.max.response.size"
to change the default 1 MB, the maximum size when large IPC handler
response buffer is reset. (suresh)
- HADOOP-6521. Fix backward compatibility issue with umask when applications
use deprecated param dfs.umask in configuration or use
FsPermission.setUMask(). (suresh)
- MAPREDUCE-433. Use more reliable counters in TestReduceFetch.
(Christopher Douglas via ddas)
- MAPREDUCE-744. Introduces the notion of a public distributed cache.
(ddas)
- MAPREDUCE-1140. Fix DistributedCache to not decrement reference counts
for unreferenced files in error conditions.
(Amareshwari Sriramadasu via yhemanth)
- MAPREDUCE-1284. Fix fts_open() call in task-controller that was failing
LinuxTaskController unit tests. (Ravi Gummadi via yhemanth)
- MAPREDUCE-1098. Fixed the distributed-cache to not do i/o while
holding a global lock.
(Amareshwari Sriramadasu via acmurthy)
- MAPREDUCE-1338. Introduces the notion of token cache using which
tokens and secrets can be sent by the Job client to the JobTracker.
(Boris Shkolnik)
- HADOOP-6495. Identifier should be serialized after the password is created
in Token constructor. (Jitendra Nath Pandey)
- HADOOP-6506. Failing tests prevent the rest of test targets from
execution. (cos)
- HADOOP-5457. Fix to continue to run builds even if contrib test fails.
(gkesavan)
- MAPREDUCE-856. Setup secure permissions for distributed cache files.
(Vinod Kumar Vavilapalli via yhemanth)
- MAPREDUCE-871. Fix ownership of Job/Task local files to have correct
group ownership according to the egid of the tasktracker.
(Vinod Kumar Vavilapalli via yhemanth)
- MAPREDUCE-476. Extend DistributedCache to work locally (LocalJobRunner).
(Philip Zeyliger via tomwhite)
- MAPREDUCE-711. Removed Distributed Cache from Common, to move it under
Map/Reduce. (Vinod Kumar Vavilapalli via yhemanth)
- MAPREDUCE-478. Allow map and reduce jvm parameters, environment
variables and ulimit to be set separately. (acmurthy)
- MAPREDUCE-842. Setup secure permissions for localized job files,
intermediate outputs and log files on tasktrackers.
(Vinod Kumar Vavilapalli via yhemanth)
- MAPREDUCE-408. Fixes an assertion problem in TestKillSubProcesses.
(Ravi Gummadi via ddas)
- HADOOP-4041. IsolationRunner does not work as documented.
(Philip Zeyliger via tomwhite)
- MAPREDUCE-181. Changes the job submission process to be secure.
(Devaraj Das)
- HADOOP-5737. Fixes a problem in the way the JobTracker used to talk to
other daemons like the NameNode to get the job's files. Also adds APIs
in the JobTracker to get the FileSystem objects as per the JobTracker's
configuration. (Amar Kamat via ddas)
- HADOOP-5771. Implements unit tests for LinuxTaskController.
(Sreekanth Ramakrishnan and Vinod Kumar Vavilapalli via yhemanth)
- HADOOP-4656, HDFS-685, MAPREDUCE-1083. Use the user-to-groups mapping
service in the NameNode and JobTracker. Combined patch for these 3 jiras
otherwise tests fail. (Jitendra Nath Pandey)
- MAPREDUCE-1250. Refactor job token to use a common token interface.
(Jitendra Nath Pandey)
- MAPREDUCE-1026. Shuffle should be secure. (Jitendra Nath Pandey)
- HADOOP-4268. Permission checking in fsck. (Jitendra Nath Pandey)
- HADOOP-6415. Adding a common token interface for both job token and
delegation token. (Jitendra Nath Pandey)
- HADOOP-6367, HDFS-764. Moving Access Token implementation from Common to
HDFS. These two jiras must be committed together otherwise build will
fail. (Jitendra Nath Pandey)
- HDFS-409. Add more access token tests
(Jitendra Nath Pandey)
- HADOOP-6132. RPC client opens an extra connection for VersionedProtocol.
(Jitendra Nath Pandey)
- HDFS-445. pread() fails when cached block locations are no longer valid.
(Jitendra Nath Pandey)
- HDFS-195. Need to handle access token expiration when re-establishing the
pipeline for dfs write. (Jitendra Nath Pandey)
- HADOOP-6176. Adding a couple private methods to AccessTokenHandler
for testing purposes. (Jitendra Nath Pandey)
- HADOOP-5824. remove OP_READ_METADATA functionality from Datanode.
(Jitendra Nath Pandey)
- HADOOP-4359. Access Token: Support for data access authorization
checking on DataNodes. (Jitendra Nath Pandey)
- MAPREDUCE-1372. Fixed a ConcurrentModificationException in jobtracker.
(Arun C Murthy via yhemanth)
- MAPREDUCE-1316. Fix jobs' retirement from the JobTracker to prevent memory
leaks via stale references. (Amar Kamat via acmurthy)
- MAPREDUCE-1342. Fixed deadlock in global blacklisting of tasktrackers.
(Amareshwari Sriramadasu via acmurthy)
- HADOOP-6460. Reinitializes buffers used for serializing responses in ipc
server on exceeding maximum response size to free up Java heap. (suresh)
- MAPREDUCE-1100. Truncate user logs to prevent TaskTrackers' disks from
filling up. (Vinod Kumar Vavilapalli via acmurthy)
- MAPREDUCE-1143. Fix running task counters to be updated correctly
when speculative attempts are running for a TIP.
(Rahul Kumar Singh via yhemanth)
- HADOOP-6151, 6281, 6285, 6441. Add HTML quoting of the parameters to all
of the servlets to prevent XSS attacks. (omalley)
- MAPREDUCE-896. Fix bug in earlier implementation to prevent
spurious logging in tasktracker logs for absent file paths.
(Ravi Gummadi via yhemanth)
- MAPREDUCE-676. Fix Hadoop Vaidya to ensure it works for map-only jobs.
(Suhas Gogate via acmurthy)
- HADOOP-5582. Fix Hadoop Vaidya to use new Counters in
org.apache.hadoop.mapreduce package. (Suhas Gogate via acmurthy)
- HDFS-595. umask settings in configuration may now use octal or
symbolic instead of decimal. Update HDFS tests as such. (jghoman)
- MAPREDUCE-1068. Added a verbose error message when user specifies an
incorrect -file parameter. (Amareshwari Sriramadasu via acmurthy)
- MAPREDUCE-1171. Allow the read-error notification in shuffle to be
configurable. (Amareshwari Sriramadasu via acmurthy)
- MAPREDUCE-353. Allow shuffle read and connection timeouts to be
configurable. (Amareshwari Sriramadasu via acmurthy)
- HDFS-781. Namenode metrics PendingDeletionBlocks is not decremented.
(suresh)
- MAPREDUCE-1185. Redirect running job url to history url if job is already
retired. (Amareshwari Sriramadasu and Sharad Agarwal via sharad)
- MAPREDUCE-754. Fix NPE in expiry thread when a TT is lost. (Amar Kamat
via sharad)
- MAPREDUCE-896. Modify permissions for local files on tasktracker before
deletion so they can be deleted cleanly. (Ravi Gummadi via yhemanth)
- HADOOP-5771. Implements unit tests for LinuxTaskController.
(Sreekanth Ramakrishnan and Vinod Kumar Vavilapalli via yhemanth)
- MAPREDUCE-1124. Import Gridmix3 and Rumen. (cdouglas)
- MAPREDUCE-1063. Document gridmix benchmark. (cdouglas)
- HDFS-758. Changes to report status of decommissioning on the namenode web
UI. (jitendra)
- HADOOP-6234. Add new option dfs.umaskmode to set umask in configuration
to use octal or symbolic instead of decimal. (Jakob Homan via suresh)
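As an illustration of why the HADOOP-6234 change above matters, the following Python sketch shows how an octal umask string differs from its decimal reading: the octal string 022 is the integer 18. The helper names parse_octal_umask and apply_umask are hypothetical and are not part of Hadoop, and symbolic umasks are not handled in this sketch.

```python
def parse_octal_umask(text):
    """Parse an octal umask string such as "022" into an integer.

    Hypothetical helper for illustration; not Hadoop's parser.
    """
    mask = int(text, 8)  # interpret the digits as base 8
    if not 0 <= mask <= 0o777:
        raise ValueError("umask out of range: %r" % text)
    return mask


def apply_umask(mode, umask):
    """Clear the permission bits in mode that are set in umask."""
    return mode & ~umask & 0o777


if __name__ == "__main__":
    mask = parse_octal_umask("022")
    # Print the same mask in decimal and in octal notation.
    print(mask, oct(mask))
    # Print the permissions left on a 0666 file after applying the mask.
    print(oct(apply_umask(0o666, mask)))
```

Writing the value in octal keeps the configured string visually identical to the familiar Unix umask notation.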
- MAPREDUCE-1147. Add map output counters to new API. (Amar Kamat via
cdouglas)
- MAPREDUCE-1182. Fix overflow in reduce causing allocations to exceed the
configured threshold. (cdouglas)
- HADOOP-4933. Fixes a ConcurrentModificationException problem that shows up
when the history viewer is accessed concurrently.
(Amar Kamat via ddas)
- MAPREDUCE-1140. Fix DistributedCache to not decrement reference counts for
unreferenced files in error conditions.
(Amareshwari Sriramadasu via yhemanth)
- HADOOP-6203. FsShell rm/rmr error message indicates exceeding Trash quota
and suggests using -skipTrash, when moving to trash fails.
(Boris Shkolnik via suresh)
- HADOOP-5675. Do not launch a job if DistCp has no work to do. (Tsz Wo
(Nicholas), SZE via cdouglas)
- HDFS-457. Better handling of volume failure in Data Node storage.
This fix is a port from hdfs-0.22 to common-0.20 by Boris Shkolnik.
Contributed by Erik Steffl.
- HDFS-625. Fix NullPointerException thrown from ListPathServlet.
Contributed by Suresh Srinivas.
- HADOOP-6343. Log unexpected throwable object caught in RPC.
Contributed by Jitendra Nath Pandey
- MAPREDUCE-1186. Fixed DistributedCache to do a recursive chmod on just the
per-cache directory, not all of mapred.local.dir.
(Amareshwari Sriramadasu via acmurthy)
- MAPREDUCE-1231. Add an option to distcp to ignore checksums when used with
the upgrade option.
(Jothi Padmanabhan via yhemanth)
- MAPREDUCE-1219. Fixed JobTracker to not collect per-job metrics, thus
easing load on it. (Amareshwari Sriramadasu via acmurthy)
- HDFS-761. Fix failure to process rename operation from edits log due to
quota verification. (suresh)
- MAPREDUCE-1196. Fix FileOutputCommitter to use the deprecated cleanupJob
api correctly. (acmurthy)
- HADOOP-6344. rm and rmr immediately delete files rather than sending
to trash, despite trash being enabled, if a user is over-quota. (jhoman)
- MAPREDUCE-1160. Reduce verbosity of log lines in some Map/Reduce classes
to avoid filling up jobtracker logs on a busy cluster.
(Ravi Gummadi and Hong Tang via yhemanth)
- HDFS-587. Add ability to run HDFS with MR test on non-default queue,
also updated junit dependency from junit-3.8.1 to junit-4.5 (to make
it possible to use Configured and Tool to process command line to
be able to specify a queue). Contributed by Erik Steffl.
- MAPREDUCE-1158. Fix JT running maps and running reduces metrics.
(sharad)
- MAPREDUCE-947. Fix bug in earlier implementation that was
causing unit tests to fail.
(Ravi Gummadi via yhemanth)
- MAPREDUCE-1062. Fix MRReliabilityTest to work with retired jobs
(Contributed by Sreekanth Ramakrishnan)
- MAPREDUCE-1090. Modified log statement in TaskMemoryManagerThread to
include task attempt id. (yhemanth)
- MAPREDUCE-1098. Fixed the distributed-cache to not do i/o while
holding a global lock. (Amareshwari Sriramadasu via acmurthy)
- MAPREDUCE-1048. Add occupied/reserved slot usage summary on
jobtracker UI. (Amareshwari Sriramadasu via sharad)
- MAPREDUCE-1103. Added more metrics to Jobtracker. (sharad)
- MAPREDUCE-947. Added commitJob and abortJob apis to OutputCommitter.
Enhanced FileOutputCommitter to create a _SUCCESS file for successful
jobs. (Amar Kamat & Jothi Padmanabhan via acmurthy)
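A downstream consumer can use the _SUCCESS file described in MAPREDUCE-947 above as a completion marker before reading a job's output. The Python sketch below shows the idea under stated assumptions: it checks a local directory as a stand-in for the job output directory, and the function name job_output_is_complete is hypothetical rather than a Hadoop API.

```python
import os

# Marker file name taken from the MAPREDUCE-947 entry.
SUCCESS_MARKER = "_SUCCESS"


def job_output_is_complete(output_dir):
    """Return True if output_dir contains the _SUCCESS marker file.

    Hypothetical helper: a local path stands in for the job output
    directory, which on a real cluster would live on HDFS.
    """
    return os.path.isfile(os.path.join(output_dir, SUCCESS_MARKER))
```

Checking for the marker avoids consuming partial output from a job that failed or is still running.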
- MAPREDUCE-1105. Remove max limit configuration in capacity scheduler in
favor of max capacity percentage thus allowing the limit to go over
queue capacity. (Rahul Kumar Singh via yhemanth)
- MAPREDUCE-1086. Setup Hadoop logging environment for tasks to point to
task related parameters. (Ravi Gummadi via yhemanth)
- MAPREDUCE-739. Allow relative paths to be created inside archives.
(mahadev)
- HADOOP-6097. Multiple bugs w/ Hadoop archives (mahadev)
- HADOOP-6231. Allow caching of filesystem instances to be disabled on a
per-instance basis (ben slusky via mahadev)
- MAPREDUCE-826. harchive doesn't use ToolRunner / harchive returns 0 even
if the job fails with exception (koji via mahadev)
- HDFS-686. NullPointerException is thrown while merging edit log and
image. (hairong)
- HDFS-709. Fix TestDFSShell failure due to rename bug introduced by
HDFS-677. (suresh)
- HDFS-677. Rename failure when both source and destination quota exceeds
results in deletion of source. (suresh)
- HADOOP-6284. Add a new parameter, HADOOP_JAVA_PLATFORM_OPTS, to
hadoop-config.sh so that it allows setting java command options for
JAVA_PLATFORM. (Koji Noguchi via szetszwo)
- MAPREDUCE-732. Removed spurious log statements in the node
blacklisting logic. (Sreekanth Ramakrishnan via yhemanth)
- MAPREDUCE-144. Includes dump of the process tree in task diagnostics when
a task is killed due to exceeding memory limits.
(Vinod Kumar Vavilapalli via yhemanth)
- MAPREDUCE-979. Fixed JobConf APIs related to memory parameters to
return values of new configuration variables when deprecated
variables are disabled. (Sreekanth Ramakrishnan via yhemanth)
- MAPREDUCE-277. Makes job history counters available on the job history
viewers. (Jothi Padmanabhan via ddas)
- HADOOP-5625. Add operation duration to clienttrace. (Lei Xu
via cdouglas)
- HADOOP-5222. Add offset to datanode clienttrace. (Lei Xu via cdouglas)
- HADOOP-6218. Adds a feature where TFile can be split by Record
Sequence number. Contributed by Hong Tang and Raghu Angadi.
- MAPREDUCE-1088. Changed permissions on JobHistory files on local disk to
0744. Contributed by Arun C. Murthy.
- HADOOP-6304. Use java.io.File.set{Readable|Writable|Executable} where
possible in RawLocalFileSystem. Contributed by Arun C. Murthy.
- MAPREDUCE-270. Fix the tasktracker to optionally send an out-of-band
heartbeat on task-completion for better job-latency. Contributed by
Arun C. Murthy.
Configuration changes:
add mapreduce.tasktracker.outofband.heartbeat
- MAPREDUCE-1030. Fix capacity-scheduler to assign a map and a reduce task
per-heartbeat. Contributed by Rahul K Singh.
- MAPREDUCE-1028. Fixed number of slots occupied by cleanup tasks to one
irrespective of slot size for the job. Contributed by Ravi Gummadi.
- MAPREDUCE-964. Fixed start and finish times of TaskStatus to be
consistent, thereby fixing inconsistencies in metering tasks.
Contributed by Sreekanth Ramakrishnan.
- HADOOP-5976. Add a new command, classpath, to the hadoop
script. Contributed by Owen O'Malley and Gary Murry
- HADOOP-5784. Makes the number of heartbeats that should arrive
a second at the JobTracker configurable. Contributed by
Amareshwari Sriramadasu.
- MAPREDUCE-945. Modifies MRBench and TestMapRed to use
ToolRunner so that options such as queue name can be
passed via command line. Contributed by Sreekanth Ramakrishnan.
- HADOOP-5420 Correct bug in earlier implementation
by Arun C. Murthy
- HADOOP-5363 Add support for proxying connections to multiple
clusters with different versions to hdfsproxy. Contributed
by Zhiyong Zhang
- HADOOP-5780. Improve per block message printed by -metaSave
in HDFS. (Raghu Angadi)
- HADOOP-6227. Fix Configuration to allow final parameters to be set
to null and prevent them from being overridden. Contributed by
Amareshwari Sriramadasu.
- MAPREDUCE-430 Added patch supplied by Amar Kamat to allow roll forward
on branch to include an externally committed patch.
- MAPREDUCE-768. Provide an option to dump jobtracker configuration in
JSON format to standard output. Contributed by V.V.Chaitanya
- MAPREDUCE-834 Correct an issue created by merging this issue with
patch attached to external Jira.
- HADOOP-6184 Provide an API to dump Configuration in a JSON format.
Contributed by V.V.Chaitanya Krishna.
- MAPREDUCE-745 Patch added for this issue to allow branch-0.20 to
merge cleanly.
- MAPREDUCE-478 Allow map and reduce jvm parameters, environment
variables and ulimit to be set separately.
- MAPREDUCE-682 Removes reservations on tasktrackers which are blacklisted.
Contributed by Sreekanth Ramakrishnan.
- HADOOP-5420 Support killing of process groups in LinuxTaskController
binary
- HADOOP-5488 Removes the pidfile management for the Task JVM from the
framework and instead passes the PID back and forth between the
TaskTracker and the Task processes. Contributed by Ravi Gummadi.
- MAPREDUCE-467 Provide ability to collect statistics about total tasks and
succeeded tasks in different time windows.
- MAPREDUCE-817. Add a cache for retired jobs with minimal job
info and provide a way to access history file url
- MAPREDUCE-814. Provide a way to configure completed job history
files to be on HDFS.
- MAPREDUCE-838 Fixes a problem in the way commit of task outputs
happens. The bug was that even if commit failed, the task would be
declared as successful. Contributed by Amareshwari Sriramadasu.
- MAPREDUCE-809 Fix job-summary logs to correctly record final status of
FAILED and KILLED jobs.
- MAPREDUCE-740 Log a job-summary at the end of a job, while
allowing it to be configured to use a custom appender if desired.
- MAPREDUCE-771 Fixes a bug which delays normal jobs in favor of
high-ram jobs.
- HADOOP-5420 Support setsid based kill in LinuxTaskController.
- MAPREDUCE-733 Fixes a bug where, when a task tracker is killed,
it throws an exception. Instead it should catch the exception, process
it, and allow the rest of the flow to go through.
- MAPREDUCE-734 Fixes a bug which prevented hi ram jobs from being
removed from the scheduler queue.
- MAPREDUCE-693 Fixes a bug that when a job is submitted and the
JT is restarted (before job files have been written) and the job
is killed after recovery, the conf files fail to be moved to the
"done" subdirectory.
- MAPREDUCE-722 Fixes a bug where more slots are getting reserved
for HiRAM job tasks than required.
- MAPREDUCE-683 TestJobTrackerRestart failed because of stale
filemanager cache (which was created once per jvm). This patch makes
sure that the filemanager is inited upon every JobHistory.init()
and hence upon every restart. Note that this won't happen in production
as upon a restart the new jobtracker will start in a new jvm and
hence a new cache will be created.
- MAPREDUCE-709 Fixes a bug where node health check script does
not display the correct message on timeout.
- MAPREDUCE-708 Fixes a bug where node health check script does
not refresh the "reason for blacklisting".
- MAPREDUCE-522 Rewrote TestQueueCapacities to make it simpler
and avoid timeout errors.
- MAPREDUCE-532 Provided ability in the capacity scheduler to
limit the number of slots that can be concurrently used per queue
at any given time.
- MAPREDUCE-211 Provides ability to run a health check script on
the tasktracker nodes and blacklist nodes if they are unhealthy.
Contributed by Sreekanth Ramakrishnan.
- MAPREDUCE-516 Remove .orig file included by mistake.
- MAPREDUCE-416 Moves the history file to a "done" folder whenever
a job completes.
- HADOOP-5980 Previously, task spawned off by LinuxTaskController
didn't get LD_LIBRARY_PATH in their environment. The tasks will now
get same LD_LIBRARY_PATH value as when spawned off by
DefaultTaskController.
- HADOOP-5981 This issue completes the feature mentioned in
HADOOP-2838. HADOOP-2838 provided a way to set env variables in
child process. This issue provides a way to inherit tt's env variables
and append or reset it. So now X=$X:y will inherit X (if there) and
append y to it.
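The inherit-and-append behavior described in HADOOP-5981 above can be sketched in Python as follows. This is an illustration under stated assumptions, not Hadoop's implementation: the function name resolve_child_env is hypothetical, entries are assumed to be comma-separated NAME=VALUE pairs, and a variable missing from the inherited environment is assumed to expand to an empty string.

```python
def resolve_child_env(spec, inherited):
    """Resolve NAME=VALUE entries against an inherited environment.

    Hypothetical helper for illustration. A reference to $NAME inside
    the value of NAME is replaced by the inherited value of NAME, or by
    an empty string when NAME is not inherited (an assumption).
    """
    resolved = dict(inherited)
    for entry in spec.split(","):  # assumed separator between entries
        name, sep, value = entry.partition("=")
        if not sep:
            continue  # skip malformed entries in this sketch
        name = name.strip()
        current = resolved.get(name, "")
        resolved[name] = value.replace("$" + name, current)
    return resolved


if __name__ == "__main__":
    # X is inherited, so X=$X:y appends y to the inherited value.
    print(resolve_child_env("X=$X:y", {"X": "/usr/lib"}))
    # X=Y sets (or resets) X without consulting the inherited value.
    print(resolve_child_env("X=Y", {"X": "/usr/lib"}))
```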
- HADOOP-5419 This issue is to provide an improvement on the
existing M/R framework to let users know which queues they have
access to, and for what operations. One use case for this is
that currently there is no easy way to know if the user has access
to submit jobs to a queue, until it fails with an access control
exception.
- HADOOP-5420 Support setsid based kill in LinuxTaskController.
- HADOOP-5643 Added the functionality to refresh jobtracker's node
list via command line (bin/hadoop mradmin -refreshNodes). The command
should be run as the jobtracker owner (jobtracker process owner)
or from a super group (mapred.permissions.supergroup).
- HADOOP-2838 Now the users can set environment variables using
mapred.child.env. They can do the following:
X=Y : Set X to Y
X=$X:Y : Append Y to X (which should be taken from the tasktracker)
- HADOOP-5818. Revert the renaming from FSNamesystem.checkSuperuserPrivilege
to checkAccess by HADOOP-5643. (Amar Kamat via szetszwo)
- HADOOP-5801. Fixes the problem: If the hosts file is changed across restart
then it should be refreshed upon recovery so that the excluded hosts are
lost and the maps are re-executed. (Amar Kamat via ddas)
- HADOOP-5643. Adds a way to decommission TaskTrackers
while the JobTracker is running. (Amar Kamat via ddas)
- HADOOP-5419. Provide a facility to query the Queue ACLs for the
current user. (Rahul Kumar Singh via yhemanth)
- HADOOP-5733. Add map/reduce slot capacity and blacklisted capacity to
JobTracker metrics. (Sreekanth Ramakrishnan via cdouglas)
- HADOOP-5738. Split "waiting_tasks" JobTracker metric into waiting maps and
waiting reduces. (Sreekanth Ramakrishnan via cdouglas)
- HADOOP-4842. Streaming now allows specifying a command for the combiner.
(Amareshwari Sriramadasu via ddas)
- HADOOP-4490. Provide ability to run tasks as job owners.
(Sreekanth Ramakrishnan via yhemanth)
- HADOOP-5442. Paginate jobhistory display and added some search
capabilities. (Amar Kamat via acmurthy)
- HADOOP-3327. Improves handling of READ_TIMEOUT during map output copying.
(Amareshwari Sriramadasu via ddas)
- HADOOP-5113. Fixed logcondense to remove files for usernames
beginning with characters specified in the -l option.
(Peeyush Bishnoi via yhemanth)
- HADOOP-2898. Provide an option to specify a port range for
Hadoop services provisioned by HOD.
(Peeyush Bishnoi via yhemanth)
- HADOOP-4930. Implement a Linux native executable that can be used to
launch tasks as users. (Sreekanth Ramakrishnan via yhemanth)