Hadoop 0.20.2 Release Notes

These release notes include new developer and user-facing incompatibilities, features, and major improvements. The table below is sorted by Component.

Changes Since Hadoop 0.20.1

Common

Bug

Task

HDFS

Bug

Test

MapReduce

Bug

Improvement

New Feature

Changes Since Hadoop 0.20.0

Common

Sub-task

Bug

Improvement

New Feature

HDFS

Bug

Improvement

Map/Reduce

Bug

Improvement

Changes Since Hadoop 0.19.1

Issue | Component | Notes
HADOOP-3344 | build | Changed build procedure for libhdfs to build correctly for different platforms. Build instructions are in the Jira item.
HADOOP-4253 | conf | Removed the deprecated methods public String getName(), public void lock(Path p, boolean shared) and public void release(Path p) from class org.apache.hadoop.fs.RawLocalFileSystem.
HADOOP-4454 | conf | Changed processing of conf/slaves file to allow # to begin a comment.
HADOOP-4631 | conf | Split hadoop-default.xml into core-default.xml, hdfs-default.xml and mapred-default.xml.
HADOOP-4035 | contrib/capacity-sched | Changed capacity scheduler policy to take note of task memory requirements and task tracker memory availability.
HADOOP-4445 | contrib/capacity-sched | Changed JobTracker UI to better present the number of active tasks.
HADOOP-4576 | contrib/capacity-sched | Changed capacity scheduler UI to better present the number of running and pending tasks.
HADOOP-4179 | contrib/chukwa | Introduced Vaidya, a rule-based performance diagnostic tool for Map/Reduce jobs.
HADOOP-4827 | contrib/chukwa | Improved framework for data aggregation in Chukwa.
HADOOP-4843 | contrib/chukwa | Introduced Chukwa collection of job history.
HADOOP-5030 | contrib/chukwa | Changed RPM install location to the value specified by build.properties file.
HADOOP-5531 | contrib/chukwa | Disabled Chukwa unit tests for 0.20 branch only.
HADOOP-4789 | contrib/fair-share | Changed fair scheduler to divide resources equally between pools, not jobs.
HADOOP-4873 | contrib/fair-share | Changed fair scheduler UI to display minMaps and minReduces variables.
HADOOP-3750 | dfs | Removed deprecated method parseArgs from org.apache.hadoop.fs.FileSystem.
HADOOP-4029 | dfs | Added name node storage information to the dfshealth page, and moved data node information to a separate page.
HADOOP-4103 | dfs | Modified dfsadmin -report to report under-replicated blocks, blocks with corrupt replicas, and missing blocks.
HADOOP-4567 | dfs | Changed GetFileBlockLocations to return topology information for nodes that host the block replicas.
HADOOP-4572 | dfs | Moved org.apache.hadoop.hdfs.{CreateEditsLog, NNThroughputBenchmark} to org.apache.hadoop.hdfs.server.namenode.
HADOOP-4618 | dfs | Moved HTTP server from FSNamesystem to NameNode. Removed FSNamesystem.getNameNodeInfoPort(). Replaced FSNamesystem.getDFSNameNodeMachine() and FSNamesystem.getDFSNameNodePort() with new method FSNamesystem.getDFSNameNodeAddress(). Removed constructor NameNode(bindAddress, conf).
HADOOP-4826 | dfs | Introduced new dfsadmin command saveNamespace to command the name service to do an immediate save of the file system image.
HADOOP-4970 | dfs | Changed trash facility to use absolute path of the deleted file.
HADOOP-5468 | documentation | Reformatted HTML documentation for Hadoop to use submenus at the left column.
HADOOP-3497 | fs | Changed the semantics of file globbing with a PathFilter (using the globStatus method of FileSystem). Previously, the filtering was too restrictive, so that a glob of /*/* and a filter that only accepts /a/b would not have matched /a/b. With this change /a/b does match.
HADOOP-4234 | fs | Changed KFS glue layer to allow applications to interface with multiple KFS metaservers.
HADOOP-4422 | fs/s3 | Modified Hadoop file system to no longer create S3 buckets. Applications can create buckets for their S3 file systems by other means, for example, using the JetS3t API.
HADOOP-3063 | io | Introduced BloomMapFile subclass of MapFile that creates a Bloom filter from all keys.
HADOOP-1230 | mapred | Replaced parameters with context objects in Mapper, Reducer, Partitioner, InputFormat, and OutputFormat classes.
HADOOP-1650 | mapred | Upgraded all core servers to use Jetty 6.
HADOOP-3923 | mapred | Moved class org.apache.hadoop.mapred.StatusHttpServer to org.apache.hadoop.http.HttpServer.
HADOOP-3986 | mapred | Removed classes org.apache.hadoop.mapred.JobShell and org.apache.hadoop.mapred.TestJobShell. Removed from JobClient the methods static void setCommandLineConfig(Configuration conf) and public static Configuration getCommandLineConfig().
HADOOP-4188 | mapred | Removed Task's dependency on concrete file systems by taking list from FileSystem class. Added statistics table to FileSystem class. Deprecated FileSystem method getStatistics(Class<? extends FileSystem> cls).
HADOOP-4210 | mapred | Changed public class org.apache.hadoop.mapreduce.ID to be an abstract class. Removed from class org.apache.hadoop.mapreduce.ID the methods public static ID read(DataInput in) and public static ID forName(String str).
HADOOP-4305 | mapred | Improved TaskTracker blacklisting strategy to better exclude faulty trackers from executing tasks.
HADOOP-4435 | mapred | Changed JobTracker web status page to display the amount of heap memory in use. This changes the JobSubmissionProtocol.
HADOOP-4565 | mapred | Improved MultiFileInputFormat so that multiple blocks from the same node or same rack can be combined into a single split.
HADOOP-4749 | mapred | Added a new counter REDUCE_INPUT_BYTES.
HADOOP-4783 | mapred | Changed history directory permissions to 750 and history file permissions to 740.
HADOOP-3422 | metrics | Changed names of ganglia metrics to avoid conflicts and to better identify source function.
HADOOP-4284 | security | Introduced HttpServer method to support global filters.
HADOOP-4575 | security | Introduced independent HSFTP proxy server for authenticated access to clusters.
HADOOP-4661 | tools/distcp | Introduced distch tool for parallel ch{mod, own, grp}.
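
To make the conf/slaves change in HADOOP-4454 concrete, the sketch below shows one way a host list with # comments could be read. It is an illustration in plain Java, not Hadoop's own code: the class name SlavesFileSketch, the helper parseHosts, and the host names are all hypothetical, and the sketch assumes that a # anywhere on a line starts a comment and that blank lines are skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SlavesFileSketch {
    // Hypothetical helper: returns the host names found in the given lines.
    // Assumption: everything from '#' to the end of a line is a comment,
    // and lines that are empty after stripping comments are ignored.
    static List<String> parseHosts(List<String> lines) {
        List<String> hosts = new ArrayList<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            String host = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        // Made-up file content for illustration only.
        List<String> lines = Arrays.asList(
            "# worker nodes",
            "node1.example.com",
            "node2.example.com  # rack 2",
            "",
            "#node3.example.com");
        System.out.println(parseHosts(lines));
        // prints [node1.example.com, node2.example.com]
    }
}
```

Commenting out a host, as in the last line of the sample input, is a convenient way to take a node out of the list temporarily without deleting its entry.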
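
The effect of HADOOP-4789 (dividing resources between pools rather than jobs) can be seen with a little arithmetic. The sketch below is a simplified model, not the fair scheduler's implementation: it ignores minimum shares and weights, assumes every pool has at least one job, and uses hypothetical names (FairShareSketch, perJobSplit, perPoolSplit) and made-up pool names and slot counts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FairShareSketch {
    // Slots each pool ends up with when capacity is split equally across JOBS.
    static Map<String, Double> perJobSplit(Map<String, Integer> jobsPerPool, double slots) {
        int totalJobs = 0;
        for (int n : jobsPerPool.values()) {
            totalJobs += n;
        }
        Map<String, Double> share = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : jobsPerPool.entrySet()) {
            share.put(e.getKey(), slots * e.getValue() / totalJobs);
        }
        return share;
    }

    // Slots each pool ends up with when capacity is split equally across POOLS.
    static Map<String, Double> perPoolSplit(Map<String, Integer> jobsPerPool, double slots) {
        Map<String, Double> share = new LinkedHashMap<>();
        for (String pool : jobsPerPool.keySet()) {
            share.put(pool, slots / jobsPerPool.size());
        }
        return share;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 100 slots, one pool with 1 job, one with 3 jobs.
        Map<String, Integer> jobsPerPool = new LinkedHashMap<>();
        jobsPerPool.put("ads", 1);
        jobsPerPool.put("research", 3);
        System.out.println(perJobSplit(jobsPerPool, 100));  // prints {ads=25.0, research=75.0}
        System.out.println(perPoolSplit(jobsPerPool, 100)); // prints {ads=50.0, research=50.0}
    }
}
```

Under the per-job split a pool can claim more of the cluster simply by submitting more jobs; under the per-pool split the pool's total stays the same however many jobs it runs.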
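
The glob change in HADOOP-3497 is easier to follow with a small model. The sketch below does not call Hadoop's FileSystem.globStatus; it is a self-contained illustration over an in-memory list of paths. It assumes that the earlier, too-restrictive behaviour came from applying the filter to every intermediate path during expansion, which is one plausible reading of the note and is labeled as an assumption here. The class GlobFilterSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class GlobFilterSketch {
    // True if 'path' is a direct child of 'parent' ("" stands for the root).
    static boolean isChildOf(String path, String parent) {
        return path.startsWith(parent + "/")
            && path.indexOf('/', parent.length() + 1) < 0;
    }

    // Expands a pattern whose components are literals or "*", one level at a time.
    // filterEveryLevel = true  models the assumed old behaviour (filter applied
    //                          to intermediate paths as well);
    // filterEveryLevel = false applies the filter only to the final matches.
    static List<String> glob(List<String> tree, String pattern,
                             Predicate<String> filter, boolean filterEveryLevel) {
        String[] parts = pattern.substring(1).split("/");
        List<String> current = new ArrayList<>();
        current.add(""); // the root, written as an empty prefix
        for (int i = 0; i < parts.length; i++) {
            boolean last = (i == parts.length - 1);
            List<String> next = new ArrayList<>();
            for (String parent : current) {
                for (String path : tree) {
                    if (!isChildOf(path, parent)) {
                        continue;
                    }
                    String name = path.substring(parent.length() + 1);
                    boolean nameMatches = parts[i].equals("*") || parts[i].equals(name);
                    boolean filterApplies = last || filterEveryLevel;
                    if (nameMatches && (!filterApplies || filter.test(path))) {
                        next.add(path);
                    }
                }
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        List<String> tree = Arrays.asList("/a", "/a/b", "/a/c", "/d", "/d/e");
        Predicate<String> onlyAB = "/a/b"::equals;
        System.out.println(glob(tree, "/*/*", onlyAB, true));  // prints []
        System.out.println(glob(tree, "/*/*", onlyAB, false)); // prints [/a/b]
    }
}
```

In the first call the filter rejects /a before its children are ever examined, so /a/b is lost; in the second call only complete matches are filtered, so /a/b survives.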
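
For readers less used to octal modes, the permissions named in HADOOP-4783 can be spelled out symbolically. The helper below is a plain-Java illustration with a hypothetical name (PermissionSketch.toSymbolic); it only converts the numbers and does not touch any file or Hadoop API.

```java
public class PermissionSketch {
    // Converts the low nine bits of a mode to the usual rwx notation
    // (owner, group, other).
    static String toSymbolic(int mode) {
        String bits = "rwxrwxrwx";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 9; i++) {
            boolean set = (mode & (1 << (8 - i))) != 0;
            sb.append(set ? bits.charAt(i) : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A leading 0 makes these Java octal literals.
        System.out.println(toSymbolic(0750)); // prints rwxr-x--- (history directory)
        System.out.println(toSymbolic(0740)); // prints rwxr----- (history files)
    }
}
```

Read this way, the owner has full access, the group can read history files and list the directory, and other users have no access.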