Issue | Component | Notes |
HADOOP-2828 | conf | Removed these deprecated methods in org.apache.hadoop.conf.Configuration: |
HADOOP-2410 | contrib/ec2 | The command hadoop-ec2 run has been replaced by hadoop-ec2 launch-cluster <group> <number of instances>, and hadoop-ec2 start-hadoop has been removed because Hadoop is now started on instance startup. See http://wiki.apache.org/hadoop/AmazonEC2 for details. |
HADOOP-2796 | contrib/hod | Added a provision to reliably detect a failing script's exit code. When the HOD script option returns a non-zero exit code, look for a script.exitcode file written to the HOD cluster directory. If this file is present, it means the script failed with the exit code given in the file. |
HADOOP-2775 | contrib/hod | Added a unit testing framework based on pyunit to HOD. Developers contributing patches to HOD should now contribute unit tests along with the patches when possible. |
HADOOP-3137 | contrib/hod | The HOD version is now the same as the Hadoop version. |
HADOOP-2855 | contrib/hod | HOD now handles relative paths correctly for important HOD options such as the cluster directory, tarball option, and script file. |
HADOOP-2899 | contrib/hod | HOD now cleans up the HOD-generated mapred system directory at cluster deallocation time. |
HADOOP-2982 | contrib/hod | The number of free nodes in the cluster is computed using a better algorithm that filters out inconsistencies in node status as reported by Torque. |
HADOOP-2947 | contrib/hod | The stdout and stderr streams of daemons are redirected to files that are created under the Hadoop log directory. Users can now send signal 3 (SIGQUIT, for example with kill -3) to the daemons to get stack traces and thread dumps for debugging. |
HADOOP-3168 | contrib/streaming | Decreased the frequency of logging in Hadoop streaming (from every 100 records to every 10,000 records). |
HADOOP-3040 | contrib/streaming | Fixed a critical bug to restore important functionality in Hadoop streaming. If the first character on a line is the separator, then an empty key is assumed and the whole line is the value. |
HADOOP-2820 | contrib/streaming | Removed these deprecated classes: |
HADOOP-3280 | contrib/streaming | Added the mapred.child.ulimit configuration variable to limit the maximum virtual memory allocated to processes launched by the Map/Reduce framework. This can be used to control both the Mapper/Reducer tasks and applications using Hadoop pipes, Hadoop streaming, etc. |
HADOOP-2657 | dfs | Added the new API DFSOutputStream.flush() to flush all outstanding data to DataNodes. |
HADOOP-2219 | dfs | Added a new fs -count command for counting the number of bytes, files, and directories under a given path. Added a new RPC getContentSummary(String path) to ClientProtocol. |
HADOOP-2559 | dfs | Changed DFS block placement to allocate the first replica locally, the second off-rack, and the third on the same rack as the second. |
HADOOP-2758 | dfs | Reduced DataNode CPU usage by 50% while serving data to clients. |
HADOOP-2634 | dfs | Deprecated ClientProtocol's exists() method. Use getFileInfo(String) instead. |
HADOOP-2423 | dfs | Improved FSDirectory.mkdirs(...) performance by about 50% as measured by the NNThroughputBenchmark. |
HADOOP-3124 | dfs | Made the DataNode socket write timeout configurable; however, the configuration variable is undocumented. |
HADOOP-2470 | dfs | Removed the open() and isDir() methods from ClientProtocol without first deprecating them. Removed the deprecated getContentLength() from ClientProtocol. Deprecated isDirectory in DFSClient. Use getFileStatus() instead. |
HADOOP-2854 | dfs | Removed deprecated method org.apache.hadoop.ipc.Server.getUserInfo(). |
HADOOP-2239 | dfs | Added a new FileSystem, HftpsFileSystem, that allows access to HDFS data over HTTPS. |
HADOOP-771 | dfs | Added a new method to the FileSystem API, delete(path, boolean), and deprecated the previous delete(path) method. The new method deletes files recursively only if the boolean argument is set to true. |
HADOOP-3239 | dfs | Modified org.apache.hadoop.dfs.FSDirectory.getFileInfo(String) to return null when a file is not found instead of throwing FileNotFoundException. |
HADOOP-3091 | dfs | Enhanced the hadoop dfs -put command to accept multiple sources when the destination is a directory. |
HADOOP-2192 | dfs | Modified hadoop dfs -mv to be closer in functionality to the Linux mv command by removing unnecessary output and returning an error message when moving nonexistent files/directories. |
|
dfs mapred |
Added rack awareness for map tasks and moved the rack resolution logic to the NameNode and JobTracker. The administrator can use topology.node.switch.mapping.impl to specify a loadable class that implements the rack resolution logic. The class must implement a method resolve(List<String> names), where names is the list of DNS names/IP addresses to be resolved. The return value is a list of resolved network paths of the form /foo/rack, where rack is the ID of the rack to which the node belongs and foo is the switch to which multiple racks are connected, and so on. The default implementation, org.apache.hadoop.net.ScriptBasedMapping, is packaged with Hadoop and loads a script that can be used for rack resolution. The script location is configurable: it is specified by topology.script.file.name and defaults to an empty script. When the script name is empty, /default-rack is returned for all DNS names/IP addresses. The loadable topology.node.switch.mapping.impl gives administrators the flexibility to define how their site's node resolution should happen. |
HADOOP-2063 | fs | Added a new option -ignoreCrc to fs -get and fs -copyToLocal. The option causes CRC checksums to be ignored for this command so that corrupt files may be downloaded. |
HADOOP-3001 | fs | Added new Map/Reduce framework counters that track the number of bytes read from and written to HDFS, local, KFS, and S3 file systems. |
HADOOP-2027 | fs | Added a new FileSystem method getFileBlockLocations to return the number of bytes in each block in a file via a single RPC to the NameNode. Deprecated getFileCacheHints. |
HADOOP-2839 | fs | Removed deprecated method org.apache.hadoop.fs.FileSystem.globPaths(). |
HADOOP-2563 | fs | Removed deprecated method org.apache.hadoop.fs.FileSystem.listPaths(). |
HADOOP-1593 | fs | Modified FSShell commands to accept non-default paths. You can now run commands like hadoop dfs -ls hdfs://remotehost1:port/path and hadoop dfs -ls hdfs://remotehost2:port/path without changing your Hadoop config. |
HADOOP-3048 | io | Added a new API and a default implementation to convert and restore serializations of objects to strings. |
HADOOP-3152 | io | Added a static method MapFile.setIndexInterval(Configuration, int interval) so that Map/Reduce jobs using MapFileOutputFormat can set the index interval. |
HADOOP-3073 | ipc | SocketOutputStream.close() now closes the underlying channel. This increases compatibility with java.net.Socket.getOutputStream. |
HADOOP-3041 | mapred | Deprecated JobConf.setOutputPath and JobConf.getOutputPath. Deprecated OutputFormatBase. Added FileOutputFormat. Existing output formats extending OutputFormatBase now extend FileOutputFormat. Added the following methods to FileOutputFormat: |
HADOOP-3204 | mapred | Fixed ReduceTask.LocalFSMerger to handle errors and exceptions better. Previously, all exceptions except IOException were silently ignored. |
HADOOP-1986 | mapred |
Programs that implement the raw Mapper or Reducer interfaces will need modification to compile with this release. For example,
    class MyMapper implements Mapper {
      public void map(WritableComparable key, Writable val,
                      OutputCollector out, Reporter reporter) throws IOException {
        // ...
      }
      // ...
    }
will need to be changed to refer to the parameterized type. For example:
    class MyMapper implements Mapper<WritableComparable, Writable, WritableComparable, Writable> {
      public void map(WritableComparable key, Writable val,
                      OutputCollector<WritableComparable, Writable> out, Reporter reporter) throws IOException {
        // ...
      }
      // ...
    }
Similarly, implementations of the following raw interfaces will need modification:
|
HADOOP-910 | mapred | Reducers now perform merges of shuffle data (both in-memory and on disk) while fetching map outputs. Previously, they merged only the in-memory outputs during the shuffle. |
HADOOP-2822 | mapred | Removed the deprecated classes org.apache.hadoop.mapred.InputFormatBase and org.apache.hadoop.mapred.PhasedFileSystem. |
HADOOP-2817 | mapred | Removed the deprecated method org.apache.hadoop.mapred.ClusterStatus.getMaxTasks() and the deprecated configuration property mapred.tasktracker.tasks.maximum. |
HADOOP-2825 | mapred | Removed the deprecated method org.apache.hadoop.mapred.MapOutputLocation.getFile(FileSystem fileSys, Path localFilename, int reduce, Progressable pingee, int timeout). |
HADOOP-2818 | mapred | Removed the deprecated methods org.apache.hadoop.mapred.Counters.getDisplayName(String counter) and org.apache.hadoop.mapred.Counters.getCounterNames(). Undeprecated the method org.apache.hadoop.mapred.Counters.getCounter(String counterName). |
HADOOP-2826 | mapred |
Changed the signature of the method public org.apache.hadoop.streaming.UTF8ByteArrayUtils.readLine(InputStream) to UTF8ByteArrayUtils.readLine(LineReader, Text). Since the old signature is not deprecated, any code using the old method must be changed to use the new method.
Removed the deprecated methods org.apache.hadoop.mapred.FileSplit.getFile() and org.apache.hadoop.mapred.LineRecordReader.readLine(InputStream in, OutputStream out). Made the constructor org.apache.hadoop.mapred.LineRecordReader.LineReader(InputStream in, Configuration conf) public. |
HADOOP-2819 | mapred | Removed these deprecated methods from org.apache.hadoop.mapred.JobConf: |
HADOOP-3093 | mapred | Added the following public methods to org.apache.hadoop.conf.Configuration: |
HADOOP-2399 | mapred | The key and value objects that are given to the Combiner and Reducer are now reused between calls. This is much more efficient, but users cannot assume that the objects remain unchanged between calls. |
HADOOP-3162 | mapred | Deprecated the public methods org.apache.hadoop.mapred.JobConf.setInputPath(Path) and org.apache.hadoop.mapred.JobConf.addInputPath(Path). Added the following public methods to org.apache.hadoop.mapred.FileInputFormat: |
HADOOP-2178 | mapred |
Provided a new facility to store job history on DFS. Cluster administrators can now provide either a local FS location or a DFS location, using the configuration property mapred.job.history.location, to store job history. History will also be logged in a user-specified location if the configuration property mapred.job.history.user.location is specified.
Removed these classes and method:
Changed the signature of the public method org.apache.hadoop.mapred.DefaultJobHistoryParser.parseJobTasks(File jobHistoryFile, JobHistory.JobInfo job) to DefaultJobHistoryParser.parseJobTasks(String jobHistoryFile, JobHistory.JobInfo job, FileSystem fs). Changed the signature of the public method org.apache.hadoop.mapred.JobHistory.parseHistory(File path, Listener l) to JobHistory.parseHistoryFromFS(String path, Listener l, FileSystem fs). |
HADOOP-2055 | mapred | Users can now specify which paths to ignore when processing the job input directory (in addition to filenames that start with "_" and "."). To do this, two new methods were defined: |
HADOOP-2116 | mapred | Restructured the local job directory on the tasktracker. Users are provided with a job-specific shared directory (mapred-local/taskTracker/jobcache/$jobid/work) for use as scratch space, available through the configuration property and system property job.local.dir. The directory ../work is no longer available from the task's current working directory. |
HADOOP-1622 | mapred |
Added new command line options for the hadoop jar command:
hadoop jar -files <comma separated list of files> -libjars <comma separated list of jars> -archives <comma separated list of archives>
where the options have these meanings:
|
HADOOP-2823 | record | Removed the deprecated methods in org.apache.hadoop.record.compiler.generated.SimpleCharStream: |
HADOOP-2551 | scripts | Introduced new environment variables to allow finer-grained control of Java options passed to server and client JVMs. See the new *_OPTS variables in conf/hadoop-env.sh. |
HADOOP-3099 | util |
Added a new -p option to distcp for preserving file and directory status:
-p[rbugp] Preserve status
    r: replication number
    b: block size
    u: user
    g: group
    p: permission
The -p option alone is equivalent to -prbugp. |
HADOOP-2821 | util | Removed the deprecated classes org.apache.hadoop.util.ShellUtil and org.apache.hadoop.util.ToolBase. |
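
The flag semantics listed for the distcp -p option (HADOOP-3099) can be illustrated with a small standalone sketch. This is not distcp's own code: the class PreserveFlags, the enum Attribute, and the method parse are hypothetical names chosen for illustration, and the only behavior assumed is what the note states, namely that each letter selects one attribute and that -p alone is equivalent to -prbugp.

```java
import java.util.EnumSet;

// Hypothetical illustration of the -p[rbugp] semantics; not part of Hadoop.
public class PreserveFlags {

    /** Attributes that -p can preserve, each tagged with its flag letter. */
    enum Attribute {
        REPLICATION('r'), BLOCK_SIZE('b'), USER('u'), GROUP('g'), PERMISSION('p');

        final char flag;

        Attribute(char flag) {
            this.flag = flag;
        }
    }

    /** Parses an option such as "-p" or "-pug" into the set of attributes to preserve. */
    static EnumSet<Attribute> parse(String option) {
        if (!option.startsWith("-p")) {
            throw new IllegalArgumentException("Not a -p option: " + option);
        }
        String letters = option.substring(2);
        if (letters.isEmpty()) {
            // "-p" alone is equivalent to "-prbugp": preserve everything.
            return EnumSet.allOf(Attribute.class);
        }
        EnumSet<Attribute> result = EnumSet.noneOf(Attribute.class);
        for (char c : letters.toCharArray()) {
            Attribute match = null;
            for (Attribute a : Attribute.values()) {
                if (a.flag == c) {
                    match = a;
                }
            }
            if (match == null) {
                throw new IllegalArgumentException("Unknown flag letter: " + c);
            }
            result.add(match);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("-p"));   // all five attributes
        System.out.println(parse("-pug")); // user and group only
    }
}
```

Rejecting unknown letters is a design choice of this sketch; the release note does not say how distcp itself treats them.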