Hadoop MapReduce Change Log
Trunk (unreleased changes)
INCOMPATIBLE CHANGES
MAPREDUCE-1287. Only call the partitioner with more than one reducer.
(cdouglas)
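
The behavior described in MAPREDUCE-1287 can be illustrated with a minimal sketch. The class and method names below (PartitionSketch, partitionFor, hashPartition) are hypothetical and are not the Hadoop implementation; the sketch only assumes that a job with at most one reducer sends every record to partition 0 without consulting the partitioner.

```java
import java.util.function.ToIntBiFunction;

/** Hypothetical illustration only; not the Hadoop code path. */
final class PartitionSketch {
    /**
     * Returns the partition for a key. The partitioner callback is
     * consulted only when there is more than one reducer.
     */
    static <K> int partitionFor(K key, int numReducers,
                                ToIntBiFunction<K, Integer> partitioner) {
        if (numReducers <= 1) {
            return 0; // single partition: the partitioner is never invoked
        }
        return partitioner.applyAsInt(key, numReducers);
    }

    /** A simple hash partitioner: non-negative hash modulo reducer count. */
    static int hashPartition(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }
}
```

Under this assumption, a custom partitioner with side effects (counters, logging, validation) would no longer see any records in single-reducer jobs, which may be why the change is listed as incompatible.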
NEW FEATURES
MAPREDUCE-698. Per-pool task limits for the fair scheduler.
(Kevin Peterson via matei)
MAPREDUCE-1017. Compression and output splitting for Sqoop.
(Aaron Kimball via tomwhite)
MAPREDUCE-1026. Performs mutual authentication of shuffle
transfers using a shared, JobTracker-generated key.
(Boris Shkolnik via ddas)
MAPREDUCE-1168. Export data to databases via Sqoop. (Aaron Kimball via
tomwhite)
MAPREDUCE-744. Introduces the notion of a public distributed cache.
(Devaraj Das)
MAPREDUCE-1338. Introduces a token cache through which the job client
can send tokens and secrets to the JobTracker.
(Boris Shkolnik via ddas)
HDFS-503. Adds an optional layer over HDFS that implements offline
erasure coding. It can be used to reduce the total storage
requirements of HDFS. (dhruba)
IMPROVEMENTS
MAPREDUCE-1198. Alternately schedule different types of tasks in
fair share scheduler. (Scott Chen via matei)
MAPREDUCE-707. Provide a jobconf property for explicitly assigning a job to
a pool in the Fair Scheduler. (Alan Heirich via matei)
MAPREDUCE-999. Improve Sqoop test speed and refactor tests.
(Aaron Kimball via tomwhite)
MAPREDUCE-906. Update Sqoop documentation. (Aaron Kimball via cdouglas)
MAPREDUCE-947. Added commitJob and abortJob apis to OutputCommitter.
Enhanced FileOutputCommitter to create a _SUCCESS file for successful
jobs. (Amar Kamat & Jothi Padmanabhan via acmurthy)
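
A downstream consumer could use the _SUCCESS marker from MAPREDUCE-947 to decide whether a job's output is complete. The following is a minimal sketch that assumes the output directory is reachable through the local file system via java.nio; the class and method names are hypothetical, and a real cluster would go through the Hadoop FileSystem API instead.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper; checks a local directory, not HDFS. */
final class SuccessMarkerSketch {
    /** Marker file name taken from the changelog entry. */
    static final String SUCCESS_MARKER = "_SUCCESS";

    /** Returns true if the output directory contains the success marker. */
    static boolean jobSucceeded(Path outputDir) {
        return Files.isRegularFile(outputDir.resolve(SUCCESS_MARKER));
    }
}
```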
MAPREDUCE-1103. Added more metrics to Jobtracker. (sharad)
MAPREDUCE-1048. Add occupied/reserved slot usage summary on jobtracker UI.
(Amareshwari Sriramadasu and Hemanth Yamijala via sharad)
MAPREDUCE-1090. Modified log statement in TaskMemoryManagerThread to
include task attempt id. (yhemanth)
MAPREDUCE-1069. Implement Sqoop API refactoring. (Aaron Kimball via
tomwhite)
MAPREDUCE-1036. Document Sqoop API. (Aaron Kimball via cdouglas)
MAPREDUCE-1189. Reduce ivy console output to observable level. (cos)
MAPREDUCE-1167. ProcfsBasedProcessTree collects rss memory information.
(Scott Chen via dhruba)
MAPREDUCE-1169. Improvements to mysqldump use in Sqoop.
(Aaron Kimball via tomwhite)
MAPREDUCE-1231. Added a new DistCp option, -skipcrccheck, so that the CRC
check during setup can be skipped. (Jothi Padmanabhan via szetszwo)
MAPREDUCE-1190. Add package documentation for BBP example.
(Tsz Wo (Nicholas) Sze via cdouglas)
MAPREDUCE-1119. When tasks fail to report status, show task's stack dump
before killing. (Aaron Kimball via tomwhite)
MAPREDUCE-1185. Redirect running job url to history url if job is already
retired. (Amareshwari Sriramadasu and Sharad Agarwal via sharad)
MAPREDUCE-1050. Introduce a mock object testing framework. (tomwhite)
MAPREDUCE-1084. Implementing aspects development and fault injection
framework for MapReduce. (Sreekanth Ramakrishnan via cos)
MAPREDUCE-1209. Move common specific part of the test TestReflectionUtils
out of mapred into common. (Todd Lipcon via tomwhite)
MAPREDUCE-967. TaskTracker does not need to fully unjar job jars.
(Todd Lipcon via tomwhite)
MAPREDUCE-1083. Changes in MapReduce so that group information of users
can be refreshed in the JobTracker via command line.
(Boris Shkolnik via ddas)
MAPREDUCE-181. Changes the job submission process to be secure.
(Devaraj Das)
MAPREDUCE-1250. Refactors the JobToken to use Common's Token interface.
(Kan Zhang via ddas)
MAPREDUCE-896. Enhance tasktracker to cleanup files that might have
been created by user tasks with non-writable permissions.
(Ravi Gummadi via yhemanth)
MAPREDUCE-372. Change org.apache.hadoop.mapred.lib.ChainMapper/Reducer
to use new mapreduce api. (Amareshwari Sriramadasu via sharad)
MAPREDUCE-1295. Add a tool in Rumen for folding and manipulating job
traces. (Dick King via cdouglas)
MAPREDUCE-1302. TrackerDistributedCacheManager deletes file
asynchronously, thus reducing task initialization delays.
(Zheng Shao via dhruba)
MAPREDUCE-1218. TaskTrackers send cpu and memory usage of
node to JobTracker. (Scott Chen via dhruba)
MAPREDUCE-847. Fix Releaseaudit warning count to zero
(Giridharan Kesavan)
MAPREDUCE-1337. Use generics in StreamJob to improve readability of that
class. (Kay Kay via cdouglas)
MAPREDUCE-361. Port terasort example to the new mapreduce API. (Amareshwari
Sriramadasu via cdouglas)
OPTIMIZATIONS
MAPREDUCE-270. Fix the tasktracker to optionally send an out-of-band
heartbeat on task-completion for better job-latency. (acmurthy)
Configuration changes:
add mapreduce.tasktracker.outofband.heartbeat
MAPREDUCE-1224. Calling "SELECT t.* from <table> AS t" to get meta
information is too expensive for big tables. (Spencer Ho via tomwhite)
MAPREDUCE-1186. Modified code in distributed cache to set permissions
only on required set of localized paths.
(Amareshwari Sriramadasu via yhemanth)
BUG FIXES
MAPREDUCE-1258. Fix fair scheduler event log not logging job info.
(matei)
MAPREDUCE-1089. Fix NPE in fair scheduler preemption when tasks are
scheduled but not running. (Todd Lipcon via matei)
MAPREDUCE-1014. Fix the libraries for common and hdfs. (omalley)
MAPREDUCE-1111. JT Jetty UI not working if we run mumak.sh
off packaged distribution directory. (Hong Tang via mahadev)
MAPREDUCE-1133. Eclipse .classpath template has outdated jar files and is
missing some new ones. (cos)
MAPREDUCE-1098. Fixed the distributed-cache to not do i/o while holding a
global lock. (Amareshwari Sriramadasu via acmurthy)
MAPREDUCE-1158. Fix JT running maps and running reduces metrics.
(sharad)
MAPREDUCE-1160. Reduce verbosity of log lines in some Map/Reduce classes
to avoid filling up jobtracker logs on a busy cluster.
(Ravi Gummadi and Hong Tang via yhemanth)
MAPREDUCE-1153. Fix tasktracker metrics when trackers are decommissioned.
(sharad)
MAPREDUCE-1128. Fix MRUnit to prohibit iterating over values twice. (Aaron
Kimball via cdouglas)
MAPREDUCE-665. Move libhdfs to HDFS subproject. (Eli Collins via dhruba)
MAPREDUCE-1196. Fix FileOutputCommitter to use the deprecated cleanupJob
api correctly. (acmurthy)
MAPREDUCE-1244. Fix eclipse-plugin's build dependencies. (gkesavan)
MAPREDUCE-1140. Fix DistributedCache to not decrement reference counts for
unreferenced files in error conditions.
(Amareshwari Sriramadasu via yhemanth)
MAPREDUCE-1245. Fix TestFairScheduler failures by instantiating lightweight
Jobtracker. (sharad)
MAPREDUCE-1260. Update Eclipse configuration to match changes to Ivy
configuration. (Edwin Chan via cos)
MAPREDUCE-1152. Distinguish between failed and killed tasks in
JobTrackerInstrumentation. (Sharad Agarwal via cdouglas)
MAPREDUCE-1285. In DistCp.deleteNonexisting(..), get class from the
parameter instead of using FileStatus.class. (Peter Romianowski via
szetszwo)
MAPREDUCE-1251. c++ utils doesn't compile. (Eli Collins via tomwhite)
MAPREDUCE-1148. SQL identifiers are a superset of Java identifiers.
(Aaron Kimball via tomwhite)
MAPREDUCE-1294. Build fails to pull latest hadoop-core-* artifacts (cos)
MAPREDUCE-1213. TaskTrackers restart is faster because it deletes
distributed cache directory asynchronously. (Zheng Shao via dhruba)
MAPREDUCE-1146. Sqoop dependencies break Eclipse build on Linux.
(Aaron Kimball via tomwhite)
MAPREDUCE-1174. Sqoop improperly handles table/column names which are
reserved sql words. (Aaron Kimball via tomwhite)
MAPREDUCE-1265. The task attempt error log prints the name of the
tasktracker machine. (Scott Chen via dhruba)
MAPREDUCE-1201. ProcfsBasedProcessTree collects CPU usage information.
(Scott Chen via dhruba)
MAPREDUCE-1326. fi tests don't use fi-site.xml (cos)
MAPREDUCE-1235. Fix a MySQL timestamp incompatibility in Sqoop. (Aaron
Kimball via cdouglas)
MAPREDUCE-1165. Replace non-portable function name with C99 equivalent.
(Allen Wittenauer via cdouglas)
MAPREDUCE-1331. Fixes a typo in a testcase (Devaraj Das)
MAPREDUCE-1293. AutoInputFormat doesn't work with non-default FileSystems.
(Andrew Hitchcock via tomwhite)
MAPREDUCE-1131. Using profilers other than hprof can cause JobClient to
report job failure. (Aaron Kimball via tomwhite)
MAPREDUCE-1155. Streaming tests swallow exceptions.
(Todd Lipcon via tomwhite)
MAPREDUCE-1310. CREATE TABLE statements for Hive do not correctly specify
delimiters. (Aaron Kimball via tomwhite)
MAPREDUCE-1212. Mapreduce contrib project ivy dependencies are not included
in binary target. (Aaron Kimball via tomwhite)
MAPREDUCE-1388. Move the HDFS RAID package from HDFS to MAPREDUCE.
(Eli Collins via dhruba)
MAPREDUCE-1313. Fix NPE in Sqoop when table with null fields uses escape
during import. (Aaron Kimball via cdouglas)
MAPREDUCE-1327. Fix Sqoop handling of Oracle timezone with timestamp data
types in import. (Leonid Furman via cdouglas)
Release 0.21.0 - Unreleased
INCOMPATIBLE CHANGES
MAPREDUCE-516. Fix the starvation problem in the Capacity Scheduler
when running High RAM Jobs. (Arun Murthy via yhemanth)
MAPREDUCE-358. Change org.apache.hadoop.examples.AggregateWordCount
and org.apache.hadoop.examples.AggregateWordHistogram to use new
mapreduce api. (Amareshwari Sriramadasu via sharad)
MAPREDUCE-245. Change Job and jobcontrol classes to use the List interface
rather than ArrayList in APIs. (Tom White via cdouglas)
MAPREDUCE-766. Enhanced list-blacklisted-trackers to display reasons
for blacklisting a node. (Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-817. Add a cache for retired jobs with minimal job info and
provide a way to access history file url. (sharad)
MAPREDUCE-711. Moved Distributed Cache from Common to Map/Reduce
project. (Vinod Kumar Vavilapalli via yhemanth)
MAPREDUCE-895. Per the contract elucidated in HADOOP-6201, throw
FileNotFoundException from FileSystem::listStatus rather than returning
null. (Jakob Homan via cdouglas)
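
The contract change in MAPREDUCE-895 affects callers that previously tested the listing result for null. The sketch below uses a hypothetical in-memory stand-in (ListStatusSketch) rather than the real FileSystem class; it only illustrates the caller-side pattern of handling FileNotFoundException in place of a null check.

```java
import java.io.FileNotFoundException;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a file system listing API. */
final class ListStatusSketch {
    /** Maps a directory path to the names of its children. */
    private final Map<String, List<String>> dirs;

    ListStatusSketch(Map<String, List<String>> dirs) {
        this.dirs = dirs;
    }

    /** Throws for a missing path instead of returning null. */
    List<String> listStatus(String path) throws FileNotFoundException {
        List<String> children = dirs.get(path);
        if (children == null) {
            throw new FileNotFoundException("Path does not exist: " + path);
        }
        return children;
    }

    /** Caller-side pattern: catch the exception rather than test for null. */
    int countChildren(String path) {
        try {
            return listStatus(path).size();
        } catch (FileNotFoundException e) {
            return 0; // this caller chooses to treat a missing directory as empty
        }
    }
}
```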
MAPREDUCE-479. Provide full task id to map output servlet rather than the
reduce id, only. (Jiaqi Tan via cdouglas)
MAPREDUCE-873. Simplify job recovery. Incomplete jobs are resubmitted on
jobtracker restart. Removes a public constructor in JobInProgress. (sharad)
HADOOP-6230. Moved process tree and memory calculator related classes from
Common to Map/Reduce. (Vinod Kumar Vavilapalli via yhemanth)
MAPREDUCE-157. Refactor job history APIs and change the history format to
JSON. (Jothi Padmanabhan via sharad)
MAPREDUCE-849. Rename configuration properties. (Amareshwari Sriramadasu
via sharad)
NEW FEATURES
MAPREDUCE-706. Support for FIFO pools in the fair scheduler.
(Matei Zaharia)
MAPREDUCE-546. Provide sample fair scheduler config file in conf/ and use
it by default if no other config file is specified. (Matei Zaharia)
MAPREDUCE-551. Preemption support in the Fair Scheduler. (Matei Zaharia)
HADOOP-5887. Sqoop should create tables in Hive metastore after importing
to HDFS. (Aaron Kimball via tomwhite)
MAPREDUCE-567. Add a new example MR that always fails. (Philip Zeyliger
via tomwhite)
MAPREDUCE-211. Provides ability to run a health check script on the
tasktracker nodes and blacklist nodes if they are unhealthy.
(Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-637. Add an example, distbbp, which is able to compute the nth bit
of Pi for some large n. (szetszwo)
MAPREDUCE-532. Provide a way to limit the number of used slots
per queue in the capacity scheduler.
(Rahul Kumar Singh via yhemanth)
MAPREDUCE-467. Provide ability to collect statistics about total tasks
and succeeded tasks in different time windows. (sharad)
MAPREDUCE-740. Log a job-summary at the end of a job, while allowing it
to be configured to use a custom appender if desired. (acmurthy)
MAPREDUCE-705. User-configurable quote and delimiter characters for Sqoop
records and record reparsing. (Aaron Kimball via tomwhite)
MAPREDUCE-814. Provide a way to configure completed job history files
to be on HDFS. (sharad)
MAPREDUCE-800. MRUnit should support the new API. (Aaron Kimball via
tomwhite)
MAPREDUCE-798. MRUnit should be able to test a succession of MapReduce
passes. (Aaron Kimball via tomwhite)
MAPREDUCE-768. Provide an option to dump jobtracker configuration in JSON
format to standard output. (V.V.Chaitanya Krishna via yhemanth)
MAPREDUCE-824. Add support for a hierarchy of queues in the capacity
scheduler. (Rahul Kumar Singh via yhemanth)
MAPREDUCE-751. Add Rumen, a tool for extracting statistics from job tracker
logs and generating job traces for simulation and analysis. (Dick King via
cdouglas)
MAPREDUCE-938. Postgresql support for Sqoop. (Aaron Kimball via tomwhite)
MAPREDUCE-830. Add support for splittable compression to TextInputFormats.
(Abdul Qadeer via cdouglas)
MAPREDUCE-861. Add support for hierarchical queues in the Map/Reduce
framework. (Rahul Kumar Singh via yhemanth)
MAPREDUCE-776. Add Gridmix, a benchmark processing Rumen traces to simulate
a measured mix of jobs on a cluster. (cdouglas)
MAPREDUCE-862. Enhance JobTracker UI to display hierarchical queues.
(V.V.Chaitanya Krishna via yhemanth)
MAPREDUCE-777. Brand new apis to track and query jobs as a
replacement for JobClient. (Amareshwari Sriramadasu via acmurthy)
MAPREDUCE-775. Add native and streaming support for Vertica as an input
or output format taking advantage of parallel read and write properties of
the DBMS. (Omer Trajman via ddas)
MAPREDUCE-679. XML-based metrics as JSP servlet for JobTracker.
(Aaron Kimball via tomwhite)
MAPREDUCE-980. Modify JobHistory to use Avro for serialization. (cutting)
MAPREDUCE-728. Add Mumak, a Hadoop map/reduce simulator. (Arun C Murthy,
Tamas Sarlos, Anirban Dasgupta, Guanying Wang, and Hong Tang via cdouglas)
IMPROVEMENTS
MAPREDUCE-816. Rename "local" mysql import to "direct" in Sqoop.
(Aaron Kimball via matei)
HADOOP-5967. Sqoop should only use a single map task. (Aaron Kimball via
tomwhite)
HADOOP-5968. Sqoop should only print a warning about mysql import speed
once. (Aaron Kimball via tomwhite)
MAPREDUCE-463. Makes job setup and cleanup tasks optional.
(Amareshwari Sriramadasu via sharad)
MAPREDUCE-502. Allow jobtracker to be configured with zero completed jobs
in memory. (Amar Kamat via sharad)
MAPREDUCE-416. Moves the history file to a "done" folder whenever a job
completes. (Amar Kamat via ddas)
MAPREDUCE-646. Increase srcfilelist replication number in distcp job.
(Ravi Gummadi via szetszwo)
HADOOP-6106. Updated hadoop-core and test jars from hudson trunk
build #12. (Giridharan Kesavan)
MAPREDUCE-642. An option to distcp that allows preserving the full
source path of a file in the specified destination directory.
(Rodrigo Schmidt via dhruba)
MAPREDUCE-686. Move TestSpeculativeExecution.Fake* into a separate class
so that it can be used by other tests. (Jothi Padmanabhan via sharad)
MAPREDUCE-625. Modify TestTaskLimits to improve execution time.
(Jothi Padmanabhan via sharad)
MAPREDUCE-465. Deprecate o.a.h.mapred.lib.MultithreadedMapRunner and add
test for o.a.h.mapreduce.lib.MultithreadedMapper.
(Amareshwari Sriramadasu via sharad)
MAPREDUCE-692. Make Hudson run Sqoop unit tests.
(Aaron Kimball via tomwhite)
MAPREDUCE-701. Improves the runtime of the TestRackAwareTaskPlacement
by making it a unit test. (Jothi Padmanabhan via ddas)
MAPREDUCE-674. Sqoop should allow a "where" clause to avoid having to
export entire tables. (Kevin Weil via tomwhite)
MAPREDUCE-675. Sqoop should allow user-defined class and package names.
(Aaron Kimball via tomwhite)
MAPREDUCE-371. Change KeyFieldBasedComparator and KeyFieldBasedPartitioner
to use new api. (Amareshwari Sriramadasu via sharad)
MAPREDUCE-713. Sqoop has some superfluous imports. (Aaron Kimball via
tomwhite)
MAPREDUCE-623. Resolve javac warnings in mapreduce. (Jothi Padmanabhan
via sharad)
MAPREDUCE-655. Change KeyValueLineRecordReader and KeyValueTextInputFormat
to use new mapreduce api. (Amareshwari Sriramadasu via sharad)
MAPREDUCE-632. Merge TestCustomOutputCommitter with
TestCommandLineJobSubmission. (Jothi Padmanabhan via sharad)
MAPREDUCE-710. Sqoop should read and transmit passwords in a more secure
manner. (Aaron Kimball via tomwhite)
MAPREDUCE-627. Improves execution time of TestTrackerBlacklistAcrossJobs.
(Jothi Padmanabhan via ddas)
MAPREDUCE-630. Improves execution time of TestKillCompletedJob.
(Jothi Padmanabhan via ddas)
MAPREDUCE-626. Improves the execution time of TestLostTracker.
(Jothi Padmanabhan via ddas)
MAPREDUCE-353. Makes the shuffle read and connection timeouts
configurable. (Ravi Gummadi via ddas)
MAPREDUCE-739. Allow relative paths to be created in archives. (Mahadev
Konar via cdouglas)
MAPREDUCE-772. Merge HADOOP-4010 changes to LineRecordReader into mapreduce
package. (Abdul Qadeer via cdouglas)
MAPREDUCE-785. Separate sub-test of TestReduceFetch to be included in
MR-670. (Jothi Padmanabhan via cdouglas)
MAPREDUCE-784. Modify TestUserDefinedCounters to use LocalJobRunner
instead of MiniMR. (Jothi Padmanabhan via sharad)
HADOOP-6160. Fix releaseaudit target to run on specific directories.
(gkesavan)
MAPREDUCE-782. Use PureJavaCrc32 in SpillRecord. (Todd Lipcon via
szetszwo)
MAPREDUCE-369. Change org.apache.hadoop.mapred.lib.MultipleInputs to
use new api. (Amareshwari Sriramadasu via sharad)
MAPREDUCE-373. Change org.apache.hadoop.mapred.lib.FieldSelectionMapReduce
to use new api. (Amareshwari Sriramadasu via sharad)
MAPREDUCE-628. Improves the execution time of TestJobInProgress.
(Jothi Padmanabhan via ddas)
MAPREDUCE-793. Creates a new test that consolidates a few tests to
include in the commit-test list. (Jothi Padmanabhan via ddas)
MAPREDUCE-797. Adds combiner support to MRUnit MapReduceDriver.
(Aaron Kimball via johan)
MAPREDUCE-656. Change org.apache.hadoop.mapred.SequenceFile* classes
to use new mapreduce api. (Amareshwari Sriramadasu via sharad)
MAPREDUCE-670. Creates ant target for 10 mins patch test build.
(Jothi Padmanabhan via gkesavan)
MAPREDUCE-375. Change org.apache.hadoop.mapred.lib.NLineInputFormat
and org.apache.hadoop.mapred.MapFileOutputFormat to use new api.
(Amareshwari Sriramadasu via ddas)
MAPREDUCE-779. Added node health failure counts into
JobTrackerStatistics. (Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-789. Oracle support for Sqoop. (Aaron Kimball via tomwhite)
MAPREDUCE-842. Setup secure permissions for localized job files,
intermediate outputs and log files on tasktrackers.
(Vinod Kumar Vavilapalli via yhemanth)
MAPREDUCE-478. Allow map and reduce jvm parameters, environment variables
and ulimit to be set separately.
Configuration changes:
add mapred.map.child.java.opts
add mapred.reduce.child.java.opts
add mapred.map.child.env
add mapred.reduce.child.env
add mapred.map.child.ulimit
add mapred.reduce.child.ulimit
deprecated mapred.child.java.opts
deprecated mapred.child.env
deprecated mapred.child.ulimit
(acmurthy)
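
One way to read the MAPREDUCE-478 property list is that a per-task-type property, when set, takes precedence over the deprecated shared one. The sketch below assumes exactly that fallback order and models the configuration with java.util.Properties; the class and method names are hypothetical, and the fallback rule itself is an assumption rather than something this changelog states.

```java
import java.util.Properties;

/** Hypothetical lookup helper; the fallback order is an assumption. */
final class ChildOptsSketch {
    /** Resolves JVM options for a map or reduce task. */
    static String javaOpts(Properties conf, boolean isMap) {
        String specific = isMap
                ? "mapred.map.child.java.opts"
                : "mapred.reduce.child.java.opts";
        // Fall back to the deprecated shared key, then to an empty string.
        String fallback = conf.getProperty("mapred.child.java.opts", "");
        return conf.getProperty(specific, fallback);
    }
}
```

With only mapred.child.java.opts set, both task types would resolve to the same options; setting mapred.map.child.java.opts would then change the result for map tasks alone.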
MAPREDUCE-767. Remove the dependence on the CLI 2.0 snapshot.
(Amar Kamat via omalley)
MAPREDUCE-712. Minor efficiency tweaks to RandomTextWriter. (cdouglas)
MAPREDUCE-870. Remove the job retire thread and the associated
config parameters. (sharad)
MAPREDUCE-749. Make Sqoop unit tests more Hudson-friendly.
(Aaron Kimball via tomwhite)
MAPREDUCE-874. Rename the PiEstimator example to QuasiMonteCarlo.
(szetszwo)
MAPREDUCE-336. Allow logging level of map/reduce tasks to be configurable.
Configuration changes:
add mapred.map.child.log.level
add mapred.reduce.child.log.level
(acmurthy)
MAPREDUCE-355. Update mapred.join package to use the new API. (Amareshwari
Sriramadasu via cdouglas)
HADOOP-6184. Updated hadoop common and test jars to get the new API
in Configuration for dumping in JSON format from Hudson trunk build #68.
(yhemanth)
MAPREDUCE-476. Extend DistributedCache to work locally (LocalJobRunner).
(Philip Zeyliger via tomwhite)
MAPREDUCE-750. Extensible ConnManager factory API. (Aaron Kimball via
tomwhite)
MAPREDUCE-825. JobClient completion poll interval of 5s causes slow tests
in local mode. (Aaron Kimball via tomwhite)
MAPREDUCE-910. Support counters in MRUnit. (Aaron Kimball via cdouglas)
MAPREDUCE-788. Update gridmix2 to use the new API (Amareshwari Sriramadasu
via cdouglas)
MAPREDUCE-875. Make DBRecordReader execute queries lazily. (Aaron Kimball
via enis)
MAPREDUCE-318. Modularizes the shuffle code. (Jothi Padmanabhan and
Arun Murthy via ddas)
MAPREDUCE-936. Allow a load difference for fairshare scheduler.
(Zheng Shao via dhruba)
MAPREDUCE-370. Update MultipleOutputs to use the API, merge functionality
of MultipleOutputFormat. (Amareshwari Sriramadasu via cdouglas)
MAPREDUCE-898. Changes DistributedCache to use the new API.
(Amareshwari Sriramadasu via ddas)
MAPREDUCE-876. Sqoop import of large tables can time out.
(Aaron Kimball via tomwhite)
MAPREDUCE-918. Test hsqldb server should be memory-only.
(Aaron Kimball via tomwhite)
MAPREDUCE-144. Includes dump of the process tree in task diagnostics when
a task is killed due to exceeding memory limits.
(Vinod Kumar Vavilapalli via yhemanth)
MAPREDUCE-945. Modifies MRBench and TestMapRed to use ToolRunner so that
options such as queue name can be passed via command line.
(Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-963. Deprecate o.a.h.mapred.FileAlreadyExistsException and
replace it with o.a.h.fs.FileAlreadyExistsException. (Boris Shkolnik
via szetszwo)
MAPREDUCE-960. Remove an unnecessary intermediate copy and obsolete API
from KeyValueLineRecordReader. (cdouglas)
MAPREDUCE-930. Modify Rumen to resolve paths in the canonical way, rather
than defaulting to the local filesystem. (cdouglas)
MAPREDUCE-944. Extend the LoadManager API of the fair-share scheduler
to support regulating tasks for a job based on resources currently in use
by that job. (dhruba)
MAPREDUCE-973. Move FailJob and SleepJob from examples to test. (cdouglas
via omalley)
MAPREDUCE-966. Modify Rumen to clean up interfaces and simplify integration
with other tools. (Hong Tang via cdouglas)
MAPREDUCE-856. Setup secure permissions for distributed cache files.
(Vinod Kumar Vavilapalli via yhemanth)
MAPREDUCE-885. More efficient SQL queries for DBInputFormat. (Aaron Kimball
via enis)
MAPREDUCE-284. Enables ipc.client.tcpnodelay in Tasktracker's Child.
(Ravi Gummadi via sharad)
MAPREDUCE-907. Sqoop should use more intelligent splits. (Aaron Kimball
via tomwhite)
MAPREDUCE-916. Split the documentation to match the project split.
(Corinne Chandel via omalley)
MAPREDUCE-649. Validate a copy by comparing the source and destination
checksums in distcp. Also adds an intra-task retry mechanism for errors
detected during the copy. (Ravi Gummadi via cdouglas)
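
The validate-then-retry idea in MAPREDUCE-649 can be sketched as follows. This is not the distcp implementation: it operates on in-memory byte arrays, uses java.util.zip.CRC32 as a stand-in for the file system checksum, and the names CopyValidationSketch, checksum, and copyWithRetry are hypothetical.

```java
import java.util.function.Supplier;
import java.util.zip.CRC32;

/** Hypothetical illustration of checksum validation with retries. */
final class CopyValidationSketch {
    /** CRC32 of a byte array, used here as a stand-in checksum. */
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /**
     * Runs the copy attempt until the destination checksum matches the
     * source checksum, or fails after maxAttempts attempts.
     */
    static byte[] copyWithRetry(byte[] source, Supplier<byte[]> copyAttempt,
                                int maxAttempts) {
        long expected = checksum(source);
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            byte[] copied = copyAttempt.get();
            if (checksum(copied) == expected) {
                return copied; // destination matches the source
            }
        }
        throw new IllegalStateException(
                "Checksum mismatch after " + maxAttempts + " attempts");
    }
}
```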
MAPREDUCE-654. Add a -dryrun option to distcp printing a summary of the
file data to be copied, without actually performing the copy. (Ravi Gummadi
via cdouglas)
MAPREDUCE-664. Display the number of files deleted by distcp when the
-delete option is specified. (Ravi Gummadi via cdouglas)
MAPREDUCE-781. Let the name of distcp jobs be configurable. (Venkatesh S
via cdouglas)
MAPREDUCE-975. Add an API in job client to get the history file url for
a given job id. (sharad)
MAPREDUCE-905. Add Eclipse launch tasks for MapReduce. (Philip Zeyliger
via tomwhite)
MAPREDUCE-277. Makes job history counters available on the job history
viewers. (Jothi Padmanabhan via ddas)
MAPREDUCE-893. Provides an ability to refresh queue configuration
without restarting the JobTracker.
(Vinod Kumar Vavilapalli and Rahul Kumar Singh via yhemanth)
MAPREDUCE-1011. Add build.properties to svn and git ignore. (omalley)
MAPREDUCE-954. Change Map-Reduce context objects to be interfaces.
(acmurthy)
MAPREDUCE-639. Change Terasort example to reflect the 2009 updates.
(omalley)
MAPREDUCE-1063. Document gridmix benchmark. (cdouglas)
MAPREDUCE-931. Use built-in interpolation classes for making up task
runtimes in Rumen. (Dick King via cdouglas)
MAPREDUCE-1012. Mark Context interfaces as public evolving. (Tom White via
cdouglas)
MAPREDUCE-971. Document use of distcp when copying to s3, managing timeouts
in particular. (Aaron Kimball via cdouglas)
HDFS-663. DFSIO for append. (shv)
HDFS-641. Move all of the components that depend on map/reduce to
map/reduce. (omalley)
HADOOP-5107. Use Maven ant tasks to publish artifacts. (Giridharan Kesavan
via omalley)
MAPREDUCE-1229. Allow customization of job submission policy in Mumak.
(Hong Tang via cdouglas)
MAPREDUCE-1317. Reduce the memory footprint of Rumen objects by interning
host Strings. (Hong Tang via cdouglas)
MAPREDUCE-1097. Add support for Vertica 3.5 to its contrib module. (Omer
Trajman via cdouglas)
BUG FIXES
MAPREDUCE-878. Rename fair scheduler design doc to
fair-scheduler-design-doc.tex and add Apache license header (matei)
MAPREDUCE-703. Sqoop requires dependency on hsqldb in ivy.
(Aaron Kimball via matei)
HADOOP-4687. MapReduce is split from Hadoop Core. It is a subproject under
Hadoop (Owen O'Malley)
HADOOP-6096. Fix Eclipse project and classpath files following project
split. (tomwhite)
MAPREDUCE-419. Reconcile mapred.userlog.limit.kb defaults in configuration
and code. (Philip Zeyliger via cdouglas)
MAPREDUCE-2. Fixes a bug in KeyFieldBasedPartitioner in handling empty
keys. (Amar Kamat via sharad)
MAPREDUCE-130. Delete the jobconf copy from the log directory of the
JobTracker when the job is retired. (Amar Kamat via sharad)
MAPREDUCE-657. Fix hardcoded filesystem problem in CompletedJobStatusStore.
(Amar Kamat via sharad)
MAPREDUCE-179. Update progress in new RecordReaders. (cdouglas)
MAPREDUCE-658. Replace NPE in distcp with a meaningful error message when
the source path does not exist. (Ravi Gummadi via cdouglas)
MAPREDUCE-671. Update ignore list to include untracked, generated
build artifacts and config files. (cdouglas)
MAPREDUCE-433. Use more reliable counters in TestReduceFetch. (cdouglas)
MAPREDUCE-124. Fix a bug in failure handling of abort task of
OutputCommitter. (Amareshwari Sriramadasu via sharad)
MAPREDUCE-673. Sqoop depends on commons-cli, which is not in its ivy.xml.
(Kevin Weil via tomwhite)
MAPREDUCE-694. Fix to add jsp-api jars to capacity-scheduler classpath.
(Giridharan Kesavan)
MAPREDUCE-690. Sqoop's test "hive" script needs to be executable.
(Aaron Kimball via tomwhite)
MAPREDUCE-702. Fix eclipse-plugin jar target (Giridharan Kesavan)
MAPREDUCE-522. Replace TestQueueCapacities with simpler test case to
test integration between capacity scheduler and MR framework.
(Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-683. Fixes an initialization problem in the JobHistory.
The initialization of JobHistoryFilesManager is now done in the
JobHistory.init call. (Amar Kamat via ddas)
MAPREDUCE-708. Fixes a bug to allow updating the reason for
blacklisting a node on the JobTracker UI.
(Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-709. Fixes message displayed for a blacklisted node where
the reason for blacklisting is due to the health check script
timing out. (Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-676. Existing diagnostic rules fail for MAP ONLY jobs.
(Suhas Gogate via tomwhite)
MAPREDUCE-722. Fixes a bug with tasktracker reservations for
high memory jobs in capacity scheduler.
(Vinod Kumar Vavilapalli via yhemanth)
HADOOP-6090. Updates gridmix script to use new mapreduce api output
format. (Amareshwari Sriramadasu via sharad)
MAPREDUCE-732. Removed spurious log statements in the node
blacklisting logic. (Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-685. Sqoop will fail with OutOfMemory on large tables
using mysql. (Aaron Kimball via tomwhite)
MAPREDUCE-734. Fix a ConcurrentModificationException in unreserving
unused reservations for a job when it completes.
(Arun Murthy and Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-733. Fix a RuntimeException while unreserving trackers
that are blacklisted for a job.
(Arun Murthy and Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-677. Fix timeout in TestNodeRefresh. (Amar Kamat via
sharad)
MAPREDUCE-153. Fix timeout in TestJobInProgressListener. (Amar
Kamat via sharad)
MAPREDUCE-742. Fix output messages and java comments in the Pi related
examples. (szetszwo)
MAPREDUCE-565. Fix partitioner to work with new API. (Owen O'Malley via
cdouglas)
MAPREDUCE-680. Fix so MRUnit can handle reuse of Writable objects.
(Aaron Kimball via johan)
MAPREDUCE-18. Adds checks to verify that a reduce task gets the
correct shuffle data. (Ravi Gummadi via ddas)
MAPREDUCE-771. Fix scheduling of setup and cleanup tasks to use
free slots instead of tasks for scheduling. (yhemanth)
MAPREDUCE-717. Fixes some corner case issues in speculative
execution heuristics. (Devaraj Das)
MAPREDUCE-716. Make DBInputFormat work with Oracle. (Aaron Kimball
via tomwhite)
MAPREDUCE-735. Fixes a problem in the KeyFieldHelper to do with
the end index for some inputs (Amar Kamat via ddas)
MAPREDUCE-682. Removes reservations on tasktrackers which are
blacklisted. (Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-743. Fixes a problem to do with progress reporting
in the map phase. (Ravi Gummadi via ddas)
MAPREDUCE-765. Eliminate the deprecated warnings introduced by H-5438.
(He Yongqiang via szetszwo)
MAPREDUCE-383. Fix a bug in Pipes combiner due to bytes count not
getting reset after the spill. (Christian Kunz via sharad)
MAPREDUCE-809. Fix job-summary logs to correctly record status of FAILED
and KILLED jobs. (acmurthy)
MAPREDUCE-792. Fix unchecked warnings in DBInputFormat. (Aaron Kimball
via szetszwo)
MAPREDUCE-760. Fix a timing issue in TestNodeRefresh. (Amar Kamat via
sharad)
MAPREDUCE-40. Keep memory management backwards compatible for job
configuration parameters and limits. (Rahul Kumar Singh via yhemanth)
MAPREDUCE-587. Fixes an OOM issue in TestStreamingExitStatus.
(Amar Kamat via ddas)
MAPREDUCE-408. Fixes an assertion problem in TestKillSubProcesses
(Ravi Gummadi via ddas)
MAPREDUCE-659. Fix gridmix2 compilation. (Giridharan Kesavan)
MAPREDUCE-796. Fixes a ClassCastException in an exception log in
MultithreadedMapRunner. (Amar Kamat via ddas)
MAPREDUCE-808. Fixes a serialization problem in TypedBytes.
(Klaas Bosteels via ddas)
MAPREDUCE-845. Fix a findbugs heap size problem in build.xml and add
a new property findbugs.heap.size. (Lee Tucker via szetszwo)
MAPREDUCE-838. Fixes a problem in the way commit of task outputs
happens. The bug was that even if commit failed, the task would
be declared as successful. (Amareshwari Sriramadasu via ddas)
MAPREDUCE-813. Updates Streaming and M/R tutorial documents.
(Corinne Chandel via ddas)
MAPREDUCE-805. Fixes some deadlocks in the JobTracker due to the fact
the JobTracker lock hierarchy wasn't maintained in some JobInProgress
method calls. (Amar Kamat via ddas)
MAPREDUCE-799. Fixes so all of the MRUnit self-tests run.
(Aaron Kimball via johan)
MAPREDUCE-848. Fixes a problem to do with TestCapacityScheduler
failing (Amar Kamat via ddas)
MAPREDUCE-840. DBInputFormat leaves open transaction.
(Aaron Kimball via tomwhite)
MAPREDUCE-859. Adds Avro and its dependencies required by Hadoop
common. (Ravi Gummadi via sharad)
MAPREDUCE-867. Fix ivy conf to look for avro jar from maven repo.
(Giridharan Kesavan)
MAPREDUCE-877. Added avro as a dependency to contrib ivy settings.
(Tsz Wo (Nicholas) Sze via yhemanth)
MAPREDUCE-852. In build.xml, remove the Main-Class, which is incorrectly
set in tools, and rename the target "tools-jar" to "tools". (szetszwo)
MAPREDUCE-773. Sends progress reports for compressed gzip inputs in maps.
Fixes a native direct buffer leak in LineRecordReader classes.
(Hong Tang and ddas)
MAPREDUCE-832. Reduce number of warning messages printed when
deprecated memory variables are used. (Rahul Kumar Singh via yhemanth)
MAPREDUCE-745. Fixes a testcase problem to do with generation of JobTracker
IDs. (Amar Kamat via ddas)
MAPREDUCE-834. Enables memory management on tasktrackers when old
memory management parameters are used in configuration.
(Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-818. Fixes Counters#getGroup API. (Amareshwari Sriramadasu
via sharad)
MAPREDUCE-807. Handles the AccessControlException during the deletion of
mapred.system.dir in the JobTracker. The JobTracker will bail out if it
encounters such an exception. (Amar Kamat via ddas)
MAPREDUCE-430. Fix a bug related to task getting stuck in case of
OOM error. (Amar Kamat via ddas)
MAPREDUCE-871. Fix ownership of Job/Task local files to have correct
group ownership according to the egid of the tasktracker.
(Vinod Kumar Vavilapalli via yhemanth)
MAPREDUCE-911. Fix a bug in TestTaskFail related to speculative
execution. (Amareshwari Sriramadasu via sharad)
MAPREDUCE-687. Fix an assertion in TestMiniMRMapRedDebugScript.
(Amareshwari Sriramadasu via sharad)
MAPREDUCE-924. Fixes the TestPipes testcase to use Tool.
(Amareshwari Sriramadasu via sharad)
MAPREDUCE-903. Add Avro jar to eclipse classpath.
(Philip Zeyliger via tomwhite)
MAPREDUCE-943. Removes a testcase in TestNodeRefresh that doesn't make
sense in the new Job recovery model. (Amar Kamat via ddas)
MAPREDUCE-764. TypedBytesInput's readRaw() does not preserve custom type
codes. (Klaas Bosteels via tomwhite)
HADOOP-6243. Fixes a NullPointerException in handling deprecated keys.
(Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-968. NPE in distcp encountered when placing _logs directory on
S3FileSystem. (Aaron Kimball via tomwhite)
MAPREDUCE-826. harchive doesn't use ToolRunner / harchive returns 0 even
if the job fails with exception. (Koji Noguchi via mahadev)
MAPREDUCE-839. Unit test TestMiniMRChildTask fails on Mac OS X. (Hong Tang
via mahadev)
MAPREDUCE-112. Add counters for reduce input, output records to the new API.
(Jothi Padmanabhan via cdouglas)
MAPREDUCE-648. Fix two distcp bugs: (1) it should not launch a job if all
src paths are directories, and (2) it does not skip copying when updating
a single file. (Ravi Gummadi via szetszwo)
MAPREDUCE-946. Fix a regression in LineRecordReader where the
maxBytesToConsume parameter is not set correctly. (cdouglas)
MAPREDUCE-977. Missing jackson jars from Eclipse template. (tomwhite)
MAPREDUCE-988. Fix a packaging issue in the contrib modules. (Hong Tang via
cdouglas)
MAPREDUCE-971. distcp does not always remove distcp.tmp.dir. (Aaron Kimball
via tomwhite)
MAPREDUCE-995. Fix a bug in JobHistory where tasks completing after the job
is closed cause a NPE. (Jothi Padmanabhan via cdouglas)
MAPREDUCE-953. Fix QueueManager to dump queue configuration in JSON format.
(V.V. Chaitanya Krishna via yhemanth)
MAPREDUCE-645. Prevent distcp from running a job when the destination is a
file, but the source is not. (Ravi Gummadi via cdouglas)
MAPREDUCE-1002. Flushed writer in JobQueueClient so queue information is
printed correctly. (V.V. Chaitanya Krishna via yhemanth)
MAPREDUCE-1003. Fix compilation problem in eclipse plugin when
eclipse.home is set. (Ravi Gummadi via yhemanth)
MAPREDUCE-941. Vaidya script fails on Solaris. (Chad Metcalf
via tomwhite)
MAPREDUCE-923. Sqoop classpath breaks for jar files with a
plus sign in their names. (Aaron Kimball via tomwhite)
MAPREDUCE-912. Add and standardize Apache license headers. (Chad Metcalf
via cdouglas)
MAPREDUCE-1022. Fix compilation of vertica testcases. (Vinod Kumar
Vavilapalli via acmurthy)
MAPREDUCE-1000. Handle corrupt history files in JobHistory.initDone().
(Jothi Padmanabhan via sharad)
MAPREDUCE-1028. Fixed number of slots occupied by cleanup tasks to one
irrespective of slot size for the job.
(Ravi Gummadi via yhemanth)
MAPREDUCE-964. Fixed start and finish times of TaskStatus to be
consistent, thereby fixing inconsistencies in metering tasks.
(Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-1076. Deprecate ClusterStatus and add javadoc in ClusterMetrics.
(Amareshwari Sriramadasu via sharad)
MAPREDUCE-979. Fixed JobConf APIs related to memory parameters to return
values of new configuration variables when deprecated variables are
disabled. (Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-1030. Modified scheduling algorithm to return a map and reduce
task per heartbeat in the capacity scheduler.
(Rahul Kumar Singh via yhemanth)
MAPREDUCE-1071. Use DataInputStream rather than FSDataInputStream in the
JobHistory EventReader. (Hong Tang via cdouglas)
MAPREDUCE-986. Fix Rumen to work with truncated task lines. (Dick King via
cdouglas)
MAPREDUCE-1029. Fix failing TestCopyFiles by restoring the unzipping of
HDFS webapps from the hdfs jar. (Aaron Kimball and Jothi Padmanabhan via
cdouglas)
MAPREDUCE-769. Reduce findbugs and javac warnings to zero.
(Amareshwari Sriramadasu via sharad)
MAPREDUCE-1104. Initialize RecoveryManager in the JobTracker constructor
called by Mumak. (Hong Tang via cdouglas)
MAPREDUCE-1061. Add unit test validating byte specifications for gridmix
jobs. (cdouglas)
MAPREDUCE-1077. Fix Rumen so that truncated tasks do not mark the job as
successful. (Dick King via cdouglas)
MAPREDUCE-1041. Make TaskInProgress::taskStatuses map package-private.
(Jothi Padmanabhan via cdouglas)
MAPREDUCE-1070. Prevent a deadlock in the fair scheduler servlet.
(Todd Lipcon via cdouglas)
MAPREDUCE-1086. Setup Hadoop logging environment for tasks to point to
task related parameters. (Ravi Gummadi via yhemanth)
MAPREDUCE-1105. Remove max limit configuration in capacity scheduler in
favor of max capacity percentage thus allowing the limit to go over
queue capacity. (Rahul Kumar Singh via yhemanth)
MAPREDUCE-1016. Make the job history log format JSON. (cutting)
MAPREDUCE-1038. Weave Mumak aspects only if related files have changed.
(Aaron Kimball via cdouglas)
MAPREDUCE-1037. Continue running contrib tests if Sqoop tests fail. (Aaron
Kimball via cdouglas)
MAPREDUCE-1163. Remove unused, hard-coded paths from libhdfs. (Allen
Wittenauer via cdouglas)
MAPREDUCE-962. Fix a NullPointerException while killing task process
trees. (Ravi Gummadi via yhemanth)
MAPREDUCE-1177. Correct setup/cleanup inversion in
JobTracker::getTaskReports. (Vinod Kumar Vavilapalli via cdouglas)
MAPREDUCE-1178. Fix ClassCastException in MultipleInputs by adding
a DelegatingRecordReader. (Amareshwari Sriramadasu and Jay Booth
via sharad)
MAPREDUCE-1068. Fix streaming job to show proper message if file is
not present. (Amareshwari Sriramadasu via sharad)
MAPREDUCE-1147. Add map output counters to new API. (Amar Kamat via
cdouglas)
MAPREDUCE-915. The debug scripts are run as the job user. (ddas)
MAPREDUCE-1007. Fix NPE in CapacityTaskScheduler.getJobs().
(V.V.Chaitanya Krishna via sharad)
MAPREDUCE-28. Refactor TestQueueManager and fix default ACLs.
(V.V.Chaitanya Krishna and Rahul K Singh via sharad)
MAPREDUCE-1182. Fix overflow in reduce causing allocations to exceed the
configured threshold. (cdouglas)
MAPREDUCE-1239. Fix contrib components build dependencies.
(Giridharan Kesavan and omalley)
MAPREDUCE-787. Fix JobSubmitter to honor user given symlink path.
(Amareshwari Sriramadasu via sharad)
MAPREDUCE-1249. Update config default value for socket read timeout to
match code default. (Amareshwari Sriramadasu via cdouglas)
MAPREDUCE-1161. Remove ineffective synchronization in NotificationTestCase.
(Owen O'Malley via cdouglas)
MAPREDUCE-1244. Fix eclipse-plugin's build dependencies. (gkesavan)
MAPREDUCE-1075. Fix JobTracker to not throw an NPE for a non-existent
queue. (V.V.Chaitanya Krishna via yhemanth)
MAPREDUCE-754. Fix NPE in expiry thread when a TT is lost. (Amar Kamat
via sharad)
MAPREDUCE-1074. Document Reducer mark/reset functionality. (Jothi
Padmanabhan via cdouglas)
MAPREDUCE-1267. Fix typo in mapred-default.xml. (Todd Lipcon via cdouglas)
MAPREDUCE-952. Remove inadvertently reintroduced Task.Counter enum. (Jothi
Padmanabhan via cdouglas)
MAPREDUCE-1230. Fix handling of null records in VerticaInputFormat. (Omer
Trajman via cdouglas)
MAPREDUCE-1171. Allow shuffle retries and read-error reporting to be
configurable. (Amareshwari Sriramadasu via acmurthy)
MAPREDUCE-879. Fix broken unit test TestTaskTrackerLocalization on MacOS.
(Sreekanth Ramakrishnan via yhemanth)
MAPREDUCE-1124. Fix imprecise byte counts in Gridmix. (cdouglas)
MAPREDUCE-1222. Add an option to exclude numeric IP addresses in topologies
processed by Mumak. (Hong Tang via cdouglas)
MAPREDUCE-1284. Fix fts_open() call in task-controller that was failing
LinuxTaskController unit tests. (Ravi Gummadi via yhemanth)
MAPREDUCE-1143. Fix running task counters to be updated correctly
when speculative attempts are running for a TIP.
(Rahul Kumar Singh via yhemanth)
MAPREDUCE-1241. Use a default queue configuration in JobTracker when
mapred-queues.xml is unavailable. (Todd Lipcon via cdouglas)
MAPREDUCE-1301. Fix set up of permission checking script used in
localization tests. (Amareshwari Sriramadasu via yhemanth)
MAPREDUCE-1286. Remove quoting from client opts in TaskRunner. (Yuri
Pradkin via cdouglas)
MAPREDUCE-1059. Use distcp.bytes.per.map when adding sync markers in
distcp. (Aaron Kimball via cdouglas)
MAPREDUCE-1009. Update forrest documentation describing hierarchical
queues. (Vinod Kumar Vavilapalli via yhemanth)
MAPREDUCE-1342. Fixed deadlock in global blacklisting of tasktrackers.
(Amareshwari Sriramadasu via acmurthy)
MAPREDUCE-1316. Fixes a memory leak of TaskInProgress instances in
the jobtracker. (Amar Kamat via yhemanth)
MAPREDUCE-1359. TypedBytes TestIO doesn't mkdir its test dir first.
(Anatoli Fomenko via cos)
MAPREDUCE-1314. Correct errant mapreduce.x.mapreduce.x replacements from
bulk change. (Amareshwari Sriramadasu via cdouglas)
MAPREDUCE-1365. Restore accidentally renamed test in
TestTaskTrackerBlacklisting. (Amareshwari Sriramadasu via cdouglas)
MAPREDUCE-1406. Fix spelling of JobContext.MAP_COMBINE_MIN_SPILLS.
(cdouglas)