Hadoop MapReduce 0.21.0 Release Notes
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
Changes Since Hadoop 0.20.2
Sub-task
- [MAPREDUCE-157] - Job History log file format is not friendly for external tools.
- [MAPREDUCE-181] - Secure job submission
- [MAPREDUCE-355] - Change org.apache.hadoop.mapred.join to use new api
- [MAPREDUCE-358] - Change org.apache.hadoop.examples.AggregateWordCount and org.apache.hadoop.examples.AggregateWordHistogram to use new mapreduce api.
- [MAPREDUCE-361] - Change org.apache.hadoop.examples.terasort to use new mapreduce api
- [MAPREDUCE-364] - Change org.apache.hadoop.examples.MultiFileWordCount to use new mapreduce api.
- [MAPREDUCE-369] - Change org.apache.hadoop.mapred.lib.MultipleInputs to use new api.
- [MAPREDUCE-370] - Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
- [MAPREDUCE-371] - Change org.apache.hadoop.mapred.lib.KeyFieldBasedComparator and org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner to use new api
- [MAPREDUCE-372] - Change org.apache.hadoop.mapred.lib.ChainMapper/Reducer to use new api.
- [MAPREDUCE-373] - Change org.apache.hadoop.mapred.lib.FieldSelectionMapReduce to use new api.
- [MAPREDUCE-375] - Change org.apache.hadoop.mapred.lib.NLineInputFormat and org.apache.hadoop.mapred.MapFileOutputFormat to use new api.
- [MAPREDUCE-655] - Change KeyValueLineRecordReader and KeyValueTextInputFormat to use new api.
- [MAPREDUCE-656] - Change org.apache.hadoop.mapred.SequenceFile* classes to use new api
- [MAPREDUCE-744] - Support in DistributedCache to share cache files with other users after HADOOP-4493
- [MAPREDUCE-814] - Move completed Job history files to HDFS
- [MAPREDUCE-817] - Add a cache for retired jobs with minimal job info and provide a way to access history file url
- [MAPREDUCE-842] - Per-job local data on the TaskTracker node should have right access-control
- [MAPREDUCE-856] - Localized files from DistributedCache should have right access-control
- [MAPREDUCE-861] - Modify queue configuration format and parsing to support a hierarchy of queues.
- [MAPREDUCE-862] - Modify UI to support a hierarchy of queues
- [MAPREDUCE-870] - Clean up the job Retire code
- [MAPREDUCE-871] - Job/Task local files have incorrect group ownership set by LinuxTaskController binary
- [MAPREDUCE-898] - Change DistributedCache to use new api.
- [MAPREDUCE-927] - Cleanup of task-logs should happen in TaskTracker instead of the Child
- [MAPREDUCE-943] - TestNodeRefresh times out occasionally
- [MAPREDUCE-975] - Add an API in job client to get the history file url for a given job id
- [MAPREDUCE-1026] - Shuffle should be secure
- [MAPREDUCE-1033] - Resolve location of scripts and configuration files after project split
- [MAPREDUCE-1035] - Remove streaming forrest documentation from the common project
- [MAPREDUCE-1039] - cluster_setup.xml exists in both mapreduce and common projects
- [MAPREDUCE-1081] - Move hadoop_archives.xml out of mapreduce project
- [MAPREDUCE-1190] - Add package.html to pi and pi.math packages.
- [MAPREDUCE-1201] - Make ProcfsBasedProcessTree collect CPU usage information
- [MAPREDUCE-1209] - Move common specific part of the test TestReflectionUtils out of mapred into common
- [MAPREDUCE-1218] - Collecting cpu and memory usage for TaskTrackers
- [MAPREDUCE-1307] - Introduce the concept of Job Permissions
- [MAPREDUCE-1326] - fi tests don't use fi-site.xml
- [MAPREDUCE-1430] - JobTracker should be able to renew delegation tokens for the jobs
- [MAPREDUCE-1432] - Add the hooks in JobTracker and TaskTracker to load tokens from the token cache into the user's UGI
- [MAPREDUCE-1433] - Create a Delegation token for MapReduce
- [MAPREDUCE-1454] - The servlets should quote server generated strings sent in the response
- [MAPREDUCE-1455] - Authorization for servlets
- [MAPREDUCE-1457] - For secure job execution, couple of more UserGroupInformation.doAs needs to be added
- [MAPREDUCE-1493] - Authorization for job-history pages
- [MAPREDUCE-1623] - Apply audience and stability annotations to classes in mapred package
- [MAPREDUCE-1625] - Improve grouping of packages in Javadoc
- [MAPREDUCE-1650] - Exclude Private elements from generated MapReduce Javadoc
- [MAPREDUCE-1791] - Remote cluster control functionality needs JavaDocs improvement
Bug
- [MAPREDUCE-28] - TestQueueManager takes too long and sometimes times out
- [MAPREDUCE-64] - Map-side sort is hampered by io.sort.record.percent
- [MAPREDUCE-144] - TaskMemoryManager should log process-tree's status while killing tasks.
- [MAPREDUCE-153] - TestJobInProgressListener sometimes times out
- [MAPREDUCE-408] - TestKillSubProcesses fails with assertion failure sometimes
- [MAPREDUCE-416] - Move the completed jobs' history files to a DONE subdirectory inside the configured history directory
- [MAPREDUCE-419] - mapred.userlog.limit.kb has inconsistent defaults
- [MAPREDUCE-516] - Fix the 'cluster drain' problem in the Capacity Scheduler wrt High RAM Jobs
- [MAPREDUCE-522] - Rewrite TestQueueCapacities to make it simpler and avoid timeout errors
- [MAPREDUCE-543] - large pending jobs hog resources
- [MAPREDUCE-626] - Modify TestLostTracker to improve execution time
- [MAPREDUCE-627] - Modify TestTrackerBlacklistAcrossJobs to improve execution time
- [MAPREDUCE-628] - TestJobInProgress brings up MinMR/DFS clusters for every test
- [MAPREDUCE-630] - TestKillCompletedJob can be modified to improve execution times
- [MAPREDUCE-637] - Check in the codes that compute the 10^15+1st bit of π
- [MAPREDUCE-639] - Update the TeraSort to reflect the new benchmark rules for '09
- [MAPREDUCE-642] - distcp could have an option to preserve the full source path
- [MAPREDUCE-645] - When distcp is used to overwrite a file, it should return immediately with an error message
- [MAPREDUCE-646] - distcp should place the file distcp_src_files in distributed cache
- [MAPREDUCE-648] - Two distcp bugs
- [MAPREDUCE-658] - NPE in distcp if source path does not exist
- [MAPREDUCE-659] - gridmix2 not compiling under mapred module trunk/src/benchmarks/gridmix2
- [MAPREDUCE-662] - distcp -update fails if source directory is empty (i.e. no files to copy) and target directory does not exist.
- [MAPREDUCE-671] - Update ignore list
- [MAPREDUCE-676] - Existing diagnostic rules fail for MAP ONLY jobs
- [MAPREDUCE-677] - TestNodeRefresh times out
- [MAPREDUCE-680] - Reuse of Writable objects is improperly handled by MRUnit
- [MAPREDUCE-682] - Reserved tasktrackers should be removed when a node is globally blacklisted
- [MAPREDUCE-683] - TestJobTrackerRestart fails with Map task completion events ordering mismatch
- [MAPREDUCE-694] - JSP jars should be added to dependency list for Capacity scheduler
- [MAPREDUCE-702] - eclipse-plugin jar target fails during packaging
- [MAPREDUCE-708] - node health check script does not refresh the "reason for blacklisting"
- [MAPREDUCE-709] - node health check script does not display the correct message on timeout
- [MAPREDUCE-716] - org.apache.hadoop.mapred.lib.db.DBInputFormat not working with oracle
- [MAPREDUCE-717] - Fix some corner case issues in speculative execution (post hadoop-2141)
- [MAPREDUCE-722] - More slots are getting reserved for HiRAM job tasks than required
- [MAPREDUCE-730] - allow relative paths to be created inside archives.
- [MAPREDUCE-732] - node health check script should not log "UNHEALTHY" status for every heartbeat in INFO mode
- [MAPREDUCE-733] - When running ant test TestTrackerBlacklistAcrossJobs, losing task tracker heartbeat exception occurs.
- [MAPREDUCE-734] - java.util.ConcurrentModificationException observed in unreserving slots for HiRam Jobs
- [MAPREDUCE-743] - Progress of map phase in map task is not updated properly
- [MAPREDUCE-754] - NPE in expiry thread when a TT is lost
- [MAPREDUCE-760] - TestNodeRefresh might not work as expected
- [MAPREDUCE-764] - TypedBytesInput's readRaw() does not preserve custom type codes
- [MAPREDUCE-769] - findbugs and javac warnings on trunk are non-zero
- [MAPREDUCE-771] - Setup and cleanup tasks remain in UNASSIGNED state for a long time on tasktrackers with long running high RAM tasks
- [MAPREDUCE-773] - LineRecordReader can report non-zero progress while it is processing a compressed stream
- [MAPREDUCE-787] - -files, -archives should honor user given symlink path
- [MAPREDUCE-792] - javac warnings in DBInputFormat
- [MAPREDUCE-799] - Some of MRUnit's self-tests were not being run
- [MAPREDUCE-808] - Buffer objects incorrectly serialized to typed bytes
- [MAPREDUCE-809] - Job summary logs show status of completed jobs as RUNNING
- [MAPREDUCE-825] - JobClient completion poll interval of 5s causes slow tests in local mode
- [MAPREDUCE-839] - unit test TestMiniMRChildTask fails on mac os-x
- [MAPREDUCE-840] - DBInputFormat leaves open transaction
- [MAPREDUCE-845] - build.xml hard codes findbugs heap size, in some configurations 512M is insufficient to successfully build
- [MAPREDUCE-848] - TestCapacityScheduler is failing
- [MAPREDUCE-852] - ExampleDriver is incorrectly set as a Main-Class in tools in build.xml
- [MAPREDUCE-859] - Unable to run examples with current trunk
- [MAPREDUCE-867] - trunk build fails as ivy is looking for avro jar from the local resolver
- [MAPREDUCE-868] - Trunk can't be compiled since Avro dependencies cannot be resolved
- [MAPREDUCE-877] - Required avro classes are missing in contrib projects
- [MAPREDUCE-879] - TestTaskTrackerLocalization fails on MAC OS
- [MAPREDUCE-884] - TestReduceFetchFromPartialMem fails sometimes
- [MAPREDUCE-889] - binary communication formats added to Streaming by HADOOP-1722 should be documented
- [MAPREDUCE-890] - After HADOOP-4491, the user who started mapred system is not able to run job.
- [MAPREDUCE-891] - Streaming tests fail with NPE in MiniDFSCluster
- [MAPREDUCE-895] - FileSystem::ListStatus will now throw FileNotFoundException, MapRed needs to be updated
- [MAPREDUCE-896] - Users can set non-writable permissions on temporary files for TT and can abuse disk usage.
- [MAPREDUCE-899] - When using LinuxTaskController, localized files may become accessible to unintended users if permissions are misconfigured.
- [MAPREDUCE-912] - apache license header missing for some java files
- [MAPREDUCE-913] - TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks and hung TaskTracker
- [MAPREDUCE-915] - For secure environments, the Map/Reduce debug script must be run as the user.
- [MAPREDUCE-917] - Remove getInputCounter and getOutputCounter from Contexts
- [MAPREDUCE-941] - vaidya script calls awk instead of nawk
- [MAPREDUCE-945] - Test programs support only default queue.
- [MAPREDUCE-946] - Fix regression in LineRecordReader to comply with line length parameters
- [MAPREDUCE-951] - MAP_INPUT_BYTES counter is missing
- [MAPREDUCE-952] - Previously removed Task.Counter reintroduced by MAPREDUCE-318
- [MAPREDUCE-962] - NPE in ProcfsBasedProcessTree.destroy()
- [MAPREDUCE-964] - Inaccurate values in jobSummary logs
- [MAPREDUCE-968] - NPE in distcp encountered when placing _logs directory on S3FileSystem
- [MAPREDUCE-971] - distcp does not always remove distcp.tmp.dir
- [MAPREDUCE-973] - Move test utilities from examples to test
- [MAPREDUCE-977] - Missing jackson jars from Eclipse template
- [MAPREDUCE-986] - rumen makes a task with a null type when one of the task lines is truncated
- [MAPREDUCE-988] - ant package does not copy the capacity-scheduler.jar under HADOOP_HOME/build/hadoop-mapred-0.21.0-dev/contrib/capacity-scheduler
- [MAPREDUCE-996] - Queue Scheduling Information is lost from UI when we run mapred mradmin -refreshQueues after MAPREDUCE-861
- [MAPREDUCE-1000] - JobHistory.initDone() should retain the try ... catch in the body
- [MAPREDUCE-1002] - After MAPREDUCE-862, command line queue-list doesn't print any queues
- [MAPREDUCE-1003] - trunk build fails when -Declipse.home is set
- [MAPREDUCE-1007] - MAPREDUCE-777 breaks the UI for hierarchical Queues.
- [MAPREDUCE-1009] - Forrest documentation needs to be updated to describe features provided for supporting hierarchical queues
- [MAPREDUCE-1014] - After the 0.21 branch, MapReduce trunk doesn't compile
- [MAPREDUCE-1016] - Make the format of the Job History be JSON instead of Avro binary
- [MAPREDUCE-1018] - Document changes to the memory management and scheduling model
- [MAPREDUCE-1022] - Trunk tests fail because of test-failure in Vertica
- [MAPREDUCE-1023] - Newly introduced findBugs warnings should be suppressed
- [MAPREDUCE-1028] - Cleanup tasks are scheduled using high memory configuration, leaving tasks in unassigned state.
- [MAPREDUCE-1029] - TestCopyFiles fails on testHftpAccessControl()
- [MAPREDUCE-1030] - Reduce tasks are getting starved in capacity scheduler
- [MAPREDUCE-1031] - ant tar target doesn't seem to compile tests in contrib projects
- [MAPREDUCE-1038] - Mumak's compile-aspects target weaves aspects even though there are no changes to the Mumak's sources
- [MAPREDUCE-1041] - TaskStatuses map in TaskInProgress should be made package private instead of protected
- [MAPREDUCE-1062] - MRReliability test does not work with retired jobs
- [MAPREDUCE-1065] - Modify the mapred tutorial documentation to use new mapreduce api.
- [MAPREDUCE-1071] - o.a.h.mapreduce.jobhistory.EventReader constructor should expect DataInputStream
- [MAPREDUCE-1075] - getQueue(String queue) in JobTracker would return NPE for invalid queue name
- [MAPREDUCE-1076] - ClusterStatus class should be deprecated
- [MAPREDUCE-1077] - When rumen reads a truncated job tracker log, it produces a job whose outcome is SUCCESS. Should be null.
- [MAPREDUCE-1080] - Properties max.map.slots and max.reduce.slots should be hyphenated.
- [MAPREDUCE-1082] - Command line UI for queues' information is broken with hierarchical queues.
- [MAPREDUCE-1086] - hadoop commands in streaming tasks are trying to write to tasktracker's log
- [MAPREDUCE-1089] - Fair Scheduler preemption triggers NPE when tasks are scheduled but not running
- [MAPREDUCE-1090] - Modify log statement in Tasktracker log related to memory monitoring to include attempt id.
- [MAPREDUCE-1091] - TaskTrackers only work with same build as the JobTracker
- [MAPREDUCE-1098] - Incorrect synchronization in DistributedCache causes TaskTrackers to freeze up during localization of Cache for tasks.
- [MAPREDUCE-1104] - RecoveryManager not initialized in SimulatorJobTracker led to NPE in JT Jetty server
- [MAPREDUCE-1105] - CapacityScheduler: It should be possible to set queue hard-limit beyond its actual capacity
- [MAPREDUCE-1111] - JT Jetty UI not working if we run mumak.sh off packaged distribution directory.
- [MAPREDUCE-1117] - ClusterMetrics return metrics for tasks instead of slots
- [MAPREDUCE-1119] - When tasks fail to report status, show task's stack dump before killing
- [MAPREDUCE-1124] - TestGridmixSubmission fails sometimes
- [MAPREDUCE-1128] - MRUnit Allows Iteration Twice
- [MAPREDUCE-1131] - Using profilers other than hprof can cause JobClient to report job failure
- [MAPREDUCE-1133] - Eclipse .classpath template has outdated jar files and is missing some new ones.
- [MAPREDUCE-1140] - Per cache-file refcount can become negative when tasks release distributed-cache files
- [MAPREDUCE-1143] - runningMapTasks counter is not properly decremented in case of failed Tasks.
- [MAPREDUCE-1152] - JobTrackerInstrumentation.killed{Map/Reduce} is never called
- [MAPREDUCE-1153] - Metrics counting tasktrackers and blacklisted tasktrackers are not updated when trackers are decommissioned.
- [MAPREDUCE-1155] - Streaming tests swallow exceptions
- [MAPREDUCE-1158] - running_maps is not decremented when the tasks of a job are killed/failed
- [MAPREDUCE-1160] - Two log statements at INFO level fill up jobtracker logs
- [MAPREDUCE-1161] - NotificationTestCase should not lock current thread
- [MAPREDUCE-1165] - SerialUtils.hh: __PRETTY_FUNCTION__ is a GNU extension and not portable
- [MAPREDUCE-1171] - Lots of fetch failures
- [MAPREDUCE-1177] - TestTaskTrackerMemoryManager retries a task for more than 100 times.
- [MAPREDUCE-1178] - MultipleInputs fails with ClassCastException
- [MAPREDUCE-1186] - While localizing a DistributedCache file, TT sets permissions recursively on the whole base-dir
- [MAPREDUCE-1196] - MAPREDUCE-947 incompatibly changed FileOutputCommitter
- [MAPREDUCE-1212] - Mapreduce contrib project ivy dependencies are not included in binary target
- [MAPREDUCE-1213] - TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
- [MAPREDUCE-1219] - JobTracker Metrics causes undue load on JobTracker
- [MAPREDUCE-1222] - [Mumak] We should not include nodes with numeric ips in cluster topology.
- [MAPREDUCE-1230] - Vertica streaming adapter doesn't handle nulls in all cases
- [MAPREDUCE-1239] - Mapreduce test build is broken after HADOOP-5107
- [MAPREDUCE-1241] - JobTracker should not crash when mapred-queues.xml does not exist
- [MAPREDUCE-1244] - eclipse-plugin fails with missing dependencies
- [MAPREDUCE-1245] - TestFairScheduler fails with "too many open files" error
- [MAPREDUCE-1249] - mapreduce.reduce.shuffle.read.timeout's default value should be 3 minutes, in mapred-default.xml
- [MAPREDUCE-1258] - Fair scheduler event log not logging job info
- [MAPREDUCE-1260] - Update Eclipse configuration to match changes to Ivy configuration
- [MAPREDUCE-1267] - Fix typo in mapred-default.xml
- [MAPREDUCE-1276] - Shuffle connection logic needs correction
- [MAPREDUCE-1284] - TestLocalizationWithLinuxTaskController fails
- [MAPREDUCE-1285] - DistCp cannot handle -delete if destination is local filesystem
- [MAPREDUCE-1293] - AutoInputFormat doesn't work with non-default FileSystems
- [MAPREDUCE-1294] - Build fails to pull latest hadoop-core-* artifacts
- [MAPREDUCE-1301] - TestDebugScriptWithLinuxTaskController fails
- [MAPREDUCE-1314] - Some logs have wrong configuration names.
- [MAPREDUCE-1316] - JobTracker holds stale references to retired jobs via unreported tasks
- [MAPREDUCE-1322] - TestStreamingAsDifferentUser fails on trunk
- [MAPREDUCE-1342] - Potential JT deadlock in faulty TT tracking
- [MAPREDUCE-1348] - Package org.apache.hadoop.blockforensics does not match directory name
- [MAPREDUCE-1358] - Utils.OutputLogFilter incorrectly filters for _logs
- [MAPREDUCE-1365] - TestTaskTrackerBlacklisting.AtestTrackerBlacklistingForJobFailures is mistyped.
- [MAPREDUCE-1369] - JUnit tests should never depend on anything in conf
- [MAPREDUCE-1372] - ConcurrentModificationException in JobInProgress
- [MAPREDUCE-1378] - Args in job details links on jobhistory.jsp are not URL encoded
- [MAPREDUCE-1386] - 'ant javadoc' fails
- [MAPREDUCE-1397] - NullPointerException observed during task failures
- [MAPREDUCE-1398] - TaskLauncher remains stuck on tasks waiting for free nodes even if task is killed.
- [MAPREDUCE-1399] - The archive command shows a null error message
- [MAPREDUCE-1400] - sed in build.xml fails
- [MAPREDUCE-1406] - JobContext.MAP_COMBINE_MIN_SPILLS is misspelled
- [MAPREDUCE-1408] - Allow customization of job submission policies
- [MAPREDUCE-1409] - FileOutputCommitter.abortTask should not catch IOException
- [MAPREDUCE-1412] - TestTaskTrackerBlacklisting fails sometimes
- [MAPREDUCE-1417] - Forrest documentation should be updated to reflect the changes in MAPREDUCE-744
- [MAPREDUCE-1420] - TestTTResourceReporting failing in trunk
- [MAPREDUCE-1421] - LinuxTaskController tests failing on trunk after the commit of MAPREDUCE-1385
- [MAPREDUCE-1422] - Changing permissions of files/dirs under job-work-dir may be needed so that cleaning up of job-dir in all mapred-local-directories always succeeds
- [MAPREDUCE-1435] - symlinks in cwd of the task are not handled properly after MAPREDUCE-896
- [MAPREDUCE-1448] - [Mumak] mumak.sh does not honor --config option.
- [MAPREDUCE-1474] - forrest docs for archives is out of date.
- [MAPREDUCE-1476] - committer.needsTaskCommit should not be called for a task cleanup attempt
- [MAPREDUCE-1482] - Better handling of task diagnostic information stored in the TaskInProgress
- [MAPREDUCE-1490] - Raid client throws NullPointerException during initialization
- [MAPREDUCE-1494] - TestJobDirCleanup verifies wrong jobcache directory
- [MAPREDUCE-1497] - Suppress warning on inconsistent TaskTracker.indexCache synchronization
- [MAPREDUCE-1508] - NPE in TestMultipleLevelCaching on error cleanup path
- [MAPREDUCE-1515] - need to pass down java5 and forrest home variables
- [MAPREDUCE-1519] - RaidNode fails to create new parity file if an older version already exists
- [MAPREDUCE-1520] - TestMiniMRLocalFS fails on trunk
- [MAPREDUCE-1523] - Sometimes rumen trace generator fails to extract the job finish time.
- [MAPREDUCE-1536] - DataDrivenDBInputFormat does not split date columns correctly.
- [MAPREDUCE-1537] - TestDelegationTokenRenewal fails
- [MAPREDUCE-1538] - TrackerDistributedCacheManager can fail because the number of subdirectories reaches system limit
- [MAPREDUCE-1547] - Build Hadoop-Mapreduce-trunk and Mapreduce-trunk-Commit fails
- [MAPREDUCE-1573] - TestStreamingAsDifferentUser fails if run as tt_user
- [MAPREDUCE-1578] - HadoopArchives.java should not use HarFileSystem.VERSION
- [MAPREDUCE-1585] - Create Hadoop Archives version 2 with filenames URL-encoded
- [MAPREDUCE-1596] - MapReduce trunk snapshot is not being published to maven
- [MAPREDUCE-1602] - When the src does not exist, archive shows IndexOutOfBoundsException
- [MAPREDUCE-1604] - Job acls should be documented in forrest.
- [MAPREDUCE-1606] - TestJobACLs may timeout as there are no slots for launching JOB_CLEANUP task
- [MAPREDUCE-1607] - Task controller may not set permissions for a task cleanup attempt's log directory
- [MAPREDUCE-1609] - TaskTracker.localizeJob should not set permissions on job log directory recursively
- [MAPREDUCE-1610] - Forrest documentation should be updated to reflect the changes in MAPREDUCE-856
- [MAPREDUCE-1611] - Refresh nodes and refresh queues don't work with service authorization enabled
- [MAPREDUCE-1612] - job conf file is not accessible from job history web page
- [MAPREDUCE-1615] - ant test on trunk does not compile.
- [MAPREDUCE-1618] - JobStatus.getJobAcls() and setJobAcls should have javadoc
- [MAPREDUCE-1622] - Include slf4j dependencies in binary tarball
- [MAPREDUCE-1628] - HarFileSystem shows incorrect replication numbers and permissions
- [MAPREDUCE-1629] - Get rid of fakeBlockLocations() on HarFileSystem, since it's not used
- [MAPREDUCE-1635] - ResourceEstimator does not work after MAPREDUCE-842
- [MAPREDUCE-1657] - After task logs directory is deleted, tasklog servlet displays wrong error message about job ACLs
- [MAPREDUCE-1659] - RaidNode should write temp files on /tmp and add random numbers to their names to avoid conflicts
- [MAPREDUCE-1692] - Remove TestStreamedMerge from the streaming tests
- [MAPREDUCE-1694] - streaming documentation appears to be wrong on overriding settings w/-D
- [MAPREDUCE-1695] - capacity scheduler is not included in findbugs/javadoc targets
- [MAPREDUCE-1697] - Document the behavior of -file option in streaming and deprecate it in favour of generic -files option.
- [MAPREDUCE-1705] - Archiving and Purging of parity files should handle globbed policies
- [MAPREDUCE-1725] - Fix MapReduce API incompatibilities between 0.20 and 0.21
- [MAPREDUCE-1727] - TestJobACLs fails after HADOOP-6686
- [MAPREDUCE-1728] - Oracle timezone strings do not match Java
- [MAPREDUCE-1747] - Remove documentation for the 'unstable' job-acls feature
- [MAPREDUCE-1765] - Streaming doc - change StreamXmlRecord to StreamXmlRecordReader
- [MAPREDUCE-1789] - MapReduce trunk fails to compile following HADOOP-6600
- [MAPREDUCE-1810] - 0.21 build is broken
- [MAPREDUCE-1845] - FairScheduler.tasksToPreempt() can return negative number
- [MAPREDUCE-1853] - MultipleOutputs does not cache TaskAttemptContext
- [MAPREDUCE-1870] - Harmonize MapReduce JAR library versions with Common and HDFS
- [MAPREDUCE-1876] - TaskAttemptStartedEvent.java incorrectly logs MAP_ATTEMPT_STARTED as event type for reduce tasks
- [MAPREDUCE-1880] - "java.lang.ArithmeticException: Non-terminating decimal expansion; no exact representable decimal result." while running "hadoop jar hadoop-0.20.1+169.89-examples.jar pi 4 30"
- [MAPREDUCE-1885] - Trunk compilation is broken because of FileSystem api change in HADOOP-6826
- [MAPREDUCE-1915] - IndexCache - getIndexInformation - check reduce index Out Of Bounds
- [MAPREDUCE-1920] - Job.getCounters() returns null when using a cluster
- [MAPREDUCE-1926] - MapReduce distribution is missing build-utils.xml
- [MAPREDUCE-1929] - Allow artifacts to be published to the staging Apache Nexus Maven Repository
- [MAPREDUCE-1942] - 'compile-fault-inject' should never be called directly.
- [MAPREDUCE-1980] - TaskAttemptUnsuccessfulCompletionEvent.java incorrectly logs MAP_ATTEMPT_KILLED as event type for reduce tasks
- [MAPREDUCE-2012] - Some contrib tests fail in branch 0.21 and trunk
- [MAPREDUCE-2014] - Remove task-controller from 0.21 branch
Improvement
- [MAPREDUCE-245] - Job and JobControl classes should return interfaces rather than implementations
- [MAPREDUCE-270] - TaskTracker could send an out-of-band heartbeat when the last running map/reduce completes
- [MAPREDUCE-277] - Job history counters should be available on the UI.
- [MAPREDUCE-284] - Improvements to RPC between Child and TaskTracker
- [MAPREDUCE-318] - Refactor reduce shuffle code
- [MAPREDUCE-336] - The logging level of the tasks should be configurable by the job
- [MAPREDUCE-353] - Allow shuffle read and connection timeouts to be configurable
- [MAPREDUCE-463] - The job setup and cleanup tasks should be optional
- [MAPREDUCE-476] - extend DistributedCache to work locally (LocalJobRunner)
- [MAPREDUCE-478] - separate jvm param for mapper and reducer
- [MAPREDUCE-479] - Add reduce ID to shuffle clienttrace
- [MAPREDUCE-487] - DBInputFormat support for Oracle
- [MAPREDUCE-502] - Allow jobtracker to be configured with zero completed jobs in memory
- [MAPREDUCE-625] - Modify TestTaskLimits to improve execution time
- [MAPREDUCE-632] - Merge TestCustomOutputCommitter with TestCommandLineJobSubmission
- [MAPREDUCE-649] - distcp should validate the data copied
- [MAPREDUCE-654] - Add an option -count to distcp for displaying some info about the src files
- [MAPREDUCE-664] - distcp with -delete option does not display number of files deleted from the target that were not present on source
- [MAPREDUCE-689] - Update distcp guide for new distcp features
- [MAPREDUCE-701] - Make TestRackAwareTaskPlacement a unit test
- [MAPREDUCE-711] - Move Distributed Cache from Common to Map/Reduce
- [MAPREDUCE-712] - RandomTextWriter example is CPU bound
- [MAPREDUCE-739] - Allow relative paths to be created inside archives.
- [MAPREDUCE-742] - Improve the java comments for the π examples
- [MAPREDUCE-765] - eliminate the usage of FileSystem.create( ) deprecated by HADOOP-5438
- [MAPREDUCE-766] - Enhance -list-blacklisted-trackers to display host name, blacklisted reason and blacklist report.
- [MAPREDUCE-772] - Changing LineRecordReader algo so that it does not need to skip backwards in the stream
- [MAPREDUCE-779] - Add node health failures into JobTrackerStatistics
- [MAPREDUCE-781] - distcp overrides user-selected job name
- [MAPREDUCE-782] - Use PureJavaCrc32 in mapreduce spills
- [MAPREDUCE-784] - Modify TestUserDefinedCounters to use LocalJobRunner instead of MiniMR
- [MAPREDUCE-788] - Modify gridmix2 to use new api.
- [MAPREDUCE-797] - MRUnit MapReduceDriver should support combiners
- [MAPREDUCE-830] - Providing BZip2 splitting support for Text data
- [MAPREDUCE-847] - Adding Apache License Headers and reduce releaseaudit warnings to zero
- [MAPREDUCE-849] - Renaming of configuration property names in mapreduce
- [MAPREDUCE-873] - Simplify Job Recovery
- [MAPREDUCE-874] - The name "PiEstimator" is misleading
- [MAPREDUCE-875] - Make DBRecordReader execute queries lazily
- [MAPREDUCE-885] - More efficient SQL queries for DBInputFormat
- [MAPREDUCE-893] - Provide an ability to refresh queue configuration without restart.
- [MAPREDUCE-903] - Adding AVRO jar to eclipse classpath
- [MAPREDUCE-905] - Add Eclipse launch tasks for MapReduce
- [MAPREDUCE-910] - MRUnit should support counters
- [MAPREDUCE-930] - rumen should interpret job history log input paths with respect to default FS, not local FS
- [MAPREDUCE-931] - rumen should use its own interpolation classes to create runtimes for simulated tasks
- [MAPREDUCE-936] - Allow a load difference in fairshare scheduler
- [MAPREDUCE-937] - Allow comments in mapred.hosts and mapred.hosts.exclude files
- [MAPREDUCE-944] - Extend FairShare scheduler to fair-share memory usage in the cluster
- [MAPREDUCE-947] - OutputCommitter should have an abortJob method
- [MAPREDUCE-953] - Generate configuration dump for hierarchical queue configuration
- [MAPREDUCE-954] - The new interface's Context objects should be interfaces
- [MAPREDUCE-960] - Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader
- [MAPREDUCE-963] - mapred's FileAlreadyExistsException should be deprecated in favor of hadoop-common's one.
- [MAPREDUCE-966] - Rumen interface improvement
- [MAPREDUCE-967] - TaskTracker does not need to fully unjar job jars
- [MAPREDUCE-972] - distcp can timeout during rename operation to s3
- [MAPREDUCE-1011] - Git and Subversion ignore of build.properties
- [MAPREDUCE-1012] - Context interfaces should be Public Evolving
- [MAPREDUCE-1048] - Show total slot usage in cluster summary on jobtracker webui
- [MAPREDUCE-1083] - Use the user-to-groups mapping service in the JobTracker
- [MAPREDUCE-1084] - Implementing aspects development and fault injection framework for MapReduce
- [MAPREDUCE-1097] - Changes/fixes to support Vertica 3.5
- [MAPREDUCE-1103] - Additional JobTracker metrics
- [MAPREDUCE-1185] - URL to JT webconsole for running job and job history should be the same
- [MAPREDUCE-1189] - Reduce ivy console output to observable level
- [MAPREDUCE-1198] - Alternatively schedule different types of tasks in fair share scheduler
- [MAPREDUCE-1221] - Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
- [MAPREDUCE-1229] - [Mumak] Allow customization of job submission policy
- [MAPREDUCE-1231] - Distcp is very slow
- [MAPREDUCE-1250] - Refactor job token to use a common token interface
- [MAPREDUCE-1265] - Include tasktracker name in the task attempt error log
- [MAPREDUCE-1287] - Avoid calling Partitioner with only 1 reducer
- [MAPREDUCE-1302] - TrackerDistributedCacheManager can delete file asynchronously
- [MAPREDUCE-1305] - Running distcp with -delete incurs avoidable penalties
- [MAPREDUCE-1306] - [MUMAK] Randomize the arrival of heartbeat responses
- [MAPREDUCE-1309] - I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
- [MAPREDUCE-1317] - Reducing memory consumption of rumen objects
- [MAPREDUCE-1337] - Generify StreamJob for better readability
- [MAPREDUCE-1367] - LocalJobRunner should support parallel mapper execution
- [MAPREDUCE-1403] - Save file-sizes of each of the artifacts in DistributedCache in the JobConf
- [MAPREDUCE-1423] - Improve performance of CombineFileInputFormat when multiple pools are configured
- [MAPREDUCE-1425] - archive throws OutOfMemoryError
- [MAPREDUCE-1428] - Make block size and the size of archive created files configurable.
- [MAPREDUCE-1440] - MapReduce should use the short form of the user names
- [MAPREDUCE-1460] - Oracle support in DataDrivenDBInputFormat
- [MAPREDUCE-1466] - FileInputFormat should save #input-files in JobConf
- [MAPREDUCE-1470] - Move Delegation token into Common so that we can use it for MapReduce also
- [MAPREDUCE-1489] - DataDrivenDBInputFormat should not query the database when generating only one split
- [MAPREDUCE-1491] - Use HAR filesystem to merge parity files
- [MAPREDUCE-1503] - Push HADOOP-6551 into MapReduce
- [MAPREDUCE-1512] - RAID could use HarFileSystem directly instead of FileSystem.get
- [MAPREDUCE-1514] - Add documentation on permissions, limitations, error handling for archives.
- [MAPREDUCE-1518] - On contrib/raid, the RaidNode currently runs the deletion check for parity files on directories too. It would be better if it didn't.
- [MAPREDUCE-1527] - QueueManager should issue warning if mapred-queues.xml is skipped.
- [MAPREDUCE-1535] - Replace usage of FileStatus#isDir()
- [MAPREDUCE-1556] - Upgrade to Avro 1.3.0
- [MAPREDUCE-1568] - TrackerDistributedCacheManager should clean up cache in a background thread
- [MAPREDUCE-1569] - Mock Contexts & Configurations
- [MAPREDUCE-1570] - Shuffle stage - Key and Group Comparators
- [MAPREDUCE-1579] - archive: check and possibly replace the space character in paths
- [MAPREDUCE-1590] - Move HarFileSystem from Hadoop Common to Mapreduce tools.
- [MAPREDUCE-1593] - [Rumen] Improvements to random seed generation
- [MAPREDUCE-1613] - Install/deploy source jars to Maven repo
- [MAPREDUCE-1619] - Eclipse .classpath file should be generated from Ivy files to avoid duplicating dependencies
- [MAPREDUCE-1627] - HadoopArchives should not use DistCp method
- [MAPREDUCE-1656] - JobStory should provide queue info.
- [MAPREDUCE-1735] - Un-deprecate the old MapReduce API in the 0.21 branch
- [MAPREDUCE-1749] - Pull configuration strings out of JobContext
- [MAPREDUCE-1751] - Change MapReduce to depend on Hadoop 'common' artifacts instead of 'core'
- [MAPREDUCE-1832] - Support for file sizes less than 1MB in DFSIO benchmark.
- [MAPREDUCE-1856] - Extract a subset of tests for smoke (DOA) validation
- [MAPREDUCE-2003] - It should be possible to specify different JVM settings for map and reduce child processes (via mapred.child.map.java.opts and mapred.child.reduce.java.opts options)
New Feature
- [MAPREDUCE-211] - Provide a node health check script and run it periodically to check the node health status
- [MAPREDUCE-467] - Collect information about number of tasks succeeded / total per time unit for a tasktracker.
- [MAPREDUCE-532] - Allow admins of the Capacity Scheduler to set a hard-limit on the capacity of a queue
- [MAPREDUCE-546] - Provide sample fair scheduler config file in conf/ and use it by default if no other config file is specified
- [MAPREDUCE-548] - Global scheduling in the Fair Scheduler
- [MAPREDUCE-551] - Add preemption to the fair scheduler
- [MAPREDUCE-567] - Add a new example MR that always fails
- [MAPREDUCE-679] - XML-based metrics as JSP servlet for JobTracker
- [MAPREDUCE-698] - Per-pool task limits for the fair scheduler
- [MAPREDUCE-706] - Support for FIFO pools in the fair scheduler
- [MAPREDUCE-707] - Provide a jobconf property for explicitly assigning a job to a pool
- [MAPREDUCE-728] - Mumak: Map-Reduce Simulator
- [MAPREDUCE-740] - Provide summary information per job once a job is finished.
- [MAPREDUCE-751] - Rumen: a tool to extract job characterization data from job tracker logs
- [MAPREDUCE-768] - Configuration information should generate dump in a standard format.
- [MAPREDUCE-775] - Add input/output formatters for Vertica clustered ADBMS.
- [MAPREDUCE-776] - Gridmix: Trace-based benchmark for Map/Reduce
- [MAPREDUCE-777] - A method for finding and tracking jobs from the new API
- [MAPREDUCE-798] - MRUnit should be able to test a succession of MapReduce passes
- [MAPREDUCE-800] - MRUnit should support the new API
- [MAPREDUCE-824] - Support a hierarchy of queues in the capacity scheduler
- [MAPREDUCE-853] - Support a hierarchy of queues in the Map/Reduce framework
- [MAPREDUCE-948] - FileOutputCommitter should create a _DONE file for successful jobs
- [MAPREDUCE-980] - Modify JobHistory to use Avro for serialization instead of raw JSON
- [MAPREDUCE-1074] - Provide documentation for Mark/Reset functionality
- [MAPREDUCE-1167] - Make ProcfsBasedProcessTree collect rss memory information
- [MAPREDUCE-1295] - We need a job trace manipulator to build gridmix runs.
- [MAPREDUCE-1304] - Add counters for task time spent in GC
- [MAPREDUCE-1335] - Add SASL DIGEST-MD5 authentication to TaskUmbilicalProtocol
- [MAPREDUCE-1338] - Need security keys storage solution
- [MAPREDUCE-1383] - Allow storage and caching of delegation token.
- [MAPREDUCE-1385] - Make changes to MapReduce for the new UserGroupInformation APIs (HADOOP-6299)
- [MAPREDUCE-1464] - In JobTokenIdentifier change method getUsername to getUser which returns UGI
- [MAPREDUCE-1673] - Start and Stop scripts for the RaidNode
- [MAPREDUCE-1774] - Large-scale Automated Framework
Task
Test
- [MAPREDUCE-670] - Create target for 10 minute patch test build for mapreduce
- [MAPREDUCE-686] - Move TestSpeculativeExecution.Fake* into a separate class so that it can be used by other tests also
- [MAPREDUCE-785] - Refactor TestReduceFetchFromPartialMem into a separate test
- [MAPREDUCE-793] - Create a new test that consolidates a few tests to be included in the commit-test list
- [MAPREDUCE-1050] - Introduce a mock object testing framework
- [MAPREDUCE-1061] - Gridmix unit test should validate input/output bytes
- [MAPREDUCE-1359] - TypedBytes TestIO doesn't mkdir its test dir first