[ Go Back ]
This document describes the FairScheduler, a pluggable scheduler for Hadoop that allows YARN applications to share resources in large clusters fairly.
Fair scheduling is a method of assigning resources to applications such that all apps get, on average, an equal share of resources over time. Hadoop NextGen is capable of scheduling multiple resource types. By default, the Fair Scheduler bases scheduling fairness decisions only on memory. It can be configured to schedule with both memory and CPU, using the notion of Dominant Resource Fairness developed by Ghodsi et al. When there is a single app running, that app uses the entire cluster. When other apps are submitted, resources that free up are assigned to the new apps, so that each app eventually on gets roughly the same amount of resources. Unlike the default Hadoop scheduler, which forms a queue of apps, this lets short apps finish in reasonable time while not starving long-lived apps. It is also a reasonable way to share a cluster between a number of users. Finally, fair sharing can also work with app priorities - the priorities are used as weights to determine the fraction of total resources that each app should get.
The scheduler organizes apps further into "queues", and shares resources fairly between these queues. By default, all users share a single queue, named “default”. If an app specifically lists a queue in a container resource request, the request is submitted to that queue. It is also possible to assign queues based on the user name included with the request through configuration. Within each queue, a scheduling policy is used to share resources between the running apps. The default is memory-based fair sharing, but FIFO and multi-resource with Dominant Resource Fairness can also be configured. Queues can be arranged in a hierarchy to divide resources and configured with weights to share the cluster in specific proportions.
In addition to providing fair sharing, the Fair Scheduler allows assigning guaranteed minimum shares to queues, which is useful for ensuring that certain users, groups or production applications always get sufficient resources. When a queue contains apps, it gets at least its minimum share, but when the queue does not need its full guaranteed share, the excess is split between other running apps. This lets the scheduler guarantee capacity for queues while utilizing resources efficiently when these queues don't contain applications.
The Fair Scheduler lets all apps run by default, but it is also possible to limit the number of running apps per user and per queue through the config file. This can be useful when a user must submit hundreds of apps at once, or in general to improve performance if running too many apps at once would cause too much intermediate data to be created or too much context-switching. Limiting the apps does not cause any subsequently submitted apps to fail, only to wait in the scheduler's queue until some of the user's earlier apps finish.
The fair scheduler supports hierarchical queues. All queues descend from a queue named "root". Available resources are distributed among the children of the root queue in the typical fair scheduling fashion. Then, the children distribute the resources assigned to them to their children in the same fashion. Applications may only be scheduled on leaf queues. Queues can be specified as children of other queues by placing them as sub-elements of their parents in the fair scheduler allocation file.
A queue's name starts with the names of its parents, with periods as separators. So a queue named "queue1" under the root queue, would be referred to as "root.queue1", and a queue named "queue2" under a queue named "parent1" would be referred to as "root.parent1.queue2". When referring to queues, the root part of the name is optional, so queue1 could be referred to as just "queue1", and a queue2 could be referred to as just "parent1.queue2".
Additionally, the fair scheduler allows setting a different custom policy for each queue to allow sharing the queue's resources in any which way the user wants. A custom policy can be built by extending org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.SchedulingPolicy. FifoPolicy, FairSharePolicy (default), and DominantResourceFairnessPolicy are built-in and can be readily used.
Certain add-ons are not yet supported which existed in the original (MR1) Fair Scheduler. Among them, is the use of a custom policies governing priority “boosting” over certain apps.
The Fair Scheduler allows administrators to configure policies that automatically place submitted applications into appropriate queues. Placement can depend on the user and groups of the submitter and the requested queue passed by the application. A policy consists of a set of rules that are applied sequentially to classify an incoming application. Each rule either places the app into a queue, rejects it, or continues on to the next rule. Refer to the allocation file format below for how to configure these policies.
To use the Fair Scheduler first assign the appropriate scheduler class in yarn-site.xml:
<property> <name>yarn.resourcemanager.scheduler.class</name> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value> </property>
Customizing the Fair Scheduler typically involves altering two files. First, scheduler-wide options can be set by adding configuration properties in the yarn-site.xml file in your existing configuration directory. Second, in most cases users will want to create an allocation file listing which queues exist and their respective weights and capacities. The allocation file is reloaded every 10 seconds, allowing changes to be made on the fly.
The allocation file must be in XML format. The format contains five types of elements:
An example allocation file is given here:
<?xml version="1.0"?> <allocations> <queue name="sample_queue"> <minResources>10000 mb,0vcores</minResources> <maxResources>90000 mb,0vcores</maxResources> <maxRunningApps>50</maxRunningApps> <weight>2.0</weight> <schedulingPolicy>fair</schedulingPolicy> <queue name="sample_sub_queue"> <aclSubmitApps>charlie</aclSubmitApps> <minResources>5000 mb,0vcores</minResources> </queue> </queue> <user name="sample_user"> <maxRunningApps>30</maxRunningApps> </user> <userMaxAppsDefault>5</userMaxAppsDefault> <queuePlacementPolicy> <rule name="specified" /> <rule name="primaryGroup" create="false" /> <rule name="default" /> </queuePlacementPolicy> </allocations>
Note that for backwards compatibility with the original FairScheduler, "queue" elements can instead be named as "pool" elements.
Queue Access Control Lists (ACLs) allow administrators to control who may take actions on particular queues. They are configured with the aclSubmitApps and aclAdministerApps properties, which can be set per queue. Currently the only supported administrative action is killing an application. Anybody who may administer a queue may also submit applications to it. These properties take values in a format like "user1,user2 group1,group2" or " group1,group2". An action on a queue will be permitted if its user or group is in the ACL of that queue or in the ACL of any of that queue's ancestors. So if queue2 is inside queue1, and user1 is in queue1's ACL, and user2 is in queue2's ACL, then both users may submit to queue2.
The root queue's ACLs are "*" by default which, because ACLs are passed down, means that everybody may submit to and kill applications from every queue. To start restricting access, change the root queue's ACLs to something other than "*".
The fair scheduler provides support for administration at runtime through a few mechanisms:
It is possible to modify minimum shares, limits, weights, preemption timeouts and queue scheduling policies at runtime by editing the allocation file. The scheduler will reload this file 10-15 seconds after it sees that it was modified.
Current applications, queues, and fair shares can be examined through the ResourceManager's web interface, at http://ResourceManager URL/cluster/scheduler.
The following fields can be seen for each queue on the web interface:
In addition to the information that the ResourceManager normally displays about each application, the web interface includes the application's fair share.
The Fair Scheduler supports moving a running application to a different queue. This can be useful for moving an important application to a higher priority queue, or for moving an unimportant application to a lower priority queue. Apps can be moved by running "yarn application -movetoqueue appID -queue targetQueueName".
When an application is moved to a queue, its existing allocations become counted with the new queue's allocations instead of the old for purposes of determining fairness. An attempt to move an application to a queue will fail if the addition of the app's resources to that queue would violate the its maxRunningApps or maxResources constraints.