Introduction
In a shared computing cluster, users submit jobs that require different combinations of resources: CPU, memory, disk I/O, network bandwidth. A batch analytics job might be CPU-heavy but use little memory, while an in-memory cache service demands gigabytes of RAM with minimal CPU. Traditional fair scheduling algorithms like max-min fairness operate on a single resource dimension. When multiple resource types must be allocated simultaneously, these algorithms break down. They either pick one resource to be "fair" about and ignore the others, or they reduce heterogeneous demands to a single scalar in an ad hoc way.
Dominant Resource Fairness (DRF) solves this problem. Proposed by Ghodsi et al. in 2011, DRF generalizes max-min fairness to multiple resource types. It has become the default scheduling policy in Apache Mesos and YARN, and its core ideas appear in virtually every modern cluster resource manager.
The Core Idea
Consider a cluster with 9 CPUs and 18 GB of memory shared between two users. User A's tasks each require 1 CPU and 4 GB of memory. User B's tasks each require 3 CPUs and 1 GB of memory.
For each user, DRF identifies the dominant resource, which is the resource for which the user has the greatest demand as a fraction of the cluster's total capacity. Each of User A's tasks consumes 1/9 of the total CPUs and 4/18 (2/9) of the total memory. Memory is User A's dominant resource, since 2/9 > 1/9. Each of User B's tasks consumes 3/9 (1/3) of total CPUs and 1/18 of total memory. CPU is User B's dominant resource, since 1/3 > 1/18.
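The dominant-resource computation can be sketched in a few lines of Python. The capacities and per-task demands below are the example values from the text; the function and variable names are illustrative, not from any real scheduler.

```python
# Cluster capacity from the running example: 9 CPUs, 18 GB of memory.
CAPACITY = {"cpu": 9, "mem_gb": 18}

def dominant_resource(demand, capacity):
    """Return (resource, share) where share = demand / capacity is largest."""
    shares = {r: demand[r] / capacity[r] for r in capacity}
    resource = max(shares, key=shares.get)
    return resource, shares[resource]

user_a = {"cpu": 1, "mem_gb": 4}  # 1/9 of CPUs, 2/9 of memory
user_b = {"cpu": 3, "mem_gb": 1}  # 1/3 of CPUs, 1/18 of memory

print(dominant_resource(user_a, CAPACITY))  # memory dominates for A
print(dominant_resource(user_b, CAPACITY))  # CPU dominates for B
```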
DRF then applies max-min fairness over the dominant shares. It equalizes the dominant share of each user, allocating resources so that no user's dominant share can be increased without decreasing another user's dominant share. In this example, the algorithm allocates 3 tasks to User A (dominant share = 3 × 2/9 = 6/9) and 2 tasks to User B (dominant share = 2 × 1/3 = 6/9). Both users end up with a dominant share of 2/3. The total allocation is 3×1 + 2×3 = 9 CPUs and 3×4 + 2×1 = 14 GB memory, fully utilizing CPUs while leaving 4 GB of memory unallocated.
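The allocation above falls out of a simple greedy loop, often called progressive filling: repeatedly launch a task for whichever user has the smallest dominant share, until no further task fits. The sketch below, with illustrative names, reproduces the 3-task/2-task split from the example.

```python
def allocate_drf(capacity, demands):
    """Greedy DRF: give a task to the user with the lowest dominant share."""
    used = {r: 0 for r in capacity}
    tasks = {u: 0 for u in demands}
    dom_share = {u: 0.0 for u in demands}

    def fits(demand):
        return all(used[r] + demand[r] <= capacity[r] for r in capacity)

    while True:
        # Users whose next task still fits in the remaining capacity.
        candidates = [u for u in demands if fits(demands[u])]
        if not candidates:
            break
        u = min(candidates, key=lambda c: dom_share[c])
        for r in capacity:
            used[r] += demands[u][r]
        tasks[u] += 1
        dom_share[u] = max(tasks[u] * demands[u][r] / capacity[r]
                           for r in capacity)
    return tasks, used

capacity = {"cpu": 9, "mem": 18}
demands = {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}}
tasks, used = allocate_drf(capacity, demands)
print(tasks)  # {'A': 3, 'B': 2}
print(used)   # {'cpu': 9, 'mem': 14}
```

The loop stops when neither user's task fits, which here happens exactly when the CPUs are saturated.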
Desirable Properties
DRF satisfies several properties that are important for a scheduling policy in shared infrastructure.
Sharing Incentive
Every user can run at least as many tasks as they could on an equal, static partition of the cluster. A user holding 1/n of the cluster's resources in every dimension can never do worse under DRF, which makes participation in the shared cluster rational.
Strategy-Proofness
A user cannot improve their allocation by lying about their resource demands. Inflating a request for an unneeded resource will only increase the user's dominant share, causing DRF to give them fewer tasks overall. This is a critical property in multi-tenant environments where users control their own job specifications.
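A quick numeric illustration of why lying backfires, using hypothetical inflated numbers: suppose User B truthfully needs 3 CPUs and 1 GB per task but claims 3 CPUs and 8 GB. The claim raises B's per-task dominant share, and since DRF equalizes total dominant shares, B ends up with proportionally fewer tasks.

```python
cap = {"cpu": 9, "mem": 18}

def per_task_dominant_share(demand):
    """Largest fraction of any cluster resource one task consumes."""
    return max(demand[r] / cap[r] for r in cap)

truthful = per_task_dominant_share({"cpu": 3, "mem": 1})  # 1/3, CPU-dominated
inflated = per_task_dominant_share({"cpu": 3, "mem": 8})  # 4/9, now memory

# At an equalized dominant share S, B runs S / per_task_share tasks, so
# inflating scales B's task count by the ratio of the two shares:
print(truthful / inflated)  # roughly 0.75 -> about 25% fewer tasks for lying
```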
Envy-Freeness
No user prefers another user's allocation to their own. This follows naturally from equalizing dominant shares. If User A's dominant share equals User B's, then swapping allocations would not benefit either party given their actual demand vectors.
Pareto Efficiency
It is not possible to increase one user's allocation without decreasing another's. Whenever unsatisfied demand remains, the algorithm saturates at least one resource in the cluster, so capacity is never left idle while a user could consume it.
Implementation in Practice
YARN implements DRF as one of its configurable scheduler policies. In YARN's capacity scheduler and fair scheduler, each queue can be assigned a DRF policy that considers both memory and CPU (called "vcores") when making allocation decisions. The scheduler greedily assigns containers to the user with the smallest dominant share, repeating until the cluster is full or no pending requests can be satisfied.
Apache Mesos uses a two-level scheduling architecture where DRF governs the first level: resource offers. The Mesos master computes dominant shares for each framework and makes resource offers to the framework with the lowest dominant share. The framework then decides which offers to accept. This design decouples the fairness policy from framework-specific scheduling logic.
One practical complication is that DRF in its pure form assumes tasks are small relative to the cluster. When a single task demands a large fraction of the cluster, the discrete allocation problem becomes harder, and the algorithm can only approximate the ideal continuous solution. Weighted DRF extends the base algorithm to handle different user priorities by scaling dominant shares by a weight factor.
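One way to implement the weighting, sketched here with illustrative names rather than any particular scheduler's API, is to divide each user's dominant share by their weight before comparing: a weight of 2 makes a user's share count half as much, so the scheduler serves them twice as often at equilibrium.

```python
def weighted_dominant_share(tasks, demand, capacity, weight):
    """Effective share used for scheduling order under weighted DRF."""
    raw = max(tasks * demand[r] / capacity[r] for r in capacity)
    return raw / weight  # higher weight -> looks less "full" to the scheduler

cap = {"cpu": 9, "mem": 18}
# Running example with User A given weight 2 and User B weight 1:
a = weighted_dominant_share(3, {"cpu": 1, "mem": 4}, cap, weight=2.0)
b = weighted_dominant_share(2, {"cpu": 3, "mem": 1}, cap, weight=1.0)
print(a, b)  # A's effective share (1/3) is below B's (2/3), so A is served next
```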
Key Points
- DRF generalizes max-min fairness from a single resource dimension to multiple resource types by operating on each user's dominant share.
- The dominant resource for a user is the resource type for which the user's per-task demand, expressed as a fraction of total cluster capacity, is largest.
- DRF is strategy-proof, meaning users cannot game the system by inflating their resource requests.
- The sharing incentive property guarantees every user does at least as well as they would under a static equal partition of the cluster.
- DRF is the foundation of resource allocation in both Apache Mesos and Hadoop YARN.
- Weighted DRF allows operators to assign different priorities to users or queues while preserving the algorithm's core fairness properties.
- Pareto efficiency ensures no resources are wasted when pending demand exists.
References
Ghodsi, A., Zaharia, M., Hindman, B., Konwinski, A., Shenker, S., and Stoica, I. "Dominant Resource Fairness: Fair Allocation of Multiple Resource Types." NSDI 2011.
Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A.D., Katz, R., Shenker, S., and Stoica, I. "Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center." NSDI 2011.
Vavilapalli, V.K., Murthy, A.C., Douglas, C., Agarwal, S., Konar, M., Evans, R., Graves, T., Lowe, J., Shah, H., Seth, S., Saha, B., Curino, C., O'Malley, O., Radia, S., Reed, B., and Baldeschwieler, E. "Apache Hadoop YARN: Yet Another Resource Negotiator." SoCC 2013.
Bertsekas, D. and Gallager, R. "Data Networks." Prentice Hall, 1992. (Chapter 6: max-min fairness foundations.)