FIFO Scheduler: The default job scheduler is queue-based and uses FIFO
(First In, First Out) ordering: jobs run in the order in which they were
submitted. You can prioritize a job by changing the value of the
mapred.job.priority property or by calling the setJobPriority() method.
Fair Scheduler: Gives every user of the cluster a fair share of capacity over time. Each job is assigned to a pool, and each pool receives an even share of the available task slots.
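Priority can also be set declaratively in the job configuration. A minimal sketch, assuming the classic MRv1 property named above (valid values are VERY_HIGH, HIGH, NORMAL, LOW, and VERY_LOW):

```xml
<property>
  <name>mapred.job.priority</name>
  <value>HIGH</value>
  <description>Run this job ahead of NORMAL-priority jobs in the FIFO queue.</description>
</property>
```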
Capacity Scheduler: Supports multiple queues.
Each queue is guaranteed a fraction of the capacity of the grid (its 'guaranteed
capacity'), in the sense that a certain share of resources is always at its
disposal. All jobs submitted to a queue have access to the capacity
guaranteed to that queue. Example Capacity Scheduler settings:
yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_administer_jobs=*
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=50
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.minimum-user-limit-percent=10
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=1
yarn.scheduler.capacity.root.hiveserver.capacity=50
yarn.scheduler.capacity.root.hiveserver.hive1.capacity=50
yarn.scheduler.capacity.root.hiveserver.hive1.user-limit-factor=4
yarn.scheduler.capacity.root.hiveserver.hive2.capacity=50
yarn.scheduler.capacity.root.hiveserver.hive2.user-limit-factor=4
yarn.scheduler.capacity.root.hiveserver.queues=hive1,hive2
yarn.scheduler.capacity.root.queues=default,hiveserver
yarn.scheduler.capacity.root.unfunded.capacity=50
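With the queue hierarchy above (root → default, hiveserver → hive1, hive2), a job is directed to a leaf queue through its configuration. A sketch, assuming the YARN-era property name mapreduce.job.queuename:

```xml
<property>
  <name>mapreduce.job.queuename</name>
  <value>hive1</value>
  <description>Submit this job to the hive1 leaf queue instead of default.</description>
</property>
```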
yarn.scheduler.maximum-allocation-mb=237568
yarn.scheduler.minimum-allocation-mb=5999
YARN Java heap size=4096
Important mapred-site.xml properties for the Fair Scheduler:
<property>
<name>mapred.fairscheduler.assignmultiple</name>
<value>true</value>
<description> MapRConf </description>
</property>
<property>
<name>mapred.fairscheduler.eventlog.enabled</name>
<value>false</value>
<description>Enable scheduler logging
in ${HADOOP_LOG_DIR}/fairscheduler/
MapRConf </description>
</property>
<property>
<name>mapred.fairscheduler.smalljob.schedule.enable</name>
<value>true</value>
<description>Enable small-job fast
scheduling inside the fair scheduler. TaskTrackers reserve a slot, called an
ephemeral slot, which is used for small jobs when the cluster is busy. MapRConf </description>
</property>
<property>
<name>mapred.fairscheduler.smalljob.max.maps</name>
<value>10</value>
<description>Small job definition. Max
number of maps allowed in a small job. MapRConf
</description>
</property>
<property>
<name>mapred.fairscheduler.smalljob.max.reducers</name>
<value>10</value>
<description>Small job definition. Max
number of reducers allowed in a small job. MapRConf
</description>
</property>
<property>
<name>mapred.fairscheduler.smalljob.max.inputsize</name>
<value>10737418240</value>
<description>Small job definition. Max
input size in bytes allowed for a small job. Default is 10GB.
MapRConf </description>
</property>
<property>
<name>mapred.fairscheduler.smalljob.max.reducer.inputsize</name>
<value>1073741824</value>
<description>Small job definition. Max estimated input size for a reducer
allowed in a small job. Default is 1GB per reducer. MapRConf </description>
</property>
<property>
<name>mapred.cluster.ephemeral.tasks.memory.limit.mb</name>
<value>200</value>
<description>Small job definition. Max
memory in MB reserved for an ephemeral slot.
Default is 200 MB. This value must be the same on the
JobTracker and TaskTracker nodes. MapRConf
</description>
</property>
<property>
<name>mapreduce.jobtracker.node.labels.file</name>
<value></value>
<description>File on maprfs that contains the
mapping of nodes to labels.</description>
</property>
<property>
<name>mapred.tasktracker.ephemeral.tasks.maximum</name>
<value>1</value>
<description>Reserved slot for small
job scheduling MapRConf </description>
</property>
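The pools mentioned under the Fair Scheduler are defined in an allocation file. A minimal sketch, assuming the classic fair scheduler allocation-file format (the pool name and limits below are illustrative, not prescribed by this document):

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Each pool gets a share of task slots; the limits below are examples. -->
  <pool name="reports">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <maxRunningJobs>20</maxRunningJobs>
  </pool>
  <!-- Cap the number of concurrent jobs any single user may run. -->
  <userMaxJobsDefault>5</userMaxJobsDefault>
</allocations>
```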