Tuesday, September 30, 2014

Hadoop Administration Part 12 : Job Scheduling

FIFO Scheduler: The default job scheduler is queue-based and uses FIFO (First In First Out) ordering: jobs run in the order in which they were submitted. You can prioritize a job by changing the value of the mapred.job.priority property or by calling the setJobPriority() method.
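
To make the interaction between submission order and priority concrete, here is a minimal Python sketch of a priority-aware FIFO queue. It is a toy model, not Hadoop code: the FifoQueue class and run_order() helper are hypothetical names, and the only detail taken from Hadoop is the set of priority levels (VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW). The sketch assumes that a higher-priority job is picked before lower-priority ones and that jobs of equal priority keep their submission order.

```python
import heapq
from itertools import count

# Hadoop job priority levels, highest first (lower rank runs earlier).
PRIORITY_RANK = {"VERY_HIGH": 0, "HIGH": 1, "NORMAL": 2, "LOW": 3, "VERY_LOW": 4}


class FifoQueue:
    """Toy model of a FIFO job queue that also honours job priority."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # submission counter, keeps equal priorities in FIFO order

    def submit(self, name, priority="NORMAL"):
        heapq.heappush(self._heap, (PRIORITY_RANK[priority], next(self._seq), name))

    def next_job(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def run_order(submissions):
    """Return job names in the order this toy queue would start them."""
    queue = FifoQueue()
    for name, priority in submissions:
        queue.submit(name, priority)
    return [queue.next_job() for _ in range(len(queue))]


if __name__ == "__main__":
    print(run_order([("etl", "NORMAL"), ("report", "NORMAL"), ("urgent", "HIGH")]))
```

Note that this only reorders jobs that are still waiting; it does not model preemption of a job that is already running.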

Fair Scheduler: Lets multiple users share the cluster simultaneously, each getting a fair share. Each job is assigned to a pool, and each pool is assigned an even share of the available task slots.
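
The even split can be illustrated with a small Python sketch. It is a simplification under stated assumptions: all pools have equal weight, only pools that currently hold jobs receive a share, and jobs inside a pool divide the pool's share evenly. The function name fair_shares() and the pool and job names are made up for the example.

```python
def fair_shares(total_slots, pools):
    """Split task slots evenly across active pools, then across each pool's jobs.

    pools maps a pool name to the list of job names currently in that pool.
    Returns a dict of job name -> slot share (a float, since shares are averages).
    """
    # Assumption: a pool with no jobs does not take part in the split.
    active = {pool: jobs for pool, jobs in pools.items() if jobs}
    shares = {}
    if not active:
        return shares
    per_pool = total_slots / len(active)
    for jobs in active.values():
        for job in jobs:
            shares[job] = per_pool / len(jobs)
    return shares


if __name__ == "__main__":
    # 12 slots, two active pools: alice's single job gets a whole pool share,
    # bob's two jobs split the other one.
    print(fair_shares(12, {"alice": ["j1"], "bob": ["j2", "j3"], "idle": []}))
```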

Capacity Scheduler: Supports multiple queues. Each queue is guaranteed a fraction of the capacity of the grid (its 'guaranteed capacity'), in the sense that a certain capacity of resources will be at its disposal. All jobs submitted to a queue have access to the capacity guaranteed to that queue.

yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_administer_jobs=*
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=50
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.minimum-user-limit-percent=10
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=1
yarn.scheduler.capacity.root.hiveserver.capacity=50
yarn.scheduler.capacity.root.hiveserver.hive1.capacity=50
yarn.scheduler.capacity.root.hiveserver.hive1.user-limit-factor=4
yarn.scheduler.capacity.root.hiveserver.hive2.capacity=50
yarn.scheduler.capacity.root.hiveserver.hive2.user-limit-factor=4
yarn.scheduler.capacity.root.hiveserver.queues=hive1,hive2
yarn.scheduler.capacity.root.queues=default,hiveserver

yarn.scheduler.capacity.root.unfunded.capacity=50

yarn.scheduler.maximum-allocation-mb=237568
yarn.scheduler.minimum-allocation-mb=5999
YARN Java heap size=4096
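
Because each queue's capacity is a percentage of its parent, the share of the whole cluster that a nested queue is guaranteed is the product of the percentages along its path. The Python sketch below walks the queue tree from the settings listed above and computes that product. The helper names parse_properties() and absolute_capacities() are hypothetical, and the sketch only follows queues named in a parent's queues property, so root.unfunded, which is not listed in root.queues, is never visited.

```python
PREFIX = "yarn.scheduler.capacity."

# Subset of the settings listed above that defines the queue tree.
SAMPLE = """
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.queues=default,hiveserver
yarn.scheduler.capacity.root.default.capacity=50
yarn.scheduler.capacity.root.hiveserver.capacity=50
yarn.scheduler.capacity.root.hiveserver.queues=hive1,hive2
yarn.scheduler.capacity.root.hiveserver.hive1.capacity=50
yarn.scheduler.capacity.root.hiveserver.hive2.capacity=50
"""


def parse_properties(text):
    """Parse key=value lines into a dict, skipping blank lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def absolute_capacities(props, queue="root", parent_share=100.0):
    """Return {queue path: percent of the whole cluster} for a queue and its children."""
    share = parent_share * float(props[PREFIX + queue + ".capacity"]) / 100.0
    result = {queue: share}
    children = props.get(PREFIX + queue + ".queues", "")
    for child in filter(None, (c.strip() for c in children.split(","))):
        result.update(absolute_capacities(props, queue + "." + child, share))
    return result


if __name__ == "__main__":
    for path, percent in absolute_capacities(parse_properties(SAMPLE)).items():
        print(f"{path}: {percent}% of cluster")
```

With these settings, hive1 and hive2 each come out at 25% of the cluster (50% of hiveserver's 50%), and default at 50%.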

Important mapred-site.xml properties for the Fair Scheduler:
<property>
  <name>mapred.fairscheduler.assignmultiple</name>
  <value>true</value>
  <description> MapRConf </description>
</property>

<property>
  <name>mapred.fairscheduler.eventlog.enabled</name>
  <value>false</value>
  <description>Enable scheduler logging in ${HADOOP_LOG_DIR}/fairscheduler/
  MapRConf </description>
</property>

<property>
  <name>mapred.fairscheduler.smalljob.schedule.enable</name>
  <value>true</value>
  <description>Enable small job fast scheduling inside the fair scheduler. TaskTrackers should reserve a slot, called an ephemeral slot, which is used for small jobs if the cluster is busy. MapRConf </description>
</property>

<property>
  <name>mapred.fairscheduler.smalljob.max.maps</name>
  <value>10</value>
  <description>Small job definition. Max number of maps allowed in small job. MapRConf
  </description>
</property>

<property>
  <name>mapred.fairscheduler.smalljob.max.reducers</name>
  <value>10</value>
  <description>Small job definition. Max number of reducers allowed in small job. MapRConf
  </description>
</property>

<property>
  <name>mapred.fairscheduler.smalljob.max.inputsize</name>
  <value>10737418240</value>
  <description>Small job definition. Max input size in bytes allowed for a small job. Default is 10GB.
  MapRConf </description>
</property>

<property>
  <name>mapred.fairscheduler.smalljob.max.reducer.inputsize</name>
  <value>1073741824</value>
  <description>Small job definition. Max estimated input size for a reducer allowed in small job. Default is 1GB per reducer. MapRConf </description>
</property>

<property>
  <name>mapred.cluster.ephemeral.tasks.memory.limit.mb</name>
  <value>200</value>
  <description>Small job definition. Max memory in MB reserved for an ephemeral slot.
  Default is 200 MB. This value must be the same on JobTracker and TaskTracker nodes. MapRConf
  </description>
</property>

<property>
  <name>mapreduce.jobtracker.node.labels.file</name>
  <value></value>
  <description>File on maprfs that has mapping of nodes and labels.</description>
</property>

<property>
  <name>mapred.tasktracker.ephemeral.tasks.maximum</name>
  <value>1</value>
  <description>Reserved slot for small job scheduling MapRConf </description>
</property> 
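
Taken together, the smalljob.max.* properties describe when a job counts as "small". The Python sketch below encodes those four thresholds with the values shown above. It is only an illustration: the names SmallJobLimits and is_small_job() are hypothetical, it assumes all four limits must hold at once, and it estimates per-reducer input as an even split of the total input, which may differ from how the scheduler actually estimates it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmallJobLimits:
    """Thresholds copied from the mapred.fairscheduler.smalljob.max.* values above."""
    max_maps: int = 10
    max_reducers: int = 10
    max_input_bytes: int = 10737418240         # 10 GB
    max_reducer_input_bytes: int = 1073741824  # 1 GB per reducer


def is_small_job(maps, reducers, input_bytes, limits=SmallJobLimits()):
    """Return True if the job fits every small-job threshold."""
    if maps > limits.max_maps or reducers > limits.max_reducers:
        return False
    if input_bytes > limits.max_input_bytes:
        return False
    # Assumption: per-reducer input is estimated as an even split of total input.
    if reducers > 0 and input_bytes / reducers > limits.max_reducer_input_bytes:
        return False
    return True


if __name__ == "__main__":
    print(is_small_job(maps=4, reducers=2, input_bytes=512 * 1024**2))
    print(is_small_job(maps=40, reducers=2, input_bytes=512 * 1024**2))
```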
