Configuring the Hadoop Capacity Scheduler

Reading guide
1. Which features does the Capacity Scheduler support?
2. Which command reloads the configuration?
3. How do you use the queues?

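Introduction to the Capacity Scheduler
The Capacity Scheduler supports the following features:
(1) Capacity guarantees. Multiple queues are supported, and each job is submitted to exactly one queue. Each queue is configured with a percentage of the cluster's computing resources, and all jobs submitted to a queue share that queue's resources.
(2) Elasticity. Idle resources are allocated to queues that have not reached their resource limit. When a queue running below its configured capacity needs resources, it receives them as soon as they become free.
(3) Priority support. Queues can schedule jobs by priority (the default is FIFO).
(4) Multi-tenancy. Several constraints are applied together to prevent a single job, user, or queue from monopolizing the resources of a queue or of the cluster.
(5) Resource-based scheduling. Resource-intensive jobs are supported: a job may request more resources than the default, so jobs with different resource requirements can be accommodated. Currently, however, only memory is supported as a scheduled resource.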
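
To make the capacity guarantee in (1) concrete, here is a small arithmetic sketch. The class name `QueueShare`, the cluster size of 200 slots, and the rounding-down behavior are assumptions made for illustration only; the 40/30/30 split for the default, hadoop, and hive queues is the one configured in capacity-scheduler.xml later in this post.

```java
public class QueueShare {
    /**
     * Slots guaranteed to a queue whose capacity is a percentage of the
     * cluster's slots. Rounding down is an assumption of this sketch,
     * not a statement about Hadoop's internal arithmetic.
     */
    public static int guaranteedSlots(int clusterSlots, int capacityPercent) {
        if (capacityPercent < 0 || capacityPercent > 100) {
            throw new IllegalArgumentException("capacity must be 0..100: " + capacityPercent);
        }
        return clusterSlots * capacityPercent / 100;
    }

    public static void main(String[] args) {
        int clusterSlots = 200; // hypothetical cluster size
        String[] queues = {"default", "hadoop", "hive"};
        int[] percents = {40, 30, 30};
        for (int i = 0; i < queues.length; i++) {
            System.out.println(queues[i] + ": "
                    + guaranteedSlots(clusterSlots, percents[i]) + " slots");
        }
    }
}
```

With these numbers the default queue is guaranteed 80 slots and the hadoop and hive queues 60 each. Under feature (2), a queue can still use more than its guaranteed share while other queues are idle.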

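Configuration steps
1. Copy $HADOOP_HOME/contrib/capacity-scheduler/hadoop-capacity-scheduler.jar into the $HADOOP_HOME/lib directory.

2. Edit conf/mapred-site.xml on the node that runs the JobTracker (in small clusters this is often the same host as the NameNode):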

<property>  
  <name>mapred.jobtracker.taskScheduler</name>  
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>  
</property>  
<property>  
  <name>mapred.queue.names</name>  
  <value>default,hadoop,hive</value>  
</property>  
3. Edit the conf/capacity-scheduler.xml configuration file:

<?xml version="1.0"?>  
  
<!-- This is the configuration file for the resource manager in Hadoop. -->  
<!-- You can configure various scheduling parameters related to queues. -->  
<!-- The properties for a queue follow a naming convention, such as -->
<!-- mapred.capacity-scheduler.queue.<queue-name>.property-name. -->  
  
<configuration>  
  <!-- Capacity scheduler Job Initialization configuration parameters -->  
  <property>  
    <name>mapred.capacity-scheduler.init-poll-interval</name>  
    <value>5000</value>  
    <description>The amount of time in milliseconds which is used to poll the job queues for jobs to initialize.
    </description>  
  </property>  
  <property>  
    <name>mapred.capacity-scheduler.init-worker-threads</name>  
    <value>5</value>  
    <description>Number of worker threads used by the initialization
    poller to initialize jobs in a set of queues. If this number equals
    the number of job queues, a single thread initializes the jobs in
    each queue. If it is smaller, each thread is assigned a set of
    queues. If it is greater, the number of threads equals the number
    of job queues.
    </description>
  </property>  
  
  <property>   
     <name>mapred.capacity-scheduler.maximum-system-jobs</name>   
     <value>30</value>   
     <description>Maximum number of jobs in the system which can be initialized,   
concurrently, by the Capacity Scheduler.   
     </description>   
  </property>   
  
<!--hadoop queue-->  
  <property>  
    <name>mapred.capacity-scheduler.queue.hadoop.capacity</name>  
    <value>30</value>  
    <description>Percentage of the number of slots in the cluster that are to be available for jobs in this queue.  
    </description>      
  </property>  
    
  <property>  
    <name>mapred.capacity-scheduler.queue.hadoop.maximum-capacity</name>  
    <value>-1</value>  
    <description>  
    </description>      
  </property>  
    
  <property>  
    <name>mapred.capacity-scheduler.queue.hadoop.supports-priority</name>  
    <value>true</value>  
    <description></description>  
  </property>  
    
    <property>  
    <name>mapred.capacity-scheduler.queue.hadoop.minimum-user-limit-percent</name>  
    <value>100</value>  
    <description> </description>  
  </property>  
  
  <property>  
    <name>mapred.capacity-scheduler.queue.hadoop.user-limit-factor</name>  
    <value>3</value>  
    <description></description>  
  </property>  
  
  <property>  
    <name>mapred.capacity-scheduler.queue.hadoop.maximum-initialized-active-tasks</name>  
    <value>200000</value>  
    <description></description>  
  </property>  
  
  <property>  
    <name>mapred.capacity-scheduler.queue.hadoop.maximum-initialized-active-tasks-per-user</name>  
    <value>100000</value>  
    <description></description>  
  </property>  
    
  <property>  
    <name>mapred.capacity-scheduler.queue.hadoop.init-accept-jobs-factor</name>  
    <value>10</value>  
    <description></description>  
  </property>  
  
  <property>  
    <name>mapred.capacity-scheduler.default-maximum-initialized-jobs-per-user</name>  
    <value>5</value>  
    <description>The maximum number of jobs to be pre-initialized for a user  
    of the job queue.  
    </description>  
  </property>  
    
<!-- hive -->  
<property>  
    <name>mapred.capacity-scheduler.queue.hive.capacity</name>  
    <value>30</value>  
    <description></description>      
  </property>  
    
  <property>  
    <name>mapred.capacity-scheduler.queue.hive.maximum-capacity</name>  
    <value>-1</value>  
    <description></description>      
  </property>  
    
  <property>  
    <name>mapred.capacity-scheduler.queue.hive.supports-priority</name>  
    <value>true</value>  
    <description>If true, priorities of jobs will be taken into account in scheduling decisions.  
    </description>  
  </property>  
    
    <property>  
    <name>mapred.capacity-scheduler.queue.hive.minimum-user-limit-percent</name>  
    <value>100</value>  
    <description></description>  
  </property>  
  
  <property>  
    <name>mapred.capacity-scheduler.queue.hive.user-limit-factor</name>  
    <value>4</value>  
    <description>The multiple of the queue capacity which can be configured to allow a single user to acquire more slots.  
    </description>  
  </property>  
  
  <property>  
    <name>mapred.capacity-scheduler.queue.hive.maximum-initialized-active-tasks</name>  
    <value>200000</value>  
    <description></description>  
  </property>  
  
  <property>  
    <name>mapred.capacity-scheduler.queue.hive.maximum-initialized-active-tasks-per-user</name>  
    <value>100000</value>  
    <description></description>  
  </property>  
    
  <property>  
    <name>mapred.capacity-scheduler.queue.hive.init-accept-jobs-factor</name>  
    <value>10</value>  
    <description></description>  
  </property>  
  
<!-- default -->   
  <property>  
    <name>mapred.capacity-scheduler.queue.default.capacity</name>  
    <value>40</value>  
    <description></description>      
  </property>  
    
  <property>  
    <name>mapred.capacity-scheduler.queue.default.maximum-capacity</name>  
    <value>-1</value>  
    <description></description>      
  </property>  
    
  <property>  
    <name>mapred.capacity-scheduler.queue.default.supports-priority</name>  
    <value>true</value>  
    <description></description>  
  </property>  
  
  <property>  
    <name>mapred.capacity-scheduler.queue.default.minimum-user-limit-percent</name>  
    <value>100</value>  
    <description></description>  
  </property>  
    
  <property>  
    <name>mapred.capacity-scheduler.queue.default.user-limit-factor</name>  
    <value>4</value>  
    <description></description>  
  </property>  
  
  <property>  
    <name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks</name>  
    <value>200000</value>  
    <description></description>  
  </property>  
  
  <property>  
    <name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks-per-user</name>  
    <value>100000</value>  
    <description></description>  
  </property>  
  
  <property>  
    <name>mapred.capacity-scheduler.queue.default.init-accept-jobs-factor</name>  
    <value>10</value>  
    <description></description>  
  </property>  
  
</configuration>

After saving the file, restart the JobTracker.
For later changes to capacity-scheduler.xml, running the command hadoop mradmin -refreshQueues is enough to reload the configuration.
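
Before restarting or refreshing, it may be worth checking that the per-queue capacities add up sensibly. The following is a minimal sketch and not part of Hadoop: the class `CapacityCheck` and its method names are hypothetical, and it relies only on the JDK's built-in XML parser. It collects every `mapred.capacity-scheduler.queue.<queue-name>.capacity` property and sums the values. Treating a total above 100 as a problem is an assumption that follows from the values being percentages of the cluster's slots.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class CapacityCheck {
    private static final String PREFIX = "mapred.capacity-scheduler.queue.";
    private static final String SUFFIX = ".capacity";

    /** Returns queue name -> configured capacity percentage, in file order. */
    public static Map<String, Integer> queueCapacities(String xml) {
        Map<String, Integer> result = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String name = childText(prop, "name");
                // ".maximum-capacity" does not end with ".capacity", so it is skipped.
                boolean isCapacity = name.startsWith(PREFIX) && name.endsWith(SUFFIX)
                        && name.length() > PREFIX.length() + SUFFIX.length();
                if (isCapacity) {
                    String queue = name.substring(PREFIX.length(),
                            name.length() - SUFFIX.length());
                    result.put(queue, Integer.parseInt(childText(prop, "value")));
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read configuration", e);
        }
        return result;
    }

    /** Sum of all queue capacity percentages found in the XML text. */
    public static int totalCapacity(String xml) {
        int total = 0;
        for (int percent : queueCapacities(xml).values()) {
            total += percent;
        }
        return total;
    }

    /** Trimmed text of the first child element with the given tag, or "". */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Shortened stand-in for capacity-scheduler.xml with the values from this post.
        String xml = "<configuration>"
                + "<property><name>mapred.capacity-scheduler.queue.hadoop.capacity</name>"
                + "<value>30</value></property>"
                + "<property><name>mapred.capacity-scheduler.queue.hive.capacity</name>"
                + "<value>30</value></property>"
                + "<property><name>mapred.capacity-scheduler.queue.default.capacity</name>"
                + "<value>40</value></property>"
                + "</configuration>";
        int total = totalCapacity(xml);
        System.out.println(queueCapacities(xml) + " total=" + total);
        if (total > 100) {
            System.out.println("Warning: queue capacities exceed 100 percent");
        }
    }
}
```

In practice the XML text would be read from conf/capacity-scheduler.xml rather than built inline; the inline string only keeps the sketch self-contained.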

4. Finally, how do you use the queues?
MapReduce: in the job's code, set the queue the job belongs to, for example hive:

conf.setQueueName("hive");  // conf is an org.apache.hadoop.mapred.JobConf

Hive: when running a Hive job, set the queue it belongs to, for example hive:

set mapred.job.queue.name=hive;

Set the name of the job submitted to the queue:

set mapred.job.name=hadooptest;

Set the priority of the job within the queue:

set mapred.job.priority=HIGH;

Reposted from: http://blog.csdn.net/jiedushi/article/details/7920455
