Configuring the Hadoop Capacity Scheduler

Guiding questions
1. Which features does the Capacity Scheduler support?
2. Which command reloads the configuration?
3. How do you use a queue?




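Introduction to the Capacity Scheduler
The Capacity Scheduler supports the following features:
(1) Capacity guarantees. Multiple queues are supported, and each job is submitted to exactly one queue. Every queue is configured with a percentage of the cluster's computing resources, and all jobs submitted to a queue share that queue's resources.
(2) Flexibility. Idle resources are handed to queues that have not yet reached their resource limit. When a queue running below its capacity needs resources, it receives them as soon as they become free.
(3) Priority support. Queues support priority-based job scheduling (the default is FIFO).
(4) Multi-tenancy. A combination of limits prevents any single job, user, or queue from monopolizing the resources of a queue or of the cluster.
(5) Resource-based scheduling. Resource-intensive jobs are supported: a job may use more resources than the default, so jobs with different resource requirements can be accommodated. Currently, only memory is supported as a schedulable resource.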
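
As a rough illustration of feature (1), the sketch below turns per-queue capacity percentages into guaranteed slot counts. The percentages are the ones used in the configuration later in this article; the 200-slot cluster size, the class name QueueShare, and the round-down behavior are assumptions made only for this example, not something the scheduler documentation is quoted on here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class QueueShare {
    /** Slots guaranteed to a queue: its percentage of the cluster, rounded down (assumption). */
    static int guaranteedSlots(int clusterSlots, int capacityPercent) {
        return clusterSlots * capacityPercent / 100;
    }

    public static void main(String[] args) {
        int clusterSlots = 200; // hypothetical cluster size, not taken from the article

        // Capacity percentages as configured for the three queues in this article.
        Map<String, Integer> capacities = new LinkedHashMap<>();
        capacities.put("default", 40);
        capacities.put("hadoop", 30);
        capacities.put("hive", 30);

        for (Map.Entry<String, Integer> e : capacities.entrySet()) {
            System.out.println(e.getKey() + ": "
                    + guaranteedSlots(clusterSlots, e.getValue()) + " slots");
        }
    }
}
```

With these numbers the default queue is guaranteed 80 slots and the hadoop and hive queues 60 each; whatever a queue leaves idle can be lent to the others, as described in feature (2).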

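Configuration steps
1. Copy $HADOOP_HOME/contrib/capacity-scheduler/hadoop-capacity-scheduler.jar into the $HADOOP_HOME/lib directory.

2. Edit conf/mapred-site.xml on the NameNode host (the JobTracker reads this file, so this assumes the JobTracker runs on the same machine):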
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
<property>
  <name>mapred.queue.names</name>
  <value>default,hadoop,hive</value>
</property>

3. Edit the conf/capacity-scheduler.xml configuration file:
<?xml version="1.0"?>

<!-- This is the configuration file for the resource manager in Hadoop. -->
<!-- You can configure various scheduling parameters related to queues. -->
<!-- The properties for a queue follow a naming convention, such as, -->
<!-- mapred.capacity-scheduler.queue.<queue-name>.property-name. -->

<configuration>
  <!-- Capacity scheduler Job Initialization configuration parameters -->
  <property>
    <name>mapred.capacity-scheduler.init-poll-interval</name>
    <value>5000</value>
    <description>The amount of time in milliseconds which is used to poll the job queues for jobs to initialize.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.init-worker-threads</name>
    <value>5</value>
    <description>Number of worker threads which would be used by
    Initialization poller to initialize jobs in a set of queue.
    If number mentioned in property is equal to number of job queues
    then a single thread would initialize jobs in a queue. If lesser
    then a thread would get a set of queues assigned. If the number
    is greater then number of threads would be equal to number of
    job queues.
    </description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.maximum-system-jobs</name>
    <value>30</value>
    <description>Maximum number of jobs in the system which can be initialized,
    concurrently, by the Capacity Scheduler.
    </description>
  </property>

  <!-- hadoop queue -->
  <property>
    <name>mapred.capacity-scheduler.queue.hadoop.capacity</name>
    <value>30</value>
    <description>Percentage of the number of slots in the cluster that are to be available for jobs in this queue.
    </description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.hadoop.maximum-capacity</name>
    <value>-1</value>
    <description>Upper limit, as a percentage of the cluster's slots, that this queue may use. -1 means no limit.
    </description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.hadoop.supports-priority</name>
    <value>true</value>
    <description>If true, priorities of jobs will be taken into account in scheduling decisions.
    </description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.hadoop.minimum-user-limit-percent</name>
    <value>100</value>
    <description>Percentage of the queue's resources a single user is limited to when users compete. 100 means no per-user limit is imposed.
    </description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.hadoop.user-limit-factor</name>
    <value>3</value>
    <description>The multiple of the queue capacity which can be configured to allow a single user to acquire more slots.
    </description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.hadoop.maximum-initialized-active-tasks</name>
    <value>200000</value>
    <description></description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.hadoop.maximum-initialized-active-tasks-per-user</name>
    <value>100000</value>
    <description></description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.hadoop.init-accept-jobs-factor</name>
    <value>10</value>
    <description></description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.default-maximum-initialized-jobs-per-user</name>
    <value>5</value>
    <description>The maximum number of jobs to be pre-initialized for a user
    of the job queue.
    </description>
  </property>

  <!-- hive -->
  <property>
    <name>mapred.capacity-scheduler.queue.hive.capacity</name>
    <value>30</value>
    <description></description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.hive.maximum-capacity</name>
    <value>-1</value>
    <description></description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.hive.supports-priority</name>
    <value>true</value>
    <description>If true, priorities of jobs will be taken into account in scheduling decisions.
    </description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.hive.minimum-user-limit-percent</name>
    <value>100</value>
    <description></description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.hive.user-limit-factor</name>
    <value>4</value>
    <description>The multiple of the queue capacity which can be configured to allow a single user to acquire more slots.
    </description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.hive.maximum-initialized-active-tasks</name>
    <value>200000</value>
    <description></description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.hive.maximum-initialized-active-tasks-per-user</name>
    <value>100000</value>
    <description></description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.hive.init-accept-jobs-factor</name>
    <value>10</value>
    <description></description>
  </property>

  <!-- default -->
  <property>
    <name>mapred.capacity-scheduler.queue.default.capacity</name>
    <value>40</value>
    <description></description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.default.maximum-capacity</name>
    <value>-1</value>
    <description></description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.default.supports-priority</name>
    <value>true</value>
    <description></description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.default.minimum-user-limit-percent</name>
    <value>100</value>
    <description></description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.default.user-limit-factor</name>
    <value>4</value>
    <description></description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks</name>
    <value>200000</value>
    <description></description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks-per-user</name>
    <value>100000</value>
    <description></description>
  </property>

  <property>
    <name>mapred.capacity-scheduler.queue.default.init-accept-jobs-factor</name>
    <value>10</value>
    <description></description>
  </property>

</configuration>
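
The three queue capacities in this file add up to 100 (40 + 30 + 30). To my understanding the Capacity Scheduler expects the capacities of all queues to sum to at most 100, so a quick check before restarting can catch a typo. The following is a minimal sketch using only the JDK's XML parser; the class name CapacitySumCheck and the shortened sample XML in main are illustrative assumptions, and in practice you would feed it the contents of your own capacity-scheduler.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class CapacitySumCheck {
    static final String PREFIX = "mapred.capacity-scheduler.queue.";
    static final String SUFFIX = ".capacity";

    /** Sums the value of every property named mapred.capacity-scheduler.queue.<queue>.capacity. */
    static double totalCapacity(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            double total = 0;
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                // "...maximum-capacity" ends in "-capacity", not ".capacity", so it is skipped.
                if (name.startsWith(PREFIX) && name.endsWith(SUFFIX)) {
                    String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                    total += Double.parseDouble(value);
                }
            }
            return total;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read configuration", e);
        }
    }

    /** Helper that builds one property element for the sample input. */
    static String prop(String name, String value) {
        return "<property><name>" + name + "</name><value>" + value + "</value></property>";
    }

    public static void main(String[] args) {
        // Shortened sample holding only the capacity-related properties.
        String xml = "<configuration>"
                + prop(PREFIX + "hadoop.capacity", "30")
                + prop(PREFIX + "hadoop.maximum-capacity", "-1")
                + prop(PREFIX + "hive.capacity", "30")
                + prop(PREFIX + "default.capacity", "40")
                + "</configuration>";
        double total = totalCapacity(xml);
        System.out.println("total capacity = " + total + (total <= 100 ? " (ok)" : " (exceeds 100)"));
    }
}
```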


After saving the file, restart the JobTracker.
From then on, whenever you change capacity-scheduler.xml, running hadoop mradmin -refreshQueues is enough to reload the configuration.

4. Finally, how to use the queues:
MapReduce: in the job's code, set the queue the job belongs to, for example hive:
  conf.setQueueName("hive");


Hive: when running a Hive job, set the queue it should run in, for example hive:
  set mapred.job.queue.name=hive;


Set the job name:
  set mapred.job.name=hadooptest;


Set the job priority:
  set mapred.job.priority=HIGH;
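
To make the effect of the priority setting concrete, here is a simplified sketch of how jobs inside one queue might be ordered: by priority first and submission time second when supports-priority is true, and by submission time alone (FIFO) otherwise. The priority names mirror Hadoop's job priority levels, but the Job class, the order method, and the sorting itself are assumptions written for illustration, not the scheduler's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class PriorityOrder {
    // Declared from most to least urgent, so natural enum order is scheduling order.
    enum Priority { VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW }

    /** Hypothetical stand-in for a submitted job. */
    static final class Job {
        final String name;
        final Priority priority;
        final long submitTime;

        Job(String name, Priority priority, long submitTime) {
            this.name = name;
            this.priority = priority;
            this.submitTime = submitTime;
        }
    }

    /** Returns job names in the order this sketch would consider them for scheduling. */
    static List<String> order(List<Job> jobs, boolean supportsPriority) {
        Comparator<Job> fifo = Comparator.comparingLong((Job j) -> j.submitTime);
        Comparator<Job> cmp;
        if (supportsPriority) {
            // Higher priority first; ties fall back to submission order.
            cmp = Comparator.comparing((Job j) -> j.priority).thenComparing(fifo);
        } else {
            cmp = fifo;
        }
        List<Job> sorted = new ArrayList<>(jobs);
        sorted.sort(cmp);
        List<String> names = new ArrayList<>();
        for (Job j : sorted) {
            names.add(j.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Job> jobs = Arrays.asList(
                new Job("report", Priority.NORMAL, 1),
                new Job("hadooptest", Priority.HIGH, 2),
                new Job("cleanup", Priority.LOW, 3));
        System.out.println("with priority: " + order(jobs, true));
        System.out.println("plain FIFO:    " + order(jobs, false));
    }
}
```

In this example the HIGH job named hadooptest moves ahead of the earlier NORMAL job only when priority support is switched on for the queue.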




Reposted from: http://blog.csdn.net/jiedushi/article/details/7920455

