Monitoring Hadoop and HBase Cluster Performance with Ganglia (Installation and Configuration)

Posted by xioaxu790 on 2014-6-15 07:56:41 (last edited 2014-6-15 07:59)
Guiding questions:
1. How do you install Ganglia to monitor a Hadoop and HBase cluster?
2. How well does it perform?

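1. Install ganglia-webfrontend and ganglia-monitor on the master node

    sudo apt-get install ganglia-webfrontend ganglia-monitor

The master node needs both ganglia-webfrontend and ganglia-monitor; every other monitored node needs only ganglia-monitor.
Link the Ganglia web files into Apache's default document root (/var/www here; newer Ubuntu releases use /var/www/html, so adjust the target if needed):

    sudo ln -s /usr/share/ganglia-webfrontend /var/www/ganglia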

2. Install ganglia-monitor on the other nodes
On every other monitored node, install only ganglia-monitor:

    sudo apt-get install ganglia-monitor

3. Configure Ganglia
gmond.conf
Every node needs /etc/ganglia/gmond.conf, with the same settings on all nodes:

    sudo vim /etc/ganglia/gmond.conf

The modified /etc/ganglia/gmond.conf:

    globals {
      daemonize = yes                 # run in the background
      setuid = yes
      user = ganglia                  # user that runs Ganglia
      debug_level = 0
      max_udp_msg_len = 1472
      mute = no
      deaf = no
      host_dmax = 0 /*secs */
      cleanup_threshold = 300 /*secs */
      gexec = no
      send_metadata_interval = 10     # interval in seconds for resending metric metadata
    }
    /* If a cluster attribute is specified, then all gmond hosts are wrapped inside
     * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS> will
     * NOT be wrapped inside of a <CLUSTER> tag. */
    cluster {
      name = "hadoop-cluster"         # cluster name
      owner = "ganglia"               # cluster owner (the Ganglia user here)
      latlong = "unspecified"
      url = "unspecified"
    }
    /* The host section describes attributes of the host, like the location */
    host {
      location = "unspecified"
    }
    /* Feel free to specify as many udp_send_channels as you like.  Gmond
       used to only support having a single channel */
    udp_send_channel {
      #mcast_join = 239.2.11.71     # multicast commented out (unicast is used)
      host = master                 # send to the machine running gmetad
      port = 8649                   # port the receiver listens on
      ttl = 1
    }
    /* You can specify as many udp_recv_channels as you like as well. */
    udp_recv_channel {
      #mcast_join = 239.2.11.71     # multicast commented out
      port = 8649
      #bind = 239.2.11.71
    }
    /* You can specify as many tcp_accept_channels as you like to share
       an xml description of the state of the cluster */
    tcp_accept_channel {
      port = 8649
    }
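Because every node is supposed to carry the same cluster name and unicast target, it can help to check a gmond.conf before starting the daemons. The Python sketch below is only an illustration: the helper names strip_comments and section_value are made up for this example, and the regex reader is a simplification (it ignores quoting and nested blocks), not Ganglia's own configuration parser.

```python
import re

def strip_comments(text):
    """Drop /* ... */ blocks and '#' comments (simplified: ignores quoting)."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)
    return re.sub(r"#.*", "", text)

def section_value(conf_text, section, key):
    """Return `key` from the first flat `section { ... }` block, or None."""
    clean = strip_comments(conf_text)
    block = re.search(r"\b" + re.escape(section) + r"\s*\{(.*?)\}", clean, flags=re.S)
    if block is None:
        return None
    match = re.search(r"^\s*" + re.escape(key) + r"\s*=\s*(.+?)\s*$",
                      block.group(1), flags=re.M)
    return match.group(1).strip('"') if match else None

# Hand-written excerpt in the style of the gmond.conf shown in this post.
SAMPLE = '''
cluster {
  name = "hadoop-cluster"   # cluster name
  owner = "ganglia"
}
udp_send_channel {
  #mcast_join = 239.2.11.71
  host = master
  port = 8649
}
'''

print(section_value(SAMPLE, "cluster", "name"))           # cluster name
print(section_value(SAMPLE, "udp_send_channel", "host"))  # unicast target
```

Running the same two lookups against the file from each node and comparing the results is one way to catch a node that still points at the multicast default.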

gmetad.conf
The master node also needs /etc/ganglia/gmetad.conf. The data source name in it (hadoop-cluster) should match the cluster name in gmond.conf above.

    sudo vim /etc/ganglia/gmetad.conf

Change it to contain the following:

    data_source "hadoop-cluster" 10 master:8649 slave:8649
    setuid_username "nobody"
    rrd_rootdir "/var/lib/ganglia/rrds"
    gridname "hadoop-cluster"

Note: master:8649 and slave:8649 are the hosts and ports that gmetad polls, and the hadoop-cluster name in data_source matches the cluster name in gmond.conf.
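To make the layout of a data_source line concrete, here is a small Python sketch that splits one into its parts. The function name parse_data_source is invented for this example, and the defaults (15-second polling, port 8649) are the ones stated in the comments of the stock gmetad.conf; gmetad's real parser may accept more than this sketch does.

```python
import shlex

def parse_data_source(line, default_interval=15, default_port=8649):
    """Split a gmetad data_source line into (name, interval, [(host, port), ...])."""
    tokens = shlex.split(line, comments=True)
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError("not a data_source line: %r" % line)
    name, rest = tokens[1], tokens[2:]
    interval = default_interval
    if rest[0].isdigit():          # optional polling interval in seconds
        interval, rest = int(rest[0]), rest[1:]
    hosts = []
    for address in rest:
        host, _, port = address.partition(":")
        hosts.append((host, int(port) if port else default_port))
    return name, interval, hosts

print(parse_data_source('data_source "hadoop-cluster" 10 master:8649 slave:8649'))
```

The first element of the result is the name that has to agree with the cluster name in gmond.conf.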


4. Configure Hadoop
Every node that runs Hadoop needs hadoop-metrics2.properties configured as follows:

    #   Licensed to the Apache Software Foundation (ASF) under one or more
    #   contributor license agreements.  See the NOTICE file distributed with
    #   this work for additional information regarding copyright ownership.
    #   The ASF licenses this file to You under the Apache License, Version 2.0
    #   (the "License"); you may not use this file except in compliance with
    #   the License.  You may obtain a copy of the License at
    #
    #       http://www.apache.org/licenses/LICENSE-2.0
    #
    #   Unless required by applicable law or agreed to in writing, software
    #   distributed under the License is distributed on an "AS IS" BASIS,
    #   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    #   See the License for the specific language governing permissions and
    #   limitations under the License.
    #
    # syntax: [prefix].[source|sink].[instance].[options]
    # See javadoc of package-info.java for org.apache.hadoop.metrics2 for details
    # The original settings below are commented out
    #*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
    # default sampling period, in seconds
    #*.period=10
    # The namenode-metrics.out will contain metrics from all context
    #namenode.sink.file.filename=namenode-metrics.out
    # Specifying a special sampling period for namenode:
    #namenode.sink.*.period=8
    #datanode.sink.file.filename=datanode-metrics.out
    # the following example split metrics of different
    # context to different sinks (in this case files)
    #jobtracker.sink.file_jvm.context=jvm
    #jobtracker.sink.file_jvm.filename=jobtracker-jvm-metrics.out
    #jobtracker.sink.file_mapred.context=mapred
    #jobtracker.sink.file_mapred.filename=jobtracker-mapred-metrics.out
    #tasktracker.sink.file.filename=tasktracker-metrics.out
    #maptask.sink.file.filename=maptask-metrics.out
    #reducetask.sink.file.filename=reducetask-metrics.out
    *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
    *.sink.ganglia.period=10
    *.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
    *.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
    namenode.sink.ganglia.servers=master:8649
    resourcemanager.sink.ganglia.servers=master:8649
    datanode.sink.ganglia.servers=master:8649
    nodemanager.sink.ganglia.servers=master:8649
    maptask.sink.ganglia.servers=master:8649
    reducetask.sink.ganglia.servers=master:8649
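The active lines in this file all follow the [prefix].sink.[instance].[option] syntax named in its header comment, with one servers entry per Hadoop daemon. As a rough illustration, the Python sketch below generates those entries for a given collector address; the function name ganglia_sink_lines and the HADOOP_DAEMONS list are assumptions made for this example (the list simply repeats the prefixes used above).

```python
HADOOP_DAEMONS = ["namenode", "resourcemanager", "datanode",
                  "nodemanager", "maptask", "reducetask"]

def ganglia_sink_lines(collector, prefixes=HADOOP_DAEMONS, period=10):
    """Build the Ganglia sink entries for hadoop-metrics2.properties."""
    lines = [
        "*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31",
        "*.sink.ganglia.period=%d" % period,
    ]
    # One [prefix].sink.ganglia.servers entry per daemon, all pointing at the collector.
    lines += ["%s.sink.ganglia.servers=%s" % (prefix, collector) for prefix in prefixes]
    return lines

print("\n".join(ganglia_sink_lines("master:8649")))
```

Generating the lines from one collector address keeps every node's file pointing at the same gmond, which matters here because all nodes send to master by unicast.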


5. Configure HBase
Every HBase node needs hadoop-metrics2-hbase.properties configured as follows:

    # syntax: [prefix].[source|sink].[instance].[options]
    # See javadoc of package-info.java for org.apache.hadoop.metrics2 for details
    #*.sink.file*.class=org.apache.hadoop.metrics2.sink.FileSink
    # default sampling period
    #*.period=10
    # Below are some examples of sinks that could be used
    # to monitor different hbase daemons.
    # hbase.sink.file-all.class=org.apache.hadoop.metrics2.sink.FileSink
    # hbase.sink.file-all.filename=all.metrics
    # hbase.sink.file0.class=org.apache.hadoop.metrics2.sink.FileSink
    # hbase.sink.file0.context=hmaster
    # hbase.sink.file0.filename=master.metrics
    # hbase.sink.file1.class=org.apache.hadoop.metrics2.sink.FileSink
    # hbase.sink.file1.context=thrift-one
    # hbase.sink.file1.filename=thrift-one.metrics
    # hbase.sink.file2.class=org.apache.hadoop.metrics2.sink.FileSink
    # hbase.sink.file2.context=thrift-two
    # hbase.sink.file2.filename=thrift-one.metrics
    # hbase.sink.file3.class=org.apache.hadoop.metrics2.sink.FileSink
    # hbase.sink.file3.context=rest
    # hbase.sink.file3.filename=rest.metrics
    *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
    *.sink.ganglia.period=10
    hbase.sink.ganglia.period=10
    hbase.sink.ganglia.servers=master:8649
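Since the same properties file has to be present on every HBase node, a quick read-back of the active Ganglia entries can confirm that a node was configured. The Python sketch below is a minimal reader written for this post; the name ganglia_servers is hypothetical and it only understands plain key=value lines, not the full Java properties syntax.

```python
def ganglia_servers(properties_text):
    """Map each metrics2 prefix to its `sink.ganglia.servers` value."""
    servers = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue                      # skip blanks and commented-out samples
        key, value = line.split("=", 1)
        parts = key.strip().split(".")
        if parts[1:] == ["sink", "ganglia", "servers"]:
            servers[parts[0]] = value.strip()
    return servers

# Hand-written excerpt in the style of the file shown in this section.
SAMPLE = """
# hbase.sink.file0.context=hmaster
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
hbase.sink.ganglia.period=10
hbase.sink.ganglia.servers=master:8649
"""

print(ganglia_servers(SAMPLE))
```

An empty result for a node's file would mean that no daemon on that node has been pointed at a Ganglia collector.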


6. Start the Hadoop and HBase cluster

    start-dfs.sh
    start-yarn.sh
    start-hbase.sh


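7. Start Ganglia
Hadoop and HBase must be restarted first so that the new metrics settings are loaded. Then start the gmond service on every node; the master node also needs the gmetad service.
A Ganglia installation made with apt-get can be started directly through service:

    sudo service ganglia-monitor start    # on every machine
    sudo service gmetad start             # on the machine where ganglia-webfrontend is installed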


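8. Verify
Open http://master/ganglia in a browser. If "Hosts up" shows the expected number of hosts (9 in the author's cluster), the installation succeeded.
If it did not, a few debugging commands are very useful:
Start gmetad in debug mode: gmetad -d 9
View the XML that gmond serves on its TCP port, which is the data gmetad collects: telnet master 8649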
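The telnet check can also be scripted. The following Python sketch reads the same XML dump from port 8649 and lists the hosts that look alive. It is an illustration under stated assumptions: the function names are invented here, the sample XML is hand-written, and the liveness rule (a host counts as down once TN exceeds 4 * TMAX) follows the rule of thumb mentioned in the stock gmetad.conf comments rather than the web frontend's actual code.

```python
import socket
import xml.etree.ElementTree as ET

def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump that gmond writes to anyone connecting to its TCP port."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:        # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)

def hosts_up(xml_data):
    """Names of hosts whose last report (TN) is no older than 4 * TMAX seconds."""
    root = ET.fromstring(xml_data)
    return [h.get("NAME") for h in root.iter("HOST")
            if int(h.get("TN", "0")) <= 4 * int(h.get("TMAX", "20"))]

# Hand-written sample with one fresh and one stale host.
SAMPLE = """<GANGLIA_XML>
<CLUSTER NAME="hadoop-cluster">
<HOST NAME="master" TN="5" TMAX="20"/>
<HOST NAME="slave" TN="500" TMAX="20"/>
</CLUSTER>
</GANGLIA_XML>"""

print(hosts_up(SAMPLE))
# Against a live cluster (needs network access): hosts_up(fetch_gmond_xml("master"))
```

Comparing the length of that list with the number of nodes gives the same answer as the "Hosts up" figure, without going through the web frontend.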


9. Screenshots

[Screenshot: 444444.png]

[Screenshot: 333333333.png]


gmetad.conf on the master node

    # This is an example of a Ganglia Meta Daemon configuration file
    #                http://ganglia.sourceforge.net/
    #
    #
    #-------------------------------------------------------------------------------
    # Setting the debug_level to 1 will keep daemon in the foreground and
    # show only error messages. Setting this value higher than 1 will make
    # gmetad output debugging information and stay in the foreground.
    # default: 0
    # debug_level 10
    #
    #-------------------------------------------------------------------------------
    # What to monitor. The most important section of this file.
    #
    # The data_source tag specifies either a cluster or a grid to
    # monitor. If we detect the source is a cluster, we will maintain a complete
    # set of RRD databases for it, which can be used to create historical
    # graphs of the metrics. If the source is a grid (it comes from another gmetad),
    # we will only maintain summary RRDs for it.
    #
    # Format:
    # data_source "my cluster" [polling interval] address1:port address2:port ...
    #
    # The keyword 'data_source' must immediately be followed by a unique
    # string which identifies the source, then an optional polling interval in
    # seconds. The source will be polled at this interval on average.
    # If the polling interval is omitted, 15sec is assumed.
    #
    # If you choose to set the polling interval to something other than the default,
    # note that the web frontend determines a host as down if its TN value is greater
    # than 4 * TMAX (20sec by default).  Therefore, if you set the polling interval
    # to something around or greater than 80sec, this will cause the frontend to
    # incorrectly display hosts as down even though they are not.
    #
    # A list of machines which service the data source follows, in the
    # format ip:port, or name:port. If a port is not specified then 8649
    # (the default gmond port) is assumed.
    # default: There is no default value
    #
    # data_source "my cluster" 10 localhost  my.machine.edu:8649  1.2.3.5:8655
    # data_source "my grid" 50 1.3.4.7:8655 grid.org:8651 grid-backup.org:8651
    # data_source "another source" 1.3.4.7:8655  1.3.4.8
    data_source "hadoop-cluster" 10 master:8649 slave:8649
    setuid_username "nobody"
    rrd_rootdir "/var/lib/ganglia/rrds"
    gridname "hadoop-cluster"
    #
    # Round-Robin Archives
    # You can specify custom Round-Robin archives here (defaults are listed below)
    #
    # Old Default RRA: Keep 1 hour of metrics at 15 second resolution. 1 day at 6 minute
    # RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" \
    #      "RRA:AVERAGE:0.5:5760:374"
    # New Default RRA
    # Keep 5856 data points at 15 second resolution assuming 15 second (default) polling. That's 1 day
    # Two weeks of data points at 1 minute resolution (average)
    #RRAs "RRA:AVERAGE:0.5:1:5856" "RRA:AVERAGE:0.5:4:20160" "RRA:AVERAGE:0.5:40:52704"
    #
    #-------------------------------------------------------------------------------
    # Scalability mode. If on, we summarize over downstream grids, and respect
    # authority tags. If off, we take on 2.5.0-era behavior: we do not wrap our output
    # in <GRID></GRID> tags, we ignore all <GRID> tags we see, and always assume
    # we are the "authority" on data source feeds. This approach does not scale to
    # large groups of clusters, but is provided for backwards compatibility.
    # default: on
    # scalable off
    #
    #-------------------------------------------------------------------------------
    # The name of this Grid. All the data sources above will be wrapped in a GRID
    # tag with this name.
    # default: unspecified
    # gridname "MyGrid"
    #
    #-------------------------------------------------------------------------------
    # The authority URL for this grid. Used by other gmetads to locate graphs
    # for our data sources. Generally points to a ganglia/
    # website on this machine.
    # default: "http://hostname/ganglia/",
    #   where hostname is the name of this machine, as defined by gethostname().
    # authority "http://mycluster.org/newprefix/"
    #
    #-------------------------------------------------------------------------------
    # List of machines this gmetad will share XML with. Localhost
    # is always trusted.
    # default: There is no default value
    # trusted_hosts 127.0.0.1 169.229.50.165 my.gmetad.org
    #
    #-------------------------------------------------------------------------------
    # If you want any host which connects to the gmetad XML to receive
    # data, then set this value to "on"
    # default: off
    # all_trusted on
    #
    #-------------------------------------------------------------------------------
    # If you don't want gmetad to setuid then set this to off
    # default: on
    # setuid off
    #
    #-------------------------------------------------------------------------------
    # User gmetad will setuid to (defaults to "nobody")
    # default: "nobody"
    # setuid_username "nobody"
    #
    #-------------------------------------------------------------------------------
    # Umask to apply to created rrd files and grid directory structure
    # default: 0 (files are public)
    # umask 022
    #
    #-------------------------------------------------------------------------------
    # The port gmetad will answer requests for XML
    # default: 8651
    # xml_port 8651
    #
    #-------------------------------------------------------------------------------
    # The port gmetad will answer queries for XML. This facility allows
    # simple subtree and summation views of the XML tree.
    # default: 8652
    # interactive_port 8652
    #
    #-------------------------------------------------------------------------------
    # The number of threads answering XML requests
    # default: 4
    # server_threads 10
    #
    #-------------------------------------------------------------------------------
    # Where gmetad stores its round-robin databases
    # default: "/var/lib/ganglia/rrds"
    # rrd_rootdir "/some/other/place"
    #
    #-------------------------------------------------------------------------------
    # In earlier versions of gmetad, hostnames were handled in a case
    # sensitive manner
    # If your hostname directories have been renamed to lower case,
    # set this option to 0 to disable backward compatibility.
    # From version 3.2, backwards compatibility will be disabled by default.
    # default: 1   (for gmetad < 3.2)
    # default: 0   (for gmetad >= 3.2)
    case_sensitive_hostnames 0
    #-------------------------------------------------------------------------------
    # It is now possible to export all the metrics collected by gmetad directly to
    # graphite by setting the following attributes.
    #
    # The hostname or IP address of the Graphite server
    # default: unspecified
    # carbon_server "my.graphite.box"
    #
    # The port on which Graphite is listening
    # default: 2003
    # carbon_port 2003
    #
    # A prefix to prepend to the metric names exported by gmetad. Graphite uses dot-
    # separated paths to organize and refer to metrics.
    # default: unspecified
    # graphite_prefix "datacenter1.gmetad"
    #
    # Number of milliseconds gmetad will wait for a response from the graphite server
    # default: 500
    # carbon_timeout 500
    #
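The "Round-Robin Archives" comment in this gmetad.conf describes the default RRAs in words. As a worked check of that arithmetic, the Python sketch below computes how much time each definition covers; the helper name rra_retention_seconds is made up for this example, and the 15-second base step is the assumption stated in the file's own comment (it is not derived from the 10-second polling interval used in this post).

```python
def rra_retention_seconds(rra, step_seconds=15):
    """Time span covered by one "RRA:CF:xff:steps:rows" definition."""
    _tag, _cf, _xff, steps, rows = rra.split(":")
    # Each row averages `steps` base intervals, and the archive keeps `rows` rows.
    return int(steps) * int(rows) * step_seconds

DEFAULT_RRAS = ["RRA:AVERAGE:0.5:1:5856",
                "RRA:AVERAGE:0.5:4:20160",
                "RRA:AVERAGE:0.5:40:52704"]

for rra in DEFAULT_RRAS:
    days = rra_retention_seconds(rra) / 86400.0
    print("%-26s %.1f days" % (rra, days))
```

Under that assumption the three archives cover roughly one day at 15-second resolution, two weeks at 1-minute resolution, and about a year at 10-minute resolution.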

gmond.conf on the master node

    /* This configuration is as close to 2.5.x default behavior as possible
       The values closely match ./gmond/metric.h definitions in 2.5.x */
    globals {
      daemonize = yes
      setuid = yes
      user = ganglia
      debug_level = 0
      max_udp_msg_len = 1472
      mute = no
      deaf = no
      host_dmax = 0 /*secs */
      cleanup_threshold = 300 /*secs */
      gexec = no
      send_metadata_interval = 10
    }
    /* If a cluster attribute is specified, then all gmond hosts are wrapped inside
     * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS> will
     * NOT be wrapped inside of a <CLUSTER> tag. */
    cluster {
      name = "hadoop-cluster"
      owner = "ganglia"
      latlong = "unspecified"
      url = "unspecified"
    }
    /* The host section describes attributes of the host, like the location */
    host {
      location = "unspecified"
    }
    /* Feel free to specify as many udp_send_channels as you like.  Gmond
       used to only support having a single channel */
    udp_send_channel {
      #mcast_join = 239.2.11.71
      host = master
      port = 8649
      ttl = 1
    }
    /* You can specify as many udp_recv_channels as you like as well. */
    udp_recv_channel {
      #mcast_join = 239.2.11.71
      port = 8649
      #bind = 239.2.11.71
    }
    /* You can specify as many tcp_accept_channels as you like to share
       an xml description of the state of the cluster */
    tcp_accept_channel {
      port = 8649
    }
    /* Each metrics module that is referenced by gmond must be specified and
       loaded. If the module has been statically linked with gmond, it does not
       require a load path. However all dynamically loadable modules must include
       a load path. */
    modules {
      module {
        name = "core_metrics"
      }
      module {
        name = "cpu_module"
        path = "/usr/lib/ganglia/modcpu.so"
      }
      module {
        name = "disk_module"
        path = "/usr/lib/ganglia/moddisk.so"
      }
      module {
        name = "load_module"
        path = "/usr/lib/ganglia/modload.so"
      }
      module {
        name = "mem_module"
        path = "/usr/lib/ganglia/modmem.so"
      }
      module {
        name = "net_module"
        path = "/usr/lib/ganglia/modnet.so"
      }
      module {
        name = "proc_module"
        path = "/usr/lib/ganglia/modproc.so"
      }
      module {
        name = "sys_module"
        path = "/usr/lib/ganglia/modsys.so"
      }
    }
    include ('/etc/ganglia/conf.d/*.conf')
    /* The old internal 2.5.x metric array has been replaced by the following
       collection_group directives.  What follows is the default behavior for
       collecting and sending metrics that is as close to 2.5.x behavior as
       possible. */
    /* This collection group will cause a heartbeat (or beacon) to be sent every
       20 seconds.  In the heartbeat is the GMOND_STARTED data which expresses
       the age of the running gmond. */
    collection_group {
      collect_once = yes
      time_threshold = 20
      metric {
        name = "heartbeat"
      }
    }
    /* This collection group will send general info about this host every 1200 secs.
       This information doesn't change between reboots and is only collected once. */
    collection_group {
      collect_once = yes
      time_threshold = 1200
      metric {
        name = "cpu_num"
        title = "CPU Count"
      }
      metric {
        name = "cpu_speed"
        title = "CPU Speed"
      }
      metric {
        name = "mem_total"
        title = "Memory Total"
      }
      /* Should this be here? Swap can be added/removed between reboots. */
      metric {
        name = "swap_total"
        title = "Swap Space Total"
      }
      metric {
        name = "boottime"
        title = "Last Boot Time"
      }
      metric {
        name = "machine_type"
        title = "Machine Type"
      }
      metric {
        name = "os_name"
        title = "Operating System"
      }
      metric {
        name = "os_release"
        title = "Operating System Release"
      }
      metric {
        name = "location"
        title = "Location"
      }
    }
    /* This collection group will send the status of gexecd for this host every 300 secs */
    /* Unlike 2.5.x the default behavior is to report gexecd OFF.  */
    collection_group {
      collect_once = yes
      time_threshold = 300
      metric {
        name = "gexec"
        title = "Gexec Status"
      }
    }
    /* This collection group will collect the CPU status info every 20 secs.
       The time threshold is set to 90 seconds.  In honesty, this time_threshold could be
       set significantly higher to reduce unnecessary network chatter. */
    collection_group {
      collect_every = 20
      time_threshold = 90
      /* CPU status */
      metric {
        name = "cpu_user"
        value_threshold = "1.0"
        title = "CPU User"
      }
      metric {
        name = "cpu_system"
        value_threshold = "1.0"
        title = "CPU System"
      }
      metric {
        name = "cpu_idle"
        value_threshold = "5.0"
        title = "CPU Idle"
      }
      metric {
        name = "cpu_nice"
        value_threshold = "1.0"
        title = "CPU Nice"
      }
      metric {
        name = "cpu_aidle"
        value_threshold = "5.0"
        title = "CPU aidle"
      }
      metric {
        name = "cpu_wio"
        value_threshold = "1.0"
        title = "CPU wio"
      }
      /* The next two metrics are optional if you want more detail...
         ... since they are accounted for in cpu_system.
      metric {
        name = "cpu_intr"
        value_threshold = "1.0"
        title = "CPU intr"
      }
      metric {
        name = "cpu_sintr"
        value_threshold = "1.0"
        title = "CPU sintr"
      }
      */
    }
    collection_group {
      collect_every = 20
      time_threshold = 90
      /* Load Averages */
      metric {
        name = "load_one"
        value_threshold = "1.0"
        title = "One Minute Load Average"
      }
      metric {
        name = "load_five"
        value_threshold = "1.0"
        title = "Five Minute Load Average"
      }
      metric {
        name = "load_fifteen"
        value_threshold = "1.0"
        title = "Fifteen Minute Load Average"
      }
    }
    /* This group collects the number of running and total processes */
    collection_group {
      collect_every = 80
      time_threshold = 950
      metric {
        name = "proc_run"
        value_threshold = "1.0"
        title = "Total Running Processes"
      }
      metric {
        name = "proc_total"
        value_threshold = "1.0"
        title = "Total Processes"
      }
    }
    /* This collection group grabs the volatile memory metrics every 40 secs and
       sends them at least every 180 secs.  This time_threshold can be increased
       significantly to reduce unneeded network traffic. */
    collection_group {
      collect_every = 40
      time_threshold = 180
      metric {
        name = "mem_free"
        value_threshold = "1024.0"
        title = "Free Memory"
      }
      metric {
        name = "mem_shared"
        value_threshold = "1024.0"
        title = "Shared Memory"
      }
      metric {
        name = "mem_buffers"
        value_threshold = "1024.0"
        title = "Memory Buffers"
      }
      metric {
        name = "mem_cached"
        value_threshold = "1024.0"
        title = "Cached Memory"
      }
      metric {
        name = "swap_free"
        value_threshold = "1024.0"
        title = "Free Swap Space"
      }
    }
    collection_group {
      collect_every = 40
      time_threshold = 300
      metric {
        name = "bytes_out"
        value_threshold = 4096
        title = "Bytes Sent"
      }
      metric {
        name = "bytes_in"
        value_threshold = 4096
        title = "Bytes Received"
      }
      metric {
        name = "pkts_in"
        value_threshold = 256
        title = "Packets Received"
      }
      metric {
        name = "pkts_out"
        value_threshold = 256
        title = "Packets Sent"
      }
    }
    /* Different than 2.5.x default since the old config made no sense */
    collection_group {
      collect_every = 1800
      time_threshold = 3600
      metric {
        name = "disk_total"
        value_threshold = 1.0
        title = "Total Disk Space"
      }
    }
    collection_group {
      collect_every = 40
      time_threshold = 180
      metric {
        name = "disk_free"
        value_threshold = 1.0
        title = "Disk Space Available"
      }
      metric {
        name = "part_max_used"
        value_threshold = 1.0
        title = "Maximum Disk Space Used"
      }
    }
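The collection_group blocks combine three knobs: collect_every (how often a metric is sampled), time_threshold (the longest gap allowed between sends, as the memory group's comment puts it) and value_threshold (how large a change triggers an early send). The Python sketch below is a deliberately simplified model of that interplay for a single metric. The function name send_times is invented here, the comparison against the last sent value is an assumption, and real gmond behaviour differs in detail (for example, it sends whole groups at once).

```python
def send_times(samples, collect_every, time_threshold, value_threshold):
    """Times (seconds) at which a one-metric collection group would be sent."""
    sent = []
    last_time = last_value = None
    for index, value in enumerate(samples):
        now = index * collect_every
        # Send if nothing was sent yet or the time threshold has elapsed.
        overdue = last_time is None or now - last_time >= time_threshold
        # Send early if the value moved by more than the value threshold.
        jumped = last_value is not None and abs(value - last_value) > value_threshold
        if overdue or jumped:
            sent.append(now)
            last_time, last_value = now, value
    return sent

# Hypothetical cpu_user readings taken every 20 s, with the thresholds from the CPU group above.
cpu_user = [10.0, 10.2, 10.4, 15.0, 15.1, 15.1, 15.2, 15.2, 15.3]
print(send_times(cpu_user, collect_every=20, time_threshold=90, value_threshold=1.0))
```

In this model a quiet metric is sent only when the time threshold forces it, which is why the file's comments suggest raising time_threshold to cut network chatter.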

hadoop-metrics2-hbase.properties on the master node
This file is identical to the hadoop-metrics2-hbase.properties listing in section 5.

hadoop-metrics2.properties on the master node
This file has the same content as the hadoop-metrics2.properties listing in section 4.
  393. slave-gmond.conf.md Raw
  394. slave节点gmond.conf配置
  395. /* This configuration is as close to 2.5.x default behavior as possible
  396.    The values closely match ./gmond/metric.h definitions in 2.5.x */
  397. globals {                    
  398.   daemonize = yes              
  399.   setuid = yes            
  400.   user = ganglia              
  401.   debug_level = 0               
  402.   max_udp_msg_len = 1472        
  403.   mute = no            
  404.   deaf = no            
  405.   host_dmax = 0 /*secs */
  406.   cleanup_threshold = 300 /*secs */
  407.   gexec = no            
  408.   send_metadata_interval = 10     
  409. }
  410. /* If a cluster attribute is specified, then all gmond hosts are wrapped inside
  411. * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS> will
  412. * NOT be wrapped inside of a <CLUSTER> tag. */
  413. cluster {
  414.   name = "hadoop-cluster"
  415.   owner = "ganglia"
  416.   latlong = "unspecified"
  417.   url = "unspecified"
  418. }
  419. /* The host section describes attributes of the host, like the location */
  420. host {
  421.   location = "unspecified"
  422. }
  423. /* Feel free to specify as many udp_send_channels as you like.  Gmond
  424.    used to only support having a single channel */
  425. udp_send_channel {
  426.   #mcast_join = 239.2.11.71
  427.   host = master
  428.   port = 8649
  429.   ttl = 1
  430. }
  431. /* You can specify as many udp_recv_channels as you like as well. */
  432. udp_recv_channel {
  433.   #mcast_join = 239.2.11.71
  434.   port = 8649
  435.   #bind = 239.2.11.71
  436. }
  437. /* You can specify as many tcp_accept_channels as you like to share
  438.    an xml description of the state of the cluster */
  439. tcp_accept_channel {
  440.   port = 8649
  441. }
/* Each metrics module that is referenced by gmond must be specified and
   loaded. If the module has been statically linked with gmond, it does not
   require a load path. However all dynamically loadable modules must include
   a load path. */
modules {
  module {
    name = "core_metrics"
  }
  module {
    name = "cpu_module"
    path = "/usr/lib/ganglia/modcpu.so"
  }
  module {
    name = "disk_module"
    path = "/usr/lib/ganglia/moddisk.so"
  }
  module {
    name = "load_module"
    path = "/usr/lib/ganglia/modload.so"
  }
  module {
    name = "mem_module"
    path = "/usr/lib/ganglia/modmem.so"
  }
  module {
    name = "net_module"
    path = "/usr/lib/ganglia/modnet.so"
  }
  module {
    name = "proc_module"
    path = "/usr/lib/ganglia/modproc.so"
  }
  module {
    name = "sys_module"
    path = "/usr/lib/ganglia/modsys.so"
  }
}
include ('/etc/ganglia/conf.d/*.conf')
/* The old internal 2.5.x metric array has been replaced by the following
   collection_group directives.  What follows is the default behavior for
   collecting and sending metrics that is as close to 2.5.x behavior as
   possible. */
/* This collection group will cause a heartbeat (or beacon) to be sent every
   20 seconds.  In the heartbeat is the GMOND_STARTED data which expresses
   the age of the running gmond. */
collection_group {
  collect_once = yes
  time_threshold = 20
  metric {
    name = "heartbeat"
  }
}
/* This collection group will send general info about this host every 1200 secs.
   This information doesn't change between reboots and is only collected once. */
collection_group {
  collect_once = yes
  time_threshold = 1200
  metric {
    name = "cpu_num"
    title = "CPU Count"
  }
  metric {
    name = "cpu_speed"
    title = "CPU Speed"
  }
  metric {
    name = "mem_total"
    title = "Memory Total"
  }
  /* Should this be here? Swap can be added/removed between reboots. */
  metric {
    name = "swap_total"
    title = "Swap Space Total"
  }
  metric {
    name = "boottime"
    title = "Last Boot Time"
  }
  metric {
    name = "machine_type"
    title = "Machine Type"
  }
  metric {
    name = "os_name"
    title = "Operating System"
  }
  metric {
    name = "os_release"
    title = "Operating System Release"
  }
  metric {
    name = "location"
    title = "Location"
  }
}
/* This collection group will send the status of gexecd for this host every 300 secs */
/* Unlike 2.5.x the default behavior is to report gexecd OFF.  */
collection_group {
  collect_once = yes
  time_threshold = 300
  metric {
    name = "gexec"
    title = "Gexec Status"
  }
}
/* This collection group will collect the CPU status info every 20 secs.
   The time threshold is set to 90 seconds.  In honesty, this time_threshold could be
   set significantly higher to reduce unnecessary network chatter. */
collection_group {
  collect_every = 20
  time_threshold = 90
  /* CPU status */
  metric {
    name = "cpu_user"
    value_threshold = "1.0"
    title = "CPU User"
  }
  metric {
    name = "cpu_system"
    value_threshold = "1.0"
    title = "CPU System"
  }
  metric {
    name = "cpu_idle"
    value_threshold = "5.0"
    title = "CPU Idle"
  }
  metric {
    name = "cpu_nice"
    value_threshold = "1.0"
    title = "CPU Nice"
  }
  metric {
    name = "cpu_aidle"
    value_threshold = "5.0"
    title = "CPU aidle"
  }
  metric {
    name = "cpu_wio"
    value_threshold = "1.0"
    title = "CPU wio"
  }
  /* The next two metrics are optional if you want more detail...
     ... since they are accounted for in cpu_system.
  metric {
    name = "cpu_intr"
    value_threshold = "1.0"
    title = "CPU intr"
  }
  metric {
    name = "cpu_sintr"
    value_threshold = "1.0"
    title = "CPU sintr"
  }
  */
}
collection_group {
  collect_every = 20
  time_threshold = 90
  /* Load Averages */
  metric {
    name = "load_one"
    value_threshold = "1.0"
    title = "One Minute Load Average"
  }
  metric {
    name = "load_five"
    value_threshold = "1.0"
    title = "Five Minute Load Average"
  }
  metric {
    name = "load_fifteen"
    value_threshold = "1.0"
    title = "Fifteen Minute Load Average"
  }
}
/* This group collects the number of running and total processes */
collection_group {
  collect_every = 80
  time_threshold = 950
  metric {
    name = "proc_run"
    value_threshold = "1.0"
    title = "Total Running Processes"
  }
  metric {
    name = "proc_total"
    value_threshold = "1.0"
    title = "Total Processes"
  }
}
/* This collection group grabs the volatile memory metrics every 40 secs and
   sends them at least every 180 secs.  This time_threshold can be increased
   significantly to reduce unneeded network traffic. */
collection_group {
  collect_every = 40
  time_threshold = 180
  metric {
    name = "mem_free"
    value_threshold = "1024.0"
    title = "Free Memory"
  }
  metric {
    name = "mem_shared"
    value_threshold = "1024.0"
    title = "Shared Memory"
  }
  metric {
    name = "mem_buffers"
    value_threshold = "1024.0"
    title = "Memory Buffers"
  }
  metric {
    name = "mem_cached"
    value_threshold = "1024.0"
    title = "Cached Memory"
  }
  metric {
    name = "swap_free"
    value_threshold = "1024.0"
    title = "Free Swap Space"
  }
}
collection_group {
  collect_every = 40
  time_threshold = 300
  metric {
    name = "bytes_out"
    value_threshold = 4096
    title = "Bytes Sent"
  }
  metric {
    name = "bytes_in"
    value_threshold = 4096
    title = "Bytes Received"
  }
  metric {
    name = "pkts_in"
    value_threshold = 256
    title = "Packets Received"
  }
  metric {
    name = "pkts_out"
    value_threshold = 256
    title = "Packets Sent"
  }
}
/* Different than 2.5.x default since the old config made no sense */
collection_group {
  collect_every = 1800
  time_threshold = 3600
  metric {
    name = "disk_total"
    value_threshold = 1.0
    title = "Total Disk Space"
  }
}
collection_group {
  collect_every = 40
  time_threshold = 180
  metric {
    name = "disk_free"
    value_threshold = 1.0
    title = "Disk Space Available"
  }
  metric {
    name = "part_max_used"
    value_threshold = 1.0
    title = "Maximum Disk Space Used"
  }
}
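
One way to check that the gmond configuration above is working is to read the XML dump that gmond serves on its tcp_accept_channel (port 8649 here). The following is only a minimal Python sketch and is not part of the original post; the function names fetch_gmond_xml and metrics_by_host are made up for illustration, and the host name master is taken from the configuration above.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="master", port=8649, timeout=5.0):
    """Read the XML that gmond writes to a client connecting to its tcp_accept_channel."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection once the dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def metrics_by_host(xml_bytes):
    """Map each HOST name in the dump to the sorted list of its METRIC names."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = sorted(m.get("NAME") for m in host.iter("METRIC"))
    return result


# Example usage (needs a reachable gmond, so it is left commented out):
# for name, metrics in metrics_by_host(fetch_gmond_xml()).items():
#     print(name, len(metrics), "metrics")
```

If a slave node is missing from the result, its udp_send_channel settings (host and port) are a reasonable first thing to check.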
hadoop-metrics2-hbase.properties configuration on the slave nodes
# syntax: [prefix].[source|sink].[instance].[options]
# See javadoc of package-info.java for org.apache.hadoop.metrics2 for details
#*.sink.file*.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period
#*.period=10
# Below are some examples of sinks that could be used
# to monitor different hbase daemons.
# hbase.sink.file-all.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file-all.filename=all.metrics
# hbase.sink.file0.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file0.context=hmaster
# hbase.sink.file0.filename=master.metrics
# hbase.sink.file1.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file1.context=thrift-one
# hbase.sink.file1.filename=thrift-one.metrics
# hbase.sink.file2.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file2.context=thrift-two
# hbase.sink.file2.filename=thrift-two.metrics
# hbase.sink.file3.class=org.apache.hadoop.metrics2.sink.FileSink
# hbase.sink.file3.context=rest
# hbase.sink.file3.filename=rest.metrics
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
hbase.sink.ganglia.period=10
hbase.sink.ganglia.servers=master:8649
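
The key syntax noted in the file, [prefix].[source|sink].[instance].[options], can be illustrated with a small Python sketch. It is a simplification and not the Hadoop implementation: it assumes that a key with the `*` prefix acts as a default that a daemon-specific prefix (such as hbase) can override. The helper names parse_properties and effective_option are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            props[key.strip()] = value.strip()
    return props


def effective_option(props, prefix, kind, instance, option):
    """Look up prefix.kind.instance.option, falling back to the '*' prefix."""
    for candidate in (prefix, "*"):
        key = f"{candidate}.{kind}.{instance}.{option}"
        if key in props:
            return props[key]
    return None


# Example with the active lines of the file above:
# props = parse_properties(open("hadoop-metrics2-hbase.properties").read())
# effective_option(props, "hbase", "sink", "ganglia", "servers")
```

With the four active lines above, the hbase prefix gets its sink class from the `*` default and its servers value from its own key.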
hadoop-metrics2.properties configuration on the slave nodes
#
#   Licensed to the Apache Software Foundation (ASF) under one or more
#   contributor license agreements.  See the NOTICE file distributed with
#   this work for additional information regarding copyright ownership.
#   The ASF licenses this file to You under the Apache License, Version 2.0
#   (the "License"); you may not use this file except in compliance with
#   the License.  You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.
#
# syntax: [prefix].[source|sink].[instance].[options]
# See javadoc of package-info.java for org.apache.hadoop.metrics2 for details
#*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
# default sampling period, in seconds
#*.period=10
# The namenode-metrics.out will contain metrics from all context
#namenode.sink.file.filename=namenode-metrics.out
# Specifying a special sampling period for namenode:
#namenode.sink.*.period=8
#datanode.sink.file.filename=datanode-metrics.out
# the following example split metrics of different
# context to different sinks (in this case files)
#jobtracker.sink.file_jvm.context=jvm
#jobtracker.sink.file_jvm.filename=jobtracker-jvm-metrics.out
#jobtracker.sink.file_mapred.context=mapred
#jobtracker.sink.file_mapred.filename=jobtracker-mapred-metrics.out
#tasktracker.sink.file.filename=tasktracker-metrics.out
#maptask.sink.file.filename=maptask-metrics.out
#reducetask.sink.file.filename=reducetask-metrics.out
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
namenode.sink.ganglia.servers=master:8649
resourcemanager.sink.ganglia.servers=master:8649
datanode.sink.ganglia.servers=master:8649
nodemanager.sink.ganglia.servers=master:8649
maptask.sink.ganglia.servers=master:8649
reducetask.sink.ganglia.servers=master:8649
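
The slope and dmax lines pack several per-metric settings into one value, and each daemon prefix names its own Ganglia target. As a rough aid for reading such a file, the Python sketch below splits those values apart and lists which daemons report where. It is not taken from the original post; the helper names parse_metric_map and ganglia_targets are hypothetical, and treating servers as a comma-separated list is an assumption.

```python
def parse_metric_map(value):
    """Split 'metricA=x,metricB=y' into {'metricA': 'x', 'metricB': 'y'}."""
    result = {}
    for item in value.split(","):
        # The metric name itself contains dots, so split on the last '='.
        name, sep, setting = item.strip().rpartition("=")
        if sep:
            result[name] = setting
    return result


def ganglia_targets(properties_text):
    """Return {daemon_prefix: [host:port, ...]} for each '<prefix>.sink.ganglia.servers' key."""
    targets = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        parts = key.strip().split(".")
        if sep and parts[1:] == ["sink", "ganglia", "servers"]:
            targets[parts[0]] = [s.strip() for s in value.split(",") if s.strip()]
    return targets


# Example usage:
# parse_metric_map("jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40")
# ganglia_targets(open("hadoop-metrics2.properties").read())
```

A daemon whose prefix does not appear in the result of ganglia_targets has no servers line of its own in the file.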
14 comments

break-spark, posted 2014-10-15 11:23:59
OP, can this be used in pseudo-distributed mode?

pig2, posted 2014-10-15 11:25:33
(quoting break-spark, 2014-10-15 11:23: "OP, can this be used in pseudo-distributed mode?")
Pseudo-distributed mode is still a kind of distributed mode.

break-spark, posted 2014-10-15 14:10:52
OK, thanks, got it.

ohano_javaee, posted 2014-10-17 20:44:57
Awesome, I'll test it out.

dwshmilyss, posted 2015-1-13 14:10:21
Nice! Going to study this.

tang, posted 2015-3-10 15:55:33
Really impressive.

YLV, posted 2015-3-16 16:49:16
Studying this; about to install.

mjjian0, posted 2015-3-18 17:39:34
My Hadoop cluster is up and Ganglia is installed, but the monitoring web page shows no Hadoop metrics. What could be causing this?
(attached image: 89D2`8%3`S9K]K)]0]25.png)

Minimumy, posted 2015-4-10 14:42:28
This is absolutely great!