1. Configuring Cygwin for passwordless SSH login
Install SSH: in the Select Packages step, type ssh into the search box and select openssh: The OpenSSH server and client programs.
Configure the SSH service (run Cygwin as administrator):
ssh-host-config
Should privilege separation be used? yes
Do you want to install sshd as a service? yes
accept the defaults
Do you want to use a different name? no
Create new privileged user account 'cyg_server'? yes
enter a password
cygrunsrv -S sshd
(To reinstall the sshd service, use cygrunsrv -R sshd.)
Generate an SSH key:
ssh-keygen -t rsa (empty passphrase, default path)
cp .ssh/id_rsa.pub .ssh/authorized_keys
Log in:
ssh localhost
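The key setup can also be done in one pass; a minimal sketch of the same steps (the -N "" option supplies the empty passphrase non-interactively):

# generate an RSA key pair with an empty passphrase at the default path
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# authorize the key for local logins and tighten its permissions
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# verify that the login works without a password prompt
ssh localhost whoami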
2. Single-machine pseudo-distributed Hadoop on Windows
Prepare the Hadoop runtime environment: download it, unpack it, and copy it into the Cygwin user's home directory. The 1.x releases have a known bug; see the bug-fix reference.
Append to /home/ysc/.bashrc:
export JAVA_HOME=/home/ysc/jdk1.7.0_17
export PATH=/home/ysc/hadoop-0.20.2/bin:$JAVA_HOME/bin:$PATH
Append to hadoop-0.20.2/conf/hadoop-env.sh:
export JAVA_HOME=/home/ysc/jdk1.7.0_17
export HADOOP_LOG_DIR=/tmp/logs
Create a symbolic link:
mklink /D C:\tmp C:\cygwin\tmp
Log in again for the changes to take effect:
ssh localhost
which hadoop
Configure the Hadoop runtime parameters:
vi conf/core-site.xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
vi conf/hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
vi conf/mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
Format the NameNode:
hadoop namenode -format
Start the cluster and open the web management UI:
start-all.sh
Stop the cluster:
stop-all.sh
3. Run the wordcount example
hadoop jar hadoop-0.20.2-examples.jar wordcount input output
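As a quick end-to-end check, a sketch of a complete run; the sample input (the conf directory's XML files) is just a convenient placeholder:

# stage some local text files in HDFS as the job input
hadoop fs -mkdir input
hadoop fs -put conf/*.xml input
# run the example job and print the resulting word counts
hadoop jar hadoop-0.20.2-examples.jar wordcount input output
hadoop fs -cat 'output/part-*'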
Lecture 12
1. Hadoop fully distributed mode across multiple machines
Three machines:
host2 (NameNode, SecondaryNameNode, JobTracker, DataNode, TaskTracker)
host6 (DataNode, TaskTracker)
host8 (DataNode, TaskTracker)
vi /etc/hostname (set each machine's hostname)
vi /etc/hosts (on every machine, map each hostname to its IP address)
Create the user and group (on all three machines):
addgroup hadoop
adduser --ingroup hadoop hadoop
Relax the permissions on the temporary directory:
chmod 777 /tmp
Log out of root and log back in as the hadoop user.
Configure SSH. On host2 run:
ssh-keygen -t rsa (empty passphrase, default path)
This command creates a .ssh directory under the user's home directory containing two files: id_rsa, the RSA private key, which must be kept safe and never disclosed, and id_rsa.pub, its matching public key, which may be shared freely.
cp .ssh/id_rsa.pub .ssh/authorized_keys
Append the public key to the authorized_keys file on the other hosts:
ssh-copy-id -i .ssh/id_rsa.pub hadoop@host6
ssh-copy-id -i .ssh/id_rsa.pub hadoop@host8
host2 can now SSH to host6 and host8 without a password:
ssh host2
ssh host6
ssh host8
Prepare the Hadoop runtime environment:
tar -xzvf hadoop-1.1.2.tar.gz
Append to /home/hadoop/.bashrc:
export PATH=/home/hadoop/hadoop-1.1.2/bin:$PATH
Log in again for the changes to take effect:
ssh localhost
which hadoop
Configure the Hadoop runtime parameters:
vi conf/masters
Replace localhost with: host2
vi conf/slaves
Remove localhost and add three lines:
host2
host6
host8
vi conf/core-site.xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://host2:9000</value>
</property>
vi conf/hdfs-site.xml
<property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/dfs/filesystem/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/dfs/filesystem/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
vi conf/mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>host2:9001</value>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/home/hadoop/mapreduce/system</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/home/hadoop/mapreduce/local</value>
</property>
Copy the Hadoop files to the other nodes:
scp -r /home/hadoop/hadoop-1.1.2 hadoop@host6:/home/hadoop/hadoop-1.1.2
scp -r /home/hadoop/hadoop-1.1.2 hadoop@host8:/home/hadoop/hadoop-1.1.2
Format the NameNode:
hadoop namenode -format
Start the cluster and open the web management UI:
start-all.sh
Stop the cluster:
stop-all.sh
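To confirm the cluster really came up, two quick checks, as a sketch (jps ships with the JDK):

# list the Java daemons running on the local machine
jps
# run on any node: the summary should report 3 live DataNodes
hadoop dfsadmin -report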
Lecture 13
1. Redistributing the load
The same three machines, with the load redistributed:
host2 (NameNode, DataNode, TaskTracker)
host6 (SecondaryNameNode, DataNode, TaskTracker)
host8 (JobTracker, DataNode, TaskTracker)
Make host6 the SecondaryNameNode:
vi conf/masters (set it to host6)
scp conf/masters host6:/home/hadoop/hadoop-1.1.2/conf/masters
scp conf/masters host8:/home/hadoop/hadoop-1.1.2/conf/masters
vi conf/hdfs-site.xml
<property>
  <name>dfs.http.address</name>
  <value>host2:50070</value>
</property>
<property>
  <name>dfs.secondary.http.address</name>
  <value>host6:50090</value>
</property>
scp conf/hdfs-site.xml host6:/home/hadoop/hadoop-1.1.2/conf/hdfs-site.xml
scp conf/hdfs-site.xml host8:/home/hadoop/hadoop-1.1.2/conf/hdfs-site.xml
Make host8 the JobTracker:
vi conf/mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>host8:9001</value>
</property>
scp conf/mapred-site.xml host6:/home/hadoop/hadoop-1.1.2/conf/mapred-site.xml
scp conf/mapred-site.xml host8:/home/hadoop/hadoop-1.1.2/conf/mapred-site.xml
vi conf/core-site.xml
<property>
  <name>fs.checkpoint.dir</name>
  <value>/home/hadoop/dfs/filesystem/namesecondary</value>
</property>
scp conf/core-site.xml host6:/home/hadoop/hadoop-1.1.2/conf/core-site.xml
scp conf/core-site.xml host8:/home/hadoop/hadoop-1.1.2/conf/core-site.xml
Configure host8: the start-mapred.sh script on host8 will start the TaskTrackers on host2 and host6, so on host8 run:
ssh-keygen -t rsa (empty passphrase, default path)
ssh-copy-id -i .ssh/id_rsa.pub hadoop@host2
ssh-copy-id -i .ssh/id_rsa.pub hadoop@host6
ssh-copy-id -i .ssh/id_rsa.pub hadoop@host8
host8 can now SSH to host2 and host6 without a password:
ssh host2
ssh host6
ssh host8
Append to /home/hadoop/.bashrc:
export PATH=/home/hadoop/hadoop-1.1.2/bin:$PATH
On host2, run start-dfs.sh. On host8, run start-mapred.sh.
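A small loop run from host2 (relying on the passwordless SSH configured earlier, and assuming jps is on the PATH of non-interactive shells) shows at a glance that each host runs exactly the daemons listed above:

# print the Java daemons on every host in one pass
for h in host2 host6 host8; do
  echo "== $h =="
  ssh $h jps
done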
2. SecondaryNameNode
ssh host6
Stop the SecondaryNameNode:
hadoop-1.1.2/bin/hadoop-daemon.sh stop secondarynamenode
Force a merge of the fsimage and edits files:
hadoop-1.1.2/bin/hadoop secondarynamenode -checkpoint force
Start the SecondaryNameNode again:
hadoop-1.1.2/bin/hadoop-daemon.sh start secondarynamenode
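Checkpoints also happen automatically on a schedule; a sketch of the two Hadoop 1.x properties that control it, shown here with their default values (set in core-site.xml):

<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value><!-- seconds between automatic checkpoints -->
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value><!-- edits size in bytes that forces an earlier checkpoint -->
</property>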
3. Enabling the trash
<property>
  <name>fs.trash.interval</name>
  <value>10080</value>
</property>
(The interval is in minutes, so 10080 keeps deleted files for seven days.)
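With the trash enabled, hadoop fs -rm moves files aside instead of destroying them; a sketch of recovering one (the paths are placeholders):

# the deleted file lands under the user's trash directory
hadoop fs -rm input/file.txt
# restore it by moving it back out
hadoop fs -mv /user/hadoop/.Trash/Current/user/hadoop/input/file.txt input/file.txt
# or empty the trash immediately
hadoop fs -expunge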
Lecture 14
1. Dynamically adding a DataNode and TaskTracker node
Take host226 as the example. On host226 run:
Set the hostname:
vi /etc/hostname
Map the hostname to its IP address:
vi /etc/hosts
Create the user and group:
addgroup hadoop
adduser --ingroup hadoop hadoop
Relax the permissions on the temporary directory:
chmod 777 /tmp
On host2 run:
vi conf/slaves (add host226)
ssh-copy-id -i .ssh/id_rsa.pub hadoop@host226
scp -r /home/hadoop/hadoop-1.1.2 hadoop@host226:/home/hadoop/hadoop-1.1.2
On host8 run:
vi conf/slaves (add host226)
ssh-copy-id -i .ssh/id_rsa.pub hadoop@host226
On host226 run:
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker
In the /etc/hosts configuration file, the localhost-to-IP mapping must come after all the other IPv4 entries.
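For example, /etc/hosts on each node might look like this sketch (the addresses are made-up placeholders):

192.168.1.2    host2
192.168.1.6    host6
192.168.1.8    host8
192.168.1.226  host226
127.0.0.1      localhost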
Lecture 15
1. Restricting which nodes may connect to the cluster
NameNode:
vi conf/hdfs-site.xml
<property>
  <name>dfs.hosts</name>
  <value>/home/hadoop/hadoop-1.1.2/conf/include</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/hadoop/hadoop-1.1.2/conf/exclude</value>
</property>
Add the cluster's nodes to the include file:
vi /home/hadoop/hadoop-1.1.2/conf/include
JobTracker: vi conf/mapred-site.xml
<property>
  <name>mapred.hosts</name>
  <value>/home/hadoop/hadoop-1.1.2/conf/include</value>
</property>
<property>
  <name>mapred.hosts.exclude</name>
  <value>/home/hadoop/hadoop-1.1.2/conf/exclude</value>
</property>
Add the cluster's nodes to the include file:
vi /home/hadoop/hadoop-1.1.2/conf/include
Restart the cluster.
2. Dynamically removing a DataNode and TaskTracker node
Removing a DataNode:
vi /home/hadoop/hadoop-1.1.2/conf/exclude (add the node to remove, host226)
On the NameNode run:
hadoop dfsadmin -refreshNodes
vi hadoop-1.1.2/conf/slaves (remove host226)
vi hadoop-1.1.2/conf/include (remove host226)
hadoop dfsadmin -refreshNodes (make the include change take effect)
rm hadoop-1.1.2/conf/exclude
The exclude file exists mainly so that a DataNode can be decommissioned safely; see the sketch after this section for watching its progress.
Removing a TaskTracker, method one:
vi /home/hadoop/hadoop-1.1.2/conf/exclude (add the node to remove, host226)
On the JobTracker run:
hadoop mradmin -refreshNodes
vi hadoop-1.1.2/conf/slaves (remove host226)
vi hadoop-1.1.2/conf/include (remove host226)
hadoop mradmin -refreshNodes (make the include change take effect)
rm hadoop-1.1.2/conf/exclude
Removing a TaskTracker, method two:
vi /home/hadoop/hadoop-1.1.2/conf/include (delete the node to remove, host226)
On the JobTracker run:
hadoop mradmin -refreshNodes
vi hadoop-1.1.2/conf/slaves (remove host226)
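Decommissioning a DataNode through the exclude file is gradual: the NameNode first re-replicates the node's blocks. Its progress appears in the per-node section of the report; a sketch:

# the entry for host226 moves from "Decommission in progress"
# to "Decommissioned", after which the daemon can be stopped
hadoop dfsadmin -report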
Lecture 16
1. Running the benchmarks
hadoop jar hadoop-test-1.1.2.jar (run with no arguments to list the available tests)
hadoop jar hadoop-test-1.1.2.jar DFSCIOTest -write -nrFiles 12 -fileSize 1000 -resFile test
hadoop jar hadoop-test-1.1.2.jar DFSCIOTest -read -nrFiles 12 -fileSize 1000 -resFile test
hadoop jar hadoop-test-1.1.2.jar DFSCIOTest -clear
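The same jar bundles several other benchmarks; two commonly used ones, as a sketch (flag names as accepted by the Hadoop 1.x test jar):

# HDFS throughput: write, read back, then clean up
hadoop jar hadoop-test-1.1.2.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
hadoop jar hadoop-test-1.1.2.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
hadoop jar hadoop-test-1.1.2.jar TestDFSIO -clean
# JobTracker overhead: run a tiny MapReduce job 50 times
hadoop jar hadoop-test-1.1.2.jar mrbench -numRuns 50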
Lecture 17
Ganglia monitors the performance of large distributed systems: CPU, memory, disk, load, network traffic, and so on. It can be viewed in a browser, and its rich charts make it easy to see at a glance how each node and the whole cluster are doing, which helps greatly with tuning cluster parameters and raising overall resource utilization.
1. Configuring the server side
host6 acts as the server.
Create the user and group:
addgroup ganglia
adduser --ingroup ganglia ganglia
Install:
apt-get install gmetad
apt-get install rrdtool
apt-get install ganglia-webfrontend
apt-get install ganglia-monitor
Configure gmond:
vi /etc/ganglia/gmond.conf
Find setuid = yes and change it to setuid = no; then find name in the cluster block and change it to name = "hadoop-cluster".
Configure gmetad:
vi /etc/ganglia/gmetad.conf
Add a data source, i.e. the following lines:
data_source "hadoop-cluster" 10 host2 host6 host8
gridname "Hadoop"
Link the web folder:
ln -s /usr/share/ganglia-webfrontend /var/www/ganglia
Set the server name:
vi /etc/apache2/apache2.conf
Add:
ServerName host6
Restart the services:
/etc/init.d/gmetad restart
/etc/init.d/ganglia-monitor restart
/etc/init.d/apache2 restart
2. Configuring the clients
Install the data-collection service on host2 and host8:
Create the user and group:
addgroup ganglia
adduser --ingroup ganglia ganglia
Install:
apt-get install ganglia-monitor
Configure gmond:
vi /etc/ganglia/gmond.conf
Find setuid = yes and change it to setuid = no; then find name in the cluster block and change it to name = "hadoop-cluster".
Restart the service:
/etc/init.d/ganglia-monitor restart
3. Viewing the pages
If the Choose a Source menu on the page contains unspecified, restarting gmetad fixes it:
/etc/init.d/gmetad restart
4. Integrating Hadoop
vi conf/hadoop-metrics2.properties
Set the contents to:
# versions after 0.20 use ganglia31
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
# default for supportsparse is false
*.sink.ganglia.supportsparse=true
*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
# the multicast IP address; this is the default, and every node must use the same value (only the multicast address 239.2.11.71 can be used)
namenode.sink.ganglia.servers=239.2.11.71:8649
datanode.sink.ganglia.servers=239.2.11.71:8649
jobtracker.sink.ganglia.servers=239.2.11.71:8649
tasktracker.sink.ganglia.servers=239.2.11.71:8649
maptask.sink.ganglia.servers=239.2.11.71:8649
reducetask.sink.ganglia.servers=239.2.11.71:8649
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=239.2.11.71:8649
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
mapred.period=10
mapred.servers=239.2.11.71:8649
jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=239.2.11.71:8649
Copy the configuration file to the other cluster nodes and restart the cluster.
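To check that metrics are actually flowing, note that gmond dumps everything it currently knows as XML to any client that connects to its TCP port (8649 by default); a quick probe with netcat:

# count the metric entries held by the gmond on host2
nc host2 8649 | grep -c '<METRIC'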
Lecture 18
1. Preparing data to compress
Download the URL collection from dmoz:
gunzip content.rdf.u8.gz
Prepare Nutch 1.6:
cp release-1.6/conf/nutch-site.xml.template release-1.6/conf/nutch-site.xml
vi release-1.6/conf/nutch-site.xml
Add:
<property>
  <name>http.agent.name</name>
  <value>nutch</value>
</property>
cd release-1.6
ant
cd ..
Use DmozParser to parse the dmoz URL collection into plain text:
release-1.6/runtime/local/bin/nutch org.apache.nutch.tools.DmozParser content.rdf.u8 > urls &
Put the urls text file onto HDFS:
hadoop fs -put urls urls
2. Injecting the URLs with different compression methods
Enter the Nutch home directory:
cd release-1.6
Inject the URLs without compression:
runtime/deploy/bin/nutch inject data_no_compress/crawldb urls
Inject the URLs with the default compression:
vi conf/nutch-site.xml
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
ant
runtime/deploy/bin/nutch inject data_default_compress/crawldb urls
Inject the URLs with Gzip compression:
vi conf/nutch-site.xml
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
ant
runtime/deploy/bin/nutch inject data_gzip_compress/crawldb urls
Inject the URLs with BZip2 compression:
vi conf/nutch-site.xml
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
ant
runtime/deploy/bin/nutch inject data_bzip2_compress/crawldb urls
Inject the URLs with Snappy compression:
vi conf/nutch-site.xml
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
ant
runtime/deploy/bin/nutch inject data_snappy_compress/crawldb urls
Compare the effect of the compression type and the effect of the block size.
3. Configuring Snappy compression for Hadoop
Download and unpack:
tar -xzvf snappy-1.1.0.tar.gz
cd snappy-1.1.0
Build:
./configure
make
make install
Copy the library files:
scp /usr/local/lib/libsnappy* host2:/home/hadoop/hadoop-1.1.2/lib/native/Linux-amd64-64/
scp /usr/local/lib/libsnappy* host6:/home/hadoop/hadoop-1.1.2/lib/native/Linux-amd64-64/
scp /usr/local/lib/libsnappy* host8:/home/hadoop/hadoop-1.1.2/lib/native/Linux-amd64-64/
On every cluster machine edit the environment variables:
vi /home/hadoop/.bashrc
Append:
export LD_LIBRARY_PATH=/home/hadoop/hadoop-1.1.2/lib/native/Linux-amd64-64
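Besides the native library, the codec has to be registered with Hadoop; a sketch of the usual 1.x property in core-site.xml (the exact codec list is an assumption; keep whatever your site already registers and append SnappyCodec):

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>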
Lecture 19 (21 minutes)
1. Using Ganglia multicast to monitor multiple clusters on the same network segment
vi /etc/ganglia/gmetad.conf
data_source "cluster1" 10 host2
data_source "cluster2" 10 host6
data_source "cluster3" 10 host8
/etc/init.d/gmetad restart
Next, the nodes of each cluster must be given their own port.
cluster1:
vi /etc/ganglia/gmond.conf
Set the cluster name:
cluster {
name = "cluster1"
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}
Set the port:
udp_send_channel {
mcast_join = 239.2.11.71
port = 8661
ttl = 1
}
udp_recv_channel {
mcast_join = 239.2.11.71
port = 8661
bind = 239.2.11.71
}
/etc/init.d/ganglia-monitor restart
cluster2:
vi /etc/ganglia/gmond.conf
Set the cluster name:
cluster {
name = "cluster2"
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}
Set the port:
udp_send_channel {
mcast_join = 239.2.11.71
port = 8662
ttl = 1
}
udp_recv_channel {
mcast_join = 239.2.11.71
port = 8662
bind = 239.2.11.71
}
/etc/init.d/ganglia-monitor restart
cluster3:
vi /etc/ganglia/gmond.conf
Set the cluster name:
cluster {
name = "cluster3"
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}
Set the port:
udp_send_channel {
mcast_join = 239.2.11.71
port = 8663
ttl = 1
}
udp_recv_channel {
mcast_join = 239.2.11.71
port = 8663
bind = 239.2.11.71
}
/etc/init.d/ganglia-monitor restart
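A quick way to confirm that each node ended up on its own channel is to look at the open UDP sockets; a sketch:

# each node should show its own cluster's port: 8661, 8662 or 8663
netstat -anu | grep -E '866[123]'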
2. Using Ganglia unicast to monitor multiple clusters on the same network segment
vi /etc/ganglia/gmetad.conf
data_source "cluster1" 10 host2
data_source "cluster2" 10 host6
data_source "cluster3" 10 host8
/etc/init.d/gmetad restart
cluster1:
vi /etc/ganglia/gmond.conf
Set the cluster name:
cluster {
name = "cluster1"
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}
Point the send channel at the node that receives the data:
udp_send_channel {
# mcast_join = 239.2.11.71
host = host2
port = 8649
ttl = 1
}
udp_recv_channel {
# mcast_join = 239.2.11.71
port = 8649
# bind = 239.2.11.71
}
/etc/init.d/ganglia-monitor restart
cluster2:
vi /etc/ganglia/gmond.conf
Set the cluster name:
cluster {
name = "cluster2"
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}
Point the send channel at the node that receives the data:
udp_send_channel {
# mcast_join = 239.2.11.71
host = host6
port = 8649
ttl = 1
}
udp_recv_channel {
# mcast_join = 239.2.11.71
port = 8649
# bind = 239.2.11.71
}
/etc/init.d/ganglia-monitor restart
cluster3:
vi /etc/ganglia/gmond.conf
Set the cluster name:
cluster {
name = "cluster3"
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}
Point the send channel at the node that receives the data:
udp_send_channel {
# mcast_join = 239.2.11.71
host = host8
port = 8649
ttl = 1
}
udp_recv_channel {
# mcast_join = 239.2.11.71
port = 8649
# bind = 239.2.11.71
}
/etc/init.d/ganglia-monitor restart
3. Using Ganglia to monitor multiple clusters on different network segments
Hosts on different network segments that belong to the same cluster cannot use Ganglia's multicast configuration; unicast must be used instead.
Now add host226 to cluster1.
Install the data-collection service on host226:
Create the user and group:
addgroup ganglia
adduser --ingroup ganglia ganglia
Install:
apt-get install ganglia-monitor
Configure gmond:
vi /etc/ganglia/gmond.conf
Find setuid = yes and change it to setuid = no; then find name in the cluster block and change it to name = "cluster1".
Set the port (note that step 1's multicast demonstration already changed the UDP port to 8661):
udp_send_channel {
mcast_join = 239.2.11.71
port = 8661
ttl = 1
}
udp_recv_channel {
mcast_join = 239.2.11.71
port = 8661
bind = 239.2.11.71
}
Restart the service:
/etc/init.d/ganglia-monitor restart
Lecture 20 (22 minutes)
1. Using Ganglia unicast to monitor a single cluster spanning multiple network segments
vi /etc/ganglia/gmetad.conf
data_source "hadoop-cluster" 10 host6
/etc/init.d/gmetad restart
Apply the following configuration on every node of the cluster:
vi /etc/ganglia/gmond.conf
Set the cluster name:
cluster {
name = "hadoop-cluster"
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}
Point the send channel at the node that receives the data:
udp_send_channel {
# mcast_join = 239.2.11.71
host = host6
port = 8649
ttl = 1
}
udp_recv_channel {
# mcast_join = 239.2.11.71
port = 8649
# bind = 239.2.11.71
}
/etc/init.d/ganglia-monitor restart
2. Configuring the Hadoop cluster to use the unicast address
vi conf/hadoop-metrics2.properties
Set the contents to:
# versions after 0.20 use ganglia31
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
# default for supportsparse is false
*.sink.ganglia.supportsparse=true
*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
namenode.sink.ganglia.servers=host6
datanode.sink.ganglia.servers=host6
jobtracker.sink.ganglia.servers=host6
tasktracker.sink.ganglia.servers=host6
maptask.sink.ganglia.servers=host6
reducetask.sink.ganglia.servers=host6
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=host6
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
mapred.period=10
mapred.servers=host6
jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=host6
Copy the configuration file to the other cluster nodes and restart the cluster.
3. Expanding the cluster across three different network segments
Re-add host226 to the cluster and add a new node, host138.
Add host226 and host138 to the include files on host6 and host8.
Add host226 and host138 to the slaves files on host6 and host8.
On the new node host138 run:
Set the hostname:
vi /etc/hostname
Map the hostname to its IP address:
vi /etc/hosts
Create the user and group:
addgroup hadoop
adduser --ingroup hadoop hadoop
Relax the permissions on the temporary directory:
chmod 777 /tmp
On host2 and host8, set up passwordless SSH login to host138:
ssh-copy-id -i .ssh/id_rsa.pub hadoop@host138
On host2, copy the Hadoop files to host138:
scp -r /home/hadoop/hadoop-1.1.2 hadoop@host138:/home/hadoop/hadoop-1.1.2
If the cluster is already running, add the nodes dynamically by starting the daemons on host226 and host138:
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker
4. Configuring host138
Install the data-collection service on host138:
Create the user and group:
addgroup ganglia
adduser --ingroup ganglia ganglia
Install:
apt-get install ganglia-monitor
Configure gmond:
vi /etc/ganglia/gmond.conf
Set the cluster name:
cluster {
name = "hadoop-cluster"
owner = "unspecified"
latlong ="unspecified"
url ="unspecified"
} 指定接收数据的节点: udp_send_channel { # mcast_join = 239.2.11.71 host = host6 port = 8649 ttl = 1 } udp_recv_channel { # mcast_join = 239.2.11.71 port = 8649 # bind = 239.2.11.71 } /etc/init.d/ganglia-monitor restart
(If anything above no longer works, see this post: http://www.aboutyun.com/thread-5449-1-1.html)
http://yangshangchuan.iteye.com/blog/1837935