Big data: building a hadoop2.6.0 + spark1.6.0 HA distributed cluster (5 nodes) [Original]
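Last edited by breaking on 2016-3-4 21:24

Guiding questions:
1. How do you install Hadoop?
2. How do you install ZooKeeper?
3. How do you install Spark?
4. How do you check that the installation is correct?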
1. Cluster plan

Hostname | IP address | Installed software        | Running processes
Master   | 10.0.0.170 | JDK, Scala, Hadoop, Spark | HistoryServer, QuorumPeerMain, SecondaryNameNode, NameNode, Master, ResourceManager
Worker1  | 10.0.0.171 | JDK, Scala, Hadoop, Spark | QuorumPeerMain, NodeManager, DataNode, Worker
Worker2  | 10.0.0.172 | JDK, Scala, Hadoop, Spark | QuorumPeerMain, NodeManager, DataNode, Worker
Worker3  | 10.0.0.173 | JDK, Scala, Hadoop, Spark | NodeManager, DataNode, Worker
Worker4  | 10.0.0.174 | JDK, Scala, Hadoop, Spark | NodeManager, DataNode, Worker
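The post never shows how the nodes resolve the names Master and Worker1..Worker4. A common approach, assumed here, is /etc/hosts. The helper below (add_cluster_hosts is a hypothetical name, not from the post) appends any entries from the plan that are missing, so it can be re-run safely.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): add the cluster plan's
# IP/hostname pairs to a hosts file. Assumes name resolution via /etc/hosts;
# pass another file path to try it safely.
add_cluster_hosts() {
  hosts_file="${1:-/etc/hosts}"
  while read -r ip name; do
    # Skip names that already appear as a whole word in the file.
    if grep -qw "$name" "$hosts_file"; then
      continue
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
  done <<'EOF'
10.0.0.170 Master
10.0.0.171 Worker1
10.0.0.172 Worker2
10.0.0.173 Worker3
10.0.0.174 Worker4
EOF
}
```

Run it as root on every node, for example: add_cluster_hosts /etc/hosts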
2. Install the ZooKeeper cluster
Upload ZooKeeper
We upload files with rz, so install lrzsz first:
root@master:~# apt-get install lrzsz -y
In the rz dialog, locate the ZooKeeper archive, click Add, then OK.
Upload complete.
Extract:
root@master:~# mkdir -p /usr/local/zookeeper
root@master:~# cd /tools/
root@master:/tools# tar -zxf zookeeper-3.4.6.tar.gz -C /usr/local/zookeeper/
root@master:/tools# apt-get install tree
root@master:/tools# tree /usr/local/zookeeper/ -L 2
Edit the configuration file:
root@master:/tools# cd /usr/local/zookeeper/zookeeper-3.4.6/conf/
root@master:/usr/local/zookeeper/zookeeper-3.4.6/conf# ll
total 20
drwxr-xr-x  2 citic citic 4096 Feb 20  2014 ./
drwxr-xr-x 10 citic citic 4096 Feb 20  2014 ../
-rw-rw-r--  1 citic citic  535 Feb 20  2014 configuration.xsl
-rw-rw-r--  1 citic citic 2161 Feb 20  2014 log4j.properties
-rw-rw-r--  1 citic citic  922 Feb 20  2014 zoo_sample.cfg
root@master:/usr/local/zookeeper/zookeeper-3.4.6/conf# mv zoo_sample.cfg zoo.cfg
At line 12, replace the sample's default dataDir=/tmp/zookeeper with the following:
dataDir=/usr/local/zookeeper/zookeeper-3.4.6/data
dataLogDir=/usr/local/zookeeper/zookeeper-3.4.6/logs
server.0=Master:2888:3888
server.1=Worker1:2888:3888
server.2=Worker2:2888:3888
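For reference, the whole zoo.cfg after this edit can also be written in one step. This is a sketch, not the post's method: write_zoo_cfg is a hypothetical name, tickTime/initLimit/syncLimit/clientPort are the zoo_sample.cfg defaults, and dataDir, dataLogDir and the server.N lines are the values above.

```shell
#!/bin/sh
# Hypothetical helper: write a complete zoo.cfg for the three-node ensemble.
# Argument: the ZooKeeper install directory (defaults to the path in this post).
write_zoo_cfg() {
  zk_home="${1:-/usr/local/zookeeper/zookeeper-3.4.6}"
  cat > "$zk_home/conf/zoo.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=$zk_home/data
dataLogDir=$zk_home/logs
# server.N=host:peerPort:leaderElectionPort; N must equal that host's myid
server.0=Master:2888:3888
server.1=Worker1:2888:3888
server.2=Worker2:2888:3888
EOF
}
```

Port 2888 carries follower-to-leader traffic and 3888 is used for leader election; clients connect on 2181.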
Create the data and logs directories, and write this node's ID (0 on Master) into myid:
root@master:/usr/local/zookeeper/zookeeper-3.4.6/conf# mkdir -p /usr/local/zookeeper/zookeeper-3.4.6/data
root@master:/usr/local/zookeeper/zookeeper-3.4.6/conf# mkdir -p /usr/local/zookeeper/zookeeper-3.4.6/logs
root@master:/usr/local/zookeeper/zookeeper-3.4.6/conf# echo 0 > /usr/local/zookeeper/zookeeper-3.4.6/data/myid
root@master:/usr/local/zookeeper/zookeeper-3.4.6/conf# cat /usr/local/zookeeper/zookeeper-3.4.6/data/myid
0
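The number in data/myid has to match the server.N line for that host in zoo.cfg, otherwise the node will not join the ensemble as intended. A small check, sketched with a hypothetical function name (check_myid) and assuming zoo.cfg uses the same hostnames you pass in:

```shell
#!/bin/sh
# Hypothetical sanity check: does data/myid match this host's server.N entry?
check_myid() {
  cfg="$1"        # path to zoo.cfg
  myid_file="$2"  # path to data/myid
  host="$3"       # this node's name as written in zoo.cfg, e.g. Master
  id="$(tr -d '[:space:]' < "$myid_file")"
  if grep -q "^server\.$id=$host:" "$cfg"; then
    echo "OK: $host has myid $id"
  else
    echo "MISMATCH: $host has myid '$id'" >&2
    return 1
  fi
}
```

On Master this would be: check_myid conf/zoo.cfg data/myid Master (run from the zookeeper-3.4.6 directory).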
Set up passwordless (key-based) SSH login between the ZooKeeper nodes.
Copy the zookeeper directory to Worker1 and Worker2 with scp:
root@master:/usr/local/zookeeper/zookeeper-3.4.6/conf# cd ~
root@master:~# scp -rq /usr/local/zookeeper/ root@Worker1:/usr/local/
root@master:~# scp -rq /usr/local/zookeeper/ root@Worker2:/usr/local/
On Worker1, run:
root@work1:~# echo 1 > /usr/local/zookeeper/zookeeper-3.4.6/data/myid
On Worker2, run:
root@work2:~# echo 2 > /usr/local/zookeeper/zookeeper-3.4.6/data/myid
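Instead of logging in to each worker, the two myid files can be written from Master over the passwordless SSH configured earlier. A minimal sketch; set_remote_myids, run and DRY_RUN are hypothetical names, and DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# Sketch: set myid on Worker1 (1) and Worker2 (2) from Master over SSH.
ZK_DATA="/usr/local/zookeeper/zookeeper-3.4.6/data"

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

set_remote_myids() {
  id=1
  for host in Worker1 Worker2; do
    # The redirect is quoted, so it runs on the remote host.
    run ssh "root@$host" "echo $id > $ZK_DATA/myid"
    id=$((id + 1))
  done
}
```

Try DRY_RUN=1 set_remote_myids first to see what would be executed.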
Start the ZooKeeper cluster. Note: follow the steps below exactly.
2.1 Start ZooKeeper on Master, Worker1 and Worker2
Master
root@master:~# cd /usr/local/zookeeper/zookeeper-3.4.6/bin
root@master:/usr/local/zookeeper/zookeeper-3.4.6/bin# ./zkServer.sh start
Worker1
root@work1:~# cd /usr/local/zookeeper/zookeeper-3.4.6/bin
root@work1:/usr/local/zookeeper/zookeeper-3.4.6/bin# ./zkServer.sh start
Worker2
root@work2:~# cd /usr/local/zookeeper/zookeeper-3.4.6/bin
root@work2:/usr/local/zookeeper/zookeeper-3.4.6/bin# ./zkServer.sh start
Check the status on each node with ./zkServer.sh status: one node should report leader and the other two follower.
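To check "one leader, two followers" in one go, the Mode: line that zkServer.sh status prints on each node can be collected from Master. A sketch, assuming passwordless SSH and the install path above; collect_status and check_modes are hypothetical names.

```shell
#!/bin/sh
# Sketch: verify the ZooKeeper ensemble has exactly 1 leader and 2 followers.
ZK_BIN="/usr/local/zookeeper/zookeeper-3.4.6/bin"

# Reads zkServer.sh status output on stdin; succeeds only for 1 leader + 2 followers.
check_modes() {
  modes="$(grep -o 'Mode: [a-z]*')"
  leaders="$(printf '%s\n' "$modes" | grep -c 'Mode: leader')"
  followers="$(printf '%s\n' "$modes" | grep -c 'Mode: follower')"
  echo "leaders=$leaders followers=$followers"
  [ "$leaders" -eq 1 ] && [ "$followers" -eq 2 ]
}

# Prints the status output of all three ZooKeeper nodes.
collect_status() {
  for host in Master Worker1 Worker2; do
    ssh "root@$host" "$ZK_BIN/zkServer.sh status" 2>&1
  done
}
```

Usage on Master: collect_status | check_modes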
3. Install and configure the Hadoop cluster
3.1 Hadoop cluster plan:
NameNode: Master
ResourceManager: Master
QuorumPeerMain: Master, Worker1, Worker2
DataNode: Worker1, Worker2, Worker3, Worker4
NodeManager: Worker1, Worker2, Worker3, Worker4
3.2 Extract
root@Master:~# mkdir -p /usr/local/hadoop/
root@Master:~# cd /tools/
root@Master:/tools# tar -zxf hadoop-2.6.0.tar.gz -C /usr/local/hadoop/
root@Master:/tools# cd /usr/local/hadoop/hadoop-2.6.0/etc/hadoop/
3.3 Edit the configuration files
3.3.1 Edit hadoop-env.sh
root@Master:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# vim hadoop-env.sh
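The post does not show what was changed in hadoop-env.sh. The usual edit is to set JAVA_HOME explicitly, because daemons launched over SSH by start-dfs.sh may not inherit it from your login shell. A sketch under that assumption; the JDK path is borrowed from spark-env.sh in 5.2.1, and set_hadoop_java_home is a hypothetical name.

```shell
#!/bin/sh
# Sketch: replace the "export JAVA_HOME=..." line in hadoop-env.sh.
# Assumption: the JDK lives at the path used in spark-env.sh (5.2.1).
set_hadoop_java_home() {
  env_file="$1"
  java_home="${2:-/usr/local/jdk/jdk1.8.0_60}"
  # Edit in place, keeping a .bak copy of the original file.
  sed -i.bak "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file"
}
```

Usage: set_hadoop_java_home /usr/local/hadoop/hadoop-2.6.0/etc/hadoop/hadoop-env.sh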
3.3.2 Edit core-site.xml
root@Master:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# vim core-site.xml
Add the following:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadoop-2.6.0/tmp</value>
  </property>
  <property>
    <name>hadoop.native.lib</name>
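The core-site.xml listing is cut off after hadoop.native.lib, and its value is not shown. The sketch below writes a well-formed file containing only the two properties that are fully visible; write_core_site is a hypothetical name.

```shell
#!/bin/sh
# Sketch: write core-site.xml with fs.defaultFS and hadoop.tmp.dir only.
# hadoop.native.lib is omitted because its value is not recoverable.
write_core_site() {
  conf_dir="${1:-/usr/local/hadoop/hadoop-2.6.0/etc/hadoop}"
  cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadoop-2.6.0/tmp</value>
  </property>
</configuration>
EOF
}
```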
3.3.3 Edit hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>Master:50090</value>
    <description>The secondary namenode http server address and port.</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/hadoop-2.6.0/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/hadoop-2.6.0/dfs/data</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///usr/local/hadoop/hadoop-2.6.0/dfs/namesecondary</value>
    <description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.</description>
  </property>
</configuration>
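After editing several *-site.xml files it helps to read a value back. The helper below (get_prop, a hypothetical name) is plain awk, so it assumes one <name> and one <value> tag per line as in the files above; it is not a real XML parser.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> of property $2 from Hadoop XML file $1.
get_prop() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-char "<value>" and the 8-char "</value>".
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}
```

Usage: get_prop hdfs-site.xml dfs.replication. Once the cluster is installed, hdfs getconf -confKey dfs.replication reports the value Hadoop actually loaded.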
3.3.4 Edit yarn-site.xml
root@Master:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# vim yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
3.3.5 Edit mapred-site.xml
root@Master:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# mv mapred-site.xml.template mapred-site.xml
root@Master:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# vim mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
3.3.6 Edit slaves (the slaves file lists the worker nodes; replace the default localhost entry)
root@Master:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# vim slaves
Worker1
Worker2
Worker3
Worker4
3.3.7 Distribute to every server with one script
root@Master:~# vim fenfa.sh
#!/bin/sh
for i in 1 2 3 4
do
  scp -rq /usr/local/hadoop/ root@Worker$i:/usr/local/
  #scp -rq /usr/local/spark/ root@Worker$i:/usr/local/
done
root@Master:~# sh fenfa.sh
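fenfa.sh is reused in 5.6 by swapping which scp line is commented out. A parameterised variant avoids editing the script each time; distribute and DRY_RUN are hypothetical names, and Worker1..Worker4 come from the cluster plan.

```shell
#!/bin/sh
# Sketch: copy a directory to Worker1..Worker4 under /usr/local/.
# DRY_RUN=1 prints the scp commands without running them.
distribute() {
  src="$1"   # e.g. /usr/local/hadoop/ or /usr/local/spark/
  [ -n "$src" ] || { echo "usage: distribute DIR" >&2; return 1; }
  for i in 1 2 3 4; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "scp -rq $src root@Worker$i:/usr/local/"
    else
      # Stop at the first failed copy instead of silently continuing.
      scp -rq "$src" "root@Worker$i:/usr/local/" || return 1
    fi
  done
}
```

Here it would be distribute /usr/local/hadoop/, and in 5.6 distribute /usr/local/spark/.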
4 Start Hadoop
4.1 Start the ZooKeeper cluster first, then format HDFS (on Master):
root@Master:~# cd /usr/local/hadoop/hadoop-2.6.0/bin/
root@Master:/usr/local/hadoop/hadoop-2.6.0/bin# ./hdfs namenode -format
If the output contains "successfully formatted", the format succeeded.
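Formatting should happen only once: re-running hdfs namenode -format creates a new cluster ID, and DataNodes that still hold the old one will refuse to register. A simple guard, sketched with a hypothetical function name, checks for current/VERSION, which exists in a formatted name directory (dfs.namenode.name.dir from hdfs-site.xml).

```shell
#!/bin/sh
# Sketch: succeed only if the NameNode directory has not been formatted yet.
needs_format() {
  name_dir="${1:-/usr/local/hadoop/hadoop-2.6.0/dfs/name}"
  [ ! -f "$name_dir/current/VERSION" ]
}
```

Usage from the bin directory: needs_format && ./hdfs namenode -format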
4.2 Start HDFS (run on Master only)
root@Master:/usr/local/hadoop/hadoop-2.6.0/bin# cd ../sbin/
root@Master:/usr/local/hadoop/hadoop-2.6.0/sbin# ./start-dfs.sh
root@Master:/usr/local/hadoop/hadoop-2.6.0/sbin# jps
4.3 Start YARN
root@Master:/usr/local/hadoop/hadoop-2.6.0/sbin# ./start-yarn.sh
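jps output can be compared against the plan instead of read by eye. After start-dfs.sh and start-yarn.sh, the plan expects QuorumPeerMain, NameNode, SecondaryNameNode and ResourceManager on Master (Spark's Master and HistoryServer start later). check_master_jps is a hypothetical name.

```shell
#!/bin/sh
# Sketch: read `jps` output on stdin and report expected processes that are missing.
check_master_jps() {
  # jps prints "<pid> <MainClass>"; keep only the class names.
  running="$(awk '{print $2}')"
  missing=""
  for p in QuorumPeerMain NameNode SecondaryNameNode ResourceManager; do
    printf '%s\n' "$running" | grep -qx "$p" || missing="$missing $p"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all expected processes running"
}
```

Usage on Master: jps | check_master_jps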
5 Install the Spark cluster
5.1 Extract
root@Master:/tools# mkdir -p /usr/local/spark/
root@Master:/tools# tar -zxf spark-1.6.0-bin-hadoop2.6.tgz -C /usr/local/spark/
root@Master:/tools# cd /usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/
5.2 Edit the configuration files
5.2.1 Edit spark-env.sh
root@Master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# mv spark-env.sh.template spark-env.sh
root@Master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# vim spark-env.sh
Append the following at the end:
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_60
export SCALA_HOME=/usr/local/scala/scala-2.10.4
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.0
export HADOOP_CONF_DIR=/usr/local/hadoop/hadoop-2.6.0/etc/hadoop
export SPARK_MASTER_IP=Master
#export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=Master:2181,Worker1:2181,Worker2:2181 -Dspark.deploy.zookeeper.dir=/spark"
export SPARK_WORKER_MEMORY=1g
export SPARK_EXECUTOR_MEMORY=1g
export SPARK_DRIVER_MEMORY=1G
export SPARK_WORKER_CORES=2
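The same settings can be appended with one command. In this sketch, ZK_RECOVERY=1 additionally writes the ZooKeeper recovery line that the post leaves commented out; that option is what Spark standalone uses for ZooKeeper-based Master recovery, and a standby Master on another node would still have to be started separately. append_spark_env and ZK_RECOVERY are hypothetical names; the values are copied from above.

```shell
#!/bin/sh
# Sketch: append the spark-env.sh settings from 5.2.1 to the given file.
append_spark_env() {
  env_file="$1"
  cat >> "$env_file" <<'EOF'
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_60
export SCALA_HOME=/usr/local/scala/scala-2.10.4
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.0
export HADOOP_CONF_DIR=/usr/local/hadoop/hadoop-2.6.0/etc/hadoop
export SPARK_MASTER_IP=Master
export SPARK_WORKER_MEMORY=1g
export SPARK_EXECUTOR_MEMORY=1g
export SPARK_DRIVER_MEMORY=1G
export SPARK_WORKER_CORES=2
EOF
  # Optional: enable ZooKeeper-based recovery for the Spark Master.
  if [ "${ZK_RECOVERY:-0}" = 1 ]; then
    cat >> "$env_file" <<'EOF'
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=Master:2181,Worker1:2181,Worker2:2181 -Dspark.deploy.zookeeper.dir=/spark"
EOF
  fi
}
```

Usage: append_spark_env conf/spark-env.sh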
5.2.2 Edit slaves (replace the default localhost entry)
root@Master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# mv slaves.template slaves
root@Master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# vim slaves
Worker1
Worker2
Worker3
Worker4
5.2.3 Edit spark-defaults.conf
root@Master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# mv spark-defaults.conf.template spark-defaults.conf
root@Master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf# vim spark-defaults.conf
spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.eventLog.enabled true
spark.eventLog.dir hdfs://Master:9000/historyServerforSpark
spark.yarn.historyServer.address Master:18080
spark.history.fs.logDirectory hdfs://Master:9000/historyServerforSpark
5.6 Distribute to every server with one script
root@Master:~# vim fenfa.sh
#!/bin/sh
for i in 1 2 3 4
do
  #scp -rq /usr/local/hadoop/ root@Worker$i:/usr/local/
  scp -rq /usr/local/spark/ root@Worker$i:/usr/local/
done
root@Master:~# sh fenfa.sh
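5.7 Create the /historyServerforSpark directory on HDFS. spark.eventLog.dir and spark.history.fs.logDirectory point there, so it must exist before Spark applications or the history server start:
root@Master:/# hadoop fs -mkdir /historyServerforSpark
root@Master:/# hadoop fs -ls /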
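5.8 Start the Spark cluster (this is Spark's start-all.sh, so run it from Spark's sbin directory, not Hadoop's)
root@Master:~# cd /usr/local/spark/spark-1.6.0-bin-hadoop2.6/sbin/
root@Master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/sbin# ./start-all.sh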
5.9 Start the history server
root@Master:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/sbin# ./start-history-server.sh
5.10 Verify the processes on each node: Master, Worker1, Worker2, Worker3, Worker4
HDFS NameNode web UI: http://10.0.0.170:50070/
YARN ResourceManager web UI: http://10.0.0.170:8088
Spark Master web UI: http://10.0.0.170:8080/
Spark History Server: http://10.0.0.170:18080/
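The four web UIs can also be checked from a shell. A sketch assuming curl is installed; ui_urls and check_uis are hypothetical names and the ports are the ones listed above. Any 2xx or 3xx code means something answered (the YARN UI typically redirects to /cluster); 000 means nothing is listening.

```shell
#!/bin/sh
# Sketch: list the cluster's web UI URLs and report each one's HTTP status.
ui_urls() {
  host="${1:-10.0.0.170}"
  for port in 50070 8088 8080 18080; do
    echo "http://$host:$port/"
  done
}

check_uis() {
  for url in $(ui_urls "$1"); do
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url")"
    echo "$code $url"
  done
}
```

Usage: check_uis 10.0.0.170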
The cluster setup is complete.
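Learned a lot, thanks for sharing. Many thanks to the OP for sharing! There's something I don't quite understand; a question for the OP:
There are 4 DataNodes, but the config file only sets 2 replicas. Is that just for testing, or is there another reason? Shouldn't it be 4, to match the 4 DataNodes?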
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
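Since Spark 1.6, do we still need to install Scala?
Thanks. This document looks really familiar, haha.
szcountryboy posted on 2016-3-6 23:14
There's something I don't quite understand; a question for the OP:
There are 4 DataNodes, but the config file only sets 2 replicas. Is that just for testing, or is there another reason? Shouldn't it be 4 ...
Install Scala first. The setting specifies 2 replicas; the default is 3.
From Spark 1.6 on, you don't need to install Scala.
Also, dfs.replication should be an odd number.
Also, if you choose Spark on YARN, you don't need to install that many Spark workers: YARN manages the resources, and Spark only has to submit jobs and use them. Am I right?? Which deployment mode is used above? Standalone?? Learned a lot. Thanks.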