【June 19, 2015】
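I started working with Hadoop two months ago because my company needed it, so I learned it on the job. The articles I read were all over the place, with 1.x and 2.x material mixed together, and the more I read the more confused I got; only in the last few days have I started to find my way in.
This guide pulls together what others have written. If you spot gaps or mistakes, please let me know.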
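Because this is meant for production, everything is set up for HA:
HDFS NameNode HA
YARN ResourceManager HA
HBase HMaster HA
Hive HiveServer2 HA
PS: I've heard some people also make the JobHistory server HA. I haven't done that; if you have experience with it, please tell me.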
=================================================================
Role assignment
Deploying fully distributed Hadoop
I. Prepare the environment
Remove any existing JDK (a minimal install has no bundled JDK, so you can skip this step)
yum remove java-1.7.0-openjdk -y
Disable the firewall and SELinux
service iptables stop
chkconfig iptables off
setenforce 0
vi /etc/selinux/config
SELINUX=disabled
Synchronize the time on all machines
ntpdate 192.168.7.2
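Configure the hostname and hosts (on RHEL/CentOS 6 the hostname is set in /etc/sysconfig/network)
vi /etc/sysconfig/network
HOSTNAME=hadoop001.xiazy.net   (use hadoop002 ~ hadoop005 on the other machines)
Edit the hosts file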
vi /etc/hosts
192.168.5.2 hadoop001.xiazy.net hadoop001
192.168.5.3 hadoop002.xiazy.net hadoop002
192.168.5.4 hadoop003.xiazy.net hadoop003
192.168.5.5 hadoop004.xiazy.net hadoop004
192.168.5.6 hadoop005.xiazy.net hadoop005
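If you would rather not type the five entries by hand, they can be generated. The sketch below is hypothetical (gen_hosts is not part of any tool) and assumes the numbering used here, where hadoopNNN sits at 192.168.5.(NNN+1); check the output before appending it to /etc/hosts as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print /etc/hosts lines for hadoop001..hadoopNNN.
# Assumes consecutive IPs starting at <prefix>.<first_octet>, as in this guide.
gen_hosts() {
  local domain=$1 prefix=$2 first_octet=$3 count=$4 i
  for ((i = 1; i <= count; i++)); do
    printf '%s.%d hadoop%03d.%s hadoop%03d\n' \
      "$prefix" $((first_octet + i - 1)) "$i" "$domain" "$i"
  done
}

# Review first, then as root: gen_hosts xiazy.net 192.168.5 2 5 >> /etc/hosts
gen_hosts xiazy.net 192.168.5 2 5
```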
Configure the yum repository
vi /etc/yum.repos.d/rhel.repo
Create the hadoop user and group
groupadd hadoop
useradd -g hadoop hadoop
passwd hadoop
Grant ownership so later steps can install software as hadoop [all packages are in /usr/local/src]
chown hadoop.hadoop /usr/local/src -R
Switch to the hadoop user
su - hadoop
Set up key-based passwordless SSH [repeat on every NameNode]
ssh-keygen -t rsa -P ''
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
chmod 700 ~/.ssh/
chmod 600 ~/.ssh/authorized_keys
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@hadoop001.xiazy.net
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@hadoop002.xiazy.net
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@hadoop003.xiazy.net
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@hadoop004.xiazy.net
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@hadoop005.xiazy.net
Verify (each login should work without a password prompt)
for ip in `seq 2 5`;do ssh hadoop00$ip.xiazy.net hostname;done
Create the storage directories
mkdir -pv /home/hadoop/storage/zookeeper/{data,logs}
for ip in `seq 2 5`;do scp -r /home/hadoop/storage hadoop00$ip.xiazy.net:/home/hadoop/;done
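This guide copies files to hadoop002~005 with the same for/scp loop many times. A small function can wrap it; push_to_nodes and the DRY_RUN switch below are hypothetical names, and the host naming is the hadoopNNN.xiazy.net scheme from /etc/hosts. With DRY_RUN=1 it only prints the scp commands, which is handy for checking paths first.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the repeated "for ip in `seq 2 5`; do scp ...; done" loops.
# Hosts are hadoop<first>..hadoop<last>.xiazy.net, zero-padded to three digits.
push_to_nodes() {
  local src=$1 dest=$2 first=$3 last=$4 i host
  for ((i = first; i <= last; i++)); do
    host=$(printf 'hadoop%03d.xiazy.net' "$i")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      # Dry run: only show what would be copied
      echo "scp -r $src $host:$dest"
    else
      scp -r "$src" "$host:$dest" || return 1
    fi
  done
}

# Same effect as the loop above:
# push_to_nodes /home/hadoop/storage /home/hadoop/ 2 5
```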
Configure the Hadoop environment variables
vi ~/.bashrc
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib/rt.jar
export PATH=$PATH:$JAVA_HOME/bin
##############java############################
export HADOOP_HOME=/home/hadoop/hadoop
export HIVE_HOME=/home/hadoop/hive
export HBASE_HOME=/home/hadoop/hbase
##############hadoop-hbase-hive###############
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin
source ~/.bashrc
for ip in `seq 2 5`;do scp ~/.bashrc hadoop00$ip.xiazy.net:~/.bashrc;done
Install the JDK
rpm -ivh /usr/local/src/oracle-j2sdk1.7-1.7.0+update67-1.x86_64.rpm
Configure the environment variables [already done while preparing the environment; skip]
Verify the JDK installation
java -version
II. Deploy hadoop-2.6.0 with HDFS NameNode HA and YARN ResourceManager HA
Extract and rename
tar xf /usr/local/src/hadoop-2.6.0.tar.gz -C /home/hadoop
mv /home/hadoop/hadoop-2.6.0 /home/hadoop/hadoop
Configure the Hadoop environment variables [already done while preparing the environment; skip]
Verify the Hadoop installation
hadoop version
Edit the Hadoop configuration files
[1]
vi /home/hadoop/hadoop/etc/hadoop/core-site.xml
###############################################
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Enable the trash: deleted files are kept for 1440 minutes (1 day) -->
<property>
<name>fs.trash.interval</name>
<value>1440</value>
</property>
<!-- Default filesystem: the HA nameservice gagcluster, i.e. the logical NameNode URI (hdfs://nameservice:port/) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://gagcluster:8020</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/storage/hadoop/tmp</value>
</property>
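<!-- Proxy user: the name in the key must be the user that runs the services (hadoop in this guide); allow it to impersonate from any host -->
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<!-- ...and on behalf of users in any group -->
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>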
<!-- ZooKeeper quorum -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop001.xiazy.net:2181,hadoop002.xiazy.net:2181,hadoop003.xiazy.net:2181</value>
</property>
</configuration>
#################################################
[2]
vi /home/hadoop/hadoop/etc/hadoop/hdfs-site.xml
################################################
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Exclude (blacklist) file listing nodes to decommission -->
<property>
<name>dfs.hosts.exclude</name>
<value>/home/hadoop/hadoop/etc/hadoop/exclude</value>
</property>
<!-- HDFS block size 64 MB (dfs.blocksize is the Hadoop 2.x name of the deprecated dfs.block.size) -->
<property>
<name>dfs.blocksize</name>
<value>67108864</value>
</property>
<!-- Nameservice ID gagcluster; must match the host in fs.defaultFS in core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>gagcluster</value>
</property>
<!-- gagcluster has two NameNodes, nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.gagcluster</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.gagcluster.nn1</name>
<value>hadoop001.xiazy.net:8020</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.gagcluster.nn1</name>
<value>hadoop001.xiazy.net:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.gagcluster.nn2</name>
<value>hadoop002.xiazy.net:8020</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.gagcluster.nn2</name>
<value>hadoop002.xiazy.net:50070</value>
</property>
<!-- Where the NameNodes write their shared edits (metadata) on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop001.xiazy.net:8485;hadoop002.xiazy.net:8485;hadoop003.xiazy.net:8485/gagcluster</value>
</property>
<!-- Class clients use to locate the active NameNode and fail over -->
<property>
<name>dfs.client.failover.proxy.provider.gagcluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing method -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!-- sshfence needs passwordless SSH; private key it uses -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<!-- Local directory where each JournalNode stores the edits -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/storage/hadoop/journal</value>
</property>
<!-- Enable automatic HA failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- NameNode namespace (metadata) directory -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/storage/hadoop/name</value>
</property>
<!-- DataNode data directory -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/storage/hadoop/data</value>
</property>
<!-- Number of replicas -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Enable WebHDFS (access HDFS over HTTP) -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<!-- JournalNode HTTP and RPC listen addresses -->
<property>
<name>dfs.journalnode.http-address</name>
<value>0.0.0.0:8480</value>
</property>
<property>
<name>dfs.journalnode.rpc-address</name>
<value>0.0.0.0:8485</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop001.xiazy.net:2181,hadoop002.xiazy.net:2181,hadoop003.xiazy.net:2181</value>
</property>
</configuration>
#################################################
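fs.defaultFS in core-site.xml and dfs.nameservices in hdfs-site.xml have to name the same nameservice (gagcluster here), and a mismatch is an easy copy-paste mistake. The sketch below is a rough check, assuming the layout used above with `<name>` and `<value>` on consecutive lines; get_prop and check_nameservice are hypothetical helpers. Once Hadoop is on the PATH, `hdfs getconf -confKey fs.defaultFS` reads the effective value properly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> that follows <name>KEY</name>.
# Only handles the one-line <name>/<value> layout used in this guide, not arbitrary XML.
get_prop() {
  local file=$1 key=$2
  awk -v key="$key" '
    index($0, "<name>" key "</name>") { want = 1; next }
    want && /<value>/ {
      sub(/.*<value>/, ""); sub(/<\/value>.*/, ""); print; exit
    }' "$file"
}

# Hypothetical check: host part of fs.defaultFS must equal dfs.nameservices
check_nameservice() {
  local conf=$1 fs ns
  fs=$(get_prop "$conf/core-site.xml" fs.defaultFS)      # e.g. hdfs://gagcluster:8020
  ns=$(get_prop "$conf/hdfs-site.xml" dfs.nameservices)  # e.g. gagcluster
  fs=${fs#hdfs://}; fs=${fs%%[/:]*}                      # keep only the host part
  if [ "$fs" = "$ns" ]; then
    echo "OK: $ns"
  else
    echo "MISMATCH: $fs vs $ns"; return 1
  fi
}

# check_nameservice /home/hadoop/hadoop/etc/hadoop
```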
[3]
vi /home/hadoop/hadoop/etc/hadoop/mapred-site.xml
#################################################
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- MapReduce JobHistory Server address, default port 10020 -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>0.0.0.0:10020</value>
</property>
<!-- MapReduce JobHistory Server web UI address, default port 19888 -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19888</value>
</property>
</configuration>
###################################################
[4]
vi /home/hadoop/hadoop/etc/hadoop/yarn-site.xml
###################################################
<?xml version="1.0"?>
<configuration>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- How long aggregated logs are kept on HDFS, in seconds (3 days) -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>259200</value>
</property>
<!-- Interval between attempts to reconnect to the RM, in ms -->
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<!-- Enable ResourceManager HA (default false) -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- ResourceManager IDs -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop001.xiazy.net:2181,hadoop002.xiazy.net:2181,hadoop003.xiazy.net:2181</value>
</property>
<!-- Enable automatic failover -->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop001.xiazy.net</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop002.xiazy.net</value>
</property>
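<!-- Set rm1 on the first NameNode host (hadoop001) and rm2 on the second (hadoop002). Copying the finished file to the other machines is the usual practice, but this value must be changed on the other ResourceManager host -->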
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm1</value>
</property>
<!-- Enable ResourceManager state recovery -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- ZooKeeper connection address -->
<property>
<name>yarn.resourcemanager.zk-state-store.address</name>
<value>hadoop001.xiazy.net:2181,hadoop002.xiazy.net:2181,hadoop003.xiazy.net:2181</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop001.xiazy.net:2181,hadoop002.xiazy.net:2181,hadoop003.xiazy.net:2181</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>gagcluster-yarn</value>
</property>
<!-- How long the MapReduce AM waits between attempts to reconnect to the scheduler, in ms -->
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>
<!-- rm1 addresses -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>hadoop001.xiazy.net:8132</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>hadoop001.xiazy.net:8130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>hadoop001.xiazy.net:8188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>hadoop001.xiazy.net:8131</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>hadoop001.xiazy.net:8033</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm1</name>
<value>hadoop001.xiazy.net:23142</value>
</property>
<!-- rm2 addresses -->
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>hadoop002.xiazy.net:8132</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>hadoop002.xiazy.net:8130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>hadoop002.xiazy.net:8188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>hadoop002.xiazy.net:8131</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>hadoop002.xiazy.net:8033</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm2</name>
<value>hadoop002.xiazy.net:23142</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/hadoop/storage/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/home/hadoop/storage/yarn/logs</value>
</property>
<property>
<name>mapreduce.shuffle.port</name>
<value>23080</value>
</property>
<!-- Client-side RM failover proxy provider -->
<property>
<name>yarn.client.failover-proxy-provider</name>
<value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
<value>/yarn-leader-election</value>
</property>
</configuration>
#######################################################
Configure the DataNode hosts
vi /home/hadoop/hadoop/etc/hadoop/slaves
#######################################################
hadoop001.xiazy.net
hadoop002.xiazy.net
hadoop003.xiazy.net
hadoop004.xiazy.net
hadoop005.xiazy.net
Create the exclude file, used later to decommission Hadoop nodes
touch /home/hadoop/hadoop/etc/hadoop/exclude
Sync the Hadoop directory to hadoop002~005
for ip in `seq 2 5`;do scp -r /home/hadoop/hadoop hadoop00$ip.xiazy.net:/home/hadoop/;done
On hadoop002 (the second NameNode, which runs rm2), change this property in yarn-site.xml
#####################################################
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm2</value>
</property>
#####################################################
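Instead of opening vi on hadoop002, the ha.id value can be switched with sed. This is a sketch assuming GNU sed (for `-i`) and the layout above, where the `<value>` line directly follows the `<name>` line; set_rm_id is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set yarn.resourcemanager.ha.id to the given ID in place.
# Assumes GNU sed and <value> on the line right after <name>.
set_rm_id() {
  local file=$1 id=$2
  # Match the ha.id <name> line, move to the next line (n), rewrite its <value>
  sed -i '/<name>yarn\.resourcemanager\.ha\.id<\/name>/{n;s#<value>[^<]*</value>#<value>'"$id"'</value>#;}' "$file"
}

# On hadoop002:
# set_rm_id /home/hadoop/hadoop/etc/hadoop/yarn-site.xml rm2
```

Only the ha.id property is touched; yarn.resourcemanager.ha.rm-ids keeps its rm1,rm2 value.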
III. Deploy a three-node fully distributed ZooKeeper 3.4.6 cluster
ZooKeeper is installed on three servers, under the hadoop user
hadoop001.xiazy.net 192.168.5.2
hadoop002.xiazy.net 192.168.5.3
hadoop003.xiazy.net 192.168.5.4
Extract and rename
tar xf /usr/local/src/zookeeper-3.4.6.tar.gz -C /home/hadoop/
mv /home/hadoop/zookeeper-3.4.6/ /home/hadoop/zookeeper
Create the configuration file
vi /home/hadoop/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/home/hadoop/storage/zookeeper/data
dataLogDir=/home/hadoop/storage/zookeeper/logs
clientPort=2181
server.1=hadoop001.xiazy.net:2888:3888
server.2=hadoop002.xiazy.net:2888:3888
server.3=hadoop003.xiazy.net:2888:3888
Sync to hadoop002 and hadoop003
for ip in `seq 2 3`;do scp -r /home/hadoop/zookeeper hadoop00$ip.xiazy.net:/home/hadoop;done
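Create ZooKeeper's data and log directories [already done while preparing the environment; skip]
Set myid on each node: run the first line on hadoop001, the second on hadoop002 and the third on hadoop003 (the value must match server.N in zoo.cfg)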
echo 1 > /home/hadoop/storage/zookeeper/data/myid
echo 2 > /home/hadoop/storage/zookeeper/data/myid
echo 3 > /home/hadoop/storage/zookeeper/data/myid
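Typing a different echo on each node is easy to get wrong. Alternatively the myid can be derived from the host name, as sketched below. It assumes the hadoopNNN naming used here and that hadoopNNN is server.NNN in zoo.cfg (true for hadoop001~003); myid_for_host is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: hadoop002.xiazy.net -> 2
myid_for_host() {
  local short=${1%%.*}        # hadoop002.xiazy.net -> hadoop002
  local num=${short#hadoop}   # hadoop002 -> 002
  echo $((10#$num))           # force base 10 so 008/009 are not read as octal
}

# On each ZooKeeper node:
# myid_for_host "$(hostname -f)" > /home/hadoop/storage/zookeeper/data/myid
```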
IV. Deploy hbase-1.0.0 with HMaster HA
Extract and deploy
tar xf /usr/local/src/hbase-1.0.0-bin.tar.gz -C /home/hadoop
cd /home/hadoop
mv hbase-1.0.0 hbase
Edit the configuration files
Configure the RegionServer hosts
vi /home/hadoop/hbase/conf/regionservers
hadoop001.xiazy.net
hadoop002.xiazy.net
hadoop003.xiazy.net
hadoop004.xiazy.net
hadoop005.xiazy.net
vi /home/hadoop/hbase/conf/hbase-site.xml
####################################################
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Because HDFS runs multiple NameNodes (HA), the host in hbase.rootdir is the nameservice, the same value as dfs.nameservices in Hadoop's hdfs-site.xml -->
<property>
<name>hbase.rootdir</name>
<value>hdfs://gagcluster:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/home/hadoop/storage/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop001.xiazy.net,hadoop002.xiazy.net,hadoop003.xiazy.net</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<!-- Same as dataDir in zoo.cfg -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/storage/zookeeper/data</value>
</property>
</configuration>
#########################################################
Disable HBase's bundled ZooKeeper
vi /home/hadoop/hbase/conf/hbase-env.sh
export HBASE_MANAGES_ZK=false
Sync the HBase directory to hadoop002~005
for ip in `seq 2 5`;do scp -r /home/hadoop/hbase hadoop00$ip.xiazy.net:/home/hadoop;done
V. Deploy hive-1.1.0 with HiveServer2 HA
Extract and deploy
tar xf /usr/local/src/apache-hive-1.1.0-bin.tar.gz -C /home/hadoop/
mv /home/hadoop/apache-hive-1.1.0-bin /home/hadoop/hive
Edit the configuration files
cd /home/hadoop/hive/conf/
cp hive-default.xml.template hive-default.xml
vi hive-site.xml
#######################################################
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Hive warehouse directory on HDFS; create it manually after Hadoop is running -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<!-- JDBC URL of the hive database in MySQL -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.5.2:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<!-- MySQL JDBC driver -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<!-- MySQL user name -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<!-- MySQL password -->
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
<!-- Hive web interface (HWI) war file -->
<property>
<name>hive.hwi.war.file</name>
<value>lib/hive-hwi-1.1.0.war</value>
</property>
<!-- Metastore URI(s); several can be listed, comma-separated -->
<property>
<name>hive.metastore.uris</name>
<value>thrift://192.168.5.2:9083</value>
</property>
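<!-- HiveServer2 HA: each HiveServer2 registers itself in ZooKeeper and clients discover it there -->
<property>
<name>hive.server2.support.dynamic.service.discovery</name>
<value>true</value>
</property>
<property>
<name>hive.server2.zookeeper.namespace</name>
<value>hiveserver2</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<value>hadoop001.xiazy.net,hadoop002.xiazy.net,hadoop003.xiazy.net</value>
</property>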
</configuration>
###############################################################
cp hive-log4j.properties.template hive-log4j.properties
vi hive-log4j.properties
###############################################################
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
###############################################################
Add the MySQL JDBC driver
cp /usr/local/src/mysql-connector-java-5.1.35.jar /home/hadoop/hive/lib/
Add the war file for the Hive web interface (HWI)
Download the Hive source package, go into hwi/web and build the war
jar cvf hive-hwi-1.1.0.war ./*
cp hive-hwi-1.1.0.war /home/hadoop/hive/lib/
Copy the required jars from hbase/lib to hive/lib
cp /home/hadoop/hbase/lib/hbase-client-1.0.0.jar /home/hadoop/hbase/lib/hbase-common-1.0.0.jar /home/hadoop/hive/lib
Align the jline versions of Hive and Hadoop
cp /home/hadoop/hive/lib/jline-2.12.jar /home/hadoop/hadoop/share/hadoop/yarn/lib
Check which versions are present
cd /home/hadoop/hadoop/share/hadoop/yarn/lib
find ./ -name "*jline*jar"
Delete the old jline 0.9
rm jline-0.9.94.jar
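The check-and-delete above can be wrapped in a function that only removes jline 0.x jars and leaves Hive's jline 2.x in place. remove_old_jline is a hypothetical name; the sketch assumes the old jar follows the jline-0.*.jar naming seen above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete jline 0.x jars from a directory, keep newer ones.
remove_old_jline() {
  local dir=$1 jar
  for jar in "$dir"/jline-0.*.jar; do
    [ -e "$jar" ] || continue   # no match: the glob stays literal, skip it
    echo "removing $jar"
    rm -f "$jar"
  done
}

# remove_old_jline /home/hadoop/hadoop/share/hadoop/yarn/lib
```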
Copy the JDK's tools.jar to hive/lib
cp $JAVA_HOME/lib/tools.jar /home/hadoop/hive/lib
Copy the Hive directory to the other nodes
for ip in `seq 2 5`;do scp -r /home/hadoop/hive hadoop00$ip.xiazy.net:/home/hadoop/;done
Create the hive database and the hive user (password hive) in MySQL
Deploy MySQL 5.6 (64-bit)
Download and extract
cd /usr/local/src/
wget http://dev.mysql.com/get/Downloa ... 6_64.rpm-bundle.tar
tar xf MySQL-5.6.23-1.el6.x86_64.rpm-bundle.tar
Install
yum install MySQL-shared-compat-5.6.23-1.el6.x86_64.rpm -y
#RHEL compatibility libraries
yum install MySQL-server-5.6.23-1.el6.x86_64.rpm -y
#MySQL server
yum install MySQL-client-5.6.23-1.el6.x86_64.rpm -y
#MySQL client
yum install MySQL-devel-5.6.23-1.el6.x86_64.rpm -y
#MySQL libraries and header files
yum install MySQL-shared-5.6.23-1.el6.x86_64.rpm -y
#MySQL shared libraries
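Set the MySQL root password
cat /root/.mysql_secret
#Get the random password generated during installation (aHoUaEJFav0X7hlG here; yours will differ)
service mysql start
#Start the MySQL service
mysql -uroot -paHoUaEJFav0X7hlG
#Log in to MySQL with the random password obtained above
SET PASSWORD FOR 'root'@'localhost' = PASSWORD('xiazy.net');
#At the MySQL prompt, set the root password to xiazy.net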
Create the hive user with password hive
CREATE USER hive IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
flush privileges;
Log in to MySQL as the new hive user and create the hive database
mysql -uhive -phive
create database hive;
#####################################################
First startup of the Hadoop cluster
#####################################################
1. If the ZooKeeper cluster is not running yet, start every ZooKeeper server first.
/home/hadoop/zookeeper/bin/zkServer.sh start  (run on every ZooKeeper machine)
/home/hadoop/zookeeper/bin/zkServer.sh status  (expect 1 leader and n-1 followers)
Running jps should show the QuorumPeerMain process.
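2. On the primary NameNode, run the following to initialize the HA state in ZooKeeper (this creates the /hadoop-ha znode)
/home/hadoop/hadoop/bin/hdfs zkfc -formatZK  (type this by hand: when copied, the '-' may turn into a long dash)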
Verify
On a ZooKeeper node, run
/home/hadoop/zookeeper/bin/zkCli.sh
ls /
ls /hadoop-ha
quit
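3. On each JournalNode host (hadoop001~003, as listed in dfs.namenode.shared.edits.dir), start the JournalNode
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode
(every JournalNode must be running before the next step)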
4. On the primary NameNode, format the NameNode; with the JournalNodes running, this also formats the shared edits directory on them
/home/hadoop/hadoop/bin/hdfs namenode -format
5. Start the NameNode process on the primary NameNode
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
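6. On the standby NameNode, the first command formats its name directory and copies the metadata from the primary NameNode; it does not format the JournalNode directories again. The second command then starts the standby NameNode.
Method 1 [hadoop002]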
/home/hadoop/hadoop/bin/hdfs namenode -bootstrapStandby
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
Method 2 [hadoop002]: copy the name directory from hadoop001 by hand, then start the NameNode
scp -r hadoop001.xiazy.net:/home/hadoop/storage/hadoop/name /home/hadoop/storage/hadoop
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
7. Run the following on both NameNodes
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start zkfc
8. Start the DataNodes
Method 1
Start each DataNode separately on its own node
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start datanode
Method 2
With many DataNodes, run the following on the primary NameNode (nn1) to start all DataNodes listed in slaves at once
/home/hadoop/hadoop/sbin/hadoop-daemons.sh start datanode
9. Start YARN and the standby ResourceManager
On the primary NameNode [hadoop001]
/home/hadoop/hadoop/sbin/start-yarn.sh
Start the standby ResourceManager [hadoop002]
/home/hadoop/hadoop/sbin/yarn-daemon.sh start resourcemanager
10. Start HBase
Method 1
On the primary HMaster node
/home/hadoop/hbase/bin/start-hbase.sh
On the standby HMaster node
/home/hadoop/hbase/bin/hbase-daemon.sh start master
Verify the installation
hbase shell
list
Method 2
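Add the backup-masters file
vi /home/hadoop/hbase/conf/backup-masters
hadoop002.xiazy.net
Start from the primary master node (start-hbase.sh then also starts the backup master listed in backup-masters)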
/home/hadoop/hbase/bin/start-hbase.sh
11. Start Hive
First create the directories Hive stores its data in on HDFS
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
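Start the Hive remote services (HiveServer2 listens on port 10000) [metastore and hiveserver2 can run on the same host; for HiveServer2 HA, start hiveserver2 on at least two nodes]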
nohup /home/hadoop/hive/bin/hive --service metastore &> metastore.log &
nohup /home/hadoop/hive/bin/hive --service hiveserver2 &> hiveserver2.log &
【Or: nohup /home/hadoop/hive/bin/hiveserver2 &> hiveserver2.log &】
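Connect from a client (in beeline; replace xxxxx/xxxx with the ZooKeeper hosts and user/pass with your credentials)
!connect jdbc:hive2://xxxxx:2181,xxxx:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 user pass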
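The JDBC URL for ZooKeeper discovery lists the ZooKeeper hosts, not the HiveServer2 hosts. The sketch below builds it from the hive.zookeeper.quorum value; hs2_zk_url is a hypothetical helper, and it assumes ZooKeeper port 2181 and the default namespace hiveserver2 (hive.server2.zookeeper.namespace).

```shell
#!/usr/bin/env bash
# Hypothetical helper: quorum "h1,h2,h3" -> HiveServer2 ZooKeeper-discovery JDBC URL
hs2_zk_url() {
  local quorum=$1 port=${2:-2181} ns=${3:-hiveserver2} hosts="" h
  local -a list
  IFS=, read -r -a list <<< "$quorum"
  for h in "${list[@]}"; do
    hosts+="${hosts:+,}$h:$port"   # add a comma before every host except the first
  done
  echo "jdbc:hive2://$hosts/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=$ns"
}

# beeline -u "$(hs2_zk_url hadoop001.xiazy.net,hadoop002.xiazy.net,hadoop003.xiazy.net)" -n hive -p hive
```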
Hive CLI
/home/hadoop/hive/bin/hive
or run
hive --service cli
Start the Hive web interface (port 9999)
/home/hadoop/hive/bin/hive --service hwi&
Then open Hive in a browser at
http://hadoop001.xiazy.net:9999/hwi/
【Software packages】
http://pan.baidu.com/s/1hqF4utI
##########################################################
==========================================================
【Notes】
Start the Hadoop JobHistory server
/home/hadoop/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
Check a NameNode's HA state (serviceId is nn1 or nn2)
hdfs haadmin -getServiceState <serviceId>
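To see at a glance which NameNode or ResourceManager is active, the real commands `hdfs haadmin -getServiceState` and `yarn rmadmin -getServiceState` can be looped over the IDs configured above (nn1/nn2, rm1/rm2). ha_states is a hypothetical wrapper; an ID that does not answer is reported as unreachable.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print "<id>: <state>" for each HA service ID.
# First argument is the admin command, e.g. "hdfs haadmin" or "yarn rmadmin".
ha_states() {
  local admin=$1 id state; shift
  for id in "$@"; do
    # $admin is left unquoted on purpose so "hdfs haadmin" splits into command + subcommand
    state=$($admin -getServiceState "$id" 2>/dev/null) || state=unreachable
    echo "$id: $state"
  done
}

# ha_states "hdfs haadmin" nn1 nn2     # expect one active and one standby
# ha_states "yarn rmadmin" rm1 rm2
```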
With the trash enabled, delete a file immediately, bypassing the trash, with:
hadoop fs -rm -skipTrash /xxx
hdfs dfsadmin -safemode get    (show the safe mode status)
hdfs dfsadmin -safemode enter  (enter safe mode)
hdfs dfsadmin -safemode leave  (leave safe mode)
Start all HRegionServers
hbase-daemons.sh start regionserver
Start a single HRegionServer
hbase-daemon.sh start regionserver
############################################################
============================================================
References:
【HDFS NameNode HA, YARN ResourceManager HA】
HBase+ZooKeeper+Hadoop2.6.0 ResourceManager HA cluster configuration
http://www.aboutyun.com/thread-11909-1-1.html
【HBase HA】
http://www.cnblogs.com/junrong624/p/3580477.html
http://www.07net01.com/linux/Hba ... 269_1377277861.html
【Hive HA】
Hive HA usage notes, and configuring Hive HA with HAProxy
http://www.aboutyun.com/thread-10938-1-1.html
Hive installation and usage guide
http://blog.fens.me/hadoop-hive-intro/
Three ways to configure the Hive metastore
http://blog.csdn.net/reesun/article/details/8556078
Learning Hive: HiveServer2 server configuration and startup
http://www.aboutyun.com/thread-12278-1-1.html
Learning Hive: the HiveServer2 JDBC client
http://blog.csdn.net/skywalker_only/article/details/38366347
Introduction to Hive's built-in services
http://www.aboutyun.com/thread-7438-1-1.html
Using the Hive command line and built-in services
http://www.aboutyun.com/thread-12280-1-1.html
Hive configuration properties explained (bookmark edition)
http://www.aboutyun.com/thread-7548-1-1.html
HBase 0.96 and Hive 0.12 integration: high-reliability guide and troubleshooting
http://www.aboutyun.com/thread-7881-1-1.html
【Hadoop series】
http://www.cnblogs.com/junrong624/category/537234.html
【Other】
HBase default configuration explained
http://www.aboutyun.com/thread-7914-1-1.html
HBase startup scripts/shell explained
http://zjushch.iteye.com/blog/1736065
Understanding Hive in depth
http://www.aboutyun.com/thread-7478-1-1.html
Adding and removing Hadoop nodes
http://my.oschina.net/MrMichael/blog/291802
Fens.me blog (粉丝日志)
http://blog.fens.me/