Posted by 不可替代 on 2018-4-3 08:18:08

1. Cloudera Manager and CDH Setup (Work Notes)

      
      

      
CM + CDH Setup
Author: (鸟叔), see my blog
Environment

Linux CentOS 7
JDK 1.8
Python 2.7

Components to install

CDH-5.7.2-1.cdh5.7.2.p0.18-el7.parcel
CDH-5.7.2-1.cdh5.7.2.p0.18-el7.parcel.sha1
cloudera-manager-centos7-cm5.7.2_x86_64.tar.gz
manifest.json
MySQL-5.6.26-1.linux_glibc2.5.x86_64.rpm-bundle.tar
mysql-connector-java-5.1.45.tar.gz

Recovering the root password
If you forget the root password, reset it as follows:
Step 1: When the GRUB boot menu appears, press "e".

Step 2: After pressing "e", use the down-arrow key to find the line containing "LANG=zh_CN.UTF-8", append "init=/bin/sh" to the end of that line, then press Ctrl+x.

Step 3: Remount the root filesystem read-write: mount -o remount,rw /

Step 4: Run the passwd command to change the root password; you must enter the new password twice and both entries must match.

Step 5: If SELinux was enabled, you must run touch /.autorelabel first, otherwise the system will not boot normally. Then run exec /sbin/init to continue booting, or exec /sbin/reboot to restart.

Setting the default boot target
systemctl set-default multi-user.target   # boot to text/console mode
systemctl set-default graphical.target    # boot to graphical mode


=================================================================
Deploying Cloudera Manager
0. Environment preparation
1) Clone the VM (CentOS 7)
1. Change the IP address
vim /etc/sysconfig/network-scripts/ifcfg-ens33
2. Change the hostname
vim /etc/sysconfig/network
vim /etc/hostname
3. Hostname resolution
vim /etc/hosts
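As a concrete sketch, the three-node mapping used throughout this guide (hadoop102-104) can be appended to /etc/hosts with a loop. The 192.168.1.x addresses are placeholders, not from the original post; the script writes to a temp file so it is safe to try:

```shell
# Hypothetical IPs -- replace with your cluster's real addresses.
HOSTS_FILE="$(mktemp)"   # point this at /etc/hosts on a real node

for entry in \
    "192.168.1.102 hadoop102" \
    "192.168.1.103 hadoop103" \
    "192.168.1.104 hadoop104"
do
    echo "$entry" >> "$HOSTS_FILE"
done

cat "$HOSTS_FILE"
```

The same file must be identical on every node, so scp it out after editing.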
4. Reboot
reboot

5. Disable IPv6
Add ipv6.disable=1 to the kernel command line in /etc/default/grub, like so:
GRUB_CMDLINE_LINUX="ipv6.disable=1 crashkernel=auto rd.lvm.lv=cl/root rd.lvm.lv=cl/swap rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
Then regenerate the GRUB config (grub2-mkconfig -o /boot/grub2/grub.cfg) and reboot for the change to take effect.

6. Firewall
# systemctl stop firewalld

# systemctl disable firewalld

# systemctl status firewalld


SELINUX

# sed -i "s/SELINUX=enforcing/SELINUX=disabled/" /etc/selinux/config

# reboot
2. After rebooting, verify with:
# getenforce

# sestatus -v


3. JDK
Step 1: Install Java with yum
# yum -y install java
Step 2: Remove OpenJDK
# rpm -qa | grep 'java'
# rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.131-3.b12.el7_3.x86_64
# rpm -e --nodeps java-1.8.0-openjdk-1.8.0.131-3.b12.el7_3.x86_64
Then reboot the machine and verify with java -version
Step 3: Install the downloaded JDK RPM
# rpm -ivh /opt/softwares/jdk-8u131-linux-x64.rpm
4. Clone two more machines, update the per-host settings (IP, hostname), then set up passwordless SSH. On the Server (master) node:
$ ssh-keygen -t rsa
$ ssh-copy-id hadoop102
$ ssh-copy-id hadoop103
$ ssh-copy-id hadoop104
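The key distribution above can be wrapped in a loop. This is a sketch with a DRY_RUN switch so it only prints what it would do; drop the switch on a real cluster, after running ssh-keygen -t rsa once:

```shell
DRY_RUN=1
run() {
    # Print instead of executing while DRY_RUN=1.
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "would run: $*"; else "$@"; fi
}

for host in hadoop102 hadoop103 hadoop104; do
    run ssh-copy-id "$host"
done
```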
5. NTP time synchronization
Step 1: First sync the time on the master node
# ntpdate cn.pool.ntp.org
Step 2: Edit the configuration file
# vi /etc/ntp.conf
Point the server entries at the master (NameNode) node.
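A minimal sketch of what the two ntp.conf variants might look like, assuming hadoop102 is the master and the cluster sits on 192.168.1.0/24 (the subnet and the upstream pool are assumptions, not from the original post):

```
# /etc/ntp.conf on the master (hadoop102): sync from an upstream pool
server cn.pool.ntp.org
# Allow the cluster subnet to query this node for time
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# Fall back to the local clock if the upstream is unreachable
server 127.127.1.0
fudge  127.127.1.0 stratum 10

# /etc/ntp.conf on the other nodes: sync only from the master
server hadoop102
```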
Then, on the master node:
# systemctl start ntpd.service
# systemctl enable ntpd.service
On the other nodes:
# systemctl stop ntpd.service
# systemctl disable ntpd.service
On the other nodes, add a cron job to sync the time periodically:
# crontab -e
*/10 * * * * /usr/sbin/ntpdate hadoop102
Restart the cron service:
# systemctl restart crond.service
Syncing the hardware (BIOS) clock: at shutdown the system writes the current system time to the motherboard clock, so on the next boot the system time matches the hardware time and stays consistent.
Step 1: Edit the ntpd file
# vi /etc/sysconfig/ntpd
Below the commented line, add SYNC_HWCLOCK=yes
Step 2: Edit the ntpdate file
# vi /etc/sysconfig/ntpdate
Change the last line to yes
Note: perform these steps on all 3 machines.
6. Raising the limits on open files, processes, and locked memory
Run ulimit -a to see the current limits (unlimited means no cap). Target values:
*               soft    nofile         32728
*               hard    nofile         1024999
*               soft    nproc            65535
*               hard    nproc            unlimited
*               soft    memlock          unlimited
*               hard    memlock          unlimited

Edit the limits:
# vi /etc/security/limits.conf
Reboot after configuring. In this file, * applies to all users, @group applies to a user group, and a bare username applies to that single user.
Note: configure and reboot all 3 machines.
Warning: if this file is misconfigured, the node may fail to boot after restart; you would then need to repair it from single-user mode (search online for the procedure).
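The six limits above can be appended in one shot. A sketch that writes to a temp file; point LIMITS_FILE at /etc/security/limits.conf on a real node (and reboot afterwards):

```shell
LIMITS_FILE="$(mktemp)"   # stand-in for /etc/security/limits.conf

# Append the six limits from the table above.
cat >> "$LIMITS_FILE" <<'EOF'
*    soft    nofile     32728
*    hard    nofile     1024999
*    soft    nproc      65535
*    hard    nproc      unlimited
*    soft    memlock    unlimited
*    hard    memlock    unlimited
EOF

cat "$LIMITS_FILE"
```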
7. Installing MySQL
CentOS 7 ships with MariaDB, which must be removed first:
# rpm -qa | grep mariadb
mariadb-libs-5.5.41-2.el7_0.x86_64
# rpm -e --nodeps mariadb-libs-5.5.41-2.el7_0.x86_64
Copy the downloaded MySQL RPM bundle to the server and unpack it:
# tar -xvf MySQL-5.6.24-1.linux_glibc2.5.x86_64.rpm-bundle.tar

Then install all the extracted RPMs: rpm -ivh MySQL-*.rpm
Copy the default config into place: cp /usr/share/mysql/my-default.cnf /etc/my.cnf
Add the following settings to the config file and save:
# vim /etc/my.cnf

default-storage-engine = innodb
innodb_file_per_table
collation-server = utf8_general_ci
init-connect = 'SET NAMES utf8'
character-set-server = utf8
That completes the installation. Next, initialize the database:
# /usr/bin/mysql_install_db
At this point I hit the following error:
"FATAL ERROR: please install the following Perl modules before executing /usr/local/mysql/scripts/mysql_install_db:
Data::Dumper"
The fix is to install the missing Perl module:
# yum install -y perl-Module-Install.noarch
Once that finishes, re-run the initialization command above.

- Start MySQL
# service mysql restart
ERROR! MySQL server PID file could not be found!
Starting MySQL... SUCCESS!
- Look up the generated root password
# cat /root/.mysql_secret
# The random password set for the root user at Fri Sep 16 11:13:25 2016 (local time): 9mp7uYFmgt6drdq3

- Log in and change the password
# mysql -u root -p
mysql> SET PASSWORD=PASSWORD('123456');

- Allow remote access to MySQL
mysql> use mysql;
mysql> update user set host='%' where user='root' and host='localhost';
Query OK, 1 row affected (0.05 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

- Enable start on boot
# chkconfig mysql on

Copy mysql-connector-java to the expected path on every node (all nodes):
# cp mysql-connector-java-5.1.36-bin.jar /usr/share/java/mysql-connector-java.jar

Create the databases
create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)

create database amon DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)

create database hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)

create database monitor DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)

create database oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)

grant all on *.* to root@"%" identified by "123456";
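The five CREATE DATABASE statements and the grant above can also be generated into a single SQL file and fed to mysql in one pass (mysql -uroot -p < the file). A sketch; the file name is arbitrary:

```shell
SQL_FILE="$(mktemp)"   # e.g. cdh_databases.sql

# One database per CDH service, all utf8.
for db in hive amon hue monitor oozie; do
    echo "create database $db DEFAULT CHARSET utf8 COLLATE utf8_general_ci;" >> "$SQL_FILE"
done
echo 'grant all on *.* to root@"%" identified by "123456";' >> "$SQL_FILE"

cat "$SQL_FILE"
```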

8. Install the CDH dependencies
# yum -y install chkconfig python bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse portmap fuse-libs redhat-lsb
2. Installing Cloudera Manager
- Unpack the CM tarball to the target directory on every server (or unpack on the master and scp it to the same directory on each node):
# mkdir /opt/cloudera-manager
# tar -axvf cloudera-manager-centos7-cm5.7.2_x86_64.tar.gz -C /opt/cloudera-manager


- Create the cloudera-scm user (all nodes):
# useradd --system --home=/opt/cloudera-manager/cm-5.7.2/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm


- On the master node, create the local metadata directory for cloudera-manager-server:
# mkdir /var/cloudera-scm-server
# chown cloudera-scm:cloudera-scm /var/cloudera-scm-server
# chown cloudera-scm:cloudera-scm /opt/cloudera-manager


- Point each node's cloudera-scm-agent at the master:
# vim /opt/cloudera-manager/cm-5.7.2/etc/cloudera-scm-agent/config.ini
Set server_host to the hostname running CMS, i.e. hadoop1.


- On the master node, create the parcel-repo directory:
# mkdir -p /opt/cloudera/parcel-repo
# chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo
# cp CDH-5.7.2-1.cdh5.7.2.p0.18-el7.parcel CDH-5.7.2-1.cdh5.7.2.p0.18-el7.parcel.sha manifest.json /opt/cloudera/parcel-repo
Note: rename CDH-5.7.2-1.cdh5.7.2.p0.18-el7.parcel.sha1 by dropping the trailing 1, so it ends in .sha.


- Create the parcels directory on all nodes:
# mkdir -p /opt/cloudera/parcels
# chown cloudera-scm:cloudera-scm /opt/cloudera/parcels
Explanation: Cloudera Manager takes the CDH parcel from /opt/cloudera/parcel-repo on the master and distributes, unpacks, and activates it into /opt/cloudera/parcels on each node.


- Initialize the CMS database with scm_prepare_database.sh (on the master node):
# /opt/cloudera-manager/cm-5.7.2/share/cmf/schema/scm_prepare_database.sh mysql -hhadoop1 -uroot -p123456 --scm-host hadoop1 scmdbn scmdbu scmdbp
This script creates and configures the database that CMS needs. The arguments are:
mysql: the database type (use oracle instead if you installed Oracle).
-hhadoop1: the database lives on host hadoop1, i.e. the master node.
-uroot: connect to MySQL as root.
-p123456: the MySQL root password.
--scm-host hadoop1: the CMS host, normally the same host where MySQL is installed.
The last three arguments are the database name, database user, and database password.
Here I hit this error:
ERROR com.cloudera.enterprise.dbutil.DbProvisioner - Exception when creating/dropping database with user 'root' and jdbc url 'jdbc:mysql://localhost/?useUnicode=true&characterEncoding=UTF-8'
java.sql.SQLException: Access denied for user 'root'@'localhost' (using password: YES)

I also ran into another problem:
ERROR com.cloudera.enterprise.dbutil.DbProvisioner - Exception when creating/dropping database with user 'root' and jdbc url 'jdbc:mysql://localhost/?useUnicode=true&characterEncoding=UTF-8'
java.sql.SQLException: Your password has expired. To log in you must change it using a client that supports expired passwords
Either set a new MySQL password and flush privileges, or simply mark the password as not expired:
mysql> update user set password_expired='N' where user='root';
mysql> flush privileges;


- Start cloudera-scm-server on the master node
# cp /opt/cloudera-manager/cm-5.7.2/etc/init.d/cloudera-scm-server /etc/init.d/cloudera-scm-server
# chkconfig cloudera-scm-server on

At this point service cloudera-scm-server start fails with "File not found: /usr/sbin/cmf-server", because a path variable inside cloudera-scm-server is wrong:

# vim /etc/init.d/cloudera-scm-server
Change CMF_DEFAULTS=${CMF_DEFAULTS:-/etc/default} to CMF_DEFAULTS=/opt/cloudera-manager/cm-5.7.2/etc/default
Now service cloudera-scm-server start works. To make sure cloudera-scm-server comes up after every reboot, add this command to the startup script /etc/rc.local:
service cloudera-scm-server restart

Start cloudera-scm-agent on all nodes

# mkdir /opt/cloudera-manager/cm-5.7.2/run/cloudera-scm-agent
# cp /opt/cloudera-manager/cm-5.7.2/etc/init.d/cloudera-scm-agent /etc/init.d/cloudera-scm-agent
# chkconfig cloudera-scm-agent on
As before, service cloudera-scm-agent start fails with "File not found: /usr/sbin/cmf-agent", because the path variable inside cloudera-scm-agent is wrong; apply the same fix as for the server. To start the agent on every reboot, add this command to /etc/rc.local:
service cloudera-scm-agent restart
If the agent then hangs indefinitely, you need to patch the client-config Python code on the agent:


/opt/cloudera-manager/cm-5.7.2/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.2-py2.7.egg/cmf/client_configs.py

Replace lines 443 to 459 of that file with the following:

if line.startswith("/"):
    if len(line.rstrip().split(" ")) <= 4:
        path, _, _, priority_str = line.rstrip().split(" ")

        # Ignore the alternative if it's not managed by CM.
        if CM_MAGIC_PREFIX not in os.path.basename(path):
            continue

        try:
            priority = int(priority_str)
        except ValueError:
            THROTTLED_LOG.info("Failed to parse %s: %s", name, line)

        key = ClientConfigKey(name, path)
        value = ClientConfigValue(priority, self._read_generation(path))
        ret[key] = value
    else:
        pass
return ret



Restart the agent and the installation is done; the web UI should now be reachable. What remains is integrating the components. Notes:
1. To clear the swappiness warning: echo 10 > /proc/sys/vm/swappiness, replacing 10 with the value the web UI suggests (do this on every node).
2. To clear the transparent hugepage warning: echo never > /sys/kernel/mm/transparent_hugepage/defrag, changing the setting from madvise to never, and add the same command to the startup script (/etc/rc.local) so the parameter is re-applied on reboot (every node):

echo "echo never > /sys/kernel/mm/transparent_hugepage/defrag" >> /etc/rc.local
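Both warning fixes can be made persistent the same way. A sketch that writes to a temp stand-in for /etc/rc.local (the value 10 should be whatever the web UI suggested on your cluster):

```shell
RC_FILE="$(mktemp)"   # stand-in for /etc/rc.local on a real node

# Re-apply both kernel tweaks at every boot.
echo "echo 10 > /proc/sys/vm/swappiness" >> "$RC_FILE"
echo "echo never > /sys/kernel/mm/transparent_hugepage/defrag" >> "$RC_FILE"

cat "$RC_FILE"
```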

10. Configuring HDFS HA
In the web UI: HDFS -> Actions -> Enable High Availability
11. Testing Flume
1) Write the Flume configuration

# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2

# Describe/configure the source
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/test/calllog.csv
a2.sources.r2.shell = /bin/bash -c

# Describe the sink
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://hadoop102:8020/flume
# Prefix for uploaded files
a2.sinks.k2.hdfs.filePrefix = logs-
# Roll directories by time
a2.sinks.k2.hdfs.round = true
# Time units per new directory
a2.sinks.k2.hdfs.roundValue = 1
# Use the local timestamp
a2.sinks.k2.hdfs.useLocalTimeStamp = true
# Number of events to accumulate before flushing to HDFS
a2.sinks.k2.hdfs.batchSize = 1000
# File type (compression is supported)
a2.sinks.k2.hdfs.fileType = DataStream
# Seconds before rolling a new file
a2.sinks.k2.hdfs.rollInterval = 600
# Roll size per file (bytes)
a2.sinks.k2.hdfs.rollSize = 134217700
# Rolling is independent of event count
a2.sinks.k2.hdfs.rollCount = 0
# Minimum block replicas
a2.sinks.k2.hdfs.minBlockReplicas = 1

# Use a channel which buffers events in memory
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2

2) Write a shell script

#!/bin/bash
java -cp /opt/test/ct_producer-1.0-SNAPSHOT.jar com.china.Producer /opt/test/calllog.csv
3) The test jar (code omitted)

ct_producer-1.0-SNAPSHOT.jar   (note: since the site does not allow .jar uploads, rename the downloaded file from ".txt" back to ".jar")
4) Run the script and check the HDFS directory. Tip: to avoid permission problems, run as the flume user. Commands to create the /flume directory in HDFS and fix its ownership:

sudo -uhdfs hadoop fs -mkdir /flume


sudo -uhdfs hadoop fs -chown -R flume:flume /flume
Bonus: commands to locate the Flume configuration

find / -name "flume.conf" -print
find / -name "flume-ng" -print

12. Testing Flume + Kafka
Start HDFS + Flume + ZooKeeper + Kafka

(1) Modify the Flume configuration

# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2

# Describe/configure the source
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/test/calllog.csv
a2.sources.r2.shell = /bin/bash -c

# Describe the sink
a2.sinks.k2.type = org.apache.flume.sink.kafka.KafkaSink
a2.sinks.k2.brokerList = hadoop103:9092,hadoop104:9092
a2.sinks.k2.topic = calllog
a2.sinks.k2.batchSize = 20
a2.sinks.k2.requiredAcks = 1

# Channel
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2

(2) Start a Kafka console consumer

./kafka-console-consumer.sh --zookeeper hadoop102:2181,hadoop103:2181,hadoop104:2181 --topic calllog --from-beginning
(3) Run the shell script

./p.sh
(4) Check the consumer started in (2). Bonus: command to locate the kafka-console-consumer.sh script

find / -name “kafka-console-consumer.sh” -print
/opt/cloudera/parcels/KAFKA-3.0.0-1.3.0.0.p0.40/lib/kafka/bin/kafka-console-consumer.sh
Tip: important Kafka paths. bin directory:

/opt/cloudera/parcels/KAFKA-3.0.0-1.3.0.0.p0.40/lib/kafka/bin
conf directory:

/etc/kafka/conf
Broker directory:

/opt/cloudera-manager/cm-5.7.2/run/cloudera-scm-agent/process
Spark + Kafka demo on GitHub

https://github.com/xlturing/spark-journey/tree/master/SparkStreamingKafka/src/main/scala/com/sparkstreaming/main
13. Testing Flume + Kafka + Spark
(1) Cloudera Manager's CDH directory

/opt/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/

(2) Optionally configure Spark HA
