Open the official documentation page: http://hbase.apache.org/book.html#java
Find the HBase release that matches your Hadoop version and download it; here we choose HBase 1.2.2.
Download address for the installation package:
http://mirrors.hust.edu.cn/apache/hbase/1.2.2/
Download the package and upload it to /home/hadoop on node 1:
hbase-1.2.2-bin.tar.gz
[hadoop@hadoop1 ~]$ tar zxvf hbase-1.2.2-bin.tar.gz
Go into HBase's lib directory and check the version of the bundled Hadoop jars:
[hadoop@hadoop1 lib]$ find -name 'hadoop*jar'
./hadoop-mapreduce-client-core-2.5.1.jar
./hadoop-yarn-server-common-2.5.1.jar
./hadoop-mapreduce-client-app-2.5.1.jar
./hadoop-yarn-common-2.5.1.jar
./hadoop-yarn-client-2.5.1.jar
./hadoop-auth-2.5.1.jar
./hadoop-mapreduce-client-jobclient-2.5.1.jar
./hadoop-mapreduce-client-common-2.5.1.jar
./hadoop-hdfs-2.5.1.jar
./hadoop-yarn-api-2.5.1.jar
./hadoop-common-2.5.1.jar
./hadoop-annotations-2.5.1.jar
./hadoop-mapreduce-client-shuffle-2.5.1.jar
./hadoop-client-2.5.1.jar
[hadoop@hadoop1 bin]$ hadoop version
Hadoop 2.6.4
Subversion Unknown -r Unknown
Compiled by root on 2016-07-13T09:54Z
Compiled with protoc 2.5.0
From source with checksum 8dee2286ecdbbbc930a6c87b65cbc010
This command was run using /home/hadoop/hadoop-2.6.4/share/hadoop/common/hadoop-common-2.6.4.jar
The bundled jars are version 2.5.1, which does not match the Hadoop cluster version 2.6.4, so the jars under hbase/lib must be replaced with the ones from the Hadoop installation. This needs to be done on every node.
Write a script to perform the replacement, as shown below:
[hadoop@hadoop1 lib]$ vim f.sh
find -name "hadoop*jar" | sed 's/2.5.1/2.6.4/g' | sed 's/\.\///g' > f.log
rm ./hadoop*jar
cat ./f.log | while read Line
do
find /home/hadoop/hadoop-2.6.4/share/hadoop -name "$Line" | xargs -i cp {} ./
done
rm ./f.log
[hadoop@hadoop1 lib]$ chmod u+x f.sh
[hadoop@hadoop1 lib]$ ./f.sh
[hadoop@hadoop1 lib]$ find -name 'hadoop*jar'
OK, the jars have been replaced successfully.
Alternatively, simply run the following commands directly.
Delete the old jars:
[hadoop@hadoop1 lib]$ rm ./hadoop*jar
Copy the Hadoop jars into the hbase/lib directory:
[hadoop@hadoop1 lib]$ find /home/hadoop/hadoop-2.6.4/share/hadoop -name 'hadoop*jar' | xargs -i cp {} ./
#######################
hbase/lib also contains an slf4j-log4j12-XXX.jar. On a machine where Hadoop is installed, Hadoop's copy of this jar is already on the classpath and the two will conflict, so the first instinct is to delete HBase's copy:
[hadoop@hadoop1 lib]$ rm `find -name 'slf4j-log4j12-*jar'`
That is not quite right, though: HBase's slf4j-log4j12-XXX.jar should instead be replaced with the one shipped with Hadoop:
######################################
# run the following from /home/hadoop/hadoop-2.6.4/share/hadoop/common/lib, where Hadoop's slf4j jars live
[hadoop@hadoop1 lib]$ cp slf4j-* /home/hadoop/hbase-1.2.2/lib/
[hadoop@hadoop1 lib]$ scp slf4j-* hadoop@hadoop2:/home/hadoop/hbase-1.2.2/lib/
slf4j-api-1.7.5.jar 100% 25KB 25.5KB/s 00:00
slf4j-log4j12-1.7.5.jar 100% 8869 8.7KB/s 00:00
[hadoop@hadoop1 lib]$ scp slf4j-* hadoop@hadoop3:/home/hadoop/hbase-1.2.2/lib/
slf4j-api-1.7.5.jar 100% 25KB 25.5KB/s 00:00
slf4j-log4j12-1.7.5.jar 100% 8869 8.7KB/s 00:00
[hadoop@hadoop1 lib]$
2. Edit the configuration:
[hadoop@hadoop1 conf]$ vi hbase-env.sh
export JAVA_HOME=/opt/jdk1.7.0_79
export HBASE_CLASSPATH=/home/hadoop/hadoop-2.6.4/etc/hadoop
######### path to the Hadoop configuration files
# export HBASE_MANAGES_ZK=true is left unset; we are not using HBase's bundled ZooKeeper.
This cluster uses a separately installed ZooKeeper, so the option must be set to false:
export HBASE_MANAGES_ZK=false
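Because HBASE_MANAGES_ZK is set to false, the external ZooKeeper ensemble must already be running when HBase starts. A minimal check, assuming the standalone ZooKeeper lives under /home/hadoop/zookeeper-3.4.8 on each node (adjust the path to your installation):
[hadoop@hadoop1 ~]$ for h in hadoop1 hadoop2 hadoop3; do ssh $h /home/hadoop/zookeeper-3.4.8/bin/zkServer.sh status; done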
[hadoop@hadoop1 conf]$ vi hbase-site.xml
<property>
<name>hbase.master</name>
<value>192.168.72.131:6000</value>
</property>
<property>
<name>hbase.master.maxclockskew</name>
<value>180000</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://192.168.72.131:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/home/hadoop/hbase-1.2.2/tmp</value>
</property>
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop1,hadoop2,hadoop3</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper-3.4.8/tmp/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
Here, hbase.master specifies the server and port on which HMaster runs. hbase.master.maxclockskew guards against regionserver startup failures caused by clock differences between HBase nodes; its default value is 30000 (ms). hbase.rootdir specifies HBase's storage directory on HDFS. hbase.cluster.distributed puts the cluster into distributed mode. hbase.zookeeper.quorum lists the hostnames of the ZooKeeper nodes; the number of quorum members should be odd. hbase.zookeeper.property.dataDir sets ZooKeeper's data directory, which defaults to /tmp. dfs.replication sets the number of data replicas; it must be lowered when the cluster has fewer than three datanodes, so it is set to 2 here.
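Since hbase.master.maxclockskew only tolerates a limited offset, it is worth confirming that the nodes' clocks roughly agree before starting HBase. A minimal sketch, assuming passwordless SSH between the nodes:
[hadoop@hadoop1 ~]$ for h in hadoop1 hadoop2 hadoop3; do ssh $h date; done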
[hadoop@hadoop1 conf]$ vi regionservers
hadoop2
hadoop3
Configure the environment variables in /etc/profile:
export HBASE_HOME=/home/hadoop/hbase-1.2.2
export PATH=$PATH:$HBASE_HOME/bin
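After editing /etc/profile, reload it in the current session and check that the hbase command resolves (a quick sanity check):
[hadoop@hadoop1 ~]$ source /etc/profile
[hadoop@hadoop1 ~]$ which hbase    # should print /home/hadoop/hbase-1.2.2/bin/hbase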
Distribute HBase to the other nodes:
[hadoop@hadoop1 ~]$ scp -r hbase-1.2.2 hadoop@hadoop2:/home/hadoop/
[hadoop@hadoop1 ~]$ scp -r hbase-1.2.2 hadoop@hadoop3:/home/hadoop/
3. Start HBase:
[hadoop@hadoop1 bin]$ ./start-hbase.sh
To start a daemon on a single node by itself (here a regionserver on hadoop3):
[hadoop@hadoop3 bin]$ ./hbase-daemon.sh start regionserver
An error is reported at startup:
Caused by: java.lang.NoClassDefFoundError: com/amazonaws/auth/AWSCredentialsProvider
This is caused by a missing AWS jar: aws-java-sdk-1.7.4.1.
Download it from the Internet and copy it into HBase's lib directory.
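If the same error appears on the other nodes, the jar needs to go into their hbase/lib directories as well. A sketch, assuming the downloaded file is named aws-java-sdk-1.7.4.1.jar and sits in the current directory:
[hadoop@hadoop1 ~]$ cp aws-java-sdk-1.7.4.1.jar /home/hadoop/hbase-1.2.2/lib/
[hadoop@hadoop1 ~]$ scp aws-java-sdk-1.7.4.1.jar hadoop@hadoop2:/home/hadoop/hbase-1.2.2/lib/
[hadoop@hadoop1 ~]$ scp aws-java-sdk-1.7.4.1.jar hadoop@hadoop3:/home/hadoop/hbase-1.2.2/lib/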
Another error at startup:
Caused by: java.lang.ClassNotFoundException: org.htrace.Trace
This is because the htrace jar is missing; copy the one from the Hadoop installation:
[hadoop@hadoop1 lib]$ cp htrace-core-3.0.4.jar /home/hadoop/hbase-1.2.2/lib/
[hadoop@hadoop1 lib]$ scp htrace-core-3.0.4.jar hadoop@hadoop2:/home/hadoop/hbase-1.2.2/lib/
htrace-core-3.0.4.jar 100% 30KB 30.5KB/s 00:00
[hadoop@hadoop1 lib]$ scp htrace-core-3.0.4.jar hadoop@hadoop3:/home/hadoop/hbase-1.2.2/lib/
On startup, node 3 reports an error and the service will not start:
/hbase/WALs/hadoop3,16020,1477533677583-splitting is non empty': Directory is not empty
Fix:
Delete the corresponding directory on HDFS:
[hadoop@hadoop3 current]$ hadoop fs -rm -r /hbase/WALs
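With these issues resolved, it is worth confirming that the expected daemons are up on each node before testing. A minimal check with jps (the exact process list depends on which roles run where; per the regionservers file, hadoop2 and hadoop3 run regionservers):
[hadoop@hadoop1 ~]$ jps    # expect HMaster, plus the Hadoop and ZooKeeper daemons
[hadoop@hadoop2 ~]$ jps    # expect HRegionServer
[hadoop@hadoop3 ~]$ jps    # expect HRegionServer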
Testing:
View the database status in the web UI:
http://192.168.72.131:60010/master-status
Open the HBase shell:
[hadoop@hadoop1 bin]$ hbase shell
View help:
hbase(main):001:0> help
Check the cluster status:
hbase(main):002:0> status
List tables:
hbase(main):012:0> list
Create a table:
hbase(main):013:0> create 'cf','name','sex','edu'
View the table structure:
hbase(main):016:0> desc "cf"
When creating the table, an error reports that the table cf already exists. This is because HBase had been installed before and a table with the same name had been created; everything related to HBase on HDFS had already been removed, so the stale metadata most likely remains in ZooKeeper. Log in to the HMaster node and run zkCli.sh -server 192.168.72.131:2181
Then run: [zk: 192.168.72.131:2181(CONNECTED) 1] ls /hbase/table
[hbase:meta, hbase:namespace, cf]
ZooKeeper shows the table still exists, so delete cf:
[zk: 192.168.72.131:2181(CONNECTED) 3] rmr /hbase/table/cf
[zk: 192.168.72.131:2181(CONNECTED) 4] ls /hbase/table
[hbase:meta, hbase:namespace]
As you can see, it is gone.
Re-enter the HBase shell and the create command now succeeds.
Insert data:
Syntax: put <table>,<rowkey>,<family:column>,<value>,<timestamp>
hbase(main):024:0> put 'cf','one','name','chenfeng'
Query data:
Syntax: get <table>,<rowkey>,[<family:column>,....]
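For example, to read back the row inserted above:
get 'cf','one'
get 'cf','one','name'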
Scan the data of a single column family:
scan 'word',COLUMNS => 'f1'
Check the data with hbase hbck.
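hbck can check the whole cluster or be limited to particular tables, for example (a sketch; the table name follows the examples above):
hbase hbck
hbase hbck word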
A few ImportTsv examples for bulk-loading TSV data:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=info:userid,HBASE_ROW_KEY,info:netid test2 /application/logAnalyse/test/test3.dat
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=num,aa,HBASE_ROW_KEY,num word /data/input/word.txt
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns='HBASE_ROW_KEY,f1,f2' word /home/hadoop/word.txt
Use the second column of the file as the row key:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=f1,HBASE_ROW_KEY word /home/hadoop/word.txt
Scan the data of a single column family:
scan 'word',COLUMNS => 'f3'
Scan the data at a specific timestamp:
scan 'word',TIMESTAMP=>1477548320766
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=f1,HBASE_ROW_KEY,f3 word /home/hadoop/word.txt
The following messages are printed:
2016-09-26 11:15:56,285 INFO [main] client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:18032
2016-09-26 11:15:56,739 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2016-09-26 11:15:58,098 INFO [main] ipc.Client: Retrying connect to server: localhost/127.0.0.1:18032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-09-26 11:15:59,100 INFO [main] ipc.Client: Retrying connect to server: localhost/127.0.0.1:18032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-09-26 11:16:00,102 INFO [main] ipc.Client: Retrying connect to server: localhost/127.0.0.1:18032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-09-26 11:16:01,103 INFO [main] ipc.Client: Retrying connect to server: localhost/127.0.0.1:18032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
Investigation showed that the ResourceManager address and port configured in Hadoop's yarn-site.xml did not match the ResourceManager address and port HBase was connecting to: HBase was using the default values of yarn-site.xml rather than the manually configured ones. Copying Hadoop's yarn-site.xml into HBase's configuration directory and restarting HBase resolved the problem.
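The fix described above boils down to the following (a sketch, using the paths from this setup):
[hadoop@hadoop1 ~]$ cp /home/hadoop/hadoop-2.6.4/etc/hadoop/yarn-site.xml /home/hadoop/hbase-1.2.2/conf/
[hadoop@hadoop1 ~]$ stop-hbase.sh && start-hbase.sh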
Importing a file into HBase:
Create a table in HBase:
hbase(main):002:0> create 'hbase_test','info'
0 row(s) in 1.3120 seconds
FTP the file to the Hadoop machine and load it into HDFS:
hadoop fs -mkdir -p /data/input
[hadoop@hadoop1 bin]$ cd /home/hadoop/
[hadoop@hadoop1 ~]$ hadoop fs -put hs_alt_chr2.fa /data/input/
[hadoop@hadoop1 conf]$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,info -Dimporttsv.separator=, hbase_test /data/input/hs_alt_chr2.fa
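Once the ImportTsv MapReduce job finishes, a quick sanity check in the HBase shell confirms the rows were loaded (a sketch; the actual row keys depend on the contents of hs_alt_chr2.fa):
count 'hbase_test'
scan 'hbase_test', {LIMIT => 5}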