HBase problems: a summary with solutions

Posted by tntzbzc on 2015-6-30 00:46:11 (last edited by tntzbzc on 2015-6-30 00:47)

1. ZooKeeper startup error

Error log

Starting ZooKeeper produces the following error:
[mw_shl_code=bash,true]java.net.NoRouteToHostException: No route to host
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-05-19 10:26:26,983 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 800[/mw_shl_code]


Solution

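The main cause of this problem is that the firewall was still running on the ZooKeeper cluster nodes, blocking the connections between quorum members.
Stopping iptables with the following command did not help; the same error kept appearing:
systemctl stop iptables.service
After some digging it turned out that CentOS 7.0 uses firewalld as its default firewall, so it has to be stopped with these commands instead:
systemctl stop firewalld.service     # stop firewalld
systemctl disable firewalld.service  # keep firewalld from starting at boot
After the firewall is stopped on every node, restart the ZooKeeper processes and the error goes away.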
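If you would rather keep firewalld running, an alternative (not the method used above) is to open only the ports ZooKeeper needs. A minimal sketch, assuming a standard zoo.cfg with a clientPort=NNNN line and server.N=host:peerPort:electionPort lines; the function name zk_firewall_cmds is hypothetical. It only prints the firewall-cmd commands so you can review them before running them on each node:

```shell
# Print, without running them, the firewall-cmd commands that would open the
# ZooKeeper client, quorum and leader-election ports found in a zoo.cfg.
# Hypothetical helper; assumes "clientPort=NNNN" and "server.N=host:p1:p2" lines.
zk_firewall_cmds() {
  local cfg="$1"
  awk -F'=' '
    /^[[:space:]]*clientPort[[:space:]]*=/ {
      v = $2; gsub(/[[:space:]]/, "", v); print v
    }
    /^[[:space:]]*server\.[0-9]+[[:space:]]*=/ {
      v = $2; gsub(/[[:space:]]/, "", v)
      sub(/;.*/, "", v)   # drop the ";clientPort" suffix used by ZooKeeper 3.5+
      n = split(v, parts, ":")
      if (n >= 3) { print parts[2]; print parts[3] }
    }
  ' "$cfg" | sort -un | while read -r port; do
    echo "firewall-cmd --permanent --add-port=${port}/tcp"
  done
  echo "firewall-cmd --reload"
}
```

For example, zk_firewall_cmds conf/zoo.cfg lists one --add-port command per distinct port (typically 2181, 2888 and 3888) followed by a reload; the commands must be run as root on every ZooKeeper node.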


2. RegionServer process dies

Error log
[mw_shl_code=bash,true]Caused by: java.io.IOException: Couldn't set up IO streams
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:786)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    ... 60 more
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:713)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:779)[/mw_shl_code]

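Solution

"unable to create new native thread" means the user running the RegionServer has hit its per-user process/thread limit (nproc), not that the Java heap is full. Raise the nproc and open-file (nofile) limits by adding the lines below to both files:

/etc/security/limits.conf

/etc/security/limits.d/90-nproc.conf (on CentOS/RHEL 6 this file ships with a "* soft nproc 1024" entry that would otherwise keep capping the thread count)

The new limits only apply to new login sessions, so log in again and restart the RegionServer afterwards.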
[mw_shl_code=bash,true]*       hard    nproc   65536
*       soft    nproc   65536
*       hard    nofile  65536
*       soft    nofile  65536[/mw_shl_code]
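To check whether the new limits actually reached the running JVM, you can compare its thread count with the limits recorded in /proc. A minimal sketch for Linux; the function name thread_headroom is hypothetical. Keep in mind that nproc is counted per user, so threads of other processes owned by the same user also use up the limit:

```shell
# Show how close a running process (e.g. the HRegionServer JVM) is to its
# thread/process and open-file limits. Linux only: reads /proc.
# Hypothetical helper; usage: thread_headroom PID
thread_headroom() {
  local pid="$1"
  local threads max_procs max_files
  threads=$(awk '/^Threads:/ { print $2 }' "/proc/$pid/status")
  # In /proc/PID/limits the soft limit is field 3 of "Max processes ..."
  # and field 4 of "Max open files ...".
  max_procs=$(awk '/^Max processes/ { print $3 }' "/proc/$pid/limits")
  max_files=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  echo "threads=$threads max_processes=$max_procs max_open_files=$max_files"
}
```

Usage: thread_headroom "$(jps | awk '/HRegionServer/ {print $1}')". After the change and a restart, max_processes and max_open_files should both show 65536.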

3. Increasing the number of Thrift server threads

Log output (the Thrift server keeps reporting a growing backlog of client actions waiting to finish):
[mw_shl_code=bash,true]2015-06-05 12:46:37,756 INFO  [thrift-worker-6] client.AsyncProcess: #79, waiting for 72000  actions to finish
2015-06-05 12:46:37,756 INFO  [thrift-worker-9] client.AsyncProcess: #79, waiting for 48908  actions to finish
2015-06-05 12:46:37,855 INFO  [thrift-worker-8] client.AsyncProcess: #79, waiting for 72000  actions to finish
2015-06-05 12:46:38,198 INFO  [thrift-worker-2] client.AsyncProcess: #1, waiting for 78000  actions to finish
2015-06-05 12:46:38,762 INFO  [thrift-worker-13] client.AsyncProcess: #79, waiting for 72000  actions to finish
2015-06-05 12:46:39,547 INFO  [thrift-worker-0] client.AsyncProcess: #17, waiting for 78000  actions to finish
2015-06-05 12:47:55,612 INFO  [thrift-worker-9] client.AsyncProcess: #79, waiting for 108000  actions to finish
2015-06-05 12:47:55,912 INFO  [thrift-worker-6] client.AsyncProcess: #79, waiting for 114000  actions to finish[/mw_shl_code]



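Solution

Increase the number of Thrift server worker threads by starting it with the thread-pool server, where -m sets the minimum and -w the maximum number of worker threads:

hbase-daemon.sh start thrift --threadpool -m 200 -w 500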

The startup log in the logs directory under HBASE_HOME then shows:

[mw_shl_code=bash,true]INFO  [main] thrift.ThriftServerRunner: starting TBoundedThreadPoolServer on /0.0.0.0:9090; min worker threads=200, max worker threads=500, max queued requests=1000[/mw_shl_code]
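To judge whether the extra workers help, you can track the backlog that client.AsyncProcess reports per worker thread before and after the change. A minimal sketch, assuming log lines in the format shown in the log output above; the function name peak_backlog is hypothetical:

```shell
# Report the largest "waiting for N actions to finish" backlog per worker
# thread in an HBase Thrift server log, highest first.
# Hypothetical helper; usage: peak_backlog LOGFILE
peak_backlog() {
  awk '
    /client\.AsyncProcess:.*waiting for [0-9]+ +actions to finish/ {
      # Thread name is the first bracketed token, e.g. [thrift-worker-6]
      if (!match($0, /\[[A-Za-z0-9_-]+\]/)) next
      worker = substr($0, RSTART + 1, RLENGTH - 2)
      match($0, /waiting for [0-9]+/)
      n = substr($0, RSTART + 12, RLENGTH - 12) + 0   # 12 = length("waiting for ")
      if (!(worker in peak) || n > peak[worker]) peak[worker] = n
    }
    END { for (w in peak) print w, peak[w] }
  ' "$1" | sort -k2,2nr
}
```

Run against the log excerpt above, the first line would be thrift-worker-6 114000.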



4. ZooKeeper using too much memory

Diagnosis

Use jps to find the PID of the QuorumPeerMain process.
Run jmap -heap PID to see the process's memory usage; the result was:
[mw_shl_code=bash,true]Attaching to process ID 6801, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.71-b01

using thread-local object allocation.
Parallel GC with 18 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 0
   MaxHeapFreeRatio = 100
   MaxHeapSize      = 32126271488 (30638.0MB)
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)
   G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 537919488 (513.0MB)
   used     = 290495976 (277.0385513305664MB)
   free     = 247423512 (235.9614486694336MB)
   54.00361624377513% used
From Space:
   capacity = 89128960 (85.0MB)
   used     = 0 (0.0MB)
   free     = 89128960 (85.0MB)
   0.0% used
To Space:
   capacity = 89128960 (85.0MB)
   used     = 0 (0.0MB)
   free     = 89128960 (85.0MB)
   0.0% used
PS Old Generation
   capacity = 1431306240 (1365.0MB)
   used     = 0 (0.0MB)
   free     = 1431306240 (1365.0MB)
   0.0% used
PS Perm Generation
   capacity = 22020096 (21.0MB)
   used     = 9655208 (9.207923889160156MB)
   free     = 12364888 (11.792076110839844MB)
   43.847256615048366% used

3259 interned Strings occupying 265592 bytes.[/mw_shl_code]
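When checking many nodes, it is handy to pull just the maximum heap out of this output. A minimal sketch that reads jmap -heap output on stdin; the function name max_heap_mb is hypothetical. Note that jmap -heap exists on JDK 8 and earlier; on JDK 9+ the equivalent is jhsdb jmap --heap --pid PID:

```shell
# Print MaxHeapSize in MB from "jmap -heap" output read on stdin.
# Hypothetical helper; relies on the "MaxHeapSize = <bytes> (...)" line.
max_heap_mb() {
  awk '/^[[:space:]]*MaxHeapSize[[:space:]]*=/ { printf "%.1f\n", $3 / 1048576 }'
}
```

Usage: jmap -heap "$(jps | awk '/QuorumPeerMain/ {print $1}')" | max_heap_mb. For the output above it prints 30638.0.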



Solution

Method 1:

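The Heap Configuration section shows that the maximum heap is very large (MaxHeapSize is about 30 GB): with no -Xmx option, the JVM defaults to a quarter of physical memory. To reduce MaxHeapSize, edit the zkServer.sh script as follows:
* Open zkServer.sh in the bin directory of the ZooKeeper installation.
* At line 49 of zkServer.sh, add JVMPARAM="-Xms1000M -Xmx1000M -Xmn512M"
* Then modify lines 109-110 of zkServer.sh.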

Before:
[mw_shl_code=bash,true]nohup "$JAVA" "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
-cp "$CLASSPATH" $JVMFLAGS $ZOOMAIN "$ZOOCFG" > "$_ZOO_DAEMON_OUT" 2>&1 < /dev/null &[/mw_shl_code]
After (the JVMPARAM variable added at line 49 is appended after $JVMFLAGS):

[mw_shl_code=bash,true] nohup "$JAVA" "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
-cp "$CLASSPATH" $JVMFLAGS $JVMPARAM $ZOOMAIN "$ZOOCFG" > "$_ZOO_DAEMON_OUT" 2>&1 < /dev/null &[/mw_shl_code]
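The launch-line edit can also be applied with sed, which helps when patching many nodes. A minimal sketch, assuming the launch line contains "$JVMFLAGS $ZOOMAIN" exactly as in the stock script shown above; the function name add_jvmparam is hypothetical, and the JVMPARAM="..." definition at line 49 still has to be added as described:

```shell
# Append $JVMPARAM after $JVMFLAGS on the java launch line of zkServer.sh,
# keeping a .bak backup. Does nothing if the script is already patched.
# Hypothetical helper; usage: add_jvmparam /path/to/zookeeper/bin/zkServer.sh
add_jvmparam() {
  local script="$1"
  if grep -qF '$JVMFLAGS $JVMPARAM' "$script"; then
    return 0
  fi
  # In the pattern \$ matches a literal "$"; in the replacement "$" is literal.
  sed -i.bak 's/\$JVMFLAGS \$ZOOMAIN/$JVMFLAGS $JVMPARAM $ZOOMAIN/' "$script"
}
```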


  • Restart the ZooKeeper process.
  • Run jmap -heap PID to check the process's memory usage after the change:

[mw_shl_code=bash,true]Attaching to process ID 6207, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.71-b01

using thread-local object allocation.
Parallel GC with 18 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 0
   MaxHeapFreeRatio = 100
   MaxHeapSize      = 1048576000 (1000.0MB)
   NewSize          = 536870912 (512.0MB)
   MaxNewSize       = 536870912 (512.0MB)
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)
   G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 402653184 (384.0MB)
   used     = 104690912 (99.84103393554688MB)
   free     = 297962272 (284.1589660644531MB)
   26.000269254048664% used
From Space:
   capacity = 67108864 (64.0MB)
   used     = 0 (0.0MB)
   free     = 67108864 (64.0MB)
   0.0% used
To Space:
   capacity = 67108864 (64.0MB)
   used     = 0 (0.0MB)
   free     = 67108864 (64.0MB)
   0.0% used
PS Old Generation
   capacity = 511705088 (488.0MB)
   used     = 0 (0.0MB)
   free     = 511705088 (488.0MB)
   0.0% used
PS Perm Generation
   capacity = 22020096 (21.0MB)
   used     = 8878832 (8.467514038085938MB)
   free     = 13141264 (12.532485961914062MB)
   40.321495419456845% used

2697 interned Strings occupying 216760 bytes.[/mw_shl_code]


Method 2:

Open zookeeper/bin/zkEnv.sh; lines 49-52 of zkEnv.sh contain the following:
[mw_shl_code=bash,true]if [ -f "$ZOOCFGDIR/java.env" ]
then
    . "$ZOOCFGDIR/java.env"
fi[/mw_shl_code]

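This shows that ZooKeeper supports a separate file for JVM memory settings, located at zookeeper/conf/java.env ($ZOOCFGDIR is the conf directory).
A fresh installation has no java.env file at that path, so create one yourself:
* vim java.env
* Give java.env the following content: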

[mw_shl_code=bash,true]#!/bin/sh
# heap size MUST be modified according to cluster environment
export JVMFLAGS="-Xms512m -Xmx1024m $JVMFLAGS"[/mw_shl_code]
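Creating the file can be scripted as well. A minimal sketch that writes the same java.env content with configurable heap sizes and refuses to overwrite an existing file; the function name write_java_env is hypothetical:

```shell
# Create CONF_DIR/java.env with the given -Xms/-Xmx values unless it exists.
# Hypothetical helper; usage: write_java_env CONF_DIR XMS XMX
write_java_env() {
  local dir="$1" xms="$2" xmx="$3"
  local f="$dir/java.env"
  if [ -e "$f" ]; then
    echo "$f already exists, not overwriting" >&2
    return 1
  fi
  # Unquoted EOF expands ${xms}/${xmx}; \$JVMFLAGS stays literal in the file.
  cat > "$f" <<EOF
#!/bin/sh
# heap size MUST be modified according to cluster environment
export JVMFLAGS="-Xms${xms} -Xmx${xmx} \$JVMFLAGS"
EOF
}
```

Usage: write_java_env /path/to/zookeeper/conf 512m 1024m produces the file shown above.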

  • Restart the ZooKeeper process.
  • Run jmap -heap PID to check the process's memory usage after the change:

[mw_shl_code=bash,true][hadoop@hadoop202 conf]$ jmap -heap  10151
Attaching to process ID 10151, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.71-b01

using thread-local object allocation.
Parallel GC with 18 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 0
   MaxHeapFreeRatio = 100
   MaxHeapSize      = 1073741824 (1024.0MB)
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)
   G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 135266304 (129.0MB)
   used     = 43287608 (41.28227996826172MB)
   free     = 91978696 (87.71772003173828MB)
   32.00176741725715% used
From Space:
   capacity = 22020096 (21.0MB)
   used     = 0 (0.0MB)
   free     = 22020096 (21.0MB)
   0.0% used
To Space:
   capacity = 22020096 (21.0MB)
   used     = 0 (0.0MB)
   free     = 22020096 (21.0MB)
   0.0% used
PS Old Generation
   capacity = 358088704 (341.5MB)
   used     = 0 (0.0MB)
   free     = 358088704 (341.5MB)
   0.0% used
PS Perm Generation
   capacity = 22020096 (21.0MB)
   used     = 8887040 (8.475341796875MB)
   free     = 13133056 (12.524658203125MB)
   40.358770461309526% used

2699 interned Strings occupying 217008 bytes.[/mw_shl_code]

5. HBase startup errors and their solutions


Error type 1:
The error reads:
[mw_shl_code=bash,true]2013-11-12 23:28:31,575 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2013-11-12 23:28:31,675 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
2013-11-12 23:28:31,676 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 8000ms before retry #3...
2013-11-12 23:28:31,676 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server slave002/172.25.60.202:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)[/mw_shl_code]
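Solution:
First, make sure every machine has the same system time and runs the same software versions. Second, list all the nodes in the HBase configuration; site-specific settings such as hbase.zookeeper.quorum belong in hbase-site.xml, which overrides the shipped hbase-default.xml.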
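To confirm that every machine really lists the same quorum, you can print the hosts from each node's hbase-site.xml and compare them. A minimal sketch, assuming name and value sit inside the same property block as in the configuration shown under error type 3; the function name quorum_hosts is hypothetical:

```shell
# Print the hosts listed in hbase.zookeeper.quorum of an hbase-site.xml,
# one per line. Hypothetical helper; usage: quorum_hosts conf/hbase-site.xml
quorum_hosts() {
  awk '
    /<name>hbase\.zookeeper\.quorum<\/name>/ { want = 1 }
    want && /<value>/ {
      v = $0
      sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
      gsub(/[[:space:]]/, "", v)
      n = split(v, hosts, ",")
      for (i = 1; i <= n; i++) if (hosts[i] != "") print hosts[i]
      want = 0
    }
    /<\/property>/ { want = 0 }
  ' "$1"
}
```

For example, for h in $(quorum_hosts conf/hbase-site.xml); do getent hosts "$h" || echo "cannot resolve $h"; done also checks that each quorum host resolves on the current machine.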

Error type 2:
A J2EE program developed in Eclipse on Windows XP throws the following exception when connecting to a remote HBase:

[mw_shl_code=bash,true][cpt-service]16:34:47,851 INFO ZooKeeper:100 - Client environment:java.io.tmpdir=/home/hadoop/apache-tomcat-6/apache-tomcat-6.0.37/temp
[cpt-service]16:34:47,852 INFO ZooKeeper:100 - Client environment:java.compiler=<NA>
[cpt-service]16:34:47,852 INFO ZooKeeper:100 - Client environment:os.name=Linux
[cpt-service]16:34:47,852 INFO ZooKeeper:100 - Client environment:os.arch=i386
[cpt-service]16:34:47,853 INFO ZooKeeper:100 - Client environment:os.version=2.6.18-prep
[cpt-service]16:34:47,853 INFO ZooKeeper:100 - Client environment:user.name=hadoop
[cpt-service]16:34:47,853 INFO ZooKeeper:100 - Client environment:user.home=/home/hadoop
[cpt-service]16:34:47,854 INFO ZooKeeper:100 - Client environment:user.dir=/home/hadoop/apache-tomcat-6/apache-tomcat-6.0.37/logs
[cpt-service]16:34:47,856 INFO ZooKeeper:438 - Initiating client connection, connectString=node192:2181,node198:2181,node152:2181 sessionTimeout=180000 watcher=hconnection
[cpt-service]16:34:47,862 DEBUG ClientCnxn:99 - zookeeper.disableAutoWatchReset is false
[cpt-service]16:34:47,894 INFO RecoverableZooKeeper:104 - The identifier of this process is 6728@node198
[cpt-service]16:34:47,900 INFO ClientCnxn:966 - Opening socket connection to server node198/10.2.0.198:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
[cpt-service]16:34:47,906 INFO ClientCnxn:849 - Socket connection established to node198/10.2.0.198:2181, initiating session
[cpt-service]16:34:47,907 DEBUG ClientCnxn:889 - Session establishment request sent on node198/10.2.0.198:2181
[cpt-service]16:34:47,907 DEBUG ZooKeeperSaslClient:519 - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
[cpt-service]16:34:47,909 DEBUG ZooKeeperSaslClient:519 - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
[cpt-service]16:34:47,913 INFO ClientCnxn:1207 - Session establishment complete on server node198/10.2.0.198:2181, sessionid = 0x40a7f8343b002d, negotiated timeout = 180000
[cpt-service]16:34:47,915 DEBUG ZooKeeperSaslClient:519 - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
[cpt-service]16:34:47,916 DEBUG ZooKeeperWatcher:294 - hconnection Received ZooKeeper Event, type=None, state=SyncConnected, path=null
[cpt-service]16:34:47,916 DEBUG ZooKeeperSaslClient:519 - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
[cpt-service]16:34:47,916 DEBUG ZooKeeperSaslClient:519 - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
[cpt-service]16:34:47,917 DEBUG ZooKeeperSaslClient:519 - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
[cpt-service]16:34:47,920 DEBUG ZooKeeperWatcher:371 - hconnection-0x40a7f8343b002d connected[/mw_shl_code]

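Solution:
1. Configure hosts
HBase resolves IP addresses from hostnames (via DNS): ZooKeeper only hands back the hostnames of the HBase servers, so the client must be able to resolve them through DNS or its local hosts file.
On Linux, add the hostname-to-IP mapping of the HBase Master node (and, since the client talks to RegionServers directly, of the RegionServer nodes as well) to /etc/hosts.
On Windows, add the same mappings to C:\Windows\System32\drivers\etc\hosts.
2. Run the program again after this step.
There was still an error: the console reported that protobuf-java-2.4.0a.jar could not be found. Copy that jar from the HBase server and add it to the classpath, then run again. The DEBUG output still showed the exceptions above, but at the end the query did return values from HBase.
3. Change the Log4J configuration to INFO level and run again. The exception messages are gone: they are DEBUG-level messages that only report that no JAAS/SASL login configuration was found, which is harmless when SASL is not used.
4. Delete protobuf-java.jar and run the project again; it no longer reports any exceptions, and queries work normally.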
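On a Linux client, step 1 can be scripted so that repeated runs do not add duplicate entries. A minimal sketch; the function name add_host_mapping is hypothetical, editing /etc/hosts needs root, and on Windows the hosts file still has to be edited by hand:

```shell
# Append "IP HOSTNAME" to a hosts file unless HOSTNAME is already mapped.
# Hypothetical helper; usage: add_host_mapping /etc/hosts 10.2.0.198 node198
add_host_mapping() {
  local file="$1" ip="$2" host="$3"
  # Look for the hostname as an alias field on any non-comment line
  if awk -v h="$host" '
       /^[[:space:]]*#/ { next }
       { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
       END { exit !found }
     ' "$file"; then
    echo "$host already mapped in $file"
  else
    printf '%s\t%s\n' "$ip" "$host" >> "$file"
  fi
}
```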

Error type 3:
After copying the code into Hadoop's lib directory and running it from the command line, I ran into the following problem:

12/09/29 12:29:36 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
12/09/29 12:29:36 INFO zookeeper.ClientCnxn: Opening socket connection to server /127.0.0.1:2181
12/09/29 12:29:36 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
12/09/29 12:29:36 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 6479@fansyPC
12/09/29 12:29:36 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
12/09/29 12:29:36 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master

The configuration was:

[mw_shl_code=xml,true]<configuration>  
    <property>  
        <name>hbase.rootdir</name>  
        <value>hdfs://fansyPC:9000/hbase</value>  
    </property>  
    <property>  
        <name>hbase.cluster.distributed</name>  
        <value>true</value>  
    </property>  
      
    <property>  
        <name>hbase.zookeeper.quorum</name>  
        <value>slave1</value>  
    </property>  
      
    <property>  
        <name>hbase.zookeeper.property.dataDir</name>  
        <value>/home/fansy/zookeeper</value>  
    </property>  
      
</configuration>[/mw_shl_code]

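Solution:
Ubuntu works a bit differently, so I changed the following.
In /etc/security/limits.conf, add these lines:
hadoop  -       nofile  32768
hadoop  soft    nproc   32000
hadoop  hard    nproc   32000
In /etc/pam.d/common-session, add:
session required pam_limits.so
Next, hbase.zookeeper.quorum: can the quorum not be placed on a slave node? I wasn't sure, so I changed the value to fansyPC everywhere. In addition, following the official documentation, I replaced the hadoop-core-1.0.2.jar under hadoop/ with the hadoop-core-1.0.2.jar from hbase/lib, so both use the same Hadoop jar. Finally, I copied all the JARs under hbase/lib into hadoop/lib (skipping duplicates), and after that everything worked.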
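After logging in again as the hadoop user, you can verify that the new limits are in effect for that session. A minimal bash sketch; the function name check_ulimits is hypothetical, and the minimum values are whatever you configured (32768 and 32000 above):

```shell
# Compare the current shell's soft limits with the minimums HBase needs.
# Hypothetical helper; usage: check_ulimits MIN_NOFILE MIN_NPROC
check_ulimits() {
  local min_nofile="$1" min_nproc="$2" cur
  cur=$(ulimit -Sn)   # soft limit on open files
  if [ "$cur" = "unlimited" ] || [ "$cur" -ge "$min_nofile" ]; then
    echo "nofile OK ($cur)"
  else
    echo "nofile LOW ($cur < $min_nofile)"
  fi
  cur=$(ulimit -Su)   # soft limit on user processes/threads
  if [ "$cur" = "unlimited" ] || [ "$cur" -ge "$min_nproc" ]; then
    echo "nproc OK ($cur)"
  else
    echo "nproc LOW ($cur < $min_nproc)"
  fi
}
```

Usage: check_ulimits 32768 32000. If either line says LOW, the pam_limits change has not been picked up by this login session.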

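Error type 4:
The HMaster service on the Master keeps dying on its own; the error log only reports a connection failure: Session 0x0 for server null
Solution:
1. Disable IPv6: edit /etc/hosts and comment out the lines starting with "::1".
2. Synchronize the clocks of the HBase cluster and ZooKeeper cluster machines so they differ by less than 30 seconds.
Reboot the machines, and the problem is solved.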
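Both steps can be checked with small scripts. A minimal sketch with two hypothetical helpers: comment_ipv6_loopback comments out the "::1" lines (keeping a .bak copy), and clock_skew reads "host epoch_seconds" pairs, for example collected with ssh host date +%s, and flags a spread above the limit. The 30-second default matches HBase's hbase.master.maxclockskew default of 30000 ms; ssh latency adds a little error to the measurement:

```shell
# 1) Comment out IPv6 loopback ("::1 ...") lines in a hosts file, keeping a .bak.
comment_ipv6_loopback() {
  sed -i.bak 's/^[[:space:]]*::1[[:space:]]/#&/' "$1"
}

# 2) Read "host epoch_seconds" lines on stdin and report the largest clock
#    difference between any two hosts; usage: ... | clock_skew [LIMIT_SECONDS]
clock_skew() {
  local limit="${1:-30}"
  awk -v limit="$limit" '
    NF >= 2 {
      t = $2 + 0
      if (!seen || t < min) { min = t; minh = $1 }
      if (!seen || t > max) { max = t; maxh = $1 }
      seen = 1
    }
    END {
      if (!seen) exit 1
      skew = max - min
      status = (skew > limit + 0) ? "TOO LARGE" : "OK"
      printf "skew=%ds (%s vs %s) %s\n", skew, maxh, minh, status
    }
  '
}
```

Usage: for h in master slave1 slave2; do echo "$h $(ssh "$h" date +%s)"; done | clock_skew 30 (the host names here are placeholders).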


2 replies

hahaxixi, posted 2015-6-30 09:51:09
Thanks for sharing!

joyken, posted 2016-10-12 11:22:32
Thanks for sharing, these are all lessons learned from real practice.