
HBase 0.98: HMaster dies shortly after the cluster starts. How can I fix this?

hapjin posted on 2015-5-4 10:05:48
I have installed a three-node cluster. Hadoop and ZooKeeper both start normally, as shown here:

[two screenshots]


My hbase-site.xml configuration is as follows:

[two screenshots]


hbase-env.sh is configured to use the separately installed ZooKeeper:
[Screenshot from 2015-05-04 10:00:12.png]




The HBase master log shows the following errors:

zookeeper.ClientCnxn: Opening socket connection to server controller/192.168.1.186:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-04 09:36:47,112 INFO  [main-SendThread(controller:2181)] zookeeper.ClientCnxn: Socket connection established to controller/192.168.1.186:2181, initiating session
2015-05-04 09:36:47,112 INFO  [main-SendThread(controller:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2015-05-04 09:36:47,213 WARN  [main] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=network:2181,controller:2181,compute:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
2015-05-04 09:36:47,213 ERROR [main] zookeeper.RecoverableZooKeeper: ZooKeeper create failed after 4 attempts
2015-05-04 09:36:47,214 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3017)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:186)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:135)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3031)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:512)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:491)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:1241)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:1219)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:174)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:167)
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:547)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3012)


I have searched Google for a long time without finding a solution. Has anyone run into the same problem?

19 comments

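hapjin posted on 2015-5-4 10:09:23
After running ./bin/start-hbase.sh, the corresponding hbase directory is not created on HDFS. Running jps shows the HMaster process at first, but when I run jps again a little later, HMaster is gone.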

hapjin posted on 2015-5-4 10:15:12
There is an HQuorumPeer process, so why does checking the ZooKeeper status with ./zkServer.sh status print this?
JMX enabled by default
Using config: /usr/local/zookeeper-3.4.6/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
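
One way to cross-check what zkServer.sh status reports is to send ZooKeeper's `ruok` four-letter command directly to the client port; a server process that is up and bound to the port answers `imok` (this does not prove it has joined the quorum). The following is only a minimal sketch: the helper name `four_letter_word` is made up for illustration, and newer ZooKeeper releases may require these commands to be whitelisted, which to my knowledge is not the case on 3.4.6.

```python
import socket


def four_letter_word(host, port, command="ruok", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling `four_letter_word("controller", 2181)` would target the host and port that appear in the master log; a refused or timed-out connection raises `OSError`, which matches the "probably not running" case.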

hapjin posted on 2015-5-4 10:19:53
I found the following log output in /usr/local/zookeeper/zookeeper.out under the ZooKeeper installation directory:
2015-05-04 10:16:52,668 [myid:] - INFO  [main:QuorumPeerConfig@103] - Reading configuration from: /usr/local/zookeeper-3.4.6/bin/../conf/zoo.cfg
2015-05-04 10:16:52,672 [myid:] - INFO  [main:QuorumPeerConfig@340] - Defaulting to majority quorums
2015-05-04 10:16:52,674 [myid:0] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2015-05-04 10:16:52,674 [myid:0] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2015-05-04 10:16:52,674 [myid:0] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2015-05-04 10:16:52,682 [myid:0] - INFO  [main:QuorumPeerMain@127] - Starting quorum peer
2015-05-04 10:16:52,689 [myid:0] - INFO  [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2015-05-04 10:16:52,702 [myid:0] - INFO  [main:QuorumPeer@959] - tickTime set to 2000
2015-05-04 10:16:52,702 [myid:0] - INFO  [main:QuorumPeer@979] - minSessionTimeout set to -1
2015-05-04 10:16:52,702 [myid:0] - INFO  [main:QuorumPeer@990] - maxSessionTimeout set to -1
2015-05-04 10:16:52,702 [myid:0] - INFO  [main:QuorumPeer@1005] - initLimit set to 10
2015-05-04 10:16:52,709 [myid:0] - ERROR [main:QuorumPeer@171] - Setting LearnerType to PARTICIPANT but 0 not in QuorumPeers.
2015-05-04 10:16:52,713 [myid:0] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: My id 0 not in the peer list
    at org.apache.zookeeper.server.quorum.QuorumPeer.startLeaderElection(QuorumPeer.java:523)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:442)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)

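hapjin posted on 2015-5-4 10:26:07
hapjin posted on 2015-5-4 10:19
I found the following log output in /usr/local/zookeeper/zookeeper.out under the ZooKeeper installation directory:
2015-05-04 10:16:52, ...

It turns out that the ZooKeeper myid on the master node was set to 0. After changing it to 1 and restarting ZooKeeper with ./zkServer.sh start, I get a different error: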
2015-05-04 10:21:41,907 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Learner@233] - Unexpected exception, tries=1, connecting to network/192.168.1.123:2888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:225)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:71)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
2015-05-04 10:21:42,282 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1d5 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEpoch) FOLLOWING (my state)
2015-05-04 10:21:42,908 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Learner@233] - Unexpected exception, tries=2, connecting to network/192.168.1.123:2888
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:225)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:71)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
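
The "My id 0 not in the peer list" failure described in this post comes down to the number in the myid file not matching any server.N line in zoo.cfg. A minimal sketch of that check follows. The thread never shows the real zoo.cfg, so the sample file, its ids, and the 2888/3888 ports are assumptions for illustration (only network as server 2 on port 2888 is suggested by the log), and the helper names are made up.

```python
import re


def peer_ids(zoo_cfg_text):
    """Return the set of server ids declared as server.N=... in zoo.cfg."""
    ids = set()
    for line in zoo_cfg_text.splitlines():
        match = re.match(r"\s*server\.(\d+)\s*=", line)
        if match:
            ids.add(int(match.group(1)))
    return ids


def myid_is_listed(zoo_cfg_text, myid_text):
    """True if the number stored in the myid file matches a server.N entry."""
    return int(myid_text.strip()) in peer_ids(zoo_cfg_text)


# Hypothetical zoo.cfg for the three hosts named in the logs; the ids and
# ports are assumptions, not the poster's actual configuration.
EXAMPLE_ZOO_CFG = """
dataDir=/tmp/zookeeper
clientPort=2181
server.1=controller:2888:3888
server.2=network:2888:3888
server.3=compute:2888:3888
"""

print(myid_is_listed(EXAMPLE_ZOO_CFG, "0\n"))  # False
print(myid_is_listed(EXAMPLE_ZOO_CFG, "1\n"))  # True
```

Running the same check on every node, each against its own myid file, would also catch two nodes that accidentally share an id.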




hapjin posted on 2015-5-4 11:00:13
After restarting ZooKeeper with ./zkServer.sh start, the command line shows no errors. But when I test the connection to the ZooKeeper server with ./zkCli.sh -server controller:2181, I get: zkServer.sh status Error contacting service. It is probably not running.
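
Since the earlier log shows "Connection refused" on port 2888, it may help to test plain TCP reachability between the nodes before looking further at ZooKeeper itself. The sketch below is an assumption-laden illustration: `port_open` and `check_ensemble` are made-up helper names, and 3888 is only the conventional election port, not something shown in this thread.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ensemble(hosts, ports, timeout=3.0):
    """Map every (host, port) pair to whether it accepts connections."""
    return {(h, p): port_open(h, p, timeout) for h in hosts for p in ports}
```

For this cluster one might call `check_ensemble(["network", "controller", "compute"], [2181, 2888, 3888])` from each node. Note that the quorum port (2888 here) is normally only listened on by the current leader, so a refusal on the other nodes is expected.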


Alkaloid0515 posted on 2015-5-4 11:50:36

尘世随缘 posted on 2015-5-4 12:28:48
This may be caused by the clocks on the nodes being out of sync.

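hapjin posted on 2015-5-4 17:00:24
Alkaloid0515 posted on 2015-5-4 11:50
The original poster is pretty good at solving problems.

I have seen that solution and will give it a try. It feels like these machines simply cannot communicate with each other through ZooKeeper. Maybe, as @尘世随缘 said, the clocks are out of sync. I will test that. Thanks.

Actually, this cluster was originally set up and working without problems. Later HDFS went down, and this problem appeared when I rebuilt the cluster.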



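levycui posted on 2015-5-5 09:21:49
OP, I looked up the configuration documentation:
hbase-env.sh
Cluster installation: export HBASE_MANAGES_ZK=false
Standalone installation: export HBASE_MANAGES_ZK=true
If yours is a standalone installation, you need to use true.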
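
For what it is worth, the HBase reference guide describes this setting the other way around: HBASE_MANAGES_ZK, which it says defaults to true, tells HBase to start and stop ZooKeeper itself (that process shows up in jps as HQuorumPeer), while false is meant for a ZooKeeper ensemble that is installed and run separately. A small sketch for checking which value a given hbase-env.sh actually exports; the helper names `manages_zk` and `describe` are made up for illustration.

```python
import re


def manages_zk(hbase_env_text):
    """Return the HBASE_MANAGES_ZK value exported in hbase-env.sh, or None if unset."""
    value = None
    for line in hbase_env_text.splitlines():
        # Commented-out lines start with '#' and therefore do not match.
        match = re.match(r"\s*export\s+HBASE_MANAGES_ZK\s*=\s*['\"]?(\w+)", line)
        if match:
            value = match.group(1).lower() == "true"  # a later line overrides an earlier one
    return value


def describe(value):
    """Turn the parsed setting into a short human-readable explanation."""
    if value is None:
        return "not set in this file"
    if value:
        return "HBase starts and stops its own ZooKeeper (HQuorumPeer)"
    return "HBase expects an externally managed ZooKeeper ensemble"


print(describe(manages_zk("export HBASE_MANAGES_ZK=false\n")))
```

If the setting is true while a separately installed ZooKeeper is also configured on port 2181, the two could end up competing for the same port, which would be consistent with seeing HQuorumPeer in jps while zkServer.sh status fails.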
