HBase regionserver service went down

This morning I suddenly found that the HRegionServer service on one node had gone down. ZooKeeper was running normally, but the HBase log kept repeating this error in a loop:
2014-01-02 15:22:47,402 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
java.io.IOException: Reflection
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:310)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1347)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1452)
        at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1243)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:308)
        ... 5 more
Caused by: java.io.IOException: DFSOutputStream is closed
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3879)
        at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
        at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:999)
        ... 9 more
2014-01-02 15:22:47,403 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: Error while syncing, requesting close of hlog
java.io.IOException: Reflection
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:310)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1347)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1452)
        at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1243)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:308)
        ... 5 more
Caused by: java.io.IOException: DFSOutputStream is closed
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3879)
        at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
        at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:999)
        ... 9 more


It turns out the node had already gone down yesterday and we didn't notice in time. Could someone advise what caused this?

6 replies

fanbells posted on 2014-1-3 11:39:13
The cause has been found; see the post at http://www.adintellig.com/blog/143. What I don't understand is why the other machines, with exactly the same configuration, had no problem. Could it be that my HBase table schema design made a single node a data hotspot?
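If the row-key design really is producing a hotspot, a common mitigation is to salt the key with a hash-derived prefix so that sequential writes spread across regions instead of piling onto one regionserver. Below is a minimal sketch against the 0.94-era HBase client API; the table name, column family, qualifier, and bucket count are placeholders, not values taken from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedWrite {
    // Prefix the key with a hash-derived salt so consecutive keys land in different regions.
    static byte[] saltedKey(String key, int buckets) {
        int salt = (key.hashCode() & Integer.MAX_VALUE) % buckets;
        return Bytes.toBytes(String.format("%02d-%s", salt, key));
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "my_table");            // placeholder table name
        Put put = new Put(saltedKey("app-20140103-0001", 16));  // 16 salt buckets, arbitrary
        // 0.94-era API: Put.add(family, qualifier, value)
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
        table.put(put);
        table.close();
    }
}

The trade-off is that a scan over a logical key range then has to fan out across every salt bucket.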

pig2 posted on 2014-1-3 11:42:13
fanbells posted on 2014-1-3 11:39
The cause has been found; see the post at http://www.adintellig.com/blog/143. What I don't understand is why the other machines, with exactly the same ...

Glad it's resolved.

fanbells posted on 2014-1-3 11:44:24

The problem is fixed, but I still haven't found the root cause; it may well go down again tomorrow.

pig2 posted on 2014-1-3 11:48:34
Last edited by pig2 on 2014-1-3 11:51
fanbells posted on 2014-1-3 11:44
The problem is fixed, but I still haven't found the root cause; it may well go down again tomorrow.

The region server was under too much load, which kept the HLog from syncing. The load was high because this node is also a Hadoop datanode, and other services (such as the rush service) were consuming too many machine resources, so the regionserver could no longer communicate with the master.
Try to reduce the load on the region server: see whether you can cut back on some of the other software or processes running there, or take other load-shedding measures. It's like a person: work it too hard and it collapses. Machines are the same, so don't let the region server get overloaded.
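For reference, the setting that decides how long an overloaded, unresponsive regionserver survives before the master declares it dead is zookeeper.session.timeout. Here is a minimal Java sketch that simply reads it (plus the RPC handler count) from whatever hbase-site.xml is on the classpath; the fallback defaults passed to getInt are assumptions for the 0.94-era release these logs come from:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CheckRegionServerTimeouts {
    public static void main(String[] args) {
        // Loads hbase-default.xml and hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        // How long ZooKeeper waits before a stalled regionserver is considered dead.
        System.out.println("zookeeper.session.timeout = "
                + conf.getInt("zookeeper.session.timeout", 180000) + " ms");
        // RPC handler threads; too few can back requests up on a node that is also a busy datanode.
        System.out.println("hbase.regionserver.handler.count = "
                + conf.getInt("hbase.regionserver.handler.count", 10));
    }
}

Raising the timeout only buys a stalled node more time; it does not remove the underlying resource contention.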

pig2 posted on 2014-11-1 23:31:39
Recording a problem one of our members ran into.

The error messages were the same as in this thread:
2014-08-20 12:18:08,773 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
java.io.IOException: Reflection
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:304)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1331)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1436)
        at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1235)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:302)
        ... 5 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-1716440838-172.16.77.105-1343289876159:blk_2105353805262506662_63376312 does not exist or is not under Constructionblk_2105353805262506662_63376312{blockUCState=UNDER_RECOVERY, primaryNodeIndex=0, replicas=[ReplicaUnderConstruction[172.16.77.103:50010|RBW], ReplicaUnderConstruction[172.16.77.102:50010|RBW], ReplicaUnderConstruction[172.16.77.101:50010|RBW]]}
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:4454)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:4518)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:542)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:745)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42664)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)

        at org.apache.hadoop.ipc.Client.call(Client.java:1225)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
        at $Proxy14.updateBlockForPipeline(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
        at $Proxy14.updateBlockForPipeline(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:751)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:962)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:755)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:424)
2014-08-20 12:18:08,773 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: Error while syncing, requesting close of hlog
java.io.IOException: Reflection
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:304)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1331)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1283)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1436)
        at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1235)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:302)
        ... 5 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-1716440838-172.16.77.105-1343289876159:blk_2105353805262506662_63376312 does not exist or is not under Constructionblk_2105353805262506662_63376312{blockUCState=UNDER_RECOVERY, primaryNodeIndex=0, replicas=[ReplicaUnderConstruction[172.16.77.103:50010|RBW], ReplicaUnderConstruction[172.16.77.102:50010|RBW], ReplicaUnderConstruction[172.16.77.101:50010|RBW]]}
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:4454)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:4518)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:542)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:745)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42664)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)

        at org.apache.hadoop.ipc.Client.call(Client.java:1225)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
        at $Proxy14.updateBlockForPipeline(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
        at $Proxy14.updateBlockForPipeline(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:751)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:962)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:755)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:424)
2014-08-20 12:18:08,762 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of hbase_app_log_model_2013-06-05,\x00app-~+0000000000000000000000000000000000000000000000000000000000000,1400615818580.d6caa9fd204df0fb32c5b2e445ba35d4.
But I also found the warning below in the logs. It looks like this was caused by an overly long full GC. Could someone advise? Thanks!
2014-08-22 06:42:43,034 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 139366ms instead of 100000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired

Cause: a long garbage-collection pause.
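To confirm that the pause came from garbage collection, the JVM's own GarbageCollectorMXBean counters can be sampled; a single stop-the-world pause on the order of the 139366 ms reported by Sleeper shows up as a large jump in cumulative collection time between two samples. A minimal, self-contained sketch (nothing HBase-specific assumed):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCheck {
    public static void main(String[] args) {
        // Print the cumulative GC count and time for each collector in this JVM.
        // Sample this periodically: a long full GC appears as a big jump in time
        // with only a small increase in count.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}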



jxlhljh posted on 2014-12-18 09:40:25
The region server was under too much load, which kept the HLog from syncing. The load was high because this node is also a Hadoop datanode,


Let me ask: does this mean it's better not to install HBase on the same machines as Hadoop?
Then how should it be deployed? Every tutorial I've seen online puts it on the same Hadoop environment.