
[Help] Urgent! After upgrading CDH from 4.8.0 to 5.3.0, the HBase cluster cannot create or read user tables...

Background:
We upgraded CM and CDH from 4.8.0 to 5.3.0. The procedure was essentially: stop all services, then follow the official documentation for the rest. It went mostly smoothly; the only hiccup was that HDFS entered safe mode and would not leave on its own, so we forced it out manually. After the upgrade we ran a consistency check on HDFS with `hadoop fsck /` and everything was fine. HBase showed no problems during the upgrade. Afterwards we did a quick test: we could list tables and scan 'hbase:meta', so we assumed everything was OK. We had upgraded another cluster the same way the week before without any issues, so we did not test any further.

Problem:
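The safe-mode step above corresponds to the stock Hadoop CLI (`hadoop fsck /` is the older spelling of `hdfs fsck /`). A minimal sketch of forcing the NameNode out of safe mode and re-checking consistency; here the `hdfs` command is stubbed with healthy output so the flow can be shown without a cluster — on a real cluster, delete the stub and run as the HDFS superuser:

```shell
# Stub standing in for the real `hdfs` binary; remove on an actual cluster.
hdfs() {
  case "$*" in
    "dfsadmin -safemode leave") echo "Safe mode is OFF" ;;
    "dfsadmin -safemode get")   echo "Safe mode is OFF" ;;
    "fsck /")                   echo "Status: HEALTHY" ;;
  esac
}

hdfs dfsadmin -safemode leave   # force the NameNode out of safe mode
hdfs dfsadmin -safemode get     # confirm it stayed out
hdfs fsck / | grep -q HEALTHY && echo "fsck: filesystem healthy"
```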
The next day we found that the HBase Master status was wrong. We have three HBase Masters in total, one active and two standby; the status looked like this:

(screenshot: HBase Master status alert)
After stopping the original active HBase Master and waiting for one of the standby masters to come up, the alert cleared. Later the development team told me this cluster could not create HBase tables. I tested it myself: a create-table call hung for a long time and then failed with:
ERROR: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/thriftserver:55220 remote=hbasemaster.xxx.xxx/hbasemasterIp:60000]
Looking at the HBase Master log, there was no output at all at first, but after a few minutes it filled with connection timeouts:
2015-05-06 02:24:08,358 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Checking master connection
com.google.protobuf.ServiceException: java.net.SocketTimeoutException: Call to hbasemaster.xxx.xxx/hbasemasterIp:60000 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/hbasemasterIp:33045 remote=hbasemaster.xxx.xxx/hbasemasterIp:60000]
        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1678)
        at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
        at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:44411)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1512)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2157)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1863)
        at org.apache.hadoop.hbase.client.HBaseAdmin$MasterCallable.prepare(HBaseAdmin.java:3376)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:113)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
        at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3403)
        at org.apache.hadoop.hbase.client.HBaseAdmin.listTableDescriptorsByNamespace(HBaseAdmin.java:2310)
        at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__catalogTables(MasterStatusTmplImpl.java:465)
        at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:274)
        at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:386)
        at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:376)
        at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:95)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
        at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1122)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.net.SocketTimeoutException: Call to hbasemaster.xxx.xxx/hbasemasterIp:60000 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/hbasemasterIp:33045 remote=hbasemaster.xxx.xxx/hbasemasterIp:60000]
        at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1486)
        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1461)
        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
        ... 40 more
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/hbasemasterIp:33045 remote=hbasemaster.xxx.xxx/hbasemasterIp:60000]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:558)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at java.io.DataInputStream.readInt(DataInputStream.java:370)
        at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1076)
        at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:727)
2015-05-06 02:25:08,424 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Checking master connection
[the same SocketTimeoutException and stack trace repeat roughly every 60 seconds; only the local ephemeral port changes]

Current status:
Through the hbase shell we can list tables, scan 'hbase:meta', and disable tables, but we cannot scan 'hbase:namespace' or any user table; those calls hang for a long time and then time out. `hbase hbck` reports no problems, but whenever a connection times out mid-check it reports a region hole; we located the corresponding regionserver, took it offline, and repaired the hole with `-fixHdfsHoles`. The biggest problem we have found so far is that the hbase:namespace znode in ZooKeeper is empty, with no values at all, and looking on HDFS it really does seem to have no data:
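A sketch of checking the three places hbase:namespace should appear: the ZooKeeper znode, the table directory on HDFS (a default `hbase.rootdir` of `/hbase` is assumed here), and the table contents. The three clients are stubbed with healthy output so the sequence runs anywhere; on a real cluster, drop the function definitions and use `zkCli.sh`, `hdfs`, and `hbase shell` directly:

```shell
# Stubs standing in for zkCli.sh, hdfs, and hbase shell (healthy output).
zkcli() { echo "[default, hbase]"; }
hdfs()  { echo "/hbase/data/hbase/namespace/.tabledesc"; }
hbase() { echo "default / hbase  --  2 row(s)"; }

zkcli ls /hbase/namespace                    # znode children: the two built-in namespaces
hdfs dfs -ls /hbase/data/hbase/namespace     # the table's directory under hbase.rootdir
echo "scan 'hbase:namespace'" | hbase shell  # the two built-in rows
```

If any of the three comes back empty while the others are populated, that narrows down which layer lost the namespace data.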
On a healthy cluster, HDFS shows a data directory for the namespace table, and a normal hbase:namespace table contains entries for the built-in namespaces. (screenshots of a healthy layout omitted)
In HBase 0.98.6 the namespace table stores `default` and `hbase` by default, but we cannot figure out why ours has no values. How can this hbase:namespace table be repaired?
Update: we have since ruled out namespace as the cause. On another, healthy cluster I tried two experiments: (1) delete the namespace table, delete its records from hbase:meta, remove the namespace directory on HDFS, and `rmr /hbase` in ZooKeeper; (2) delete only the records in hbase:namespace and then `rmr /hbase`. In both cases, creating a table afterwards did not time out; it only produced some errors, and after restarting the whole cluster the namespace table and its data came back.

So the question remains: what makes this cluster, shortly after coming online, start timing out on regionserver-to-master connections, and even on the master connecting to itself? We have checked: Linux shows no packet loss, the network is normal, and server load is not high.



Replies (3)
langke93 posted 2015-5-15 00:19:01
Make sure the HMaster has not become a zombie process. If the network is fine but you cannot connect, that usually means the master is down. Creating a table goes through the master, which holds the metadata, so get the master up first and then continue troubleshooting.


shihuai355 posted 2015-5-15 13:17:09
The problem is solved: it was the JDK. CDH 5.3 requires JDK 1.7, and we were running 1.6. Upgrading the JDK to 1.7.0_45 fixed it!
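Since the root cause was the JDK version, a quick preflight check on every node is worth running before a CDH 5 upgrade. `check_jdk` is a hypothetical helper (not from this thread) that parses a `java -version` banner; on a real node, feed it `"$(java -version 2>&1)"`:

```shell
# Flags any JDK banner below 1.7, the minimum CDH 5.3 supports.
check_jdk() {
  ver=$(printf '%s\n' "$1" | awk -F'"' '/version/ {print $2}')
  case "$ver" in
    1.[0-6]*) echo "JDK $ver too old for CDH 5.3; upgrade to 1.7+" ;;
    *)        echo "JDK $ver OK" ;;
  esac
}

check_jdk 'java version "1.6.0_45"'   # flagged as too old
check_jdk 'java version "1.7.0_45"'   # passes
```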

shihuai355 posted 2015-5-15 13:19:09
(quoting langke93, 2015-5-15 00:19: "Make sure the HMaster has not become a zombie process. If the network is fine but you cannot connect, that usually means the master is down. Creating a table goes through the master ...")

Thanks for the reply. The problem is already solved: it was the JDK. CDH 5.3 requires JDK 1.7, and we were running 1.6; upgrading the JDK to 1.7.0_45 fixed it!
