HRegionServer crashes frequently

ighack posted on 2019-5-21 09:13:10 in Q&A
Last edited by ighack on 2019-5-21 09:17

I found this in the regionserver log:
[mw_shl_code=bash,true]2019-05-20 21:10:46,802 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=moniser:2181,basappser2:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@3a03464
2019-05-20 21:10:46,826 INFO  [main-SendThread(moniser:2181)] zookeeper.ClientCnxn: Opening socket connection to server moniser/192.168.0.238:2181. Will not attempt to authenticate using SASL (unknown error)
2019-05-20 21:10:46,831 INFO  [main-SendThread(moniser:2181)] zookeeper.ClientCnxn: Socket connection established to moniser/192.168.0.238:2181, initiating session
2019-05-20 21:10:46,839 INFO  [main-SendThread(moniser:2181)] zookeeper.ClientCnxn: Session establishment complete on server moniser/192.168.0.238:2181, sessionid = 0x16ac4a4aeec000c, negotiated timeout = 80000[/mw_shl_code]
In the .out log I found:
[mw_shl_code=bash,true]hbase-daemon.sh: line 226:  7824 Killed     nice -n $HBASE_NICENESS "$HBASE_HOME"/bin/hbase --config "${HBASE_CONF_DIR}" $command "$@" start >> ${HBASE_LOGOUT} 2>&1[/mw_shl_code]
And in the ZooKeeper log:
[mw_shl_code=bash,true]2019-05-20 21:10:47,175 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x16ac4a4aeec000c, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:748)
2019-05-20 21:10:47,176 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.0.238:26831 which had sessionid 0x16ac4a4aeec000c
2019-05-21 02:14:40,886 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:15554
2019-05-21 02:14:42,886 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:748)
2019-05-21 02:14:42,886 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:15554 (no session established for client)[/mw_shl_code]
ZooKeeper is configured with tickTime=40000.

11 comments

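s060403072 posted on 2019-5-21 10:36:53
You could set the timeout somewhat longer:
zookeeper.session.timeout.ms=400000

Also, why do all the connections come from the local machine? If that is always the case, the configuration may be wrong, which would point to a network-side problem such as a bad hosts file, hostname, or IP address setting.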

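ighack posted on 2019-5-21 10:47:09
Last edited by ighack on 2019-5-21 10:54

I only have two machines, A and B, running HBase, and ZooKeeper is installed on both of them.
A is the one that keeps going down; B rarely does.

I have not found any misspelled hostnames in the configuration, and the hosts file is correct too.
The timeout is 80 seconds, which is already very long.
HBase also does not die right after starting: sometimes it runs for two days, sometimes for one.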


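yaojiank posted on 2019-5-21 11:30:07
ighack wrote on 2019-5-21 10:47:
I only have two machines running HBase
A B
ZooKeeper is installed on both of them

ZooKeeper depends on leader election. Either run it pseudo-distributed, with three instances configured on the same local machine, or use three virtual machines.
A two-node installation is very likely to run into problems, and their cause is hard to track down.
Since you are just starting out, follow the standard setup.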
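
The reasoning behind the three-node advice can be made concrete with a little arithmetic. A ZooKeeper ensemble stays available only while a strict majority of its servers is up, so two servers tolerate no failure at all. The following is a minimal sketch of that rule; the function names are illustrative and not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} server(s): quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

With two servers the quorum is 2, so losing either one stops the ensemble; a third server is the first size at which one failure is survivable.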

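ighack posted on 2019-5-21 11:59:02
I would like three machines too, but the company has no resources to give me.
This HBase instance is only used for Pinpoint monitoring; it is not a critical business component.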

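yaojiank posted on 2019-5-21 12:03:44
ighack wrote on 2019-5-21 11:59:
I would like three machines too, but the company has no resources to give me.
This HBase instance is only used for Pinpoint monitoring; it is not a critical business component.


In that case I recommend pseudo-distributed mode.
A single IP address is enough: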
server.1=192.168.1.201:2888:3888
server.2=192.168.1.201:2889:3889
server.3=192.168.1.201:2890:3890


Recommended reading:
ZooKeeper introduction, pseudo-distributed cluster installation, and usage
http://www.aboutyun.com/forum.php?mod=viewthread&tid=9097



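evababy posted on 2019-5-24 14:05:19
ZooKeeper's tickTime is the basic time unit and should not be set that large; anything under 10 seconds is plenty. What you should raise instead are HBase's ZooKeeper session timeout and its RPC timeout.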
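
To see why tickTime matters here, it may help to sketch how a session timeout is negotiated. By default a ZooKeeper server bounds every session timeout between 2 × tickTime and 20 × tickTime (the minSessionTimeout and maxSessionTimeout settings), which would explain why the client in the logs above asks for sessionTimeout=30000 yet ends up with a negotiated timeout of 80000 when tickTime=40000. The function name below and the tickTime=2000 example are illustrative assumptions, not values taken from this thread.

```python
def negotiated_timeout(requested_ms, tick_time_ms, min_ms=None, max_ms=None):
    """Clamp a requested session timeout into the server's allowed range.

    Assumes the documented defaults: minimum 2 * tickTime, maximum
    20 * tickTime, unless explicit bounds are passed in.
    """
    low = 2 * tick_time_ms if min_ms is None else min_ms
    high = 20 * tick_time_ms if max_ms is None else max_ms
    return max(low, min(requested_ms, high))


if __name__ == "__main__":
    # The case from the logs: request 30000 ms with tickTime=40000.
    print(negotiated_timeout(30000, 40000))
    # Hypothetical smaller tickTime: an 80 s request is capped at 20 ticks.
    print(negotiated_timeout(80000, 2000))
    # Same tickTime with the maximum raised explicitly (assumed setting).
    print(negotiated_timeout(80000, 2000, max_ms=80000))
```

The second call shows the catch when lowering tickTime: with the default bounds an 80-second request would be cut to 40 seconds, so the server-side maximum has to be raised as well if a long session timeout is still wanted. On the HBase side the requested value comes from zookeeper.session.timeout in hbase-site.xml.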

ighack posted on 2019-5-24 14:20:37
I looked into it, and the main thing is the ZooKeeper timeout.
The HBase one I set to the same value as tickTime, 80000.

ighack posted on 2019-6-5 09:19:27
Last edited by ighack on 2019-6-5 09:31

[mw_shl_code=bash,true]2019-06-04 18:34:20,376 INFO  [moniser,16020,1559617721856_ChoreService_1] regionserver.HRegionServer: moniser,16020,1559617721856-MemstoreFlusherChore requesting flush of TraceV2,&\x00\x00\x00\x00\x00\x00\x00,1559114757691.8ba4163358f97beb059bbe066b15c6c5. because S has an old edit so flush to free WALs after random delay 194067ms
2019-06-04 18:34:20,377 INFO  [moniser,16020,1559617721856_ChoreService_1] regionserver.HRegionServer: moniser,16020,1559617721856-MemstoreFlusherChore requesting flush of TraceV2,\xCC\x00\x00\x00\x00\x00\x00\x00,1559114757691.2e55b6f50195981b03373a114c155b4e. because S has an old edit so flush to free WALs after random delay 179269ms
2019-06-04 18:34:20,377 INFO  [moniser,16020,1559617721856_ChoreService_1] regionserver.HRegionServer: moniser,16020,1559617721856-MemstoreFlusherChore requesting flush of ApplicationMapStatisticsSelf_Ver2,\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1559114765629.233d41a8e943fd1e4ddb29b694b13dd0. because C has an old edit so flush to free WALs after random delay 242343ms
2019-06-04 18:34:20,828 INFO  [regionserver/moniser/192.168.0.238:16020-shortCompactions-1559617738938] regionserver.HStore: Completed compaction of 3 (all) file(s) in S of TraceV2,\xFF\x00\x00\x00\x00\x00\x00\x00,1559114757691.aca60625a1273b36a87e5affac2fd09a. into b1c00fc5a642412db370dbef1d5acbaf(size=82.6 M), total size for store is 82.6 M. This selection was in queue for 0sec, and took 1sec to execute.
2019-06-04 18:34:20,828 INFO  [regionserver/moniser/192.168.0.238:16020-shortCompactions-1559617738938] regionserver.CompactSplitThread: Completed compaction: Request = regionName=TraceV2,\xFF\x00\x00\x00\x00\x00\x00\x00,1559114757691.aca60625a1273b36a87e5affac2fd09a., storeName=S, fileCount=3, fileSize=82.7 M (81.0 M, 877.4 K, 819.6 K), priority=7, time=7788170363172924; duration=1sec
2019-06-04 18:34:29,089 INFO  [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2019-06-04 18:34:29,090 INFO  [main] zookeeper.ZooKeeper: Client environment:host.name=moniser
2019-06-04 18:34:29,090 INFO  [main] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_131
2019-06-04 18:34:29,090 INFO  [main] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2019-06-04 18:34:29,090 INFO  [main] zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.8.0_131/jre
2019-06-04 18:34:29,090 INFO  [main] zookeeper.ZooKeeper: Client environment:java.class.path=/pinpoint/app/pinpoint/hbase-1.3.1/bin/../conf:/usr/java/jdk1.8.0_131/lib/tools.jar:

2019-06-04 18:34:29,090 INFO  [main] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2019-06-04 18:34:29,090 INFO  [main] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2019-06-04 18:34:29,090 INFO  [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
2019-06-04 18:34:29,090 INFO  [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2019-06-04 18:34:29,090 INFO  [main] zookeeper.ZooKeeper: Client environment:os.version=3.10.0-862.14.4.el7.x86_64
2019-06-04 18:34:29,090 INFO  [main] zookeeper.ZooKeeper: Client environment:user.name=pinpoint
2019-06-04 18:34:29,090 INFO  [main] zookeeper.ZooKeeper: Client environment:user.home=/pinpoint/app
2019-06-04 18:34:29,090 INFO  [main] zookeeper.ZooKeeper: Client environment:user.dir=/pinpoint/app/pinpoint/hbase-1.3.1
2019-06-04 18:34:29,092 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=moniser:2181,basappser2:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@3a03464
2019-06-04 18:34:29,113 INFO  [main-SendThread(moniser:2181)] zookeeper.ClientCnxn: Opening socket connection to server moniser/192.168.0.238:2181. Will not attempt to authenticate using SASL (unknown error)
2019-06-04 18:34:29,118 INFO  [main-SendThread(moniser:2181)] zookeeper.ClientCnxn: Socket connection established to moniser/192.168.0.238:2181, initiating session
2019-06-04 18:34:29,126 INFO  [main-SendThread(moniser:2181)] zookeeper.ClientCnxn: Session establishment complete on server moniser/192.168.0.238:2181, sessionid = 0x16b01d9f02d003c, negotiated timeout = 80000[/mw_shl_code]

Recently I found logs like the above again.

GC looks perfectly normal:
[mw_shl_code=bash,true]2019-06-04T18:33:02.284+0800: 26661.606: [GC (Allocation Failure) 2019-06-04T18:33:02.284+0800: 26661.607: [ParNew: 433013K->18499K(463872K), 0.0063271 secs] 1145281K->730766K(1554228K), 0.0064589 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2019-06-04T18:33:09.949+0800: 26669.272: [GC (Allocation Failure) 2019-06-04T18:33:09.949+0800: 26669.272: [ParNew: 430799K->40491K(463872K), 0.0064531 secs] 1143066K->755126K(1554228K), 0.0066203 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2019-06-04T18:33:13.576+0800: 26672.899: [GC (Allocation Failure) 2019-06-04T18:33:13.576+0800: 26672.899: [ParNew: 452843K->3221K(463872K), 0.0142780 secs] 1167478K->737788K(1554228K), 0.0144429 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]

2019-06-04T18:33:24.612+0800: 26683.935: [GC (Allocation Failure) 2019-06-04T18:33:24.612+0800: 26683.935: [ParNew: 415573K->11294K(463872K), 0.0057481 secs] 1150140K->745860K(1554228K), 0.0058815 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]

2019-06-04T18:33:30.032+0800: 26689.355: [GC (Allocation Failure) 2019-06-04T18:33:30.033+0800: 26689.355: [ParNew: 423646K->23998K(463872K), 0.0062182 secs] 1158212K->758564K(1554228K), 0.0063865 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]

2019-06-04T18:33:33.507+0800: 26692.830: [GC (Allocation Failure) 2019-06-04T18:33:33.507+0800: 26692.830: [ParNew: 436311K->35250K(463872K), 0.0047877 secs] 1170877K->769817K(1554228K), 0.0049250 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2019-06-04T18:33:37.921+0800: 26697.244: [GC (Allocation Failure) 2019-06-04T18:33:37.921+0800: 26697.244: [ParNew: 447602K->9699K(463872K), 0.0079094 secs] 1182169K->752898K(1554228K), 0.0080765 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]

2019-06-04T18:33:49.647+0800: 26708.970: [GC (Allocation Failure) 2019-06-04T18:33:49.647+0800: 26708.970: [ParNew: 422051K->36889K(463872K), 0.0067112 secs] 1165250K->780087K(1554228K), 0.0068562 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]

2019-06-04T18:33:55.591+0800: 26714.914: [GC (Allocation Failure) 2019-06-04T18:33:55.591+0800: 26714.914: [ParNew: 449241K->6015K(463872K), 0.0060338 secs] 1192439K->762115K(1554228K), 0.0061978 secs] [Times: user=0.05 sys=0.00, real=0.00 secs]

2019-06-04T18:34:05.792+0800: 26725.115: [GC (Allocation Failure) 2019-06-04T18:34:05.792+0800: 26725.115: [ParNew: 418367K->11221K(463872K), 0.0070709 secs] 1174467K->767322K(1554228K), 0.0072509 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]

2019-06-04T18:34:16.262+0800: 26735.585: [GC (Allocation Failure) 2019-06-04T18:34:16.263+0800: 26735.585: [ParNew: 423573K->14332K(463872K), 0.0049019 secs] 1179674K->770432K(1554228K), 0.0050493 secs] [Times: user=0.05 sys=0.00, real=0.00 secs]

2019-06-04T18:34:17.823+0800: 26737.146: [GC (Allocation Failure) 2019-06-04T18:34:17.823+0800: 26737.146: [ParNew: 426672K->18534K(463872K), 0.0070292 secs] 1182773K->774634K(1554228K), 0.0071750 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2019-06-04T18:34:20.759+0800: 26740.082: [GC (Allocation Failure) 2019-06-04T18:34:20.759+0800: 26740.082: [ParNew: 430886K->20395K(463872K), 0.0083357 secs] 1186986K->776495K(1554228K), 0.0084955 secs] [Times: user=0.04 sys=0.00, real=0.01 secs] [/mw_shl_code]
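
One way to back up the "GC looks normal" reading is to pull the pause durations out of the log and compare the longest one with the 80000 ms session timeout. The following is a rough sketch that assumes the ParNew line layout shown above; the regular expression and function names are illustrative, and the two sample lines are copied from the excerpt.

```python
import re

# Captures JVM uptime and the total pause of one young-generation GC line.
GC_LINE = re.compile(r"^\S+: (\d+\.\d+): \[GC .*\), (\d+\.\d+) secs\] \[Times:")

SAMPLE = [
    "2019-06-04T18:33:13.576+0800: 26672.899: [GC (Allocation Failure) "
    "2019-06-04T18:33:13.576+0800: 26672.899: [ParNew: 452843K->3221K(463872K), "
    "0.0142780 secs] 1167478K->737788K(1554228K), 0.0144429 secs] "
    "[Times: user=0.06 sys=0.00, real=0.02 secs]",
    "2019-06-04T18:34:20.759+0800: 26740.082: [GC (Allocation Failure) "
    "2019-06-04T18:34:20.759+0800: 26740.082: [ParNew: 430886K->20395K(463872K), "
    "0.0083357 secs] 1186986K->776495K(1554228K), 0.0084955 secs] "
    "[Times: user=0.04 sys=0.00, real=0.01 secs]",
]


def gc_pauses(lines):
    """Yield (uptime_seconds, pause_seconds) for each matching GC line."""
    for line in lines:
        match = GC_LINE.match(line.strip())
        if match:
            yield float(match.group(1)), float(match.group(2))


def longest_pause(lines):
    """Return the longest pause in seconds, or None if nothing matched."""
    pauses = [pause for _, pause in gc_pauses(lines)]
    return max(pauses) if pauses else None


if __name__ == "__main__":
    worst = longest_pause(SAMPLE)
    print(f"longest pause: {worst * 1000:.1f} ms")
```

In the excerpt every pause is in the range of a few milliseconds, several orders of magnitude below the session timeout, which is consistent with the poster's view that GC is not what ends the session.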

ighack posted on 2019-6-5 09:23:25
[mw_shl_code=bash,true]2019-06-04 18:34:29,118 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.0.238:7785
2019-06-04 18:34:29,121 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /192.168.0.238:7785
2019-06-04 18:34:29,124 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x16b01d9f02d003c with negotiated timeout 80000 for client /192.168.0.238:7785
2019-06-04 18:34:29,457 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x16b01d9f02d003c, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:748)
2019-06-04 18:34:29,458 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.0.238:7785 which had sessionid 0x16b01d9f02d003c[/mw_shl_code]
