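Isn't HA failover handled by ZooKeeper? Isn't it what decides whether a node in the HA pair has gone down? There must be some communication between them, right?
I found this in zkfc.log: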
2015-02-10 03:51:19,409 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 9629ms for sessionid 0x14845929e1a0165, closing socket connection and attempting reconnect
2015-02-10 03:51:19,511 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session disconnected. Entering neutral mode...
2015-02-10 03:51:19,864 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server sh-d0-sv0481.sh.idc.yunyun.com/10.21.113.31:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2015-02-10 03:51:19,865 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to sh-d0-sv0481.sh.idc.yunyun.com/10.21.113.31:2181, initiating session
2015-02-10 03:51:19,867 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server sh-d0-sv0481.sh.idc.yunyun.com/10.21.113.31:2181, sessionid = 0x14845929e1a0165, negotiated timeout = 5000
2015-02-10 03:51:19,869 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2015-02-10 03:51:19,876 INFO org.apache.hadoop.ha.ZKFailoverController: ZK Election indicated that NameNode at sh-d0-sv0676.sh.idc.yunyun.com/10.21.129.42:8020 should become standby
2015-02-10 03:51:19,880 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at sh-d0-sv0676.sh.idc.yunyun.com/10.21.129.42:8020 to standby state
2015-02-10 04:20:22,561 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x14845929e1a0165, likely server has closed socket, closing socket connection and attempting reconnect
2015-02-10 04:20:22,663 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session disconnected. Entering neutral mode...
2015-02-10 04:20:22,814 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server sh-d0-sv0630.sh.idc.yunyun.com/10.21.112.44:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2015-02-10 04:20:22,814 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to sh-d0-sv0630.sh.idc.yunyun.com/10.21.112.44:2181, initiating session
2015-02-10 04:20:22,816 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x14845929e1a0165 has expired, closing socket connection
2015-02-10 04:20:22,817 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session expired. Entering neutral mode and rejoining...
2015-02-10 04:20:22,818 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2015-02-10 04:20:22,821 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=sh-d0-sv0630.sh.idc.yunyun.com:2181,sh-d0-sv0481.sh.idc.yunyun.com:2181,sh-d0-sv0679.sh.idc.yunyun.com:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@7143363e
2015-02-10 04:20:22,823 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server sh-d0-sv0679.sh.idc.yunyun.com/10.21.129.45:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2015-02-10 04:20:22,823 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to sh-d0-sv0679.sh.idc.yunyun.com/10.21.129.45:2181, initiating session
2015-02-10 04:20:22,831 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server sh-d0-sv0679.sh.idc.yunyun.com/10.21.129.45:2181, sessionid = 0x14845929e1a0166, negotiated timeout = 5000
2015-02-10 04:20:22,834 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2015-02-10 04:20:22,836 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2015-02-10 04:20:22,845 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2015-02-10 04:20:22,847 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a126164732d6861646f6f702d636c757374657212036e6e311a1e73682d64302d7376303637352e73682e6964632e79756e79756e2e636f6d20d43e28d33e
2015-02-10 04:20:22,849 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at sh-d0-sv0675.sh.idc.yunyun.com/10.21.129.41:8020
2015-02-10 04:20:22,934 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at sh-d0-sv0675.sh.idc.yunyun.com/10.21.129.41:8020 to standby state without fencing
2015-02-10 04:20:22,934 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /hadoop-ha/ads-hadoop-cluster/ActiveBreadCrumb to indicate that the local node is the most recent active...
2015-02-10 04:20:22,947 INFO org.apache.hadoop.ha.ZKFailoverController: Trying to make NameNode at sh-d0-sv0676.sh.idc.yunyun.com/10.21.129.42:8020 active...
2015-02-10 04:20:23,885 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at sh-d0-sv0676.sh.idc.yunyun.com/10.21.129.42:8020 to active state
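Does this mean the connection timed out and that triggered the failover? But I didn't find any errors on the NameNode, so I keep suspecting that something went wrong on the ZooKeeper side.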
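To read a longer zkfc.log more systematically, here is a minimal sketch in Python. It assumes only the message formats visible in the log above. The function name `analyze_zkfc_log` and the regexes are hypothetical helpers, not part of Hadoop or ZooKeeper. It extracts the negotiated session timeout, every "have not heard from server in Nms" silence, and any expired session IDs, then flags silences longer than the negotiated timeout. In the excerpt above, for example, the client went 9629ms without hearing from the server while the negotiated timeout was 5000ms, and at 04:20 session 0x14845929e1a0165 expired.

```python
import re

# Hypothetical patterns matching the ZooKeeper ClientCnxn messages seen in zkfc.log.
SILENCE_RE = re.compile(r"have not heard from server in (\d+)ms")
NEGOTIATED_RE = re.compile(r"negotiated timeout = (\d+)")
EXPIRED_RE = re.compile(r"session (0x[0-9a-f]+) has expired")


def analyze_zkfc_log(lines):
    """Summarize ZK session events from an iterable of zkfc.log lines.

    Returns a dict with:
      negotiated_ms    - every negotiated session timeout found (ms)
      silences_ms      - every client-side "not heard from server" gap (ms)
      over_timeout_ms  - the silences longer than the largest negotiated timeout
      expired_sessions - session IDs reported as expired
    """
    negotiated, silences, expired = [], [], []
    for line in lines:
        if m := SILENCE_RE.search(line):
            silences.append(int(m.group(1)))
        if m := NEGOTIATED_RE.search(line):
            negotiated.append(int(m.group(1)))
        if m := EXPIRED_RE.search(line):
            expired.append(m.group(1))

    timeout = max(negotiated) if negotiated else None
    over = [s for s in silences if timeout is not None and s > timeout]
    return {
        "negotiated_ms": negotiated,
        "silences_ms": silences,
        "over_timeout_ms": over,
        "expired_sessions": expired,
    }
```

Usage: open the log and pass the file object, e.g. `with open("zkfc.log") as f: print(analyze_zkfc_log(f))`. Long silences that line up with failover timestamps point at the link between ZKFC and ZooKeeper (network, GC pauses in the ZKFC process, or an overloaded ZooKeeper server) rather than at the NameNode itself, which would be consistent with the NameNode log showing nothing unusual.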