Note: this applies to a Corosync + Pacemaker + RabbitMQ cluster + OpenStack environment. When creating a virtual machine (or performing other operations) in OpenStack fails, and the logs show the cause is a RabbitMQ connection timeout, try the following steps to resolve the problem.

Part 1: Confirm the RabbitMQ status. Log in to any controller node.

1) Run the following command to check the Pacemaker resource status, and confirm that the "Started:" lists (shown for each clone set below) include all of the controller nodes:

[root@node-3 ~](controller)# pcs resource
 vip__public    (ocf::mirantis:ns_IPaddr2):    Started
 Clone Set: clone_ping_vip__public [ping_vip__public]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
 vip__management    (ocf::mirantis:ns_IPaddr2):    Started
 Clone Set: clone_p_openstack-heat-engine [p_openstack-heat-engine]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
 p_openstack-ceilometer-central    (ocf::mirantis:ceilometer-agent-central):    Started
 p_openstack-ceilometer-alarm-evaluator    (ocf::mirantis:ceilometer-alarm-evaluator):    Started
 Clone Set: clone_p_neutron-openvswitch-agent [p_neutron-openvswitch-agent]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
 p_neutron-dhcp-agent    (ocf::mirantis:neutron-agent-dhcp):    Started
 Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
 Clone Set: clone_p_neutron-l3-agent [p_neutron-l3-agent]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
 Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
 Clone Set: clone_p_rabbitmq-server [p_rabbitmq-server]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
 Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]

2) If the output above looks normal, log in to each controller node in turn and check the RabbitMQ cluster status with the following command. Output similar to the example below indicates that the RabbitMQ cluster has a problem:

[root@node-1 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2']}]},
 {running_nodes,['rabbit@node-1','rabbit@node-2']},
 {cluster_name,<<"rabbit@node-1.abc.com">>},
 {partitions,[]}]
...done.

[root@node-2 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-2' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2']}]},
 {running_nodes,['rabbit@node-1','rabbit@node-2']},
 {cluster_name,<<"rabbit@node-1.abc.com">>},
 {partitions,[]}]
...done.

[root@node-3 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-3' ...
[{nodes,[{disc,['rabbit@node-3']}]},
 {running_nodes,['rabbit@node-3']},
 {cluster_name,<<"rabbit@node-3.abc.com">>},
 {partitions,[]}]
...done.
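Rather than logging in to each controller by hand, the same information can be collected with a small loop. This is only a convenience sketch: it assumes passwordless SSH between controllers, and the node names are taken from the example environment above, so adjust them to your own hosts.

# Run cluster_status on every controller over SSH and label each output.
# Node names here are assumptions from the example above.
for n in node-1.abc.com node-2.abc.com node-3.abc.com; do
    echo "=== $n ==="
    ssh "$n" rabbitmqctl cluster_status
done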
Output like the cluster_status example above means there are two separate RabbitMQ clusters in the environment, {cluster_name,<<"rabbit@node-1.abc.com">>} and {cluster_name,<<"rabbit@node-3.abc.com">>}; in other words, node-3 has not joined the {cluster_name,<<"rabbit@node-1.abc.com">>} cluster.

Part 2: Fix the problem

Based on the analysis above, all we need to do is get node-3 to join the {cluster_name,<<"rabbit@node-1.abc.com">>} cluster. Log in to any controller node and run the following commands. Banning the resource makes Pacemaker stop RabbitMQ on node-3; clearing the ban allows Pacemaker to start it again, so that node-3 rejoins the existing cluster when it comes back up (an optional constraint check is shown after the status output below):

[root@node-3 ~](controller)# pcs resource ban p_rabbitmq-server node-3.abc.com
[root@node-3 ~](controller)# pcs resource clear p_rabbitmq-server node-3.abc.com

Part 3: Check the status again and confirm the problem is resolved

[root@node-3 ~](controller)# pcs resource
 vip__public    (ocf::mirantis:ns_IPaddr2):    Started
 Clone Set: clone_ping_vip__public [ping_vip__public]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
 vip__management    (ocf::mirantis:ns_IPaddr2):    Started
 Clone Set: clone_p_openstack-heat-engine [p_openstack-heat-engine]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
 p_openstack-ceilometer-central    (ocf::mirantis:ceilometer-agent-central):    Started
 p_openstack-ceilometer-alarm-evaluator    (ocf::mirantis:ceilometer-alarm-evaluator):    Started
 Clone Set: clone_p_neutron-openvswitch-agent [p_neutron-openvswitch-agent]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
 p_neutron-dhcp-agent    (ocf::mirantis:neutron-agent-dhcp):    Started
 Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
 Clone Set: clone_p_neutron-l3-agent [p_neutron-l3-agent]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
 Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
 Clone Set: clone_p_rabbitmq-server [p_rabbitmq-server]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
 Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
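As a sanity check related to the ban/clear commands in Part 2: pcs resource ban works by adding a location constraint (typically named something like cli-ban-p_rabbitmq-server-on-node-3.abc.com), and pcs resource clear removes it again. If you want to confirm that no ban constraint was left behind before rechecking RabbitMQ, listing the constraints is a quick, read-only check; a minimal sketch, using the resource and node names from this example:

# Optional check: confirm the temporary ban constraint created by
# 'pcs resource ban' has been removed by 'pcs resource clear'.
# No 'cli-ban-...' location constraint should remain for p_rabbitmq-server.
pcs constraint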
[root@node-1 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
 {running_nodes,['rabbit@node-3','rabbit@node-2','rabbit@node-1']},
 {cluster_name,<<"rabbit@node-1.abc.com">>},
 {partitions,[]}]
...done.

[root@node-2 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-2' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
 {running_nodes,['rabbit@node-3','rabbit@node-1','rabbit@node-2']},
 {cluster_name,<<"rabbit@node-1.abc.com">>},
 {partitions,[]}]
...done.

[root@node-3 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-3' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
 {running_nodes,['rabbit@node-1','rabbit@node-2','rabbit@node-3']},
 {cluster_name,<<"rabbit@node-1.abc.com">>},
 {partitions,[]}]
...done.

All three controllers now report a single cluster, rabbit@node-1.abc.com, with all three nodes running, which confirms the problem has been resolved.
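Once the cluster is whole again, it can also be worth confirming that the OpenStack services have re-established their AMQP connections to the broker, since the original symptom was a connection timeout. This is an optional, read-only check; a minimal sketch:

# Optional check, run on any controller: list the AMQP connections currently
# open to RabbitMQ. The OpenStack services (nova, neutron, cinder, ...) should
# reappear here as they reconnect.
rabbitmqctl list_connections user peer_host state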
Author: 林夕 (Lin Xi)
Role: Systems Engineer / Operations Engineer
QQ: 630995935