Troubleshooting the rabbitmq-server service

atsky123, posted 2016-2-23 17:07:06
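Note: this applies to environments running Corosync + Pacemaker + a RabbitMQ cluster + OpenStack.
If creating a virtual machine or another operation fails in an OpenStack environment, and the logs show a RabbitMQ connection timeout as the cause, the following steps may resolve the problem:
I. Confirm the RabbitMQ status. Connect to any controller node: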
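1) Run the following command to check the Pacemaker resource status. Confirm that the Started lists in the sample output (marked in red in the original post), in particular the one for clone_p_rabbitmq-server, include all controller nodes: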
[root@node-3 ~](controller)# pcs resource
vip__public    (ocf::mirantis:ns_IPaddr2):    Started
Clone Set: clone_ping_vip__public [ping_vip__public]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
vip__management    (ocf::mirantis:ns_IPaddr2):    Started
Clone Set: clone_p_openstack-heat-engine [p_openstack-heat-engine]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
p_openstack-ceilometer-central    (ocf::mirantis:ceilometer-agent-central):    Started
p_openstack-ceilometer-alarm-evaluator    (ocf::mirantis:ceilometer-alarm-evaluator):    Started
Clone Set: clone_p_neutron-openvswitch-agent [p_neutron-openvswitch-agent]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
p_neutron-dhcp-agent    (ocf::mirantis:neutron-agent-dhcp):    Started
Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
Clone Set: clone_p_neutron-l3-agent [p_neutron-l3-agent]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
Clone Set: clone_p_rabbitmq-server [p_rabbitmq-server]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
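The check in step 1) can also be scripted. Below is a minimal sketch, assuming the `pcs resource` output has been captured as text and follows the layout shown above. The function name `clone_started_nodes` and the sample with a `Stopped:` line are made up for illustration and are not from the original post.

```python
import re


def clone_started_nodes(pcs_output, clone_name):
    """Return the set of nodes listed under 'Started:' for one clone set."""
    lines = pcs_output.splitlines()
    for i, line in enumerate(lines):
        if not line.strip().startswith(f"Clone Set: {clone_name} "):
            continue
        # The Started list is expected on an indented line below the header.
        for follow in lines[i + 1:]:
            if not follow.startswith(" "):
                break  # next resource reached, no Started line found
            match = re.match(r"\s+Started:\s*\[(.*)\]", follow)
            if match:
                return set(match.group(1).split())
        return set()
    return set()


# Illustrative sample (hypothetical): node-3 is missing from the RabbitMQ clone.
SAMPLE = """\
Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
Clone Set: clone_p_rabbitmq-server [p_rabbitmq-server]
     Started: [ node-1.abc.com node-2.abc.com ]
     Stopped: [ node-3.abc.com ]
"""

EXPECTED = {"node-1.abc.com", "node-2.abc.com", "node-3.abc.com"}
missing = EXPECTED - clone_started_nodes(SAMPLE, "clone_p_rabbitmq-server")
print(sorted(missing))
```

Any node printed here is a controller on which Pacemaker does not report the RabbitMQ clone as started.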
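2) If the output above looks normal, log in to each controller node in turn and run the following command to check the RabbitMQ cluster status. Output similar to the following indicates a problem with the RabbitMQ cluster: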
[root@node-1 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2']}]},
{running_nodes,['rabbit@node-1','rabbit@node-2']},
{cluster_name,<<"rabbit@node-1.abc.com">>},
{partitions,[]}]
...done.
[root@node-2 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-2' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2']}]},
{running_nodes,['rabbit@node-1','rabbit@node-2']},
{cluster_name,<<"rabbit@node-1.abc.com">>},
{partitions,[]}]
...done.
[root@node-3 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-3' ...
[{nodes,[{disc,['rabbit@node-3']}]},
{running_nodes,['rabbit@node-3']},
{cluster_name,<<"rabbit@node-3.abc.com">>},
{partitions,[]}]
...done.
Output like this means the environment now contains two RabbitMQ clusters: {cluster_name,<<"rabbit@node-1.abc.com">>} and {cluster_name,<<"rabbit@node-3.abc.com">>}.
Put another way, node-3 has not joined the cluster {cluster_name,<<"rabbit@node-1.abc.com">>}.
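The comparison across nodes can be sketched in a few lines of Python. This assumes the `rabbitmqctl cluster_status` output of each controller has been saved as text in the Erlang-term layout shown above (newer RabbitMQ releases print a different format). The helper names `running_nodes` and `find_clusters` are made up for illustration.

```python
import re


def running_nodes(status_output):
    """Extract the running_nodes list from cluster_status text."""
    match = re.search(r"\{running_nodes,\[(.*?)\]\}", status_output, re.S)
    if match is None:
        return frozenset()
    return frozenset(re.findall(r"'([^']+)'", match.group(1)))


def find_clusters(outputs):
    """Return the distinct running-node sets reported across all controllers."""
    return {running_nodes(text) for text in outputs.values()}


# Relevant lines from the outputs in this post, keyed by controller.
OUTPUTS = {
    "node-1": "{running_nodes,['rabbit@node-1','rabbit@node-2']},",
    "node-2": "{running_nodes,['rabbit@node-1','rabbit@node-2']},",
    "node-3": "{running_nodes,['rabbit@node-3']},",
}

clusters = find_clusters(OUTPUTS)
for members in sorted(clusters, key=len, reverse=True):
    print(sorted(members))
if len(clusters) > 1:
    print("split detected: controllers disagree on cluster membership")
```

If every controller reports the same set, `find_clusters` returns a single entry. More than one entry corresponds to the situation described above.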
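II. Fix the problem
Based on the analysis above, all we need to do is join node-3 to the cluster {cluster_name,<<"rabbit@node-1.abc.com">>}. Log in to any controller node and run the following commands. The first bans the resource from node-3.abc.com, which stops it there; the second clears that constraint so that Pacemaker starts it again, after which it should rejoin the cluster: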
[root@node-3 ~](controller)# pcs resource ban p_rabbitmq-server node-3.abc.com
[root@node-3 ~](controller)# pcs resource clear p_rabbitmq-server node-3.abc.com
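For repeated use, the two commands can be wrapped in a small script. This is only a sketch: the function names, the `dry_run` default and the 30-second pause between ban and clear are assumptions and not part of the original procedure. The pause is there to give Pacemaker time to stop the resource before the constraint is removed; adjust it for your environment.

```python
import shlex
import subprocess
import time


def rejoin_commands(resource, node):
    """Build the ban/clear pair from this post as argv lists."""
    return [
        ["pcs", "resource", "ban", resource, node],
        ["pcs", "resource", "clear", resource, node],
    ]


def rejoin(resource, node, pause_seconds=30, dry_run=True):
    """Print the commands and, unless dry_run is set, execute them in order."""
    printed = []
    commands = rejoin_commands(resource, node)
    for index, argv in enumerate(commands):
        printed.append(shlex.join(argv))
        print(printed[-1])
        if dry_run:
            continue
        subprocess.run(argv, check=True)  # raises if pcs exits non-zero
        if index < len(commands) - 1:
            time.sleep(pause_seconds)  # assumed wait before clearing the ban
    return printed


# Dry run: only shows what would be executed on the controller.
planned = rejoin("p_rabbitmq-server", "node-3.abc.com")
```

Calling `rejoin(..., dry_run=False)` as root on a controller node would actually run the two `pcs` commands.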
III. Check the status again to confirm that the problem is resolved
[root@node-3 ~](controller)# pcs resource
vip__public    (ocf::mirantis:ns_IPaddr2):    Started
Clone Set: clone_ping_vip__public [ping_vip__public]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
vip__management    (ocf::mirantis:ns_IPaddr2):    Started
Clone Set: clone_p_openstack-heat-engine [p_openstack-heat-engine]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
p_openstack-ceilometer-central    (ocf::mirantis:ceilometer-agent-central):    Started
p_openstack-ceilometer-alarm-evaluator    (ocf::mirantis:ceilometer-alarm-evaluator):    Started
Clone Set: clone_p_neutron-openvswitch-agent [p_neutron-openvswitch-agent]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
p_neutron-dhcp-agent    (ocf::mirantis:neutron-agent-dhcp):    Started
Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
Clone Set: clone_p_neutron-l3-agent [p_neutron-l3-agent]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
Clone Set: clone_p_rabbitmq-server [p_rabbitmq-server]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]
Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-1.abc.com node-2.abc.com node-3.abc.com ]

[root@node-1 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
{running_nodes,['rabbit@node-3','rabbit@node-2','rabbit@node-1']},
{cluster_name,<<"rabbit@node-1.abc.com">>},
{partitions,[]}]
...done.
[root@node-2 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-2' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
{running_nodes,['rabbit@node-3','rabbit@node-1','rabbit@node-2']},
{cluster_name,<<"rabbit@node-1.abc.com">>},
{partitions,[]}]
...done.
[root@node-3 ~](controller)# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-3' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
{running_nodes,['rabbit@node-1','rabbit@node-2','rabbit@node-3']},
{cluster_name,<<"rabbit@node-1.abc.com">>},
{partitions,[]}]
...done.
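The final verification can be expressed as a single check per controller: the disc node list and the running node list should both equal the expected set, and `partitions` should be empty. A minimal sketch, assuming a cluster made up only of disc nodes and the Erlang-term output format shown above; `cluster_is_whole` is a made-up name.

```python
import re


def quoted_names(fragment):
    """Collect the single-quoted node names inside a text fragment."""
    return set(re.findall(r"'([^']+)'", fragment))


def cluster_is_whole(status_output, expected):
    """True if disc and running nodes match expected and no partitions exist."""
    disc = re.search(r"\{nodes,\[\{disc,\[(.*?)\]\}\]\}", status_output, re.S)
    running = re.search(r"\{running_nodes,\[(.*?)\]\}", status_output, re.S)
    partitions = re.search(r"\{partitions,\[(.*?)\]\}", status_output, re.S)
    if not (disc and running and partitions):
        return False  # unexpected format, treat as not verified
    return (
        quoted_names(disc.group(1)) == expected
        and quoted_names(running.group(1)) == expected
        and partitions.group(1).strip() == ""
    )


EXPECTED = {"rabbit@node-1", "rabbit@node-2", "rabbit@node-3"}

# Output of node-3 after the fix, as shown in this post.
AFTER = """\
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
{running_nodes,['rabbit@node-1','rabbit@node-2','rabbit@node-3']},
{cluster_name,<<"rabbit@node-1.abc.com">>},
{partitions,[]}]
"""

# Output of node-3 before the fix, as shown in this post.
BEFORE = """\
[{nodes,[{disc,['rabbit@node-3']}]},
{running_nodes,['rabbit@node-3']},
{cluster_name,<<"rabbit@node-3.abc.com">>},
{partitions,[]}]
"""

print(cluster_is_whole(AFTER, EXPECTED))
print(cluster_is_whole(BEFORE, EXPECTED))
```

Running the same check against the output of every controller covers the case where only some nodes see the full cluster.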
Author: 林夕  Position: Systems Engineer / Operations Engineer  QQ: 630995935
