Problems Encountered When Using OpenStack

hochikong, posted 2014-8-3 16:03:39
Guiding questions:
1. Which options help when troubleshooting?
2. What should be done when compute exits after rabbitmq is stopped?




1. No log output after installing openstack-nova-compute:

A Python dependency package is missing. Install it:

python-repoze.lru-0.3-1.1.x86_64.rpm

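2. Starting the service after installing nova-compute:

If the database has not been initialized at this point, the service reports that it cannot query the database.

Solution:

Configure the nova database connection in nova.conf, then initialize the database with nova-manage db sync.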
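
Before running nova-manage db sync it can help to confirm that the database connection is actually set in nova.conf. The following is only a rough sketch: it assumes the option is called sql_connection and that the file uses one key=value per line (optionally with a leading "--", as in the older flag-file style). The helper name read_option and the sample connection string are made up for illustration.

```python
def read_option(conf_text, name):
    """Return the value of `name` from nova.conf-style text, or None.

    Assumes one `key=value` per line, optionally prefixed with `--`;
    comment lines and section headers such as [DEFAULT] are skipped.
    """
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip().lstrip("-") == name:
            return value.strip()
    return None


# Hypothetical file content; host, user, and password are placeholders.
sample = """
[DEFAULT]
# database used by nova
sql_connection=mysql://nova:secret@192.168.1.76/nova
"""

connection = read_option(sample, "sql_connection")
if connection is None:
    print("sql_connection is not set; configure it before nova-manage db sync")
else:
    print("sql_connection =", connection)
```

In real use the text would come from reading /etc/nova/nova.conf (or wherever the file lives on your system) instead of the sample string.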

3. After configuring libvirt and libvirt_type, starting nova-compute fails with:

2012-04-13 23:56:24 AUDIT nova.service [-] Starting compute node (version 2012.1-LOCALBRANCH:LOCALREVISION)

2012-04-13 23:56:24 CRITICAL nova [-] [Errno 2] No such file or directory: '/usr/lib64/python2.6/site-packages/instances'

Solution: create the directory:

mkdir -p /usr/lib64/python2.6/site-packages/instances
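
The same check-and-create step can be scripted. This is a minimal sketch, not something from the original post: ensure_directory is a made-up helper, and the demo runs against a temporary directory rather than the real site-packages path. As far as I know, Essex also has an instances_path option in nova.conf that moves this directory elsewhere, but treat that as an assumption to verify against your version.

```python
import os
import tempfile


def ensure_directory(path):
    """Create `path` (and any missing parents); return True if it was created."""
    if os.path.isdir(path):
        return False
    os.makedirs(path)
    return True


# Demonstrated on a temporary location so the sketch is safe to run anywhere;
# on the affected host the path would be the one from the error message.
base = tempfile.mkdtemp()
instances_dir = os.path.join(base, "instances")

created_first = ensure_directory(instances_dir)   # directory did not exist yet
created_again = ensure_directory(instances_dir)   # second call is a no-op
print(created_first, created_again)
```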

4. nova-compute reports the following at startup:

2012-04-15 18:25:18 TRACE nova     raise exception.ClassNotFound(class_name=class_str, exception=exc)

2012-04-15 18:25:18 TRACE nova ClassNotFound: Class API could not be found: No module named glance.common

2012-04-15 18:25:18 TRACE nova

Solution (install the missing Python packages):

Installing: python-dateutil-1.5-1.1 [done]

Installing: python-pycrypto-2.5-1.1 [done]

Installing: python-passlib-1.5.3-1.1 [done]

Installing: python-xattr-0.6.2-1.1 [done]

Installing: python-PasteScript-1.7.5-1.1 [done]

Installing: python-python-memcached-1.47-1.1 [done]

Installing: libmysqlclient_r15-5.0.94-0.2.4.1 [done]

Installing: python-ldap-2.3.5-1.21 [done]

Installing: python-mysql-1.2.2-2.12 [done]

Installing: python-keystone-2012.1-1.1 [done]
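
To find out which Python modules are missing before starting the service, one option is to try importing them. A small sketch, assuming a reasonably current Python (importlib is not available on the 2.6 interpreter shown in the logs). The module names in the list are my guess at what the packages above provide; adjust them to your setup.

```python
import importlib


def find_missing_modules(module_names):
    """Return the subset of `module_names` that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# glance.common comes from the traceback; the others are assumed import names
# for python-dateutil, python-ldap, and python-mysql.
wanted = ["glance.common", "dateutil", "ldap", "MySQLdb"]
print("missing:", find_missing_modules(wanted))
```

Anything printed in the list still needs its package installed with zypper.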

5. After installation, starting nova-compute produces the following in the nova-compute log:

2012-04-15 19:51:06 TRACE nova   File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/default.py", line 330, in do_execute

2012-04-15 19:51:06 TRACE nova     cursor.execute(statement, parameters)

2012-04-15 19:51:06 TRACE nova   File "/usr/lib64/python2.6/site-packages/MySQLdb/cursors.py", line 166, in execute

2012-04-15 19:51:06 TRACE nova     self.errorhandler(self, exc, value)

2012-04-15 19:51:06 TRACE nova   File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 35, in defaulterrorhandler

2012-04-15 19:51:06 TRACE nova     raise errorclass, errorvalue

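It appears that nova can reach the database server, but the nova database has not been initialized yet.

Solution: initialize the nova database: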

SUSEsp2:/var/log/nova # nova-manage db sync

2012-04-15 20:02:45 DEBUG nova.utils [-] backend <module 'nova.db.sqlalchemy.migration' from '/usr/lib64/python2.6/site-packages/nova/db/sqlalchemy/migration.pyc'> from (pid=10941) __get_backend /usr/lib64/python2.6/site-packages/nova/utils.py:658

2012-04-15 20:03:23 WARNING nova.utils [-] /usr/lib64/python2.6/site-packages/nova/db/sqlalchemy/migrate_repo/versions/075_convert_bw_usage_to_store_network_id.py:49: SADeprecationWarning: useexisting is deprecated.  Use extend_existing.

useexisting=True)

2012-04-15 20:03:31 WARNING nova.utils [-] /usr/lib64/python2.6/site-packages/nova/db/sqlalchemy/migrate_repo/versions/081_drop_instance_id_bw_cache.py:40: SADeprecationWarning: useexisting is deprecated.  Use extend_existing.

useexisting=True)

6. libvirt connection error:

2012-04-15 20:24:08 TRACE nova   File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2836, in getVersion

2012-04-15 20:24:08 TRACE nova     if ret == -1: raise libvirtError ('virConnectGetVersion() failed', conn=self)

2012-04-15 20:24:08 TRACE nova libvirtError: internal error Cannot find suitable emulator for x86_64

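Solution:

In Essex, the default nova.conf ships with libvirt_type="xen". With the quotation marks in place the value cannot be read correctly. Write it without quotes, as libvirt_type=xen.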
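
Since a stray pair of quotes is easy to overlook, a quick scan of the config file for quoted values can save some time. This is only a sketch under the assumption that the file has one key=value per line; find_quoted_values is a made-up helper and connection_type in the sample is just filler.

```python
def find_quoted_values(conf_text):
    """Return (line_number, key, unquoted_value) for values wrapped in quotes."""
    findings = []
    for number, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # A value counts as quoted when it starts and ends with the same quote.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            findings.append((number, key.strip(), value[1:-1]))
    return findings


sample = 'libvirt_type="xen"\nconnection_type=libvirt\n'
for number, key, value in find_quoted_values(sample):
    print("line %d: write %s=%s instead" % (number, key, value))
```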

7. The following error is reported when generating certificates for a user:

SUSEsp2:~/key # nova-manage project zipfile --project=mycloud --user=kevin --file=nova.zip

Stderr: "Using configuration from ./openssl.cnf\nerror loading the config file './openssl.cnf'\n15649:error:02001002:system library:fopen:No such file or directory:bss_file.c:126:fopen('./openssl.cnf','rb')\n15649:error:2006D080:BIO routines:BIO_new_file:no such file:bss_file.c:129:\n15649:error:0E078072:configuration file routines:DEF_LOAD:no such file:conf_def.c:197:\n"

The above error may show that the certificate db has not been created.

Please create a database by running a nova-cert server on this host.

Solution:

SUSEsp2:~/key # zypper install openstack-nova-cert

SUSEsp2:~/key # /etc/init.d/openstack-nova-cert start

SUSEsp2:~/key # chkconfig openstack-nova-cert on

8. A 400 error occurs when querying with the nova client:

SUSEsp2:~/key # nova image-list

ERROR: n/a (HTTP 400)

Solution (check whether openstack-nova-api is installed, and install it if not):

SUSEsp2:~ # zypper search nova-api

Loading repository data...

Reading installed packages...

S | Name               | Summary                        | Type
--+--------------------+--------------------------------+--------
  | openstack-nova-api | OpenStack Compute API services | package

SUSEsp2:~ # zypper install openstack-nova-api

HTTP 400 errors can also have other causes, such as mistakes in the novarc environment variables. This is important to check:

SUSE11sp2:~/user # cat novarc

NOVARC=$(readlink -f "${BASH_SOURCE:-${0}}" 2>/dev/null) ||

NOVARC=$(python -c 'import os,sys; print os.path.abspath(os.path.realpath(sys.argv[1]))' "${BASH_SOURCE:-${0}}")

NOVA_KEY_DIR=${NOVARC%/*}

export EC2_ACCESS_KEY="kevin:mycloud"

export EC2_SECRET_KEY="f20bb381-9cbf-40a7-a84f-499b815efa19"

export EC2_URL="http://192.168.1.76:8773/services/Cloud"

export S3_URL="http://192.168.1.76:3333"

export EC2_USER_ID=42 # nova does not use user id, but bundling requires it

export EC2_PRIVATE_KEY=${NOVA_KEY_DIR}/pk.pem

export EC2_CERT=${NOVA_KEY_DIR}/cert.pem

export NOVA_CERT=${NOVA_KEY_DIR}/cacert.pem

export EUCALYPTUS_CERT=${NOVA_CERT} # euca-bundle-image seems to require this set

alias ec2-bundle-image="ec2-bundle-image --cert ${EC2_CERT} --privatekey ${EC2_PRIVATE_KEY} --user 42 --ec2cert ${NOVA_CERT}"

alias ec2-upload-bundle="ec2-upload-bundle -a ${EC2_ACCESS_KEY} -s ${EC2_SECRET_KEY} --url ${S3_URL} --ec2cert ${NOVA_CERT}"

export NOVA_API_KEY="kevin"

export NOVA_USERNAME="kevin"

export NOVA_PROJECT_ID="mycloud"

export NOVA_URL="http://192.168.1.76:8774/v1.1/"

export NOVA_VERSION="1.1"
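
Because a typo in novarc only shows up later as an unhelpful HTTP 400, it may be worth sanity-checking the variables after sourcing the file. The sketch below is my own illustration, not part of the original post: check_nova_env and the REQUIRED list are assumptions based on the NOVA_* variables shown above, and it needs Python 3 for urllib.parse.

```python
from urllib.parse import urlparse

# Variable names taken from the novarc above; treating all of them as
# mandatory is an assumption made for this sketch.
REQUIRED = ["NOVA_API_KEY", "NOVA_USERNAME", "NOVA_PROJECT_ID",
            "NOVA_URL", "NOVA_VERSION"]


def check_nova_env(env):
    """Return a list of human-readable problems found in `env` (a dict)."""
    problems = []
    for name in REQUIRED:
        if not env.get(name):
            problems.append("%s is not set" % name)
    url = env.get("NOVA_URL", "")
    if url:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            problems.append("NOVA_URL is not a valid http(s) URL: %r" % url)
    return problems


good_env = {
    "NOVA_API_KEY": "kevin",
    "NOVA_USERNAME": "kevin",
    "NOVA_PROJECT_ID": "mycloud",
    "NOVA_URL": "http://192.168.1.76:8774/v1.1/",
    "NOVA_VERSION": "1.1",
}
# A stray quote character left inside the value, as one example of a typo.
bad_env = dict(good_env, NOVA_URL='"http://192.168.1.76:8774/v1.1/')

print(check_nova_env(good_env))
print(check_nova_env(bad_env))
```

In practice you would pass os.environ instead of the hand-written dictionaries.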

9. Error when openstack-nova-compute starts:

2012-04-14 00:33:54 TRACE nova     return libvirt.openAuth(uri, auth, 0)

2012-04-14 00:33:54 TRACE nova   File "/usr/lib64/python2.6/site-packages/libvirt.py", line 102, in openAuth

2012-04-14 00:33:54 TRACE nova     if ret is None:raise libvirtError('virConnectOpenAuth() failed')

2012-04-14 00:33:54 TRACE nova libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

2012-04-14 00:33:54 TRACE nova

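Cause: the libvirt service is not running. Start the libvirt service.

On SUSE SP2, openstack-nova-compute starts before libvirtd while the physical machine boots, so openstack-nova-compute has to be restarted by hand after every reboot. (Not sure what they were thinking; this is probably a bug, heh.)

The error above can also be caused by missing packages. Install them and restart the service: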

SUSE:/var/log/nova # zypper install avahi

Loading repository data...

Reading installed packages...

Resolving package dependencies...

The following NEW packages are going to be installed:

avahi avahi-lang libavahi-core5 libdaemon0 nss-mdns nss-mdns-32bit
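
For the start-order problem described in this item, one workaround is to wait until the libvirt socket exists before (re)starting openstack-nova-compute. The following is a rough sketch of that idea only: wait_for_path is a made-up helper, the timeout values are arbitrary, and the socket path is the one from the error message.

```python
import os
import time


def wait_for_path(path, timeout=30.0, interval=0.5):
    """Poll until `path` exists; return True if it appeared before the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


LIBVIRT_SOCKET = "/var/run/libvirt/libvirt-sock"  # from the error message above

# Short timeout so the sketch finishes quickly; a boot script would wait longer.
if wait_for_path(LIBVIRT_SOCKET, timeout=1.0, interval=0.2):
    print("libvirt socket is present; nova-compute can be started")
else:
    print("libvirt socket still missing; start libvirtd first")
```

A wrapper like this could sit in front of the init script, although fixing the service start order is the cleaner long-term answer.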

10. nova-network reports at startup that the address is already in use:

The 'listeners' argument to Pool (and create_engine()) is deprecated. Use event.listen().\n Pool.__init__(self, creator, **kw)\n\n2012-04-16 12:50:30 WARNING nova.utils [req-c4afc2fa-361a-4586-93aa-e203bff0937b None None] /usr/lib64/python2.6/site-packages/sqlalchemy/pool.py:145: SADeprecationWarning: Pool.add_listener is deprecated. Use event.listen()\n self.add_listener(l)\n\n\ndnsmasq: failed to create listening socket for 172.16.0.1: Address already in use\n"

Solution: the system dnsmasq service is already running and holds the address, so the dnsmasq instance that nova-network launches cannot bind to it. Stop the system service and disable it at boot:

/etc/init.d/dnsmasq stop

chkconfig dnsmasq off
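
To confirm that something is already holding an address before starting nova-network, you can simply try to bind to it. This is a minimal sketch with a made-up helper name, is_address_in_use. It probes TCP only, and the demo uses a throwaway port on 127.0.0.1, because binding to port 53 on 172.16.0.1 needs root and that address must exist on the host.

```python
import errno
import socket


def is_address_in_use(host, port):
    """Return True if binding a TCP socket to (host, port) fails with EADDRINUSE."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # other errors (permissions, unknown address) are not "in use"
    finally:
        probe.close()
    return False


# Stand-in for an already running service: listen on a free local port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

in_use_while_listening = is_address_in_use("127.0.0.1", port)
listener.close()
in_use_after_close = is_address_in_use("127.0.0.1", port)
print(in_use_while_listening, in_use_after_close)
```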


To troubleshoot, trace the request with the --debug or --verbose option:

SUSEsp2:~ # nova --debug list

connect: (127.0.0.1, 8774)

send: 'GET /v1.1 HTTP/1.1\r\nHost: 127.0.0.1:8774\r\nx-auth-project-id: mycloud\r\naccept-encoding: gzip, deflate\r\nx-auth-user: kevin\r\nuser-agent: python-novaclient\r\nx-auth-key: kevin\r\naccept: application/json\r\n\r\n'

reply: 'HTTP/1.1 204 No Content\r\n'

header: Content-Length: 0

header: X-Auth-Token: kevin:mycloud

header: X-Server-Management-Url: http://127.0.0.1:8774/v1.1/mycloud

header: Content-Type: text/plain; charset=UTF-8

header: Date: Mon, 16 Apr 2012 03:19:47 GMT

send: 'GET /v1.1/mycloud/servers/detail HTTP/1.1\r\nHost: 127.0.0.1:8774\r\nx-auth-project-id: mycloud\r\nx-auth-token: kevin:mycloud\r\naccept-encoding: gzip, deflate\r\naccept: application/json\r\nuser-agent: python-novaclient\r\n\r\n'

reply: 'HTTP/1.1 200 OK\r\n'

header: X-Compute-Request-Id: req-753d19f9-7267-410f-8591-f0fccb413cf9

header: Content-Type: application/json

header: Content-Length: 15

header: Date: Mon, 16 Apr 2012 03:19:47 GMT

+----+------+--------+----------+
| ID | Name | Status | Networks |
+----+------+--------+----------+
+----+------+--------+----------+

13. During testing, repeated network operations can trigger the following error:
2012-05-11 17:51:04 TRACE nova.rpc.amqp Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/nova/rpc/amqp.py", line 252, in _process_data
    rval = node_func(context=ctxt, **node_args)
  File "/usr/lib64/python2.6/site-packages/nova/network/manager.py", line 258, in wrapped
    return func(self, context, *args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/nova/network/manager.py", line 321, in allocate_for_instance
    **kwargs)
  File "/usr/lib64/python2.6/site-packages/nova/network/manager.py", line 258, in wrapped
    return func(self, context, *args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/nova/network/manager.py", line 907, in allocate_for_instance
    requested_networks=requested_networks)
  File "/usr/lib64/python2.6/site-packages/nova/network/manager.py", line 196, in _allocate_fixed_ips
    utils.to_primitive(network)}})
  File "/usr/lib64/python2.6/site-packages/nova/rpc/__init__.py", line 68, in call
    return _get_impl().call(context, topic, msg, timeout)
  File "/usr/lib64/python2.6/site-packages/nova/rpc/impl_kombu.py", line 674, in call
    return rpc_amqp.call(context, topic, msg, timeout, Connection.pool)
  File "/usr/lib64/python2.6/site-packages/nova/rpc/amqp.py", line 338, in call
    rv = list(rv)
  File "/usr/lib64/python2.6/site-packages/nova/rpc/amqp.py", line 306, in __iter__
    raise result
RemoteError: Remote error: NetworkNotFound Network 4 could not be found.
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/nova/rpc/amqp.py", line 252, in _process_data
    rval = node_func(context=ctxt, **node_args)
  File "/usr/lib64/python2.6/site-packages/nova/network/manager.py", line 785, in set_network_host
    self.host)
  File "/usr/lib64/python2.6/site-packages/nova/db/api.py", line 818, in network_set_host
    return IMPL.network_set_host(context, network_id, host_id)
  File "/usr/lib64/python2.6/site-packages/nova/db/sqlalchemy/api.py", line 102, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/nova/db/sqlalchemy/api.py", line 2110, in network_set_host
    raise exception.NetworkNotFound(network_id=network_id)
NetworkNotFound: Network 4 could not be found.

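Solution (note that this deletes all existing nova data). Drop and recreate the nova database:
drop database nova;
create database nova;
Then re-initialize the database: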
nova-manage db sync

14. compute exits after rabbitmq is stopped
After rabbitmq is stopped, compute exits on its own about two minutes later, and the log shows:

2012-03-25 21:41:26 INFO nova.rpc.common [-] Reconnecting to AMQP server on 192.168.28.5:5672
2012-03-25 21:41:27 ERROR nova.rpc.common [-] AMQP server on 192.168.28.5:5672 is unreachable: [Errno 113] EHOSTUNREACH. Trying again in 7 seconds.
(nova.rpc.common): TRACE: Traceback (most recent call last):
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/nova/rpc/impl_kombu.py", line 446, in reconnect
(nova.rpc.common): TRACE: self._connect()
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/nova/rpc/impl_kombu.py", line 423, in _connect
(nova.rpc.common): TRACE: self.connection.connect()
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/kombu/connection.py", line 118, in connect
(nova.rpc.common): TRACE: return self.connection
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/kombu/connection.py", line 438, in connection
(nova.rpc.common): TRACE: self._connection = self._establish_connection()
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/kombu/connection.py", line 404, in _establish_connection
(nova.rpc.common): TRACE: conn = self.transport.establish_connection()
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/kombu/transport/pyamqplib.py", line 242, in establish_connection
(nova.rpc.common): TRACE: connect_timeout=conninfo.connect_timeout)
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/kombu/transport/pyamqplib.py", line 51, in __init__
(nova.rpc.common): TRACE: super(Connection, self).__init__(*args, **kwargs)
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/amqplib/client_0_8/connection.py", line 125, in __init__
(nova.rpc.common): TRACE: self.transport = create_transport(host, connect_timeout, ssl)
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/amqplib/client_0_8/transport.py", line 220, in create_transport
(nova.rpc.common): TRACE: return TCPTransport(host, connect_timeout)
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/amqplib/client_0_8/transport.py", line 58, in __init__
(nova.rpc.common): TRACE: self.sock.connect((host, port))
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/eventlet/greenio.py", line 179, in connect
(nova.rpc.common): TRACE: socket_checkerr(fd)
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/eventlet/greenio.py", line 43, in socket_checkerr
(nova.rpc.common): TRACE: raise socket.error(err, errno.errorcode[err])
(nova.rpc.common): TRACE: error: [Errno 113] EHOSTUNREACH
(nova.rpc.common): TRACE:

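This happens because OpenStack keeps retrying when it loses its connection to rabbitmq. By default it retries 12 times at 10-second intervals (about two minutes in total); if it still cannot connect after that, it raises an error and exits.

Solution: add the following parameter to nova.conf:

# Prevent a rabbitmq restart from killing compute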
rabbit_max_retries=0

For the underlying reason, see the code in impl_kombu.py:

self.max_retries = FLAGS.rabbit_max_retries

def reconnect(self):
    """Handles reconnecting and re-establishing queues.
    Will retry up to self.max_retries number of times.
    self.max_retries = 0 means to retry forever.
    Sleep between tries, starting at self.interval_start
    seconds, backing off self.interval_stepping number of seconds
    each attempt.
    """

    # ... (intervening lines omitted in this excerpt)
    if self.max_retries and attempt == self.max_retries:
        LOG.exception(_('Unable to connect to AMQP server on '
                        '%(hostname)s:%(port)d after %(max_retries)d '
                        'tries: %(err_str)s') % log_info)
        # NOTE(comstud): Copied from original code. There's
        # really no better recourse because if this was a queue we
        # need to consume on, we have no way to consume anymore.
        sys.exit(1)
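
To make the effect of rabbit_max_retries=0 easier to see, here is a simplified, self-contained model of that retry logic. It is a sketch written for this post, not nova's actual code: the function signature, the interval defaults, and the make_flaky_connect stand-in are all assumptions, and it raises RuntimeError where nova calls sys.exit(1) so that the behaviour can be observed without killing the interpreter.

```python
import itertools


def reconnect(connect, max_retries, interval_start=1, interval_stepping=2,
              interval_max=30, sleep=lambda seconds: None):
    """Call `connect` until it succeeds; return the number of attempts made.

    max_retries == 0 means retry forever. Otherwise RuntimeError is raised
    once `max_retries` attempts have failed. `sleep` defaults to a no-op so
    the sketch runs instantly; pass time.sleep for real waiting.
    """
    interval = interval_start
    for attempt in itertools.count(1):
        try:
            connect()
            return attempt
        except OSError as exc:
            if max_retries and attempt == max_retries:
                raise RuntimeError("gave up after %d tries: %s" % (attempt, exc))
            sleep(interval)
            # Back off a little more on every failed attempt, up to a cap.
            interval = min(interval + interval_stepping, interval_max)


def make_flaky_connect(failures):
    """Return a connect() stand-in that fails `failures` times, then succeeds."""
    remaining = [failures]

    def connect():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise OSError("EHOSTUNREACH")

    return connect


# Broker comes back after 20 failed attempts: with max_retries=0 we survive.
print(reconnect(make_flaky_connect(20), max_retries=0))

# Same outage with a limit of 12: the caller gives up before the broker returns.
try:
    reconnect(make_flaky_connect(20), max_retries=12)
except RuntimeError as exc:
    print(exc)
```

The first call keeps going until the connection works, which matches the "0 means retry forever" line in the docstring above; the second shows why compute dies when the outage lasts longer than the retry budget.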





#############################################################

This article was adapted from: http://www.codesky.net/article/201206/171742.html
