Guide questions:
1. Which options help when troubleshooting?
2. What should be done when nova-compute exits after RabbitMQ stops?
1. No log output after installing openstack-nova-compute:
A Python dependency is missing. Install it:
python-repoze.lru-0.3-1.1.x86_64.rpm
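2. Starting the service after installing nova-compute:
If the database has not been initialized yet, the service reports an error saying it cannot query the database.
Solution:
Configure the nova database connection in nova.conf, then initialize the database with nova-manage db sync.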
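Before running nova-manage db sync, it can help to confirm that the database URL in nova.conf parses the way you expect. The sketch below makes three assumptions. First, nova.conf is the Essex INI style. Second, the connection string sits under [DEFAULT] as sql_connection. Third, the function name and the user, password and host in the sample are all hypothetical.

```python
import configparser
from urllib.parse import urlsplit


def read_sql_connection(conf_text):
    """Return the parts of sql_connection from an INI-style nova.conf.

    Assumes the option lives in the [DEFAULT] section (Essex-style config).
    """
    # interpolation=None so a '%' in a password is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    url = parser.get("DEFAULT", "sql_connection", fallback=None)
    if url is None:
        raise ValueError("sql_connection is not set in nova.conf")
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# Hypothetical example values; substitute your own.
sample = """
[DEFAULT]
sql_connection=mysql://nova:secret@192.168.1.76/nova
"""
print(read_sql_connection(sample))
```

If the scheme, host or database name printed here is not what you intended, fix nova.conf first. Running db sync against the wrong database will not make the query error go away.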
3. After configuring libvirt and libvirt_type, starting nova-compute fails:
2012-04-13 23:56:24 AUDIT nova.service [-] Starting compute node (version 2012.1-LOCALBRANCH:LOCALREVISION)
2012-04-13 23:56:24 CRITICAL nova [-] [Errno 2] No such file or directory: '/usr/lib64/python2.6/site-packages/instances'
Solution: create the directory:
mkdir -p /usr/lib64/python2.6/site-packages/instances
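Why would nova look for instances under site-packages at all? The likely explanation rests on an assumption about Essex's defaults: that state_path defaults to the directory one level above the nova package, and instances_path to $state_path/instances. The sketch below only reproduces that path arithmetic, using an example location for nova's flags.py. The helper name is hypothetical.

```python
import posixpath


def default_instances_path(flags_py):
    """Resolve instances_path the way the assumed Essex defaults would.

    Assumed defaults (not read from a real nova install):
      state_path     = <directory of nova/flags.py>/../
      instances_path = $state_path/instances
    """
    state_path = posixpath.join(posixpath.dirname(flags_py), "../")
    return posixpath.normpath(posixpath.join(state_path, "instances"))


# Location of nova's flags module on the host in this article (Python 2.6, lib64).
print(default_instances_path("/usr/lib64/python2.6/site-packages/nova/flags.py"))
```

The result matches the path in the error above. If this is the cause, setting state_path explicitly in nova.conf (for example /var/lib/nova, a common choice) may be cleaner than creating a directory inside site-packages.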
4. nova-compute fails at startup with:
2012-04-15 18:25:18 TRACE nova raise exception.ClassNotFound(class_name=class_str, exception=exc)
2012-04-15 18:25:18 TRACE nova ClassNotFound: Class API could not be found: No module named glance.common
2012-04-15 18:25:18 TRACE nova
Solution (install the missing Python packages):
Installing: python-dateutil-1.5-1.1 [done]
Installing: python-pycrypto-2.5-1.1 [done]
Installing: python-passlib-1.5.3-1.1 [done]
Installing: python-xattr-0.6.2-1.1 [done]
Installing: python-PasteScript-1.7.5-1.1 [done]
Installing: python-python-memcached-1.47-1.1 [done]
Installing: libmysqlclient_r15-5.0.94-0.2.4.1 [done]
Installing: python-ldap-2.3.5-1.21 [done]
Installing: python-mysql-1.2.2-2.12 [done]
Installing: python-keystone-2012.1-1.1 [done]
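After installing the packages, you can check whether the modules nova needs are now importable, without starting the service. This sketch has two assumptions. The helper name is hypothetical. The mapping from RPM names to Python module names in the example list is a best guess (for instance python-pycrypto provides Crypto, python-memcached provides memcache, python-mysql provides MySQLdb).

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ImportError:
            # find_spec imports parent packages; if "glance" itself is
            # missing, looking up "glance.common" raises ModuleNotFoundError.
            missing.append(name)
    return missing


# glance.common comes from the error above; the rest are guesses at the
# modules provided by the packages installed above.
print(missing_modules(["glance.common", "dateutil", "Crypto", "passlib",
                       "xattr", "paste.script", "memcache", "ldap",
                       "MySQLdb", "keystone"]))
```

Run it with the same interpreter nova uses. On the host in this article that is Python 2.6, which has no importlib.util, so there you would have to try plain imports instead.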
5. After installation, starting nova-compute writes this to the nova-compute log:
2012-04-15 19:51:06 TRACE nova File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/default.py", line 330, in do_execute
2012-04-15 19:51:06 TRACE nova cursor.execute(statement, parameters)
2012-04-15 19:51:06 TRACE nova File "/usr/lib64/python2.6/site-packages/MySQLdb/cursors.py", line 166, in execute
2012-04-15 19:51:06 TRACE nova self.errorhandler(self, exc, value)
2012-04-15 19:51:06 TRACE nova File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 35, in defaulterrorhandler
2012-04-15 19:51:06 TRACE nova raise errorclass, errorvalue
So nova can connect to the database, but the database has not been initialized yet.
Solution: initialize the nova database:
SUSEsp2:/var/log/nova # nova-manage db sync
2012-04-15 20:02:45 DEBUG nova.utils [-] backend <module 'nova.db.sqlalchemy.migration' from '/usr/lib64/python2.6/site-packages/nova/db/sqlalchemy/migration.pyc'> from (pid=10941) __get_backend /usr/lib64/python2.6/site-packages/nova/utils.py:658
2012-04-15 20:03:23 WARNING nova.utils [-] /usr/lib64/python2.6/site-packages/nova/db/sqlalchemy/migrate_repo/versions/075_convert_bw_usage_to_store_network_id.py:49: SADeprecationWarning: useexisting is deprecated. Use extend_existing.
useexisting=True)
2012-04-15 20:03:31 WARNING nova.utils [-] /usr/lib64/python2.6/site-packages/nova/db/sqlalchemy/migrate_repo/versions/081_drop_instance_id_bw_cache.py:40: SADeprecationWarning: useexisting is deprecated. Use extend_existing.
useexisting=True)
6. libvirt connection error:
2012-04-15 20:24:08 TRACE nova File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2836, in getVersion
2012-04-15 20:24:08 TRACE nova if ret == -1: raise libvirtError ('virConnectGetVersion() failed', conn=self)
2012-04-15 20:24:08 TRACE nova libvirtError: internal error Cannot find suitable emulator for x86_64
Solution:
In Essex, the default nova.conf sets libvirt_type="xen" with quotation marks, and the quoted value cannot be read. Write it without quotes instead: libvirt_type=xen
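A quick way to catch this class of mistake is to scan nova.conf for quoted values. It is not certain that nova's own Essex config parser behaves exactly like Python's standard configparser. However, the standard library shows the same effect the article describes: surrounding quotes are kept as part of the value. The helper below is hypothetical and assumes an INI-style file with options under [DEFAULT].

```python
import configparser


def quoted_options(conf_text, section="DEFAULT"):
    """Return options in `section` whose values are wrapped in quotes.

    Python's configparser keeps surrounding quotes as part of the value,
    so libvirt_type="xen" is read as the 5-character string '"xen"'.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    flagged = {}
    for key, value in parser.items(section):
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            flagged[key] = value
    return flagged


sample = """
[DEFAULT]
libvirt_type="xen"
verbose=True
"""
print(quoted_options(sample))
```

Any option this reports should be rewritten without the quotes, as with libvirt_type=xen above.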
7. Generating a user's certificates fails with the following error:
SUSEsp2:~/key # nova-manage project zipfile --project=mycloud --user=kevin --file=nova.zip
Stderr: "Using configuration from ./openssl.cnf\nerror loading the config file './openssl.cnf'\n15649:error:02001002:system library:fopen:No such file or directory:bss_file.c:126:fopen('./openssl.cnf','rb')\n15649:error:2006D080:BIO routines:BIO_new_file:no such file:bss_file.c:129:\n15649:error:0E078072:configuration file routines:DEF_LOAD:no such file:conf_def.c:197:\n"
The above error may show that the certificate db has not been created.
Please create a database by running a nova-cert server on this host.
Solution (install nova-cert, start it, and enable it at boot):
SUSEsp2:~/key # zypper install openstack-nova-cert
SUSEsp2:~/key # /etc/init.d/openstack-nova-cert start
SUSEsp2:~/key # chkconfig openstack-nova-cert on
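8. Listing with nova (here nova image-list) returns an HTTP 400 error:
SUSEsp2:~/key # nova image-list
ERROR: n/a (HTTP 400)
Solution (the search below shows openstack-nova-api is not installed, so install it):
SUSEsp2:~ # zypper search nova-api
Loading repository data...
Reading installed packages...
S | Name               | Summary                        | Type
--+--------------------+--------------------------------+--------
  | openstack-nova-api | OpenStack Compute API services | package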
SUSEsp2:~ # zypper install openstack-nova-api
HTTP 400 errors can also have other causes. A mistake in the novarc environment variables is one of them, and an important one to check:
SUSE11sp2:~/user # cat novarc
NOVARC=$(readlink -f "${BASH_SOURCE:-${0}}" 2>/dev/null) ||
NOVARC=$(python -c 'import os,sys; print os.path.abspath(os.path.realpath(sys.argv[1]))' "${BASH_SOURCE:-${0}}")
NOVA_KEY_DIR=${NOVARC%/*}
export EC2_ACCESS_KEY="kevin:mycloud"
export EC2_SECRET_KEY="f20bb381-9cbf-40a7-a84f-499b815efa19"
export EC2_URL="http://192.168.1.76:8773/services/Cloud"
export S3_URL="http://192.168.1.76:3333"
export EC2_USER_ID=42 # nova does not use user id, but bundling requires it
export EC2_PRIVATE_KEY=${NOVA_KEY_DIR}/pk.pem
export EC2_CERT=${NOVA_KEY_DIR}/cert.pem
export NOVA_CERT=${NOVA_KEY_DIR}/cacert.pem
export EUCALYPTUS_CERT=${NOVA_CERT} # euca-bundle-image seems to require this set
alias ec2-bundle-image="ec2-bundle-image --cert ${EC2_CERT} --privatekey ${EC2_PRIVATE_KEY} --user 42 --ec2cert ${NOVA_CERT}"
alias ec2-upload-bundle="ec2-upload-bundle -a ${EC2_ACCESS_KEY} -s ${EC2_SECRET_KEY} --url ${S3_URL} --ec2cert ${NOVA_CERT}"
export NOVA_API_KEY="kevin"
export NOVA_USERNAME="kevin"
export NOVA_PROJECT_ID="mycloud"
export NOVA_URL="http://192.168.1.76:8774/v1.1/"
export NOVA_VERSION="1.1"
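Since a bad novarc is easy to miss, a small check after sourcing it can help. The sketch below looks only at the NOVA_* variables exported by the novarc above. It reports any that are unset, a NOVA_URL that does not look like an http(s) URL, and stray typographic quote characters. Such characters end up inside the value when a novarc is pasted from a web page, because the shell does not treat them as quotes. The function name is hypothetical.

```python
import os
from urllib.parse import urlsplit

# Variable names taken from the novarc shown above.
REQUIRED = ["NOVA_USERNAME", "NOVA_API_KEY", "NOVA_PROJECT_ID",
            "NOVA_URL", "NOVA_VERSION"]
# Curly quotes, double prime and straight quotes that should not be in a value.
STRAY_QUOTES = "\u201c\u201d\u2018\u2019\u2033\"'"


def check_novarc_env(env=None):
    """Return a list of problems found in the NOVA_* environment."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    url = env.get("NOVA_URL")
    if url:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            problems.append(f"NOVA_URL looks malformed: {url!r}")
    for name in REQUIRED:
        value = env.get(name, "")
        if any(ch in value for ch in STRAY_QUOTES):
            problems.append(f"{name} contains stray quote characters: {value!r}")
    return problems
```

Call check_novarc_env() with no argument after running source novarc to check the current shell's environment. An empty list means nothing obvious is wrong.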
9. openstack-nova-compute fails at startup with:
2012-04-14 00:33:54 TRACE nova return libvirt.openAuth(uri, auth, 0)
2012-04-14 00:33:54 TRACE nova File "/usr/lib64/python2.6/site-packages/libvirt.py", line 102, in openAuth
2012-04-14 00:33:54 TRACE nova if ret is None:raise libvirtError('virConnectOpenAuth() failed')
2012-04-14 00:33:54 TRACE nova libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
2012-04-14 00:33:54 TRACE nova
Cause: the libvirt service is not running. Start libvirtd.
On SUSE SP2, openstack-nova-compute starts before libvirtd while the host boots, so after every reboot openstack-nova-compute has to be restarted by hand. (No idea what the packagers had in mind; probably a bug, heh.)
The same error can also be caused by missing packages. Install them and restart the services:
SUSE:/var/log/nova # zypper install avahi
Loading repository data...
Reading installed packages...
Resolving package dependencies…
The following NEW packages are going to be installed:
avahi avahi-lang libavahi-core5 libdaemon0 nss-mdns nss-mdns-32bit
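Because of the boot-order problem described above, a restart of openstack-nova-compute only helps once libvirtd's socket exists. A minimal sketch of a "wait for libvirtd" helper follows. It polls for the socket path from the error above until a timeout. The function name and the polling approach are assumptions, not part of the SUSE packaging.

```python
import os
import stat
import time


def wait_for_socket(path, timeout=60.0, interval=1.0):
    """Poll until `path` exists and is a Unix socket, or until timeout.

    Returns True if the socket appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if stat.S_ISSOCK(os.stat(path).st_mode):
                return True
        except FileNotFoundError:
            pass  # libvirtd has not created the socket yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, call wait_for_socket("/var/run/libvirt/libvirt-sock", timeout=120) from a boot-time script, and restart openstack-nova-compute only when it returns True.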
10. Starting nova-network reports that the address is already in use:
The 'listeners' argument to Pool (and create_engine()) is deprecated. Use event.listen().\n Pool.__init__(self, creator, **kw)\n\n2012-04-16 12:50:30 WARNING nova.utils [req-c4afc2fa-361a-4586-93aa-e203bff0937b None None] /usr/lib64/python2.6/site-packages/sqlalchemy/pool.py:145: SADeprecationWarning: Pool.add_listener is deprecated. Use event.listen()\n self.add_listener(l)\n\n\ndnsmasq: failed to create listening socket for 172.16.0.1: Address already in use\n"
Solution: the dnsmasq system service was already running, so the dnsmasq that nova-network starts collides with the existing process. In other words, dnsmasq was started one time too many. Stop the system service and disable it at boot:
/etc/init.d/dnsmasq stop
chkconfig dnsmasq off
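To confirm the conflict before or after stopping the system dnsmasq, you can try to bind the same address yourself. This sketch assumes the clash is on the DNS port (53) of the bridge address from the log. It uses UDP by default, and the helper name is hypothetical. Binding port 53 requires root, just as dnsmasq does.

```python
import errno
import socket


def address_in_use(host, port, proto=socket.SOCK_DGRAM):
    """Return True if binding (host, port) fails with EADDRINUSE.

    Other bind errors (e.g. EACCES when not running as root, or
    EADDRNOTAVAIL if the address is not configured) are re-raised.
    """
    s = socket.socket(socket.AF_INET, proto)
    try:
        s.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise
    finally:
        s.close()
    return False
```

On the host in this article, address_in_use("172.16.0.1", 53) run as root should return True while the extra dnsmasq is running. It should return False after /etc/init.d/dnsmasq stop, before nova-network starts its own instance.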
Use the --debug or --verbose option to trace problems:
SUSEsp2:~ # nova --debug list
connect: (127.0.0.1, 8774)
send: 'GET /v1.1 HTTP/1.1\r\nHost: 127.0.0.1:8774\r\nx-auth-project-id: mycloud\r\naccept-encoding: gzip, deflate\r\nx-auth-user: kevin\r\nuser-agent: python-novaclient\r\nx-auth-key: kevin\r\naccept: application/json\r\n\r\n'
reply: 'HTTP/1.1 204 No Content\r\n'
header: Content-Length: 0
header: X-Auth-Token: kevin:mycloud
header: X-Server-Management-Url: http://127.0.0.1:8774/v1.1/mycloud
header: Content-Type: text/plain; charset=UTF-8
header: Date: Mon, 16 Apr 2012 03:19:47 GMT
send: 'GET /v1.1/mycloud/servers/detail HTTP/1.1\r\nHost: 127.0.0.1:8774\r\nx-auth-project-id: mycloud\r\nx-auth-token: kevin:mycloud\r\naccept-encoding: gzip, deflate\r\naccept: application/json\r\nuser-agent: python-novaclient\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: X-Compute-Request-Id: req-753d19f9-7267-410f-8591-f0fccb413cf9
header: Content-Type: application/json
header: Content-Length: 15
header: Date: Mon, 16 Apr 2012 03:19:47 GMT
+----+------+--------+----------+
| ID | Name | Status | Networks |
+----+------+--------+----------+
+----+------+--------+----------+
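The --debug trace above shows a two-step exchange against the v1.1 API. First comes a GET on /v1.1 with X-Auth-User, X-Auth-Key and X-Auth-Project-Id, which returns X-Auth-Token and X-Server-Management-Url. Then comes a GET on <management url>/servers/detail with that token. The Content-Length of 15 is consistent with a body of {"servers": []}. The stdlib-only sketch below replays just the headers visible in that trace. It can be handy for poking nova-api without novaclient. It is not novaclient's code, and the function names are hypothetical.

```python
import json
import urllib.request


def build_auth_request(base_url, user, api_key, project_id):
    """Step 1: GET <base_url> with the legacy X-Auth-* headers."""
    return urllib.request.Request(base_url, headers={
        "X-Auth-User": user,
        "X-Auth-Key": api_key,
        "X-Auth-Project-Id": project_id,
        "Accept": "application/json",
    })


def build_servers_request(management_url, token, project_id):
    """Step 2: GET <management_url>/servers/detail with the issued token."""
    return urllib.request.Request(
        management_url.rstrip("/") + "/servers/detail",
        headers={
            "X-Auth-Token": token,
            "X-Auth-Project-Id": project_id,
            "Accept": "application/json",
        })


def list_servers(base_url, user, api_key, project_id):
    """Perform both steps against a live nova-api and return the server list."""
    auth = build_auth_request(base_url, user, api_key, project_id)
    with urllib.request.urlopen(auth) as resp:
        token = resp.headers["X-Auth-Token"]
        mgmt_url = resp.headers["X-Server-Management-Url"]
    servers = build_servers_request(mgmt_url, token, project_id)
    with urllib.request.urlopen(servers) as resp:
        return json.load(resp)["servers"]
```

With the values from the trace, list_servers("http://127.0.0.1:8774/v1.1", "kevin", "kevin", "mycloud") would be expected to return an empty list, matching the empty table above. A 400 at step 1 points back at the credentials in novarc.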
13. During testing, repeated network operations cause the following error:
2012-05-11 17:51:04 TRACE nova.rpc.amqp Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/nova/rpc/amqp.py", line 252, in _process_data
    rval = node_func(context=ctxt, **node_args)
  File "/usr/lib64/python2.6/site-packages/nova/network/manager.py", line 258, in wrapped
    return func(self, context, *args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/nova/network/manager.py", line 321, in allocate_for_instance
    **kwargs)
  File "/usr/lib64/python2.6/site-packages/nova/network/manager.py", line 258, in wrapped
    return func(self, context, *args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/nova/network/manager.py", line 907, in allocate_for_instance
    requested_networks=requested_networks)
  File "/usr/lib64/python2.6/site-packages/nova/network/manager.py", line 196, in _allocate_fixed_ips
    utils.to_primitive(network)}})
  File "/usr/lib64/python2.6/site-packages/nova/rpc/__init__.py", line 68, in call
    return _get_impl().call(context, topic, msg, timeout)
  File "/usr/lib64/python2.6/site-packages/nova/rpc/impl_kombu.py", line 674, in call
    return rpc_amqp.call(context, topic, msg, timeout, Connection.pool)
  File "/usr/lib64/python2.6/site-packages/nova/rpc/amqp.py", line 338, in call
    rv = list(rv)
  File "/usr/lib64/python2.6/site-packages/nova/rpc/amqp.py", line 306, in __iter__
    raise result
RemoteError: Remote error: NetworkNotFound Network 4 could not be found.
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/nova/rpc/amqp.py", line 252, in _process_data
    rval = node_func(context=ctxt, **node_args)
  File "/usr/lib64/python2.6/site-packages/nova/network/manager.py", line 785, in set_network_host
    self.host)
  File "/usr/lib64/python2.6/site-packages/nova/db/api.py", line 818, in network_set_host
    return IMPL.network_set_host(context, network_id, host_id)
  File "/usr/lib64/python2.6/site-packages/nova/db/sqlalchemy/api.py", line 102, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/nova/db/sqlalchemy/api.py", line 2110, in network_set_host
    raise exception.NetworkNotFound(network_id=network_id)
NetworkNotFound: Network 4 could not be found.
Solution:
drop database nova;
create database nova;
Re-initialize the database (this discards all existing data in the nova database):
nova-manage db sync
14. nova-compute exits after RabbitMQ stops
When RabbitMQ stops, nova-compute exits on its own after about two minutes, and the log shows:
2012-03-25 21:41:26 INFO nova.rpc.common [-] Reconnecting to AMQP server on 192.168.28.5:5672
2012-03-25 21:41:27 ERROR nova.rpc.common [-] AMQP server on 192.168.28.5:5672 is unreachable: [Errno 113] EHOSTUNREACH. Trying again in 7 seconds.
(nova.rpc.common): TRACE: Traceback (most recent call last):
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/nova/rpc/impl_kombu.py", line 446, in reconnect
(nova.rpc.common): TRACE: self._connect()
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/nova/rpc/impl_kombu.py", line 423, in _connect
(nova.rpc.common): TRACE: self.connection.connect()
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/kombu/connection.py", line 118, in connect
(nova.rpc.common): TRACE: return self.connection
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/kombu/connection.py", line 438, in connection
(nova.rpc.common): TRACE: self._connection = self._establish_connection()
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/kombu/connection.py", line 404, in _establish_connection
(nova.rpc.common): TRACE: conn = self.transport.establish_connection()
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/kombu/transport/pyamqplib.py", line 242, in establish_connection
(nova.rpc.common): TRACE: connect_timeout=conninfo.connect_timeout)
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/kombu/transport/pyamqplib.py", line 51, in __init__
(nova.rpc.common): TRACE: super(Connection, self).__init__(*args, **kwargs)
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/amqplib/client_0_8/connection.py", line 125, in __init__
(nova.rpc.common): TRACE: self.transport = create_transport(host, connect_timeout, ssl)
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/amqplib/client_0_8/transport.py", line 220, in create_transport
(nova.rpc.common): TRACE: return TCPTransport(host, connect_timeout)
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/amqplib/client_0_8/transport.py", line 58, in __init__
(nova.rpc.common): TRACE: self.sock.connect((host, port))
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/eventlet/greenio.py", line 179, in connect
(nova.rpc.common): TRACE: socket_checkerr(fd)
(nova.rpc.common): TRACE: File "/usr/lib/python2.6/site-packages/eventlet/greenio.py", line 43, in socket_checkerr
(nova.rpc.common): TRACE: raise socket.error(err, errno.errorcode[err])
(nova.rpc.common): TRACE: error: [Errno 113] EHOSTUNREACH
(nova.rpc.common): TRACE:
This happens because OpenStack retries when it loses its connection to RabbitMQ. By default it tries 12 times at 10-second intervals. If it still cannot connect after that, it raises an error and exits.
Solution: add the following parameter to nova.conf:
# prevent a RabbitMQ restart from killing nova-compute
rabbit_max_retries=0
For the underlying reason, see the code in impl_kombu.py:
# nova/rpc/impl_kombu.py (excerpt)
# set when the connection object is created:
self.max_retries = FLAGS.rabbit_max_retries

def reconnect(self):
    """Handles reconnecting and re-establishing queues.
    Will retry up to self.max_retries number of times.
    self.max_retries = 0 means to retry forever.
    Sleep between tries, starting at self.interval_start
    seconds, backing off self.interval_stepping number of seconds
    each attempt.
    """
    # ... (retry loop elided in this excerpt; `attempt` counts the tries) ...
        if self.max_retries and attempt == self.max_retries:
            LOG.exception(_('Unable to connect to AMQP server on '
                    '%(hostname)s:%(port)d after %(max_retries)d '
                    'tries: %(err_str)s') % log_info)
            # NOTE(comstud): Copied from original code. There's
            # really no better recourse because if this was a queue we
            # need to consume on, we have no way to consume anymore.
            sys.exit(1)
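To see why rabbit_max_retries=0 keeps nova-compute alive, here is a small standalone model of the retry policy the docstring above describes. It is not nova's code. The function name, the OSError-only handling and the interval defaults are placeholders chosen for illustration. Only "0 means retry forever", the backoff idea and the default of 12 tries come from the excerpt and the article.

```python
import time


def reconnect(connect, max_retries=12, interval_start=1.0,
              interval_stepping=2.0, interval_max=30.0, sleep=time.sleep):
    """Call connect() until it succeeds, following the policy above.

    max_retries == 0 means retry forever (what rabbit_max_retries=0 gives).
    The delay grows by interval_stepping after each failure, capped at
    interval_max. Interval values here are placeholders, not nova defaults.
    """
    attempt = 0
    delay = interval_start
    while True:
        attempt += 1
        try:
            return connect()
        except OSError:
            if max_retries and attempt == max_retries:
                raise  # at this point nova-compute logs and calls sys.exit(1)
            sleep(delay)
            delay = min(delay + interval_stepping, interval_max)
```

With a non-zero max_retries, a RabbitMQ outage that outlasts the retries ends the process. With max_retries=0, the `max_retries and ...` test is always false, so the loop keeps sleeping and retrying until RabbitMQ comes back.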
#############################################################
Adapted from: http://www.codesky.net/article/201206/171742.html