7182 是agent通信端口
首先检查下agent是不是挂掉了
如果不是参考下面错误:
类似错误1:
Detecting Cloudera Manager Server...
Detecting Cloudera Manager Server...
BEGIN host -t PTR 192.168.1.198
198.1.168.192.in-addr.arpa domain name pointer localhost.
END (0)
using localhost as scm server hostname
BEGIN which python
/usr/bin/python
END (0)
BEGIN python -c 'import socket; import sys; s = socket.socket(socket.AF_INET); s.settimeout(5.0); s.connect((sys.argv[1], int(sys.argv[2]))); s.close();' localhost 7182
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "<string>", line 1, in connect
socket.error: [Errno 111] Connection refused
END (1)
could not contact scm server at localhost:7182, giving up
waiting for rollback request
解决办法:
mv /usr/bin/host /usr/bin/host.bak 复制代码
类似错误2
Agent启动后,安装阶段“当前管理的主机”中显示的节点不全,每次刷新显示的都不一样。
Agent的错误日志表现如下:
[18/Nov/2014 21:12:56 +0000] 22681 MainThread agent ERROR Heartbeating to master:7182 failed.
Traceback (most recent call last):
File "/home/opt/cm-5.2.0/lib64/cmf/agent/src/cmf/agent.py", line 820, in send_heartbeat
response = self.requestor.request('heartbeat', dict(request=heartbeat))
File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 139, in request
return self.issue_request(call_request, message_name, request_datum)
File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 255, in issue_request
return self.read_call_response(message_name, buffer_decoder)
File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 235, in read_call_response
raise self.read_error(writers_schema, readers_schema, decoder)
File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 244, in read_error
return AvroRemoteException(datum_reader.read(decoder))
File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/io.py", line 444, in read
return self.read_data(self.writers_schema, self.readers_schema, decoder)
File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/io.py", line 448, in read_data
if not DatumReader.match_schemas(writers_schema, readers_schema):
File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/io.py", line 379, in match_schemas
w_type = writers_schema.type
AttributeError: 'NoneType' object has no attribute 'type' 复制代码
这是由于在主节点上启动了Agent后,又将Agent scp到了其他节点上导致的,首次启动Agent,它会生成一个uuid,路径为:
/opt/cm-5.1.3/lib/cloudera-scm-agent/uuid 复制代码
这样的话每台机器上的Agent的uuid都是一样的了,就会出现紊乱的情况。
解决方案:
删除
/opt/cm-5.1.3/lib/cloudera-scm-agent/ 复制代码
目录下的所有文件。
清空主节点CM数据库。