Cloudera Manager shows a HostMonitor error, experts please help
In reply to zhzhang (2015-1-6 17:13):
Uninstall the agent, then reinstall it.
I don't remember doing anything, but the UI just stopped displaying. Please help!!!
In reply to zhzhang (2015-1-6 09:47):
The Host Monitor service isn't running. Try restarting it.
In reply to nextuser (2015-1-6 12:03):
It won't start, that's exactly what I'm stuck on.
In reply to zhzhang (2015-1-6 15:29):
If it won't start, check the logs.
In reply to bioger_hit (2015-1-6 15:38):
3293 MainThread agent ERROR Heartbeating to 192.168.1.110:7182 failed.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 815, in send_heartbeat
    self.master_port)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 464, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.6/httplib.py", line 720, in connect
    self.timeout)
  File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
    raise error, msg
error: Connection refused
That's the error from the log. What is port 7182 for? I can't get anything to come up on 7182.
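"Connection refused" in that traceback means nothing accepted a TCP connection at 192.168.1.110:7182, the address the agent heartbeats to. The agent installer's own check (the `python -c` socket one-liner quoted in this thread) tests exactly this. Below is a minimal sketch of the same test as a reusable function, assuming Python 3; `can_connect` is a hypothetical helper name, not part of Cloudera Manager.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper: mirrors the socket test in the agent installer log,
    but returns a boolean instead of raising.
    """
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and name-resolution failures.
        return False
```

Run from an agent host, `can_connect("192.168.1.110", 7182)` should return False for as long as the Server side is not listening on that port, which matches the heartbeat failure above.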
In reply to zhzhang (2015-1-6 15:46):
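7182 is the port the agents use to communicate with the Cloudera Manager Server (the heartbeat port in your log).
First, check whether the agent has died.
If it hasn't, compare against the errors below:
Similar error 1: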
Detecting Cloudera Manager Server...
Detecting Cloudera Manager Server...
BEGIN host -t PTR 192.168.1.198
198.1.168.192.in-addr.arpa domain name pointer localhost.
END (0)
using localhost as scm server hostname
BEGIN which python
/usr/bin/python
END (0)
BEGIN python -c 'import socket; import sys; s = socket.socket(socket.AF_INET); s.settimeout(5.0); s.connect((sys.argv[1], int(sys.argv[2]))); s.close();' localhost 7182
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<string>", line 1, in connect
socket.error: Connection refused
END (1)
could not contact scm server at localhost:7182, giving up
waiting for rollback request
Solution:
mv /usr/bin/host /usr/bin/host.bak
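In the log above, the installer reverse-resolves the server IP with `host -t PTR`, gets `localhost.` back, and then tries to reach the SCM server at localhost:7182, which can only fail. A minimal sketch, assuming the output format shown in that log (`... domain name pointer NAME.`), of parsing that line and flagging a loopback name before it gets used; `ptr_hostname` and `is_loopback_name` are hypothetical helpers, not Cloudera Manager functions.

```python
import re
from typing import Optional

# Matches the record format printed by `host -t PTR`, e.g.
# "198.1.168.192.in-addr.arpa domain name pointer localhost."
_PTR_RE = re.compile(r"domain name pointer\s+(\S+?)\.?\s*$")


def ptr_hostname(host_output: str) -> Optional[str]:
    """Return the first PTR target in `host -t PTR` output, without the trailing dot."""
    for line in host_output.splitlines():
        match = _PTR_RE.search(line.strip())
        if match:
            return match.group(1)
    return None


def is_loopback_name(name: Optional[str]) -> bool:
    """True if the name points back at the local machine rather than the server."""
    if name is None:
        return False
    return name.lower() in {"localhost", "localhost.localdomain"}
```

Fed the line from the log above, `ptr_hostname` returns `localhost` and `is_loopback_name` flags it, which is the misresolution this error comes from.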
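Similar error 2:
After the Agents are started, the "Currently Managed Hosts" list in the installation wizard does not show all the nodes, and each refresh shows a different set.
The Agent's error log looks like this: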
22681 MainThread agent ERROR Heartbeating to master:7182 failed.
Traceback (most recent call last):
  File "/home/opt/cm-5.2.0/lib64/cmf/agent/src/cmf/agent.py", line 820, in send_heartbeat
    response = self.requestor.request('heartbeat', dict(request=heartbeat))
  File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 139, in request
    return self.issue_request(call_request, message_name, request_datum)
  File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 255, in issue_request
    return self.read_call_response(message_name, buffer_decoder)
  File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 235, in read_call_response
    raise self.read_error(writers_schema, readers_schema, decoder)
  File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 244, in read_error
    return AvroRemoteException(datum_reader.read(decoder))
  File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/io.py", line 444, in read
    return self.read_data(self.writers_schema, self.readers_schema, decoder)
  File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/io.py", line 448, in read_data
    if not DatumReader.match_schemas(writers_schema, readers_schema):
  File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/io.py", line 379, in match_schemas
    w_type = writers_schema.type
AttributeError: 'NoneType' object has no attribute 'type'
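This is caused by starting the Agent on the master node and then copying it to the other nodes with scp. The first time the Agent starts, it generates a uuid, stored at: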
/opt/cm-5.1.3/lib/cloudera-scm-agent/uuid
As a result, the Agent on every machine has the same uuid, and the hosts get mixed up.
Solution:
Delete all files under
/opt/cm-5.1.3/lib/cloudera-scm-agent/
and clear the CM database on the master node.
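The root cause here is several hosts sharing one uuid file. A minimal sketch for spotting that before it causes trouble, assuming you have already collected each host's copy of the uuid file into a dict (fetching it, e.g. over ssh, is left out); `find_duplicate_uuids` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicate_uuids(uuid_by_host: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each uuid found on more than one host to the sorted list of those hosts.

    `uuid_by_host` maps hostname -> contents of that host's agent uuid file.
    """
    hosts_by_uuid: Dict[str, List[str]] = defaultdict(list)
    for host, raw in uuid_by_host.items():
        # Strip surrounding whitespace (e.g. a trailing newline) before comparing.
        hosts_by_uuid[raw.strip()].append(host)
    return {u: sorted(hosts) for u, hosts in hosts_by_uuid.items() if len(hosts) > 1}
```

An empty result means every agent has its own identity; any entry lists the hosts whose agent state directory should be cleared, as in the solution above.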
In reply to nextuser (2015-1-6 16:24):
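Thanks a lot for the pointers.
1. /usr/bin/host has been renamed.
2. The contents of /opt/cm-5.1.3/lib/cloudera-scm-agent/ have been cleared.
But I still can't start the agent, i.e. port 7182 still won't come up, and the error is the same as before.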
In reply to zhzhang (2015-1-6 15:46):
Check whether port 7182 is already in use by something else, and check the network, the firewall and so on.
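One way to check whether 7182 is already taken on the server host is to try binding it. A minimal sketch, assuming it runs on the Cloudera Manager Server host itself with the Server stopped (otherwise the Server's own listener will, correctly, show up as in use); `port_in_use` is a hypothetical helper. It only tells you that something is bound there, not which process; `netstat -tlnp` or `ss -tlnp` are the usual tools for that.

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding (host, port) fails because the address is already in use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # e.g. EACCES on a privileged port is a different problem
    finally:
        sock.close()
    return False
```

If `port_in_use(7182)` is False and the agents still get "Connection refused", the next suspects are the ones named above: the network path and firewall rules between agent and server.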