zhzhang posted on 2015-1-6 09:46:40

Cloudera Manager reports a Host Monitor error; any expert advice appreciated

Cloudera Manager reports a Host Monitor error

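nextuser posted on 2015-1-6 18:11:08

zhzhang posted on 2015-1-6 17:13
Thanks a lot for the pointers. 1. /usr/bin/host has been renamed.
2. The contents of /opt/cm-5.1.3/lib/cloudera-scm-agent/ have also been ...

Uninstall the agent, then reinstall it.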

zhzhang posted on 2015-1-6 09:47:29

I don't remember doing anything, but the UI just stopped displaying. Any help appreciated!!!

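nextuser posted on 2015-1-6 12:03:39

zhzhang posted on 2015-1-6 09:47
I don't remember doing anything, but the UI just stopped displaying. Any help appreciated!!!

The Host Monitor service isn't running. Try restarting it.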

zhzhang posted on 2015-1-6 15:29:39

nextuser posted on 2015-1-6 12:03
The Host Monitor service isn't running. Try restarting it.

It won't start, which is why I'm stuck.

bioger_hit posted on 2015-1-6 15:38:55

zhzhang posted on 2015-1-6 15:29
It won't start, which is why I'm stuck.

If it won't start, check the logs.

zhzhang posted on 2015-1-6 15:46:44

bioger_hit posted on 2015-1-6 15:38
If it won't start, check the logs.

3293 MainThread agent      ERROR    Heartbeating to 192.168.1.110:7182 failed.
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 815, in send_heartbeat
    self.master_port)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 464, in __init__
    self.conn.connect()
File "/usr/lib64/python2.6/httplib.py", line 720, in connect
    self.timeout)
File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
    raise error, msg
error: Connection refused

The log shows the error above. What is port 7182? I've found that nothing comes up on 7182.

nextuser posted on 2015-1-6 16:24:22

zhzhang posted on 2015-1-6 15:46
3293 MainThread agent      ERROR    Heartbeating to 192.168.1.110:7 ...

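7182 is the agent communication port (the port on the Cloudera Manager Server that agents send their heartbeats to).
First check whether the agent has died.

If it hasn't, compare against the errors below:

Similar error 1: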
Detecting Cloudera Manager Server...
Detecting Cloudera Manager Server...
BEGIN host -t PTR 192.168.1.198
198.1.168.192.in-addr.arpa domain name pointer localhost.
END (0)
using localhost as scm server hostname
BEGIN which python
/usr/bin/python
END (0)
BEGIN python -c 'import socket; import sys; s = socket.socket(socket.AF_INET); s.settimeout(5.0); s.connect((sys.argv[1], int(sys.argv[2]))); s.close();' localhost 7182
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "<string>", line 1, in connect
socket.error: Connection refused
END (1)
could not contact scm server at localhost:7182, giving up
waiting for rollback request
Fix:

mv /usr/bin/host /usr/bin/host.bak
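
In the log above, the PTR lookup for 192.168.1.198 came back as localhost, so the installer went looking for the server at localhost:7182. Before renaming the binary, it may be worth checking what an address reverse-resolves to. A minimal Python 3 sketch, not something from Cloudera: the function name `reverse_name_is_loopback` and its `resolver` parameter are made up for illustration. Note that `socket.gethostbyaddr` goes through the system resolver (so /etc/hosts counts too) and may not give exactly the same answer as the `host` command.

```python
import socket

# Names that indicate the reverse lookup points back at the local machine.
LOOPBACK_NAMES = {"localhost", "localhost.localdomain"}

def reverse_name_is_loopback(ip, resolver=socket.gethostbyaddr):
    """Return (name, is_loopback) for the reverse lookup of ip.

    name is None when the lookup fails. `resolver` is injectable so the
    logic can be exercised without touching real DNS.
    """
    try:
        name, _aliases, _addrs = resolver(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None, False
    cleaned = name.rstrip(".").lower()
    return cleaned, cleaned in LOOPBACK_NAMES

# Usage (192.168.1.198 is the address from the log above; use your own):
#   name, bad = reverse_name_is_loopback("192.168.1.198")
#   print("reverse name:", name, "| loopback name:", bad)
```

If the second value is True for the server's address, the reverse record (or the /etc/hosts entry) is probably what needs fixing.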



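Similar error 2:

After the Agent starts, the "Currently Managed Hosts" list in the installation step shows only some of the nodes, and the set shown changes on every refresh.
The Agent's error log looks like this: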

22681 MainThread agent ERROR Heartbeating to master:7182 failed.
Traceback (most recent call last):
File "/home/opt/cm-5.2.0/lib64/cmf/agent/src/cmf/agent.py", line 820, in send_heartbeat
    response = self.requestor.request('heartbeat', dict(request=heartbeat))
File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 139, in request
    return self.issue_request(call_request, message_name, request_datum)
File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 255, in issue_request
    return self.read_call_response(message_name, buffer_decoder)
File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 235, in read_call_response
    raise self.read_error(writers_schema, readers_schema, decoder)
File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 244, in read_error
    return AvroRemoteException(datum_reader.read(decoder))
File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/io.py", line 444, in read
    return self.read_data(self.writers_schema, self.readers_schema, decoder)
File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/io.py", line 448, in read_data
    if not DatumReader.match_schemas(writers_schema, readers_schema):
File "/home/opt/cm-5.2.0/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/io.py", line 379, in match_schemas
    w_type = writers_schema.type
AttributeError: 'NoneType' object has no attribute 'type'
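This happens when the Agent was started on the master node and then copied to the other nodes with scp. On its first start the Agent generates a uuid, stored at:
/opt/cm-5.1.3/lib/cloudera-scm-agent/uuid
As a result, the Agent on every machine ends up with the same uuid, and the hosts get mixed up.

Fix:
Delete all files under
/opt/cm-5.1.3/lib/cloudera-scm-agent/
and clear the CM database on the master node.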
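
One way to confirm the shared-uuid diagnosis before wiping anything is to collect the uuid file from each node and look for repeats. A minimal Python 3 sketch: the function name `find_duplicate_uuids`, the host names, and the uuid values are all made up for illustration, and how you gather the values (ssh, scp, by hand) is up to you.

```python
from collections import defaultdict

def find_duplicate_uuids(uuid_by_host):
    """Group hosts by agent uuid; return only the uuids used by 2+ hosts."""
    hosts_by_uuid = defaultdict(list)
    for host, uuid in uuid_by_host.items():
        # strip() drops the trailing newline a file read usually leaves behind
        hosts_by_uuid[uuid.strip()].append(host)
    return {u: sorted(h) for u, h in hosts_by_uuid.items() if len(h) > 1}

# Made-up example values, standing in for the contents of
# /opt/cm-5.1.3/lib/cloudera-scm-agent/uuid read from each node.
collected = {
    "master": "aaaa-1111\n",
    "slave1": "aaaa-1111\n",
    "slave2": "bbbb-2222\n",
}
print(find_duplicate_uuids(collected))
# prints {'aaaa-1111': ['master', 'slave1']}
```

An empty result would suggest the uuids are already distinct and the problem lies elsewhere.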

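zhzhang posted on 2015-1-6 17:13:42

nextuser posted on 2015-1-6 16:24
7182 is the agent communication port
First check whether the agent has died

Thanks a lot for the pointers. 1. /usr/bin/host has been renamed.
2. The contents of /opt/cm-5.1.3/lib/cloudera-scm-agent/ have also been cleared.
But I still can't get the agent started, that is, port 7182 still isn't up, and the error is the same as before.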

bioger_hit posted on 2015-1-6 23:20:22

zhzhang posted on 2015-1-6 15:46
3293 MainThread agent      ERROR    Heartbeating to 192.168.1.110:7 ...

Check whether port 7182 is already in use, and check the network, firewall settings, and so on.
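
For the port check, a minimal Python 3 sketch that performs the same kind of TCP connect the installer log does, so you can tell "refused" apart from "timed out". The function name `probe_port` is made up for illustration; 192.168.1.110 and 7182 are the values from the agent log earlier in the thread.

```python
import socket

def probe_port(host, port, timeout=5.0):
    """Try a TCP connect; return 'open', 'refused', 'timeout', or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host or name resolution failure.
        return "error: %s" % exc

# Usage, run from the machine whose agent is failing:
#   print(probe_port("192.168.1.110", 7182))
```

A "refused" result usually means nothing is listening on that port on the target host, while "timeout" more often points at a firewall or routing problem.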