I think I've roughly figured out the cause of the problem, so let me write it up.
I deployed Hadoop in single-node mode on two machines overseas, and the DataNode simply would not start. On machines in China, both single-node and cluster deployments start fine. The log shows:
2014-08-28 16:25:01,404 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1752408998-174.36.220.60-1409214225154 (storage id DS-649638061-174.36.220.60-50010-1409214301088) service to /10.108.110.196:9000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-649638061-174.36.220.60-50010-1409214301088, infoPort=50075, ipcPort=50020, storageInfo=lv=-47;cid=CID-4f751648-b3c7-46f6-8a2a-e93ccfd6eca4;nsid=2017258318;c=0)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:739)
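A `DisallowedDatanodeException` at registration typically means the NameNode could not accept the address the DataNode presented, often because forward and reverse DNS for that address disagree. As a quick sanity check outside of Hadoop, one can reverse-resolve an IP and then forward-resolve the resulting name to see whether they round-trip (a generic sketch; the helper name is mine, not part of Hadoop):

```python
import socket

def check_reverse_forward(ip):
    """Reverse-resolve an IP, then forward-resolve the resulting name.

    If the forward lookup does not include the original IP, hostname-based
    registration checks like the NameNode's can fail the same way.
    """
    name, _aliases, _addrs = socket.gethostbyaddr(ip)   # reverse lookup
    forward = socket.gethostbyname_ex(name)[2]          # forward lookup
    return name, forward, ip in forward

# Loopback usually round-trips; a multi-homed public IP often does not.
name, forward, ok = check_reverse_forward("127.0.0.1")
print(name, forward, ok)
```

Running this against each address on the overseas machine (10.108.110.196 and 174.36.220.60) should show which one the hostname actually maps back to.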
The DataNode can't reach the matching NameNode identity. A possible cause: running `ifconfig` on the overseas machines shows two interfaces, while the domestic machines have only eth0:
eth0 Link encap:Ethernet HWaddr 06:E9:1C:B7:10:74
inet addr:10.108.110.196 Bcast:10.108.110.255 Mask:255.255.255.192
inet6 addr: fe80::4e9:1cff:feb7:1074/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:21936795 errors:0 dropped:0 overruns:0 frame:0
TX packets:37741271 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2204808776 (2.0 GiB) TX bytes:9076228898 (8.4 GiB)
Interrupt:16
eth1 Link encap:Ethernet HWaddr 06:DF:17:8D:2D:92
inet addr:174.36.220.60 Bcast:174.36.220.63 Mask:255.255.255.248
inet6 addr: fe80::4df:17ff:fe8d:2d92/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:63187461 errors:0 dropped:0 overruns:0 frame:0
TX packets:42452899 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:8504285706 (7.9 GiB) TX bytes:5766099342 (5.3 GiB)
Interrupt:15
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
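Given the interface listing above, the DataNode appears to have registered via eth1's public address (174.36.220.60 shows up in the block-pool and storage IDs), while the NameNode RPC endpoint is the internal 10.108.110.196 on eth0. For multi-homed hosts, two `hdfs-site.xml` properties are commonly suggested: pinning the DataNode's DNS lookup to one interface, and relaxing the NameNode's IP/hostname registration check. These property names are from Hadoop 2.x; this is a sketch to try, not a confirmed fix for this cluster:

```xml
<!-- hdfs-site.xml (sketch; verify these properties exist in your Hadoop version) -->
<property>
  <!-- Derive the DataNode's hostname from eth0, not eth1 -->
  <name>dfs.datanode.dns.interface</name>
  <value>eth0</value>
</property>
<property>
  <!-- Allow registration when the DataNode's IP and hostname do not
       reverse-resolve consistently (common on multi-homed machines) -->
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
</property>
```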
Following this line of thought, it may be that the two nodes in my Hadoop cluster were actually running fine, and only because of the external-network issue could they not be found; they lost contact with the NameNode, producing errors like:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /flume/14082811/test-.1409195936631.tmp could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
Admittedly this guess still has plenty of holes. For one, the internal/external network split on the overseas machines has no direct connection to the domestic cluster's failure to map DataNodes to the NameNode; I'm only extrapolating from the overseas case.
Right now, putting either the internal IP or the external IP in /etc/hosts makes no difference, and if I configure both the internal and external IPs together, then even the DataNode can't be found.
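On the hosts-file front, what usually works for multi-homed machines is mapping each node's hostname to exactly one address (the internal one the NameNode listens on) on every node, and making sure the hostname does not also resolve to 127.0.0.1 or to the public IP. A sketch, with a placeholder hostname since the real one isn't shown in the post:

```
# /etc/hosts (sketch; "hadoop-node1" is a placeholder hostname)
127.0.0.1       localhost
10.108.110.196  hadoop-node1
# Map the hostname to the internal address only; do NOT add a second
# line mapping the same hostname to the public IP 174.36.220.60.
```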
Experts: is my reasoning on the right track, and what should I try next?