Help needed: an overseas Flume host gets errors when collecting data into a Hadoop cluster in China
Any help from the experts would be much appreciated; I have not been able to solve this.
2014-08-28 11:19:02,380 (pool-5-thread-1) Preparing to move file /data/flume/event_log/impression_washington_1_201408272358R to /data/flume/event_log/impression_washington_1_201408272358R.COMPLETED
2014-08-28 11:19:02,388 (pool-5-thread-1) Preparing to move file /data/flume/event_log/impression_washington_1_201408272359R to /data/flume/event_log/impression_washington_1_201408272359R.COMPLETED
2014-08-28 11:20:02,665 (Thread-5) Exception in createBlockOutputStream
java.net.ConnectException: Connection timed out
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1305)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1128)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
2014-08-28 11:20:02,667 (Thread-5) Abandoning BP-1598070838-10.7.3.83-1408589886538:blk_1073808091_67267
2014-08-28 11:20:03,008 (Thread-5) Excluding datanode 10.7.3.83:50010
2014-08-28 11:21:06,351 (Thread-5) Exception in createBlockOutputStream
java.net.ConnectException: Connection timed out
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1305)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1128)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
2014-08-28 11:21:06,352 (Thread-5) Abandoning BP-1598070838-10.7.3.83-1408589886538:blk_1073808092_67268
2014-08-28 11:21:06,686 (Thread-5) Excluding datanode 10.7.7.75:50010
2014-08-28 11:21:07,035 (Thread-5) DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /flume/14082811/test-.1409195936631.tmp could only be replicated to 0 nodes instead of minReplication (=1).There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
at org.apache.hadoop.ipc.Client.call(Client.java:1347)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
2014-08-28 11:21:07,036 (hdfs-k2-call-runner-1) Error while syncing
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /flume/14082811/test-.1409195936631.tmp could only be replicated to 0 nodes instead of minReplication (=1).There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
at org.apache.hadoop.ipc.Client.call(Client.java:1347)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
2014-08-28 11:21:07,038 (SinkRunner-PollingRunner-DefaultSinkProcessor) HDFS IO error
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /flume/14082811/test-.1409195936631.tmp could only be replicated to 0 nodes instead of minReplication (=1).There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
at org.apache.hadoop.ipc.Client.call(Client.java:1347)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
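Reading the log above: the addBlock call does reach the NameNode, but each attempt to open a connection to a DataNode on port 50010 times out, and the client then excludes that DataNode. A minimal Python sketch for pulling the excluded addresses out of a Flume log and flagging private ones follows. The function name and the sample string are illustrative only; the two sample lines are the exclusion lines from the log above.

```python
import ipaddress
import re

# Matches the client-side "Excluding datanode <ip>:<port>" lines.
EXCLUDED = re.compile(r"Excluding datanode (\d+\.\d+\.\d+\.\d+):(\d+)")

def excluded_datanodes(log_text):
    """Return (ip, port, is_private) for every DataNode the client excluded."""
    return [
        (ip, int(port), ipaddress.ip_address(ip).is_private)
        for ip, port in EXCLUDED.findall(log_text)
    ]

sample = """\
2014-08-28 11:20:03,008 (Thread-5) Excluding datanode 10.7.3.83:50010
2014-08-28 11:21:06,686 (Thread-5) Excluding datanode 10.7.7.75:50010
"""

for ip, port, private in excluded_datanodes(sample):
    print(ip, port, "private address" if private else "public address")
```

Both excluded addresses are in the private 10.x range, which suggests (but does not prove) that the overseas host is being handed internal addresses it cannot route to.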
Flume configuration:
#agent
a1.sources = r2
a1.sinks = k2
a1.channels = c2
#source
#agent_dpi03_hdfs.sources.r2.type = avro
#agent_dpi03_hdfs.sources.r2.channels = c2
#agent_dpi03_hdfs.sources.r2.bind = test00
#agent_dpi03_hdfs.sources.r2.port = 4545
#agent_dpi03_hdfs.sources.r2.batchSize = 100
a1.sources.r2.type = spooldir
a1.sources.r2.spoolDir = /data/tukmob/event_log
a1.sources.r2.channels = c2
a1.sources.r2.batchSize = 1000
#channel
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000000
a1.channels.c2.transactionCapacity = 1000
a1.channels.c2.byteCapacity = 2000000000
a1.channels.c2.byteCapacityBufferPercentage = 50
#sink
a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c2
a1.sinks.k2.hdfs.fileType = DataStream
a1.sinks.k2.hdfs.path = hdfs://zhangxin@10.7.3.83:9000/flume/%y%m%d%H
a1.sinks.k2.hdfs.filePrefix = test-
a1.sinks.k2.hdfs.rollInterval = 300
a1.sinks.k2.hdfs.rollSize = 0
a1.sinks.k2.hdfs.rollCount = 0
a1.sinks.k2.hdfs.batchSize = 1000
a1.sinks.k2.hdfs.callTimeout = 3600000
a1.sinks.k2.hdfs.round = true
a1.sinks.k2.hdfs.roundValue = 1
a1.sinks.k2.hdfs.roundUnit = hour
a1.sinks.k2.hdfs.useLocalTimeStamp = true
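From the above, two things stand out:
1. Your network link is unreliable and may drop connections from time to time.
2. Your datanodes were excluded from the write, which means they may be down.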
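In reply to howtodown (posted 2014-8-28 18:45):
Yes, the network is very slow and data may get lost.
The datanodes are running fine, though: collectors located in China write to the Hadoop cluster without any problem,
and the collector machines are configured identically.
I have not been able to solve this,
and the project cannot move forward.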
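So far I have done the following:
1. Deleted the temporary files and everything generated by earlier runs, then reformatted.
2. Increased the timeouts.
3. Raised the heartbeat interval from the default 3 s to 15 s.
4. Confirmed that both datanodes are alive and healthy, with plenty of free capacity.
5. Opened all ports on the overseas collector machine.
6. Turned the firewall off as well.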
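Open ports and a disabled firewall on the collector do not by themselves show that the collector can reach the cluster. A small Python sketch for testing plain TCP reachability from the overseas machine is given here; the function names are made up for illustration, and the addresses in the usage comment are the ones that appear in the log and configuration above.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unroutable addresses.
        return False

def check_all(targets, timeout=5.0):
    """Map each (host, port) pair to True (reachable) or False (unreachable)."""
    return {target: can_connect(target[0], target[1], timeout) for target in targets}

# Example usage, to be run on the overseas collector:
# check_all([("10.7.3.83", 9000), ("10.7.3.83", 50010), ("10.7.7.75", 50010)])
```

If port 9000 turns out to be reachable while port 50010 is not, that would match the pattern in the log, where the NameNode call succeeds and the DataNode connections time out.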
In reply to 小小布衣 (posted 2014-8-28 19:07):
How many nodes do you have? At least two of them are not working properly; the others should be fine.
The cluster is currently on a trial run with only two nodes:
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Configured Capacity: 2113788182528 (1.92 TB)
Present Capacity: 1995335450624 (1.81 TB)
DFS Remaining: 1991648043008 (1.81 TB)
DFS Used: 3687407616 (3.43 GB)
DFS Used%: 0.18%
Under replicated blocks: 57
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)
Live datanodes:
Name: 10.7.3.83:50010 (tukmob1)
Hostname: tukmob1
Decommission Status : Normal
Configured Capacity: 1056894091264 (984.31 GB)
DFS Used: 1843703808 (1.72 GB)
Non DFS Used: 55596253184 (51.78 GB)
DFS Remaining: 999454134272 (930.81 GB)
DFS Used%: 0.17%
DFS Remaining%: 94.57%
Last contact: Thu Aug 28 19:25:44 CST 2014
Name: 10.7.7.75:50010 (tukmob2)
Hostname: tukmob2
Decommission Status : Normal
Configured Capacity: 1056894091264 (984.31 GB)
DFS Used: 1843703808 (1.72 GB)
Non DFS Used: 62856478720 (58.54 GB)
DFS Remaining: 992193908736 (924.05 GB)
DFS Used%: 0.17%
DFS Remaining%: 93.88%
Last contact: Thu Aug 28 19:25:44 CST 2014
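The report above shows both nodes as live from the NameNode's point of view; it says nothing about whether a remote client can reach them. As a rough sketch, the Python snippet below extracts the address and hostname of each datanode from such a report, so that the same list can be tested from the collector side. The function name is hypothetical, and the sample is trimmed from the report above.

```python
import re

# Matches lines such as "Name: 10.7.3.83:50010 (tukmob1)".
NAME_LINE = re.compile(r"^Name:\s+(\S+):(\d+)\s+\((\S+)\)\s*$", re.MULTILINE)

def datanodes_from_report(report):
    """Return (hostname, ip, port) for each datanode listed in the report."""
    return [(host, ip, int(port)) for ip, port, host in NAME_LINE.findall(report)]

sample = """\
Name: 10.7.3.83:50010 (tukmob1)
Hostname: tukmob1
Name: 10.7.7.75:50010 (tukmob2)
Hostname: tukmob2
"""

for host, ip, port in datanodes_from_report(sample):
    print(host, ip, port)
```

These are the same two addresses that the Flume log reports as excluded.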
In reply to 小小布衣 (posted 2014-8-28 19:26):
The report shows them as healthy, but I still think something is wrong with these two nodes.
Check them carefully.
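Apart from a problem with the nodes, could this be related to the external address in my setting a1.sinks.k2.hdfs.path = hdfs://zhangxin@14.18.203.70:9000/flume/%y%m%d%H?
Or do I need to map the external IPs to the internal IPs in /etc/hosts? I tried adding such a mapping, and after that the datanodes would not start.
Are there any other ideas, for example using avro to get the data onto the filesystem where Hadoop lives?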
I have already tried all of these.
In reply to 小小布衣 (posted 2014-8-28 20:28):
Have you tried uploading a file yourself? What message does it report?
In reply to sstutu (posted 2014-8-28 21:20):
Uploading, downloading, and deleting files on HDFS by hand all work.
In reply to 小小布衣 (posted 2014-8-28 23:43):
Then the datanodes themselves can be ruled out as the problem.
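With the datanodes ruled out, one question raised earlier in the thread remains open: whether the overseas client needs a mapping between the datanodes' hostnames and externally reachable addresses. As a sketch only, the Python snippet below renders /etc/hosts lines for the client machine from a hostname-to-IP table. The hostnames tukmob1 and tukmob2 come from the dfsadmin report; the 203.0.113.x addresses are placeholders from the documentation range, not the cluster's real public IPs, and the function name is made up. Such a mapping would probably only matter if the HDFS client connects to datanodes by hostname (Hadoop 2.x has a client setting named dfs.client.use.datanode.hostname for this); whether that resolves this particular case is an assumption that would need testing.

```python
def hosts_lines(public_ips):
    """Render /etc/hosts lines ("<ip><TAB><hostname>") for the client machine."""
    return [f"{ip}\t{host}" for host, ip in sorted(public_ips.items())]

# Hostnames are from the dfsadmin report; the IPs are PLACEHOLDERS, not real addresses.
placeholder_ips = {"tukmob1": "203.0.113.10", "tukmob2": "203.0.113.11"}

print("\n".join(hosts_lines(placeholder_ips)))
```

Note that this is meant for the collector's hosts file only; the thread reports that changing the mapping on the cluster side stopped the datanodes from starting.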