Running WordCount on Hadoop 2.7
Guiding questions:
1. How does this article resolve "Call From ubuntu to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused"?
2. What preparation was done before running WordCount?
3. How do you view the results?
Continuing from the previous article:
Hadoop 2.7 [single-node] standalone, pseudo-distributed, and distributed installation guide
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/aboutyun
hdfs dfs -put etc/hadoop input
#####################################
Note the working directory when executing this command:
hdfs dfs -put etc/hadoop input
It must be run from HADOOP_HOME, which here is ~/hadoop-2.7.0.
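For reference, a minimal sketch of the full sequence (assuming the aboutyun user from this article and that HDFS is already up; relative HDFS paths resolve to /user/aboutyun):
cd ~/hadoop-2.7.0                      # HADOOP_HOME
bin/hdfs dfs -mkdir -p /user/aboutyun  # -p also creates /user if it is missing
bin/hdfs dfs -put etc/hadoop input     # "input" resolves to /user/aboutyun/input
bin/hdfs dfs -ls input                 # verify the upload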
In my case, running the put command produced the following error:
put: File /user/aboutyun/input/yarn-env.sh._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
15/04/27 08:16:30 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/aboutyun/input/yarn-site.xml._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3067)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:722)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
        at org.apache.hadoop.ipc.Client.call(Client.java:1476)
        at org.apache.hadoop.ipc.Client.call(Client.java:1407)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1430)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1226)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
put: File /user/aboutyun/input/yarn-site.xml._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
jps showed all the processes still running, yet the put above failed, so I restarted the cluster. During shutdown the script reported "no datanode to stop": the DataNode had apparently become a zombie process. Starting again with
start-dfs.sh
still did not work, so I checked the DataNode log:
2015-04-27 08:28:05,274 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /tmp/hadoop-aboutyun/dfs/data/in_use.lock acquired by nodename 13969@ubuntu
2015-04-27 08:28:05,278 WARN org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Incompatible clusterIDs in /tmp/hadoop-aboutyun/dfs/data: namenode clusterID = CID-adabf762-f2f4-43b9-a807-7501f83a9176; datanode clusterID = CID-5c0474f8-7030-4fbc-bb79-6c9163afc5b8
2015-04-27 08:28:05,279 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:9000. Exiting.
java.io.IOException: All specified directories are failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1387)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1352)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:228)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:852)
        at java.lang.Thread.run(Thread.java:744)
2015-04-27 08:28:05,283 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:9000
2015-04-27 08:28:05,305 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
2015-04-27 08:28:07,306 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2015-04-27 08:28:07,309 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2015-04-27 08:28:07,310 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at ubuntu/127.0.1.1
************************************************************/
The log shows the DataNode's clusterID no longer matches the NameNode's. Go into /tmp/hadoop-aboutyun/dfs/data and edit the VERSION file so that its clusterID matches the namenode clusterID reported in the log.
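A minimal sketch of that edit, assuming the VERSION file sits in the usual current/ subdirectory (adjust the path if your layout differs):
# Overwrite the datanode clusterID with the namenode clusterID from the log above
sed -i 's/^clusterID=.*/clusterID=CID-adabf762-f2f4-43b9-a807-7501f83a9176/' \
    /tmp/hadoop-aboutyun/dfs/data/current/VERSION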
Then stop the cluster:
stop-dfs.sh
and start it again:
start-dfs.sh
Verification: when later stopping the cluster, the script now reports stopping the datanode, which shows the fix took effect.
---------------------------------------------------------------------------
Ran the command again: hdfs dfs -put etc/hadoop input, and hit the following error:
put: File /user/aboutyun/input/yarn-env.sh._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
15/04/27 08:16:30 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): (the same RemoteException and stack trace as above)
ls: Call From java.net.UnknownHostException: ubuntu: ubuntu to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Solution: edit /etc/hosts, and remember to comment out the 127.0.1.1 ubuntu line:
127.0.0.1 localhost
#127.0.1.1 ubuntu
10.0.0.81 ubuntu
Many people run into this error. Other reported fixes include disabling the firewall, but here the key was commenting out the 127.0.1.1 entry.
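A quick sanity check after editing /etc/hosts (a sketch; the expected address comes from the entry above):
getent hosts ubuntu   # should now print 10.0.0.81 ubuntu, not 127.0.1.1
hdfs dfs -ls /        # should no longer fail with ConnectException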
----------------------------------------------------------------------------
ls: Call From java.net.UnknownHostException: ubuntu: ubuntu to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Check the NameNode log: more hadoop-aboutyun-namenode-ubuntu.log
Directory /tmp/hadoop-aboutyun/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
So the name directory was missing. Create it by hand under /tmp/hadoop-aboutyun/dfs/ with mkdir name, then format the NameNode again. This time it succeeded.
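As a sketch, the recovery steps look like this (paths follow this article's default /tmp layout):
mkdir /tmp/hadoop-aboutyun/dfs/name   # recreate the missing storage directory
hdfs namenode -format                 # format the NameNode again
start-dfs.sh                          # restart HDFS
Note that formatting assigns a fresh clusterID, which is exactly what produced the earlier Incompatible clusterIDs error, so the DataNode's VERSION file may need the same fix afterwards.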
#####################################
Check the files uploaded earlier with hdfs dfs -put etc/hadoop input, then run the grep example. Mind the working directory here too: cd into HADOOP_HOME before executing:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'
Then print the result with:
hdfs dfs -cat /user/aboutyun/output/part-r-00000
The output:
6 dfs.audit.logger
4 dfs.class
3 dfs.server.namenode.
2 dfs.period
2 dfs.audit.log.maxfilesize
2 dfs.audit.log.maxbackupindex
1 dfsmetrics.log
1 dfsadmin
1 dfs.servers
1 dfs.replication
1 dfs.permissions
1 dfs.file
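For context, the grep example runs two MapReduce jobs (a search, then a sort of its results), and the final output lands in /user/aboutyun/output. It can also be inspected with relative paths:
hdfs dfs -ls output                 # same as /user/aboutyun/output
hdfs dfs -cat output/part-r-00000   # same listing as above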
##################################
Fetch the output to the local filesystem:
hdfs dfs -get output output
This hit a problem:
WARN hdfs.DFSClient: DFSInputStream has been closed already
Left to resolve later.
Many thanks. Following these two articles I successfully installed Hadoop 2.8. As for that final warning, the Apache issue tracker has a record of it:
https://issues.apache.org/jira/browse/HDFS-8099
The suggestion there is to patch the source directly, downgrading the log level from WARN to DEBUG. The patch:
diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
index cf8015f..9f7b15c 100644
--- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
+++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
@@ -666,7 +666,7 @@ private synchronized DatanodeInfo blockSeekTo(long target) throws IOException {
   @Override
   public synchronized void close() throws IOException {
     if (!closed.compareAndSet(false, true)) {
-      DFSClient.LOG.warn("DFSInputStream has been closed already");
+      DFSClient.LOG.debug("DFSInputStream has been closed already");
       return;
     }
     dfsClient.checkOpen();

Quoting the OP: "Go into /tmp/hadoop-aboutyun/dfs/data and edit the VERSION file". There is no such directory on my machine.
OK, found it.
Has the problem "WARN hdfs.DFSClient: DFSInputStream has been closed already" been solved?
Thanks for sharing!
After installing Hadoop 2.7.1, do the input and output directories already exist? I did not see a step where these two directories are created manually. (ableq, 2015-11-2 10:31)
Reply: input and output are paths inside HDFS.
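To illustrate that reply, a small sketch (assuming the aboutyun user from this article): relative HDFS paths are resolved against the user's HDFS home directory, not the local filesystem.
hdfs dfs -ls input                  # relative HDFS path
hdfs dfs -ls /user/aboutyun/input   # the equivalent absolute HDFS path
ls input                            # local filesystem; no such directory here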