STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
Because I added the paths and environment variables to .bashrc earlier, the command can also be run directly:
[zhouhh@Hadoop48 hadoop-1.0.3]$ hadoop namenode -format
This command formats the directory defined by dfs.name.dir in hdfs-site.xml, which holds the metadata the NameNode uses to track and coordinate the DataNodes.
[zhouhh@Hadoop48 ~]$ find myhadoop/
myhadoop/
myhadoop/dfs
myhadoop/dfs/name
myhadoop/dfs/name/previous.checkpoint
myhadoop/dfs/name/previous.checkpoint/fstime
myhadoop/dfs/name/previous.checkpoint/edits
myhadoop/dfs/name/previous.checkpoint/fsimage
myhadoop/dfs/name/previous.checkpoint/VERSION
myhadoop/dfs/name/image
myhadoop/dfs/name/image/fsimage
myhadoop/dfs/name/current
myhadoop/dfs/name/current/fstime
myhadoop/dfs/name/current/edits
myhadoop/dfs/name/current/fsimage
myhadoop/dfs/name/current/VERSION
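A quick way to confirm that the format step took effect is to look for the files shown in the listing above. The following is a minimal sketch; the function name is hypothetical, and it assumes that a formatted Hadoop 1.x name directory always contains current/VERSION and current/fsimage, as it does here.

```shell
# Hypothetical helper: succeed if the given dfs.name.dir looks formatted.
# Assumption: a formatted name directory contains current/VERSION and
# current/fsimage, as in the find listing above.
name_dir_is_formatted() {
    dir=$1
    [ -f "$dir/current/VERSION" ] && [ -f "$dir/current/fsimage" ]
}
```

For example: name_dir_is_formatted ~/myhadoop/dfs/name && echo "formatted"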
[zhouhh@Hadoop48 hadoop-1.0.3]$ start-dfs.sh
starting namenode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-namenode-Hadoop48.out
Hadoop46: Bad owner or permissions on /home/zhouhh/.ssh/config
Hadoop47: Bad owner or permissions on /home/zhouhh/.ssh/config
Hadoop48: Bad owner or permissions on /home/zhouhh/.ssh/config
[zhouhh@Hadoop48 .ssh]$ ls -l
total 20
-rw------- 1 zhouhh zhouhh 794 Apr 13 10:21 authorized_keys
-rw-rw-r-- 1 zhouhh zhouhh 288 May 23 10:37 config
It turns out the permissions on the config file are wrong: it is group-writable, and ssh refuses to use it.
[zhouhh@Hadoop48 .ssh]$ chmod 600 config
[zhouhh@Hadoop48 .ssh]$ ls -l
total 20
-rw------- 1 zhouhh zhouhh 794 Apr 13 10:21 authorized_keys
-rw------- 1 zhouhh zhouhh 288 May 23 10:37 config
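The same fix has to be repeated on every node that reported the error. A small sketch of how it could be wrapped up follows; both function names are hypothetical, and the permission string is read from ls rather than stat so that it looks the same as the listings above.

```shell
# Print the 10-character permission string of a file, e.g. -rw-rw-r--
perm_string() {
    ls -ld "$1" | cut -c1-10
}

# Hypothetical helper: make an SSH config file readable and writable
# by its owner only, and report the change.
fix_ssh_config_perms() {
    file=$1
    [ -e "$file" ] || return 1
    before=$(perm_string "$file")
    chmod 600 "$file" || return 1
    echo "$file: $before -> $(perm_string "$file")"
}
```

For example: fix_ssh_config_perms ~/.ssh/config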
[zhouhh@Hadoop48 ~]$ start-dfs.sh
starting namenode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-namenode-Hadoop48.out
Hadoop47: bash: line 0: cd: /home/zhouhh/hadoop-1.0.3/libexec/..: No such file or directory
Hadoop47: bash: /home/zhouhh/hadoop-1.0.3/bin/hadoop-daemon.sh: No such file or directory
Hadoop46: starting datanode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-datanode-Hadoop46.out
Hadoop48: starting secondarynamenode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-secondarynamenode-Hadoop48.out
start-dfs.sh starts the NameNode on the local machine and the DataNodes on the hosts listed in conf/slaves.
[zhouhh@Hadoop48 ~]$ ssh Hadoop47
Last login: Tue May 22 17:57:01 2012 from hadoop48
[zhouhh@Hadoop47 ~]$
[zhouhh@Hadoop47 hadoop-1.0.3]$ vi conf/hadoop-env.sh
Set $JAVA_HOME to the correct path.
Do the same on Hadoop46.
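Errors like the "No such file or directory" message from Hadoop47 are easier to catch before starting the daemons. The sketch below walks the hosts in a slaves file and checks that the Hadoop directory exists on each one. Both function names are hypothetical, and check_slaves assumes passwordless SSH and the same install path on every node.

```shell
# Print the hosts in a slaves file, skipping comments and blank lines.
list_slaves() {
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Hypothetical helper: for each slave, report whether the Hadoop
# directory exists there. Assumes passwordless SSH to every host.
check_slaves() {
    slaves_file=$1
    hadoop_dir=$2
    for host in $(list_slaves "$slaves_file"); do
        # -n keeps ssh from reading stdin inside the loop
        if ssh -n "$host" "test -d '$hadoop_dir'"; then
            echo "$host: ok"
        else
            echo "$host: missing $hadoop_dir"
        fi
    done
}
```

For example: check_slaves conf/slaves /home/zhouhh/hadoop-1.0.3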
[zhouhh@Hadoop48 ~]$ start-dfs.sh
starting namenode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-namenode-Hadoop48.out
Hadoop47: starting datanode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-datanode-Hadoop47.out
Hadoop46: starting datanode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-datanode-Hadoop46.out
Hadoop48: secondarynamenode running as process 23491. Stop it first.
HDFS is now running successfully.
Troubleshooting
[zhouhh@Hadoop47 logs]$ vi hadoop-zhouhh-datanode-Hadoop47.log
2012-05-23 17:17:14,230 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = Hadoop47/192.168.10.47
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf ... branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
************************************************************/
2012-05-23 17:17:14,762 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-05-23 17:17:14,772 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-05-23 17:17:14,772 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-05-23 17:17:14,772 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-05-23 17:17:14,907 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-05-23 17:17:15,064 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2012-05-23 17:17:15,187 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: file:///
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:162)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:198)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:228)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:222)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:337)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
2012-05-23 17:17:15,187 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at Hadoop47/192.168.10.47
************************************************************/
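Likewise, the relevant host and port need to be configured on this node: the exception shows that the DataNode resolved the NameNode address to file:///, the default value of fs.default.name.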
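The failure above comes down to a URI with no host:port part. A minimal sketch of that check is shown here so that a value can be tested before it goes into the config file. The function name is hypothetical, the hdfs://Hadoop48:54310 value is an assumption based on the port the DataNode log tries to reach, and the check insists on an explicit port, as used in this setup.

```shell
# Hypothetical helper: succeed if a URI has an explicit host:port
# authority, e.g. hdfs://Hadoop48:54310, and fail for file:///.
has_host_port() {
    printf '%s\n' "$1" |
        grep -Eq '^[A-Za-z][A-Za-z0-9+.-]*://[^/:]+:[0-9]+(/.*)?$'
}
```

For example: has_host_port "hdfs://Hadoop48:54310" && echo "ok"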
[zhouhh@Hadoop48 bin]$ start-mapred.sh
[zhouhh@Hadoop48 ~]$ ssh Hadoop46
Last login: Wed May 23 17:33:05 2012 from hadoop47
[zhouhh@Hadoop46 ~]$ cd hadoop-1.0.3/logs
[zhouhh@Hadoop46 logs]$ vi hadoop-zhouhh-datanode-Hadoop46.log
2012-05-23 17:38:46,062 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Hadoop48/192.168.10.48:54310. Already tried 0 time(s).
2012-05-23 17:38:47,065 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Hadoop48/192.168.10.48:54310. Already tried 1 time(s).
[zhouhh@Hadoop46 logs]$ vi hadoop-zhouhh-tasktracker-Hadoop46.log
2012-05-23 17:58:13,356 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54550: starting
2012-05-23 17:58:14,428 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Hadoop48/192.168.10.48:54311. Already tried 0 time(s).
2012-05-23 17:58:15,430 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Hadoop48/192.168.10.48:54311. Already tried 1 time(s).
[zhouhh@Hadoop48 conf]$ netstat -antp | grep 54310
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.10.48:54310 192.168.20.188:30300 ESTABLISHED 20469/python
[zhouhh@Hadoop48 conf]$ netstat -antp | grep 54311
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.10.48:54311 192.168.20.188:30300 TIME_WAIT -
It turns out both ports are in use by another process. Stop the Python program that is holding them.
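Picking the offending process out of the netstat output can be automated. The sketch below reads netstat -antp output and prints the state and PID/program of every socket whose local address uses a given port. The function name is hypothetical, and the column positions assume the Linux net-tools layout shown above.

```shell
# Hypothetical helper: read `netstat -antp` output on stdin and print
# "STATE PID/program" for each TCP socket whose LOCAL port matches.
# Assumes the net-tools column order: proto, recv-q, send-q, local,
# foreign, state, pid/program.
port_users() {
    awk -v port="$1" '
        $1 ~ /^tcp/ {
            n = split($4, parts, ":")      # local address is field 4
            if (parts[n] == port) print $6, $7
        }'
}
```

For example: netstat -antp | port_users 54310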
[zhouhh@Hadoop48 hadoop-1.0.3]$ stop-mapred.sh
[zhouhh@Hadoop48 hadoop-1.0.3]$ stop-dfs.sh
[zhouhh@Hadoop48 hadoop-1.0.3]$ start-dfs.sh
starting namenode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-namenode-Hadoop48.out
Hadoop47: starting datanode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-datanode-Hadoop47.out
Hadoop46: starting datanode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-datanode-Hadoop46.out
Hadoop48: starting secondarynamenode, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-secondarynamenode-Hadoop48.out
[zhouhh@Hadoop48 hadoop-1.0.3]$ netstat -antp | grep 54310
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.10.48:54310 0.0.0.0:* LISTEN 24716/java
tcp 0 0 192.168.10.48:51040 192.168.10.48:54310 TIME_WAIT -
tcp 0 0 192.168.10.48:51038 192.168.10.48:54310 TIME_WAIT -
tcp 0 0 192.168.10.48:54310 192.168.10.46:38202 ESTABLISHED 24716/java
[zhouhh@Hadoop48 hadoop-1.0.3]$ start-mapred.sh
starting jobtracker, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-jobtracker-Hadoop48.out
Hadoop46: starting tasktracker, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-tasktracker-Hadoop46.out
Hadoop47: starting tasktracker, logging to /home/zhouhh/hadoop-1.0.3/libexec/../logs/hadoop-zhouhh-tasktracker-Hadoop47.out
[zhouhh@Hadoop48 hadoop-1.0.3]$ netstat -antp | grep 54311
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.10.48:54311 0.0.0.0:* LISTEN 25238/java
tcp 0 0 192.168.10.48:54311 192.168.10.46:33561 ESTABLISHED 25238/java
tcp 0 0 192.168.10.48:54311 192.168.10.47:55277 ESTABLISHED 25238/java
Checking the DataNode log again shows that everything is now normal.
[zhouhh@Hadoop48 hadoop-1.0.3]$ jps
24716 NameNode
25625 Jps
25238 JobTracker
24909 SecondaryNameNode
[zhouhh@Hadoop46 ~]$ jps
10649 TaskTracker
10352 DataNode
10912 Jps
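The jps check can be scripted so that a missing daemon stands out. This is a sketch only; the function name is hypothetical, and the expected daemon names are the ones that appear in the jps output above.

```shell
# Hypothetical helper: read `jps` output on stdin and print the name of
# every expected daemon (given as arguments) that is NOT running.
missing_daemons() {
    running=$(awk '{print $2}')            # second column is the class name
    for name in "$@"; do
        # -x matches the whole line, so NameNode does not match
        # SecondaryNameNode
        printf '%s\n' "$running" | grep -qxF "$name" || echo "$name"
    done
}
```

On the master: jps | missing_daemons NameNode SecondaryNameNode JobTracker
On a slave: jps | missing_daemons DataNode TaskTracker
No output means every expected daemon was found.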
==========================
MapReduce test
==========================
[zhouhh@Hadoop48 ~]$ vi test.txt
a b c d
a b c d
aa bb cc dd
ee ff gg hh
Per the earlier .bashrc settings, fs is an alias for hadoop dfs,
and hls is an alias for hadoop dfs -ls.
[zhouhh@Hadoop48 hadoop-1.0.3]$ fs -put test.txt test.txt
[zhouhh@Hadoop48 hadoop-1.0.3]$ hls
Found 1 items
-rw-r--r-- 3 zhouhh supergroup 40 2012-05-23 19:39 /user/zhouhh/test.txt
Run the wordcount example to test MapReduce:
[zhouhh@Hadoop48 hadoop-1.0.3]$ ./bin/hadoop jar hadoop-examples-1.0.3.jar wordcount /user/zhouhh/test.txt output
12/05/23 19:40:52 INFO input.FileInputFormat: Total input paths to process : 1
12/05/23 19:40:52 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/05/23 19:40:52 WARN snappy.LoadSnappy: Snappy native library not loaded
12/05/23 19:40:52 INFO mapred.JobClient: Running job: job_201205231824_0001
12/05/23 19:40:53 INFO mapred.JobClient: map 0% reduce 0%
12/05/23 19:41:07 INFO mapred.JobClient: map 100% reduce 0%
12/05/23 19:41:19 INFO mapred.JobClient: map 100% reduce 100%
12/05/23 19:41:24 INFO mapred.JobClient: Job complete: job_201205231824_0001
12/05/23 19:41:24 INFO mapred.JobClient: Counters: 29
12/05/23 19:41:24 INFO mapred.JobClient: Job Counters
12/05/23 19:41:24 INFO mapred.JobClient: Launched reduce tasks=1
12/05/23 19:41:24 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=11561
12/05/23 19:41:24 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/05/23 19:41:24 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/05/23 19:41:24 INFO mapred.JobClient: Launched map tasks=1
12/05/23 19:41:24 INFO mapred.JobClient: Data-local map tasks=1
12/05/23 19:41:24 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9934
12/05/23 19:41:24 INFO mapred.JobClient: File Output Format Counters
12/05/23 19:41:24 INFO mapred.JobClient: Bytes Written=56
12/05/23 19:41:24 INFO mapred.JobClient: FileSystemCounters
12/05/23 19:41:24 INFO mapred.JobClient: FILE_BYTES_READ=110
12/05/23 19:41:24 INFO mapred.JobClient: HDFS_BYTES_READ=147
12/05/23 19:41:24 INFO mapred.JobClient: FILE_BYTES_WRITTEN=43581
12/05/23 19:41:24 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=56
12/05/23 19:41:24 INFO mapred.JobClient: File Input Format Counters
12/05/23 19:41:24 INFO mapred.JobClient: Bytes Read=40
12/05/23 19:41:24 INFO mapred.JobClient: Map-Reduce Framework
12/05/23 19:41:24 INFO mapred.JobClient: Map output materialized bytes=110
12/05/23 19:41:24 INFO mapred.JobClient: Map input records=4
12/05/23 19:41:24 INFO mapred.JobClient: Reduce shuffle bytes=110
12/05/23 19:41:24 INFO mapred.JobClient: Spilled Records=24
12/05/23 19:41:24 INFO mapred.JobClient: Map output bytes=104
12/05/23 19:41:24 INFO mapred.JobClient: CPU time spent (ms)=1490
12/05/23 19:41:24 INFO mapred.JobClient: Total committed heap usage (bytes)=194969600
12/05/23 19:41:24 INFO mapred.JobClient: Combine input records=16
12/05/23 19:41:24 INFO mapred.JobClient: SPLIT_RAW_BYTES=107
12/05/23 19:41:24 INFO mapred.JobClient: Reduce input records=12
12/05/23 19:41:24 INFO mapred.JobClient: Reduce input groups=12
12/05/23 19:41:24 INFO mapred.JobClient: Combine output records=12
12/05/23 19:41:24 INFO mapred.JobClient: Physical memory (bytes) snapshot=271958016
12/05/23 19:41:24 INFO mapred.JobClient: Reduce output records=12
12/05/23 19:41:24 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1126625280
12/05/23 19:41:24 INFO mapred.JobClient: Map output records=16
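As the timestamps show, the job took about half a minute for a 40-byte input, so it is not efficient at this scale, but it succeeded.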
[zhouhh@Hadoop48 ~]$ hls
Found 2 items
drwxr-xr-x - zhouhh supergroup 0 2012-05-23 19:41 /user/zhouhh/output
-rw-r--r-- 3 zhouhh supergroup 40 2012-05-23 19:39 /user/zhouhh/test.txt
The entries listed by hls live in the distributed file system (HDFS), not on the local disk.
[zhouhh@Hadoop48 ~]$ hadoop dfs -get /user/zhouhh/output .
[zhouhh@Hadoop48 ~]$ cat output/*
cat: output/_logs: Is a directory
a 2
aa 1
b 2
bb 1
c 2
cc 1
d 2
dd 1
ee 1
ff 1
gg 1
hh 1
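For an input this small, the result can be cross-checked locally without Hadoop. The sketch below counts whitespace-separated words with standard tools and prints them as word, tab, count, sorted in byte order. The function name is hypothetical, and it only splits on spaces, tabs and newlines, which is enough for test.txt but is not claimed to match the example job's tokenizer in every case.

```shell
# Hypothetical helper: count whitespace-separated words in a local file
# and print "word<TAB>count", sorted by word in byte order.
local_wordcount() {
    tr -s ' \t' '\n\n' < "$1" |      # one word per line
        grep -v '^$' |               # drop empty lines
        LC_ALL=C sort |
        uniq -c |
        awk '{print $2 "\t" $1}'     # uniq -c prints the count first
}
```

For example: local_wordcount test.txt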
Or view the output directly in HDFS, without copying it locally:
[zhouhh@Hadoop48 ~]$ hadoop dfs -cat output/*
cat: File does not exist: /user/zhouhh/output/_logs
a 2
aa 1
…
So the distributed Hadoop setup works.
I hope this helps.