问题是发生在测试集群上的。这个集群一共有5台虚拟机,2个NameNode,3个DataNode,Hadoop版本为2.2.0。
Hadoop已经连续运行了两个多月,平时就跑跑程序,一直没有事。但是今天在维护时候发生了问题:
- [tester@HDFS0 conf]$ stop-all.sh
- This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
- 15/02/09 17:30:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
- Stopping namenodes on [HDFS0.hs HDFS1.hs]
- HDFS0.hs: no namenode to stop
- HDFS1.hs: no namenode to stop
- HDFS4.hs: no datanode to stop
- HDFS3.hs: no datanode to stop
- HDFS2.hs: no datanode to stop
- Stopping journal nodes [HDFS2.hs HDFS3.hs HDFS4.hs]
- HDFS2.hs: no journalnode to stop
- HDFS3.hs: no journalnode to stop
- HDFS4.hs: no journalnode to stop
- 15/02/09 17:30:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
- Stopping ZK Failover Controllers on NN hosts [HDFS0.hs HDFS1.hs]
- HDFS1.hs: no zkfc to stop
- HDFS0.hs: no zkfc to stop
- stopping yarn daemons
- no resourcemanager to stop
- HDFS3.hs: no nodemanager to stop
- HDFS4.hs: no nodemanager to stop
- HDFS2.hs: no nodemanager to stop
- no proxyserver to stop
复制代码
stop-all.sh 并没有找到 pid,无法正常停止相应进程!
我已经在 hadoop-env.sh 和 yarn-env.sh 中设置好了pid的存储路径:
在 hadoop-env.sh 中设置如下:
- export HADOOP_PID_DIR=/home/tester/hadoop/hadoop-2.2.0/pids
复制代码
在yarn-env.sh 中设置如下:- export YARN_PID_DIR=/home/tester/hadoop/hadoop-2.2.0/pids
复制代码
进入目录中看到pid文件都在:
- [tester@HDFS0 pids]$ ls
- hadoop-tester-namenode.pid yarn-tester-resourcemanager.pid
- hadoop-tester-zkfc.pid
复制代码
其中pid分别为:NN 58096; RM 58431 ; ZKFC 58351 。
与jps中查看的pid并不一致!
- [tester@HDFS0 conf]$ jps
- 9225 Jps
- 60038 NameNode
- 60400 ResourceManager
- 60313 DFSZKFailoverController
复制代码
有遇到过类似问题的朋友吗?
|