2014-11-11 02:12:54,057 FATAL org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error encountered while tailing edits. Shutting down standby NN.
java.io.FileNotFoundException: File does not exist: /user/dong/data/dpp/classification/gender/vw-output-train/2014-10-30-research-with-confict-fix-bug-rerun/_temporary/1/_temporary/attempt_1415171013961_37060_m_000015_0/part-00015
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
...
复制代码
其中提到了"Unknown error encountered while tailing edits. Shutting down standby NN."
于是,我们尝试启动Standby Namenode服务,结果报告以下错误:
2014-11-12 04:26:28,860 INFO org.apache.hadoop.hdfs.server.namenode.EditLogInputStream: Fast-forwarding stream 'http://idc2-server10.heylinux.com:8480/getJournal?jid=idc2&segmentTxId=240823073&storageInfo=-55%3A1838233660%3A0%3ACID-d77ea84b-1b24-4bc2-ad27-7d2384d222d6' to transaction ID 240741256
java.io.FileNotFoundException: File does not exist: /user/dong/data/dpp/classification/gender/vw-output-train/2014-10-30-research-with-confict-fix-bug-rerun/_temporary/1/_temporary/attempt_1415171013961_37060_m_000015_0/part-00015
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
java.io.FileNotFoundException: File does not exist: /user/dong/data/dpp/classification/gender/vw-output-train/2014-10-30-research-with-confict-fix-bug-rerun/_temporary/1/_temporary/attempt_1415171013961_37060_m_000015_0/part-00015
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
复制代码
说找不到"/user/dong/data/dpp/classification/gender/vw-output-train/2014-10-30-research-with-confict-fix-bug-rerun/_temporary/1/_temporary/attempt_1415171013961_37060_m_000015_0/part-00015"这个文件。
而事实上,这个文件是临时文件,不重要并且已经被删除了。
但在上面,却报告"ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp",可以看出是在加载edits文件,执行OP_CLOSE操作时提示找不到文件。
In case there is some problem with hadoop cluster and the edits file is corrupted it is possible to save at least part of the edits file that is correct. This can be done by converting the binary edits to XML, edit it manually and then convert it back to binary.
通过以上描述,我们了解到edits文件可能会corrupted,而反编译之后手动修改,在编译回二进制格式进行替换,可以作为一种解决办法。
于是我们将上面找到的两个关联edits文件,将其复制出来并进行了反编译:
可以看到,对于"/user/dong/data/dpp/classification/gender/vw-output-train/2014-10-30-research-with-confict-fix-bug-rerun/_temporary/1/_temporary/attempt_1415171013961_37060_m_000015_0/part-00015",OP_DELETE发生在了OP_CLOSE之前,因此执行OP_CLOSE时会提示"File does not exist"。
2014-11-12 08:57:13,053 INFO org.apache.hadoop.hdfs.server.namenode.EditLogInputStream: Fast-forwarding stream 'http://idc2-server1.heylinux.com:8480/getJournal?jid=idc2prod&segmentTxId=240823073&storageInfo=-55%3A1838233660%3A0%3ACID-d77ea84b-1b24-4bc2-ad27-7d2384d222d6' to transaction ID 240299210
java.io.FileNotFoundException: File does not exist: /user/dong/data/dpp/classification/gender/vw-output-train/2014-10-30-research-with-confict-fix-bug-rerun/_temporary/1/_temporary/attempt_1415171013961_37060_m_000018_0/part-00018
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
java.io.FileNotFoundException: File does not exist: /user/dong/data/dpp/classification/gender/vw-output-train/2014-10-30-research-with-confict-fix-bug-rerun/_temporary/1/_temporary/attempt_1415171013961_37060_m_000018_0/part-00018
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)