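1. Hadoop stores files in HDFS as blocks. After uploading a file, you can use fsck to see which DataNodes hold its blocks and what the block IDs are: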
[root@master softpackage]# hadoop fs -put scala-2.10.4.tgz /
[root@master softpackage]# hadoop fsck /scala-2.10.4.tgz -files -locations -blocks
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Connecting to namenode via http://master:50070/fsck?ugi=root&files=1&locations=1&blocks=1&path=%2Fscala-2.10.4.tgz
FSCK started by root (auth:SIMPLE) from /192.168.86.133 for path /scala-2.10.4.tgz at Fri Jun 09 11:14:14 EDT 2017
/scala-2.10.4.tgz 29937534 bytes, 1 block(s): Under replicated BP-1810807976-192.168.86.133-1496888566245:blk_1073741829_1005. Target Replicas is 3 but found 2 replica(s).
0. BP-1810807976-192.168.86.133-1496888566245:blk_1073741829_1005 len=29937534 repl=2 [DatanodeInfoWithStorage[192.168.86.132:50010,DS-ead6ac48-ce41-4133-9552-ec5ca51a6204,DISK], DatanodeInfoWithStorage[192.168.86.134:50010,DS-5059d0f7-4e64-4554-aa92-375a1fe573b8,DISK]]
Status: HEALTHY
Total size: 29937534 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 29937534 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 1 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 1 (33.333332 %)
Number of data-nodes: 2
Number of racks: 1
FSCK ended at Fri Jun 09 11:14:14 EDT 2017 in 2 milliseconds
The filesystem under path '/scala-2.10.4.tgz' is HEALTHY
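The block line in the report above carries everything needed for the next step: the block pool, the block ID, the generation stamp, the length, and the DataNode addresses. The following is only a minimal sketch of pulling those fields out with regular expressions. The function name `parse_block_line` and the patterns are illustrative assumptions fitted to the output shown here, and other Hadoop versions may format this line differently.

```python
import re

# Sample block line copied from the fsck output shown above.
FSCK_LINE = (
    "0. BP-1810807976-192.168.86.133-1496888566245:blk_1073741829_1005 "
    "len=29937534 repl=2 "
    "[DatanodeInfoWithStorage[192.168.86.132:50010,"
    "DS-ead6ac48-ce41-4133-9552-ec5ca51a6204,DISK], "
    "DatanodeInfoWithStorage[192.168.86.134:50010,"
    "DS-5059d0f7-4e64-4554-aa92-375a1fe573b8,DISK]]"
)

# Patterns are assumptions based on the line format above, not an official grammar.
BLOCK_RE = re.compile(
    r"^\d+\.\s+(?P<pool>BP-[\d.\-]+):(?P<block>blk_\d+)_(?P<genstamp>\d+)\s+"
    r"len=(?P<length>\d+)\s+repl=(?P<repl>\d+)"
)
NODE_RE = re.compile(r"DatanodeInfoWithStorage\[([\d.]+:\d+),")


def parse_block_line(line):
    """Return a dict describing one fsck block line, or None if it does not match."""
    match = BLOCK_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["length"] = int(info["length"])
    info["repl"] = int(info["repl"])
    # Each replica appears as DatanodeInfoWithStorage[ip:port,storage-id,type].
    info["datanodes"] = NODE_RE.findall(line)
    return info


if __name__ == "__main__":
    print(parse_block_line(FSCK_LINE))
```

The `block` field (here `blk_1073741829`) is the file name to look for on the DataNodes listed in `datanodes`.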
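The report shows a single block, blk_1073741829, with replicas on 192.168.86.132 and 192.168.86.134. It is flagged as under-replicated because the target replication factor is 3 but the cluster has only 2 DataNodes. Next, log in to a DataNode and find the block file on disk: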
[root@slave2 subdir0]# du -sh *
224K blk_1073741825
4.0K blk_1073741825_1001.meta
4.0K blk_1073741827
4.0K blk_1073741827_1003.meta
4.0K blk_1073741828
4.0K blk_1073741828_1004.meta
29M blk_1073741829
232K blk_1073741829_1005.meta
[root@slave2 subdir0]# pwd
/opt/hadoop/dfs/data/current/BP-1810807976-192.168.86.133-1496888566245/current/finalized/subdir0/subdir0
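Rather than navigating into the subdir directories by hand, the block file and its .meta companion can be found by walking the DataNode data directory. This is a rough sketch only: `find_block_files` is a hypothetical helper, and the data directory `/opt/hadoop/dfs/data` is taken from the pwd output above, so it will differ on clusters with a different data directory setting.

```python
import os


def find_block_files(data_dir, block_id):
    """Walk data_dir and return paths of the block file and its .meta file.

    block_id is the name without the generation stamp, e.g. "blk_1073741829".
    """
    matches = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            is_block = name == block_id
            # The meta file is named <block_id>_<generation stamp>.meta.
            is_meta = name.startswith(block_id + "_") and name.endswith(".meta")
            if is_block or is_meta:
                matches.append(os.path.join(root, name))
    return sorted(matches)


if __name__ == "__main__":
    # Assumed data directory, taken from the pwd output above.
    for path in find_block_files("/opt/hadoop/dfs/data", "blk_1073741829"):
        print(path, os.path.getsize(path), "bytes")
```

Run on the DataNode itself. For the upload above, the block file size should match the 29937534 bytes reported by fsck.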
For reference, see:
http://www.myexception.cn/database/1997522.html