Does HDFS accumulate a 64 MB block locally before uploading?
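Newbie here. If I put a file larger than 64 MB from the client's local disk into HDFS, does the client, as the documentation says, first accumulate 64 MB locally and then upload it to a datanode? If so, how can I observe that accumulation? Where is the local temp folder? For example, could I watch a folder that starts out empty, slowly grows to 64 MB, gets flushed to the datanode, and then becomes empty again?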
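Any pointers would be much appreciated!

>> As the documentation says, should it first accumulate 64 MB locally and then upload to a datanode?
That statement is wrong. The client accumulates one packet locally and then sends it to the datanode. The packet is the unit of transfer, typically around 64 KB. The accumulation does not write to disk.

Reply to 2# baggioss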
The documentation says:
A client request to create a file does not reach the NameNode immediately. In fact, initially the HDFS client caches the file data into a temporary local file. Application writes are transparently redirected to this temporary local file. When the local file accumulates data worth over one HDFS block size, the client contacts the NameNode.
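The block here is not a packet, though. I know a little about the packet you mention: its default is indeed 64 KB, and it is the buffer size used when writing to the datanodes through the pipeline.
Is my understanding correct?

The client contacts the namenode only once the local data file is larger than one HDFS block; that block size depends on the configuration.

The HDFS source code does not actually cache a 64 MB block locally on the client and then upload it to the nearest DataNode. It sends the data to the DataNode one packet at a time. The only constraint is that, after sending a whole block, the client must wait for the DataNode's replies for all of that block's packets before it moves on to the next block.

Reply to 5# dt_long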
I just read the source code, and that is indeed the case:

```java
// get packet to be sent.
one = dataQueue.getFirst();
long offsetInBlock = one.offsetInBlock;

// get new block from namenode.
if (blockStream == null) {
  LOG.debug("Allocating new block");
  nodes = nextBlockOutputStream(src);
  this.setName("DataStreamer for file " + src +
               " block " + block);
  response = new ResponseProcessor(nodes);
  response.start();
}

if (offsetInBlock >= blockSize) {
  throw new IOException("BlockSize " + blockSize +
                        " is smaller than data size. " +
                        " Offset of packet in block " +
                        offsetInBlock +
                        " Aborting file " + src);
}

ByteBuffer buf = one.getBuffer();

// move packet from dataQueue to ackQueue
dataQueue.removeFirst();
dataQueue.notifyAll();
synchronized (ackQueue) {
  ackQueue.addLast(one);
  ackQueue.notifyAll();
}

// write out data to remote datanode
blockStream.write(buf.array(), buf.position(), buf.remaining());
```
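To make the packet-versus-block relationship concrete, here is a small toy model of the loop above. It is only a sketch and uses no Hadoop classes: `PacketPipelineSketch`, `packetsFor`, and `stream` are hypothetical names, and the 64 MB block and 64 KB packet sizes are just the values discussed in this thread. It also enqueues a whole block's packets up front, whereas the real client enqueues packets as the application writes. What it shows is that nothing block-sized is staged locally: packets move from a data queue to an ack queue one by one, and the queue is only drained at each block boundary.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: no Hadoop classes are used and every name here is hypothetical.
class PacketPipelineSketch {

    /** Number of packets needed to carry `bytes` bytes, rounding up. */
    static long packetsFor(long bytes, long packetSize) {
        return (bytes + packetSize - 1) / packetSize;
    }

    /**
     * Simulates streaming a file packet by packet.
     * Returns one entry per block: the number of packets sent for that block.
     */
    static List<Long> stream(long fileSize, long blockSize, long packetSize) {
        List<Long> packetsPerBlock = new ArrayList<>();
        Deque<Long> dataQueue = new ArrayDeque<>(); // packets waiting to be sent
        Deque<Long> ackQueue = new ArrayDeque<>();  // packets sent, awaiting an ack

        long remaining = fileSize;
        while (remaining > 0) {
            long blockBytes = Math.min(remaining, blockSize);

            // Simplification: queue all packets of this block before sending.
            long left = blockBytes;
            while (left > 0) {
                long packetBytes = Math.min(left, packetSize);
                dataQueue.addLast(packetBytes);
                left -= packetBytes;
            }

            // "Send" each packet: move it from dataQueue to ackQueue.
            long sent = 0;
            while (!dataQueue.isEmpty()) {
                ackQueue.addLast(dataQueue.removeFirst());
                sent++;
            }

            // Block boundary: every outstanding packet must be acknowledged
            // before the next block is started.
            while (!ackQueue.isEmpty()) {
                ackQueue.removeFirst();
            }

            packetsPerBlock.add(sent);
            remaining -= blockBytes;
        }
        return packetsPerBlock;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 150 MB file with 64 MB blocks and 64 KB packets.
        System.out.println(stream(150 * mb, 64 * mb, 64 * 1024L)); // prints [1024, 1024, 352]
    }
}
```

With these numbers a full 64 MB block is 1024 packets of 64 KB, so a 150 MB file goes out as two full blocks plus a 22 MB tail of 352 packets.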