Hadoop block question: how is a file larger than 64 MB split into blocks?
As the title says: when a file uploaded to HDFS is larger than the default 64 MB block size, how is it split? For example, is a 100 MB file split into a 64 MB block plus a 36 MB block, or into two blocks of some other sizes that add up to 100 MB? Any guidance appreciated, and source code references would be even better. Many thanks! (First post, asking for pointers.)

Reply: 64 MB plus 36 MB. The relevant logic is in DFSClient's writeChunk():

```java
if (currentPacket.numChunks == currentPacket.maxChunks
    || bytesCurBlock == blockSize) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("DFSClient writeChunk packet full seqno="
        + currentPacket.seqno + ", src=" + src + ", bytesCurBlock="
        + bytesCurBlock + ", blockSize=" + blockSize + ", appendChunk="
        + appendChunk);
  }
  //
  // if we allocated a new packet because we encountered a block
  // boundary, reset bytesCurBlock.
  //
  if (bytesCurBlock == blockSize) {
    currentPacket.lastPacketInBlock = true;
    bytesCurBlock = 0;
    lastFlushOffset = 0;
  }
  enqueueCurrentPacket();

  // If this was the first write after reopening a file, then the above
  // write filled up any partial chunk. Tell the checksummer to generate
  // full crc chunks from now on.
  if (appendChunk) {
    appendChunk = false;
    resetChecksumChunk(bytesPerChecksum);
  }
  int psize = Math.min((int) (blockSize - bytesCurBlock),
      writePacketSize);
  computePacketChunkSize(psize, bytesPerChecksum);
}
```

Reply: 64 + 36. HDFS also has a balancer you can run as a command, but note it only moves whole blocks between DataNodes to even out disk usage; it does not re-split a 64 + 36 layout into 50 + 50.
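The snippet above caps each packet at the block boundary (`psize = Math.min(blockSize - bytesCurBlock, writePacketSize)`), which is why a file's last block simply holds the remainder. A minimal standalone sketch of the resulting split (not HDFS code; class and method names here are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class BlockSplit {
    // Cut a file of fileSize bytes into block lengths: every block is
    // blockSize bytes except possibly the last, which holds the remainder.
    static List<Long> splitIntoBlocks(long fileSize, long blockSize) {
        List<Long> blocks = new ArrayList<>();
        for (long remaining = fileSize; remaining > 0; remaining -= blockSize) {
            blocks.add(Math.min(remaining, blockSize));
        }
        return blocks;
    }

    // Mirrors the packet sizing in the snippet above: a packet never
    // crosses a block boundary.
    static int packetSize(long blockSize, long bytesCurBlock, int writePacketSize) {
        return Math.min((int) (blockSize - bytesCurBlock), writePacketSize);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 100 MB file with a 64 MB block size -> one 64 MB block + one 36 MB block.
        System.out.println(splitIntoBlocks(100 * mb, 64 * mb));
        // Near the end of a block, only the remaining bytes fit into the packet.
        System.out.println(packetSize(64 * mb, 64 * mb - 512, 64 * 1024));
    }
}
```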
Reply to 2# alexanderdai: Doesn't it get stored as two 64 MB blocks?

Reply to 5# nerversayno: No. An HDFS block does not have to occupy its full nominal size on disk.

Q: If a block doesn't occupy the full space, what happens to the remainder? Say only 1 MB of a 64 MB block is used; what about the other 63 MB?

A: The remaining space stays free for other use; an underfilled block does not reserve the unused bytes.
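The point in the replies above is that HDFS does not preallocate blocks: a DataNode stores each block as an ordinary local file of its actual length, so a 1 MB file "in" a 64 MB block consumes roughly 1 MB of disk per replica (plus small checksum metadata), while still costing the NameNode one block object. A rough sketch under those assumptions (helper names are made up; checksum overhead is ignored):

```java
public class HdfsUsage {
    // Physical bytes a file's data occupies across DataNodes: the real
    // length times the replication factor, regardless of block size.
    static long physicalBytes(long fileSize, int replication) {
        return fileSize * replication;
    }

    // Number of block objects the NameNode must track for the file
    // (ceiling division; even a tiny file costs one block entry).
    static long blockCount(long fileSize, long blockSize) {
        return Math.max(1, (fileSize + blockSize - 1) / blockSize);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 1 MB file with replication 3: ~3 MB on disk, one block in the namespace.
        System.out.println(physicalBytes(1 * mb, 3));
        System.out.println(blockCount(1 * mb, 64 * mb));
    }
}
```

This is also why many tiny files are considered harmful in HDFS: the disk cost is small, but each file still consumes NameNode memory for its block objects.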