Shuffle and Sort

lavystord posted on 2015-4-6 23:14:33
How should these two passages from the official Hadoop 2.6 documentation be understood? I'm a beginner.

Sort
The framework groups Reducer inputs by keys (since different mappers may have output the same key) in this stage.
The shuffle and sort phases occur simultaneously; while map-outputs are being fetched they are merged.

Secondary Sort
If equivalence rules for grouping the intermediate keys are required to be different from those for grouping keys before reduction, then one may specify a Comparator via Job.setSortComparatorClass(Class). Since Job.setGroupingComparatorClass(Class) can be used to control how intermediate keys are grouped, these can be used in conjunction to simulate secondary sort on values.


sstutu posted on 2015-4-7 14:19:12


Look at this figure:

Source:
Thoroughly Understanding Shuffle, the Core of MapReduce: Answers to Common MapReduce Questions

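The figure above shows the shuffle. The English passage roughly says that shuffle and sort happen at the same time: the output of the different maps is routed to the reducers by key and then merged.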
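To make the "merged while being fetched" part concrete, here is a minimal plain-Java sketch of the idea, not Hadoop's actual code. It assumes each map output segment is already sorted by key and merges the fetched segments with a priority queue, so records with equal keys from different mappers end up next to each other, which is what lets the framework group them for the reducer. The class name ShuffleMergeSketch and the sample keys are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class ShuffleMergeSketch {

    /** Read position inside one sorted map-output segment. */
    private static final class Cursor {
        final List<String> segment;
        int pos = 0;

        Cursor(List<String> segment) {
            this.segment = segment;
        }

        String head() {
            return segment.get(pos);
        }
    }

    /** K-way merge of already-sorted segments into one sorted run. */
    static List<String> merge(List<List<String>> segments) {
        // The heap always exposes the segment whose current key is smallest.
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> a.head().compareTo(b.head()));
        for (List<String> s : segments) {
            if (!s.isEmpty()) {
                heap.add(new Cursor(s));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            merged.add(c.head());
            c.pos++;
            if (c.pos < c.segment.size()) {
                heap.add(c); // re-insert with its next key
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Three sorted outputs, as if fetched from three different mappers.
        List<List<String>> fetched = Arrays.asList(
                Arrays.asList("apple", "cat"),
                Arrays.asList("apple", "bird", "cat"),
                Arrays.asList("bird"));
        System.out.println(merge(fetched)); // [apple, apple, bird, bird, cat, cat]
    }
}
```

After the merge, the two "apple" records that came from different mappers are adjacent, so a single pass over the merged run is enough to hand each key and all of its values to one reduce call.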


lavystord posted on 2015-4-7 23:19:33

Could you explain what Job.setGroupingComparatorClass(Class) is for?
bioger_hit posted on 2015-4-8 01:34:24
lavystord posted on 2015-4-7 23:19
Could you explain what Job.setGroupingComparatorClass(Class) is for?
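// For secondary sort, register the three custom classes here:
job.setSortComparatorClass(CompositeKeyComparator.class);        // sort order over the full composite key
job.setPartitionerClass(NaturalKeyPartitioner.class);            // partition by the natural key only
job.setGroupingComparatorClass(NaturalKeyGroupComparator.class); // group reduce input by the natural key only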

For details, see this post:
HBase MapReduce Secondary Sort
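As a rough illustration of how those three settings work together, here is a plain-Java simulation with no Hadoop dependency. It is only a sketch of the roles involved: in a real job the two comparators are RawComparator implementations and the partitioner extends Partitioner, whereas here SORT, GROUP and partition are stand-ins, and CompositeKey with its natural and secondary fields is a hypothetical key type invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SecondarySortSketch {

    /** Composite key: the natural key plus the field the values should be ordered by. */
    static final class CompositeKey {
        final String natural;
        final int secondary;

        CompositeKey(String natural, int secondary) {
            this.natural = natural;
            this.secondary = secondary;
        }
    }

    // Role of the sort comparator: natural key first, then the secondary field.
    static final Comparator<CompositeKey> SORT =
            Comparator.comparing((CompositeKey k) -> k.natural)
                      .thenComparingInt(k -> k.secondary);

    // Role of the grouping comparator: only the natural key decides the group.
    static final Comparator<CompositeKey> GROUP =
            Comparator.comparing((CompositeKey k) -> k.natural);

    // Role of the partitioner: the same natural key always goes to the same reducer.
    static int partition(CompositeKey k, int numReducers) {
        return (k.natural.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Simulates one reducer: sort with SORT, start a new reduce call whenever GROUP sees a new key. */
    static List<String> reduce(List<CompositeKey> input) {
        List<CompositeKey> sorted = new ArrayList<>(input);
        sorted.sort(SORT);
        List<String> calls = new ArrayList<>();
        StringBuilder current = null;
        CompositeKey prev = null;
        for (CompositeKey k : sorted) {
            if (prev == null || GROUP.compare(prev, k) != 0) {
                if (current != null) {
                    calls.add(current.append("]").toString());
                }
                current = new StringBuilder(k.natural).append("=[").append(k.secondary);
            } else {
                current.append(", ").append(k.secondary);
            }
            prev = k;
        }
        if (current != null) {
            calls.add(current.append("]").toString());
        }
        return calls;
    }

    public static void main(String[] args) {
        List<CompositeKey> records = Arrays.asList(
                new CompositeKey("b", 2), new CompositeKey("a", 3),
                new CompositeKey("a", 1), new CompositeKey("b", 1),
                new CompositeKey("a", 2));
        System.out.println(reduce(records)); // [a=[1, 2, 3], b=[1, 2]]
    }
}
```

So, to the question: the grouping comparator decides which consecutive keys are treated as "the same key" for a single reduce call. Because it compares only the natural key while the sort comparator also compares the secondary field, each reduce call sees all records for one natural key, already ordered by the secondary field.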
