Shuffle and sort

How should I understand these two passages from the official Hadoop 2.6 documentation? I'm a beginner.

Sort

The framework groups Reducer inputs by keys (since different mappers may have output the same key) in this stage.

The shuffle and sort phases occur simultaneously; while map-outputs are being fetched they are merged.

Secondary Sort

If equivalence rules for grouping the intermediate keys are required to be different from those for grouping keys before reduction, then one may specify a Comparator via Job.setSortComparatorClass(Class). Since Job.setGroupingComparatorClass(Class) can be used to control how intermediate keys are grouped, these can be used in conjunction to simulate secondary sort on values.

sstutu · posted 2015-4-7 14:19:12

See this figure:

Source:
A thorough look at Shuffle, the core of MapReduce: answers to common MapReduce questions

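The figure above shows the shuffle. The English text roughly says that shuffle and sort happen at the same time: the outputs of the different map tasks are sent to the reducers by key, and they are merged there as they arrive.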
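To make the "fetch and merge" idea concrete, here is a minimal in-memory sketch in plain Java, with no Hadoop dependency. It assumes each fetched map output is already sorted by key and merges the runs with a priority queue, grouping values under equal keys. The names MergeSketch, Cursor and mergeAndGroup are made up for illustration; the real framework merges on disk and in memory and streams the values to the reducer, so treat this only as a picture of the principle.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class MergeSketch {
    // Cursor over one sorted run; a run stands in for one fetched map output.
    private static final class Cursor {
        final Iterator<Map.Entry<String, Integer>> it;
        Map.Entry<String, Integer> head;

        Cursor(List<Map.Entry<String, Integer>> run) {
            this.it = run.iterator();
            this.head = it.next(); // callers only pass non-empty runs
        }

        // Moves to the next record; returns false when the run is exhausted.
        boolean advance() {
            if (it.hasNext()) {
                head = it.next();
                return true;
            }
            head = null;
            return false;
        }
    }

    // K-way merge of sorted runs; values with the same key end up in one group.
    static Map<String, List<Integer>> mergeAndGroup(List<List<Map.Entry<String, Integer>>> runs) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.head.getKey()));
        for (List<Map.Entry<String, Integer>> run : runs) {
            if (!run.isEmpty()) {
                heap.add(new Cursor(run));
            }
        }
        // LinkedHashMap keeps the groups in the order the merge produced them.
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll(); // run whose current key is smallest
            grouped.computeIfAbsent(c.head.getKey(), k -> new ArrayList<>())
                   .add(c.head.getValue());
            if (c.advance()) {
                heap.add(c); // re-insert with its new head record
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Two "map outputs", each sorted by key; both contain key "a".
        List<List<Map.Entry<String, Integer>>> runs = List.of(
                List.of(Map.entry("a", 1), Map.entry("c", 3)),
                List.of(Map.entry("a", 2), Map.entry("b", 5)));
        // Keys come out in sorted order: a, b, c.
        System.out.println(mergeAndGroup(runs).keySet());
    }
}
```

Because every run is already sorted, the merge never has to re-sort everything: it only compares the current head of each run, which is why sorting can proceed while outputs are still being fetched.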

lavystord · posted 2015-4-7 23:19:33

sstutu posted 2015-4-7 14:19
See this figure:

Could you explain what Job.setGroupingComparatorClass(Class) is for?

bioger_hit · posted 2015-4-8 01:34:24

lavystord posted 2015-4-7 23:19
Could you explain what Job.setGroupingComparatorClass(Class) is for?

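// For secondary sort, set the three custom classes here
// Sort comparator: decides the order in which keys reach the reducer
job.setSortComparatorClass(CompositeKeyComparator.class);
// Partitioner: decides which reducer each key is sent to
job.setPartitionerClass(NaturalKeyPartitioner.class);
// Grouping comparator: decides which keys share one reduce() call
job.setGroupingComparatorClass(NaturalKeyGroupComparator.class);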

For details, see this post:
HBase MapReduce Secondary Sort
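To show what the sort comparator and the grouping comparator each contribute, here is a small simulation in plain Java with no Hadoop classes involved. CompositeKey, its fields, SORT, GROUP and reduceCalls are all hypothetical names invented for this sketch; they only mimic the roles of a composite-key sort comparator and a natural-key grouping comparator. The partitioner is left out because the sketch pretends there is a single reducer; in a real job it should partition on the natural key so that all records for one natural key reach the same reducer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {
    // Hypothetical composite key: the natural key plus the field to order values by.
    static final class CompositeKey {
        final String naturalKey;
        final int timestamp;

        CompositeKey(String naturalKey, int timestamp) {
            this.naturalKey = naturalKey;
            this.timestamp = timestamp;
        }
    }

    // Role of the sort comparator: natural key first, then timestamp.
    static final Comparator<CompositeKey> SORT =
            Comparator.comparing((CompositeKey k) -> k.naturalKey)
                      .thenComparingInt(k -> k.timestamp);

    // Role of the grouping comparator: only the natural key matters.
    static final Comparator<CompositeKey> GROUP =
            Comparator.comparing((CompositeKey k) -> k.naturalKey);

    // Sorts with SORT, then starts a new "reduce call" whenever GROUP
    // reports that two adjacent keys differ.
    static List<List<Integer>> reduceCalls(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<Integer>> calls = new ArrayList<>();
        CompositeKey prev = null;
        for (CompositeKey k : sorted) {
            if (prev == null || GROUP.compare(prev, k) != 0) {
                calls.add(new ArrayList<>()); // new group, i.e. new reduce call
            }
            calls.get(calls.size() - 1).add(k.timestamp);
            prev = k;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = List.of(
                new CompositeKey("b", 2), new CompositeKey("a", 3),
                new CompositeKey("a", 1), new CompositeKey("b", 1));
        System.out.println(reduceCalls(keys)); // prints [[1, 3], [1, 2]]
    }
}
```

Each inner list is one simulated reduce call: one per natural key, with the values already in timestamp order. If GROUP compared the whole composite key instead, every record would get its own reduce call, which is the difference the grouping comparator makes.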
