Spark performance tuning: comparing reduceByKey and groupByKey
For example:
val counts = pairs.reduceByKey(_ + _)
val counts = pairs.groupByKey().map(wordCounts => (wordCounts._1, wordCounts._2.sum))
If the aggregation can be expressed with reduceByKey, prefer reduceByKey: it first performs a local combine on the map side, which greatly reduces the amount of data that has to be shuffled to the reduce side and therefore cuts network transfer overhead.
Only fall back to groupByKey().map() when the computation cannot be expressed with reduceByKey.
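To make the two snippets above concrete, here is a minimal, self-contained sketch that runs both versions of word count. The sample input, the local SparkContext setup, and the object name WordCountComparison are illustrative assumptions, not part of the original post:

import org.apache.spark.{SparkConf, SparkContext}

object WordCountComparison {
  def main(args: Array[String]): Unit = {
    // Hypothetical local setup so the snippets can be run end to end.
    val sc = new SparkContext(new SparkConf().setAppName("wordcount").setMaster("local[*]"))

    val lines = sc.parallelize(Seq("hello world", "hello spark"))
    val pairs = lines.flatMap(_.split(" ")).map(word => (word, 1))

    // Preferred: reduceByKey combines the (word, 1) pairs locally inside each
    // map task before the shuffle, so far less data crosses the network.
    val countsReduce = pairs.reduceByKey(_ + _)

    // Fallback: groupByKey ships every single (word, 1) pair across the shuffle
    // and only sums them afterwards on the reduce side.
    val countsGroup = pairs.groupByKey().map(wc => (wc._1, wc._2.sum))

    countsReduce.collect().foreach(println)
    countsGroup.collect().foreach(println)

    sc.stop()
  }
}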
The diagram below illustrates val counts = pairs.groupByKey().map(wordCounts => (wordCounts._1, wordCounts._2.sum)).
The diagram below illustrates val counts = pairs.reduceByKey(_ + _).
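The difference the two diagrams show can also be sketched with combineByKey. This is only an illustrative approximation of the behavior, not Spark's actual internal code; it assumes the same pairs RDD as above:

// reduceByKey-style: values are merged per key inside each map task
// (local combine), so only one partial sum per key per task is shuffled.
val reduceLike = pairs.combineByKey[Int](
  (v: Int) => v,                  // createCombiner: first value seen for a key in this task
  (acc: Int, v: Int) => acc + v,  // mergeValue: map-side local combine
  (a: Int, b: Int) => a + b       // mergeCombiners: merge partial sums after the shuffle
)

// groupByKey-style: every (word, 1) pair is shuffled unchanged and only summed
// on the reduce side, which moves far more data across the network.
val groupLike = pairs.groupByKey().map { case (word, ones) => (word, ones.sum) }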