For example, a word count can be written in either of two ways:
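[mw_shl_code=scala,true]// Preferred: values are combined per key on the map side before the shuffle
val counts = pairs.reduceByKey(_ + _)
// Alternative (use instead of the line above): every (key, value) pair is shuffled, then summed on the reduce side
val counts = pairs.groupByKey().map(wordcounts => (wordcounts._1, wordcounts._2.sum))[/mw_shl_code]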
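Whenever reduceByKey can do the job, use it. It first performs a local combine on the map side (summing the values for each key within each partition), which greatly reduces the amount of data that has to be shuffled to the reduce side and therefore cuts network transfer overhead.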
Fall back to groupByKey().map() only when reduceByKey cannot express the computation.
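To make the map-side combine concrete, here is a small plain-Scala simulation. It is only a sketch and does not call Spark: each "partition" is an in-memory Seq, and the names ShuffleSketch, groupByKeyShuffle, reduceByKeyShuffle and reduceSide are hypothetical. It counts how many records each approach would send across the shuffle; real Spark additionally serializes, partitions and possibly spills that data.

```scala
// Plain-Scala model of shuffle volume for groupByKey vs reduceByKey.
// Hypothetical helpers; this illustrates the idea and does not use Spark.
object ShuffleSketch {
  type Partition = Seq[(String, Int)]

  // groupByKey: every (key, value) record leaves its map partition as-is.
  def groupByKeyShuffle(parts: Seq[Partition]): Seq[(String, Int)] =
    parts.flatten

  // reduceByKey: each map partition first sums its values per key (local combine),
  // so at most one record per distinct key per partition is shuffled.
  def reduceByKeyShuffle(parts: Seq[Partition]): Seq[(String, Int)] =
    parts.flatMap { p =>
      p.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2).sum) }
    }

  // Reduce side: merge whatever arrived through the shuffle into final counts.
  def reduceSide(shuffled: Seq[(String, Int)]): Map[String, Int] =
    shuffled.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    // Two map partitions of (word, 1) pairs, as a word count would produce.
    val parts: Seq[Partition] = Seq(
      Seq("a", "b", "a", "a").map(w => (w, 1)),
      Seq("b", "a", "b").map(w => (w, 1))
    )
    val viaGroup = groupByKeyShuffle(parts)
    val viaReduce = reduceByKeyShuffle(parts)
    println(s"records shuffled by groupByKey:  ${viaGroup.size}")
    println(s"records shuffled by reduceByKey: ${viaReduce.size}")
    println(s"final counts: ${reduceSide(viaReduce)}")
  }
}
```

With this sample input, the groupByKey path ships all 7 (word, 1) records, while the reduceByKey path ships 4 partial sums (one per distinct word per partition). Both end with the same counts, a -> 4 and b -> 3, so the saving comes purely from combining before the shuffle.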
The following diagram illustrates how val counts = pairs.groupByKey().map(wordcounts => (wordcounts._1, wordcounts._2.sum)) is executed.
The following diagram illustrates how val counts = pairs.reduceByKey(_ + _) is executed.