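For `sample`, the number of elements returned depends on the second parameter (`fraction`), not on the third. The third parameter is the random seed: it decides which elements get picked, so the count can shift by chance from one seed to another, but a larger seed does not produce more results. The first parameter is `withReplacement`. When it is `false`, each element is kept independently with probability `fraction`, so the expected count is about `fraction` times the number of elements in the RDD. The exact count is not guaranteed. On a dataset as small as this one the counts fluctuate noticeably, as these runs show: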
[mw_shl_code=scala,true]uriCounts.sample(false, 0.8, 10).collect().foreach(println)[/mw_shl_code]
[mw_shl_code=bash,true](/icons/powered_by_rh.png,3)
(/,2)
(/server-status,2)
(/icons/apache_pb.gif,1)
[/mw_shl_code]
[mw_shl_code=scala,true]uriCounts.sample(false, 0.8, 100).collect().foreach(println)
[/mw_shl_code]
[mw_shl_code=bash,true](/favicon.ico,2)
(/server-status,2)
(/icons/apache_pb.gif,1)
[/mw_shl_code]
[mw_shl_code=scala,true]uriCounts.sample(false, 0.2, 100).collect().foreach(println)
[/mw_shl_code]
[mw_shl_code=bash,true](/,2)[/mw_shl_code]
[mw_shl_code=scala,true]uriCounts.sample(false, 0.4, 100).collect().foreach(println)
[/mw_shl_code]
[mw_shl_code=bash,true](/icons/powered_by_rh.png,3)
(/,2)
(/icons/apache_pb.gif,1)
[/mw_shl_code]
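With only a handful of URIs the trend is hard to see, so here is a minimal sketch in plain Scala (no Spark needed) that mimics sampling without replacement on a larger collection. The names `SampleSketch` and `bernoulliSample` and the 10,000-element range are made up for illustration, and this is not Spark's own sampler, only the same idea: keep each element independently with probability `fraction`. The printed counts should track `fraction`, while changing the seed only moves them slightly.
[mw_shl_code=scala,true]import scala.util.Random

object SampleSketch {
  // Hypothetical helper: keeps each element independently with probability `fraction`.
  // The seed only fixes the random sequence, so the same seed gives the same sample.
  def bernoulliSample[T](data: Seq[T], fraction: Double, seed: Long): Seq[T] = {
    val rng = new Random(seed)
    data.filter(_ => rng.nextDouble() < fraction)
  }

  def main(args: Array[String]): Unit = {
    val data = 1 to 10000 // stand-in data, not the uriCounts RDD
    for (fraction <- Seq(0.2, 0.4, 0.8); seed <- Seq(10L, 100L)) {
      val n = bernoulliSample(data, fraction, seed).size
      println(s"fraction=$fraction seed=$seed -> $n of ${data.size}")
    }
  }
}
[/mw_shl_code]
If you need an exact number of elements rather than an approximate fraction, the RDD API also has `takeSample(withReplacement, num, seed)`, which returns exactly `num` elements to the driver as an array.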