This post was last edited by ighack on 2020-1-16 10:53
[mw_shl_code=java,true]// Top 10 recommendations for every user
Dataset<Row> userRecs = model.recommendForAllUsers(10);
userRecs.toJavaRDD().map(new Function<Row, String>() {
    @Override
    public String call(Row row) throws Exception {
        Integer userid = row.getInt(0);
        JSONObject json = new JSONObject();
        JSONArray arr = new JSONArray();
        json.put("UserID", userid);
        // Column 1 holds the array of (movie id, rating) recommendation rows
        Iterator<GenericRowWithSchema> itor = ((WrappedArray) row.getAs(1)).iterator();
        while (itor.hasNext()) {
            GenericRowWithSchema res = itor.next();
            Integer movieid = res.getInt(0);
            Float rat = res.getFloat(1);
            JSONObject _data = new JSONObject();
            _data.put("MovieID", movieid);
            _data.put("rating", rat);
            arr.add(_data);
        }
        json.put("result", arr);
        // One JSON document per user, written as one line of text
        return json.toJSONString();
    }
}).saveAsTextFile("G:\\code\\Java\\sparkml\\src\\main\\resources\\data");[/mw_shl_code]
I ran a prediction and now want to save the results.
But with code like this, the saved output is split across a lot of files,
for example:
.part-00048.crc
.part-00049.crc
as well as many files like:
part-00015
part-00016
part-00017
Files like part-00017 hold the actual results. Their content looks like this:
[mw_shl_code=applescript,true]{"result":[{"MovieID":170,"rating":4.360397},{"MovieID":143,"rating":4.2904},{"MovieID":694,"rating":4.283311},{"MovieID":64,"rating":4.2403197},{"MovieID":213,"rating":4.119807},{"MovieID":216,"rating":4.107259},{"MovieID":69,"rating":4.069874},{"MovieID":97,"rating":4.039567},{"MovieID":318,"rating":4.0127234},{"MovieID":215,"rating":4.0114603}],"UserID":731}
{"result":[{"MovieID":313,"rating":4.123612},{"MovieID":1643,"rating":3.9563293},{"MovieID":340,"rating":3.9490104},{"MovieID":316,"rating":3.9220722},{"MovieID":483,"rating":3.9156215},{"MovieID":320,"rating":3.8911288},{"MovieID":1625,"rating":3.8898299},{"MovieID":1449,"rating":3.8802822},{"MovieID":134,"rating":3.8677},{"MovieID":1398,"rating":3.8600028}],"UserID":587}[/mw_shl_code]
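These are the results I want, but there are far too many files, and each file contains only a few records.
Why isn't the output saved to just a few files, or to a single file?
What are files like .part-00049.crc? They look a lot like the files produced when saving a model.
How can I save everything into one large file, or save it to HDFS?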
[mw_shl_code=java,true]userRecs.write().json("G:\\code\\Java\\sparkml\\src\\main\\resources\\data-json");[/mw_shl_code]
Doing it this way actually gives the same result.
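For reference, Spark writes one part file per partition of the data, and the .crc files are checksum files that Hadoop's local file system layer writes next to each part file. One possible workaround, sketched here in plain Java without any Spark dependency, is to concatenate the part files after the job finishes. The class name `PartFileMerger` and its `merge` method are hypothetical names made up for this sketch, and it assumes the output is small enough to be handled on one machine and that the target file is not itself a part-* file inside the output directory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Spark: merges Spark's part-* output files into one file.
final class PartFileMerger {

    /**
     * Concatenates every regular file named part-* in outputDir into target,
     * in file-name order. Hidden checksum files (.part-*.crc) and the _SUCCESS
     * marker are skipped because their names do not start with "part-".
     * Returns the number of part files merged.
     */
    static int merge(Path outputDir, Path target) throws IOException {
        List<Path> parts;
        try (Stream<Path> files = Files.list(outputDir)) {
            parts = files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted() // zero-padded names sort in partition order
                    .collect(Collectors.toList());
        }
        // Creates the target file, or truncates it if it already exists
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out); // append the raw bytes of this part file
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: PartFileMerger <spark-output-dir> <target-file>");
            return;
        }
        int merged = merge(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Merged " + merged + " part files into " + args[1]);
    }
}
```

On the Spark side, calling `userRecs.coalesce(1)` before `write().json(...)` should reduce the output to a single part file, although Spark still creates a directory around it and pushes all the data through one task. Writing to HDFS would presumably only need an `hdfs://` URI as the output path, assuming the cluster is reachable from the driver.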