I'm using the code below to parse logs. The log files may be encoded as ASCII, UTF-8, or GBK, and the parsed output comes out garbled (mojibake). Does anyone know how to fix this?
val conf = new SparkConf().setAppName("SparkSQLDemo").setMaster("local")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._
val people = sc.textFile(hdfsFilePath)
  .map(_.split("\t"))
  .map(p => Log(p(0), p(1), p(2), p(3)))
  .toDF()
people.registerTempTable("tmpLogs")
people.select("logTime","userName").show(10)
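The likely cause of the garbling: `sc.textFile` always decodes the input as UTF-8 (Hadoop's `Text` type assumes UTF-8), so GBK-encoded bytes are corrupted before your `map` ever runs. The fix is to get at the raw bytes and decode them with the correct charset yourself. A minimal sketch of the decoding step in plain Scala (no Spark needed to see the effect; `GBK` is a standard JDK charset name):

```scala
import java.nio.charset.Charset

// Decode a line's raw bytes with an explicit charset, instead of the
// UTF-8 default that sc.textFile / Text.toString applies.
def decodeLine(bytes: Array[Byte], charset: String): String =
  new String(bytes, Charset.forName(charset))

// Simulate GBK-encoded bytes as they would arrive from the log file.
val gbkBytes = "日志".getBytes("GBK")

// Decoding with the wrong charset reproduces the mojibake...
println(new String(gbkBytes, "UTF-8"))

// ...while the correct charset recovers the original text.
println(decodeLine(gbkBytes, "GBK"))
```

To apply this in your Spark job, read the file with `sc.newAPIHadoopFile` (key `LongWritable`, value `Text`, input format `TextInputFormat`) instead of `sc.textFile`, then decode each value's bytes with `new String(text.getBytes, 0, text.getLength, "GBK")` before splitting on `\t`, since `Text.toString` would again assume UTF-8. If your files mix encodings, you would need to detect the charset per file (e.g. try UTF-8 first and fall back to GBK on a decode failure), since a single RDD-wide charset can't cover both.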