It's been a while since my last post. I spent the past month job hunting, which was exhausting, and without any hands-on practice I've started to forget some of this.
Environment: Spark 1.6, Hive 1.2.1, Hadoop 2.6.4
1. Add the following dependencies
spark-hive_2.10 is needed to create a HiveContext object:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
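The _2.10 suffix is the Scala binary version, so it must match the Scala version your project compiles against. spark-hive already pulls in spark-core and spark-sql transitively, but many projects also declare spark-core explicitly; an optional entry for that (not required):
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
</dependency>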
The MySQL driver is used to connect to the metastore database:
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.38</version>
    <scope>compile</scope>
</dependency>
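With this setup, Spark's embedded Hive client talks to the metastore database directly over JDBC, which is why the driver has to be on the classpath. If your cluster runs a standalone metastore service, you would point hive-site.xml at it instead; a sketch, assuming a metastore listening on the default port 9083 on host master (adjust to your deployment):
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://master:9083</value>
</property>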
2. Add a hive-site.xml file with the following content:
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
    </property>
</configuration>
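HiveContext picks up hive-site.xml from the classpath, so in a Maven project the file should live under src/main/resources:
src/main/resources/hive-site.xml
src/main/scala/App.scala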
With that in place, you can read data from the Hive table; the code is as follows:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object App {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("test").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)
    sqlContext.table("test.person")    // table name in database.table format
      .registerTempTable("person")     // register it as a temporary table
    sqlContext.sql(
      """
        | select *
        | from person
        | limit 10
      """.stripMargin).show()
    sc.stop()
  }
}
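The same query can also be written with the DataFrame API, skipping the temporary table entirely. A minimal sketch, assuming it runs inside the same main method against the same test.person table:
    val df = sqlContext.table("test.person")
    df.limit(10).show()   // equivalent to the SQL above, no temp table needed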