Reading guide
1. What steps are needed to compile CDH Spark?
2. What is the command for compiling CDH Spark?
3. How is the distribution tarball generated?
This article uses the Spark fork maintained by Cloudera as an example, and records the process of updating the Spark branch and compiling the Spark source code.
Download the code: [mw_shl_code=bash,true]$ git clone https://github.com/javachen/spark[/mw_shl_code]
Then switch to the latest branch, currently cdh5-1.3.0_5.4.0: [mw_shl_code=bash,true]$ cd spark
$ git checkout cdh5-1.3.0_5.4.0[/mw_shl_code]
Check the current branch: [mw_shl_code=bash,true]$ git branch
* cdh5-1.3.0_5.4.0
master[/mw_shl_code]
If Cloudera publishes a new release, I sync it into the spark fork I maintain with the following steps: [mw_shl_code=bash,true]# Add the upstream remote
$ git remote add cdh git@github.com:cloudera/spark.git
# Fetch updates from the upstream remote
$ git fetch cdh
# Suppose Cloudera has released a new branch cdh/cdh5-1.3.0_5.4.X;
# create a local branch from it (-b also switches to it in one step)
$ git checkout -b cdh5-1.3.0_5.4.X cdh/cdh5-1.3.0_5.4.X
# Push it to my own remote repository
$ git push origin cdh5-1.3.0_5.4.X:cdh5-1.3.0_5.4.X[/mw_shl_code]
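The sync steps above can be wrapped in a small helper; the `sync_branch` function name and the parameterized remote URL are my own additions (a sketch, not part of the original workflow):

```shell
# sync_branch REMOTE_URL BRANCH: hypothetical helper combining the steps
# above; run it from inside your local spark clone.
sync_branch() {
  url="$1"; branch="$2"
  # Add the upstream "cdh" remote only if it is not configured yet
  git remote get-url cdh >/dev/null 2>&1 || git remote add cdh "$url"
  git fetch -q cdh
  # -b creates the local branch from the remote-tracking ref and switches to it
  git checkout -q -b "$branch" "cdh/$branch"
  # Publish the new branch to your own fork
  git push -q origin "$branch:$branch"
}
# Example: sync_branch git@github.com:cloudera/spark.git cdh5-1.3.0_5.4.X
```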
Compiling
Install zinc, an incremental compilation server that speeds up repeated builds (the example below uses Homebrew on macOS): [mw_shl_code=bash,true]$ brew install zinc[/mw_shl_code]
Compile with Maven, specifying the Hadoop version as 2.6.0-cdh5.4.0 and enabling the YARN and Hive profiles: [mw_shl_code=bash,true]$ export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
$ mvn -Pyarn -Dhadoop.version=2.6.0-cdh5.4.0 -Phive -DskipTests clean package[/mw_shl_code]
In CDH's Spark, to include hive-thriftserver in the build you need to edit pom.xml and add a line for sql/hive-thriftserver:
[mw_shl_code=xml,true]<modules>
  <module>core</module>
  <module>bagel</module>
  <module>graphx</module>
  <module>mllib</module>
  <module>tools</module>
  <module>streaming</module>
  <module>sql/catalyst</module>
  <module>sql/core</module>
  <module>sql/hive</module>
  <module>sql/hive-thriftserver</module> <!-- added line -->
  <module>repl</module>
  <module>assembly</module>
  <module>external/twitter</module>
  <module>external/kafka</module>
  <module>external/flume</module>
  <module>external/flume-sink</module>
  <module>external/zeromq</module>
  <module>external/mqtt</module>
  <module>examples</module>
</modules>[/mw_shl_code]
Then run the build again with the hive-thriftserver profile enabled: [mw_shl_code=bash,true]$ export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
$ mvn -Pyarn -Dhadoop.version=2.6.0-cdh5.4.0 -Phive -Phive-thriftserver -DskipTests clean package[/mw_shl_code]
Run the test suite: [mw_shl_code=bash,true]$ mvn -Pyarn -Dhadoop.version=2.6.0-cdh5.4.0 -Phive test[/mw_shl_code]
Run the Java 8 tests: [mw_shl_code=bash,true]$ mvn install -DskipTests -Pjava8-tests[/mw_shl_code]
Compiling with sbt
[mw_shl_code=bash,true]$ build/sbt -Pyarn -Dhadoop.version=2.6.0-cdh5.4.0 -Phive assembly[/mw_shl_code]
Generating the distribution tarball
[mw_shl_code=bash,true]$ ./make-distribution.sh[/mw_shl_code]
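To make the tarball match the Maven build above, the same profiles can be passed through to the script; the --tgz and --name flags below are assumptions about this Spark version's make-distribution.sh (check its usage output), and the command is only echoed here rather than executed:

```shell
# Hedged sketch: --tgz/--name and Maven-profile pass-through are assumptions
# about this version's make-distribution.sh; verify against its usage text.
HADOOP_VERSION="2.6.0-cdh5.4.0"
CMD="./make-distribution.sh --tgz --name $HADOOP_VERSION -Pyarn -Dhadoop.version=$HADOOP_VERSION -Phive -Phive-thriftserver"
# Echoed for illustration; run the command directly from the Spark source root.
echo "$CMD"
```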
Troubleshooting: "Unable to find configuration file at location scalastyle-config.xml"
Running the package or install goal on the examples module from Maven inside IDEA raises an "Unable to find configuration file at location scalastyle-config.xml" exception. The fix is to copy scalastyle-config.xml from the project root into the examples directory, because the scalastyle-maven-plugin declared in pom.xml looks the file up in the directory Maven is invoked from: [mw_shl_code=xml,true]<plugin>
  <groupId>org.scalastyle</groupId>
  <artifactId>scalastyle-maven-plugin</artifactId>
  <version>0.4.0</version>
  <configuration>
    <verbose>false</verbose>
    <failOnViolation>true</failOnViolation>
    <includeTestSourceDirectory>false</includeTestSourceDirectory>
    <failOnWarning>false</failOnWarning>
    <sourceDirectory>${basedir}/src/main/scala</sourceDirectory>
    <testSourceDirectory>${basedir}/src/test/scala</testSourceDirectory>
    <configLocation>scalastyle-config.xml</configLocation>
    <outputFile>scalastyle-output.xml</outputFile>
    <outputEncoding>UTF-8</outputEncoding>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>check</goal>
      </goals>
    </execution>
  </executions>
</plugin>[/mw_shl_code]
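An alternative to copying the file (a hedged sketch, not verified against this plugin version) is to point configLocation at the root copy with a path relative to the module; for examples, which sits directly under the project root, that would be:

```xml
<!-- Hypothetical alternative: resolve the root scalastyle-config.xml
     relative to the module instead of the Maven working directory.
     Modules nested deeper would need additional ../ segments. -->
<configLocation>${basedir}/../scalastyle-config.xml</configLocation>
```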