Spark build, part 2: building a Hadoop-based Spark distribution — practice and problems encountered
Questions this post addresses:
1. What can cause the build to fail?
2. What configuration is needed to target a specific Hadoop version?
3. What happens if the matching profile is not added?
Previous post:
Spark build, part 1: building a Hadoop-based Spark distribution
http://www.aboutyun.com/forum.php?mod=viewthread&tid=23257
A Spark build takes quite a while, possibly one to two hours, and sometimes it simply hangs. Some of my attempts failed and some succeeded; in the end it came down to two factors:
1. Versions: with a mismatched version, the build can fail.
2. Network: on a poor connection some dependency downloads fail, which fails the build.
Using Hadoop 2.6.5
Spark source download:
Link: http://pan.baidu.com/s/1gfMpTqb  password: c6dc
$SPARK_SRC/make-distribution.sh --tgz -Pyarn -Phadoop-2.6.5 -Dhadoop.version=2.6.5 -Phive
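Before passing a `-P` flag, it is worth checking which profiles the pom actually defines. As far as I know, Spark 2.x poms ship only minor-version profiles such as hadoop-2.6 and hadoop-2.7, with the patch level supplied separately via -Dhadoop.version. A sketch of such a check (the stub pom below stands in for the real $SPARK_SRC/pom.xml):

```shell
# Hypothetical stub standing in for $SPARK_SRC/pom.xml; run the grep
# against the real pom instead.
cat > /tmp/pom-stub.xml <<'EOF'
<project>
  <profiles>
    <profile><id>hadoop-2.6</id></profile>
    <profile><id>hadoop-2.7</id></profile>
  </profiles>
</project>
EOF
# List the hadoop-* profile ids the pom defines.
grep -o '<id>hadoop[^<]*</id>' /tmp/pom-stub.xml | sed -e 's/<id>//' -e 's|</id>||'
```

If hadoop-2.6.5 does not appear in that list, `-Phadoop-2.6.5` will match nothing, which is exactly the warning shown below.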
Here Spark 2.3.0 was built against Hadoop 2.6.5, and the build eventually failed. The error output:
------------------------------------------------------------------------
------------------------------------------------------------------------
Reactor Summary:
Skipping Spark Integration for Kafka 0.10 Assembly
This project has been banned from the build due to previous failures.
------------------------------------------------------------------------
Spark Project Parent POM ........................... SUCCESS
Spark Project Tags ................................. SUCCESS
Spark Project Sketch ............................... SUCCESS [ 10.413 s]
Spark Project Local DB ............................. SUCCESS
Spark Project Networking ........................... SUCCESS [ 36.812 s]
Spark Project Shuffle Streaming Service ............ SUCCESS [ 11.964 s]
Spark Project Unsafe ............................... SUCCESS [ 34.261 s]
Spark Project Launcher ............................. SUCCESS
Spark Project Core ................................. SUCCESS
Spark Project ML Local Library ..................... SUCCESS
Spark Project GraphX ............................... SUCCESS
Spark Project Streaming ............................ SUCCESS
Spark Project Catalyst ............................. SUCCESS
Spark Project SQL .................................. FAILURE
Spark Project ML Library ........................... SKIPPED
Spark Project Tools ................................ SUCCESS [ 17.955 s]
Spark Project Hive ................................. SKIPPED
Spark Project REPL ................................. SKIPPED
Spark Project YARN Shuffle Service ................. SUCCESS [ 19.981 s]
Spark Project YARN ................................. SUCCESS
Spark Project Assembly ............................. SKIPPED
Spark Integration for Kafka 0.10 ................... SUCCESS
Kafka 0.10 Source for Structured Streaming ......... SKIPPED
Spark Project Examples ............................. SKIPPED
Spark Integration for Kafka 0.10 Assembly .......... SKIPPED
------------------------------------------------------------------------
BUILD FAILURE
------------------------------------------------------------------------
Total time: 56:26 min
Finished at: 2017-11-08T10:44:58+08:00
Final Memory: 65M/296M
------------------------------------------------------------------------
The requested profile "hadoop-2.6.5" could not be activated because it does not exist.
Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:testCompile (scala-test-compile-first) on project spark-sql_2.11: Execution scala-test-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.2:testCompile failed. CompileFailed ->
To see the full stack trace of the errors, re-run Maven with the -e switch.
Re-run Maven using the -X switch to enable full debug logging.
For more information about the errors and possible solutions, please read the following articles:
http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
After correcting the problems, you can resume the build with the command
mvn <goals> -rf :spark-sql_2.11
One error and one warning here are the key items:
The requested profile "hadoop-2.6.5" could not be activated because it does not exist.
Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:testCompile (scala-test-compile-first) on project spark-sql_2.11:
The warning means the pom defines no profile named hadoop-2.6.5, so we need to add one:
<profile>
<id>hadoop-2.6.5</id>
</profile>
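For orientation, the new profile has to be nested inside the existing <profiles> element of the top-level pom.xml; a minimal sketch, with the surrounding elements elided:

```xml
<profiles>
  <!-- ...existing profiles such as hadoop-2.6, hadoop-2.7... -->
  <profile>
    <id>hadoop-2.6.5</id>
  </profile>
</profiles>
```

Note that the empty profile only silences the activation warning; the Hadoop jars actually pulled in are still controlled by -Dhadoop.version=2.6.5 on the command line.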
The second item, the scala-test-compile-first failure on spark-sql, may well be network-related (failed dependency downloads); repeated runs still failed, so it may also be version-related.
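One more cause worth ruling out (an assumption on my part, suggested by the "Final Memory: 65M/296M" line in the log above): Spark's Scala compilation is memory-hungry, and the official "Building Spark" docs recommend enlarging Maven's heap before compiling:

```shell
# Give Maven more heap; too little memory can make the scala-maven-plugin
# compile/testCompile steps fail on large modules such as spark-sql.
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
echo "$MAVEN_OPTS"
```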
Switching to Hadoop 2.7.1
After the failure above, we retried with Hadoop 2.7.1. At this point the pom still carried only the profile added in the previous step:
<profile>
<id>hadoop-2.6.5</id>
</profile>
This time the build itself succeeded, but make-distribution.sh printed the following:
+ TARDIR_NAME='spark- The requested profile "hadoop-2.7.1" could not be activated because it does not exist.-bin- The requested profile "hadoop-2.7.1" could not be activated because it does not exist.'
+ TARDIR='/home/aboutyun/spark/spark- The requested profile "hadoop-2.7.1" could not be activated because it does not exist.-bin- The requested profile "hadoop-2.7.1" could not be activated because it does not exist.'
+ rm -rf '/home/aboutyun/spark/spark- The requested profile "hadoop-2.7.1" could not be activated because it does not exist.-bin- The requested profile "hadoop-2.7.1" could not be activated because it does not exist.'
+ cp -r /home/aboutyun/spark/dist '/home/aboutyun/spark/spark- The requested profile "hadoop-2.7.1" could not be activated because it does not exist.-bin- The requested profile "hadoop-2.7.1" could not be activated because it does not exist.'
+ tar czf 'spark- The requested profile "hadoop-2.7.1" could not be activated because it does not exist.-bin- The requested profile "hadoop-2.7.1" could not be activated because it does not exist..tgz' -C /home/aboutyun/spark 'spark- The requested profile "hadoop-2.7.1" could not be activated because it does not exist.-bin- The requested profile "hadoop-2.7.1" could not be activated because it does not exist.'
+ rm -rf '/home/aboutyun/spark/spark- The requested profile "hadoop-2.7.1" could not be activated because it does not exist.-bin- The requested profile "hadoop-2.7.1" could not be activated because it does not exist.'
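Why does a successful build end up with a file name like that? make-distribution.sh derives the version strings by command substitution over `mvn help:evaluate` output; any warning Maven prints to stdout that is not filtered out gets captured along with the value and embedded in the tarball name. A hypothetical reproduction of the mechanism:

```shell
# Stand-in for "mvn help:evaluate -Dexpression=project.version": Maven
# prints the unfiltered profile warning on stdout alongside the value.
fake_mvn_evaluate() {
  echo 'The requested profile "hadoop-2.7.1" could not be activated because it does not exist.'
  echo '2.3.0'
}

BROKEN=$(fake_mvn_evaluate)               # warning leaks into the variable
echo "broken name: spark-${BROKEN}-bin.tgz"

VERSION=$(fake_mvn_evaluate | tail -n 1)  # keeping only the value avoids it
echo "clean name:  spark-${VERSION}-bin.tgz"
```

Silencing the warning at the source, by defining the missing profile as the post does next, fixes the name without touching the script.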
The line "The requested profile "hadoop-2.7.1" could not be activated because it does not exist." is the key one: although the build succeeded, the package is unusable because the warning text leaked into the tarball's name (screenshots below). So add the matching profile to pom.xml:
<profile>
<id>hadoop-2.7.1</id>
</profile>
After rebuilding with this change, the build succeeds and the package name comes out correctly.
Below, viewed through WinSCP, the first package is from the build without the profile added, and the second from the successful build after adding it.
Compiled package download:
Link: http://pan.baidu.com/s/1nv7QAwT  password: oo2r