Since Hadoop is already set up, I downloaded spark-1.6.1-bin-without-hadoop.tgz from the official site.
But running spark-sql fails with:
Failed to load main class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.
You need to build Spark with -Phive and -Phive-thriftserver.
So does that mean I have to build Spark myself to use spark-sql, with the -Phive and -Phive-thriftserver profiles?
Fine. I downloaded the source and the build succeeded, using this command:
./make-distribution.sh --name hadoop-provided --tgz -Phive -Phive-thriftserver -Pyarn -DskipTests
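One thing worth noting about a "hadoop-provided" build, per Spark's "Hadoop Free" build documentation: the distribution no longer bundles the Hadoop jars, so they have to be supplied at runtime, typically in conf/spark-env.sh. A minimal sketch, assuming the hadoop binary is on PATH:

# conf/spark-env.sh -- point a "hadoop-provided" Spark at the local Hadoop jars
export SPARK_DIST_CLASSPATH=$(hadoop classpath)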
After that, spark-sql no longer reports the error,
but now the Spark-on-YARN test program won't run at all.
This one, specifically:
spark-submit --master yarn \
--deploy-mode client \
examples/src/main/python/pi.py \
10
It gets submitted to YARN, but then hangs forever, repeatedly logging:
INFO yarn.Client: Application report for application_1472058188372_0001 (state: ACCEPTED)
This message just repeats. The web UI also shows the state as ACCEPTED, never RUNNING:
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
FinalStatus Reported by AM: Application has not completed yet.
Are my build flags wrong?
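For anyone diagnosing the same hang, the standard YARN CLI can report the application's queue/resource state and, once the AM has launched, its container logs (a sketch; the application ID is the one from the log above):

# Show the current state of the stuck application
yarn application -status application_1472058188372_0001
# Fetch container logs (useful once the AM has at least launched)
yarn logs -applicationId application_1472058188372_0001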
------------------
Packaging without Hadoop Dependencies for YARN
The assembly directory produced by mvn package will, by default, include all of Spark’s dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with yarn.application.classpath. The hadoop-provided profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself.
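A quick way to confirm the assembly really was built without Hadoop classes (the jar path here is illustrative; the actual file name depends on the build):

# The grep should return nothing for a hadoop-provided assembly;
# a hit would mean Hadoop classes got bundled after all.
jar tf lib/spark-assembly-*.jar | grep 'org/apache/hadoop/fs/FileSystem'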