Since Hadoop is already set up, I downloaded spark-1.6.1-bin-without-hadoop.tgz from the official site.
But running spark-sql fails with:
Failed to load main class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.
You need to build Spark with -Phive and -Phive-thriftserver.
So does that mean I have to build Spark myself to use spark-sql, with the -Phive and -Phive-thriftserver profiles?
Fine. I downloaded the source and the build succeeded, using this command:
./make-distribution.sh --name hadoop-provided --tgz -Phive -Phive-thriftserver -Pyarn -DskipTests
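One thing worth noting about a "hadoop-provided" build, per Spark's "Hadoop Free" build documentation: the distribution no longer bundles the Hadoop jars, so they have to be supplied at runtime, typically in conf/spark-env.sh. A minimal sketch, assuming the hadoop binary is on PATH:

# conf/spark-env.sh -- point a "hadoop-provided" Spark at the local Hadoop jars
export SPARK_DIST_CLASSPATH=$(hadoop classpath)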
After that, spark-sql no longer reports the error,
but now the Spark-on-YARN test program won't run at all.
This one, specifically:
spark-submit --master yarn \
--deploy-mode client \
examples/src/main/python/pi.py \
10
It gets submitted to YARN, but then hangs forever, repeatedly logging:
INFO yarn.Client: Application report for application_1472058188372_0001 (state: ACCEPTED)
This message just repeats. The web UI also shows the state as ACCEPTED, never RUNNING:
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
FinalStatus Reported by AM: Application has not completed yet.
Are my build flags wrong?
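For anyone diagnosing the same hang, the standard YARN CLI can report the application's queue/resource state and, once the AM has launched, its container logs (a sketch; the application ID is the one from the log above):

# Show the current state of the stuck application
yarn application -status application_1472058188372_0001
# Fetch container logs (useful once the AM has at least launched)
yarn logs -applicationId application_1472058188372_0001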
------------------
Packaging without Hadoop Dependencies for YARN
The assembly directory produced by mvn package will, by default, include all of Spark’s dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with yarn.application.classpath. The hadoop-provided profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself.
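A quick way to confirm the assembly really was built without Hadoop classes (the jar path here is illustrative; the actual file name depends on the build):

# The grep should return nothing for a hadoop-provided assembly;
# a hit would mean Hadoop classes got bundled after all.
jar tf lib/spark-assembly-*.jar | grep 'org/apache/hadoop/fs/FileSystem'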