Spark job runs in spark-shell but fails under spark-submit

小小布衣 posted on 2015-1-8 17:48:05
Could someone please take a look at this problem:
1. My own WordCount runs locally and in spark-shell, but fails under spark-submit.
2. I have ruled out a Scala version mismatch: both the local machine and the cluster use Scala 2.10.4.
3. The official WordCount fails with the same error as my own WordCount.
4. I submit it like this: ./spark-submit --master spark://10.1.0.141:7077 --class org.apache.spark.examples.SparkPi --name Spark-Pi --executor-memory 500M --driver-memory 512M lib/spark-examples-1.1.0-cdh5.2.0-hadoop2.5.0-cdh5.2.0.jar 1000 (submitted this way there should be no jar conflict either, yet the error is the same).




15/01/08 15:19:58 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@n5.china.com:20545]
15/01/08 15:19:58 INFO Utils: Successfully started service 'sparkDriver' on port 20545.
15/01/08 15:19:58 INFO SparkEnv: Registering MapOutputTracker
15/01/08 15:19:58 INFO SparkEnv: Registering BlockManagerMaster
15/01/08 15:19:58 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150108151958-cadb
15/01/08 15:19:58 INFO Utils: Successfully started service 'Connection manager for block manager' on port 63094.
15/01/08 15:19:58 INFO ConnectionManager: Bound socket to port 63094 with id = ConnectionManagerId(n5.china.com,63094)
15/01/08 15:19:58 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/01/08 15:19:58 INFO BlockManagerMaster: Trying to register BlockManager
15/01/08 15:19:58 INFO BlockManagerMasterActor: Registering block manager n5.china.com:63094 with 265.4 MB RAM
15/01/08 15:19:58 INFO BlockManagerMaster: Registered BlockManager
15/01/08 15:19:58 INFO HttpFileServer: HTTP File server directory is /tmp/spark-3b796d21-db73-4aad-84bc-23cd0ce3fab7
15/01/08 15:19:58 INFO HttpServer: Starting HTTP Server
15/01/08 15:19:58 INFO Utils: Successfully started service 'HTTP file server' on port 38860.
15/01/08 15:19:58 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/01/08 15:19:58 INFO SparkUI: Started SparkUI at http://n5.china.com:4040
15/01/08 15:19:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/08 15:19:59 INFO SparkContext: Added JAR file:/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/spark-examples-1.1.0-cdh5.2.0-hadoop2.5.0-cdh5.2.0.jar at http://10.1.0.182:38860/jars/spa ... p2.5.0-cdh5.2.0.jar with timestamp 1420701599601
15/01/08 15:19:59 INFO AppClient$ClientActor: Connecting to master spark://10.1.0.141:7077...
15/01/08 15:19:59 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/01/08 15:19:59 INFO SparkContext: Starting job: reduce at SparkPi.scala:35
15/01/08 15:19:59 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 1000 output partitions (allowLocal=false)
15/01/08 15:19:59 INFO DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35)
15/01/08 15:19:59 INFO DAGScheduler: Parents of final stage: List()
15/01/08 15:19:59 INFO DAGScheduler: Missing parents: List()
15/01/08 15:19:59 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at SparkPi.scala:31), which has no missing parents
15/01/08 15:20:00 INFO MemoryStore: ensureFreeSpace(1728) called with curMem=0, maxMem=278302556
15/01/08 15:20:00 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1728.0 B, free 265.4 MB)
15/01/08 15:20:00 INFO MemoryStore: ensureFreeSpace(1125) called with curMem=1728, maxMem=278302556
15/01/08 15:20:00 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1125.0 B, free 265.4 MB)
15/01/08 15:20:00 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on n5.china.com:63094 (size: 1125.0 B, free: 265.4 MB)
15/01/08 15:20:00 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/01/08 15:20:00 INFO DAGScheduler: Submitting 1000 missing tasks from Stage 0 (MappedRDD[1] at map at SparkPi.scala:31)
15/01/08 15:20:00 INFO TaskSchedulerImpl: Adding task set 0.0 with 1000 tasks
15/01/08 15:20:15 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/01/08 15:20:19 INFO AppClient$ClientActor: Connecting to master spark://10.1.0.141:7077...
15/01/08 15:20:30 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/01/08 15:20:39 INFO AppClient$ClientActor: Connecting to master spark://10.1.0.141:7077...
15/01/08 15:20:45 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/01/08 15:20:59 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/01/08 15:20:59 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/01/08 15:20:59 INFO TaskSchedulerImpl: Cancelling stage 0
15/01/08 15:20:59 INFO DAGScheduler: Failed to run reduce at SparkPi.scala:35
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[root@n5 bin]#
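Most of the output above is routine INFO logging; the lines that matter are the repeated WARN from TaskSchedulerImpl and the final ERROR from SparkDeploySchedulerBackend. A minimal sketch for pulling those out of a saved driver log, assuming the `yy/MM/dd HH:mm:ss LEVEL Component: message` layout seen in this output (the name `summarize_problems` and the sample string are illustrative and not part of Spark):

```python
import re
from collections import Counter

# Matches lines such as:
# 15/01/08 15:20:15 WARN TaskSchedulerImpl: Initial job has not accepted ...
LOG_LINE = re.compile(
    r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} (WARN|ERROR) ([^:\s]+): (.*)$"
)


def summarize_problems(log_text):
    """Count each distinct WARN/ERROR message, keyed by (level, component, message)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            counts[match.groups()] += 1
    return counts


# A few lines copied from the log above.
sample_log = """\
15/01/08 15:20:00 INFO TaskSchedulerImpl: Adding task set 0.0 with 1000 tasks
15/01/08 15:20:15 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/01/08 15:20:30 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/01/08 15:20:59 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
"""

for (level, component, message), count in summarize_problems(sample_log).items():
    print(f"{count}x {level} {component}: {message}")
```

Grouping identical messages keeps the summary short even when the same warning repeats every 15 seconds, as it does here.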


muyannian posted on 2015-1-8 21:22:07
Spark has fairly high memory requirements; the job probably ran short of resources at run time.

langke93 posted on 2015-1-8 21:30:14
Check whether the master node's IP is configured correctly in spark-env.sh:

  export SPARK_MASTER_IP=x.x.x.x
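One way to make this check concrete is to compare the address in spark-env.sh with the host passed to --master. A minimal sketch, assuming spark-env.sh contains a plain `export SPARK_MASTER_IP=...` line; the helper names and the sample file text are hypothetical, and the two URLs are the ones quoted in this thread:

```python
import re
from urllib.parse import urlparse

# NAME=value, optionally preceded by "export".
ASSIGNMENT = re.compile(r"(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)")


def read_exports(env_text):
    """Collect simple NAME=value assignments from shell text, ignoring comments."""
    exports = {}
    for line in env_text.splitlines():
        line = line.split("#", 1)[0].strip()
        match = ASSIGNMENT.fullmatch(line)
        if match:
            exports[match.group(1)] = match.group(2).strip().strip("'\"")
    return exports


def master_host_matches(env_text, master_url):
    """True when SPARK_MASTER_IP equals the host part of a spark:// URL."""
    configured = read_exports(env_text).get("SPARK_MASTER_IP")
    return configured is not None and configured == urlparse(master_url).hostname


# Hypothetical file content; the real file lives on the cluster nodes.
sample_env = "# hypothetical spark-env.sh\nexport SPARK_MASTER_IP=10.1.0.141\n"

print(master_host_matches(sample_env, "spark://10.1.0.141:7077"))
print(master_host_matches(sample_env, "spark://10.0.0.44:7077"))
```

This only compares strings. It does not resolve hostnames, so a master configured by name but addressed by IP would be reported as a mismatch and needs a manual look.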

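小小布衣 posted on 2015-1-9 09:58:29
Quoting muyannian, posted on 2015-1-8 21:22:
Spark has fairly high memory requirements; the job probably ran short of resources at run time.

That is possible. In Cloudera Manager (CM) for CDH I can see that each worker has only 64M. But Spark was installed through CM, so this should be an environment problem. Either way, the problem is still not solved.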

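小小布衣 posted on 2015-1-9 09:59:21
Quoting langke93, posted on 2015-1-8 21:30:
Check whether the master node's IP is configured correctly in spark-env.sh

This IP was set by the automated installation. I checked it, and it is correct.

Comment

Work on the memory side, for example by adding memory or stopping other services. In short, make sure there is enough memory. Posted on 2015-1-9 10:15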

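小小布衣 posted on 2015-1-9 13:56:20
Quoting 小小布衣, posted on 2015-1-9 09:59:
This IP was set by the automated installation. I checked it, and it is correct.

Each of my worker machines has 16g of memory, but when the Spark service was added, each worker was given only 64M. I changed the configuration by hand and restarted, and the error is still the same. I now suspect the CDH environment itself.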
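In standalone mode an executor is only launched on a worker that has at least the requested executor memory available, so a 64M worker cannot host the 500M executor requested in the original spark-submit command. A rough sketch of that comparison, assuming size strings such as 64M, 500M, or 16g; the worker names and helper functions are made up for illustration:

```python
import re

# Multipliers to kibibytes for the size suffixes used in this thread.
UNIT_TO_KB = {"k": 1, "m": 1024, "g": 1024 ** 2, "t": 1024 ** 3}


def to_kb(size):
    """Convert a size string such as '500M' or '16g' to kibibytes."""
    match = re.fullmatch(r"(\d+)([kmgt])b?", size.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    return int(match.group(1)) * UNIT_TO_KB[match.group(2)]


def workers_that_fit(executor_memory, worker_memory):
    """Return the workers whose memory is at least the requested executor memory."""
    needed = to_kb(executor_memory)
    return [name for name, memory in worker_memory.items() if to_kb(memory) >= needed]


# Hypothetical worker names; 64M is the per-worker value reported in Cloudera Manager.
as_installed = {"worker-1": "64M", "worker-2": "64M"}
# Hypothetical values after raising the worker memory setting.
after_change = {"worker-1": "16g", "worker-2": "16g"}

print(workers_that_fit("500M", as_installed))
print(workers_that_fit("500M", after_change))
```

An empty result is consistent with the "Initial job has not accepted any resources" warning. Since the error persisted after the configuration was changed, though, memory alone may not be the cause here.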

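stark_summer posted on 2015-1-9 19:15:13
Your cluster servers have too little memory, don't they?
Also, you shouldn't need to upload lib/spark-examples-1.1.0-cdh5.2.0-hadoop2.5.0-cdh5.2.0.jar; your cluster servers should already have it.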

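小小布衣 posted on 2015-1-12 15:15:31
Quoting stark_summer, posted on 2015-1-9 19:15:
Your cluster servers have too little memory, don't they?
Also, you shouldn't need lib/spark-examples-1.1.0-cdh5.2.0-hadoop2.5.0-cdh5.2.0.jar ...

The memory is not low; it is 16g. It is just that when the Spark service was generated automatically on CDH, the default turned out to be 64M. I don't know where that went wrong.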

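小小布衣 posted on 2015-1-25 17:38:07
Quoting 小小布衣, posted on 2015-1-9 09:59:
This IP was set by the automated installation. I checked it, and it is correct.

I ran the official example: /opt/cloudera/parcels/CDH/bin/spark-submit --class org.apache.spark.examples.SparkPi --executor-memory 2g /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/jars/spark-examples-1.1.0-cdh5.2.0-hadoop2.5.0-cdh5.2.0.jar 1000 and it ran without errors. However, I cannot add the master parameter --master spark://10.0.0.44:7077; if I do, the job fails because it cannot connect to the master: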
15/01/25 17:25:04 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/01/25 17:25:04 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[3] at map at WordCount.scala:16)
15/01/25 17:25:04 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
15/01/25 17:25:19 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/01/25 17:25:34 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory




The problem is still not solved. Could someone please help?
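Because the job only fails once --master spark://10.0.0.44:7077 is added, a cheap first check is whether the machine running spark-submit can open a TCP connection to that host and port at all. A minimal sketch under that assumption; `parse_master_url` and `master_port_open` are illustrative helpers, not Spark APIs, and an open port does not prove the master will accept the application:

```python
import socket
from urllib.parse import urlparse


def parse_master_url(master_url):
    """Split a spark://host:port URL into (host, port)."""
    parsed = urlparse(master_url)
    if parsed.scheme != "spark" or parsed.hostname is None or parsed.port is None:
        raise ValueError(f"not a spark://host:port URL: {master_url!r}")
    return parsed.hostname, parsed.port


def master_port_open(master_url, timeout=3.0):
    """Return True if a TCP connection to the master's host and port succeeds."""
    host, port = parse_master_url(master_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Parsing only; no network traffic happens on this line.
print(parse_master_url("spark://10.0.0.44:7077"))
```

Calling `master_port_open("spark://10.0.0.44:7077")` from the submitting host would then separate a plain network or address problem from a master that is reachable but not answering the driver.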

cnhu posted on 2015-5-21 10:38:24
Quoting 小小布衣, posted on 2015-1-25 17:38:
I ran the official example: /opt/cloudera/parcels/CDH/bin/spark-submit   --class org.apache.spark.examples ...

Did you ever solve this problem? I have run into the same thing, and it is really frustrating ::>_<::
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
