Truly Understanding Spark Streaming

Posted on 2017-3-3 09:52

Summary:

This post was last edited by pig2 on 2017-3-21 07:33. Guiding questions: 1. Why use Spark Streaming? 2. What is a StreamingContext? 3. What is a DStream? Introduction to Spark Streaming: Spark Streaming is an extension of the core Spark API ...

Replies

琅琊榜尾 posted on 2019-8-13 15:52:36
Checking in: day N of study.
linux_oracle posted on 2018-12-3 17:27:11
Learned something new.
linux_oracle posted on 2018-11-28 18:18:32
666666666
arsenduan posted on 2017-9-8 10:26:10
Quoting fengfengda (2017-9-7 16:55):
How could the environment variables be the problem? There are no custom jars; I ran it directly from IDEA, and the error message is just what shows up in the browser.

The causes differ from case to case; the following is for reference only.
Running in standalone cluster mode

VM options = -Dspark.master=spark://master:7077 (replace master with the hostname of your cluster's master)

Program arguments = the absolute path of the input file on the local machine (or an hdfs://... path, or another URI)

Launching the Spark job directly from IDEA with the parameters above raises the following exception:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 77, 192.168.1.194): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD (see https://issues.apache.org/jira/browse/SPARK-9219)
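
For context, a minimal sketch of the context creation this setup assumes (an assumption, not the poster's actual code): new SparkConf() also loads any spark.* Java system properties, which is how the -Dspark.master VM option takes effect.

import org.apache.spark.{SparkConf, SparkContext}

// new SparkConf() picks up spark.* Java system properties, so the
// -Dspark.master=spark://master:7077 VM option sets the master here.
val conf = new SparkConf().setAppName("WordCount")
// Note: nothing in this setup ships the application jar to the executors,
// a common trigger for the task-deserialization ClassCastException above.
val sc = new SparkContext(conf)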

Therefore, submit the job with spark-submit instead:

① Click File - Project Structure - Artifacts - JAR - From modules with dependencies, then select the corresponding module and main class.


② Set VM options = -Dspark.master=spark://master:7077

!!! Note: in this mode, sc.textFile(path) is actually read as hdfs://master:7077/path, i.e. an HDFS path rather than a local one.

③ Click Build - Build Artifacts to generate the jar (it is written to the output path you configured in IDEA; in this post, under the project's /out/.. directory).

④ zip -d /usr/mywork/project/scala/FirstSpark/out/artifacts/firstspark_jar/firstspark.jar META-INF/*.RSA META-INF/*.DSA META-INF/*.SF

(/usr/mywork/project/scala/FirstSpark/out/artifacts/firstspark_jar/firstspark.jar is the path of the generated jar; this command strips the signature files from it.)

(See http://blog.csdn.net/dai451954706/article/details/50086295; without this step you will get the error: Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main attributes)



⑤ Submit in standalone mode (there are other modes as well; see http://blog.csdn.net/kinger0/article/details/46562473):

./spark-submit  --class WordCount --master spark://pmaster:7077 /usr/mywork/project/scala/FirstSpark/out/artifacts/firstspark_jar/firstspark.jar

(WordCount is the entry class of the job to run;

spark://pmaster:7077 is the master of the cluster being submitted to;

/usr/mywork/project/scala/FirstSpark/out/artifacts/firstspark_jar/firstspark.jar is the jar being submitted.)
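
For reference, a minimal sketch of what the WordCount entry class behind --class WordCount might look like; this is reconstructed from the submit command, not the original source:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // The master is supplied by --master on spark-submit, so it is not set here.
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)

    // args(0) is the input path; per the note in step ②, a bare path is
    // resolved against HDFS in this mode, not the local filesystem.
    val counts = sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println)
    sc.stop()
  }
}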

a530491093 posted on 2017-9-8 09:52:58
Marking this; nice, thanks~
fengfengda posted on 2017-9-7 16:55:57
Quoting nextuser (2017-9-7 13:20):
Possibly an environment issue; make sure the jars, especially custom ones, are present on every node.

How could the environment variables be the problem? There are no custom jars; I ran it directly from IDEA, and the error message is just what shows up in the browser.
(screenshot: QQ截图20170907165507.png)
nextuser posted on 2017-9-7 13:20:10
Quoting fengfengda (2017-9-7 10:41):
The WordCount code reports the following error:
17/09/07 10:37:05 INFO JobScheduler: Started JobScheduler
17/09/07 10:37 ...

Possibly an environment issue; make sure the needed jars, especially custom ones, are present on every node.
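
Following up on this advice: when running from the IDE rather than via spark-submit, one way to get the application's classes onto every node is to build the jar and ship it explicitly with SparkConf.setJars. A sketch of just the configuration portion, under that assumption (the jar path is hypothetical):

import org.apache.spark.SparkConf

// setJars ships the built application jar to every executor so that the
// job's classes and closures can be deserialized on each node.
val conf = new SparkConf()
  .setAppName("WordCountStreaming")
  .setMaster("spark://master:7077")
  .setJars(Seq("/path/to/firstspark.jar"))  // hypothetical path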
fengfengda posted on 2017-9-7 10:41:41
The WordCount code reports the following error:
17/09/07 10:37:05 INFO JobScheduler: Started JobScheduler
17/09/07 10:37:05 INFO StreamingContext: StreamingContext started
17/09/07 10:37:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(null) (172.28.41.196:52138) with ID 0
17/09/07 10:37:11 INFO BlockManagerMasterEndpoint: Registering block manager 172.28.41.196:35789 with 413.9 MB RAM, BlockManagerId(0, 172.28.41.196, 35789)
17/09/07 10:37:30 WARN FileInputDStream: Error finding new files
java.lang.NullPointerException
        at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:192)
        at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:192)
        at scala.collection.SeqLike$class.size(SeqLike.scala:106)
        at scala.collection.mutable.ArrayOps$ofRef.size(ArrayOps.scala:186)
        at scala.collection.mutable.Builder$class.sizeHint(Builder.scala:69)
        at scala.collection.mutable.ArrayBuilder.sizeHint(ArrayBuilder.scala:22)
        at scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:230)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
        at org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:205)
        at org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:149)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
        at scala.Option.orElse(Option.scala:289)
        at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:330)
        at org.apache.spark.streaming.dstream.MappedDStream.compute(MappedDStream.scala:36)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
        at scala.Option.orElse(Option.scala:289)
        at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:330)
        at org.apache.spark.streaming.dstream.FlatMappedDStream.compute(FlatMappedDStream.scala:36)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
        at scala.Option.orElse(Option.scala:289)
        at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:330)
        at org.apache.spark.streaming.dstream.MappedDStream.compute(MappedDStream.scala:36)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
        at scala.Option.orElse(Option.scala:289)
        at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:330)
        at org.apache.spark.streaming.dstream.ShuffledDStream.compute(ShuffledDStream.scala:41)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
        at scala.Option.orElse(Option.scala:289)
        at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:330)
        at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:48)
        at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:117)
        at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:116)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
        at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:116)
        at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:249)
        at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:247)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:247)
        at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:183)
        at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:89)
        at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:88)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
17/09/07 10:37:30 INFO FileInputDStream: New files at time 1504751850000 ms:

17/09/07 10:37:30 INFO JobScheduler: Added jobs for time 1504751850000 ms
17/09/07 10:37:30 INFO JobScheduler: Starting job streaming job 1504751850000 ms.0 from job set of time 1504751850000 ms
17/09/07 10:37:30 INFO SparkContext: Starting job: print at WordCountStreaming.scala:19
17/09/07 10:37:30 INFO DAGScheduler: Registering RDD 3 (map at WordCountStreaming.scala:18)
17/09/07 10:37:30 INFO DAGScheduler: Got job 0 (print at WordCountStreaming.scala:19) with 1 output partitions
17/09/07 10:37:30 INFO DAGScheduler: Final stage: ResultStage 1 (print at WordCountStreaming.scala:19)
17/09/07 10:37:30 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
17/09/07 10:37:30 INFO DAGScheduler: Missing parents: List()
17/09/07 10:37:30 INFO DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCountStreaming.scala:18), which has no missing parents
17/09/07 10:37:30 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.8 KB, free 907.2 MB)
17/09/07 10:37:30 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1723.0 B, free 907.2 MB)
17/09/07 10:37:30 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.28.26.83:53299 (size: 1723.0 B, free: 907.2 MB)
17/09/07 10:37:30 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
17/09/07 10:37:30 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCountStreaming.scala:18)
17/09/07 10:37:30 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/09/07 10:37:30 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, 172.28.41.196, partition 0, PROCESS_LOCAL, 5798 bytes)
17/09/07 10:37:30 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 0 on executor id: 0 hostname: 172.28.41.196.
17/09/07 10:37:31 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.28.41.196:35789 (size: 1723.0 B, free: 413.9 MB)
17/09/07 10:37:31 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 0, 172.28.41.196): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.ShuffledRDD
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

17/09/07 10:37:31 INFO TaskSetManager: Starting task 0.1 in stage 1.0 (TID 1, 172.28.41.196, partition 0, PROCESS_LOCAL, 5798 bytes)
17/09/07 10:37:31 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 1 on executor id: 0 hostname: 172.28.41.196.
17/09/07 10:37:32 INFO TaskSetManager: Lost task 0.1 in stage 1.0 (TID 1) on executor 172.28.41.196: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.ShuffledRDD) [duplicate 1]
17/09/07 10:37:32 INFO TaskSetManager: Starting task 0.2 in stage 1.0 (TID 2, 172.28.41.196, partition 0, PROCESS_LOCAL, 5798 bytes)
17/09/07 10:37:32 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 2 on executor id: 0 hostname: 172.28.41.196.
17/09/07 10:37:32 INFO TaskSetManager: Lost task 0.2 in stage 1.0 (TID 2) on executor 172.28.41.196: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.ShuffledRDD) [duplicate 2]
17/09/07 10:37:32 INFO TaskSetManager: Starting task 0.3 in stage 1.0 (TID 3, 172.28.41.196, partition 0, PROCESS_LOCAL, 5798 bytes)
17/09/07 10:37:32 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 3 on executor id: 0 hostname: 172.28.41.196.
17/09/07 10:37:32 INFO TaskSetManager: Lost task 0.3 in stage 1.0 (TID 3) on executor 172.28.41.196: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.ShuffledRDD) [duplicate 3]
17/09/07 10:37:32 ERROR TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job
17/09/07 10:37:32 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/09/07 10:37:32 INFO TaskSchedulerImpl: Cancelling stage 1
17/09/07 10:37:32 INFO DAGScheduler: ResultStage 1 (print at WordCountStreaming.scala:19) failed in 1.762 s
17/09/07 10:37:32 INFO DAGScheduler: Job 0 failed: print at WordCountStreaming.scala:19, took 1.994405 s
17/09/07 10:37:32 INFO JobScheduler: Finished job streaming job 1504751850000 ms.0 from job set of time 1504751850000 ms
17/09/07 10:37:32 INFO JobScheduler: Total delay: 2.188 s for time 1504751850000 ms (execution: 2.029 s)
17/09/07 10:37:32 ERROR JobScheduler: Error running job streaming job 1504751850000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 3, 172.28.41.196): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.ShuffledRDD
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1442)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1441)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1441)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1667)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1622)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1611)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1873)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1886)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1899)
        at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1324)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
        at org.apache.spark.rdd.RDD.take(RDD.scala:1298)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$print$2$$anonfun$foreachFunc$3$1.apply(DStream.scala:734)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$print$2$$anonfun$foreachFunc$3$1.apply(DStream.scala:733)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:247)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:247)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:247)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:246)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.ShuffledRDD
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        ... 3 more
Exception in thread "main" 17/09/07 10:37:32 INFO FileInputDStream: Cleared 0 old files that were older than 1504751790000 ms:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 3, 172.28.41.196): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.ShuffledRDD
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
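
Reconstructed from the call sites in this log (a flatMap/map/reduceByKey at WordCountStreaming.scala:18 fed by a textFileStream, and a print at line 19), the job was presumably close to the sketch below; this is an assumption, not the poster's code. The earlier "WARN FileInputDStream: Error finding new files" NullPointerException typically means the monitored input directory does not exist or is not readable, while the ClassCastException is the same jar-distribution problem discussed above.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WordCountStreaming {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCountStreaming")
    // The log shows batches firing every 30 seconds.
    val ssc = new StreamingContext(conf, Seconds(30))

    // Watch a directory for new files; if the directory does not exist,
    // FileInputDStream.findNewFiles can fail with the NullPointerException
    // seen in the log above.
    val lines = ssc.textFileStream(args(0))
    lines.flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}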
zuizui posted on 2017-7-1 14:51:25
Taking a look.