About云-梭伦科技

xw2016's personal space https://aboutyun.com/?40798

Spark on YARN cluster job fails with: Stack trace: ExitCodeException exitCode=15:

Posted 2016-7-9 22:21 | Category: spark | Tags: spark, yarn

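Start the Hadoop cluster. After startup, the daemons on yun01-nn-01 are as follows (the job is submitted to YARN, so Spark itself does not need to be started):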
[hadoop@yun01-nn-01 spark]$ jps
17891 NameNode
20116 ResourceManager
25014 Jps
18155 DFSZKFailoverController
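
Checking the jps listing by eye works for one host. A small helper can make the same check scriptable across nodes. This is a minimal sketch, not part of the original setup. It assumes jps prints one "<pid> <MainClass>" pair per line. The required daemon set is also an assumption, copied from the listing above for this master node.

```python
# Hypothetical helper: REQUIRED_ON_MASTER is an assumption copied from the
# jps listing above; adjust it for each node's role.
REQUIRED_ON_MASTER = {"NameNode", "ResourceManager", "DFSZKFailoverController"}


def running_daemons(jps_output):
    """Return the set of main-class names in `jps` output ("<pid> <MainClass>")."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output, required=REQUIRED_ON_MASTER):
    """Return the required daemons that do not appear in the jps output."""
    return set(required) - running_daemons(jps_output)


if __name__ == "__main__":
    sample = "17891 NameNode\n20116 ResourceManager\n25014 Jps\n18155 DFSZKFailoverController"
    print(sorted(missing_daemons(sample)))  # prints []
```

On a live node, the input would be the stdout of running `jps`, for example captured with `subprocess.run(["jps"], capture_output=True, text=True).stdout`.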

Submit the job to Spark on YARN (from yun01-dn-01):
[hadoop@yun01-dn-01 spark]$ bin/spark-submit --master yarn-cluster --executor-memory 1g --class sparkscala.k3.app.Query /application/hadoop/testjar/query.jar hdfs://yun01-nn-01:9000/data/log1.txt hdfs://yun01-nn-01:9000/sparkout/query1

It then failed with the following error:
16/07/09 05:23:24 INFO Client: 
         client token: N/A
         diagnostics: Application application_1468011021285_0002 failed 2 times due to AM Container for appattempt_1468011021285_0002_000002 exited with  exitCode: 15
For more detailed output, check application tracking page:http://yun01-nn-01:8088/proxy/application_1468011021285_0002/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1468011021285_0002_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
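
The client-side diagnostics only say that the AM container exited with code 15. The useful details are the application ID and container ID, which lead to the logs. Below is a minimal sketch that extracts them with regular expressions. It assumes the message format shown above; other Hadoop versions may word it differently. `yarn logs -applicationId` is the standard YARN CLI command for fetching logs, but it only returns them when log aggregation is enabled on the cluster.

```python
import re

# Patterns match the diagnostics format printed above (assumed stable for this
# Hadoop version; other versions may word the message differently).
APP_RE = re.compile(r"Application (application_\d+_\d+) failed")
EXIT_RE = re.compile(r"exitCode:\s*(-?\d+)")
CONTAINER_RE = re.compile(r"Container id: (container_\S+)")


def parse_am_failure(diagnostics):
    """Extract the application ID, AM exit code and container ID (None if absent)."""
    app = APP_RE.search(diagnostics)
    code = EXIT_RE.search(diagnostics)
    container = CONTAINER_RE.search(diagnostics)
    return {
        "app_id": app.group(1) if app else None,
        "exit_code": int(code.group(1)) if code else None,
        "container_id": container.group(1) if container else None,
    }


def yarn_logs_command(app_id):
    """Argument list for fetching aggregated logs (needs log aggregation enabled)."""
    return ["yarn", "logs", "-applicationId", app_id]
```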

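According to the message, the failure happened in the AM container. Opening the tracking page it points to (http://yun01-nn-01:8088/proxy/application_1468011021285_0002/) and following the log links for an attempt shows this passage: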
16/07/09 05:23:02 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
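
The decisive line is the AM's "Final app status". A hedged sketch for pulling it out of a longer AM log follows; the helper is hypothetical. It flags a StandbyException as the root cause and reports which operation category the NameNode rejected. The regular expressions assume the log format in the excerpt above.

```python
import re

# Assumed formats, taken from the AM log excerpt above.
FINAL_RE = re.compile(r"Final app status: (\w+), exitCode: (-?\d+), \(reason: (.*)")
CATEGORY_RE = re.compile(r"Operation category (\w+) is not supported in state (\w+)")


def diagnose_am_log(log_text):
    """Return the AM's final status, exit code and reason, or None if absent.

    Also reports whether the reason is an HDFS StandbyException and, when the
    message says so, which operation category the NameNode rejected.
    """
    for line in log_text.splitlines():
        m = FINAL_RE.search(line)
        if m is None:
            continue
        reason = m.group(3)
        cat = CATEGORY_RE.search(reason)
        return {
            "status": m.group(1),
            "exit_code": int(m.group(2)),
            "reason": reason,
            "standby_namenode": "StandbyException" in reason,
            "rejected_category": cat.group(1) if cat else None,
        }
    return None
```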

"Operation category READ is not supported in state standby": is yun01-nn-01 in standby? Check the state of nn1 (yun01-nn-01):
[hadoop@yun01-nn-01 myfile]$ hdfs haadmin -getServiceState nn1
standby
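
Instead of checking each NameNode by hand, a script can ask every NameNode ID and report the active one. This is a minimal sketch built on the same `hdfs haadmin -getServiceState` command. The IDs nn1 and nn2 are assumed to be this cluster's NameNode IDs, with nn2 assumed to be yun01-nn-02. The `runner` parameter is a hypothetical hook so the logic can be tested without a cluster.

```python
import subprocess


def run_cmd(args):
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


def active_namenode(ids=("nn1", "nn2"), runner=run_cmd):
    """Return the first NameNode ID whose HA state is "active", or None.

    nn1/nn2 are assumed NameNode IDs (nn2 assumed to be yun01-nn-02);
    `runner` is injectable so the logic can be exercised without a cluster.
    """
    for nn_id in ids:
        try:
            state = runner(["hdfs", "haadmin", "-getServiceState", nn_id]).strip()
        except (subprocess.CalledProcessError, OSError):
            continue  # NameNode down or hdfs not on PATH: treat as not active
        if state == "active":
            return nn_id
    return None
```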

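Sure enough! At some point this node had switched to standby and yun01-nn-02 had become active. My command addresses both the input (hdfs://yun01-nn-01:9000/data/log1.txt) and the output (hdfs://yun01-nn-01:9000/sparkout/query1) directly on yun01-nn-01. Its HDFS requests therefore went to the standby NameNode and were rejected.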
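
The root cause is that both paths name one physical NameNode. The hypothetical helpers below detect such pinned URIs and rewrite them before submission. They assume the HA nameservice ID is ns1, the ID the author uses in the second solution below.

```python
from urllib.parse import urlparse


def pinned_to_host(uri, nameservice):
    """True if an hdfs:// URI names a concrete host instead of the nameservice."""
    u = urlparse(uri)
    return u.scheme == "hdfs" and u.netloc not in ("", nameservice)


def to_nameservice(uri, nameservice):
    """Rewrite hdfs://host:port/path to hdfs://<nameservice>/path; other URIs unchanged."""
    u = urlparse(uri)
    if u.scheme != "hdfs":
        return uri
    return u._replace(netloc=nameservice).geturl()
```

An `hdfs:///path` URI with an empty authority is not treated as pinned, because HDFS resolves it against `fs.defaultFS`.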

Solution 1:
Switch yun01-nn-01 back to active, or change the command so that it reads and writes through yun01-nn-02. To switch yun01-nn-01 back to active:
SSH to yun01-nn-02:
ssh yun01-nn-02
Stop the NameNode, then start it again:
[hadoop@yun01-nn-02 ~]$ /application/hadoop/hadoop/sbin/hadoop-daemon.sh stop namenode
stopping namenode
[hadoop@yun01-nn-02 ~]$ /application/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
Because the cluster has automatic failover enabled, stopping the NameNode on yun01-nn-02 makes yun01-nn-01 take over as active. Check:
[hadoop@yun01-nn-01 myfile]$ hdfs haadmin -getServiceState nn1
active
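
A failover is not instantaneous, so a single state check right after the restart can race it. Below is a minimal polling sketch with hypothetical helper names. The 60 s timeout and 2 s interval are arbitrary assumptions. `probe` and `sleep` are injectable so the loop can be tested without a cluster.

```python
import subprocess
import time


def get_state(nn_id):
    """Return the HA state ("active"/"standby") reported by hdfs haadmin."""
    out = subprocess.run(["hdfs", "haadmin", "-getServiceState", nn_id],
                         capture_output=True, text=True, check=True).stdout
    return out.strip()


def wait_for_state(nn_id, want="active", timeout=60.0, interval=2.0,
                   probe=get_state, sleep=time.sleep):
    """Poll until `nn_id` reports `want`; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if probe(nn_id) == want:
                return True
        except (subprocess.CalledProcessError, OSError):
            pass  # the NameNode may be restarting; keep polling
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```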

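Running the same command on yun01-dn-01 again now succeeds.
This workaround is not a good one, though. The NameNodes fail over automatically, so the active NameNode may change again while the cluster is running, and the same error will come back.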
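Solution 2:
Do not hard-code a NameNode host; address HDFS through the HA nameservice ns1 instead. Change the command to: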
bin/spark-submit --master yarn-cluster --executor-memory 1g --class sparkscala.k3.app.Query /application/hadoop/testjar/query.jar  hdfs://ns1/data/log1.txt hdfs://ns1/sparkout/query1
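
Why does the ns1 form survive a failover? An HA-configured client treats ns1 as a logical name and looks up its NameNodes in hdfs-site.xml. When one NameNode answers with StandbyException, the client moves on to the next. The sketch below is a simplified model of that behaviour, not the HDFS client itself. The configuration keys are the standard HA keys. The nn2 address, including port 9000 on yun01-nn-02, is an assumption. The real client, typically ConfiguredFailoverProxyProvider with a retry policy, also handles network errors and backoff.

```python
# Simplified model of HA client-side failover; NOT the real HDFS client.
# Keys mirror hdfs-site.xml; the nn2 address is an assumption.
HDFS_SITE = {
    "dfs.nameservices": "ns1",
    "dfs.ha.namenodes.ns1": "nn1,nn2",
    "dfs.namenode.rpc-address.ns1.nn1": "yun01-nn-01:9000",
    "dfs.namenode.rpc-address.ns1.nn2": "yun01-nn-02:9000",
}


class StandbyError(Exception):
    """Stand-in for org.apache.hadoop.ipc.StandbyException."""


def call_with_failover(nameservice, op, conf=HDFS_SITE):
    """Try `op(address)` on each NameNode of the nameservice until one accepts it."""
    errors = []
    for nn_id in conf["dfs.ha.namenodes." + nameservice].split(","):
        address = conf["dfs.namenode.rpc-address.%s.%s" % (nameservice, nn_id)]
        try:
            return op(address)
        except StandbyError as exc:
            errors.append("%s: %s" % (address, exc))  # standby: try the next one
    raise StandbyError("; ".join(errors))
```

A URI such as hdfs://yun01-nn-01:9000/... skips this loop entirely, which is why the original command failed as soon as that host became standby.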

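With the ns1 URIs, reads and writes succeed whichever NameNode is active. ns1 is the logical nameservice ID defined in hdfs-site.xml (dfs.nameservices), and the HDFS client resolves it to the NameNode that is currently active. Running the command produced the following log. It took a while, but the job finished successfully: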

[hadoop@yun01-dn-01 spark]$ bin/spark-submit --master yarn-cluster --executor-memory 1g --class sparkscala.k3.app.Query /application/hadoop/testjar/query.jar  hdfs://ns1/data/log1.txt hdfs://ns1/sparkout/query17
16/07/09 07:20:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/09 07:20:26 INFO Client: Requesting a new application from cluster with 3 NodeManagers
16/07/09 07:20:26 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/07/09 07:20:26 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/07/09 07:20:26 INFO Client: Setting up container launch context for our AM
16/07/09 07:20:26 INFO Client: Setting up the launch environment for our AM container
16/07/09 07:20:26 WARN Client: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
16/07/09 07:20:26 INFO Client: Preparing resources for our AM container
16/07/09 07:20:28 WARN Client: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
16/07/09 07:20:28 INFO Client: Uploading resource file:/application/hadoop/spark/lib/spark-assembly-1.5.2-hadoop2.6.0.jar -> hdfs://ns1/user/hadoop/.sparkStaging/application_1468011021285_0007/spark-assembly-1.5.2-hadoop2.6.0.jar
16/07/09 07:20:39 INFO Client: Uploading resource file:/application/hadoop/testjar/query.jar -> hdfs://ns1/user/hadoop/.sparkStaging/application_1468011021285_0007/query.jar
16/07/09 07:20:39 INFO Client: Uploading resource file:/tmp/spark-2e9f5748-88fb-4505-a53f-156bbf05f4eb/__spark_conf__99133437026482474.zip -> hdfs://ns1/user/hadoop/.sparkStaging/application_1468011021285_0007/__spark_conf__99133437026482474.zip
16/07/09 07:20:39 INFO SecurityManager: Changing view acls to: hadoop
16/07/09 07:20:39 INFO SecurityManager: Changing modify acls to: hadoop
16/07/09 07:20:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
16/07/09 07:20:40 INFO Client: Submitting application 7 to ResourceManager
16/07/09 07:20:40 INFO YarnClientImpl: Submitted application application_1468011021285_0007
16/07/09 07:20:41 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:41 INFO Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1468020047457
         final status: UNDEFINED
         tracking URL: http://yun01-nn-01:8088/proxy/application_1468011021285_0007/
         user: hadoop
16/07/09 07:20:42 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:43 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:44 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:45 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:46 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:47 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:48 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:49 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:50 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:51 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:52 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:53 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:54 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:55 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:56 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:57 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:58 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:20:59 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:21:00 INFO Client: Application report for application_1468011021285_0007 (state: ACCEPTED)
16/07/09 07:21:01 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:01 INFO Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 192.168.56.14
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1468020047457
         final status: UNDEFINED
         tracking URL: http://yun01-nn-01:8088/proxy/application_1468011021285_0007/
         user: hadoop
16/07/09 07:21:02 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:03 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:04 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:05 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:06 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:07 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:08 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:09 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:10 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:11 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:12 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:13 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:14 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:15 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:16 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:17 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:18 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:19 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:20 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:21 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:22 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:23 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:24 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:25 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:26 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:27 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:28 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:30 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:31 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:32 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:33 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:34 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:35 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:36 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:37 INFO Client: Application report for application_1468011021285_0007 (state: RUNNING)
16/07/09 07:21:38 INFO Client: Application report for application_1468011021285_0007 (state: FINISHED)
16/07/09 07:21:38 INFO Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 192.168.56.14
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1468020047457
         final status: SUCCEEDED
         tracking URL: http://yun01-nn-01:8088/proxy/application_1468011021285_0007/
         user: hadoop
16/07/09 07:21:38 INFO ShutdownHookManager: Shutdown hook called
16/07/09 07:21:38 INFO ShutdownHookManager: Deleting directory /tmp/spark-2e9f5748-88fb-4505-a53f-156bbf05f4eb
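
The log line "Will allocate AM container, with 1408 MB memory including 384 MB overhead" follows from Spark's overhead rule. In yarn-cluster mode the driver runs inside the AM. No --driver-memory was given, so it uses the 1 GB default. The Spark 1.4/1.5 documentation gives the default overhead as max(384 MB, 0.10 × memory). The sketch below reproduces that arithmetic; its constants come from those docs and may differ in other Spark versions. The rounding helper assumes the YARN scheduler normalizes requests up to a multiple of yarn.scheduler.minimum-allocation-mb (1024 MB by default).

```python
import math

# Defaults as documented for Spark 1.4/1.5 on YARN (other versions may differ):
# overhead = max(384 MB, 0.10 * memory)
MIN_OVERHEAD_MB = 384
OVERHEAD_FACTOR = 0.10


def container_request_mb(memory_mb, overhead_mb=None):
    """Memory Spark asks YARN for: heap plus (default or explicit) overhead."""
    if overhead_mb is None:
        overhead_mb = max(MIN_OVERHEAD_MB, int(OVERHEAD_FACTOR * memory_mb))
    return memory_mb + overhead_mb


def normalize_mb(request_mb, min_allocation_mb=1024):
    """Round up to a multiple of the scheduler's minimum allocation (assumed 1024 MB)."""
    return math.ceil(request_mb / min_allocation_mb) * min_allocation_mb


if __name__ == "__main__":
    am = container_request_mb(1024)  # driver/AM with the default 1g heap
    print(am)                        # prints 1408, matching the log
    print(am <= 8192)                # prints True: under the per-container maximum
```

The executors requested with --executor-memory 1g follow the same rule, so each also asks for 1408 MB.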

