zenppl posted on 2018-11-22 11:15:48

Resource scheduling problem with Hive on Spark

Any help from the experts here would be much appreciated!

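With about 70% of the cluster's resources in use, operations run in Hive hang waiting until they eventually fail.
I looked at the YARN logs: after the container transitioned to RUNNING, there was a long pause before the AM registered with ApplicationMasterService. The relevant log lines are: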
2018-11-22 10:23:57,497 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1542599163995_0369_000001 State change from ALLOCATED to LAUNCHED on event = LAUNCHED
2018-11-22 10:23:58,467 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e43_1542599163995_0369_01_000001 Container Transitioned from ACQUIRED to RUNNING
2018-11-22 10:25:51,385 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM registration appattempt_1542599163995_0369_000001
2018-11-22 10:25:51,385 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hive        IP=10.9.0.204        OPERATION=Register App Master        TARGET

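As you can see, after 10:23:58 it waited nearly two minutes (until 10:25:51) before moving to the next step, and by then the Hive query I ran from Hue had already failed.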
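
For anyone who wants to measure a stall like this instead of eyeballing it, here is a minimal Python sketch that finds the largest gap between consecutive timestamped lines. It assumes each line starts with a timestamp in the `yyyy-MM-dd HH:mm:ss,SSS` layout seen in the excerpt above; the function names `parse_ts` and `largest_gap` are made up for this illustration and are not part of any Hadoop tooling.

```python
from datetime import datetime

# Timestamp layout assumed from the ResourceManager lines in this post.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(line):
    """Return the timestamp at the start of a log line, or None if absent."""
    try:
        return datetime.strptime(line[:23], TS_FORMAT)
    except ValueError:
        return None


def largest_gap(lines):
    """Return (seconds, earlier_line, later_line) for the biggest pause
    between consecutive timestamped lines, or None if fewer than two exist."""
    best = None
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip blank lines and continuation lines
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best


if __name__ == "__main__":
    # Shortened copies of three lines from the excerpt above.
    sample = [
        "2018-11-22 10:23:57,497 INFO ...RMAppAttemptImpl: State change from ALLOCATED to LAUNCHED",
        "2018-11-22 10:23:58,467 INFO ...RMContainerImpl: Container Transitioned from ACQUIRED to RUNNING",
        "2018-11-22 10:25:51,385 INFO ...ApplicationMasterService: AM registration",
    ]
    gap, before, after = largest_gap(sample)
    print(f"{gap:.1f} s between:\n  {before}\n  {after}")
```

On the three sample lines this reports the pause between the container reaching RUNNING and the AM registration, which is the gap described above.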


Full log attached:

2018-11-22 10:23:54,504 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 369
2018-11-22 10:23:56,633 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 369 submitted by user hive
2018-11-22 10:23:56,633 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing application with id application_1542599163995_0369
2018-11-22 10:23:56,633 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hive        IP=10.9.22.127        OPERATION=Submit Application Request        TARGET=ClientRMService        RESULT=SUCCESS        APPID=application_1542599163995_0369
2018-11-22 10:23:56,633 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1542599163995_0369 State change from NEW to NEW_SAVING on event = START
2018-11-22 10:23:56,633 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1542599163995_0369
2018-11-22 10:23:56,645 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1542599163995_0369 State change from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
2018-11-22 10:23:56,645 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Accepted application application_1542599163995_0369 from user: hive, in queue: root.users.mobvoi_r, currently num of applications: 2
2018-11-22 10:23:56,645 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1542599163995_0369 State change from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
2018-11-22 10:23:56,645 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1542599163995_0369_000001
2018-11-22 10:23:56,645 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1542599163995_0369_000001 State change from NEW to SUBMITTED on event = START
2018-11-22 10:23:56,645 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1542599163995_0369_000001 to scheduler from user: hive
2018-11-22 10:23:56,645 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1542599163995_0369_000001 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
2018-11-22 10:23:57,468 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e43_1542599163995_0369_01_000001 Container Transitioned from NEW to ALLOCATED
2018-11-22 10:23:57,468 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hive        OPERATION=AM Allocated Container        TARGET=SchedulerApp        RESULT=SUCCESS        APPID=application_1542599163995_0369        CONTAINERID=container_e43_1542599163995_0369_01_000001
2018-11-22 10:23:57,468 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_e43_1542599163995_0369_01_000001 of capacity <memory:1536, vCores:1> on host prd-hadoop-1.persistent.uc.mobvoi-idc.com:8041, which has 3 containers, <memory:15872, vCores:11> used and <memory:8704, vCores:4> available after allocation
2018-11-22 10:23:57,468 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Sending NMToken for nodeId : prd-hadoop-1.persistent.uc.mobvoi-idc.com:8041 for container : container_e43_1542599163995_0369_01_000001
2018-11-22 10:23:57,468 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e43_1542599163995_0369_01_000001 Container Transitioned from ALLOCATED to ACQUIRED
2018-11-22 10:23:57,468 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Clear node set for appattempt_1542599163995_0369_000001
2018-11-22 10:23:57,468 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Storing attempt: AppId: application_1542599163995_0369 AttemptId: appattempt_1542599163995_0369_000001 MasterContainer: Container:
2018-11-22 10:23:57,468 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1542599163995_0369_000001 State change from SCHEDULED to ALLOCATED_SAVING on event = CONTAINER_ALLOCATED
2018-11-22 10:23:57,472 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1542599163995_0369_000001 State change from ALLOCATED_SAVING to ALLOCATED on event = ATTEMPT_NEW_SAVED
2018-11-22 10:23:57,472 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1542599163995_0369_000001
2018-11-22 10:23:57,473 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: for AM appattempt_1542599163995_0369_000001
2018-11-22 10:23:57,473 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Create AMRMToken for ApplicationAttempt: appattempt_1542599163995_0369_000001
2018-11-22 10:23:57,473 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Creating password for appattempt_1542599163995_0369_000001
2018-11-22 10:23:57,497 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done launching container Container: for AM appattempt_1542599163995_0369_000001
2018-11-22 10:23:57,497 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1542599163995_0369_000001 State change from ALLOCATED to LAUNCHED on event = LAUNCHED
2018-11-22 10:23:58,467 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e43_1542599163995_0369_01_000001 Container Transitioned from ACQUIRED to RUNNING
2018-11-22 10:25:51,385 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM registration appattempt_1542599163995_0369_000001
2018-11-22 10:25:51,385 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hive        IP=10.9.0.204        OPERATION=Register App Master        TARGET=ApplicationMasterService        RESULT=SUCCESS        APPID=application_1542599163995_0369        APPATTEMPTID=appattempt_1542599163995_0369_000001
2018-11-22 10:25:51,386 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1542599163995_0369_000001 State change from LAUNCHED to RUNNING on event = REGISTERED
2018-11-22 10:25:51,386 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1542599163995_0369 State change from ACCEPTED to RUNNING on event = ATTEMPT_REGISTERED
2018-11-22 10:25:52,205 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e43_1542599163995_0369_01_000002 Container Transitioned from NEW to ALLOCATED
2018-11-22 10:25:52,205 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hive        OPERATION=AM Allocated Container        TARGET=SchedulerApp        RESULT=SUCCESS        APPID=application_1542599163995_0369        CONTAINERID=container_e43_1542599163995_0369_01_000002
2018-11-22 10:25:52,205 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_e43_1542599163995_0369_01_000002 of capacity <memory:7168, vCores:5> on host prd-hadoop-6.persistent.uc.mobvoi-idc.com:8041, which has 3 containers, <memory:21504, vCores:15> used and <memory:3072, vCores:0> available after allocation
2018-11-22 10:25:52,702 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Sending NMToken for nodeId : prd-hadoop-6.persistent.uc.mobvoi-idc.com:8041 for container : container_e43_1542599163995_0369_01_000002
2018-11-22 10:25:52,702 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e43_1542599163995_0369_01_000002 Container Transitioned from ALLOCATED to ACQUIRED
2018-11-22 10:25:53,949 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: checking for deactivate...
2018-11-22 10:25:58,765 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1542599163995_0369_000001 with final state: FINISHING, and exit status: -1000
2018-11-22 10:25:58,765 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1542599163995_0369_000001 State change from RUNNING to FINAL_SAVING on event = UNREGISTERED
2018-11-22 10:25:58,765 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1542599163995_0369 with final state: FINISHING
2018-11-22 10:25:58,765 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1542599163995_0369 State change from RUNNING to FINAL_SAVING on event = ATTEMPT_UNREGISTERED
2018-11-22 10:25:58,769 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1542599163995_0369
2018-11-22 10:25:58,769 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1542599163995_0369_000001 State change from FINAL_SAVING to FINISHING on event = ATTEMPT_UPDATE_SAVED
2018-11-22 10:25:58,820 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1542599163995_0369 State change from FINAL_SAVING to FINISHING on event = APP_UPDATE_SAVED
2018-11-22 10:25:59,173 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: application_1542599163995_0369 unregistered successfully.
2018-11-22 10:26:00,865 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e43_1542599163995_0369_01_000001 Container Transitioned from RUNNING to COMPLETED
2018-11-22 10:26:00,865 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Completed container: container_e43_1542599163995_0369_01_000001 in state: COMPLETED event:FINISHED
2018-11-22 10:26:00,865 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hive        OPERATION=AM Released Container        TARGET=SchedulerApp        RESULT=SUCCESS        APPID=application_1542599163995_0369        CONTAINERID=container_e43_1542599163995_0369_01_000001
2018-11-22 10:26:00,865 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Released container container_e43_1542599163995_0369_01_000001 of capacity <memory:1536, vCores:1> on host prd-hadoop-1.persistent.uc.mobvoi-idc.com:8041, which currently has 2 containers, <memory:14336, vCores:10> used and <memory:10240, vCores:5> available, release resources=true
2018-11-22 10:26:00,865 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application attempt appattempt_1542599163995_0369_000001 released container container_e43_1542599163995_0369_01_000001 on node: host: prd-hadoop-1.persistent.uc.mobvoi-idc.com:8041 #containers=2 available=10240 used=14336 with event: FINISHED
2018-11-22 10:26:00,865 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1542599163995_0369_000001
2018-11-22 10:26:00,866 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Application finished, removing password for appattempt_1542599163995_0369_000001
2018-11-22 10:26:00,866 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1542599163995_0369_000001 State change from FINISHING to FINISHED on event = CONTAINER_FINISHED
2018-11-22 10:26:00,866 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1542599163995_0369 State change from FINISHING to FINISHED on event = ATTEMPT_FINISHED
2018-11-22 10:26:00,866 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hive        OPERATION=Application Finished - Succeeded        TARGET=RMAppManager        RESULT=SUCCESS        APPID=application_1542599163995_0369
2018-11-22 10:26:00,866 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning master appattempt_1542599163995_0369_000001
2018-11-22 10:26:00,866 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1542599163995_0369,name=Hive on Spark (sessionId \= 8687b02f-36fc-45ee-9c7a-5b0e49fb2947),user=hive,queue=root.users.mobvoi_r,state=FINISHED,trackingUrl=http://prd-hadoop-4.persistent.uc.mobvoi-idc.com:8088/proxy/application_1542599163995_0369/,appMasterHost=10.9.0.204,startTime=1542853436633,finishTime=1542853558765,finalStatus=SUCCEEDED,memorySeconds=251621,vcoreSeconds=166,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0>
2018-11-22 10:26:00,866 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Max number of completed apps kept in state store met: maxCompletedAppsInStateStore = 10000, removing app application_1533111111316_0044 from state store.
2018-11-22 10:26:00,866 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Application should be expired, max number of completed apps kept in memory met: maxCompletedAppsInMemory = 10000, removing app application_1533111111316_0044 from memory:
2018-11-22 10:26:00,866 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing info for app: application_1533111111316_0044
2018-11-22 10:26:00,866 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application appattempt_1542599163995_0369_000001 is done. finalState=FINISHED
2018-11-22 10:26:00,866 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e43_1542599163995_0369_01_000002 Container Transitioned from ACQUIRED to KILLED
2018-11-22 10:26:00,866 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Completed container: container_e43_1542599163995_0369_01_000002 in state: KILLED event:KILL
2018-11-22 10:26:00,866 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hive        OPERATION=AM Released Container        TARGET=SchedulerApp        RESULT=SUCCESS        APPID=application_1542599163995_0369        CONTAINERID=container_e43_1542599163995_0369_01_000002
2018-11-22 10:26:00,866 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Released container container_e43_1542599163995_0369_01_000002 of capacity <memory:7168, vCores:5> on host prd-hadoop-6.persistent.uc.mobvoi-idc.com:8041, which currently has 2 containers, <memory:14336, vCores:10> used and <memory:10240, vCores:5> available, release resources=true
2018-11-22 10:26:00,866 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Application attempt appattempt_1542599163995_0369_000001 released container container_e43_1542599163995_0369_01_000002 on node: host: prd-hadoop-6.persistent.uc.mobvoi-idc.com:8041 #containers=2 available=10240 used=14336 with event: KILL
2018-11-22 10:26:00,866 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1542599163995_0369 requests cleared

zenppl posted on 2018-11-22 11:31:01

Also, the error Hive reports in Hue is: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

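bioger_hit posted on 2018-11-22 12:02:53

zenppl posted on 2018-11-22 11:31
Also, the error Hive reports in Hue is: Error while processing statement: FAILED: Execution Error, return code 2 fro ...

This error alone doesn't show what the problem is. Look through the other logs as well, for example those from Hive, the ResourceManager, the NodeManager, and Spark.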
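
When going through several logs for one job, it can help to keep only the lines that mention that job. Below is a rough Python sketch that filters lines by the shared numeric suffix of the application, attempt, and container IDs (for example `1542599163995_0369` in the log above). The helpers `id_pattern` and `related_lines` are hypothetical names for this illustration, and the ID layout is assumed from the ResourceManager log in this thread.

```python
import re


def id_pattern(application_id):
    """Build a regex matching the application, attempt, and container IDs
    that belong to one YARN application."""
    # "application_<clusterTimestamp>_<sequence>" -> "<clusterTimestamp>_<sequence>"
    suffix = application_id.split("_", 1)[1]
    return re.compile(
        r"(?:application|appattempt|container(?:_e\d+)?)_"
        + re.escape(suffix)
        + r"(?!\d)"  # do not match a longer sequence number such as ..._03690
    )


def related_lines(lines, application_id):
    """Return only the lines that mention the given application."""
    pattern = id_pattern(application_id)
    return [line for line in lines if pattern.search(line)]


if __name__ == "__main__":
    sample = [
        "... container_e43_1542599163995_0369_01_000001 Container Transitioned from ACQUIRED to RUNNING",
        "... AM registration appattempt_1542599163995_0369_000001",
        "... removing app application_1533111111316_0044 from state store.",
    ]
    for line in related_lines(sample, "application_1542599163995_0369"):
        print(line)
```

The same function can be fed lines read from each log file in turn; where those files live depends on the cluster setup, so no paths are given here.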

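zenppl posted on 2018-11-22 16:18:41

bioger_hit posted on 2018-11-22 12:02
This error alone doesn't show what the problem is. Look through the other logs as well, for example those from Hive, the ResourceManager, the NodeManager, and Spark.

Hi! What I posted in the first post is the ResourceManager log. Is there anything useful in it? In the meantime, I'll go collect the Hive and Spark logs and take a look.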

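zenppl posted on 2018-11-27 18:27:33

zenppl posted on 2018-11-22 16:18
Hi! What I posted in the first post is the ResourceManager log. Is there anything useful in it? In the meantime, I'll go ...

I can't reproduce it for now. When it happens again, I'll post the Hive and Spark logs.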