
Why is Spark slow? Even the map stage is extremely slow. Please help.

ananan36 posted on 2015-12-3 11:06:03
15/12/03 10:33:39 INFO scheduler.TaskSetManager: Ignoring task-finished event for 781.1 in stage 0.0 because task 781 has already completed successfully
15/12/03 10:33:39 INFO scheduler.TaskSetManager: Ignoring task-finished event for 904.0 in stage 0.0 because task 904 has already completed successfully
15/12/03 10:33:40 INFO scheduler.TaskSetManager: Ignoring task-finished event for 1112.1 in stage 0.0 because task 1112 has already completed successfully
15/12/03 10:33:40 INFO scheduler.TaskSetManager: Ignoring task-finished event for 956.1 in stage 0.0 because task 956 has already completed successfully
15/12/03 10:33:41 INFO scheduler.TaskSetManager: Ignoring task-finished event for 948.0 in stage 0.0 because task 948 has already completed successfully
15/12/03 10:33:41 INFO scheduler.TaskSetManager: Ignoring task-finished event for 1139.1 in stage 0.0 because task 1139 has already completed successfully
15/12/03 10:33:41 INFO scheduler.TaskSetManager: Ignoring task-finished event for 910.0 in stage 0.0 because task 910 has already completed successfully
15/12/03 10:33:42 INFO scheduler.TaskSetManager: Ignoring task-finished event for 896.0 in stage 0.0 because task 896 has already completed successfully
15/12/03 10:33:42 INFO scheduler.TaskSetManager: Ignoring task-finished event for 750.0 in stage 0.0 because task 750 has already completed successfully
15/12/03 10:33:42 INFO scheduler.TaskSetManager: Ignoring task-finished event for 998.1 in stage 0.0 because task 998 has already completed successfully
15/12/03 10:33:42 INFO scheduler.TaskSetManager: Ignoring task-finished event for 953.0 in stage 0.0 because task 953 has already completed successfully
15/12/03 10:33:43 INFO scheduler.TaskSetManager: Ignoring task-finished event for 668.0 in stage 0.0 because task 668 has already completed successfully
15/12/03 10:33:43 INFO scheduler.TaskSetManager: Ignoring task-finished event for 754.1 in stage 0.0 because task 754 has already completed successfully
15/12/03 10:33:43 INFO scheduler.TaskSetManager: Ignoring task-finished event for 680.0 in stage 0.0 because task 680 has already completed successfully
15/12/03 10:33:43 INFO scheduler.TaskSetManager: Ignoring task-finished event for 690.0 in stage 0.0 because task 690 has already completed successfully
15/12/03 10:33:43 INFO scheduler.TaskSetManager: Ignoring task-finished event for 938.0 in stage 0.0 because task 938 has already completed successfully
15/12/03 10:33:43 INFO scheduler.TaskSetManager: Ignoring task-finished event for 875.0 in stage 0.0 because task 875 has already completed successfully
15/12/03 10:33:44 INFO scheduler.TaskSetManager: Ignoring task-finished event for 964.1 in stage 0.0 because task 964 has already completed successfully
15/12/03 10:33:44 INFO scheduler.TaskSetManager: Ignoring task-finished event for 946.0 in stage 0.0 because task 946 has already completed successfully
15/12/03 10:33:44 INFO scheduler.TaskSetManager: Ignoring task-finished event for 874.0 in stage 0.0 because task 874 has already completed successfully
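A quick way to see how widespread these messages are is to tally them. The following is a minimal sketch, not part of the original post. It assumes the identifier `781.1` is read as task index 781, attempt number 1 (attempt numbers starting at 0), so an attempt number above 0 means the task was launched more than once, which can happen, for example, when speculative execution is enabled or a task is re-launched. The function name `summarize_ignored_events` and the `sample` list are hypothetical; the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches the TaskSetManager message quoted in this thread.
# Assumption: "781.1" means task index 781, attempt number 1.
PATTERN = re.compile(
    r"Ignoring task-finished event for (\d+)\.(\d+) in stage (\d+)\.(\d+) "
    r"because task (\d+) has already completed successfully"
)


def summarize_ignored_events(lines):
    """Count ignored task-finished events, grouped by attempt number."""
    per_attempt = Counter()
    tasks = set()
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # skip unrelated log lines
        task_index = int(match.group(1))
        attempt = int(match.group(2))
        per_attempt[attempt] += 1
        tasks.add(task_index)
    return {
        "events": sum(per_attempt.values()),
        "distinct_tasks": len(tasks),
        "per_attempt": dict(per_attempt),
    }


# Three lines taken from the log in this post.
sample = [
    "15/12/03 10:33:39 INFO scheduler.TaskSetManager: Ignoring task-finished event for 781.1 in stage 0.0 because task 781 has already completed successfully",
    "15/12/03 10:33:39 INFO scheduler.TaskSetManager: Ignoring task-finished event for 904.0 in stage 0.0 because task 904 has already completed successfully",
    "15/12/03 10:33:40 INFO scheduler.TaskSetManager: Ignoring task-finished event for 1112.1 in stage 0.0 because task 1112 has already completed successfully",
]
summary = summarize_ignored_events(sample)
print(summary)
```

If a large share of the tasks in a stage show up here, many tasks ran more than one attempt, which is worth checking against the cluster's speculation and retry settings before blaming the map function itself.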

5 replies

arsenduan posted on 2015-12-3 12:22:37
Could you describe your cluster in detail: what kind of workload you are running and how it is configured?

lmlm1234 posted on 2015-12-3 14:22:01
The problem is in your configuration and tuning!

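regan posted on 2015-12-3 14:38:21
Judging from your log, these messages are printed by TaskSetManager, so first you need to understand what TaskSetManager is responsible for. TaskSetManager manages the life cycle of the tasks in a TaskSet, including assigning tasks to worker nodes and tracking their execution status. The log shows that a task in the stage had already completed; when a worker node later reported another task-finished event for that same task, TaskSetManager recognized that the task was already done and simply ignored the event.

As for the slow map tasks you describe, there are many possible causes. First, check which resource manager you are using: native standalone mode is generally faster, while large clusters often run on YARN, which tends to be somewhat slower than standalone. Next, look at your machine resources and your partitioning. In theory, more partitions mean finer granularity, higher parallelism, and faster execution, but because of scheduling overhead and the shuffle stage, the partition count cannot simply be chosen on that assumption. It depends on your configuration.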
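The partition trade-off described above can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not how Spark actually schedules work: tasks are assumed to run in waves of one task per core, the work is assumed to split evenly with no skew, and every task is assumed to pay a fixed scheduling overhead. The function name `estimated_stage_seconds` and all the numbers are hypothetical.

```python
import math


def estimated_stage_seconds(total_work_s, num_partitions, cores, per_task_overhead_s):
    """Toy estimate of stage time: waves of tasks, each with a fixed overhead.

    Assumes evenly sized partitions (no skew) and one task per core at a time.
    """
    waves = math.ceil(num_partitions / cores)
    task_seconds = total_work_s / num_partitions + per_task_overhead_s
    return waves * task_seconds


# Hypothetical workload: 3200 s of total compute, 16 cores, 0.5 s overhead per task.
for n in (8, 16, 64, 1024, 16384):
    print(n, round(estimated_stage_seconds(3200, n, 16, 0.5), 1))
```

With these made-up numbers, 8 partitions leave half the cores idle, 16 partitions give the lowest estimate, and very large partition counts become slower again because the per-task overhead dominates. Real jobs also have data skew and shuffle costs, which this model ignores, so it only shows why "more partitions" is not automatically faster.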

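ananan36 posted on 2015-12-5 16:37:44
arsenduan posted on 2015-12-3 12:22
Could you describe your cluster in detail: what kind of workload you are running and how it is configured?

I am using Spark SQL.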

regan posted on 2015-12-7 09:06:13
DataFrames in Spark are optimized by the engine, so DataFrame operations are generally much faster than the equivalent raw RDD operations.