1. What mechanisms does Storm use to guarantee message processing?
This page explains the design details of Storm that make it a fault-tolerant system.
What happens when a worker dies?
When a worker dies, the supervisor will restart it. If it continuously fails on startup and is unable
to heartbeat to Nimbus, Nimbus will reassign the worker to another machine.
What happens when a node dies?
The tasks assigned to that machine will time out, and Nimbus will reassign those tasks to other machines.
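The timeout-and-reassign behavior described above can be sketched as a small simulation. This is only a toy model under stated assumptions: `ToyScheduler`, its fields, and the "least-loaded live node" placement rule are hypothetical names and choices made for illustration, not Storm's real API or scheduling policy.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical stand-in for Nimbus; not Storm's real API."""
    timeout_secs: float
    last_heartbeat: dict = field(default_factory=dict)  # node -> last heartbeat time
    assignments: dict = field(default_factory=dict)     # task -> node

    def heartbeat(self, node, now):
        """Record that a node reported in at time `now`."""
        self.last_heartbeat[node] = now

    def live_nodes(self, now):
        """Nodes whose last heartbeat is within the timeout window."""
        return sorted(
            n for n, t in self.last_heartbeat.items()
            if now - t <= self.timeout_secs
        )

    def reassign_dead(self, now):
        """Move tasks off timed-out nodes; return the list of moved tasks."""
        live = self.live_nodes(now)
        if not live:
            return []  # nowhere to move tasks to
        moved = []
        for task, node in sorted(self.assignments.items()):
            if node not in live:
                # Assumed placement rule: pick the live node with the fewest tasks.
                target = min(
                    live,
                    key=lambda n: (
                        sum(1 for v in self.assignments.values() if v == n),
                        n,
                    ),
                )
                self.assignments[task] = target
                moved.append(task)
        return moved
```

For example, if `node-b` stops sending heartbeats while `node-a` keeps reporting in, a later call to `reassign_dead` moves the tasks of `node-b` onto `node-a`.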
What happens when Nimbus or Supervisor daemons die?
The Nimbus and Supervisor daemons are designed to be fail-fast (the process self-destructs whenever it encounters an unexpected situation) and stateless (all state is kept in ZooKeeper or on disk). As described in Setting up a Storm cluster, the Nimbus and Supervisor daemons must be run under supervision using a tool like daemontools or monit. So if the Nimbus or Supervisor daemons die, they restart as if nothing happened.
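To see why fail-fast plus stateless makes a restart harmless, here is a minimal sketch. Everything in it is an assumption for illustration: `ToyStateStore` stands in for ZooKeeper or local disk, `run_daemon` for a daemon's main loop, and `supervise` for a tool like daemontools or monit. None of these names come from Storm.

```python
class ToyStateStore(dict):
    """Stands in for ZooKeeper / local disk: it outlives daemon restarts."""


def run_daemon(store, fail_on):
    """Process work items, crashing immediately on anything unexpected."""
    while store["next_item"] < store["total_items"]:
        item = store["next_item"]
        if item in fail_on:
            fail_on.discard(item)  # the fault is transient in this toy model
            raise RuntimeError(f"unexpected condition at item {item}")
        # Progress is written to the external store, never kept in the process.
        store["next_item"] = item + 1


def supervise(store, fail_on, max_restarts=10):
    """Restart the daemon whenever it crashes; return the restart count."""
    restarts = 0
    while True:
        try:
            run_daemon(store, fail_on)
            return restarts
        except RuntimeError:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up instead of looping forever
```

Because the daemon holds no state of its own, each restart simply resumes from whatever the external store says, which is the property the paragraph above relies on.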
Most notably, no worker processes are affected by the death of Nimbus or the Supervisors. This is in contrast to Hadoop, where if the JobTracker dies, all the running jobs are lost.
Is Nimbus a single point of failure?
If you lose the Nimbus node, the workers will still continue to function. Additionally, supervisors will continue to restart workers if they die. However, without Nimbus, workers won’t be reassigned to other machines when necessary (for example, if you lose a worker machine).
So the answer is that Nimbus is “sort of” a SPOF (single point of failure). In practice, it’s not a big deal since nothing catastrophic happens when the Nimbus daemon dies. There are plans to make Nimbus highly available in the future.
How does Storm guarantee data processing?
Storm provides mechanisms to guarantee data processing even if nodes die or messages are lost. See Guaranteeing message processing in the official documentation for the details.
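As a rough illustration of how such tracking can be done cheaply, the sketch below models the XOR-based bookkeeping that the Guaranteeing message processing documentation describes for tuple trees. It is a simplified toy, not Storm's acker: `ToyAcker`, `emit`, `ack`, and `new_id` are hypothetical names, and real Storm combines the emit and ack updates differently.

```python
import random


class ToyAcker:
    """Tracks one tuple tree with a single XOR value (toy model)."""

    def __init__(self):
        self.ack_val = 0

    def emit(self, tuple_id):
        """A tuple joins the tree: XOR its id in."""
        self.ack_val ^= tuple_id

    def ack(self, tuple_id):
        """A tuple is done: XOR the same id again, cancelling it out."""
        self.ack_val ^= tuple_id

    def fully_processed(self):
        """Zero means every emitted id has been acked exactly once."""
        return self.ack_val == 0


def new_id(rng):
    """Random 64-bit id, so accidental cancellation is very unlikely."""
    return rng.getrandbits(64)


# Usage: one root tuple that fans out into two child tuples.
rng = random.Random()
acker = ToyAcker()
root, child_a, child_b = new_id(rng), new_id(rng), new_id(rng)
for tuple_id in (root, child_a, child_b):
    acker.emit(tuple_id)
for tuple_id in (root, child_a, child_b):
    acker.ack(tuple_id)
```

The appeal of this design is that the tracker needs constant memory per tuple tree, no matter how many tuples the tree contains. If the value does not reach zero within a timeout, the tree is treated as failed and can be replayed.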