Kafka Consumer and Its Monitoring

This post was last edited by pig2 on 2014-12-27 23:16
Guiding questions

1. Which two consumer APIs does Kafka currently provide for Java?
2. Which two Kafka monitoring approaches does this post cover?

1. Kafka currently provides two consumer APIs for Java:
1) high level consumer api
This consumer API wraps many of the advanced features a consumer needs, such as:

Auto/Hidden Offset Management
Auto(Simple) Partition Assignment
Broker Failover => Auto Rebalance
Consumer Failover => Auto Rebalance
If users do not want any of these, then the simple consumer is sufficient.
If users want to control offset management with everything else unchanged, one option is to expose the current ZK implementation of the high-level consumer to users and allow them to override it; another option is to change the high-level consumer API to return the offset vector associated with messages.
If users want to control partition assignment, one option is to change the high-level consumer API to allow such config info to be passed in while creating the stream; another option is ad hoc: just make a single-partition topic and assign it to the consumer.
If users just want the automatic partition assignment to be "smarter", with co-location taken into account, etc., one option is to store the host/rack info in ZK and let the rebalance algorithm read it while doing the computation.

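By default this consumer writes its own information under the ZooKeeper path /consumers/<groupId>, which includes:

offsets: the offset value for each <partition_num> of the topic
owners: for each partition of the current <topic>, the unique ID of the consumer within this <groupId> that is allowed to receive its data
ids: the list of all consumers currently in this <groupId>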
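
To make the layout above concrete, here is a minimal sketch that builds the ZooKeeper paths for one consumer group. It assumes the ZooKeeper-based layout used by the high level consumer of this era, in which offsets and owners are further keyed by topic and partition. The class name ConsumerZkPaths and the group and topic names are hypothetical, chosen only for illustration.

```java
// Sketch only: builds the znode paths described above.
// Assumes the layout /consumers/<groupId>/{ids,owners,offsets}/...
final class ConsumerZkPaths {
    private ConsumerZkPaths() {}

    // Root znode for a consumer group.
    static String groupRoot(String groupId) {
        return "/consumers/" + groupId;
    }

    // Parent znode listing every live consumer in the group.
    static String idsPath(String groupId) {
        return groupRoot(groupId) + "/ids";
    }

    // Znode holding the ID of the consumer that owns one partition.
    static String ownerPath(String groupId, String topic, int partition) {
        return groupRoot(groupId) + "/owners/" + topic + "/" + partition;
    }

    // Znode holding the committed offset for one partition.
    static String offsetPath(String groupId, String topic, int partition) {
        return groupRoot(groupId) + "/offsets/" + topic + "/" + partition;
    }

    public static void main(String[] args) {
        // "my-group" and "my-topic" are placeholder names.
        System.out.println(idsPath("my-group"));
        System.out.println(ownerPath("my-group", "my-topic", 0));
        System.out.println(offsetPath("my-group", "my-topic", 0));
    }
}
```

These paths can be inspected by hand with any ZooKeeper client, which is roughly what ZooKeeper-based monitoring tools do.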

Under normal circumstances, the high level consumer is sufficient for most of our day-to-day needs.
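
As a rough illustration of how little the high level consumer asks of the user, the sketch below assembles the configuration it typically needs. The property keys are the ones used by the ZooKeeper-based (0.8-era) consumer; the class name, the ZooKeeper address, the group name, and the chosen values are assumptions for illustration, not recommendations.

```java
import java.util.Properties;

// Sketch only: configuration for the ZooKeeper-based high level consumer.
final class HighLevelConsumerProps {
    private HighLevelConsumerProps() {}

    static Properties build(String zkConnect, String groupId) {
        Properties props = new Properties();
        // Where the consumer registers itself and stores its offsets.
        props.setProperty("zookeeper.connect", zkConnect);
        // Consumers sharing a group.id split the partitions among themselves.
        props.setProperty("group.id", groupId);
        // Illustrative values; tune for a real deployment.
        props.setProperty("zookeeper.session.timeout.ms", "6000");
        props.setProperty("auto.commit.interval.ms", "1000");
        // Start from the earliest offset when no committed offset exists.
        props.setProperty("auto.offset.reset", "smallest");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder address and group name.
        Properties props = build("localhost:2181", "my-group");
        props.list(System.out);
        // With the Kafka 0.8 client on the classpath, these properties would
        // be wrapped as new kafka.consumer.ConsumerConfig(props) and handed to
        // kafka.consumer.Consumer.createJavaConsumerConnector(...).
    }
}
```

Offset commits, partition assignment, and rebalancing then happen behind the connector, which is the "hidden" management listed above.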
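2) simple consumer api
Provides only the most basic connect and read functionality. You read the offsets yourself and decide how to position reads by offset, which makes it a good fit for all kinds of custom behavior.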

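2. There are currently two ways to monitor Kafka:
1) JMX
Kafka ships with a built-in Mx4jLoader. If it finds mx4j-tools.jar on the classpath, it loads that jar, and the web pages provided by MX4J can then be viewed on port 8082.
Besides this built-in interface, you can also modify the Java startup command yourself to enable JMX, and then integrate over JMX with the major monitoring systems such as Zabbix and Ganglia. For the latter there is a ready-made project on GitHub (click here).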
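
For the "enable JMX yourself" route, the following is a minimal sketch of reading MBeans with the JDK's standard javax.management classes. It assumes the broker JVM was started with remote JMX enabled (for example the standard -Dcom.sun.management.jmxremote.port=<port> flag, or the JMX_PORT environment variable that Kafka's start scripts read). The class name JmxBrowser and the host and port arguments are hypothetical; without arguments it falls back to the local JVM so it can run without a broker.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch only: lists MBean names exposed by a JVM over JMX.
final class JmxBrowser {
    private JmxBrowser() {}

    // Standard RMI-based JMX service URL for a remote JVM.
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Returns the sorted canonical names of all MBeans matching the pattern.
    static Set<String> listMBeans(MBeanServerConnection conn, String pattern) {
        try {
            Set<String> names = new TreeSet<>();
            for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
                names.add(name.getCanonicalName());
            }
            return names;
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Bad MBean pattern: " + pattern, e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            // Remote mode: args are <host> <jmxPort> of the broker JVM.
            JMXServiceURL url =
                new JMXServiceURL(serviceUrl(args[0], Integer.parseInt(args[1])));
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                listMBeans(connector.getMBeanServerConnection(), "*:*")
                    .forEach(System.out::println);
            }
        } else {
            // Local fallback: inspect this JVM's own platform MBeans.
            listMBeans(ManagementFactory.getPlatformMBeanServer(), "java.lang:*")
                .forEach(System.out::println);
        }
    }
}
```

A monitoring system such as Zabbix or Ganglia would poll selected attributes of these MBeans rather than just listing their names.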
2) zookeeper
Typical monitoring tools are kafkamonitor and kafka-web-console.
Both are fairly simple to install, so I won't go into detail here; refer to the projects directly.

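According to the official wiki, it looks like the consumer API will change substantially starting with 0.9, which I personally support. The current consumer APIs seem to be either too bare-bones or too heavily encapsulated.
wiki: https://cwiki.apache.org/conflue ... er+Client+Re-Design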
