HBase: Implementing Aggregate Functions with Coprocessors

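Guiding questions:
1. HBase does not support aggregate functions by default, so what can we use to implement them?
2. How do we implement them programmatically?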

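HBase does not support aggregate functions (sum, avg, etc.) by default, but they can be implemented with HBase's coprocessor feature. The advantage is that the computation runs on the RegionServers, on the server side. This is efficient, and it avoids having the client pull back large amounts of data, which would consume network bandwidth and a lot of memory.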
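To make the saving concrete, here is a minimal plain-Java sketch of the general pattern, with no HBase dependency. Each region reduces its own rows to a small (sum, count) pair, and the client only merges those pairs. The names RegionPartial, summarize and mergeAverage are hypothetical and chosen for illustration; this is not HBase's actual code.

```java
import java.util.Arrays;
import java.util.List;

final class PartialAverageSketch {

    /** Hypothetical per-region result: only a sum and a row count cross the network. */
    static final class RegionPartial {
        final double sum;
        final long count;

        RegionPartial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** Stands in for the work a region does locally over its own rows. */
    static RegionPartial summarize(double[] regionValues) {
        double sum = 0.0;
        for (double v : regionValues) {
            sum += v;
        }
        return new RegionPartial(sum, regionValues.length);
    }

    /** Client-side merge: add up the sums and counts, then divide once. */
    static double mergeAverage(List<RegionPartial> partials) {
        double sum = 0.0;
        long count = 0;
        for (RegionPartial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        List<RegionPartial> partials = Arrays.asList(
                summarize(new double[] {1.0, 2.0, 3.0}),  // pretend region 1
                summarize(new double[] {4.0, 6.0}));      // pretend region 2
        System.out.println(mergeAverage(partials));
    }
}
```

Note that the client merges sums and counts rather than averaging the per-region averages. With regions of different sizes, averaging the averages would give the wrong answer.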

Implementation:

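Use the endpoint-type coprocessor AggregateImplementation that ships with HBase, together with the AggregationClient client class, to perform the aggregate computation on the RegionServer side. The AggregationClient call looks like this: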

aggregationClient.avg(Bytes.toBytes("TableName"), ci, scan);

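Here scan is the query condition for the column to be computed. There is also a parameter ci of type ColumnInterpreter, the column interpreter, which parses the values stored in the column. HBase provides LongColumnInterpreter by default. The values I need to process are doubles, so I first implemented a DoubleColumnInterpreter. (Judging from JIRA, a Double interpreter appears to be under development.) The ColumnInterpreter implementation is what AggregateImplementation calls on the server side: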

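import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.coprocessor.ColumnInterpreter;

/**
 * ColumnInterpreter implementation for Double values.
 *
 * @author OneCoder
 */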
public class DoubleColumnInterpreter implements
           ColumnInterpreter<Double, Double> {
     @Override
     public void write(DataOutput out) throws IOException {
     }
     @Override
     public void readFields(DataInput in) throws IOException {
     }
     @Override
     public Double getValue( byte[] colFamily, byte[] colQualifier, KeyValue kv)
                 throws IOException {
            if (kv == null)
                 return null;
            // Temporary workaround: Bytes.toDouble(kv.getValue()) throws an error
            // saying the offset exceeds the total length.
            // toDouble(getBuffer(), getValueOffset) also gets the offset wrong.
            return Double.valueOf(new String(kv.getValue()));
     }
     @Override
     public Double add(Double l1, Double l2) {
            if (l1 == null ^ l2 == null) {
                 return l1 == null ? l2 : l1;
           } else if (l1 == null) {
                 return null;
           }
            return l1 + l2;
     }
     @Override
     public Double getMaxValue() {
            // TODO Auto-generated method stub
            return null;
     }
     @Override
     public Double getMinValue() {
            // TODO Auto-generated method stub
            return null;
     }
     @Override
     public Double multiply(Double o1, Double o2) {
            if (o1 == null ^ o2 == null) {
                 return o1 == null ? o2 : o1;
           } else if (o1 == null) {
                 return null;
           }
            return o1 * o2;
     }
     @Override
     public Double increment(Double o) {
            // TODO Auto-generated method stub
            return null;
     }
     @Override
     public Double castToReturnType(Double o) {
            return o; // already a Double; avoids a NullPointerException when o is null
     }
     @Override
     public int compare(Double l1, Double l2) {
            if (l1 == null ^ l2 == null) {
                 return l1 == null ? -1 : 1; // either of one is null.
           } else if (l1 == null)
                 return 0; // both are null
            return l1.compareTo(l2); // natural ordering.
     }
     @Override
     public double divideForAvg(Double o, Long l) {
            return (o == null || l == null) ? Double.NaN : (o.doubleValue() / l
                     .doubleValue());
     }
}
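The workaround comment in getValue suggests that the cells hold doubles written as text (for example the bytes of "3.5") rather than in the fixed 8-byte binary form that Bytes.toBytes(double) produces. This is an inference: the post does not say how the table was populated. Under that assumption, a minimal plain-Java sketch shows why a fixed-width decoder rejects such a value while text parsing works. The class and method names are hypothetical, and ByteBuffer stands in for HBase's Bytes utility.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class DoubleEncodingSketch {

    /** Decodes a value stored as 8 big-endian bytes (the fixed-width binary form). */
    static double decodeBinary(byte[] value) {
        if (value.length != Double.BYTES) {
            throw new IllegalArgumentException(
                    "expected " + Double.BYTES + " bytes but got " + value.length);
        }
        return ByteBuffer.wrap(value).getDouble();
    }

    /** Decodes a value stored as text, which is what the workaround effectively does. */
    static double decodeText(byte[] value) {
        return Double.parseDouble(new String(value, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] text = "3.5".getBytes(StandardCharsets.UTF_8);  // only 3 bytes long
        byte[] binary = ByteBuffer.allocate(Double.BYTES).putDouble(3.5).array();

        System.out.println(decodeText(text));
        System.out.println(decodeBinary(binary));
        try {
            decodeBinary(text);  // a text value is too short for the binary decoder
        } catch (IllegalArgumentException e) {
            System.out.println("binary decode failed: " + e.getMessage());
        }
    }
}
```

If the writer side can be changed, storing the 8-byte binary form would let the interpreter decode without string parsing. Either way, the interpreter has to match however the data was actually written.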

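Export the class as a jar and upload it to the lib directory on the HBase region nodes. Then configure the coprocessor on the RegionServers by adding the following to the server-side hbase-site.xml: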

<property>
    <name>hbase.coprocessor.region.classes</name>
    <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>

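Finally, restart the service so that the configuration and the jar take effect. After that, calling the aggregate functions provided by AggregationClient, such as avg and max, computes the result on the region side and returns it.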

Finally, thanks to the original author for sharing. This article is reposted from

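linguobao · posted on 2014-7-1 17:19:44

OP, a question:
The server-side AggregateImplementation getRowNum method that AggregationClient's rowCount calls, and RowCountEndpoint's getRowCount method, seem to do broadly the same thing. However, RowCountEndpoint uses protobuf. What is the difference?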

ohano_javaee · posted on 2014-10-19 00:44:10

Is AggregationClient a class you wrote yourself? If not, which package is it in?

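howtodown · posted on 2014-10-19 01:05:54

ohano_javaee posted on 2014-10-19 00:44
Is AggregationClient a class you wrote yourself? If not, which package is it in?

The AggregationClient that ships with HBase can only aggregate a single column of a single column family.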

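ohano_javaee · posted on 2014-10-19 13:17:38

I tried it and it doesn't seem to work.
org.apache.hadoop.hbase.coprocessor.ColumnInterpreter is an abstract class, not an interface.
I couldn't find the AggregationClient class either. I'm using hbase-0.96.2-hadoop2. Not sure whether this is a version issue.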

wubaozhou · posted on 2015-1-1 22:36:49

雷夫23 · posted on 2016-7-5 16:55:23

Thanks for sharing.

Rommy.Yang · posted on 2016-7-25 20:03:30

How would I implement a sum after a group by?

spftoto · posted on 2018-8-30 20:37:52

Nice, very nice.
