Note that the first setting is the size of each individual Memstore, so when you define it you should take into account the number of regions served by each RS. As the number of regions per RS grows (and you configured the setting when there were few of them), Memstore flushes are likely to be triggered earlier by the second, RS-wide threshold.
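To see why, here is a back-of-the-envelope check; the region count, heap size, and setting values below are assumptions for illustration, not recommendations:

```java
public class MemstoreSizing {
    public static void main(String[] args) {
        long flushSize    = 128L << 20;  // hbase.hregion.memstore.flush.size (128 MB)
        int  regionsPerRs = 200;         // regions served by one RS (assumption)
        long heapBytes    = 8L << 30;    // HBASE_HEAPSIZE = 8 GB (assumption)
        double lowerLimit = 0.35;        // hbase.regionserver.global.memstore.lowerLimit

        // If every Memstore had to fill up before flushing, together they could hold:
        long worstCase = flushSize * regionsPerRs;              // 25,600 MB
        // ...but the RS-wide threshold kicks in much earlier:
        long globalThreshold = (long) (heapBytes * lowerLimit); // ~2,867 MB

        System.out.printf("per-memstore worst case: %d MB, global threshold: %d MB%n",
                worstCase >> 20, globalThreshold >> 20);
        // With this many regions, flushes are forced by the global (second)
        // threshold long before any single Memstore reaches flush.size.
    }
}
```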
The second group of settings exists for safety reasons: sometimes the write load is so high that flushing cannot keep up with it, and since we don't want the Memstore to grow without limit, in this situation writes are blocked until the Memstore shrinks back to a "manageable" size. These thresholds are configured with hbase.hregion.memstore.block.multiplier and hbase.regionserver.global.memstore.upperLimit, as the sketch below shows.
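A minimal sketch of how these two settings translate into concrete byte limits, assuming the pre-1.0 property names, their usual defaults, and an illustrative 8 GB heap:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BlockingThresholds {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();

        long flushSize  = conf.getLong("hbase.hregion.memstore.flush.size", 128L << 20);
        int  multiplier = conf.getInt("hbase.hregion.memstore.block.multiplier", 2);
        float upper     = conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f);

        // Writes to a single region are blocked once its Memstore reaches
        // flush.size * block.multiplier:
        long perRegionBlock = flushSize * multiplier;

        // Writes to the whole RS are blocked once all Memstores together
        // occupy upperLimit of the heap:
        long heapBytes = 8L << 30; // HBASE_HEAPSIZE = 8 GB (assumption)
        long rsWideBlock = (long) (heapBytes * upper);

        System.out.printf("per-region block at %d MB, RS-wide block at %d MB%n",
                perRegionBlock >> 20, rsWideBlock >> 20);
    }
}
```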
More on HFile creation and compaction can be found here.
So, ideally, the Memstore should use as much memory as it can, as configured (not the whole RS heap: there are also in-memory caches such as the block cache), but not cross the upper limit.
Note: make sure that hbase.regionserver.hlog.blocksize * hbase.regionserver.maxlogs is just a bit above hbase.regionserver.global.memstore.lowerLimit * HBASE_HEAPSIZE; otherwise, once the WAL hits the maxlogs cap, flushes will be forced before the Memstore lower limit is ever reached.
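As a worked example of that rule of thumb (the heap size and HLog block size here are assumptions, not recommendations):

```java
public class WalSizing {
    public static void main(String[] args) {
        long heapBytes    = 8L << 30;   // HBASE_HEAPSIZE = 8 GB (assumption)
        double lowerLimit = 0.35;       // hbase.regionserver.global.memstore.lowerLimit
        long hlogBlock    = 64L << 20;  // hbase.regionserver.hlog.blocksize (assumption)

        // Memstores can hold roughly this much before forced flushes begin:
        long memstoreLow = (long) (heapBytes * lowerLimit); // ~2.8 GB

        // Choose maxlogs so that maxlogs * blocksize lands just above that:
        int maxlogs = (int) (memstoreLow / hlogBlock) + 1;  // 45 with these numbers

        System.out.println("suggested hbase.regionserver.maxlogs = " + maxlogs);
    }
}
```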
Compression & Memstore Flush
HBase recommends compressing the files stored on HDFS, which saves considerable disk space and network bandwidth. Data is compressed when it is written to HDFS, i.e. when the Memstore flushes to a StoreFile. Compression should therefore not slow down the flushing process much; otherwise we may hit many of the problems above, such as writes being blocked because the Memstore grows too big (hits the upper limit).
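Compression is enabled per column family. Here is a sketch using the 0.96/0.98-era Java admin API; the table and family names are placeholders, and SNAPPY assumes the native codec is installed on the cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.compress.Compression;

public class CreateCompressedTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor table = new HTableDescriptor(TableName.valueOf("mytable"));
        HColumnDescriptor family = new HColumnDescriptor("cf");
        // StoreFiles written by Memstore flushes (and by compactions) for
        // this family will be SNAPPY-compressed on HDFS:
        family.setCompressionType(Compression.Algorithm.SNAPPY);
        table.addFamily(family);

        admin.createTable(table);
        admin.close();
    }
}
```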