Fast HBase data import: bulkload

Reading guide:
1. What are the use cases for bulkload?
2. What preparation does an HBase data import require?
3. How do you import data with bulkload?

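Scenario:
The data in HBase can no longer be read normally. After HBase is rebuilt, the original data must be imported into the new HBase as quickly as possible.
Requirements:
(1) Keep the original table schema or the create-table command.
(2) All operations require the files to be on the Hadoop cluster.
How it works:
HBase stores its data in HDFS in a specific format (HFile). Bulkload takes advantage of this: persistent HFile-format files are produced directly in HDFS and then moved to the right location, which loads very large volumes of data quickly. It is normally combined with MapReduce, which makes it efficient and convenient. It also does not occupy region resources or add load, so for large writes it greatly improves write throughput and reduces write pressure on the HBase nodes.
In this procedure the HFiles already exist, so no format conversion is needed; the files can be loaded directly.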

(1) Get the create-table command (run describe, then rewrite its output as a create statement)
hbase(main):002:0> describe 'name_test'
DESCRIPTION                                                                                              ENABLED
'name_test', {NAME => 'name', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}   true

create 'name_test', {NAME => 'name', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}

(2) Back up the data files
$ hadoop dfs -cp /hbase/name_test /hbasebak

(3) Check the copied files
$ hadoop dfs -ls /hbasebak/name_test
Found 4 items
-rw-r--r-- 1 hadoop supergroup       693 2014-07-09 16:43 /hbasebak/name_test/.tableinfo.0000000001
drwxr-xr-x - hadoop supergroup       0 2014-07-09 16:43 /hbasebak/name_test/.tmp
drwxr-xr-x - hadoop supergroup       0 2014-07-09 16:43 /hbasebak/name_test/8d8e15ad46bf33fc334a4544918f584d
drwxr-xr-x - hadoop supergroup       0 2014-07-09 16:43 /hbasebak/name_test/c9edeb3f9de30cb2beba45fb35037bac

$ hadoop dfs -ls /hbasebak/name_test/8d8e15ad46bf33fc334a4544918f584d
-rw-r--r-- 1 hadoop supergroup       377 2014-07-09 16:43 /hbasebak/name_test/8d8e15ad46bf33fc334a4544918f584d/.regioninfo
drwxr-xr-x - hadoop supergroup       0 2014-07-09 16:43 /hbasebak/name_test/8d8e15ad46bf33fc334a4544918f584d/name

$ hadoop dfs -ls /hbasebak/name_test/c9edeb3f9de30cb2beba45fb35037bac
-rw-r--r-- 1 hadoop supergroup       313 2014-07-09 16:43 /hbasebak/name_test/c9edeb3f9de30cb2beba45fb35037bac/.regioninfo
drwxr-xr-x - hadoop supergroup       0 2014-07-09 16:43 /hbasebak/name_test/c9edeb3f9de30cb2beba45fb35037bac/name
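
The listing in step (3) mixes region directories with metadata entries (.tableinfo, .tmp). As a rough sketch, the region names could be pulled out of that listing with a small filter. `region_names` is a hypothetical helper, not part of Hadoop or HBase, and it assumes that region directories are named with 32 hexadecimal characters, as they are in the listing above. The sample input is copied from the post so the snippet runs without a cluster.

```shell
#!/bin/sh
# region_names: read `hadoop dfs -ls` output on stdin and print the names of
# directories that look like region directories (32 hex characters).
# Hypothetical helper; the 32-hex naming is an assumption based on the listing.
region_names() {
    awk '$1 ~ /^d/ {
        n = split($NF, parts, "/")      # last field is the full path
        name = parts[n]                 # last path component
        if (name ~ /^[0-9a-f]+$/ && length(name) == 32) print name
    }'
}

# Sample input taken from the listing in step (3).
region_names <<'EOF'
Found 4 items
-rw-r--r-- 1 hadoop supergroup       693 2014-07-09 16:43 /hbasebak/name_test/.tableinfo.0000000001
drwxr-xr-x - hadoop supergroup       0 2014-07-09 16:43 /hbasebak/name_test/.tmp
drwxr-xr-x - hadoop supergroup       0 2014-07-09 16:43 /hbasebak/name_test/8d8e15ad46bf33fc334a4544918f584d
drwxr-xr-x - hadoop supergroup       0 2014-07-09 16:43 /hbasebak/name_test/c9edeb3f9de30cb2beba45fb35037bac
EOF
```

On a real cluster the input would come from `hadoop dfs -ls /hbasebak/name_test | region_names` instead of the here-document.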

(4) Drop and recreate the online table
> disable 'name_test'
> drop 'name_test'
> create 'name_test', {NAME => 'name', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}
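
Since step (4) is destructive, it may help to generate the three shell commands first and review them before running anything. The sketch below only prints the commands. `recreate_script` is a hypothetical helper, and the create statement in the example call is abbreviated for readability; in practice it would be the full statement saved in step (1).

```shell
#!/bin/sh
# recreate_script TABLE CREATE_STATEMENT   (hypothetical helper)
# Prints the HBase shell commands for step (4) without executing them.
recreate_script() {
    printf "disable '%s'\n" "$1"
    printf "drop '%s'\n" "$1"
    printf '%s\n' "$2"
}

# Example call with an abbreviated create statement (assumption: use the
# full statement from step (1) for a real table).
recreate_script name_test \
    "create 'name_test', {NAME => 'name', VERSIONS => '1', COMPRESSION => 'GZ'}"
```

After checking the printed commands, they could be saved to a file and fed to `hbase shell`, or simply typed in by hand as in the post.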

(5) Import the data with bulkload
hadoop jar /usr/local/hbase/hbase-0.94.20.jar completebulkload /hbasebak/name_test/8d8e15ad46bf33fc334a4544918f584d/ name_test
hadoop jar /usr/local/hbase/hbase-0.94.20.jar completebulkload /hbasebak/name_test/c9edeb3f9de30cb2beba45fb35037bac/ name_test
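
Step (5) runs completebulkload once per region directory, which gets tedious for tables with many regions. A minimal sketch of generating those commands is shown here. `bulkload_cmds` is a hypothetical helper, and `HBASE_JAR` is an assumed variable that defaults to the jar path used in this post. The function only prints the commands; it does not run them.

```shell
#!/bin/sh
# Jar path from the post; override HBASE_JAR for a different install (assumption).
HBASE_JAR="${HBASE_JAR:-/usr/local/hbase/hbase-0.94.20.jar}"

# bulkload_cmds TABLE BACKUP_DIR REGION...   (hypothetical helper)
# Prints one completebulkload command per region directory.
bulkload_cmds() {
    table="$1"
    backup_dir="$2"
    shift 2
    for region in "$@"; do
        # Each region directory contains one subdirectory per column family,
        # matching the directories passed to completebulkload in step (5).
        printf 'hadoop jar %s completebulkload %s/%s/ %s\n' \
            "$HBASE_JAR" "$backup_dir" "$region" "$table"
    done
}

# Regions taken from the listing in step (3).
bulkload_cmds name_test /hbasebak/name_test \
    8d8e15ad46bf33fc334a4544918f584d \
    c9edeb3f9de30cb2beba45fb35037bac
```

With the default jar path, the output is the same two commands shown in step (5). Printing rather than executing keeps the sketch safe to run anywhere; the printed lines can be reviewed and then pasted into a terminal on the cluster.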

(6) Verify the data
> scan 'name_test'

guxingyu · posted 2014-7-14 16:55:41

OP, I followed the steps above and got the error below. Could you help me work out where the problem is?
Versions: Hadoop 1.0.4 and HBase 0.94.4
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: The value of the hbase.metrics.showTableName conf option has not been specified in SchemaMetrics
      at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
      at java.util.concurrent.FutureTask.get(FutureTask.java:111)
      at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplitPhase(LoadIncrementalHFiles.java:333)
      at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:232)
      at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:699)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
      at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:704)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:601)
      at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
      at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:601)
      at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:51)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:601)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.IllegalStateException: The value of the hbase.metrics.showTableName conf option has not been specified in SchemaMetrics

hyj · posted 2014-7-14 20:14:33

guxingyu posted 2014-7-14 16:55
OP, I followed the steps above and got the error below. Could you help me work out where the problem is?
Hadoop 1.0.4 and HBase 0.94 ...

This may be a bug in 0.94.
Try this: SchemaMetrics.configureGlobally(conf).

YLV · posted 2015-3-11 14:53:25

Nice, thanks for sharing.
