Reading guide:
1. What are the use cases for bulkload?
2. What preparation does an HBase data import require?
3. How do you import data with bulkload?
Scenario:
The HBase data can no longer be read normally; after rebuilding HBase, the original data must be imported into the new instance as quickly as possible.
Requirements:
(1) Keep the original table schema, or the command used to create the table.
(2) All operations must work on files that reside on the Hadoop cluster.
How it works:
HBase stores its data on HDFS in a specific on-disk format. Bulkload exploits this: persistent HFile-format files are generated (or already exist) directly on HDFS and are then moved into the right place, which is how huge volumes of data can be loaded quickly. Done together with MapReduce it is efficient and convenient, and because it bypasses the normal write path it consumes no region resources and adds no extra load, so for large volumes it greatly improves write throughput and reduces write pressure on the HBase nodes.
In this case the existing HFiles are reused, so no format conversion is needed; the files can be loaded as they are.
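The flow can be sketched as a single helper. The jar location and paths mirror the commands used in step (5) below and are assumptions for this cluster; `DRY_RUN=1` only prints the command, so the sketch can be checked without a cluster:

```shell
# Minimal sketch: hand one backed-up region directory to completebulkload,
# which moves its existing HFiles into the live table's regions.
# Assumed jar path matches the environment used later in this article.
DRY_RUN=${DRY_RUN:-1}
bulkload_region() {
  # $1 = region directory under the backup path, $2 = target table name
  cmd="hadoop jar /usr/local/hbase/hbase-0.94.20.jar completebulkload $1/ $2"
  if [ "$DRY_RUN" = 1 ]; then
    echo "$cmd"     # dry run: show what would be executed
  else
    $cmd            # real run against the cluster
  fi
}
```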
(1) Obtain the table-creation command
hbase(main):002:0> describe 'name_test'
DESCRIPTION ENABLED
'name_test', {NAME => 'name', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION true
_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETE
D_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 't
rue'}
From the DESCRIPTION output above, the equivalent create command is:
create 'name_test', {NAME => 'name', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}
(2) Preserve the data files
$ hadoop dfs -cp /hbase/name_test /hbasebak
(3) Check the copied files
$ hadoop dfs -ls /hbasebak/name_test
Found 4 items
-rw-r--r-- 1 hadoop supergroup 693 2014-07-09 16:43 /hbasebak/name_test/.tableinfo.0000000001
drwxr-xr-x - hadoop supergroup 0 2014-07-09 16:43 /hbasebak/name_test/.tmp
drwxr-xr-x - hadoop supergroup 0 2014-07-09 16:43 /hbasebak/name_test/8d8e15ad46bf33fc334a4544918f584d
drwxr-xr-x - hadoop supergroup 0 2014-07-09 16:43 /hbasebak/name_test/c9edeb3f9de30cb2beba45fb35037bac
$ hadoop dfs -ls /hbasebak/name_test/8d8e15ad46bf33fc334a4544918f584d
-rw-r--r-- 1 hadoop supergroup 377 2014-07-09 16:43 /hbasebak/name_test/8d8e15ad46bf33fc334a4544918f584d/.regioninfo
drwxr-xr-x - hadoop supergroup 0 2014-07-09 16:43 /hbasebak/name_test/8d8e15ad46bf33fc334a4544918f584d/name
$ hadoop dfs -ls /hbasebak/name_test/c9edeb3f9de30cb2beba45fb35037bac
-rw-r--r-- 1 hadoop supergroup 313 2014-07-09 16:43 /hbasebak/name_test/c9edeb3f9de30cb2beba45fb35037bac/.regioninfo
drwxr-xr-x - hadoop supergroup 0 2014-07-09 16:43 /hbasebak/name_test/c9edeb3f9de30cb2beba45fb35037bac/name
(4) Drop and recreate the live table
> disable 'name_test'
> drop 'name_test'
> create 'name_test', {NAME => 'name', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'GZ', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}
(5) Import the data with bulkload
hadoop jar /usr/local/hbase/hbase-0.94.20.jar completebulkload /hbasebak/name_test/8d8e15ad46bf33fc334a4544918f584d/ name_test
hadoop jar /usr/local/hbase/hbase-0.94.20.jar completebulkload /hbasebak/name_test/c9edeb3f9de30cb2beba45fb35037bac/ name_test
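With only two regions, running completebulkload once per region by hand is fine; with many regions a small helper saves typing. This sketch assumes `hadoop dfs -ls` output shaped like step (3), with the path in the last field, and filters out the non-region entries:

```shell
# Print only region directory paths from `hadoop dfs -ls` output on stdin,
# skipping the "Found N items" header and dot-entries such as .tmp and
# .tableinfo.* (which are not region directories).
region_dirs() {
  awk 'NF >= 8 {print $NF}' | while read -r dir; do
    case "${dir##*/}" in
      .*) ;;            # hidden entry: skip
      *)  echo "$dir" ;;
    esac
  done
}

# Usage against the cluster (paths from the steps above):
#   hadoop dfs -ls /hbasebak/name_test | region_dirs | while read -r d; do
#     hadoop jar /usr/local/hbase/hbase-0.94.20.jar completebulkload "$d/" name_test
#   done
```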
(6) Check the data
> scan 'name_test'