To synchronize or migrate an HBase table between two Hadoop clusters (e.g. a production and a test environment), the more conservative approach is to use HBase's built-in Export and Import tools.
Export
Syntax (running the tool with no arguments prints its usage):
[mw_shl_code=bash,true]bin/hbase org.apache.hadoop.hbase.mapreduce.Export
ERROR: Wrong number of arguments: 0
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]
Note: -D properties will be applied to the conf used.
For example:
-D mapred.output.compress=true
-D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
-D mapred.output.compression.type=BLOCK
Additionally, the following SCAN properties can be specified
to control/limit what is exported..
-D hbase.mapreduce.scan.column.family=<familyName>
-D hbase.mapreduce.include.deleted.rows=true
For performance consider the following properties:
-Dhbase.client.scanner.caching=100
-Dmapred.map.tasks.speculative.execution=false
-Dmapred.reduce.tasks.speculative.execution=false
For tables with very wide rows consider setting the batch size as below:
-Dhbase.export.scanner.batch=10[/mw_shl_code]
Export the contents of the userinfo table to /tmp/stark_summer/userinfo:
[mw_shl_code=bash,true]$hbase org.apache.hadoop.hbase.mapreduce.Export userinfo /tmp/stark_summer/userinfo[/mw_shl_code]
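Combining the `-D` flags from the usage text above, a tuned export might look like the following sketch; the output path and property values are illustrative, not required:

```shell
# Export with gzip-compressed output, a larger scanner cache, and
# speculative execution disabled so rows are not exported twice.
hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D mapred.output.compress=true \
  -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
  -D hbase.client.scanner.caching=100 \
  -D mapred.map.tasks.speculative.execution=false \
  userinfo /tmp/stark_summer/userinfo_gz
```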
Verify the result by listing the output directory:
[mw_shl_code=bash,true]$hadoop fs -ls /tmp/stark_summer/userinfo
-rw-r--r-- 3 dp supergroup 2227271212 2015-10-12 18:11 /tmp/stark_summer/userinfo/part-m-00505
-rw-r--r-- 3 dp supergroup 2197170985 2015-10-12 18:03 /tmp/stark_summer/userinfo/part-m-00506
-rw-r--r-- 3 dp supergroup 2153010200 2015-10-12 18:07 /tmp/stark_summer/userinfo/part-m-00507
-rw-r--r-- 3 dp supergroup 2176954334 2015-10-12 18:13 /tmp/stark_summer/userinfo/part-m-00508
-rw-r--r-- 3 dp supergroup 2103440385 2015-10-12 17:58 /tmp/stark_summer/userinfo/part-m-00509
-rw-r--r-- 3 dp supergroup 2251398531 2015-10-12 17:57 /tmp/stark_summer/userinfo/part-m-00510
-rw-r--r-- 3 dp supergroup 2109241561 2015-10-12 18:08 /tmp/stark_summer/userinfo/part-m-00511
-rw-r--r-- 3 dp supergroup 2250926125 2015-10-12 17:58 /tmp/stark_summer/userinfo/part-m-00512
-rw-r--r-- 3 dp supergroup 2108699321 2015-10-12 18:10 /tmp/stark_summer/userinfo/part-m-00513
-rw-r--r-- 3 dp supergroup 2098245068 2015-10-12 18:00 /tmp/stark_summer/userinfo/part-m-00514
-rw-r--r-- 3 dp supergroup 3214343288 2015-10-12 18:05 /tmp/stark_summer/userinfo/part-m-00515
-rw-r--r-- 3 dp supergroup 2049251086 2015-10-12 17:55 /tmp/stark_summer/userinfo/part-m-00516
-rw-r--r-- 3 dp supergroup 2034110542 2015-10-12 18:16 /tmp/stark_summer/userinfo/part-m-00517
-rw-r--r-- 3 dp supergroup 2032354338 2015-10-12 18:04 /tmp/stark_summer/userinfo/part-m-00518
-rw-r--r-- 3 dp supergroup 2022307329 2015-10-12 18:04 /tmp/stark_summer/userinfo/part-m-00519
-rw-r--r-- 3 dp supergroup 1937084305 2015-10-12 18:06 /tmp/stark_summer/userinfo/part-m-00520
-rw-r--r-- 3 dp supergroup 1940429009 2015-10-12 18:01 /tmp/stark_summer/userinfo/part-m-00521
-rw-r--r-- 3 dp supergroup 1826924060 2015-10-12 17:57 /tmp/stark_summer/userinfo/part-m-00522
-rw-r--r-- 3 dp supergroup 1034179651 2015-10-12 17:58 /tmp/stark_summer/userinfo/part-m-00523
-rw-r--r-- 3 dp supergroup 840825819 2015-10-12 17:55 /tmp/stark_summer/userinfo/part-m-00524
-rw-r--r-- 3 dp supergroup 769846685 2015-10-12 17:56 /tmp/stark_summer/userinfo/part-m-00525
-rw-r--r-- 3 dp supergroup 925831304 2015-10-12 17:53 /tmp/stark_summer/userinfo/part-m-00526
-rw-r--r-- 3 dp supergroup 370177218 2015-10-12 17:52 /tmp/stark_summer/userinfo/part-m-00527[/mw_shl_code]
The exported contents can be inspected with hadoop fs -text or hadoop fs -cat:
[mw_shl_code=bash,true]hadoop fs -text /tmp/stark_summer/userinfo/part-m-00527
hadoop fs -cat /tmp/stark_summer/userinfo/part-m-00527[/mw_shl_code]
Import
Syntax:
[mw_shl_code=bash,true]bin/hbase org.apache.hadoop.hbase.mapreduce.Import
ERROR: Wrong number of arguments: 0
Usage: Import [options] <tablename> <inputdir>
By default Import will load data directly into HBase. To instead generate
HFiles of data to prepare for a bulk data load, pass the option:
-Dimport.bulk.output=/path/for/output
To apply a generic org.apache.hadoop.hbase.filter.Filter to the input, use
-Dimport.filter.class=<name of filter class>
-Dimport.filter.args=<comma separated list of args for filter
NOTE: The filter will be applied BEFORE doing key renames via the HBASE_IMPORTER_RENAME_CFS property. Futher, filters will only use the Filter#filterRowKey(byte[] buffer, int offset, int length) method to identify whether the current row needs to be ignored completely for processing and Filter#filterKeyValue(KeyValue) method to determine if the KeyValue should be added; Filter.ReturnCode#INCLUDE and #INCLUDE_AND_NEXT_COL will be considered as including the KeyValue.
For performance consider the following options:
-Dmapred.map.tasks.speculative.execution=false
-Dmapred.reduce.tasks.speculative.execution=false
-Dimport.wal.durability=<Used while writing data to hbase. Allowed values are the supported durability values like SKIP_WAL/ASYNC_WAL/SYNC_WAL/...>[/mw_shl_code]
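As the usage text above notes, Import can also emit HFiles for a bulk load instead of writing through the HBase API. A sketch of that two-step flow (the HFile output path is illustrative):

```shell
# Step 1: generate HFiles instead of writing directly to the table.
hbase org.apache.hadoop.hbase.mapreduce.Import \
  -Dimport.bulk.output=/tmp/stark_summer/userinfo_hfiles \
  userinfo2 /tmp/stark_summer/userinfo

# Step 2: bulk-load the generated HFiles into the target table.
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  /tmp/stark_summer/userinfo_hfiles userinfo2
```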
Import the HDFS data into the HBase table userinfo2:
[mw_shl_code=bash,true]bin/hbase org.apache.hadoop.hbase.mapreduce.Import userinfo2 hdfs://master/tmp/stark_summer/userinfo/[/mw_shl_code]
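Note that Import does not create the target table; it must already exist with the same column families as the source. Assuming the only family is info (as seen in the scan results), it can be created first in the HBase shell:

```shell
# Pre-create the target table with the same column family as the source
# ('info' here, per the scan output of the original table).
hbase shell <<'EOF'
create 'userinfo2', 'info'
EOF
```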
Verify that the data was loaded into HBase:
[mw_shl_code=bash,true] hbase shell
scan 'userinfo2',{LIMIT=>5}
ROW COLUMN+CELL
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0 column=info:tags, timestamp=1444092751178, value={"t_100709":0.72,"a_sns":0.72}
0\x00\x00\x00\x00\x00\x00\x00e98a6da1f226
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0 column=info:deviceinfo, timestamp=1426062523272, value={"device":"SPHS on Hsdroid", "os":"Android", "ddate":"2014-01-06"}
0\x00\x00\x00bdd7db9355c7db82
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0 column=info:tags, timestamp=1444092882808, value={"t_100709":0.72,"a_sns":0.72}
0\x00\x00\x00eb591e1674839b0
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0 column=info:deviceinfo, timestamp=1426062523290, value={"device":"GT-I9100", "os":"Android", "ddate":"2013-07-22"}
0\x00?\xC3\xA4\xC2\x85\xC2\xAE\xC3\xAB\xC2\x89\xC2\
x98\xC3\xA4\xC2\x85\xC2\xAF\xC3\xAA\xC2\x8C\xC2\x88
\xC3\xA4\xC2\x85\xC2\xAF3825b6fe84
\x00\x00\x00\x00\x03\x00\xC3\xAD\xC2\x98\xC2\x88\xC column=info:deviceinfo, timestamp=1426062523293, value={"device":"HUAWEI U8950D", "os":"Android", "ddate":"2014-06-06"}
3\xA4\xC2\xB0\xC2\xAA\x00\x00\x00\x00\x0B\x00\xC3\x
AD\xC2\x9A\xC2\xB8\xC3\xA4\xC2\xB0\xC2\xAA\x00\x00\
x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00
5 row(s) in 0.0300 seconds[/mw_shl_code]
The CopyTable approach
[mw_shl_code=bash,true]bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=zookeeper1,zookeeper2,zookeeper3:2181:/hbase 'testtable'[/mw_shl_code]
Note that --peer.adr takes the form zk-quorum:zk-port:zk-root. Versions before 0.92 do not support copying multiple versions of a cell; 0.94 and later do. This operation also requires a conf/mapred-site.xml in the HBase conf directory; you can copy the one from your Hadoop installation.
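CopyTable also accepts options such as a time range and a different target table name. A sketch with illustrative values:

```shell
# Copy only cells written within a given time window into a renamed
# table on the remote cluster (timestamps and names are illustrative).
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --starttime=1444406400000 --endtime=1444492800000 \
  --new.name=testtable_copy \
  --peer.adr=zookeeper1,zookeeper2,zookeeper3:2181:/hbase \
  testtable
```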
Export/Import
[mw_shl_code=bash,true]bin/hbase org.apache.hadoop.hbase.mapreduce.Export testtable /user/testtable [versions] [starttime] [stoptime]
bin/hbase org.apache.hadoop.hbase.mapreduce.Import testtable /user/testtable[/mw_shl_code]
Copying the underlying HDFS files directly
First copy the HDFS files, e.g.:
[mw_shl_code=bash,true]bin/hadoop distcp hdfs://srcnamenode:9000/hbase/testtable/ hdfs://distnamenode:9000/hbase/testtable/[/mw_shl_code]
Then, on the destination HBase cluster, run bin/hbase org.jruby.Main bin/add_table.rb /hbase/testtable
After the meta information has been generated, restart HBase.
This is the simplest approach. Before starting, stop writes to HBase and flush all tables (as described above), then run the distcp copy.
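The stop-writes / flush / distcp / add_table sequence described above can be sketched end to end as follows (table name and namenode addresses are illustrative):

```shell
# 1. With client writes stopped, flush the table so all data
#    is persisted as HFiles on HDFS.
echo "flush 'testtable'" | hbase shell

# 2. Copy the table directory from the source to the destination cluster.
hadoop distcp hdfs://srcnamenode:9000/hbase/testtable/ \
              hdfs://distnamenode:9000/hbase/testtable/

# 3. On the destination cluster, regenerate the table's meta
#    information, then restart HBase.
hbase org.jruby.Main bin/add_table.rb /hbase/testtable
```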