sunshine_junge posted on 2014-12-28 20:21:43

Configuring Hadoop and Hive to Use LZO Compression

This post was last edited by sunshine_junge on 2014-12-28 20:24



Guiding questions:
1. How does Hadoop use LZO?
2. How does Hive use LZO?




Install the LZO compression tools

LZO
wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.08.tar.gz
tar -zxvf lzo-2.08.tar.gz
cd lzo-2.08
export CFLAGS=-m64
./configure --enable-shared --prefix=/usr/local/cloud/hadoop/lzo/
make && make install
cp /usr/local/cloud/hadoop/lzo/lib/* /usr/lib/
cp /usr/local/cloud/hadoop/lzo/lib/* /usr/lib64/
cp -r /usr/local/cloud/hadoop/lzo/include/* /usr/include/

LZOP
wget http://www.lzop.org/download/lzop-1.03.tar.gz
tar -zxvf lzop-1.03.tar.gz
cd lzop-1.03
./configure --enable-shared --prefix=/usr/local/cloud/hadoop/lzop/
make && make install
cd /usr/bin
ln -s -f /usr/local/cloud/hadoop/lzop/bin/lzop

Test
history > history.log
lzop history.log
If a history.log.lzo file is generated, the LZO installation is complete.
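The check above only confirms that a .lzo file appears. A slightly stricter sketch is a round trip: compress, verify, decompress, and compare with the original. The function name verify_lzop_roundtrip and the sample file contents are made up for illustration; the lzop options used (-c, -t, -d) are standard ones.

```shell
#!/bin/sh
# Round-trip check for lzop: compress a small file, test the archive,
# decompress it again and compare with the original.
# Returns 0 on success, 1 on a failed check, 2 if lzop is not on PATH.
verify_lzop_roundtrip() {
    if ! command -v lzop >/dev/null 2>&1; then
        echo "lzop not found on PATH"
        return 2
    fi
    workdir=$(mktemp -d) || return 1
    rc=0
    printf 'line one\nline two\n' > "$workdir/sample.log"
    # -c writes to stdout, so the input file is left untouched
    lzop -c "$workdir/sample.log" > "$workdir/sample.log.lzo" || rc=1
    # -t tests the integrity of the compressed file
    [ "$rc" -eq 0 ] && { lzop -t "$workdir/sample.log.lzo" || rc=1; }
    # -d -c decompresses to stdout
    [ "$rc" -eq 0 ] && { lzop -d -c "$workdir/sample.log.lzo" > "$workdir/roundtrip.log" || rc=1; }
    # the decompressed copy must be byte-identical to the original
    [ "$rc" -eq 0 ] && { cmp -s "$workdir/sample.log" "$workdir/roundtrip.log" || rc=1; }
    rm -rf "$workdir"
    if [ "$rc" -eq 0 ]; then
        echo "lzop round trip OK"
    else
        echo "lzop round trip FAILED"
    fi
    return "$rc"
}
```

Calling verify_lzop_roundtrip after the install should print "lzop round trip OK" if lzop and the LZO library are working together.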



Install Hadoop-LZO
git clone https://github.com/twitter/hadoop-lzo.git
cd hadoop-lzo
export CFLAGS=-m64
export CXXFLAGS=-m64
export C_INCLUDE_PATH=/usr/local/cloud/hadoop/lzo/include
export LIBRARY_PATH=/usr/local/cloud/hadoop/lzo/lib
mvn clean package -Dmaven.test.skip=true
cp -r target/native/Linux-amd64-64 /usr/local/cloud/hadoop/lib/native/
cp target/hadoop-lzo-0.4.20-SNAPSHOT.jar /usr/local/cloud/hadoop/share/hadoop/common/
Note: you can edit pom.xml to match your own Hadoop version by changing the hadoop.current.version property. You may also need to add the Cloudera repository:
<repositories>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>
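The hadoop.current.version change can also be scripted rather than edited by hand. Below is a minimal sketch using sed; the function name set_hadoop_version and the version number in the usage line are assumptions for illustration, not something taken from the hadoop-lzo project.

```shell
#!/bin/sh
# Rewrite the value of <hadoop.current.version> in a pom.xml.
# Usage: set_hadoop_version <pom.xml> <version>
# Assumes the version string contains no '|' or '&' characters,
# since those are special in the sed expression below.
set_hadoop_version() {
    pom=$1
    version=$2
    tmp=$(mktemp) || return 1
    # Replace whatever sits between the opening and closing tags
    if sed "s|<hadoop.current.version>[^<]*</hadoop.current.version>|<hadoop.current.version>${version}</hadoop.current.version>|" "$pom" > "$tmp"; then
        # Write back through cat so the pom keeps its original permissions
        cat "$tmp" > "$pom"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

For example, `set_hadoop_version pom.xml 2.4.0` (2.4.0 is only a placeholder) before running the mvn build.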
Modify the configuration
# Add the following settings (typically in hadoop-env.sh)
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64
export LD_LIBRARY_PATH=$HADOOP_HOME/lzo/lib
<property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
</property>
<property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
    <name>mapred.child.env</name>
    <value>LD_LIBRARY_PATH=/usr/local/cloud/hadoop/lzo/lib</value>
</property>
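After editing the XML it is easy to end up with a typo in one of the codec class names. A rough sanity check, sketched below, greps the config file for both LZO codec classes. The function name check_lzo_codecs is hypothetical, and the check only looks for the class names anywhere in the file you pass in (commonly core-site.xml, where io.compression.codecs lives); it does not parse the XML.

```shell
#!/bin/sh
# Check that a Hadoop config file mentions both LZO codec classes.
# Usage: check_lzo_codecs <path-to-config.xml>
# This is a plain text search, not an XML-aware validation.
check_lzo_codecs() {
    conf=$1
    for codec in com.hadoop.compression.lzo.LzoCodec com.hadoop.compression.lzo.LzopCodec; do
        # -F treats the class name as a fixed string, so '.' is literal
        if ! grep -qF "$codec" "$conf"; then
            echo "missing: $codec"
            return 1
        fi
    done
    echo "LZO codecs configured"
}
```

Run it against the file that holds io.compression.codecs on each node, since the setting has to be present wherever tasks run.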

Hive

To use LZO files in Hive, specify the format when creating the table:
create table log_lzo (
line string comment 'text line')
partitioned by (logdate string comment 'log file time,format-yyyyMMdd')
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
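The post does not show how data gets into log_lzo. One common way is a LOAD DATA statement per partition; the sketch below just builds that statement so it can be handed to `hive -e`. The function name load_lzo_partition_sql and the file path in the usage line are made up for illustration.

```shell
#!/bin/sh
# Print a HiveQL statement that loads a local .lzo file into one
# partition of the log_lzo table.
# Usage: load_lzo_partition_sql <local-file> <logdate as yyyyMMdd>
load_lzo_partition_sql() {
    printf "LOAD DATA LOCAL INPATH '%s' INTO TABLE log_lzo PARTITION (logdate='%s');\n" "$1" "$2"
}
```

Something like `hive -e "$(load_lzo_partition_sql /tmp/access.log.lzo 20140625)"` would then run it, assuming that file exists on the machine where hive is invoked.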
After that, you can parse the files on top of this table and store the result in RCFile format:
create table log_rcfile (
`ip` string COMMENT 'ip',
`timestamp` string COMMENT 'timestamp',
`url` string COMMENT 'url')
PARTITIONED BY (
logdate string comment 'log file time,format-yyyyMMdd')
STORED AS RCFILE;

insert overwrite table log_rcfile partition(logdate='20140625')
select
fields[0] as `ip`,
fields[1] as `timestamp`,
fields[2] as `url`
from (
select
    split(line, '#\\|~') as fields
from log_lzo
where
    1 = 1
    and logdate = '20140625'
)t;
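The pattern passed to split in the query is a regular expression that matches the literal three-character delimiter #|~ (the pipe has to be escaped). To see what that does to one log line, here is a small awk sketch that splits on the same delimiter. The function name split_log_line and the sample line are invented for illustration; real log lines may have more fields.

```shell
#!/bin/sh
# Split one log line on the literal delimiter "#|~" and print the
# first three fields, mirroring the ip/timestamp/url columns.
# Usage: split_log_line '<line>'
split_log_line() {
    # [|] matches a literal pipe, same intent as \| in the Hive pattern
    printf '%s\n' "$1" | awk -F'#[|]~' '{ print "ip=" $1; print "timestamp=" $2; print "url=" $3 }'
}
```

For example, `split_log_line '1.2.3.4#|~2014-06-25 10:00:00#|~/index.html'` prints the three fields on separate lines, which is a quick way to check a sample line before running the Hive job.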


Reference: http://matrix-lisp.github.io/blog/2014/07/07/hadoop-lzo-install/

