
CDH5: Configuring LZO with Parcels

desehawk, posted 2014-12-23 12:33:11

Reading guide

1. What are the steps of a Parcel deployment?
2. How do you configure LZO using parcels, and which settings need to change?







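I. Parcel deployment steps

    1. Download: The Parcel must be downloaded first. Once downloaded, it resides in a local directory on the Cloudera Manager host.
    2. Distribute: After the download, the Parcel is distributed to every host in the cluster and unpacked.
    3. Activate: After distribution, activating the Parcel prepares it for use once the cluster restarts. An upgrade may also be required before activation.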


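II. Setting up a local LZO parcel repository
    1. Download the latest LZO parcel from http://archive-primary.cloudera.com/gplextras/parcels/latest/, choosing the build that matches the operating system of the servers in the Hadoop cluster. I am on RHEL 6.2, so I downloaded HADOOP_LZO-0.4.15-1.gplextras.p0.64-el6.parcel.
    2. Also download manifest.json, and create a sha file from the hash value in manifest.json (note: the sha file takes the same name as the parcel file, plus the .sha extension).
    3. On the command line, go to the Apache web root (install Apache first if it is not present), which is /var/www/html by default. Create an lzo directory there and put the three files in it.
    4. Start the httpd service and open the directory in a browser, e.g. http://ip/lzo. The result looks like this:
         [Image: 1.png]
    5. Add the URL of the local parcel repository to the Remote Parcel Repository URLs setting, as shown below:
         [Image: 2.png]
    6. On the Parcels page in Cloudera Manager, the LZO parcel now appears among the downloadable parcels. Click it to download.
    7. Follow the parcel deployment steps to distribute and activate it. The result is shown below:
         [Image: 3.png]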
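
As a rough illustration of step 2, the sha file could be generated with a short script rather than by hand. This is only a sketch: the function name `write_sha_file` is hypothetical, and it assumes that manifest.json holds a top-level `parcels` list whose entries carry `parcelName` and `hash` fields, and that the sha file is named after the parcel with a `.sha` suffix. Check both assumptions against the manifest you actually downloaded.

```python
import json
from pathlib import Path


def write_sha_file(manifest_path, parcel_name, out_dir):
    """Write <parcel_name>.sha containing the hash listed in manifest.json."""
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    # Assumed layout: {"parcels": [{"parcelName": ..., "hash": ...}, ...]}
    for entry in manifest.get("parcels", []):
        if entry.get("parcelName") == parcel_name:
            sha_path = Path(out_dir) / (parcel_name + ".sha")
            # One line holding only the hash, as `echo <hash> > file` would produce.
            sha_path.write_text(entry["hash"] + "\n", encoding="utf-8")
            return sha_path
    raise KeyError(f"{parcel_name} not found in {manifest_path}")
```

For the parcel used in this post, the call would look like `write_sha_file("manifest.json", "HADOOP_LZO-0.4.15-1.gplextras.p0.64-el6.parcel", "/var/www/html/lzo")`.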



III. Configuration changes
    Modify the HDFS configuration
    Append the following to the value of the io.compression.codecs property: org.apache.hadoop.io.compress.Lz4Codec,com.hadoop.compression.lzo.LzopCodec
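
Because io.compression.codecs is a comma-separated list, it is easy to add an entry twice or drop a comma when editing it by hand. The sketch below shows one way to compute the new value. The helper `append_codecs` and the `existing` value are hypothetical and for illustration only; the real current value has to be read from the HDFS configuration page.

```python
def append_codecs(current_value, extra_codecs):
    """Return the comma-separated codec list with each extra codec appended once."""
    codecs = [c.strip() for c in current_value.split(",") if c.strip()]
    for codec in extra_codecs:
        if codec not in codecs:
            codecs.append(codec)
    return ",".join(codecs)


# The two codecs named in this post.
LZO_CODECS = [
    "org.apache.hadoop.io.compress.Lz4Codec",
    "com.hadoop.compression.lzo.LzopCodec",
]

# Hypothetical existing value, not taken from a real cluster.
existing = "org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec"
print(append_codecs(existing, LZO_CODECS))
```

Running the helper a second time on its own output leaves the value unchanged, so it is safe to reapply.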
    Modify the YARN configuration
    Change the value of mapreduce.application.classpath to:

        $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH,/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*

    Change the value of mapreduce.admin.user.env to:

        LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native:$JAVA_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
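
Both YARN values point into the activated HADOOP_LZO parcel, so a quick check that those directories exist on a node can catch a parcel that was distributed but never activated. The following is a minimal sketch, not part of the original procedure. The function name `missing_lzo_paths` is hypothetical, and the paths are simply the ones quoted in the two settings above.

```python
import os

PARCEL_ROOT = "/opt/cloudera/parcels/HADOOP_LZO"


def missing_lzo_paths(parcel_root=PARCEL_ROOT):
    """Return the parcel directories referenced by the YARN settings that do not exist."""
    expected = [
        # Jar directory referenced by mapreduce.application.classpath
        os.path.join(parcel_root, "lib", "hadoop", "lib"),
        # Native library directory referenced by LD_LIBRARY_PATH
        os.path.join(parcel_root, "lib", "hadoop", "lib", "native"),
    ]
    return [p for p in expected if not os.path.isdir(p)]


if __name__ == "__main__":
    for path in missing_lzo_paths():
        print("missing:", path)
```

An empty result only shows that the directories are present; it does not prove that the native library loads correctly.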

IV. Verification
    Table definition used for the test:

        create external table lzo(id int,name string)
          row format delimited fields terminated by '#'
          STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
          OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
          location '/test';

    Create a data.txt file with the following content:

        1#tianhe
        2#gz
        3#sz
        4#sz
        5#bx

    Then compress the file with the lzop command and upload it to the /test directory on HDFS.
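
The post does not show the exact commands for this step, so here is a hedged sketch of one way to script it. The function names are hypothetical. The sketch assumes that `lzop` and the `hadoop` client are on the PATH, and it relies on lzop's default behavior of writing `<name>.lzo` next to the input file.

```python
import subprocess
from pathlib import Path

SAMPLE_ROWS = ["1#tianhe", "2#gz", "3#sz", "4#sz", "5#bx"]


def write_sample(path):
    """Write the five '#'-delimited sample rows, one per line."""
    Path(path).write_text("\n".join(SAMPLE_ROWS) + "\n", encoding="utf-8")
    return Path(path)


def build_commands(local_file, hdfs_dir="/test"):
    """Return the lzop and hadoop command lines as argument lists."""
    compressed = str(local_file) + ".lzo"  # default lzop output name
    return [
        ["lzop", str(local_file)],
        ["hadoop", "fs", "-put", compressed, hdfs_dir],
    ]


def compress_and_upload(local_file, hdfs_dir="/test"):
    """Run each command in order, stopping at the first failure."""
    for cmd in build_commands(local_file, hdfs_dir):
        subprocess.run(cmd, check=True)
```

On a cluster node, `compress_and_upload(write_sample("data.txt"))` would create data.txt, compress it to data.txt.lzo, and put the compressed file under /test.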
    Start hive, create the table, and query the data. The result is as follows:
        hive> create external table lzo(id int,name string)  row format delimited fields terminated by '#' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' location '/test';
        OK
        Time taken: 0.108 seconds
        hive> select * from lzo where id>2;
        Total MapReduce jobs = 1
        Launching Job 1 out of 1
        Number of reduce tasks is set to 0 since there's no reduce operator
        Starting Job = job_1404206497656_0002, Tracking URL = http://hadoop01.kt:8088/proxy/application_1404206497656_0002/
        Kill Command = /opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/lib/hadoop/bin/hadoop job  -kill job_1404206497656_0002
        Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
        2014-07-01 17:30:27,547 Stage-1 map = 0%,  reduce = 0%
        2014-07-01 17:30:37,403 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.84 sec
        2014-07-01 17:30:38,469 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.84 sec
        2014-07-01 17:30:39,527 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.84 sec
        MapReduce Total cumulative CPU time: 2 seconds 840 msec
        Ended Job = job_1404206497656_0002
        MapReduce Jobs Launched:
        Job 0: Map: 1   Cumulative CPU: 2.84 sec   HDFS Read: 295 HDFS Write: 15 SUCCESS
        Total MapReduce CPU Time Spent: 2 seconds 840 msec
        OK
        3       sz
        4       sz
        5       bx
        Time taken: 32.803 seconds, Fetched: 3 row(s)
        hive> SET hive.exec.compress.output=true;
        hive> SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
        hive> create external table lzo2(id int,name string)  row format delimited fields terminated by '#' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' location '/test';
        OK
        Time taken: 0.092 seconds
        hive> insert into table lzo2 select * from lzo;
        Total MapReduce jobs = 3
        Launching Job 1 out of 3
        Number of reduce tasks is set to 0 since there's no reduce operator
        Starting Job = job_1404206497656_0003, Tracking URL = http://hadoop01.kt:8088/proxy/application_1404206497656_0003/
        Kill Command = /opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/lib/hadoop/bin/hadoop job  -kill job_1404206497656_0003
        Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
        2014-07-01 17:33:47,351 Stage-1 map = 0%,  reduce = 0%
        2014-07-01 17:33:57,114 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.96 sec
        2014-07-01 17:33:58,170 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.96 sec
        MapReduce Total cumulative CPU time: 1 seconds 960 msec
        Ended Job = job_1404206497656_0003
        Stage-4 is selected by condition resolver.
        Stage-3 is filtered out by condition resolver.
        Stage-5 is filtered out by condition resolver.
        Moving data to: hdfs://hadoop01.kt:8020/tmp/hive-hdfs/hive_2014-07-01_17-33-22_504_966970548620625440-1/-ext-10000
        Loading data to table default.lzo2
        Table default.lzo2 stats: [num_partitions: 0, num_files: 2, num_rows: 0, total_size: 171, raw_data_size: 0]
        MapReduce Jobs Launched:
        Job 0: Map: 1   Cumulative CPU: 1.96 sec   HDFS Read: 295 HDFS Write: 79 SUCCESS
        Total MapReduce CPU Time Spent: 1 seconds 960 msec
        OK
        Time taken: 36.625 seconds




Comments (3)

wubaozhou, posted 2015-1-2 23:08:31

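zhanmsl, posted 2015-10-22 15:33:50
Has the moderator actually tried this in practice?
My CDH version is CDH5.3.5, and I ran into a problem when integrating LZO with the method above. Please take a look when you have time.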

15/10/22 15:17:38 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
15/10/22 15:17:38 WARN lzo.LzoCompressor: java.lang.UnsatisfiedLinkError: Cannot load liblzo2.so.2 (liblzo2.so.2: cannot open shared object file: No such file or directory)!
15/10/22 15:17:38 ERROR lzo.LzoCodec: Failed to load/initialize native-lzo library

Thanks!

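fangyafenqidai, posted 2017-5-24 22:02:46
I am on 5.11. Does Cloudera's remote parcel repository require an https connection? I cannot find it with either http or https. Does anyone know why? It always shows this error: Note: No parcels were found in the configured repositories. Try adding a custom repository under More Options. Otherwise, you may only be able to proceed with packages.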
