Last edited by howtodown on 2014-4-17 03:43
Overview:
Integrating Flume-1.4.0 with HBase-0.96.0 is fairly straightforward, but integrating Flume-0.9.4 with HBase-0.96 is much more involved: it cannot be done with a handful of configuration changes, and it requires modifying both the Flume and the Hadoop source code.
Treat this post as a learning exercise; a setup this complex is not recommended in practice. Consider the Flume-1.4.0 and HBase-0.96.0 integration guide instead.
Questions this post answers:
1. Which files need to be modified?
2. Why do these files need modifying?
3. Some code has to change; can you work out why?
1. Update some dependency versions in the pom.xml at the root of flume-src
(1) Hadoop 2.x no longer ships a hadoop-core jar, so replace the Hadoop dependency:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>${cdh.hadoop.version}</version>
</dependency>

Change to:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>2.2.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.2.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-common</artifactId>
  <version>2.2.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
  <version>2.2.0</version>
</dependency>
(2) Update the Guava version:

<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>r07</version>
</dependency>

Change to:

<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>10.0.1</version>
</dependency>
(3) In flume-src\flume-core\pom.xml, change the following:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
</dependency>

Change to:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>2.2.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.2.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-common</artifactId>
  <version>2.2.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
  <version>2.2.0</version>
</dependency>
(4) In flume-src\plugins\flume-plugin-hbasesink\pom.xml, change the following:

<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>${cdh.hbase.version}</version>
</dependency>

<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>${cdh.hbase.version}</version>
  <classifier>tests</classifier>
  <scope>test</scope>
</dependency>

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-test</artifactId>
  <version>${cdh.hadoop.version}</version>
  <scope>test</scope>
</dependency>

Change to:

<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-it</artifactId>
  <version>0.96.0-hadoop2</version>
</dependency>
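After editing the poms, it is worth checking that no hadoop-core references remain anywhere in the source tree. A minimal sketch of such a check follows; it writes a stand-in pom fragment to a temp file so it runs anywhere, but against the real tree you would point grep at flume-src instead:

```shell
# Stand-in pom fragment so this sketch is self-contained; in practice,
# run the grep recursively over the real flume-src tree.
POM=$(mktemp)
cat > "$POM" <<'EOF'
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.2.0</version>
</dependency>
EOF

# hadoop-core must no longer appear after the edits above
if grep -q 'hadoop-core' "$POM"; then
  echo "stale hadoop-core dependency found"
else
  echo "poms clean"
fi
```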
2. Modify the two Java classes FlushingSequenceFileWriter.java and RawSequenceFileWriter.java under flume-core\src\main\java\org\apache\hadoop\io
Step 1 replaced the old Hadoop with a newer version, and the new org.apache.hadoop.io.SequenceFile.Writer class differs from the old one, so FlushingSequenceFileWriter.java and RawSequenceFileWriter.java no longer compile. Fix this as follows:
(1) In the Hadoop 2.2.0 source, modify hadoop-2.2.0-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\SequenceFile.java, adding a default constructor to the Writer class:
Writer() {
    this.compress = CompressionType.NONE;
}
Then rebuild the hadoop-common-project module and replace the old hadoop-common-2.2.0.jar with the newly built one.
(2) Modify FlushingSequenceFileWriter.java and RawSequenceFileWriter.java
Both classes now contain compile errors; replace the old Hadoop API calls with the corresponding new-Hadoop APIs. The exact changes are not covered here; if you need them, email me at wyphao.2007@163.com.
(3) In the SequenceFileOutputFormat class in com.cloudera.flume.handlers.seqfile, make the following changes:
this(SequenceFile.getCompressionType(FlumeConfiguration.get()),
    new DefaultCodec());

Change to:

this(SequenceFile.getDefaultCompressionType(FlumeConfiguration.get()),
    new DefaultCodec());

and

CompressionType compressionType = SequenceFile.getCompressionType(conf);

Change to:

CompressionType compressionType = SequenceFile.getDefaultCompressionType(conf);
3. Rebuild the Flume source
Rebuild the Flume source (for how to build it, see this blog's post "Compiling the Flume-0.9.4 source and fixing common build errors"), then replace flume-core-0.9.4-cdh3u3.jar in ${FLUME_HOME}/lib with the freshly built jar. Also delete the old-Hadoop jars such as ${FLUME_HOME}/lib/hadoop-core-0.20.2-cdh3u3.jar.
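The jar swap above can be sketched as follows. So that the sketch runs anywhere, it rehearses the moves on a throwaway directory; substitute your real ${FLUME_HOME}/lib and the path of your freshly built flume-core jar:

```shell
# Throwaway stand-in for ${FLUME_HOME}/lib (hypothetical layout).
FLUME_LIB=$(mktemp -d)
touch "$FLUME_LIB/flume-core-0.9.4-cdh3u3.jar" \
      "$FLUME_LIB/hadoop-core-0.20.2-cdh3u3.jar"

# 1) drop in the rebuilt flume-core jar (here simulated by re-touching
#    the stand-in; in practice, cp from your Flume build output)
touch "$FLUME_LIB/flume-core-0.9.4-cdh3u3.jar"

# 2) delete every jar belonging to the old Hadoop release
rm -f "$FLUME_LIB"/hadoop-core-*.jar

ls "$FLUME_LIB"
```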
4. Modify the ${FLUME_HOME}/bin/flume startup script
Look closely at the ${FLUME_HOME}/bin/flume script and you will find the following code:
# put hadoop conf dir in classpath to include Hadoop
# core-site.xml/hdfs-site.xml
if [ -n "${HADOOP_CONF_DIR}" ]; then
  CLASSPATH="${CLASSPATH}:${HADOOP_CONF_DIR}"
elif [ -n "${HADOOP_HOME}" ] ; then
  CLASSPATH="${CLASSPATH}:${HADOOP_HOME}/conf"
elif [ -e "/usr/lib/hadoop/conf" ] ; then
  # if neither is present see if the CDH dir exists
  CLASSPATH="${CLASSPATH}:/usr/lib/hadoop/conf";
  HADOOP_HOME="/usr/lib/hadoop"
fi # otherwise give up

# try to load the hadoop core jars
HADOOP_CORE_FOUND=false
while true; do
  if [ -n "$HADOOP_HOME" ]; then
    HADCOREJARS=`find ${HADOOP_HOME}/hadoop-core*.jar || \
      find ${HADOOP_HOME}/lib/hadoop-core*.jar || true`
    if [ -n "$HADCOREJARS" ]; then
      HADOOP_CORE_FOUND=true
      CLASSPATH="$CLASSPATH:${HADCOREJARS}"
      break;
    fi
  fi

  HADCOREJARS=`find ./lib/hadoop-core*.jar 2> /dev/null || true`
  if [ -n "$HADCOREJARS" ]; then
    # if this is the dev environment then hadoop jar will
    # get added as part of ./lib (below)
    break
  fi

  # core jars may be missing, we'll check for this below
  break
done
Notice that this is where Flume loads the old Hadoop dependency jars. Newer Hadoop releases no longer have a ${HADOOP_HOME}/conf directory at all, so Flume fails to pick up the new Hadoop dependencies. The simplest way to make it depend on the new HBase and Hadoop is to add the following CLASSPATH entry to the ${FLUME_HOME}/bin/flume script:
CLASSPATH="/home/q/hbase/hbase-0.96.0-hadoop2/lib/*"
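A slightly more defensive variant of that one-liner is sketched below; it guards the hard-coded path so the script still starts on hosts with a different layout. The path is just this post's example install location:

```shell
# Hard-coded HBase lib dir from this post; adjust to your install.
HBASE_LIB="/home/q/hbase/hbase-0.96.0-hadoop2/lib"
CLASSPATH="${CLASSPATH:-}"
if [ -d "$HBASE_LIB" ]; then
  # the trailing /* lets the JVM pick up every jar in the directory
  CLASSPATH="${CLASSPATH}:${HBASE_LIB}/*"
else
  echo "warning: $HBASE_LIB not found; HBase/Hadoop jars not on classpath" >&2
fi
```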
Note HBase's own Hadoop dependency: hbase-0.96.0-hadoop2 ships with Hadoop 2.1.0 jars, so replace hadoop-common-2.1.0.jar in ${HBASE_HOME}/lib with the hadoop-common-2.2.0.jar built above.
5. Integrating with HBase-0.96
Add your own class under flume-src\plugins\flume-plugin-hbasesink\src\main\java (you can of course create a brand-new Maven project instead). To integrate with HBase, the class must extend EventSink.Base and override its methods (flume-src\plugins\flume-plugin-hbasesink\src\main\java\com\cloudera\flume\hbase\Attr2HBaseEventSink.java is a good reference). Once it is written, rebuild the classes under flume-src\plugins\flume-plugin-hbasesink, package them into a jar, and register your HBase sink with Flume.
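For the last step, registering the sink, Flume 0.9.x loads plugins listed in the flume.plugin.classes property of flume-site.xml (with your jar on Flume's classpath, e.g. in ${FLUME_HOME}/lib). A sketch of the config fragment, using a hypothetical class name com.example.MyHBaseEventSink in place of your own sink class:

```xml
<!-- flume-site.xml: comma-separated list of plugin classes to load;
     com.example.MyHBaseEventSink is a placeholder for your sink class -->
<property>
  <name>flume.plugin.classes</name>
  <value>com.example.MyHBaseEventSink</value>
</property>
```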