
Spring Hadoop Quick Start: How to Configure Spring and Verify MapReduce

Posted by howtodown on 2014-11-15 12:45:51

Questions this post answers:
1. How do you verify that a MapReduce job ran successfully under Spring Hadoop?
2. How do you configure Spring Hadoop?

The long-rumored Spring integration with Hadoop has finally arrived: Spring Hadoop.
When you start trying out Spring Hadoop you will run into all sorts of strange problems, and people have already begun reporting them.
If you just want a quick trial without chasing those issues down yourself, the steps below give you a fast taste of what Spring Hadoop can do.

  Requirements: Hadoop 0.20.2 or later

  Once that is installed, let's get started...


Step 1. Download Spring Hadoop. Here we use git; if you are not familiar with git, you can also download an archive from the official site and unpack it.
Reference: Software version control - using Git on Windows (video introduction)

  The example below uses my home directory; remember to change it to your own directory name.
  /home/evanshsu mkdir springhadoop 
  /home/evanshsu cd springhadoop
  /home/evanshsu/springhadoop git init
  /home/evanshsu/springhadoop git pull "git://github.com/SpringSource/spring-hadoop.git"
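
  If you prefer a single command, a plain git clone of the same repository URL should be equivalent to the init/pull pair above:
  /home/evanshsu git clone git://github.com/SpringSource/spring-hadoop.git springhadoop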


Step 2. Build spring-hadoop.jar.
  After the build we put the resulting jar into /home/evanshsu/springhadoop/lib, so that all the jars can later be packaged into a single jar:
  /home/evanshsu/springhadoop ./gradlew jar
  /home/evanshsu/springhadoop mkdir lib
  /home/evanshsu/springhadoop cp build/libs/spring-data-hadoop-1.0.0.BUILD-SNAPSHOT.jar lib/
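
  At this point lib/ should contain exactly one file, the freshly built jar; a quick listing (output shown for illustration) confirms it:
  /home/evanshsu/springhadoop ls lib/
  spring-data-hadoop-1.0.0.BUILD-SNAPSHOT.jar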

   

Step 3. Get spring-framework.

  Because Spring Hadoop depends on spring-framework, we also put the spring-framework jars into lib:
  /home/evanshsu/spring wget "http://s3.amazonaws.com/dist.springframework.org/release/SPR/spring-framework-3.1.1.RELEASE.zip"
  /home/evanshsu/spring unzip spring-framework-3.1.1.RELEASE.zip
  /home/evanshsu/spring cp spring-framework-3.1.1.RELEASE/dist/*.jar /home/evanshsu/springhadoop/lib/



Step 4. Modify the build file so that all of the jars get packaged into a single jar. Note that the jar block below merges every compile dependency into the final jar but excludes each dependency's META-INF/spring.schemas and META-INF/spring.handlers: only one copy of those files could survive the merge, so Step 8 adds merged versions of our own instead.
  /home/evanshsu/spring/samples/wordcount vim build.gradle

    description = 'Spring Hadoop Samples - WordCount'

    apply plugin: 'base'
    apply plugin: 'java'
    apply plugin: 'idea'
    apply plugin: 'eclipse'

    repositories {
        flatDir(dirs: '/home/evanshsu/springhadoop/lib/')
        // Public Spring artefacts
        maven { url "http://repo.springsource.org/libs-release" }
        maven { url "http://repo.springsource.org/libs-milestone" }
        maven { url "http://repo.springsource.org/libs-snapshot" }
    }

    dependencies {
        compile fileTree('/home/evanshsu/springhadoop/lib/')
        compile "org.apache.hadoop:hadoop-examples:$hadoopVersion"
        // see HADOOP-7461
        runtime "org.codehaus.jackson:jackson-mapper-asl:$jacksonVersion"

        testCompile "junit:junit:$junitVersion"
        testCompile "org.springframework:spring-test:$springVersion"
    }

    jar {
        from configurations.compile.collect { it.isDirectory() ? it : zipTree(it).matching{
            exclude 'META-INF/spring.schemas'
            exclude 'META-INF/spring.handlers'
            } }
    }


Step 5. There is a dedicated hadoop.properties file that holds the Hadoop-related settings.
  Basically, change wordcount.input.path and wordcount.output.path to the paths the wordcount run should use, and remember to put a few text files under wordcount.input.path (see the upload sketch after the listing).
  Also, change hd.fs to match your own HDFS setup.
  If you are using the NCHC Hadoop cluster, set hd.fs=hdfs://gm2.nchc.org.tw:8020.
  /home/evanshsu/spring/samples/wordcount vim src/main/resources/hadoop.properties

    wordcount.input.path=/user/evanshsu/input.txt
    wordcount.output.path=/user/evanshsu/output

    hive.host=localhost
    hive.port=12345
    hive.url=jdbc:hive://${hive.host}:${hive.port}
    hd.fs=hdfs://localhost:9000
    mapred.job.tracker=localhost:9001

    path.cat=bin${file.separator}stream-bin${file.separator}cat
    path.wc=bin${file.separator}stream-bin${file.separator}wc

    input.directory=logs
    log.input=/logs/input/
    log.output=/logs/output/

    distcp.src=${hd.fs}/distcp/source.txt
    distcp.dst=${hd.fs}/distcp/dst
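
  To create that input, assuming a local text file named input.txt (the file name is just an illustration), the standard HDFS shell commands work:
  /home/evanshsu hadoop fs -put input.txt /user/evanshsu/input.txt
  /home/evanshsu hadoop fs -ls /user/evanshsu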


Step 6. This is the most important configuration file; anyone who has used Spring knows that this file is the soul of a Spring application.
  /home/evanshsu/spring/samples/wordcount vim src/main/resources/META-INF/spring/context.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:context="http://www.springframework.org/schema/context"
        xmlns:hdp="http://www.springframework.org/schema/hadoop"
        xmlns:p="http://www.springframework.org/schema/p"
        xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
        http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

        <context:property-placeholder location="hadoop.properties"/>

        <hdp:configuration>
            fs.default.name=${hd.fs}
        </hdp:configuration>

        <hdp:job id="wordcount-job" validate-paths="false"
            input-path="${wordcount.input.path}" output-path="${wordcount.output.path}"
            mapper="org.springframework.data.hadoop.samples.wordcount.WordCountMapper"
            reducer="org.springframework.data.hadoop.samples.wordcount.WordCountReducer"
            jar-by-class="org.springframework.data.hadoop.samples.wordcount.WordCountMapper" />

        <!-- simple job runner -->
        <bean id="runner" class="org.springframework.data.hadoop.mapreduce.JobRunner"  p:jobs-ref="wordcount-job"/>
      
    </beans>
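
  The jar is launched in Step 9 through org.springframework.data.hadoop.samples.wordcount.Main, which the sample already ships. A minimal sketch of such a driver (the body below is an assumption, not necessarily the sample's exact source) just bootstraps the context above, which lets the runner bean submit the job:

    package org.springframework.data.hadoop.samples.wordcount;

    import org.springframework.context.support.ClassPathXmlApplicationContext;

    public class Main {
        public static void main(String[] args) {
            // Loading the context is enough: the JobRunner bean declared
            // in context.xml submits wordcount-job when the context starts.
            new ClassPathXmlApplicationContext("/META-INF/spring/context.xml");
        }
    }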


Step 7. Add your own mapper and reducer.
  /home/evanshsu/spring/samples/wordcount vim src/main/java/org/springframework/data/hadoop/samples/wordcount/WordCountMapper.java

    package org.springframework.data.hadoop.samples.wordcount;
    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }


  /home/evanshsu/spring/samples/wordcount vim src/main/java/org/springframework/data/hadoop/samples/wordcount/WordCountReducer.java

    package org.springframework.data.hadoop.samples.wordcount;
    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCountReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }


Step 8. Add spring.schemas and spring.handlers. These files map the XML namespace URIs used in context.xml to XSD locations and NamespaceHandler classes; since the jar task from Step 4 excludes the copies bundled inside the dependency jars, the versions added here are the ones that end up in the final jar.
  /home/evanshsu/spring/samples/wordcount vim src/main/resources/META-INF/spring.schemas
    http\://www.springframework.org/schema/context/spring-context.xsd=org/springframework/context/config/spring-context-3.1.xsd
    http\://www.springframework.org/schema/hadoop/spring-hadoop.xsd=/org/springframework/data/hadoop/config/spring-hadoop-1.0.xsd

  /home/evanshsu/spring/samples/wordcount vim src/main/resources/META-INF/spring.handlers
    http\://www.springframework.org/schema/p=org.springframework.beans.factory.xml.SimplePropertyNamespaceHandler
    http\://www.springframework.org/schema/context=org.springframework.context.config.ContextNamespaceHandler
    http\://www.springframework.org/schema/hadoop=org.springframework.data.hadoop.config.HadoopNamespaceHandler


Step 9. At last, the final step: package all the jars into a single jar and submit it to Hadoop.
/home/evanshsu/spring/samples/wordcount ../../gradlew jar
/home/evanshsu/spring/samples/wordcount hadoop jar build/libs/wordcount-1.0.0.M1.jar org.springframework.data.hadoop.samples.wordcount.Main
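
If the run fails with XML namespace or schema errors, it is worth checking that the files from Step 8 actually made it into the jar; a standard unzip can print them (the jar name is taken from the command above):
/home/evanshsu/spring/samples/wordcount unzip -p build/libs/wordcount-1.0.0.M1.jar META-INF/spring.schemas
/home/evanshsu/spring/samples/wordcount unzip -p build/libs/wordcount-1.0.0.M1.jar META-INF/spring.handlers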


Step 10. Finally, check whether the job actually produced results.
/home/evanshsu/spring/samples/wordcount hadoop fs -cat /user/evanshsu/output/*
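
The output is one tab-separated "word<TAB>count" line per distinct word. For example, if input.txt contained the single line "hello world hello" (a made-up illustration), you would see something like:

    hello   2
    world   1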







Replies (1)

wjhdtx replied on 2014-11-19 18:13:40:
May I ask: can this solve my problem?
http://www.aboutyun.com/thread-10129-1-1.html
Thanks