
Hadoop 2.2.0 Source Code Analysis (1): Running WordCount.java in Eclipse

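hyj, posted 2013-12-30 12:19:47 in the Hands-on Practice (实操演练) section
The hadoop-2.2.0.tar.gz archive does not contain the source code (the new release ships neither an Eclipse plugin nor source code, only compiled .class bytecode). Instead, download hadoop-2.2.0-src.tar.gz, extract it, and take the source from the hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples directory.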
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // value is already one line of the input file (with the default
    // TextInputFormat, key is that line's byte offset and is ignored here)
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      // Split the line on whitespace and emit (word, 1) for every token
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    // Sum all counts collected for one word; also registered as the combiner
    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // GenericOptionsParser consumes generic Hadoop options (e.g. -D key=value)
    // and returns the remaining arguments: the input and output paths
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    // This constructor is deprecated in Hadoop 2.x;
    // Job.getInstance(conf, "word count") is the non-deprecated equivalent
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
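To make the data flow of TokenizerMapper and IntSumReducer concrete, here is a minimal plain-Java sketch that reproduces the same map, shuffle/sort and reduce steps in memory, without Hadoop. It is only an illustration, not how Hadoop actually executes the job: the class name LocalWordCountSketch is hypothetical, and a TreeMap stands in for Hadoop's sorting and merging of map output by key (for ASCII words its ordering matches the byte order Hadoop uses for Text keys).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCountSketch {

    // Map phase: mirrors TokenizerMapper, emitting (word, 1) for every
    // whitespace-separated token in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(Map.entry(itr.nextToken(), 1));
        }
        return out;
    }

    // Shuffle + reduce: group values by key in sorted key order (TreeMap
    // stands in for Hadoop's sort/merge), then sum them like IntSumReducer.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = run(List.of("hello hadoop", "hello world"));
        // Same "word<TAB>count" layout as the job's output file
        counts.forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```

Run without arguments, it prints hadoop 1, hello 2 and world 1, one tab-separated pair per line. The grouping step also shows why WordCount can register IntSumReducer as the combiner: summing partial sums on the map side yields the same totals, because integer addition is associative and commutative.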
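In Eclipse, create a MapReduce Project and add a new Java class, for example MyWordCount, then copy the code of WordCount.java into MyWordCount.java. Because a public class must live in a file of the same name, rename the class to MyWordCount (and change WordCount.class in setJarByClass to match), and make the package declaration match the package the class was created in. Next, choose Run-->Run Configurations…; in the dialog, select Java Application in the left pane, select MyWordCount, and configure the Arguments tab in the right pane.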

In Program arguments, set the input and output directories:

/home/jack/Desktop/in /home/jack/Desktop/out

In VM arguments, set the JVM options:

-Xms512m -Xmx1024m -XX:MaxPermSize=256m
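Program arguments and VM arguments end up in different places: the former are passed to main as args (which GenericOptionsParser then filters), while the latter configure the JVM itself and never appear in args. A quick way to confirm a Run Configuration is wired up as intended is a throwaway class such as the hypothetical RunConfigCheck below, launched with the same settings.

```java
import java.util.Arrays;

public class RunConfigCheck {

    // Upper bound the JVM will use for the heap, in MB; this reflects -Xmx,
    // although the reported value can be slightly lower depending on the GC.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Program arguments: should print the input and output directories
        System.out.println("program arguments: " + Arrays.toString(args));
        // VM arguments: visible only through their effect on the JVM
        System.out.println("max heap (MB): " + maxHeapMb());
    }
}
```

With the settings above it should list the two directory paths and report a maximum heap close to 1024 MB; the exact figure varies by JVM and garbage collector.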

Notes:


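  • The in folder must be created before running the program and must contain the files whose word frequencies you want to count. The out folder must not be created in advance; the job creates it itself. Otherwise the run fails with the error Output directory file:/home/jack/Desktop/out already exists.
  • Here the input and output directories are on the local file system (note the file: prefix in the error message above).
  • To start the program, click Run on the menu bar.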
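Because the job refuses to overwrite an existing output directory, repeated runs from Eclipse need out removed first. The following is a minimal sketch using only java.nio.file; the PrepareDirs class and its prepare method are hypothetical names. It permanently deletes the output directory, so point it only at scratch output. Inside the job itself, the Hadoop FileSystem API offers the same operation through FileSystem.get(conf).delete(path, true).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PrepareDirs {

    // Hypothetical helper: fails fast if the input directory is missing and
    // recursively deletes a leftover output directory so the job can recreate it.
    static void prepare(Path in, Path out) throws IOException {
        if (!Files.isDirectory(in)) {
            throw new IOException("Input directory does not exist: " + in);
        }
        if (Files.exists(out)) {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(out)) {
                // Reverse path order places every child before its parent,
                // so each directory is already empty when it is deleted.
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // e.g. /home/jack/Desktop/in /home/jack/Desktop/out
        prepare(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```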

After the program finishes, the word counts can be found in the part-r-00000 file under /home/jack/Desktop/out.
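By default WordCount writes its result through TextOutputFormat as one word<TAB>count record per line, and with the default single reduce task all records land in part-r-00000. The hypothetical ReadWordCounts class below sketches how to load that file back into a map, for example to check results programmatically; it assumes the tab separator has not been changed through configuration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReadWordCounts {

    // Parses "word<TAB>count" lines, keeping the file's (sorted) order.
    // Words cannot contain tabs, since StringTokenizer splits on them.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("not a word<TAB>count line: " + line);
            }
            counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1)));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path part = Paths.get("/home/jack/Desktop/out", "part-r-00000");
        Map<String, Integer> counts = parse(Files.readAllLines(part, StandardCharsets.UTF_8));
        counts.forEach((w, c) -> System.out.println(w + " -> " + c));
    }
}
```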

