My map/reduce program won't run successfully. Asking the experts for help!

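I saw a file-content deduplication example in a tutorial and wrote my own version modeled on it. When I run it from Eclipse, the map phase reaches 50% and then stops making progress. The cluster is deployed on VMs running CentOS, with one NameNode and two DataNodes, and the Hadoop version is 1.2.1. My code is below. Could someone help me figure out why it hangs?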

package com.hebut.mr;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.commons.io.output.NullWriter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Dedup {

    public static class Map extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Mapper<Object, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                this.word.set(itr.nextToken());
                context.write(this.word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, NullWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Reducer<Text, IntWritable, Text, NullWritable>.Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            this.result.set(sum);
            context.write(key, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Data Deduplication <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "Data Deduplication");
        job.setJarByClass(Dedup.class);
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Console output

Joker · posted 2014-12-18 13:21:21

Remove this line of code:

job.setCombinerClass(Reduce.class);
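
A likely reason this suggestion helps: a combiner's output is fed to the reducer, so it has to emit the same key/value types the mapper emits. In the posted job, Reduce takes IntWritable values but emits NullWritable, so it cannot sit between Map and Reduce. The sketch below models that contract in plain Java with generics. It uses no Hadoop classes, and every name in it (ReduceFn, run, SUM, KEY_ONLY, dedup) is hypothetical, so treat it as an illustration of the type rule rather than of Hadoop's actual internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Plain-Java model of the map -> combine -> reduce type contract.
// No Hadoop classes are used; every name here is hypothetical.
class CombinerContractSketch {

    // Shape shared by combiners and reducers: (key, values) -> emitted (key, value) pairs.
    interface ReduceFn<K, VIN, VOUT> {
        void reduce(K key, List<VIN> values, BiConsumer<K, VOUT> out);
    }

    // Groups pairs by key, standing in for the sort/shuffle step.
    static <K extends Comparable<K>, V> TreeMap<K, List<V>> group(List<Map.Entry<K, V>> pairs) {
        TreeMap<K, List<V>> grouped = new TreeMap<>();
        for (Map.Entry<K, V> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Applies fn to every key group and collects what it emits.
    static <K extends Comparable<K>, VIN, VOUT> List<Map.Entry<K, VOUT>> apply(
            List<Map.Entry<K, VIN>> pairs, ReduceFn<K, VIN, VOUT> fn) {
        List<Map.Entry<K, VOUT>> out = new ArrayList<>();
        group(pairs).forEach((key, values) ->
                fn.reduce(key, values, (k, v) -> out.add(Map.entry(k, v))));
        return out;
    }

    // The combiner is ReduceFn<K, V, V>: its output goes to the reducer,
    // so it must emit the same value type the mapper emits.
    static <K extends Comparable<K>, V, R> List<Map.Entry<K, R>> run(
            List<Map.Entry<K, V>> mapOutput,
            ReduceFn<K, V, V> combiner,
            ReduceFn<K, V, R> reducer) {
        return apply(apply(mapOutput, combiner), reducer);
    }

    // Sums the counts for a key: Integer in, Integer out, so it is a valid combiner.
    static final ReduceFn<String, Integer, Integer> SUM = (key, values, out) -> {
        int sum = 0;
        for (int v : values) sum += v;
        out.accept(key, sum);
    };

    // Emits each key once with a placeholder value: Integer in, String out.
    // Passing this as the combiner would not compile, mirroring the mismatch in the posted job.
    static final ReduceFn<String, Integer, String> KEY_ONLY =
            (key, values, out) -> out.accept(key, "");

    // Deduplicates tokens: map emits (token, 1), SUM combines, KEY_ONLY reduces.
    static List<String> dedup(List<String> tokens) {
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String t : tokens) mapOutput.add(Map.entry(t, 1));
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> e : run(mapOutput, SUM, KEY_ONLY)) keys.add(e.getKey());
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(dedup(List.of("b", "a", "b", "c", "a"))); // prints [a, b, c]
    }
}
```

Java generics catch the mismatch at compile time here. A Hadoop job is configured with Class objects, so the equivalent mistake only shows up once tasks are running.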

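gefieder · posted 2014-12-18 13:40:49

Be careful when using NullWritable in Hadoop. For details, see "NullWritable should not be used carelessly in Hadoop".

You can try running this example:

"Beginner's guide: how to create a MapReduce program in a development environment"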

EASONLIU · posted 2014-12-18 14:08:24

Oh, I see...

fanbells · posted 2014-12-18 15:00:37

I agree with the post above: it is a NullWritable problem.

jwb590 · posted 2014-12-18 16:02:02

It really was caused by NullWritable. After I switched to new Text("") the problem went away.

xiaojinboaisisi · posted 2014-12-18 16:53:49

NullWritable in place of new Text("")

zhujun182104906 · posted 2014-12-19 09:51:41

The programs I wrote before used NullWritable for their output and had no problems.
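
Both experiences can be true at once if the failure depends on whether the emitted value class matches the class the job declared, rather than on NullWritable itself. The plain-Java sketch below illustrates that idea with a collector that only learns its value class at runtime. It is an assumption-laden model, not Hadoop code: TypedCollector, Nothing, and accepts are hypothetical names, and Nothing merely plays the role NullWritable plays in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a collector that knows the declared value class
// only at runtime, as a job configured with Class objects does.
class TypedCollectorSketch {

    static class TypedCollector {
        private final Class<?> declaredValueClass;
        private final List<Object> written = new ArrayList<>();

        TypedCollector(Class<?> declaredValueClass) {
            this.declaredValueClass = declaredValueClass;
        }

        // Rejects any value whose class differs from the declared one.
        void write(String key, Object value) {
            if (value.getClass() != declaredValueClass) {
                throw new IllegalStateException("wrong value class: " + value.getClass().getName()
                        + " is not " + declaredValueClass.getName());
            }
            written.add(value);
        }
    }

    // Placeholder "no value" marker, playing the role NullWritable plays in the thread.
    static final class Nothing {
        static final Nothing INSTANCE = new Nothing();
        private Nothing() {}
    }

    // Returns true if the collector accepts the value, false if it rejects it.
    static boolean accepts(Class<?> declared, Object value) {
        try {
            new TypedCollector(declared).write("key", value);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(Nothing.class, Nothing.INSTANCE)); // true: declared and emitted agree
        System.out.println(accepts(Integer.class, Nothing.INSTANCE)); // false: declared Integer, emitted Nothing
        System.out.println(accepts(Integer.class, 1));                // true
    }
}
```

Under this model, a job that declares the placeholder type wherever it emits it runs fine, while a job that declares one value class and emits another fails, whichever types are involved.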

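jwb590 · posted 2014-12-19 10:30:55

zhujun182104906 posted on 2014-12-19 09:51
The programs I wrote before used NullWritable for their output and had no problems.

Could you paste the code so we can take a look?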

songyl525 · posted 2014-12-28 20:50:42

It is best not to use NullWritable as a value, because null does not take part in the computation.
