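1. First, this is the content of my text file. It has only three lines.
2. I want to count how many uppercase and how many lowercase characters it contains. In other words, I want my reduce output to be: uppercase-count lowercase-count. For the file above, that would be 9 84.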
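To make the goal concrete, here is a minimal plain-Java sketch of the count I mean, outside of Hadoop. The class name `CaseCountSketch`, the helper `countCase`, and the sample string are made up for illustration; the sample is not my real input file, and only ASCII letters are counted.

```java
public class CaseCountSketch {

    /** Returns {uppercaseCount, lowercaseCount} for the ASCII letters in text. */
    public static int[] countCase(String text) {
        int upper = 0;
        int lower = 0;
        for (char ch : text.toCharArray()) {
            if (ch >= 'A' && ch <= 'Z') {
                upper++;
            } else if (ch >= 'a' && ch <= 'z') {
                lower++;
            }
            // Spaces, digits, punctuation and newlines are ignored.
        }
        return new int[] {upper, lower};
    }

    public static void main(String[] args) {
        // Hypothetical sample text, not the actual three-line file.
        int[] counts = countCase("Hello World\nfoo BAR");
        // One line, two numbers: uppercase count, then lowercase count.
        System.out.println(counts[0] + " " + counts[1]); // prints "5 11"
    }
}
```

The MapReduce job should produce the same kind of single line, just computed over the file in HDFS.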
3. My code is as follows:
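import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BigAndSmall {

    // Mapper: emits (character, 1) for every character of every input line.
    public static class MapBM extends Mapper<LongWritable, Text, Text, IntWritable> {
        public Text keyText = new Text();
        public IntWritable intValue = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            char[] c = line.toCharArray();
            for (int i = 0; i < c.length; i++) {
                // Use the i-th character as the key, not the whole array.
                keyText.set(String.valueOf(c[i]));
                context.write(keyText, intValue);
            }
        }
    }

    // Reducer: reduce() is called once per distinct character,
    // so it writes one output line per key.
    public static class ReduceBM extends Reducer<Text, IntWritable, Text, IntWritable> {
        public Text big = new Text();
        public IntWritable small = new IntWritable(0);

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // These counters are local, so they start from 0 for every key.
            int bigNum = 0;
            int smallNum = 0;
            int count = 0;
            char c = key.toString().charAt(0);
            for (IntWritable value : values) {
                count += value.get();
            }
            // Uppercase letter
            if (c >= 'A' && c <= 'Z') {
                bigNum += count;
            }
            // Lowercase letter
            if (c >= 'a' && c <= 'z') {
                smallNum += count;
            }
            big.set(String.valueOf(bigNum));
            small.set(smallNum);
            context.write(big, small);
        }
    }

    public static void main(String[] args) throws Exception {
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJobName("BigAndSmall");
        job.setJarByClass(BigAndSmall.class);
        job.setMapperClass(MapBM.class);
        job.setReducerClass(ReduceBM.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileSystem fs = FileSystem.get(URI.create("hdfs://slave2:9000"), conf);
        Path inPath = new Path("hdfs://slave2:9000/usr/test_in/");
        Path outPath = new Path("hdfs://slave2:9000/usr/test_out/");
        // Remove a stale output directory so the job does not fail on startup.
        if (fs.exists(outPath)) {
            fs.delete(outPath, true);
        }
        FileInputFormat.addInputPath(job, inPath);
        FileOutputFormat.setOutputPath(job, outPath);
        job.waitForCompletion(true);
    }
}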
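4. The result is as follows:
As you can see, the left column sums to 9 and the right column sums to 84. I want to output 9 84 in a single line instead of the per-key lines shown in the figure. How can I do that? I know I could write a second MR job to aggregate the output above, but that is rather cumbersome. Is there a coding trick for this?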
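One possible approach, sketched under assumptions rather than tested on a cluster: keep the two totals as fields of the reducer, only add to them inside `reduce()`, and write the single result line from `cleanup(Context)`, which Hadoop's `Reducer` calls once at the end of the task. This assumes the job runs with a single reduce task (for example via `job.setNumReduceTasks(1)`); with several reducers, each one would emit its own partial line. The standalone class below only models that control flow so it can run without Hadoop. `EmitOnceSketch` and its `reduce`/`cleanup` methods are hypothetical stand-ins, not Hadoop APIs, and the per-character counts are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EmitOnceSketch {
    // Running totals. In a real Reducer these would be instance fields
    // instead of locals inside reduce(), so they survive across keys.
    private int upperTotal = 0;
    private int lowerTotal = 0;

    /** Stand-in for one reduce() call: key is one character, count its summed occurrences. */
    public void reduce(char key, int count) {
        if (key >= 'A' && key <= 'Z') {
            upperTotal += count;
        } else if (key >= 'a' && key <= 'z') {
            lowerTotal += count;
        }
        // Nothing is written here; output is deferred until cleanup().
    }

    /** Stand-in for cleanup(Context): runs once, after the last key. */
    public String cleanup() {
        return upperTotal + " " + lowerTotal;
    }

    public static void main(String[] args) {
        // Hypothetical per-character counts, as the shuffle phase would deliver them.
        Map<Character, Integer> perChar = new LinkedHashMap<>();
        perChar.put('A', 2);
        perChar.put('b', 5);
        perChar.put(' ', 3);
        perChar.put('Z', 1);

        EmitOnceSketch reducer = new EmitOnceSketch();
        for (Map.Entry<Character, Integer> e : perChar.entrySet()) {
            reducer.reduce(e.getKey(), e.getValue());
        }
        System.out.println(reducer.cleanup()); // prints "3 5"
    }
}
```

Applied to `ReduceBM`, this would mean turning `bigNum` and `smallNum` into fields and moving the `context.write(big, small)` call out of `reduce()` into an overridden `cleanup(Context context)`.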