
A Getting-Started Example for Hadoop 2.4.1: MaxTemperature



Guiding question:
How should the map and reduce functions be written to find the maximum temperature per year?







I. Preparation

1. Set up a pseudo-distributed Hadoop environment; see the official documentation for details.

2. Prepare a data file, sample.txt, with the following content:


  123456798676231190101234567986762311901012345679867623119010123456798676231190101234561+00121534567890356
  123456798676231190101234567986762311901012345679867623119010123456798676231190101234562+01122934567890456
  123456798676231190201234567986762311901012345679867623119010123456798676231190101234562+02120234567893456
  123456798676231190401234567986762311901012345679867623119010123456798676231190101234561+00321234567803456
  123456798676231190101234567986762311902012345679867623119010123456798676231190101234561+00429234567903456
  123456798676231190501234567986762311902012345679867623119010123456798676231190101234561+01021134568903456
  123456798676231190201234567986762311902012345679867623119010123456798676231190101234561+01124234578903456
  123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+04121234678903456
  123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+00821235678903456
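Each record is a fixed-width line (modeled on NCDC-style weather records): the year occupies characters 15-18 (0-based), the sign and temperature (in tenths of a degree) occupy characters 87-91, and the quality code is character 92. A minimal stand-alone sketch of this parsing, using the first sample record and the same offsets the mapper uses:

```java
public class RecordParseDemo {
    public static void main(String[] args) {
        // First record from sample.txt
        String line = "123456798676231190101234567986762311901012345679867623119010123456798676231190101234561+00121534567890356";
        String year = line.substring(15, 19);                // characters 15-18: the year
        int temp = Integer.parseInt(line.substring(88, 92)); // skip the '+' sign at index 87
        String quality = line.substring(92, 93);             // single quality-code character
        System.out.println(year + " " + temp + " " + quality); // prints "1901 12 1"
    }
}
```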



II. Writing the Code

1. Create the Mapper


  package org.jediael.hadoopDemo.maxtemperature;

  import java.io.IOException;

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class MaxTemperatureMapper extends
          Mapper<LongWritable, Text, Text, IntWritable> {

      private static final int MISSING = 9999;

      @Override
      public void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
          String line = value.toString();
          String year = line.substring(15, 19);
          int airTemperature;
          if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
              airTemperature = Integer.parseInt(line.substring(88, 92));
          } else {
              airTemperature = Integer.parseInt(line.substring(87, 92));
          }
          String quality = line.substring(92, 93);
          if (airTemperature != MISSING && quality.matches("[01459]")) {
              context.write(new Text(year), new IntWritable(airTemperature));
          }
      }
  }




2. Create the Reducer

  package org.jediael.hadoopDemo.maxtemperature;

  import java.io.IOException;

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Reducer;

  public class MaxTemperatureReducer extends
          Reducer<Text, IntWritable, Text, IntWritable> {

      @Override
      public void reduce(Text key, Iterable<IntWritable> values, Context context)
              throws IOException, InterruptedException {
          int maxValue = Integer.MIN_VALUE;
          for (IntWritable value : values) {
              maxValue = Math.max(maxValue, value.get());
          }
          context.write(key, new IntWritable(maxValue));
      }
  }
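Because taking a maximum is associative and commutative, this same reducer class could in principle also be registered as a combiner (via job.setCombinerClass(MaxTemperatureReducer.class)) to shrink map output before the shuffle; the job below does not do this, as its counters confirm (Combine input records=0). A quick stand-alone check of the property a max-combiner relies on, that the maximum of partial maxima equals the maximum of the whole group:

```java
import java.util.Arrays;

public class CombinerPropertyDemo {
    public static void main(String[] args) {
        int[] temps = {12, 212, 42, 112}; // hypothetical per-record temperatures
        // Global maximum over all records in one pass
        int globalMax = Arrays.stream(temps).max().getAsInt();
        // Maximum of partial maxima, as if two map tasks each pre-combined half the records
        int partial1 = Math.max(temps[0], temps[1]);
        int partial2 = Math.max(temps[2], temps[3]);
        int combined = Math.max(partial1, partial2);
        System.out.println(globalMax + " == " + combined); // prints "212 == 212"
    }
}
```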



3. Create the main method (job driver)

  package org.jediael.hadoopDemo.maxtemperature;

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class MaxTemperature {

      public static void main(String[] args) throws Exception {
          if (args.length != 2) {
              System.err.println("Usage: MaxTemperature <input path> <output path>");
              System.exit(-1);
          }
          Job job = new Job();
          job.setJarByClass(MaxTemperature.class);
          job.setJobName("Max temperature");
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          job.setMapperClass(MaxTemperatureMapper.class);
          job.setReducerClass(MaxTemperatureReducer.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }



4. Export the classes as MaxTemp.jar and upload it to the server where the job will run.



III. Running the Program

1. Upload sample.txt to HDFS:

  hadoop fs -put sample.txt /


2. Run the program:

  export HADOOP_CLASSPATH=MaxTemp.jar
  hadoop org.jediael.hadoopDemo.maxtemperature.MaxTemperature /sample.txt output10



Note that the output directory must not already exist; otherwise the job will fail to create it.


3. Check the results

(1) View the output

  [jediael@jediael44 code]$ hadoop fs -cat output10/*
  14/07/09 14:51:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  1901    42
  1902    212
  1903    412
  1904    32
  1905    102
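These five output pairs can be sanity-checked without a cluster by replaying the map and reduce logic over the nine sample records in plain Java (same offsets and quality filter as the mapper):

```java
import java.util.Map;
import java.util.TreeMap;

public class LocalMaxTemperature {
    public static void main(String[] args) {
        // The nine records from sample.txt
        String[] lines = {
            "123456798676231190101234567986762311901012345679867623119010123456798676231190101234561+00121534567890356",
            "123456798676231190101234567986762311901012345679867623119010123456798676231190101234562+01122934567890456",
            "123456798676231190201234567986762311901012345679867623119010123456798676231190101234562+02120234567893456",
            "123456798676231190401234567986762311901012345679867623119010123456798676231190101234561+00321234567803456",
            "123456798676231190101234567986762311902012345679867623119010123456798676231190101234561+00429234567903456",
            "123456798676231190501234567986762311902012345679867623119010123456798676231190101234561+01021134568903456",
            "123456798676231190201234567986762311902012345679867623119010123456798676231190101234561+01124234578903456",
            "123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+04121234678903456",
            "123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+00821235678903456",
        };
        Map<String, Integer> maxByYear = new TreeMap<>();
        for (String line : lines) {
            String year = line.substring(15, 19);
            int t = line.charAt(87) == '+' ? Integer.parseInt(line.substring(88, 92))
                                           : Integer.parseInt(line.substring(87, 92));
            // Apply the same missing-value and quality-code filter as the mapper
            if (t != 9999 && line.substring(92, 93).matches("[01459]")) {
                maxByYear.merge(year, t, Math::max);
            }
        }
        maxByYear.forEach((y, t) -> System.out.println(y + "\t" + t));
    }
}
```

The second record is dropped by the quality filter (its quality code is 2), which is why the job counters report 9 map input records but only 8 map output records; the remaining maxima match the HDFS output above.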



(2) Job log output


  [jediael@jediael44 code]$ hadoop org.jediael.hadoopDemo.maxtemperature.MaxTemperature /sample.txt output10
  14/07/09 14:50:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  14/07/09 14:50:41 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
  14/07/09 14:50:42 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
  14/07/09 14:50:43 INFO input.FileInputFormat: Total input paths to process : 1
  14/07/09 14:50:43 INFO mapreduce.JobSubmitter: number of splits:1
  14/07/09 14:50:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1404888618764_0001
  14/07/09 14:50:44 INFO impl.YarnClientImpl: Submitted application application_1404888618764_0001
  14/07/09 14:50:44 INFO mapreduce.Job: The url to track the job: http://jediael44:8088/proxy/application_1404888618764_0001/
  14/07/09 14:50:44 INFO mapreduce.Job: Running job: job_1404888618764_0001
  14/07/09 14:50:57 INFO mapreduce.Job: Job job_1404888618764_0001 running in uber mode : false
  14/07/09 14:50:57 INFO mapreduce.Job:  map 0% reduce 0%
  14/07/09 14:51:05 INFO mapreduce.Job:  map 100% reduce 0%
  14/07/09 14:51:15 INFO mapreduce.Job:  map 100% reduce 100%
  14/07/09 14:51:15 INFO mapreduce.Job: Job job_1404888618764_0001 completed successfully
  14/07/09 14:51:16 INFO mapreduce.Job: Counters: 49
          File System Counters
                  FILE: Number of bytes read=94
                  FILE: Number of bytes written=185387
                  FILE: Number of read operations=0
                  FILE: Number of large read operations=0
                  FILE: Number of write operations=0
                  HDFS: Number of bytes read=1051
                  HDFS: Number of bytes written=43
                  HDFS: Number of read operations=6
                  HDFS: Number of large read operations=0
                  HDFS: Number of write operations=2
          Job Counters
                  Launched map tasks=1
                  Launched reduce tasks=1
                  Data-local map tasks=1
                  Total time spent by all maps in occupied slots (ms)=5812
                  Total time spent by all reduces in occupied slots (ms)=7023
                  Total time spent by all map tasks (ms)=5812
                  Total time spent by all reduce tasks (ms)=7023
                  Total vcore-seconds taken by all map tasks=5812
                  Total vcore-seconds taken by all reduce tasks=7023
                  Total megabyte-seconds taken by all map tasks=5951488
                  Total megabyte-seconds taken by all reduce tasks=7191552
          Map-Reduce Framework
                  Map input records=9
                  Map output records=8
                  Map output bytes=72
                  Map output materialized bytes=94
                  Input split bytes=97
                  Combine input records=0
                  Combine output records=0
                  Reduce input groups=5
                  Reduce shuffle bytes=94
                  Reduce input records=8
                  Reduce output records=5
                  Spilled Records=16
                  Shuffled Maps =1
                  Failed Shuffles=0
                  Merged Map outputs=1
                  GC time elapsed (ms)=154
                  CPU time spent (ms)=1450
                  Physical memory (bytes) snapshot=303112192
                  Virtual memory (bytes) snapshot=1685733376
                  Total committed heap usage (bytes)=136515584
          Shuffle Errors
                  BAD_ID=0
                  CONNECTION=0
                  IO_ERROR=0
                  WRONG_LENGTH=0
                  WRONG_MAP=0
                  WRONG_REDUCE=0
          File Input Format Counters
                  Bytes Read=954
          File Output Format Counters
                  Bytes Written=43






