
It seems to be something about an override, but I can't really make sense of the details. Any help would be appreciated.

a3087661 posted on 2015-4-23 07:23:02
zxy@zxy-virtual-machine:/usr/hadoop/hadoop-2.4.0$ hadoop jar WordCount.jar WordCount /input /output
15/04/23 07:12:49 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/04/23 07:12:49 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/04/23 07:12:50 INFO input.FileInputFormat: Total input paths to process : 1
15/04/23 07:12:50 INFO mapreduce.JobSubmitter: number of splits:1
15/04/23 07:12:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local168934583_0001
15/04/23 07:12:51 WARN conf.Configuration: file:/home/zxy/hadoop_tmp/mapred/staging/zxy168934583/.staging/job_local168934583_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
15/04/23 07:12:51 WARN conf.Configuration: file:/home/zxy/hadoop_tmp/mapred/staging/zxy168934583/.staging/job_local168934583_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
15/04/23 07:12:52 WARN conf.Configuration: file:/home/zxy/hadoop_tmp/mapred/local/localRunner/zxy/job_local168934583_0001/job_local168934583_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
15/04/23 07:12:52 WARN conf.Configuration: file:/home/zxy/hadoop_tmp/mapred/local/localRunner/zxy/job_local168934583_0001/job_local168934583_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
15/04/23 07:12:52 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/04/23 07:12:52 INFO mapreduce.Job: Running job: job_local168934583_0001
15/04/23 07:12:52 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/04/23 07:12:52 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
15/04/23 07:12:52 INFO mapred.LocalJobRunner: Waiting for map tasks
15/04/23 07:12:52 INFO mapred.LocalJobRunner: Starting task: attempt_local168934583_0001_m_000000_0
15/04/23 07:12:52 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
15/04/23 07:12:52 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/input/data.txt:0+57
15/04/23 07:12:52 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/04/23 07:12:53 INFO mapreduce.Job: Job job_local168934583_0001 running in uber mode : false
15/04/23 07:12:53 INFO mapreduce.Job:  map 0% reduce 0%
15/04/23 07:12:55 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/04/23 07:12:55 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/04/23 07:12:55 INFO mapred.MapTask: soft limit at 83886080
15/04/23 07:12:55 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/04/23 07:12:55 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/04/23 07:12:55 INFO mapred.MapTask: Starting flush of map output
15/04/23 07:12:55 INFO mapred.MapTask: Spilling map output
15/04/23 07:12:55 INFO mapred.MapTask: bufstart = 0; bufend = 36; bufvoid = 104857600
15/04/23 07:12:55 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214376(104857504); length = 21/6553600
15/04/23 07:12:55 INFO mapred.MapTask: Finished spill 0
15/04/23 07:12:55 INFO mapred.LocalJobRunner: map task executor complete.
15/04/23 07:12:55 WARN mapred.LocalJobRunner: job_local168934583_0001
java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 3
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
        at WordCount$TokenizerMapper.map(WordCount.java:35)
        at WordCount$TokenizerMapper.map(WordCount.java:1)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
15/04/23 07:12:55 INFO mapreduce.Job: Job job_local168934583_0001 failed with state FAILED due to: NA
15/04/23 07:12:55 INFO mapreduce.Job: Counters: 0
zxy@zxy-virtual-machine:/usr/hadoop/hadoop-2.4.0$ hadoop fs -ls /output


a3087661 posted on 2015-4-23 07:24:56
Here are the code and the problem I'm trying to solve.
Problem 7: the sampled log format is

a,b,c,d
b,b,f,e
a,a,c,f

Write a MapReduce job in the language you are most familiar with that counts the number of occurrences of each element in the fourth column.
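
For the three sample lines above, the fourth-column values are d, e, and f, so the expected output for this sample is a count of 1 for each of them (one key per output line, followed by its count):

d 1
e 1
f 1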


Code:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  /*
   * Inner class TokenizerMapper, implemented by extending Mapper
   */
  public static class TokenizerMapper
       extends Mapper<LongWritable, Text, Text, IntWritable>{
   
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text() ;
    /*
     * Override the map method (non-Javadoc)
     * @see org.apache.hadoop.mapreduce.Mapper#map(KEYIN, VALUEIN, org.apache.hadoop.mapreduce.Mapper.Context)
     */
    public void map(LongWritable key , Text value, Context context
                    ) throws IOException, InterruptedException {
        String []str=value.toString().split(",");
        word.set(str[3].toString());
        context.write(word, one); // emit the intermediate <key,value> pair
      
    }
  }
  
  /*
   * Inner class IntSumReducer, implemented by extending Reducer
   */
  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    /*
     * Override the reduce method (non-Javadoc)
     * @see org.apache.hadoop.mapreduce.Reducer#reduce(KEYIN, java.lang.Iterable, org.apache.hadoop.mapreduce.Reducer.Context)
     */
    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get(); // accumulate the count
      }
      result.set(sum);
      context.write(key, result);  // write out the result
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // load the default configuration
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count"); // define the job
    job.setJarByClass(WordCount.class); // set the jar by the driver class
    job.setMapperClass(TokenizerMapper.class); // set the Mapper implementation
    job.setCombinerClass(IntSumReducer.class); // set the Combiner implementation
    job.setReducerClass(IntSumReducer.class); // set the Reducer implementation
    job.setOutputKeyClass(Text.class); // set the output key class (Text is the default)
    job.setOutputValueClass(IntWritable.class); // set the output value class
    FileInputFormat.addInputPath(job, new Path(otherArgs[0])); // set the job input directory
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1])); // set the job output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
   
  }
}

evababy posted on 2015-4-23 09:31:06

Based on your description there shouldn't be an error from just splitting on commas. Could another file under /input also be getting read?
Is hdfs://localhost:9000/input/data.txt correct?

a3087661 posted on 2015-4-23 09:49:32
Quoting evababy (2015-4-23 09:31):
Based on your description there shouldn't be an error from just splitting on commas. Could another file under /input also be getting read?
hdfs://localhost:9000/inpu ...

This is everything under the input folder:


zxy@zxy-virtual-machine:/usr/hadoop/hadoop-2.4.0$ hadoop fs -ls /input
Found 1 items
-rw-r--r--   1 zxy supergroup         57 2015-04-22 20:40 /input/data.txt
zxy@zxy-virtual-machine:/usr/hadoop/hadoop-2.4.0$ hadoop fs -cat /input/data.txt
a,b,c,d
e,f,g,h
i,j,k,l
m,n,o,p
q,r,s,t
u,v,w,x
y,z
zxy@zxy-virtual-machine:/usr/hadoop/hadoop-2.4.0$





arsenduan posted on 2015-4-23 10:24:33
Quoting a3087661 (2015-4-23 09:49):
This is everything under the input folder

The array index goes out of bounds: some lines don't have a fourth element (the last line, y,z, has only two fields), so you also need to check the array length before accessing str[3].

public void map(LongWritable key , Text value, Context context
                    ) throws IOException, InterruptedException {
        String []str=value.toString().split(",");
        word.set(str[3].toString());
        context.write(word, one); // emit the intermediate <key,value> pair
      
    }
Change it to the following:

public void map(LongWritable key , Text value, Context context
                    ) throws IOException, InterruptedException {
        String []str=value.toString().split(",");
        if (str.length == 4) {
            word.set(str[3].toString());
            context.write(word, one); // emit the intermediate <key,value> pair
        }

    }
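
For reference, here is a minimal standalone sketch (plain Java, no Hadoop required; the class name SplitDemo is made up for illustration) of why the original mapper throws ArrayIndexOutOfBoundsException: 3 on the last line of data.txt:

public class SplitDemo {
    public static void main(String[] args) {
        String[] full = "a,b,c,d".split(",");   // length == 4, full[3] is "d"
        System.out.println(full.length);

        String[] shortLine = "y,z".split(",");  // length == 2
        System.out.println(shortLine.length);
        // shortLine[3] would throw ArrayIndexOutOfBoundsException: 3,
        // which is exactly the exception in the job log above
    }
}

Guarding the access with str.length == 4 (or str.length >= 4, if longer records should still be counted) makes the mapper skip short lines instead of failing the whole map task.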

a3087661 posted on 2015-4-23 10:40:14
Quoting arsenduan (2015-4-23 10:24):
The array index goes out of bounds: some lines don't have a fourth element, so you also need to check the array length.

public void map(Lo ...

It works after making that change. Thanks a lot.