nettman posted on 2014-7-2 12:01:58

Programming with the new Hadoop 2.4 API: how Hadoop Tool and ToolRunner work


Guiding questions:
1. Is Tool an interface or a class?
2. What does Tool extend?
3. What is the relationship between Tool and ToolRunner?
4. What are Tool and ToolRunner each used for?



Hadoop has an old API and a new API. Since the latest Hadoop release at the time of writing is 2.4, this post analyzes the API as shipped in Hadoop 2.4.

First we need to be able to browse the source code. For that, see: How to view and read the Hadoop 2.4 source code in Eclipse

We start with the Tool interface:
(Tool.java)


@InterfaceAudience.Public
@InterfaceStability.Stable
public interface Tool extends Configurable {
/**
   * Execute the command with the given arguments.
   *
   * @param args command specific arguments.
   * @return exit code.
   * @throws Exception
   */
int run(String [] args) throws Exception;
}

The Tool interface extends the Configurable interface and declares a single method, run(). (This is an interface extending another interface.)

It inherits from the Configurable interface:



public interface Configurable {

/** Set the configuration to be used by this object. */
void setConf(Configuration conf);

/** Return the configuration used by this object. */
Configuration getConf();
}

The Configurable interface defines only two methods: setConf and getConf.

The Configured class implements the Configurable interface:




@InterfaceAudience.Public
@InterfaceStability.Stable
public class Configured implements Configurable {

private Configuration conf;

/** Construct a Configured. */
public Configured() {
    this(null);
}

/** Construct a Configured. */
public Configured(Configuration conf) {
    setConf(conf);
}

// inherit javadoc
@Override
public void setConf(Configuration conf) {
    this.conf = conf;
}

// inherit javadoc
@Override
public Configuration getConf() {
    return conf;
}

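}

The inheritance relationship is as follows: Tool extends Configurable, and Configured implements Configurable.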
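To make the hierarchy concrete, here is a minimal self-contained sketch. It does not use the Hadoop jars: Configuration, Configurable, Tool and Configured below are simplified stand-ins that only copy the method shapes quoted above, and EchoTool and the "greeting" key are hypothetical names made up for the illustration. The point is that a class which extends Configured and implements Tool inherits setConf/getConf and only has to write run().

```java
import java.util.HashMap;
import java.util.Map;

public class ToolHierarchySketch {

    /** Simplified stand-in for org.apache.hadoop.conf.Configuration (assumption: a plain string map). */
    public static class Configuration {
        private final Map<String, String> props = new HashMap<>();
        public void set(String key, String value) { props.put(key, value); }
        public String get(String key, String defaultValue) { return props.getOrDefault(key, defaultValue); }
    }

    /** Same two methods as the Configurable interface quoted above. */
    public interface Configurable {
        void setConf(Configuration conf);
        Configuration getConf();
    }

    /** Same single method as the Tool interface quoted above. */
    public interface Tool extends Configurable {
        int run(String[] args) throws Exception;
    }

    /** Stores the configuration, like the Configured class quoted above. */
    public static class Configured implements Configurable {
        private Configuration conf;
        public Configured() { this(null); }
        public Configured(Configuration conf) { setConf(conf); }
        @Override public void setConf(Configuration conf) { this.conf = conf; }
        @Override public Configuration getConf() { return conf; }
    }

    /** Hypothetical tool: inherits setConf/getConf and implements only run(). */
    public static class EchoTool extends Configured implements Tool {
        @Override
        public int run(String[] args) {
            String greeting = getConf().get("greeting", "hello");
            System.out.println(greeting + " " + String.join(" ", args));
            return 0; // exit code
        }
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("greeting", "hi");
        EchoTool tool = new EchoTool();
        tool.setConf(conf); // method inherited from Configured
        int exitCode = tool.run(new String[] {"hadoop"});
        System.out.println("exit code: " + exitCode);
    }
}
```

Note that the no-argument Configured constructor leaves the configuration as null, so in this sketch run() must not be called before setConf().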

Now let's look at part of the ToolRunner class:


The following two methods are overloads of each other:
public static int run(Configuration conf, Tool tool, String[] args)
    throws Exception{
    if(conf == null) {
      conf = new Configuration();
    }
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    //set the configuration back, so that Tool can configure itself
    tool.setConf(conf);
   
    //get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
}



/**
   * Runs the <code>Tool</code> with its <code>Configuration</code>.
   *
   * Equivalent to <code>run(tool.getConf(), tool, args)</code>.
   *
   * @param tool <code>Tool</code> to run.
   * @param args command-line arguments to the tool.
   * @return exit code of the {@link Tool#run(String[])} method.
   */
public static int run(Tool tool, String[] args)
    throws Exception{
    return run(tool.getConf(), tool, args);
}


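From the two static run() methods of ToolRunner above, we can see that ToolRunner first processes Hadoop's generic command-line options, then passes the remaining args to the tool, and the tool runs its own run method.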
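The flow can be traced with a small self-contained model. This is only a sketch under stated assumptions: Conf, Tool and parseGenericOptions are stand-ins written for this post, not Hadoop classes, and the parser here understands only the "-D key=value" form, whereas the real GenericOptionsParser handles more options. RecordingTool is a hypothetical tool that just remembers what it was given.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToolRunnerFlowSketch {

    /** Stand-in for Hadoop's Configuration: just a string map. */
    public static class Conf {
        public final Map<String, String> props = new HashMap<>();
    }

    /**
     * Stand-in for Tool with the Configurable methods folded in.
     * The real Tool.run declares "throws Exception"; omitted to keep the sketch short.
     */
    public interface Tool {
        void setConf(Conf conf);
        Conf getConf();
        int run(String[] args);
    }

    /** Simplified stand-in for GenericOptionsParser: handles only "-D key=value". */
    public static String[] parseGenericOptions(Conf conf, String[] args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-D") && i + 1 < args.length && args[i + 1].contains("=")) {
                String[] kv = args[++i].split("=", 2);
                conf.props.put(kv[0], kv[1]); // generic option goes into the conf
            } else {
                remaining.add(args[i]);       // everything else is left for the tool
            }
        }
        return remaining.toArray(new String[0]);
    }

    /** Mirrors the steps of ToolRunner.run(conf, tool, args) quoted above. */
    public static int run(Conf conf, Tool tool, String[] args) {
        if (conf == null) {
            conf = new Conf();                               // build one if none was given
        }
        String[] toolArgs = parseGenericOptions(conf, args); // 1. consume generic options
        tool.setConf(conf);                                  // 2. hand the updated conf to the tool
        return tool.run(toolArgs);                           // 3. run the tool on what is left
    }

    /** Mirrors ToolRunner.run(tool, args). */
    public static int run(Tool tool, String[] args) {
        return run(tool.getConf(), tool, args);
    }

    /** Hypothetical tool that records the arguments it receives. */
    public static class RecordingTool implements Tool {
        private Conf conf;
        public String[] seenArgs;
        @Override public void setConf(Conf conf) { this.conf = conf; }
        @Override public Conf getConf() { return conf; }
        @Override public int run(String[] args) {
            seenArgs = args;
            return 0;
        }
    }

    public static void main(String[] args) {
        RecordingTool tool = new RecordingTool(); // its conf starts out null
        int code = run(tool, new String[] {"-D", "mapreduce.job.reduces=2", "in", "out"});
        System.out.println(code + " " + String.join(",", tool.seenArgs)
                + " reduces=" + tool.getConf().props.get("mapreduce.job.reduces"));
        // prints: 0 in,out reduces=2
    }
}
```

The model also shows why the two-argument overload works for a tool whose configuration is still null: the three-argument version creates a fresh configuration and sets it on the tool before run() is called.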

To emphasize once more, the following method is the one that does the real work:
/**
   * Runs the given <code>Tool</code> by {@link Tool#run(String[])}, after
   * parsing with the given generic arguments. Uses the given
   * <code>Configuration</code>, or builds one if null.
   *
   * Sets the <code>Tool</code>'s configuration with the possibly modified
   * version of the <code>conf</code>.
   *
   * @param conf <code>Configuration</code> for the <code>Tool</code>.
   * @param tool <code>Tool</code> to run.
   * @param args command-line arguments to the tool.
   * @return exit code of the {@link Tool#run(String[])} method.
   */
public static int run(Configuration conf, Tool tool, String[] args)
    throws Exception{
    if(conf == null) {
      conf = new Configuration();
    }
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    //set the configuration back, so that Tool can configure itself
    tool.setConf(conf);
   
    //get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
}

Tool is an interface and ToolRunner is a class. The run method inside the ToolRunner class looks like this:
public static int run(Configuration conf, Tool tool, String[] args)
    throws Exception{
.............................
    return tool.run(toolArgs);
}
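This method ties the two together: ToolRunner's run method ultimately calls the tool's run method. That run method is the one overridden by the BookCount class, which implements Tool (and extends Configured):
@Override
public int run(String[] args) throws Exception
The job settings that used to live in the driver's main() function are moved into this run method. main() then invokes the overridden method through ToolRunner.run(conf, new BookCount(), args); see the code below.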





We override the run method:

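@Override
public int run(String[] args) throws Exception {
    try {
        // job setup and submission go here
    } catch (Exception e) {
        // log the message together with the stack trace
        logger.error(e.getMessage(), e);
        return 1; // non-zero exit code, so the caller can see the failure
    }
    return 0;
}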
The driver main function:

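import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;

public class BookCount extends Configured implements Tool {

    public static final Logger logger = Logger.getLogger(BookCount.class);

    public static void main(String[] args) throws Exception {
        // Load the log4j settings before anything is logged
        PropertyConfigurator.configure("conf/log4j.properties");
        logger.info("BookCountNew starting");
        // Hadoop user name to act as
        System.setProperty("HADOOP_USER_NAME", "hduser");
        Configuration conf = new Configuration();
        // ToolRunner parses the generic options, then calls BookCount.run()
        int res = ToolRunner.run(conf, new BookCount(), args);
        logger.info("BookCountNew end");
        System.exit(res);
    }

    // run(String[] args) is overridden as shown above

}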
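Because main() passes the value returned by run() straight to System.exit, that return value is what the shell sees as the process exit status. A common convention is 0 for success and non-zero for failure; in a real job this is often written as returning 0 or 1 depending on the boolean from Job.waitForCompletion(true). The following is a minimal sketch of that mapping without any Hadoop dependency; ExitCodeSketch and toExitCode are hypothetical names used only for this illustration.

```java
import java.util.concurrent.Callable;

public class ExitCodeSketch {

    public static final int SUCCESS = 0;
    public static final int FAILURE = 1;

    /**
     * Runs the job body and converts the outcome into an exit code:
     * true becomes 0; false or any exception becomes 1.
     */
    public static int toExitCode(Callable<Boolean> jobBody) {
        try {
            return jobBody.call() ? SUCCESS : FAILURE;
        } catch (Exception e) {
            System.err.println("job failed: " + e);
            return FAILURE;
        }
    }

    public static void main(String[] args) {
        int ok = toExitCode(() -> true);
        int failed = toExitCode(() -> { throw new IllegalStateException("boom"); });
        System.out.println(ok + " " + failed); // prints: 0 1
        // A real driver would finish with System.exit(code).
    }
}
```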







271592448 posted on 2014-7-3 10:42:13

Really good. The original poster did a great job.