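Hadoop 2.4 new API programming: how Hadoop Tool and ToolRunner work
Guiding questions:
1. Is Tool an interface or a class?
2. Which interface does Tool extend?
3. What is the relationship between Tool and ToolRunner?
4. What does each of Tool and ToolRunner do?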
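Hadoop has an old API and a new API. At the time of writing the latest Hadoop release is 2.4, so this article analyzes the API as published with Hadoop 2.4.
First you need to be able to browse the source code. For that, see: "How to view and read the Hadoop 2.4 source code in Eclipse".
We start with the Tool interface:
(Tool.java)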
@InterfaceAudience.Public
@InterfaceStability.Stable
public interface Tool extends Configurable {
  /**
   * Execute the command with the given arguments.
   *
   * @param args command specific arguments.
   * @return exit code.
   * @throws Exception
   */
  int run(String [] args) throws Exception;
}
The Tool interface extends the Configurable interface and declares a single method, run(). (This is an interface extending another interface.)
The Configurable interface that it extends:
public interface Configurable {
  /** Set the configuration to be used by this object. */
  void setConf(Configuration conf);
  /** Return the configuration used by this object. */
  Configuration getConf();
}
The Configurable interface defines only two methods: setConf and getConf.
The Configured class implements the Configurable interface:
@InterfaceAudience.Public
@InterfaceStability.Stable
public class Configured implements Configurable {
  private Configuration conf;
  /** Construct a Configured. */
  public Configured() {
    this(null);
  }
  /** Construct a Configured. */
  public Configured(Configuration conf) {
    setConf(conf);
  }
  // inherit javadoc
  @Override
  public void setConf(Configuration conf) {
    this.conf = conf;
  }
  // inherit javadoc
  @Override
  public Configuration getConf() {
    return conf;
  }
}
The inheritance relationships are as follows: Tool extends Configurable, and Configured implements Configurable.
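To make these relationships concrete, here is a minimal sketch that can be compiled without Hadoop on the classpath. All the types in it (SketchConf, SketchConfigurable, SketchTool, SketchConfigured, GreetTool) are hypothetical stand-ins that only mirror the shape of Configuration, Configurable, Tool and Configured. They are not the Hadoop classes themselves.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for illustration only; these are NOT the Hadoop classes.
class SketchConf {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) {
        values.put(key, value);
    }

    String get(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }
}

// Mirrors the shape of Configurable: a setter and a getter for the configuration.
interface SketchConfigurable {
    void setConf(SketchConf conf);

    SketchConf getConf();
}

// Mirrors the shape of Tool: an interface extending another interface, adding run().
interface SketchTool extends SketchConfigurable {
    int run(String[] args) throws Exception;
}

// Mirrors the shape of Configured: it simply stores the configuration.
class SketchConfigured implements SketchConfigurable {
    private SketchConf conf;

    @Override
    public void setConf(SketchConf conf) {
        this.conf = conf;
    }

    @Override
    public SketchConf getConf() {
        return conf;
    }
}

// A concrete tool inherits setConf/getConf from the base class and only writes run().
class GreetTool extends SketchConfigured implements SketchTool {
    String lastGreeting; // hypothetical field, kept so the result can be inspected

    @Override
    public int run(String[] args) {
        lastGreeting = getConf().get("greeting", "hello") + " " + args[0];
        return 0; // exit code
    }
}
```

This is the same pattern the BookCount class later in this post follows: extending the base class provides the configuration plumbing, so the tool itself only has to supply run().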
Now look at part of the ToolRunner class.
The following two methods are overloads of each other:
public static int run(Configuration conf, Tool tool, String[] args)
  throws Exception{
  if(conf == null) {
    conf = new Configuration();
  }
  GenericOptionsParser parser = new GenericOptionsParser(conf, args);
  //set the configuration back, so that Tool can configure itself
  tool.setConf(conf);
  //get the args w/o generic hadoop args
  String[] toolArgs = parser.getRemainingArgs();
  return tool.run(toolArgs);
}
/**
 * Runs the <code>Tool</code> with its <code>Configuration</code>.
 *
 * Equivalent to <code>run(tool.getConf(), tool, args)</code>.
 *
 * @param tool <code>Tool</code> to run.
 * @param args command-line arguments to the tool.
 * @return exit code of the {@link Tool#run(String[])} method.
 */
public static int run(Tool tool, String[] args)
  throws Exception{
  return run(tool.getConf(), tool, args);
}
As these two static run() methods show, ToolRunner first processes Hadoop's generic command-line options, then hands the remaining args to the tool, and the tool executes its own run() method.
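The flow just described can be imitated in a small sketch that runs without Hadoop. MiniConf, MiniTool, MiniToolRunner and EchoTool are hypothetical stand-ins, and parseGenericOptions is a deliberately reduced substitute for GenericOptionsParser. As an assumption of this sketch it only understands the "-D key=value" form, whereas the real parser recognizes further generic options.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for illustration only; these are NOT the Hadoop classes.
class MiniConf {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) {
        values.put(key, value);
    }

    String get(String key) {
        return values.get(key);
    }
}

// The real Tool.run declares "throws Exception"; omitted here to keep the sketch short.
interface MiniTool {
    void setConf(MiniConf conf);

    MiniConf getConf();

    int run(String[] args);
}

class MiniToolRunner {
    // Same three steps as run(conf, tool, args) in the listing:
    // parse generic options into conf, hand conf to the tool, run with the rest.
    static int run(MiniConf conf, MiniTool tool, String[] args) {
        if (conf == null) {
            conf = new MiniConf();
        }
        String[] toolArgs = parseGenericOptions(conf, args);
        tool.setConf(conf);
        return tool.run(toolArgs);
    }

    // Two-argument overload: delegates using the tool's own configuration.
    static int run(MiniTool tool, String[] args) {
        return run(tool.getConf(), tool, args);
    }

    // Assumption: only the "-D key=value" form is handled in this sketch.
    static String[] parseGenericOptions(MiniConf conf, String[] args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-D".equals(args[i]) && i + 1 < args.length && args[i + 1].contains("=")) {
                String[] kv = args[++i].split("=", 2);
                conf.set(kv[0], kv[1]);
            } else {
                remaining.add(args[i]);
            }
        }
        return remaining.toArray(new String[0]);
    }
}

// Hypothetical tool that records the arguments it was given.
class EchoTool implements MiniTool {
    private MiniConf conf;
    String[] seenArgs;

    @Override
    public void setConf(MiniConf conf) {
        this.conf = conf;
    }

    @Override
    public MiniConf getConf() {
        return conf;
    }

    @Override
    public int run(String[] args) {
        seenArgs = args;
        return 0;
    }
}
```

For example, calling MiniToolRunner.run(new EchoTool(), new String[] {"-D", "job.name=books", "in", "out"}) leaves job.name set to books in the tool's configuration, and the tool's run() receives only in and out. Because the tool's configuration is still null at that point, the runner creates a fresh one, just as the null check in the listing does.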
To stress the point once more: the following method is the one to go by.
/**
 * Runs the given <code>Tool</code> by {@link Tool#run(String[])}, after
 * parsing with the given generic arguments. Uses the given
 * <code>Configuration</code>, or builds one if null.
 *
 * Sets the <code>Tool</code>'s configuration with the possibly modified
 * version of the <code>conf</code>.
 *
 * @param conf <code>Configuration</code> for the <code>Tool</code>.
 * @param tool <code>Tool</code> to run.
 * @param args command-line arguments to the tool.
 * @return exit code of the {@link Tool#run(String[])} method.
 */
public static int run(Configuration conf, Tool tool, String[] args)
  throws Exception{
  if(conf == null) {
    conf = new Configuration();
  }
  GenericOptionsParser parser = new GenericOptionsParser(conf, args);
  //set the configuration back, so that Tool can configure itself
  tool.setConf(conf);
  //get the args w/o generic hadoop args
  String[] toolArgs = parser.getRemainingArgs();
  return tool.run(toolArgs);
}
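Tool is an interface and ToolRunner is a class. The run method in the ToolRunner class is as follows:
public static int run(Configuration conf, Tool tool, String[] args)
  throws Exception{
  .............................
  return tool.run(toolArgs);
}
This method ties the two together: ToolRunner's run method essentially just calls the tool's run method. That run method is the one that the BookCount class, which implements Tool, overrides:
@Override
public int run(String[] args) throws Exception
Inside this run method we place all the job settings that used to live in the driver's main() function. The driver's main() then invokes the overridden method through ToolRunner.run(conf, new BookCount(), args);. The code below can serve as a reference.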
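We override the run method:
@Override
public int run(String[] args) throws Exception {
  try {
    // implementation goes here
  } catch (Exception e) {
    // log the message together with the stack trace
    logger.error(e.getMessage(), e);
    // report failure to the caller through a non-zero exit code
    return 1;
  }
  return 0;
}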
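The driver main function:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;

public class BookCount extends Configured implements Tool {
  public static final Logger logger = Logger.getLogger(BookCount.class);
  public static void main(String[] args) throws Exception {
    PropertyConfigurator.configure("conf/log4j.properties");
    logger.info("BookCountNew starting");
    System.setProperty("HADOOP_USER_NAME", "hduser");
    Configuration conf = new Configuration();
    // ToolRunner parses the generic options, then calls BookCount.run()
    int res = ToolRunner.run(conf, new BookCount(), args);
    logger.info("BookCountNew end");
    System.exit(res);
  }
  // the run(String[] args) override shown earlier belongs here
}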
Really nice. The original poster did a great job.