This post was last edited by pig2 on 2014-5-11 22:39
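Guiding questions:
1. How does writing a Hadoop program that takes command-line options differ from writing one that does not?
2. What is the purpose of implements Tool?
3. How do the main functions of the two differ?
When running a Hadoop program, we sometimes need to switch between run modes. Hadoop makes this easy by letting you specify the configuration at run time: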
package com.sun.hadoop;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @author sunjun
 * @create 2010-7-13 21:39:07
 */
public class Test extends Configured implements Tool { // the change is here: Tool provides this support

    public static void main(String[] args) throws Exception {
        // ToolRunner handles the generic options (-conf, -D, -fs) and then
        // calls run() with the remaining, application-specific arguments.
        int exitCode = ToolRunner.run(new Test(), args);
        System.exit(exitCode);
    }

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2)
            throw new IllegalArgumentException("Usage: Test <input path> <output path>");
        // getConf() returns the Configuration that ToolRunner populated.
        Job job = new Job(getConf(), "test job");
        job.setJarByClass(Test.class);
        // MyMapper and MyReducer are the job's own classes; they are not shown in this post.
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Report failure to the shell instead of always returning 0.
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
Configuration file:
hadoop-local.xml (local mode):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="http://sunjun041640.blog.163.com/blog/configuration.xsl"?>

<configuration>

  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>

  <property>
    <name>mapred.child.tmp</name>
    <value>/tmp</value>
  </property>

</configuration>
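A configuration file like hadoop-local.xml is essentially a list of name/value properties. As a rough illustration, the sketch below reads such a file into a map using only the JDK's XML parser. It is not Hadoop's Configuration class and does not reproduce its behavior (defaults, variable expansion, final properties); ConfigFileSketch and readProperties are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustration only: a hypothetical helper, NOT part of Hadoop.
class ConfigFileSketch {

    /** Collects every <property><name>/<value> pair, in document order. */
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = property.getElementsByTagName("name")
                        .item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value")
                        .item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable configuration file", e);
        }
    }

    public static void main(String[] args) {
        // Two of the properties from hadoop-local.xml above.
        String xml = "<?xml version=\"1.0\"?>"
                + "<configuration>"
                + "<property><name>fs.default.name</name><value>file:///</value></property>"
                + "<property><name>mapred.job.tracker</name><value>local</value></property>"
                + "</configuration>";
        System.out.println(readProperties(xml));
    }
}
```

Switching modes then amounts to pointing the program at a different file with different values for the same property names.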
Run:
$ bin/hadoop com.sun.hadoop.Test -conf hadoop-local.xml test-in test-out
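The Tool interface
The Tool interface supports handling of the generic Hadoop command-line options.
Tool is the standard for any Map/Reduce tool or application. The application should handle only its own custom arguments and delegate the standard command-line options to GenericOptionsParser via ToolRunner.run(Tool, String[]).
The commonly used Hadoop command-line options are: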
-conf <configuration file>
-D <property=value>
-fs <local|namenode:port>
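The division of labor described above, where generic options are consumed first and the application sees only its own arguments, can be pictured with a simplified sketch. This is not Hadoop's GenericOptionsParser and covers only the three options listed; GenericOptionsSketch and its fields are hypothetical names, and storing the -fs value under fs.default.name is an assumption made for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: a hypothetical stand-in, NOT Hadoop's GenericOptionsParser.
class GenericOptionsSketch {
    /** Files named with -conf, in command-line order. */
    final List<String> confFiles = new ArrayList<>();
    /** Properties set with -D or -fs. */
    final Map<String, String> properties = new LinkedHashMap<>();
    /** Everything else: the arguments the application itself must handle. */
    final List<String> remainingArgs = new ArrayList<>();

    GenericOptionsSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            boolean hasValue = i + 1 < args.length;
            if ("-conf".equals(arg) && hasValue) {
                confFiles.add(args[++i]);
            } else if ("-D".equals(arg) && hasValue) {
                // Split "property=value" at the first '=' only.
                String[] kv = args[++i].split("=", 2);
                properties.put(kv[0], kv.length == 2 ? kv[1] : "");
            } else if ("-fs".equals(arg) && hasValue) {
                // Assumption for this sketch: -fs sets the default file system property.
                properties.put("fs.default.name", args[++i]);
            } else {
                remainingArgs.add(arg);
            }
        }
    }

    public static void main(String[] args) {
        GenericOptionsSketch parsed = new GenericOptionsSketch(new String[] {
            "-conf", "hadoop-local.xml", "-D", "dfs.replication=1", "test-in", "test-out"
        });
        System.out.println("conf files: " + parsed.confFiles);
        System.out.println("properties: " + parsed.properties);
        System.out.println("left for run(): " + parsed.remainingArgs);
    }
}
```

In this sketch the two path arguments are what remain for the application, which matches the run() method above checking for exactly two arguments.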