How to read the API docs and program against the new Hadoop API: Hadoop 2.4 new-API vs. old-API example comparison
Questions to guide your reading:
I have long wanted to write about the relationship between Hadoop's old and new APIs; for anyone who loves programming, this is essential background.
1. Of Hadoop's mapred and mapreduce packages, which one is deprecated?
2. How does the old Hadoop API initialize a job?
3. Which method does the new Hadoop API use to create the Job object?
Program description:
The MapReduce program below does nothing more than count the lines in the file booklist.log and output the result.
Two Java source files, each with its own main method, were written: one calling the old package and one calling the new package.
a. After creating the MapReduce project, copy the XML files from the Hadoop configuration directory into the src directory.
b. Create a conf directory alongside the project's src directory and put a log4j.properties file in it (a minimal example is sketched right after this list).
c. Under src, create a bookCount package directory and then add the Java files given later.
d. Right-click and choose "Run as application", or use the Hadoop plugin's "Run on Hadoop" menu item, to launch the MapReduce program.
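For reference, a minimal log4j.properties that simply logs to the console might look like the following; the log level and pattern here are only an assumption, so adapt them to your own setup:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{1} - %m%n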
Generating the input file to analyze
vi booklist.log
Add the following content:
bookname
bookname
bookname
bookname
bookname
bookname
bookname
bookname
bookname
bookname
bookname
bookname
Save and exit.
Before running the job, copy the file to the /user/hduser user directory on HDFS with the hdfs copyFromLocal command, for example as shown below.
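A typical invocation (assuming booklist.log is in your current local directory and /user/hduser already exists on HDFS) is:
hdfs dfs -copyFromLocal booklist.log /user/hduser/booklist.log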
Old API: code using the mapred package
File BookCount.java:
package bookCount;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;

public class BookCount {

    public static Logger logger = Logger.getLogger(BookCount.class);

    public static void main(String[] args) throws IOException {
        PropertyConfigurator.configure("conf/log4j.properties");
        logger = Logger.getLogger(BookCount.class);
        logger.info("AnaSpeedMr starting");
        System.setProperty("HADOOP_USER_NAME", "hduser");

        // Old API: the job is configured through JobConf and submitted via JobClient
        JobConf conf = new JobConf(BookCount.class);
        conf.setJobName("bookCount_sample_job");
        FileInputFormat.setInputPaths(conf, new Path("booklist.log"));
        FileOutputFormat.setOutputPath(conf, new Path("booklistResultDir"));
        conf.setMapperClass(BookCountMapper.class);
        conf.setReducerClass(BookCountReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        JobClient.runJob(conf);
    }

    static class BookCountMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            // Emit a constant key with the count 1 for every input line
            output.collect(new Text("booknum"), new IntWritable(1));
            logger.info("foxson_mapper_ok");
            System.out.println("foxsonMapper");
        }
    }

    static class BookCountReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, LongWritable> {
        @Override
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
            // Count how many values arrived for the key, i.e. the number of input lines
            long sumBookNum = 0;
            while (values.hasNext()) {
                sumBookNum = sumBookNum + 1;
                values.next();
            }
            logger.info("foxson_BookCountReducer_ok");
            output.collect(key, new LongWritable(sumBookNum));
            System.out.println("foxsonReduce");
        }
    }
}
New API: example using the mapreduce package
File BookCountNew.java:
package bookCount;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;

public class BookCountNew extends Configured implements Tool {

    public static final Logger logger = Logger.getLogger(BookCountNew.class);

    public static void main(String[] args) throws Exception {
        PropertyConfigurator.configure("conf/log4j.properties");
        logger.info("BookCountNew starting");
        System.setProperty("HADOOP_USER_NAME", "hduser");
        Configuration conf = new Configuration();
        int res = ToolRunner.run(conf, new BookCountNew(), args);
        logger.info("BookCountNew end");
        System.exit(res);
    }

    @Override
    public int run(String[] arg0) throws Exception {
        try {
            Configuration conf = getConf();
            // New API: the job is created through the Job.getInstance() factory method
            Job job = Job.getInstance(conf, "bookCount_new_sample_job");
            job.setJarByClass(getClass());
            job.setMapperClass(BookCountMapper.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            job.setReducerClass(BookCountReducer.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            TextInputFormat.addInputPath(job, new Path("booklist.log"));
            TextOutputFormat.setOutputPath(job, new Path("booklistResultDir"));
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // Return the job status to ToolRunner instead of calling System.exit() here,
            // so that main() can log the end message and exit with the proper code
            return job.waitForCompletion(true) ? 0 : 1;
        } catch (Exception e) {
            logger.error(e.getMessage());
            e.printStackTrace();
        }
        return 1;
    }

    static class BookCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Emit a constant key with the count 1 for every input line
            context.write(new Text("booknum"), new IntWritable(1));
            logger.info("foxson_mapper_ok");
            System.out.println("foxsonMapper");
        }
    }

    static class BookCountReducer extends Reducer<Text, IntWritable, Text, LongWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            // Count how many values arrived for the key, i.e. the number of input lines
            long sumBookNum = 0;
            for (IntWritable value : values) {
                sumBookNum = sumBookNum + 1;
            }
            logger.info("foxson_BookCountReducer_ok");
            context.write(key, new LongWritable(sumBookNum));
            System.out.println("foxsonReduce");
        }
    }
}
You can use the examples above for study; now let me also show you how to look things up in the API documentation,
again taking the code above as our example:
1. Viewing the Hadoop 2.4 online API docs
First open the following link:
http://hadoop.apache.org/docs/r2.4.0/api/index.html
Once it is open, here is the order in which to read it,
as shown in the figure below:
follow the order 1 --> 2 --> 3.
In other words: to see which classes and interfaces a package contains, look at area 2; to see the details of a class or interface, such as which methods it has and what each one does, look at area 3.
2. The old API's methods, with an example
Here we take JobConf as the example.
Following the reading order from the figure above, we arrive at the following code from the JobConf javadoc:
// Create a new JobConf
JobConf job = new JobConf(new Configuration(), MyJob.class);
// Specify various job-specific parameters
job.setJobName("myjob");
FileInputFormat.setInputPaths(job, new Path("in"));
FileOutputFormat.setOutputPath(job, new Path("out"));
job.setMapperClass(MyJob.MyMapper.class);
job.setCombinerClass(MyJob.MyReducer.class);
job.setReducerClass(MyJob.MyReducer.class);
job.setInputFormat(SequenceFileInputFormat.class);
job.setOutputFormat(SequenceFileOutputFormat.class);
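The javadoc snippet stops at configuration. To actually submit the configured job, the old API hands the JobConf to JobClient, exactly as the full BookCount program above does:
// Submit the job and wait for it to complete (old API)
JobClient.runJob(job);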
3. The new API's methods, with an example
The Job javadoc gives this example:
// Create a new Job
Job job = new Job(new Configuration());
job.setJarByClass(MyJob.class);
// Specify various job-specific parameters
job.setJobName("myjob");
job.setInputPath(new Path("in"));
job.setOutputPath(new Path("out"));
job.setMapperClass(MyJob.MyMapper.class);
job.setReducerClass(MyJob.MyReducer.class);
// Submit the job, then poll for progress until the job is complete
job.waitForCompletion(true);
Paste the above into Eclipse, and something immediately looks wrong:
the Job constructor is shown with a strikethrough, which means it has been deprecated.
So let's keep looking:
getInstance() has many overloads. Overloading probably needs no explanation here; if you have studied object-oriented programming you know that overloaded methods share the same name but may differ in the number and types of their parameters.
Well then, let's try that; the new-API example above creates its Job in exactly this way. This style of instantiation is also the factory pattern, which you can read up on separately.
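Put side by side, as a minimal sketch (the Configuration objects and the job name "myjob" are just placeholders), the deprecated constructor and the factory method look like this:
// Deprecated in Hadoop 2.x: constructing a Job directly
Job job = new Job(new Configuration());
// Preferred: the Job.getInstance() factory method, here the overload that also takes a job name
Job job2 = Job.getInstance(new Configuration(), "myjob");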
That completes our tour of the API docs; you can go and look up the remaining methods yourself.
Recommended related posts:
The differences and relationship between the mapred and mapreduce packages in Hadoop
Is there a tool for Hadoop like what the JDK has, where you can just type a method name and find the code that implements it?