A round-up of developing Hadoop 2.x Map/Reduce projects in Eclipse
Guiding questions:
1. How do you create an MR program?
2. How do you configure the run parameters?
3. What goes wrong when HADOOP_HOME is empty?
4. What is hadoop-common-2.2.0-bin-master/bin for?
Extended: 5. What is winutils.exe?
This article brings together two examples, each approaching the topic from a different angle.
Part 1: Developing a Hadoop 2.x Map/Reduce project in Eclipse
This part demonstrates how to develop a Map/Reduce project in Eclipse.
1. Environment
[*]Hadoop 2.2.0
[*]Eclipse Juno SR2
[*]Hadoop2.x-eclipse-plugin. For compiling, installing, and configuring the plugin, see: http://www.micmiu.com/bigdata/hadoop/hadoop2-x-eclipse-plugin-build-install/
2. Create the MR project
Click File → New → Other…, select "Map/Reduce Project", then enter the project name micmiu_MRDemo and create the new project:
3. Create the Mapper and Reducer
Click File → New → Other…, select Mapper; the generated class automatically extends Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.
Creating a Reducer follows the same steps as the Mapper; implement your own business logic as needed. This article uses the official WordCount example for testing:
package com.micmiu.mr;
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    //conf.set("fs.defaultFS", "hdfs://192.168.6.77:9000");
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
4. Prepare test data
micmiu-01.txt:
Hi Michael welcome to Hadoop
more see micmiu.com
micmiu-02.txt:
Hi Michael welcome to BigData
more see micmiu.com
micmiu-03.txt:
Hi Michael welcome to Spark
more see micmiu.com
Upload the three files whose names start with micmiu to HDFS:
micmiu-mbp:Downloads micmiu$ hdfs dfs -copyFromLocal micmiu-*.txt /user/micmiu/test/input
micmiu-mbp:Downloads micmiu$ hdfs dfs -ls /user/micmiu/test/input
Found 3 items
-rw-r--r-- 1 micmiu supergroup 50 2014-04-15 14:53 /user/micmiu/test/input/micmiu-01.txt
-rw-r--r-- 1 micmiu supergroup 50 2014-04-15 14:53 /user/micmiu/test/input/micmiu-02.txt
-rw-r--r-- 1 micmiu supergroup 49 2014-04-15 14:53 /user/micmiu/test/input/micmiu-03.txt
micmiu-mbp:Downloads micmiu$
5. Configure run parameters
Run As → Run Configurations…, then set the run parameters on the Arguments tab, e.g. the program's input and output paths:
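For reference (the original screenshot is not reproduced here, so the exact values are assumptions), a pair of arguments matching the HDFS paths used above would look like:
/user/micmiu/test/input /user/micmiu/test/output
or, with an explicit HDFS URI when fs.defaultFS is not set elsewhere:
hdfs://192.168.6.77:9000/user/micmiu/test/input hdfs://192.168.6.77:9000/user/micmiu/test/output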
6. Run
Run As → Run on Hadoop; when execution completes, the job's progress and counters can be seen in the console.
With that, running MR from Eclipse against a local pseudo-distributed Hadoop 2.x setup has been demonstrated successfully. PS: submitting the MR job to the cluster environment from Eclipse kept failing, and the cause has not been found yet.
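A common workaround when in-IDE cluster submission fails is to package the project as a jar and submit it directly on a cluster node. A sketch, assuming the project is exported as micmiu_MRDemo.jar (the jar name is an assumption):
hadoop jar micmiu_MRDemo.jar com.micmiu.mr.WordCount /user/micmiu/test/input /user/micmiu/test/output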
The above described the overall process; below, the problems encountered along the way are covered in detail.
Part 2: Debugging CentOS Hadoop 2.2 MapReduce from Eclipse on Windows 7
1. I set up a development environment of CentOS 5.3 + Hadoop 2.2 + HBase 0.96.1.1 and successfully debugged MapReduce from Eclipse on Windows 7.
2. Hadoop installation
For configuring MapReduce, refer to http://blog.sina.com.cn/s/blog_546abd9f0101i8b8.html. If, after installation, you can open the following pages without trouble, you are OK. My cluster has 200 as the master and 201-203 as slaves:
dfs.http.address                        192.168.1.200:50070
dfs.secondary.http.address              192.168.1.200:50090
dfs.datanode.http.address               192.168.1.201:50075
yarn.resourcemanager.webapp.address     192.168.1.200:50030
mapreduce.jobhistory.webapp.address     192.168.1.200:19888
The job-history page does not seem reachable by default; you need to start the history server first with hadoop/sbin/mr-jobhistory-daemon.sh start historyserver.
3. Hadoop 2.x eclipse-plugin
One thing to note: the "Hadoop installation directory" field should be filled with the Windows-side Hadoop home path; its purpose is to let a newly created MapReduce Project automatically pull in the jars MapReduce needs from that location. The plugin can be downloaded from:
Compiling the hadoop-eclipse-plugin for Hadoop 2.2.0
hadoop-eclipse-plugin-2.2.0.jar plugin package download
4. Assorted problems
1. With the step above done, I created a MapReduce Project, and a problem showed up at run time:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Stepping into the code shows it is a HADOOP_HOME problem: if HADOOP_HOME is empty, fullExeName is necessarily null\bin\winutils.exe. The fix is simple: set the environment variable properly. If you don't want to reboot, you can tide things over by adding System.setProperty("hadoop.home.dir", "..."); to the MapReduce program. From org.apache.hadoop.util.Shell.java:
public static final String getQualifiedBinPath(String executable)
    throws IOException {
  // construct hadoop bin path to the specified executable
  String fullExeName = HADOOP_HOME_DIR + File.separator + "bin"
      + File.separator + executable;
  File exeFile = new File(fullExeName);
  if (!exeFile.exists()) {
    throw new IOException("Could not locate executable " + fullExeName
        + " in the Hadoop binaries.");
  }
  return exeFile.getCanonicalPath();
}

private static String HADOOP_HOME_DIR = checkHadoopHome();

private static String checkHadoopHome() {
  // first check the Dflag hadoop.home.dir with JVM scope
  String home = System.getProperty("hadoop.home.dir");
  // fall back to the system/user-global env variable
  if (home == null) {
    home = System.getenv("HADOOP_HOME");
  }
  ...
}
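A minimal sketch of the stopgap mentioned above: set hadoop.home.dir at the very start of main(), before any Hadoop class touches the local file system (the path below is only an assumed example; point it at your own Windows Hadoop home):
public static void main(String[] args) throws Exception {
  // JVM-scoped stopgap for a missing HADOOP_HOME environment variable
  // (the path is an assumed example, not a required location)
  System.setProperty("hadoop.home.dir", "D:\\Hadoop\\tar\\hadoop-2.2.0\\hadoop-2.2.0");
  Configuration conf = new Configuration();
  // ... set up and submit the job as in the WordCount example above
}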
2. Now we get the complete fullExeName; on my machine it is D:\Hadoop\tar\hadoop-2.2.0\hadoop-2.2.0\bin\winutils.exe. Running on produced another error:
Could not locate executable D:\Hadoop\tar\hadoop-2.2.0\hadoop-2.2.0\bin\winutils.exe in the Hadoop binaries.
A look in that directory shows there is no winutils.exe. Download one from https://github.com/srccodes/hadoop-common-2.2.0-bin and drop it in.
3. The next problem:
at org.apache.hadoop.util.Shell.execCommand(Shell.java:661)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:639)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:435)
Following the code again, in org.apache.hadoop.util.Shell.java:
public static String[] getSetPermissionCommand(String perm, boolean recursive,
    String file) {
  String[] baseCmd = getSetPermissionCommand(perm, recursive);
  String[] cmdWithFile = Arrays.copyOf(baseCmd, baseCmd.length + 1);
  cmdWithFile[cmdWithFile.length - 1] = file;
  return cmdWithFile;
}
/** Return a command to set permission */
public static String[] getSetPermissionCommand(String perm, boolean recursive) {
  if (recursive) {
    return (WINDOWS) ? new String[] { WINUTILS, "chmod", "-R", perm }
                     : new String[] { "chmod", "-R", perm };
  } else {
    return (WINDOWS) ? new String[] { WINUTILS, "chmod", perm }
                     : new String[] { "chmod", perm };
  }
}
The cmdWithFile array contains {"D:\Hadoop\tar\hadoop-2.2.0\hadoop-2.2.0\bin\winutils.exe", "chmod", "755", "xxxfile"}. I ran this on its own in cmd and got: "The program can't start because MSVCR100.dll is missing from your computer."
So download it from http://files.cnblogs.com/sirkevin/msvcr100.rar and drop it into C:\Windows\System32. Running it in cmd again raised the next complaint: "The application was unable to start correctly (0xc000007b)."
Use the DirectX_Repair tool from http://blog.csdn.net/vbcom/article/details/7245186 to fix this, and remember to reboot after the repair. Afterwards a cmd test works fine.
4. At this point the end was in sight, but yet another problem appeared:
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
Stepping into the code:
/** Windows only method used to check if the current process has requested
 *  access rights on the given path. */
private static native boolean access0(String path, int requestedAccess);
Clearly a DLL is missing. Remember the download from https://github.com/srccodes/hadoop-common-2.2.0-bin? It includes hadoop.dll. The cleanest fix is to replace your local Hadoop bin directory with the hadoop-common-2.2.0-bin-master/bin directory, add HADOOP_HOME/bin to the PATH environment variable, and reboot.
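To verify the native library is picked up after the replacement, a small check sketch (NativeCodeLoader.isNativeCodeLoaded() is an existing Hadoop utility method; the class name NativeCheck is made up for this example):
import org.apache.hadoop.util.NativeCodeLoader;

public class NativeCheck {
  public static void main(String[] args) {
    // prints true once hadoop.dll is found via java.library.path / HADOOP_HOME\bin
    System.out.println("native hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded());
  }
}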
5. At last, MapReduce produced the correct output in output99.
5. Summary
The Hadoop Eclipse plugin is not mandatory: study-hadoop is an ordinary project, and running it directly (without going through the Run on Hadoop elephant) debugs into MapReduce just the same. As I see it, the plugin's role comes down to the following three points (this view turned out to be mistaken; see http://zy19982004.iteye.com/blog/2031172 for details):
1. It visualizes the files inside Hadoop.
2. It pulls in the dependency jars for you when you create a MapReduce Project.
3. With it, Configuration conf = new Configuration(); already contains all the configuration information.
The safest route is still to download and build the Hadoop 2.2 source yourself; that should run into none of these problems (not personally verified).
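In line with the third point, when running without the plugin you can supply the cluster settings yourself. A minimal sketch (all addresses below are assumptions; substitute your own cluster's values):
Configuration conf = new Configuration();
// point the client at the cluster explicitly instead of relying on the plugin
conf.set("fs.defaultFS", "hdfs://192.168.1.200:9000");           // assumed NameNode address
conf.set("mapreduce.framework.name", "yarn");                    // run on YARN rather than locally
conf.set("yarn.resourcemanager.address", "192.168.1.200:8032");  // assumed ResourceManager address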
6. Other problems
1. Still:
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
Stepping into org.apache.hadoop.util.NativeCodeLoader.java:
static {
  // Try to load native hadoop library and set fallback flag appropriately
  if (LOG.isDebugEnabled()) {
    LOG.debug("Trying to load the custom-built native-hadoop library...");
  }
  try {
    System.loadLibrary("hadoop");
    LOG.debug("Loaded the native-hadoop library");
    nativeCodeLoaded = true;
  } catch (Throwable t) {
    // Ignore failure to load
    if (LOG.isDebugEnabled()) {
      LOG.debug("Failed to load native-hadoop with error: " + t);
      LOG.debug("java.library.path=" +
          System.getProperty("java.library.path"));
    }
  }
  if (!nativeCodeLoaded) {
    LOG.warn("Unable to load native-hadoop library for your platform... " +
        "using builtin-java classes where applicable");
  }
}
The error reported here is:
DEBUG org.apache.hadoop.util.NativeCodeLoader - Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: HADOOP_HOME\bin\hadoop.dll: Can't load AMD 64-bit .dll on a IA 32-bit platform
I suspected the 32-bit JDK was to blame; after switching to a 64-bit JDK the problem went away:
2014-03-11 19:43:08,805 DEBUG org.apache.hadoop.util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...
2014-03-11 19:43:08,812 DEBUG org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
This also cleared up a common warning:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
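A quick way to confirm the 32/64-bit mismatch described above is to print the JVM's own properties. A small diagnostic sketch (sun.arch.data.model is HotSpot-specific and may be absent on other JVMs):
public class JvmBitnessCheck {
  public static void main(String[] args) {
    // "amd64" indicates a 64-bit JVM, "x86" a 32-bit one
    System.out.println("os.arch = " + System.getProperty("os.arch"));
    // "64" or "32" on HotSpot; may be null on other JVMs
    System.out.println("sun.arch.data.model = " + System.getProperty("sun.arch.data.model"));
    // directories searched by System.loadLibrary("hadoop")
    System.out.println("java.library.path = " + System.getProperty("java.library.path"));
  }
}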
Hi, have you ever run into this kind of error when executing MapReduce?
14/04/28 18:10:48 INFO mapreduce.Job: Job job_1398679196466_0003 failed with state FAILED due to: Application application_1398679196466_0003 failed 2 times due to AM Container for appattempt_1398679196466_0003_000002 exited with exitCode: 126 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
.Failing this attempt.. Failing the application.
14/04/28 18:10:48 INFO mapreduce.Job: Counters: 0
zhangcd123 posted on 2014-4-30 14:17:
Hi, have you ever run into this kind of error when executing MapReduce?
14/04/28 18:10:48 INFO mapreduce.Job: Job job_1398679 ...
Take a look at this; the error it deals with is similar to yours:
Running MR programs on Hadoop 2 from Eclipse
The program is identical, so why is there nothing in output after my run finished? Nothing errored during execution either. Please advise!
enson16855 posted on 2014-7-12 20:18:
The program is identical, so why is there nothing in output after my run finished? Nothing errored during execution either. Please advise!
That is a problem with the program. Put this code into your program and try running it again: Beginner's guide: how to create a MapReduce program in the development environment