Hadoop 2.2.0 Simple Cluster Installation

Environment: CentOS 6.4 64-bit, 3 machines
hdmaster  192.168.233.128
hdslave1  192.168.233.129
hdslave2  192.168.233.130
Software: hadoop-2.2.0-src.tar.gz
Hadoop user: hduser
1. Create the user:
groupadd hadoop
useradd -g hadoop hduser
(The -g puts hduser in the hadoop group, which the chown -R hduser:hadoop step later relies on.)
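Give the account a password on all three machines; the ssh-copy-id step below needs to log in with it once:
passwd hduser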
2. Set the hostname:
vi /etc/sysconfig/network
Set HOSTNAME to hdmaster, hdslave1, and hdslave2 on the corresponding machines.
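On hdmaster, for example, the file typically ends up like this (the change takes effect after a reboot, or immediately via the hostname command):
NETWORKING=yes
HOSTNAME=hdmaster
hostname hdmaster    # apply without rebooting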
3. Edit the hosts file:
vi /etc/hosts
Add:
192.168.233.128 hdmaster
192.168.233.129 hdslave1
192.168.233.130 hdslave2
4. Set up passwordless SSH between the machines:
First, on the master (hdmaster):
su hduser
ssh-keygen -t rsa    (press Enter at every prompt) to generate the key pair
cd /home/hduser/.ssh
Append id_rsa.pub to authorized_keys on hdslave1 and hdslave2:
ssh-copy-id -i id_rsa.pub hduser@hdslave1
ssh-copy-id -i id_rsa.pub hduser@hdslave2
The first connection asks you to confirm the host key (yes/no); type yes.
After that, ssh hdslave1 and ssh hdslave2 no longer ask for a password.
Then repeat the same steps on hdslave1 and hdslave2 so they can ssh to hdmaster without a password as well.
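A quick check that it worked (run from hdmaster as hduser; each command should print the remote hostname with no password prompt):
ssh hdslave1 hostname
ssh hdslave2 hostname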
5. Install Java 1.6 or later; I installed 1.7.
Configure the environment variables:
vim /etc/profile
JAVA_HOME=/usr/local/jdk1.7.0
CLASSPATH=:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
MAVEN_HOME=/usr/local/maven
PATH=$PATH:$MAVEN_HOME/bin
HADOOP_HOME=/home/hduser/hadoop-2.2.0
PATH=$PATH:$HADOOP_HOME/bin
export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL JAVA_HOME CLASSPATH MAVEN_HOME HADOOP_HOME
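Reload the profile and confirm Java resolves:
source /etc/profile
java -version    # should report a 1.7.x JVM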

6. Install the packages needed to build the Hadoop source:
Run the following as root (su root):
yum -y install svn ncurses-devel gcc*
yum -y install lzo-devel zlib-devel autoconf automake libtool cmake openssl-devel
Install Maven:
Download apache-maven-3.1.1-bin.tar.gz
tar zxvf apache-maven-3.1.1-bin.tar.gz
mv apache-maven-3.1.1 /usr/local/maven    (MAVEN_HOME in /etc/profile points there)
Configure the environment variables (see the Java section above).
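After sourcing /etc/profile again, Maven should be on the PATH:
source /etc/profile
mvn -version    # should report Apache Maven 3.1.1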
Download protobuf:
wget http://code.google.com/p/protobuf/downloads/detail?name=protobuf-2.5.0.tar.gz
tar zxvf protobuf-2.5.0.tar.gz
Build and install protobuf:
① cd  protobuf-2.5.0  
② ./configure
③ make
④ make install
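Verify the install; the Hadoop 2.2.0 build requires exactly this protoc version:
protoc --version    # should print: libprotoc 2.5.0
If protoc cannot find libprotoc.so, /usr/local/lib is likely missing from the loader path; add it to /etc/ld.so.conf and run ldconfig.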
7. Download and build Hadoop 2.2.0:
wget http://apache.claz.org/hadoop/co ... op-2.2.0-src.tar.gz
tar zxvf hadoop-2.2.0-src.tar.gz
cd hadoop-2.2.0-src/
The source has a missing dependency, so patch it first:
vim hadoop-common-project/hadoop-auth/pom.xml
Add a dependency:
<dependency>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>jetty-util</artifactId>
      <scope>test</scope>
    </dependency>
mvn clean package -Pdist,native -DskipTests -Dtar
The build takes half an hour to an hour; be patient.
When it finishes, the distribution tarball is at hadoop-dist/target/hadoop-2.2.0.tar.gz.
Unpack it under /home/hduser:
cp hadoop-dist/target/hadoop-2.2.0.tar.gz /home/hduser/
cd /home/hduser
tar zxvf hadoop-2.2.0.tar.gz
chown -R hduser:hadoop hadoop-2.2.0
Then update the environment variables: vim /etc/profile (see the Java section above).
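Reload and sanity-check (HADOOP_HOME must point at the unpacked tree, as in the profile above):
source /etc/profile
hadoop version    # should report Hadoop 2.2.0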
8. Configure Hadoop:
su hduser
cd /home/hduser/hadoop-2.2.0/etc/hadoop
vim hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.7.0
vim yarn-env.sh
export JAVA_HOME=/usr/local/jdk1.7.0
Then edit four XML files:
vim core-site.xml
<configuration>
    <property>
      <name>hadoop.common.configuration.version</name>
      <value>0.23.0</value>
      <description>version of this configuration file</description>
    </property>
   
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://hdmaster:9000</value>
    </property>
    <property>
      <name>io.file.buffer.size</name>
      <value>131072</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/home/hduser/hadoop-2.2.0/temp</value>
      <description>A base for other temporary directories.</description>
    </property>
</configuration>

vim hdfs-site.xml (do not set port properties such as dfs.datanode.address here, or startup fails with a BindException)
<configuration>
    <property>
      <name>dfs.namenode.name.dir</name>
      <!-- <value>file://${hadoop.tmp.dir}/dfs/name</value>-->
      <value>file:/home/hduser/hadoop-2.2.0/dfs/name</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <!-- <value>file://${hadoop.tmp.dir}/dfs/data</value> -->
      <value>file:/home/hduser/hadoop-2.2.0/dfs/data</value>
    </property>
    <property>
      <name>dfs.blocksize</name>
      <value>134217728</value>
    </property>
    <property>
      <name>dfs.namenode.handler.count</name>
      <value>10</value>
      <description>The number of server threads for the namenode.</description>
    </property>
   
    <property>
      <name>dfs.datanode.handler.count</name>
      <value>10</value>
      <description>The number of server threads for the datanode.</description>
    </property>
   
    <property>
      <name>dfs.replication</name>
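      <!-- note: with only two DataNodes, a replication factor of 3 leaves every block under-replicated; 2 would match this cluster -->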
      <value>3</value>
      <description>Default block replication.
      The actual number of replications can be specified when the file is created.
      The default is used if replication is not specified in create time.
      </description>
    </property>
    <property>
      <name>dfs.webhdfs.enabled</name>
      <value>true</value>
      <description>
        Enable WebHDFS (REST API) in Namenodes and Datanodes.
      </description>
    </property>
    <property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>hdmaster:50090</value>
      <description>
        The secondary namenode http server address and port.
      </description>
    </property>
</configuration>
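None of these paths has to exist in advance (the format step normally creates the name dir and the DataNode creates its data dir), but creating them up front as hduser avoids permission surprises:
mkdir -p /home/hduser/hadoop-2.2.0/temp
mkdir -p /home/hduser/hadoop-2.2.0/dfs/name /home/hduser/hadoop-2.2.0/dfs/data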

vim mapred-site.xml
<configuration>
    <!--Configurations for MapReduce Applications:-->
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
      <description>The runtime framework for executing MapReduce jobs.
      Can be one of local, classic or yarn.
      </description>
    </property>
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>1536</value>
      <description>Larger resource limit for maps.</description>
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx1024M</value>
      <description>    Larger heap-size for child jvms of maps.</description>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>3072</value>
      <description>    Larger resource limit for reduces.</description>
    </property>
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx2560M</value>
      <description>Larger heap-size for child jvms of reduces.</description>
    </property>
    <property>
      <name>mapreduce.task.io.sort.mb</name>
      <value>512</value>
      <description>    Higher memory-limit while sorting data for efficiency.</description>
    </property>
    <property>
      <name>mapreduce.task.io.sort.factor</name>
      <value>100</value>
      <description>    More streams merged at once while sorting files.</description>
    </property>
    <property>
      <name>mapreduce.reduce.shuffle.parallelcopies</name>
      <value>50</value>
      <description>Higher number of parallel copies run by reduces to fetch outputs from very large number of maps.</description>
    </property>
    <!--Configurations for MapReduce JobHistory Server:-->
    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>hdmaster:10020</value>
      <description>MapReduce JobHistory Server IPC host:port</description>
    </property>
    <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>hdmaster:19888</value>
      <description>MapReduce JobHistory Server Web UI host:port</description>
    </property>
    <property>
      <name>mapreduce.jobhistory.intermediate-done-dir</name>
      <value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
      <description>    Directory where history files are written by MapReduce jobs.</description>
    </property>
    <property>
      <name>mapreduce.jobhistory.done-dir</name>
      <value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
      <description>    Directory where history files are managed by the MR JobHistory Server.</description>
    </property>
</configuration>
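Note that the JobHistory server configured above is not started by start-dfs.sh/start-yarn.sh; once the cluster is up, start it separately on hdmaster:
/home/hduser/hadoop-2.2.0/sbin/mr-jobhistory-daemon.sh start historyserver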

vim yarn-site.xml (the commented-out sections are optional)
<configuration>
    <!--Configurations for ResourceManager and NodeManager:-->
  <property>
    <name>yarn.acl.enable</name>
    <value>true</value>
  </property>
  <property>
    <description>ACL of who can be admin of the YARN cluster.</description>
    <name>yarn.admin.acl</name>
    <value>*</value>
  </property>
    <property>
    <description>Whether to enable log aggregation</description>
    <name>yarn.log-aggregation-enable</name>
    <value>false</value>
  </property>
<!--Configurations for ResourceManager:-->
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>hdmaster</value>
  </property>
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>
  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>
  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>
  <property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
  <property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
  <property>
    <description>The maximum allocation for every container request at the RM,
    in MBs. Memory requests higher than this won't take effect,
    and will get capped to this value.</description>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
  <property>
    <description>Path to file with nodes to include.</description>
    <name>yarn.resourcemanager.nodes.include-path</name>
    <value></value>
  </property>
  <property>
    <description>Path to file with nodes to exclude.</description>
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <value></value>
  </property>
    <!--Configurations for NodeManager:-->
    <property>
    <description>Amount of physical memory, in MB, that can be allocated
    for containers.</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <property>
    <description>Ratio between virtual memory to physical memory when
    setting memory limits for containers. Container allocations are
    expressed in terms of physical memory, and virtual memory usage
    is allowed to exceed this allocation by this ratio.
    </description>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <property>
    <description>List of directories to store localized files in. An
      application's localized file directory will be found in:
      ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
      Individual containers' work directories, called container_${contid}, will
      be subdirectories of this.
   </description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>${hadoop.tmp.dir}/nm-local-dir</value>
  </property>
  <property>
    <description>
      Where to store container logs. An application's localized log directory
      will be found in ${yarn.nodemanager.log-dirs}/application_${appid}.
      Individual containers' log directories will be below this, in directories
      named container_${contid}. Each container directory will contain the files
      stderr, stdin, and syslog generated by that container.
    </description>
    <name>yarn.nodemanager.log-dirs</name>
    <value>${yarn.log.dir}/userlogs</value>
  </property>
  <property>
    <description>Time in seconds to retain user logs. Only applicable if
    log aggregation is disabled
    </description>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
  </property>
  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>
  <property>
    <description>The remote log dir will be created at
      {yarn.nodemanager.remote-app-log-dir}/${user}/{thisParam}
    </description>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
  </property>
  <property>
    <description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!--<property>
     <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>-->
<!--Configurations for History Server (Needs to be moved elsewhere):-->   
    <property>
    <description>How long to keep aggregation logs before deleting them.  -1 disables.
    Be careful set this too small and you will spam the name node.</description>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <description>How long to wait between aggregated log retention checks.
    If set to 0 or a negative value then the value is computed as one-tenth
    of the aggregated log retention time. Be careful set this too small and
    you will spam the name node.</description>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>-1</value>
  </property>
  <!--The following parameters can be used to control the node health monitoring:-->
  <!--<property>
    <description>The health check script to run.</description>
    <name>yarn.nodemanager.health-checker.script.path</name>
    <value></value>
  </property>
  <property>
    <description>The arguments to pass to the health check script.</description>
    <name>yarn.nodemanager.health-checker.script.opts</name>
    <value></value>
  </property>
  <property>
    <description>Time interval for running health script.</description>
    <name>yarn.nodemanager.health-checker.script.interval-ms</name>
    <value></value>
  </property>
  <property>
    <description>Timeout for health script execution.</description>
    <name>yarn.nodemanager.health-checker.script.timeout-ms</name>
    <value></value>
  </property>
  -->
</configuration>
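Also edit the slaves file (still in /home/hduser/hadoop-2.2.0/etc/hadoop); start-dfs.sh and start-yarn.sh read it to decide where to launch DataNodes and NodeManagers:
vim slaves
hdslave1
hdslave2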

Sync all the configuration files to every machine.
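If the slaves do not have Hadoop yet, the simplest way is to copy the whole unpacked tree (a sketch; it assumes identical paths and the passwordless SSH set up earlier):
scp -r /home/hduser/hadoop-2.2.0 hduser@hdslave1:/home/hduser/
scp -r /home/hduser/hadoop-2.2.0 hduser@hdslave2:/home/hduser/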
Then start things up on hdmaster:
/home/hduser/hadoop-2.2.0/bin/hdfs namenode -format
/home/hduser/hadoop-2.2.0/sbin/start-dfs.sh
Running jps now shows two Hadoop processes on hdmaster:
SecondaryNameNode
NameNode
and one on hdslave1 and hdslave2:
DataNode
Then run /home/hduser/hadoop-2.2.0/sbin/start-yarn.sh
Running jps now shows three Hadoop processes on hdmaster:
ResourceManager
SecondaryNameNode
NameNode
and two on hdslave1 and hdslave2:
NodeManager
DataNode
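To confirm MapReduce works end to end, run the bundled pi example (a minimal smoke test; the jar path matches the 2.2.0 binary layout):
cd /home/hduser/hadoop-2.2.0
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5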
Access the web UIs: the ResourceManager UI is at http://hdmaster:8088 (as configured in yarn-site.xml above), and the NameNode UI is at http://hdmaster:50070 by default.
