Environment: CentOS 6.4 64-bit x 3
hdmaster 192.168.233.128
hdslave1 192.168.233.129
hdslave2 192.168.233.130
Software: hadoop-2.2.0-src.tar.gz
Hadoop user: hduser
1. Create the user:
groupadd hadoop
useradd -g hadoop hduser
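ssh-copy-id below will prompt for hduser's password on each slave, so give the account one now (an assumed extra step; skip it if a password is already set):
passwd hduser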
2. Change the hostnames:
vi /etc/sysconfig/network
Set the hostname to hdmaster, hdslave1, and hdslave2 on the corresponding machines.
3. Edit the hosts file:
vi /etc/hosts
Append:
192.168.233.128 hdmaster
192.168.233.129 hdslave1
192.168.233.130 hdslave2
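A quick sanity check that the names resolve (run on each machine; every host should answer):
ping -c 1 hdmaster
ping -c 1 hdslave1
ping -c 1 hdslave2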
4. Set up passwordless SSH between the machines:
First, on the master (hdmaster):
su hduser
ssh-keygen -t rsa (press Enter at every prompt) to generate the key pair
cd /home/hduser/.ssh
Append id_rsa.pub to the authorized_keys files on hdslave1 and hdslave2:
ssh-copy-id -i id_rsa.pub hduser@hdslave1
ssh-copy-id -i id_rsa.pub hduser@hdslave2
The first connection shows a host-key confirmation prompt ending in yes/no; type yes.
After this, ssh hdslave1 and ssh hdslave2 no longer ask for a password.
Then repeat the same steps on hdslave1 and hdslave2 so that each of them can ssh to hdmaster without a password.
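A quick check that the keys took effect; each command should print the slave's hostname without asking for a password:
ssh hdslave1 hostname
ssh hdslave2 hostname
# Only if a password is still requested: on CentOS the usual culprit is
# over-permissive ~/.ssh permissions on the target machine. Fix with:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys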
5. Install Java 1.6 or later; I installed 1.7.
Configure the environment variables:
vim /etc/profile
JAVA_HOME=/usr/local/jdk1.7.0
CLASSPATH=:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
MAVEN_HOME=/usr/local/maven
PATH=$PATH:$MAVEN_HOME/bin
HADOOP_HOME=/home/hduser/hadoop-2.2.0
PATH=$PATH:$HADOOP_HOME/bin
export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL JAVA_HOME CLASSPATH MAVEN_HOME HADOOP_HOME
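Reload the profile and confirm the JDK is picked up (mvn and hadoop can be checked the same way once they are installed in the steps below):
source /etc/profile
java -version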
6. Install the tools needed to build the Hadoop source:
Run the following with root privileges: su root
yum -y install svn ncurses-devel gcc*
yum -y install lzo-devel zlib-devel autoconf automake libtool cmake openssl-devel
Install Maven; download apache-maven-3.1.1-bin.tar.gz:
tar zxvf apache-maven-3.1.1-bin.tar.gz
mv apache-maven-3.1.1 /usr/local/maven
Configure the environment variables (see the @Java section above, where MAVEN_HOME is already set).
Install protobuf. Download it:
wget http://code.google.com/p/protobuf/downloads/detail?name=protobuf-2.5.0.tar.gz
tar zxvf protobuf-2.5.0.tar.gz
Build and install protobuf:
① cd protobuf-2.5.0
② ./configure
③ make
④ make install
7. Download and build Hadoop 2.2.0:
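Before building, it is worth confirming that the protoc installed above is on the PATH and is exactly version 2.5.0; the Hadoop 2.2.0 build checks for this version and fails on any other:
protoc --version
The output should be: libprotoc 2.5.0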
wget http://apache.claz.org/hadoop/co ... op-2.2.0-src.tar.gz
tar zxvf hadoop-2.2.0-src.tar.gz
cd hadoop-2.2.0-src/
The source tree has a missing dependency:
vim hadoop-common-project/hadoop-auth/pom.xml
Add one dependency inside the existing <dependencies> section:
<dependency>
<groupId>org.mortbay.jetty</groupId>
<artifactId>jetty-util</artifactId>
<scope>test</scope>
</dependency>
mvn clean package -Pdist,native -DskipTests -Dtar
The build takes 30 to 60 minutes; be patient.
When it finishes, it produces hadoop-dist/target/hadoop-2.2.0.tar.gz
Extract it under /home/hduser:
cp hadoop-dist/target/hadoop-2.2.0.tar.gz /home/hduser/
cd /home/hduser
tar zxvf hadoop-2.2.0.tar.gz
chown -R hduser:hadoop hadoop-2.2.0
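The point of building from source is to get 64-bit native libraries (the stock 2.2.0 binary release shipped 32-bit ones, which produce "unable to load native-hadoop library" warnings on 64-bit systems). A quick check:
file /home/hduser/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0
The output should describe a 64-bit ELF shared object.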
Then update the environment variables: vim /etc/profile (see the @Java section above).
8. Configure Hadoop:
su hduser
cd /home/hduser/hadoop-2.2.0/etc/hadoop
vim hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.7.0
vim yarn-env.sh
export JAVA_HOME=/usr/local/jdk1.7.0
Then edit four XML files:
vim core-site.xml
<configuration>
<property>
<name>hadoop.common.configuration.version</name>
<value>0.23.0</value>
<description>version of this configuration file</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hdmaster:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/hadoop-2.2.0/temp</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
vim hdfs-site.xml (do not configure port properties such as dfs.datanode.address here, or startup fails with a BindException)
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<!-- <value>file://${hadoop.tmp.dir}/dfs/name</value>-->
<value>file:/home/hduser/hadoop-2.2.0/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<!-- <value>file://${hadoop.tmp.dir}/dfs/data</value>-->
<value>file:/home/hduser/hadoop-2.2.0/dfs/data</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>10</value>
<description>The number of server threads for the namenode.</description>
</property>
<property>
<name>dfs.datanode.handler.count</name>
<value>10</value>
<description>The number of server threads for the datanode.</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
<description>
Enable WebHDFS (REST API) in Namenodes and Datanodes.
</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hdmaster:50090</value>
<description>
The secondary namenode http server address and port.
</description>
</property>
</configuration>
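The dfs.namenode.name.dir and dfs.datanode.data.dir directories above are created on demand by the format step and the DataNodes, but creating them up front (an optional precaution) avoids ownership surprises:
mkdir -p /home/hduser/hadoop-2.2.0/dfs/name
mkdir -p /home/hduser/hadoop-2.2.0/dfs/data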
vim mapred-site.xml
<configuration>
<!--Configurations for MapReduce Applications:-->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs.
Can be one of local, classic or yarn.
</description>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
<description>Larger resource limit for maps.</description>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
<description> Larger heap-size for child jvms of maps.</description>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
<description> Larger resource limit for reduces.</description>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2560M</value>
<description> Larger heap-size for child jvms of reduces.</description>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
<description> Higher memory-limit while sorting data for efficiency.</description>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>100</value>
<description> More streams merged at once while sorting files.</description>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>50</value>
<description>Higher number of parallel copies run by reduces to fetch outputs from very large number of maps.</description>
</property>
<!--Configurations for MapReduce JobHistory Server:-->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hdmaster:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hdmaster:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
<description> Directory where history files are written by MapReduce jobs.</description>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
<description> Directory where history files are managed by the MR JobHistory Server.</description>
</property>
</configuration>
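Note that start-dfs.sh and start-yarn.sh do not launch the JobHistory server configured above; once the cluster is running (see the startup commands at the end), start it separately on hdmaster:
/home/hduser/hadoop-2.2.0/sbin/mr-jobhistory-daemon.sh start historyserver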
vim yarn-site.xml (the commented-out parts are optional)
<configuration>
<!--Configurations for ResourceManager and NodeManager:-->
<property>
<name>yarn.acl.enable</name>
<value>true</value>
</property>
<property>
<description>ACL of who can be admin of the YARN cluster.</description>
<name>yarn.admin.acl</name>
<value>*</value>
</property>
<property>
<description>Whether to enable log aggregation</description>
<name>yarn.log-aggregation-enable</name>
<value>false</value>
</property>
<!--Configurations for ResourceManager:-->
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>hdmaster</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<description>The class to use as the resource scheduler.</description>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
<description>The maximum allocation for every container request at the RM,
in MBs. Memory requests higher than this won't take effect,
and will get capped to this value.</description>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8192</value>
</property>
<property>
<description>Path to file with nodes to include.</description>
<name>yarn.resourcemanager.nodes.include-path</name>
<value></value>
</property>
<property>
<description>Path to file with nodes to exclude.</description>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value></value>
</property>
<!--Configurations for NodeManager:-->
<property>
<description>Amount of physical memory, in MB, that can be allocated
for containers.</description>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
</property>
<property>
<description>Ratio between virtual memory to physical memory when
setting memory limits for containers. Container allocations are
expressed in terms of physical memory, and virtual memory usage
is allowed to exceed this allocation by this ratio.
</description>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<description>List of directories to store localized files in. An
application's localized file directory will be found in:
${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
Individual containers' work directories, called container_${contid}, will
be subdirectories of this.
</description>
<name>yarn.nodemanager.local-dirs</name>
<value>${hadoop.tmp.dir}/nm-local-dir</value>
</property>
<property>
<description>
Where to store container logs. An application's localized log directory
will be found in ${yarn.nodemanager.log-dirs}/application_${appid}.
Individual containers' log directories will be below this, in directories
named container_{$contid}. Each container directory will contain the files
stderr, stdin, and syslog generated by that container.
</description>
<name>yarn.nodemanager.log-dirs</name>
<value>${yarn.log.dir}/userlogs</value>
</property>
<property>
<description>Time in seconds to retain user logs. Only applicable if
log aggregation is disabled
</description>
<name>yarn.nodemanager.log.retain-seconds</name>
<value>10800</value>
</property>
<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>
<property>
<description>The remote log dir will be created at
{yarn.nodemanager.remote-app-log-dir}/${user}/{thisParam}
</description>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property>
<property>
<description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!--<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>-->
<!--Configurations for History Server (Needs to be moved elsewhere):-->
<property>
<description>How long to keep aggregation logs before deleting them. -1 disables.
Be careful set this too small and you will spam the name node.</description>
<name>yarn.log-aggregation.retain-seconds</name>
<value>-1</value>
</property>
<property>
<description>How long to wait between aggregated log retention checks.
If set to 0 or a negative value then the value is computed as one-tenth
of the aggregated log retention time. Be careful set this too small and
you will spam the name node.</description>
<name>yarn.log-aggregation.retain-check-interval-seconds</name>
<value>-1</value>
</property>
<!--The following parameters can be used to control the node health monitoring:-->
<!--<property>
<description>The health check script to run.</description>
<name>yarn.nodemanager.health-checker.script.path</name>
<value></value>
</property>
<property>
<description>The arguments to pass to the health check script.</description>
<name>yarn.nodemanager.health-checker.script.opts</name>
<value></value>
</property>
<property>
<description>Time interval for running health script.</description>
<name>yarn.nodemanager.health-checker.script.interval-ms</name>
<value></value>
</property>
<property>
<description>Timeout for health script execution.</description>
<name>yarn.nodemanager.health-checker.script.timeout-ms</name>
<value></value>
</property>
-->
</configuration>
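One step this walkthrough leaves implicit: start-dfs.sh and start-yarn.sh read etc/hadoop/slaves to decide where to launch DataNodes and NodeManagers, so replace its default localhost entry with the slave hostnames:
vim slaves
hdslave1
hdslave2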
Sync all the configuration files to every machine.
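A minimal way to do the sync, assuming the built hadoop-2.2.0 tree already sits at the same path on the slaves (if it does not, scp the whole hadoop-2.2.0 directory instead):
scp -r /home/hduser/hadoop-2.2.0/etc/hadoop hduser@hdslave1:/home/hduser/hadoop-2.2.0/etc/
scp -r /home/hduser/hadoop-2.2.0/etc/hadoop hduser@hdslave2:/home/hduser/hadoop-2.2.0/etc/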
Then start things up on hdmaster:
/home/hduser/hadoop-2.2.0/bin/hdfs namenode -format
/home/hduser/hadoop-2.2.0/sbin/start-dfs.sh
Now jps shows two Hadoop processes on hdmaster:
NameNode
SecondaryNameNode
and one process on hdslave1 and hdslave2:
DataNode
Then run /home/hduser/hadoop-2.2.0/sbin/start-yarn.sh
Now jps shows three Hadoop processes on hdmaster:
NameNode
SecondaryNameNode
ResourceManager
and two processes on hdslave1 and hdslave2:
DataNode
NodeManager
Open the web UIs: with this configuration the NameNode is at http://hdmaster:50070 and the ResourceManager at http://hdmaster:8088.
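A quick end-to-end smoke test using the examples jar that ships with the build (it estimates pi with 2 map tasks and prints the result to the terminal):
cd /home/hduser/hadoop-2.2.0
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 10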