
Hadoop 2.4.1 + ZK + HBase Environment Setup


Questions this post answers:
1. How do you set up a Hadoop 2.4.1 environment?
2. How do you configure ZK?
3. What are the important HBase parameters?

I. Hadoop Environment Setup
Download the 2.4.1 binary package and unpack it, then fill in the configuration files as described in the linked guide. On startup you may hit the error "Unable to load realm info from SCDynamicStore". Fix it by adding the settings below to hadoop-env.sh (the same problem appears when configuring HBase and is solved the same way in hbase-env.sh).

Add to hadoop-env.sh (for Hadoop the variable to set is HADOOP_OPTS; in hbase-env.sh it is HBASE_OPTS, as shown):

export JAVA_HOME="/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home"
export HBASE_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"



Finally, write your own start and stop scripts.


hadoop-start.sh


#!/bin/bash
HADOOP_PREFIX="/Users/zhenweiliu/Work/Software/hadoop-2.4.1"
HADOOP_YARN_HOME="/Users/zhenweiliu/Work/Software/hadoop-2.4.1"
HADOOP_CONF_DIR="/Users/zhenweiliu/Work/Software/hadoop-2.4.1/etc/hadoop"
cluster_name="hadoop_cat"
# Format a new distributed filesystem
if [ "$1" == "format" ]; then
  $HADOOP_PREFIX/bin/hdfs namenode -format $cluster_name
fi
# Start the HDFS with the following command, run on the designated NameNode:
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
# Run a script to start DataNodes on all slaves:
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
# Start the YARN with the following command, run on the designated ResourceManager:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
# Run a script to start NodeManagers on all slaves:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
# Start a standalone WebAppProxy server. If multiple servers are used with load balancing it should be run on each of them:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh start proxyserver --config $HADOOP_CONF_DIR
# Start the MapReduce JobHistory Server with the following command, run on the designated server:
$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR
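A quick sanity check after starting: jps should list each daemon the script launched.

jps
# Expect: NameNode, DataNode, ResourceManager, NodeManager, JobHistoryServer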



hadoop-stop.sh


#!/bin/bash
HADOOP_PREFIX="/Users/zhenweiliu/Work/Software/hadoop-2.4.1"
HADOOP_YARN_HOME="/Users/zhenweiliu/Work/Software/hadoop-2.4.1"
HADOOP_CONF_DIR="/Users/zhenweiliu/Work/Software/hadoop-2.4.1/etc/hadoop"
cluster_name="hadoop_cat"
# Stop the NameNode with the following command, run on the designated NameNode:
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
# Run a script to stop DataNodes on all slaves:
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
# Stop the ResourceManager with the following command, run on the designated ResourceManager:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
# Run a script to stop NodeManagers on all slaves:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager
# Stop the WebAppProxy server. If multiple servers are used with load balancing it should be run on each of them:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh stop proxyserver --config $HADOOP_CONF_DIR
# Stop the MapReduce JobHistory Server with the following command, run on the designated server:
$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR



hadoop-restart.sh

#!/bin/bash
./hadoop-stop.sh
./hadoop-start.sh




Finally, here are all the Hadoop configuration files I needed to set up.
core-site.xml


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
  </property>
  <property>
      <name>io.file.buffer.size</name>
      <value>131072</value>
  </property>
</configuration>
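To confirm the NameNode address is picked up from core-site.xml, hdfs getconf can echo it back (run from the Hadoop bin directory):

./hdfs getconf -confKey fs.defaultFS
# hdfs://localhost:9000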




hdfs-site.xml


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- DataNode Configurations -->
    <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>4096</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///Users/zhenweiliu/Work/Software/hadoop-2.4.1/data</value>
    </property>
    <!-- NameNode Configurations -->
    <property>
        <name>dfs.blocksize</name>
        <value>67108864</value>
    </property>
    <property>
        <name>dfs.namenode.handler.count</name>
        <value>100</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///Users/zhenweiliu/Work/Software/hadoop-2.4.1/name</value>
    </property>
</configuration>





yarn-site.xml


<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
    <!-- ResourceManager and NodeManager Configurations -->
    <property>
        <name>yarn.acl.enable</name>
        <value>false</value>
    </property>
    <!-- ResourceManager Configurations -->
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:9001</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:9002</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:9003</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>localhost:9004</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>localhost:9005</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>8192</value>
    </property>
    <!-- NodeManager Configurations -->
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>8192</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>${hadoop.tmp.dir}/nm-local-dir</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>${yarn.log.dir}/userlogs</value>
    </property>
    <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>10800</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/logs</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
        <value>logs</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- History Server Configurations -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>-1</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-check-interval-seconds</name>
        <value>-1</value>
    </property>
</configuration>
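Once YARN is up, the ResourceManager web port configured above (9005) can be smoke-tested through its REST API; /ws/v1/cluster/info is part of the RM's standard web services:

curl -s http://localhost:9005/ws/v1/cluster/info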




mapred-site.xml


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- Configurations for MapReduce Applications -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1536</value>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1024M</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>3072</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx2560M</value>
    </property>
    <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>512</value>
    </property>
    <property>
        <name>mapreduce.task.io.sort.factor</name>
        <value>100</value>
    </property>
    <property>
        <name>mapreduce.reduce.shuffle.parallelcopies</name>
        <value>50</value>
    </property>
    <!-- Configurations for MapReduce JobHistory Server -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>localhost:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>localhost:19888</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>file:///Users/zhenweiliu/Work/Software/hadoop-2.4.1/mr-history/tmp</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>file:///Users/zhenweiliu/Work/Software/hadoop-2.4.1/mr-history/done</value>
    </property>
</configuration>
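Likewise, once the JobHistory server is running, the web port configured above (19888) answers REST queries:

curl -s http://localhost:19888/ws/v1/history/info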





II. ZooKeeper Pseudo-Distributed Configuration

Make three copies of the ZooKeeper installation directory, named:

zookeeper-3.4.5-1
zookeeper-3.4.5-2
zookeeper-3.4.5-3

The zoo.cfg under each copy is configured as follows.

zookeeper-3.4.5-1/zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5-1/data
dataLogDir=/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5-1/logs
clientPort=2181
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890




zookeeper-3.4.5-2/zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5-2/data
dataLogDir=/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5-2/logs
clientPort=2182
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890



zookeeper-3.4.5-3/zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5-3/data
dataLogDir=/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5-3/logs
clientPort=2183
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890



Then, under each instance's data directory, create a file named myid containing the single character 1, 2, or 3 respectively. For example:

zookeeper-3.4.5-1/data/myid

1
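Creating the three myid files by hand is error-prone; a small loop does it in one go (a sketch assuming the directory layout above):

#!/bin/bash
BASE_DIR="/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5"
for no in $(seq 1 3)
do
    mkdir -p "${BASE_DIR}-${no}/data"          # the dataDir from each zoo.cfg
    echo "$no" > "${BASE_DIR}-${no}/data/myid" # the id must match the server.N entry
done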



Finally, write batch start and stop scripts.

startZkCluster.sh

#!/bin/bash
BASE_DIR="/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5"
BIN_EXEC="bin/zkServer.sh start"
for no in $(seq 1 3)
do
    "${BASE_DIR}-${no}"/$BIN_EXEC
done



stopZkCluster.sh

#!/bin/bash
BASE_DIR="/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5"
BIN_EXEC="bin/zkServer.sh stop"
for no in $(seq 1 3)
do
    "${BASE_DIR}-${no}"/$BIN_EXEC
done



restartZkCluster.sh
#!/bin/bash
./stopZkCluster.sh
./startZkCluster.sh
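To verify the ensemble formed correctly, ask each instance for its role; zkServer.sh status in the 3.4.5 distribution reports leader or follower:

#!/bin/bash
BASE_DIR="/Users/zhenweiliu/Work/Software/zookeeper/zookeeper-3.4.5"
for no in $(seq 1 3)
do
    "${BASE_DIR}-${no}"/bin/zkServer.sh status  # expect one leader, two followers
done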


III. HBase
HBase actually ships with a built-in ZooKeeper. If you do not explicitly configure an external ZK, HBase uses the built-in one, which starts and stops together with HBase itself.
To have HBase manage the built-in ZK explicitly, set this in hbase-env.sh:

export HBASE_MANAGES_ZK=true



hbase-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
-->
<configuration>
  <!--
  <property>
      <name>hbase.rootdir</name>
      <value>file:///Users/zhenweiliu/Work/Software/hbase-0.98.3-hadoop2/hbase</value>
  </property>
  -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
    <description>The directory shared by RegionServers.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>The replication count for HLog and HFile storage. Should not be greater than HDFS datanode count.</description>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/Users/zhenweiliu/Work/Software/hbase-0.98.3-hadoop2/zookeeper</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    The port at which the clients will connect.
    </description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>




Finally, start HBase:


./start-hbase.sh
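To confirm HBase came up, check the Java processes and push a command through the shell (run from HBase's bin directory):

jps                            # expect HMaster, HRegionServer, plus HQuorumPeer when HBASE_MANAGES_ZK=true
echo "status" | ./hbase shell  # prints the number of live servers and average load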


IV. System Parameters

HBase also needs large process and open-file limits, so the ulimit must be raised. On my Mac I created the file /etc/launchd.conf with the following contents:


limit maxfiles 16384 16384
limit maxproc 2048 2048


And add to /etc/profile:

ulimit -n 16384
ulimit -u 2048
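Open a fresh shell and confirm the limits took effect:

ulimit -n    # should print 16384
ulimit -u    # should print 2048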



If HBase shows errors like:

2014-07-14 23:00:48,342 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ERROR: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:90)
    at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:695)



1. Check the HBase master log, which shows:

2014-07-14 23:31:51,270 INFO  [master:192.168.126.8:60000] util.FSUtils: Waiting for dfs to exit safe mode...



Take HDFS out of safe mode:

bin/hdfs dfsadmin -safemode leave
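dfsadmin can also report the current state, which is worth checking before and after:

bin/hdfs dfsadmin -safemode get
# Safe mode is ON   (or: Safe mode is OFF)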



The master log then reports:

2014-07-14 23:32:22,238 WARN  [master:192.168.126.8:60000] hdfs.DFSClient: DFS Read
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1761102757-192.168.126.8-1404787541755:blk_1073741825_1001 file=/hbase/hbase.version



Check HDFS:

./hdfs fsck / -files -blocks



14/07/14 23:36:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://localhost:50070
FSCK started by zhenweiliu (auth:SIMPLE) from /127.0.0.1 for path / at Mon Jul 14 23:36:33 CST 2014
.
/hbase/WALs/192.168.126.8,60020,1404917152583-splitting/192.168.126.8%2C60020%2C1404917152583.1404917158940: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741842
/hbase/WALs/192.168.126.8,60020,1404917152583-splitting/192.168.126.8%2C60020%2C1404917152583.1404917158940: MISSING 1 blocks of total size 17 B..
/hbase/WALs/192.168.126.8,60020,1404917152583-splitting/192.168.126.8%2C60020%2C1404917152583.1404917167188.meta: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741843
/hbase/WALs/192.168.126.8,60020,1404917152583-splitting/192.168.126.8%2C60020%2C1404917152583.1404917167188.meta: MISSING 1 blocks of total size 401 B..
/hbase/data/hbase/meta/.tabledesc/.tableinfo.0000000001: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741829
/hbase/data/hbase/meta/.tabledesc/.tableinfo.0000000001: MISSING 1 blocks of total size 372 B..
/hbase/data/hbase/meta/1588230740/.regioninfo: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741827
/hbase/data/hbase/meta/1588230740/.regioninfo: MISSING 1 blocks of total size 30 B..
/hbase/data/hbase/meta/1588230740/info/e63bf8b1e649450895c36f28fb88da98: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741836
/hbase/data/hbase/meta/1588230740/info/e63bf8b1e649450895c36f28fb88da98: MISSING 1 blocks of total size 1340 B..
/hbase/data/hbase/meta/1588230740/oldWALs/hlog.1404787632739: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741828
/hbase/data/hbase/meta/1588230740/oldWALs/hlog.1404787632739: MISSING 1 blocks of total size 17 B..
/hbase/data/hbase/namespace/.tabledesc/.tableinfo.0000000001: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741832
/hbase/data/hbase/namespace/.tabledesc/.tableinfo.0000000001: MISSING 1 blocks of total size 286 B..
/hbase/data/hbase/namespace/a3fbb84530e05cab6319257d03975e6b/.regioninfo: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741833
/hbase/data/hbase/namespace/a3fbb84530e05cab6319257d03975e6b/.regioninfo: MISSING 1 blocks of total size 40 B..
/hbase/data/hbase/namespace/a3fbb84530e05cab6319257d03975e6b/info/770eb1a6dc76458fb97e9213edb80b72: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741837
/hbase/data/hbase/namespace/a3fbb84530e05cab6319257d03975e6b/info/770eb1a6dc76458fb97e9213edb80b72: MISSING 1 blocks of total size 1045 B..
/hbase/hbase.id: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741826
/hbase/hbase.id: MISSING 1 blocks of total size 42 B..
/hbase/hbase.version: CORRUPT blockpool BP-1761102757-192.168.126.8-1404787541755 block blk_1073741825
/hbase/hbase.version: MISSING 1 blocks of total size 7 B.
Status: CORRUPT
Total size:    3597 B
Total dirs:    21
Total files:    11
Total symlinks:        0
Total blocks (validated):    11 (avg. block size 327 B)
  ********************************
  CORRUPT FILES:    11
  MISSING BLOCKS:    11
  MISSING SIZE:        3597 B
  CORRUPT BLOCKS:     11
  ********************************
Minimally replicated blocks:    0 (0.0 %)
Over-replicated blocks:    0 (0.0 %)
Under-replicated blocks:    0 (0.0 %)
Mis-replicated blocks:        0 (0.0 %)
Default replication factor:    3
Average block replication:    0.0
Corrupt blocks:        11
Missing replicas:        0
Number of data-nodes:        1
Number of racks:        1
FSCK ended at Mon Jul 14 23:36:33 CST 2014 in 15 milliseconds
The filesystem under path '/' is CORRUPT



Delete the corrupt files:

./hdfs fsck / -delete



14/07/14 23:41:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://localhost:50070
FSCK started by zhenweiliu (auth:SIMPLE) from /127.0.0.1 for path / at Mon Jul 14 23:41:46 CST 2014
Status: HEALTHY
Total size:    0 B
Total dirs:    21
Total files:    0
Total symlinks:        0
Total blocks (validated):    0
Minimally replicated blocks:    0
Over-replicated blocks:    0
Under-replicated blocks:    0
Mis-replicated blocks:        0
Default replication factor:    3
Average block replication:    0.0
Corrupt blocks:        0
Missing replicas:        0
Number of data-nodes:        1
Number of racks:        1
FSCK ended at Mon Jul 14 23:41:46 CST 2014 in 4 milliseconds
The filesystem under path '/' is HEALTHY




At this point HBase has died. Check the master log:


2014-07-14 23:48:53,788 FATAL [master:192.168.126.8:60000] master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.util.FileSystemVersionException: HBase file layout needs to be upgraded.  You have version null and I want version 8.  Is your hbase.rootdir valid?  If so, you may need to run 'hbase hbck -fixVersionFile'.
    at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:602)
    at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:456)
    at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:147)
    at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:128)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:802)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:615)
    at java.lang.Thread.run(Thread.java:695)




Rebuild the HBase files on HDFS by removing the old directory and letting HBase reinitialize it:

bin/hadoop fs -rm -r /hbase


The HBase master then reports:


2014-07-14 23:56:33,999 INFO  [master:192.168.126.8:60000] catalog.CatalogTracker: Failed verification of hbase:meta,,1 at address=192.168.126.8,60020,1405352769509, exception=org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 is not online on 192.168.126.8,60020,1405353371628
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2683)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4117)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:3494)
    at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:20036)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
    at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
    at java.lang.Thread.run(Thread.java:695)




Remove the stale region server znode so HBase can rebuild it:

bin/hbase zkcli
rmr /hbase/meta-region-server


Restart HBase once more, and the problem is solved.



V. Important HBase Parameters

These parameters go in hbase-site.xml.

1. zookeeper.session.timeout

The default is 3 minutes. That means that when a server goes down, the Master needs at least 3 minutes to notice and start recovery. You may want to lower the timeout so the Master reacts faster. Before changing it, make sure your JVM GC settings are under control, or a single long GC pause can blow past the timeout and take out a healthy RegionServer. (Though when a RegionServer has been stuck in a long GC, you probably do want it restarted and recovered.)

To change this setting, edit hbase-site.xml, push the file to the whole cluster, and restart.

The reason the default is set this high is to spare newcomers the classic forum question: "Why did my RegionServer die during a massive data import?" The usual cause is a long GC pause on an untuned JVM. The thinking is that someone new to HBase cannot be expected to know all of this, and an aggressive timeout would only shake their confidence. Once they know their way around, they can lower this parameter themselves.
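For example, in hbase-site.xml (illustrative only; 60000 ms is a sample value, not a recommendation):

<property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
</property>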

2. hbase.regionserver.handler.count
This setting determines the number of threads that handle user requests. The default of 10 is deliberately low: it protects region servers from keeling over when users combine large write buffers with many concurrent clients. The rule of thumb: keep the value low when the payload per request is large (big puts, scans using large caches, anything in the MB range), and raise it when payloads are small (gets, small puts, ICVs, deletes).

When client payloads are small, it is safe to set this value as high as your maximum client count. A typical example is a cluster serving a website: puts are generally not buffered, and most operations are gets.

The danger of a high value is that all the Put payloads buffered in a RegionServer at once put serious pressure on memory, possibly ending in an OutOfMemoryError. A RegionServer running on too little memory triggers GC constantly, until the pauses become noticeable; the memory held by in-flight request payloads cannot be reclaimed no matter how many times GC runs. After a while the whole cluster suffers too, because every request routed to that RegionServer's regions slows down, which compounds the problem.

You can get a sense of whether you have too few or too many handlers by turning on RPC-level logging on a single RegionServer and tailing its log (queued requests consume memory).
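Again in hbase-site.xml (illustrative; 30 is a sample value for a small-payload, many-client workload):

<property>
    <name>hbase.regionserver.handler.count</name>
    <value>30</value>
</property>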






