
RegionServer cannot connect to HDFS

curran12 posted on 2016-1-14 14:07:05
I have 5 nodes, with NameNode HA and HMaster HA deployed on two of the machines. After starting HDFS, everything works normally, but once HBase is started it keeps throwing errors. The RegionServer log shows it endlessly retrying the connection to the NameNode. The cluster is on the 192.168.90.x subnet, so why is it connecting to cluster01/60.191.124.236:8020? I have confirmed that the hosts file contains nothing besides the IPs and hostnames of the 5 machines, and manually uploading a file to HDFS from the RegionServer works fine. I really cannot figure out where the problem is.

[hadoop@slave01 ~]$ hdfs dfs -put /var/log/boot.log /
[hadoop@slave01 ~]$ hdfs dfs -ls /
Found 2 items
-rw-r--r--   2 hadoop supergroup       2053 2016-01-14 14:05 /boot.log
drwxr-xr-x   - hadoop supergroup          0 2016-01-14 13:39 /hbase

The error is as follows:

2016-01-14 13:44:13,322 INFO  [regionserver/slave01/192.168.90.44:16020] ipc.Client: Retrying connect to server: cluster01/60.191.124.236:8020. Already tried 13 time(s); maxRetries=45
2016-01-14 13:44:33,342 INFO  [regionserver/slave01/192.168.90.44:16020] ipc.Client: Retrying connect to server: cluster01/60.191.124.236:8020. Already tried 14 time(s); maxRetries=45
2016-01-14 13:44:53,357 INFO  [regionserver/slave01/192.168.90.44:16020] ipc.Client: Retrying connect to server: cluster01/60.191.124.236:8020. Already tried 15 time(s); maxRetries=45
2016-01-14 13:45:13,377 INFO  [regionserver/slave01/192.168.90.44:16020] ipc.Client: Retrying connect to server: cluster01/60.191.124.236:8020. Already tried 16 time(s); maxRetries=45
2016-01-14 13:45:33,400 INFO  [regionserver/slave01/192.168.90.44:16020] ipc.Client: Retrying connect to server: cluster01/60.191.124.236:8020. Already tried 17 time(s); maxRetries=45
2016-01-14 13:45:53,421 INFO  [regionserver/slave01/192.168.90.44:16020] ipc.Client: Retrying connect to server: cluster01/60.191.124.236:8020. Already tried 18 time(s); maxRetries=45
2016-01-14 13:46:13,494 INFO  [regionserver/slave01/192.168.90.44:16020] ipc.Client: Retrying connect to server: cluster01/60.191.124.236:8020. Already tried 19 time(s); maxRetries=45
2016-01-14 13:46:33,515 INFO  [regionserver/slave01/192.168.90.44:16020] ipc.Client: Retrying connect to server: cluster01/60.191.124.236:8020. Already tried 20 time(s); maxRetries=45
2016-01-14 13:47:03,545 WARN  [regionserver/slave01/192.168.90.44:16020] ipc.Client: Address change detected. Old: cluster01/60.191.124.236:8020 New: cluster01:8020
2016-01-14 13:47:03,545 INFO  [regionserver/slave01/192.168.90.44:16020] ipc.Client: Retrying connect to server: cluster01:8020. Already tried 0 time(s); maxRetries=45
2016-01-14 13:47:03,549 INFO  [regionserver/slave01/192.168.90.44:16020] regionserver.HRegionServer: STOPPED: Failed initialization
2016-01-14 13:47:03,551 ERROR [regionserver/slave01/192.168.90.44:16020] regionserver.HRegionServer: Failed init
java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "slave01/192.168.90.44"; destination host is: "cluster01":8020;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
        at org.apache.hadoop.ipc.Client.call(Client.java:1415)
        at org.apache.hadoop.ipc.Client.call(Client.java:1364)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy19.getFileInfo(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy19.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:707)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
        at com.sun.proxy.$Proxy20.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1785)
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1068)
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:397)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1606)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1362)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:899)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Couldn't set up IO streams
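The odd thing about these retries is that 60.191.124.236 is a public IP, even though the hosts file only has the five cluster entries. It may be worth checking how slave01 actually resolves the name cluster01: since cluster01 is an HDFS nameservice rather than a real host, a resolver that falls through to external DNS (some ISP resolvers answer any unknown name) would produce exactly this kind of address. A quick check, assuming standard Linux tools:

[mw_shl_code=shell,true]# On the regionserver: see which address the resolver returns for cluster01.
# If this prints 60.191.124.236, the name is being answered by external DNS,
# not by /etc/hosts -- i.e. cluster01 is being treated as a plain hostname.
getent hosts cluster01

# Check the resolver order and which DNS servers are in use.
grep hosts /etc/nsswitch.conf
cat /etc/resolv.conf[/mw_shl_code]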




hbase-site.xml is configured as follows:
[mw_shl_code=xml,true]<configuration>      
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://cluster01/hbase</value>
    </property>
    <property>
        <name>hbase.master</name>
        <value>60000</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>  
    </property>      
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>slave01,slave02,slave03</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/data/zkdata</value>
    </property>
    <property>
        <name>hbase.tmp.dir</name>
        <value>/data/tmp/hbase/</value>
    </property>
</configuration>[/mw_shl_code]
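Note that hbase.rootdir points at the HA nameservice cluster01 rather than a single NameNode host. The HDFS client inside HBase can only resolve cluster01 as a nameservice if hdfs-site.xml (and core-site.xml) are on HBase's classpath; otherwise it treats cluster01 as a plain hostname and asks DNS, which would match the 60.191.124.236 address in the log. A common way to wire this up, sketched here as a possibility rather than a confirmed fix:

[mw_shl_code=shell,true]# On every HBase node, make the HDFS HA client settings visible to HBase
# so that "cluster01" is recognized as a nameservice, not a hostname.
# The HADOOP_HOME/HBASE_HOME paths are assumptions; adjust to your install.
ln -s $HADOOP_HOME/etc/hadoop/hdfs-site.xml $HBASE_HOME/conf/hdfs-site.xml
ln -s $HADOOP_HOME/etc/hadoop/core-site.xml $HBASE_HOME/conf/core-site.xml

# Alternatively, point HBase at the Hadoop conf dir in hbase-env.sh:
#   export HBASE_CLASSPATH=$HADOOP_HOME/etc/hadoop

# Restart HBase afterwards.
$HBASE_HOME/bin/stop-hbase.sh
$HBASE_HOME/bin/start-hbase.sh[/mw_shl_code]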

hdfs-site.xml is configured as follows:
[mw_shl_code=xml,true]<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>cluster01</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>128M</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>/data/checkpoint</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.cluster01</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.cluster01.nn1</name>
        <value>master01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.cluster01.nn2</name>
        <value>master02:50070</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.cluster01.nn1</name>
        <value>master01:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.cluster01.nn2</name>
        <value>master02:8020</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/dfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://slave01:8485;slave02:8485;slave03:8485/jt_journal</value>  
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/data/journal</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.cluster01</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>[/mw_shl_code]
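For the nameservice to be usable as the default filesystem, core-site.xml (not posted in this thread) also has to reference it. A minimal sketch of what it would be expected to contain, assuming the setup described above:

[mw_shl_code=xml,true]<configuration>
    <!-- The default filesystem must name the HA nameservice,
         not an individual NameNode host. -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://cluster01</value>
    </property>
</configuration>[/mw_shl_code]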


Replies (5)

qz2003 posted on 2016-1-14 17:51:20
  <property>
        <name>hbase.rootdir</name>
        <value>hdfs://cluster01/hbase</value>
    </property>

In hdfs://cluster01 above, what is cluster01? And which node is the active NameNode?


curran12 posted on 2016-1-14 18:20:53
Replying to qz2003 (2016-1-14 17:51):
    hbase.rootdir
    hdfs://cluster01/hbase

cluster01 is the dfs.nameservices value.
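To see how the client maps that nameservice to NameNodes, and which one is currently active, the standard HDFS tools can be used (assuming they run with the same configuration):

[mw_shl_code=shell,true]# List the configured nameservice and the NameNodes behind it.
hdfs getconf -confKey dfs.nameservices
hdfs getconf -namenodes

# Ask each NameNode for its HA state (active / standby).
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2[/mw_shl_code]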



curran12 posted on 2016-1-14 18:22:19
Starting the HMaster first and then starting the RegionServers one by one works; starting everything with start-hbase.sh on the master node produces the error above. Very strange.
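For reference, the manual sequence described above would look like this (paths assumed from a standard HBase layout):

[mw_shl_code=shell,true]# On the master node: start only the HMaster.
$HBASE_HOME/bin/hbase-daemon.sh start master

# Then on each regionserver node (slave01, slave02, slave03), one at a time:
$HBASE_HOME/bin/hbase-daemon.sh start regionserver[/mw_shl_code]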

Alkaloid0515 posted on 2016-1-14 20:07:46
Replying to curran12 (2016-1-14 18:22):
    Starting the HMaster first and then starting the RegionServers one by one works;
    starting with start-hbase.sh on the master node produces the error above. Very strange ...

The process may have opened too many threads. Add the following to /etc/security/limits.conf:

[mw_shl_code=shell,true]username soft nproc 100000
username hard nproc 100000[/mw_shl_code]

This is something that should be taken care of when the cluster is first set up.
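To check whether the process limit is actually being hit, the current limits can be inspected first (a quick check, assuming a standard Linux shell; the pgrep pattern is an assumption):

[mw_shl_code=shell,true]# Max user processes/threads for the current (hadoop) user.
ulimit -u

# Limits of the running regionserver process, if one is up.
cat /proc/$(pgrep -f HRegionServer | head -1)/limits | grep -i processes[/mw_shl_code]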

Source: "hbase 0.96 integrated with hadoop 2.2: three-node fully distributed high-reliability installation guide"
http://www.aboutyun.com/thread-7746-1-1.html




when30 posted on 2016-1-14 20:10:10
OP, could you share the configs for both HA setups? I would definitely study them.
