Exception while invoking getBlockLocations of class

bob007 posted on 2016-1-22 15:54:28
Last edited by bob007 on 2016-1-22 15:58

SQL queries on my Impala cluster are very slow.

I am using CDH 5.5.0. For example:

select domain, sum(domain_request_count) domain_request_count,sum(domain_response_count) domain_response_count from
dfdsdb.request_response_domain_sc where cast(CONCAT(year,month,day) as int)
between cast("20151214" as int) and cast("20151231" as int) group by domain order by domain_request_count desc limit 10

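This query usually takes about 30 seconds, sometimes more than 50 seconds, and 15 seconds at best.
The table dfdsdb.request_response_domain_sc has three partition levels (year, month, day) and holds about 100 million records.
In principle this statement should finish in under 10 seconds. I watched the Impala daemon logs and found that every slow query comes with exceptions like the following: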

Tuple(id=0 size=40 slots=[Slot(id=0 type=STRING col_path=[4] offset=24 null=(offset=0 mask=4) slot_idx=2 field_idx=-1), Slot(id=1 type=BIGINT col_path=[5] offset=8 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=2 type=BIGINT col_path=[6] offset=16 null=(offset=0 mask=2) slot_idx=1 field_idx=-1), Slot(id=3 type=STRING col_path=[0] offset=-1 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=4 type=STRING col_path=[1] offset=-1 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=5 type=STRING col_path=[2] offset=-1 null=(offset=0 mask=1) slot_idx=0 field_idx=-1)] tuple_path=[])
Tuple(id=1 size=40 slots=[Slot(id=6 type=STRING col_path=[] offset=24 null=(offset=0 mask=4) slot_idx=2 field_idx=-1), Slot(id=7 type=BIGINT col_path=[] offset=8 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=8 type=BIGINT col_path=[] offset=16 null=(offset=0 mask=2) slot_idx=1 field_idx=-1)] tuple_path=[])
Tuple(id=2 size=40 slots=[Slot(id=9 type=STRING col_path=[] offset=24 null=(offset=0 mask=4) slot_idx=2 field_idx=-1), Slot(id=10 type=BIGINT col_path=[] offset=8 null=(offset=0 mask=1) slot_idx=0 field_idx=-1), Slot(id=11 type=BIGINT col_path=[] offset=16 null=(offset=0 mask=2) slot_idx=1 field_idx=-1)] tuple_path=[])
I0106 09:46:59.656497 19278 plan-fragment-executor.cc:303] Open(): instance_id=794f58dadaa44cb8:1f24c33dda8d00a2
I0106 09:47:20.070286  6805 RetryInvocationHandler.java:144] Exception while invoking getBlockLocations of class ClientNamenodeProtocolTranslatorPB over CM-GY-HXa-5d1/117.135.251.170:8020. Trying to fail over immediately.
Java exception follows:
org.apache.hadoop.net.ConnectTimeoutException: Call From CM-GY-HX8-5c6/117.135.251.135 to CM-GY-HXa-5d1:8020 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=CM-GY-HXa-5d1/117.135.251.170:8020]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
        at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:750)
        at org.apache.hadoop.ipc.Client.call(Client.java:1476)
        at org.apache.hadoop.ipc.Client.call(Client.java:1403)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
        at com.sun.proxy.$Proxy14.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:254)
        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
        at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1258)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1245)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1233)
        at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:302)
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:268)
        at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:260)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1564)
        at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:308)
        at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:304)
Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=CM-GY-HXa-5d1/117.135.251.170:8020]
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:708)
        at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1525)
        at org.apache.hadoop.ipc.Client.call(Client.java:1442)
        ... 21 more
I0106 09:47:20.077205  6805 RetryInvocationHandler.java:144] Exception while invoking getBlockLocations of class ClientNamenodeProtocolTranslatorPB over CM-GY-HXa-5d2/117.135.251.171:8020 after 1 fail over attempts. Trying to fail over after sleeping for 1300ms.
Java exception follows:

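When a query is fast, these log entries do not appear. I suspect the connect timeouts are what slows the Impala queries down, but how can this be fixed?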
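One way to check the connect-timeout suspicion is to time plain TCP connects to the NameNode RPC port from the host that logged the exception (CM-GY-HX8-5c6 in the log above). The following is a minimal sketch using only the Java standard library; the class name `ConnectProbe` and the method `probe` are hypothetical names made up for this illustration, and the 20000 ms timeout simply mirrors the value in the log.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Times a plain TCP connect, e.g. to a NameNode RPC port such as 8020. */
final class ConnectProbe {

    /**
     * Returns the elapsed milliseconds for a successful connect,
     * or -1 if the connect failed or timed out.
     */
    static long probe(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // Covers timeouts, refused connections and unresolved hosts.
            return -1;
        }
    }

    public static void main(String[] args) {
        // Host and port are assumptions; pass the real NameNode as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8020;
        for (int i = 0; i < 5; i++) {
            long ms = probe(host, port, 20_000);
            System.out.println(host + ":" + port + " -> "
                    + (ms < 0 ? "FAILED" : ms + " ms"));
        }
    }
}
```

With Java 11 or later this can be run as `java ConnectProbe.java CM-GY-HXa-5d1 8020`. If some attempts fail or take many seconds while others return in a few milliseconds, the problem is more likely in the network or the NameNode host than in Impala itself.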

http://stackoverflow.com/questio ... i-am-using-cdh5-5-0


bob007 posted on 2016-1-22 16:23:44
Part of the source code behind the row lock acquisition timeout:


[mw_shl_code=java,true]public RowLock getRowLock(byte[] row, boolean waitForLock) throws IOException {  
   checkRow(row, "row lock");  
   startRegionOperation();  
   try {  
     HashedBytes rowKey = new HashedBytes(row);  
     RowLockContext rowLockContext = new RowLockContext(rowKey);  
  
     // loop until we acquire the row lock (unless !waitForLock)  
     while (true) {  
       // Acquire the lock: putIfAbsent the rowKey into the ConcurrentHashMap
       RowLockContext existingContext = lockedRows.putIfAbsent(rowKey, rowLockContext);  
       if (existingContext == null) {  
         // Row is not already locked by any thread, use newly created context.  
         break;  
       } else if (existingContext.ownedByCurrentThread()) {  
         // Row is already locked by current thread, reuse existing context instead.  
         rowLockContext = existingContext;  
         break;  
       } else {  
         // Row is already locked by some other thread, give up or wait for it  
         if (!waitForLock) {  
           return null;  
         }  
         try { // Wait for the owning thread to count down the latch, i.e. release the lock
           if (!existingContext.latch.await(this.rowLockWaitDuration, TimeUnit.MILLISECONDS)) {  
             throw new IOException("Timed out waiting for lock for row: " + rowKey);  
           }  
         } catch (InterruptedException ie) {  
           LOG.warn("Thread interrupted waiting for lock on row: " + rowKey);  
           InterruptedIOException iie = new InterruptedIOException();  
           iie.initCause(ie);  
           throw iie;  
         }  
       }  
     }  
  
     // allocate new lock for this thread  
     return rowLockContext.newLock();  
   } finally {  
     closeRegionOperation();  
   }  
}  [/mw_shl_code]
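The putIfAbsent-plus-latch pattern in the method above can be tried out in isolation. The sketch below is a simplified stand-in written for this post, not HBase code: `RowLockDemo`, `lock` and `unlock` are hypothetical names, rows are plain strings, and the re-entrant case (`ownedByCurrentThread`) is left out.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Simplified stand-in for the row-lock pattern above; not HBase code. */
final class RowLockDemo {
    // One latch per locked row; the owner counts it down on release.
    private final ConcurrentHashMap<String, CountDownLatch> lockedRows =
            new ConcurrentHashMap<>();
    private final long waitMs;

    RowLockDemo(long waitMs) {
        this.waitMs = waitMs;
    }

    void lock(String row) throws IOException, InterruptedException {
        CountDownLatch mine = new CountDownLatch(1);
        while (true) {
            CountDownLatch existing = lockedRows.putIfAbsent(row, mine);
            if (existing == null) {
                return; // nobody held the row, we own it now
            }
            // Another thread holds the row: wait for its latch, or give up.
            if (!existing.await(waitMs, TimeUnit.MILLISECONDS)) {
                throw new IOException("Timed out waiting for lock for row: " + row);
            }
            // The owner released the row; loop and try to claim it again.
        }
    }

    void unlock(String row) {
        CountDownLatch latch = lockedRows.remove(row);
        if (latch != null) {
            latch.countDown(); // wake up all waiters
        }
    }

    public static void main(String[] args) throws Exception {
        RowLockDemo locks = new RowLockDemo(200);
        locks.lock("row-1"); // main thread holds the row
        Thread other = new Thread(() -> {
            try {
                locks.lock("row-1");
            } catch (IOException e) {
                System.out.println(e.getMessage());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        other.start();
        other.join();
        locks.unlock("row-1");
    }
}
```

Because the main thread never releases `row-1` while the second thread is waiting, the second thread gives up after 200 ms and prints `Timed out waiting for lock for row: row-1`. In the real method the wait is bounded by `rowLockWaitDuration` instead.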

A row lock has to be acquired in the following places:
[Image: 1.png]
Related configuration:
[mw_shl_code=bash,true]<property>
    <name>hbase.rowlock.wait.duration</name>
    <value>90000</value>
    <description>
        Timeout for each attempt to acquire a row lock; the default is 30 s
    </description>
</property>

<property>
    <name>hbase.regionserver.lease.period</name>
    <value>180000</value>
    <description>
        Lease period for each connection (socket) a client holds on the RegionServer
    </description>
</property>

<property>
    <name>hbase.rpc.timeout</name>
    <value>180000</value>
    <description>
        RPC timeout
    </description>
</property>

<property>
    <name>hbase.client.scanner.timeout.period</name>
    <value>180000</value>
    <description>
        Timeout for each client scan or get
    </description>
</property>

<property>
    <name>hbase.client.scanner.caching</name>
    <value>100</value>
    <description>
        Number of rows fetched by each next() call of a client scan; the default is 1
    </description>
</property>[/mw_shl_code]
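To confirm which timeout values a given site file really contains, the name/value pairs can be read back and printed. The sketch below uses only the JDK's DOM parser, so it runs without Hadoop or HBase on the classpath; `SiteXmlReader` and `read` are hypothetical names, and the inline XML is an assumed fragment wrapped in a `<configuration>` root element, as Hadoop-style site files normally are. On a real cluster, Hadoop's own `Configuration` class is the usual way to load these files.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Reads <property> name/value pairs from Hadoop-style site XML. */
final class SiteXmlReader {

    /** Assumes every <property> has both a <name> and a <value> child. */
    static Map<String, String> read(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        // Two of the properties from the post, inlined for illustration.
        String xml = "<configuration>"
                + "<property><name>hbase.rowlock.wait.duration</name>"
                + "<value>90000</value></property>"
                + "<property><name>hbase.rpc.timeout</name>"
                + "<value>180000</value></property>"
                + "</configuration>";
        for (Map.Entry<String, String> e : read(xml).entrySet()) {
            long ms = Long.parseLong(e.getValue());
            System.out.println(e.getKey() + " = " + ms + " ms (" + ms / 1000 + " s)");
        }
    }
}
```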



