sun posted on 2016-7-8 17:03:32

Connection exception at step 2 when building a cube in Kylin

2016-07-08 14:02:53,547 INFO hive.metastore: Trying to connect to metastore with URI thrift://10.209.30.19:9083
2016-07-08 14:02:53,547 INFO hive.metastore: Opened a connection to metastore, current connections: 4
2016-07-08 14:02:53,549 INFO hive.metastore: Connected to metastore.
2016-07-08 14:02:53,661 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2016-07-08 14:02:54,668 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2016-07-08 14:02:55,669 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2016-07-08 14:02:56,669 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2016-07-08 14:02:57,669 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2016-07-08 14:02:58,670 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2016-07-08 14:02:59,670 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2016-07-08 14:03:00,670 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2016-07-08 14:03:01,671 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2016-07-08 14:03:02,671 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
......


java.net.ConnectException: Call From mydomain/10.209.30.19 to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:http://wiki.apache.org/hadoop/ConnectionRefused
      at sun.reflect.GeneratedConstructorAccessor45.newInstance(Unknown Source)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
      at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
      at org.apache.hadoop.ipc.Client.call(Client.java:1351)
      at org.apache.hadoop.ipc.Client.call(Client.java:1300)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
      at com.sun.proxy.$Proxy43.getNewApplication(Unknown Source)
      at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:167)
      at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
      at com.sun.proxy.$Proxy44.getNewApplication(Unknown Source)
      at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNewApplication(YarnClientImpl.java:127)
      at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createApplication(YarnClientImpl.java:135)
      at org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:175)
      at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:229)
      at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:355)
      at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
      at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
      at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
      at org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:147)
      at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:96)
      at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:91)
      at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:121)
      at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
      at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
      at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
      at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)


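The hadoop/hive/hbase/zk environment itself is healthy.
It looks as though the YARN configuration is not picked up correctly before the MR job is submitted, so the client falls back to the default 0.0.0.0:8032 and the connection fails. The YARN-related environment variables shown on Kylin's System page all look normal. I also tried putting yarn-site.xml under classes, but rerunning the job gives the same error.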

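I don't understand why the YarnConfiguration initialized during FactDistinctColumnsJob fails to pick up the settings in yarn-site.xml. I have run into a similar problem before and solved it by calling addResource on yarn-site.xml in my own code. For Kylin, I don't know what causes it.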
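One way to narrow this down is to check whether yarn-site.xml is actually visible to the class loader of the running process, since Hadoop's Configuration resolves resources given by name through a class loader. The following is only a minimal diagnostic sketch using the JDK alone; the class name ClasspathResourceCheck and the method locate are made up for illustration and are not part of Kylin or Hadoop.

```java
import java.net.URL;

public class ClasspathResourceCheck {

    /**
     * Returns the URL of the named resource as seen by the current
     * thread's context class loader, or null if it is not visible.
     * (Hypothetical helper, for diagnosis only.)
     */
    static URL locate(String resourceName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            // Fall back to the loader that loaded this class.
            cl = ClasspathResourceCheck.class.getClassLoader();
        }
        return cl.getResource(resourceName);
    }

    public static void main(String[] args) {
        // Defaults to yarn-site.xml; another resource name can be passed as an argument.
        String name = args.length > 0 ? args[0] : "yarn-site.xml";
        URL url = locate(name);
        if (url == null) {
            System.out.println(name + " is NOT visible on the classpath");
        } else {
            System.out.println(name + " found at " + url);
        }
    }
}
```

The result is only meaningful if this runs with the same classpath as the process that submits the job. If the file is reported as not visible there, or is found in an unexpected location, that would be consistent with the fallback to 0.0.0.0:8032 seen in the log.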

qcbb001 posted on 2016-7-8 19:46:21

This is really the machine failing to connect to itself.
Could you post your hosts file?
0.0.0.0 refers to the local network.


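sun posted on 2016-7-8 20:37:48

qcbb001 posted on 2016-7-8 19:46
This is really the machine failing to connect to itself.
Could you post your hosts file?
0.0.0.0 refers to the local network.

When the configuration isn't found, the default 0.0.0.0:8032 is used instead. In other words, either yarn-site.xml was not read when the YarnConfiguration was built, or yarn-site.xml itself is misconfigured. I am sure the yarn-site.xml settings are fine.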
@Private
protected InetSocketAddress getRMAddress(YarnConfiguration conf, Class<?> protocol) throws IOException {
    if(protocol == ApplicationClientProtocol.class) {
      return conf.getSocketAddr("yarn.resourcemanager.address", "0.0.0.0:8032", 8032);
    } else if(protocol == ResourceManagerAdministrationProtocol.class) {
      return conf.getSocketAddr("yarn.resourcemanager.admin.address", "0.0.0.0:8033", 8033);
    } else if(protocol == ApplicationMasterProtocol.class) {
      setAMRMTokenService(conf);
      return conf.getSocketAddr("yarn.resourcemanager.scheduler.address", "0.0.0.0:8030", 8030);
    } else {
      String message = "Unsupported protocol found when creating the proxy connection to ResourceManager: " + (protocol != null?protocol.getClass().getName():"null");
      LOG.error(message);
      throw new IllegalStateException(message);
    }
}
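
To make the fallback concrete, here is a small self-contained sketch that mimics the lookup with the JDK only. It is not Hadoop's implementation: the class RmAddressFallback, its methods, and the host name rm-host.example are assumptions for illustration, and it ignores the variable substitution that a real Hadoop Configuration performs (for example via yarn.resourcemanager.hostname).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RmAddressFallback {
    static final String KEY = "yarn.resourcemanager.address";
    static final String DEFAULT_ADDRESS = "0.0.0.0:8032";

    /** Parses Hadoop-style XML: configuration/property/{name,value}. */
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new HashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                NodeList names = p.getElementsByTagName("name");
                NodeList values = p.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0) {
                    props.put(names.item(0).getTextContent().trim(),
                              values.item(0).getTextContent().trim());
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    /** Returns the configured address, or the default when the key is absent. */
    static String resolveRmAddress(Map<String, String> props) {
        return props.getOrDefault(KEY, DEFAULT_ADDRESS);
    }

    public static void main(String[] args) {
        // Case 1: the site file was never loaded, so no properties are known.
        System.out.println(resolveRmAddress(new HashMap<>()));

        // Case 2: the site file was loaded (host name is a placeholder).
        String xml = "<configuration><property>"
                + "<name>yarn.resourcemanager.address</name>"
                + "<value>rm-host.example:8032</value>"
                + "</property></configuration>";
        System.out.println(resolveRmAddress(parse(xml)));
    }
}
```

In the first case the lookup returns 0.0.0.0:8032, which matches the address the client keeps retrying in the log above. That is why the retries point at a configuration that was never loaded rather than at a network fault.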

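qcbb001 posted on 2016-7-9 07:34:17

sun posted on 2016-7-8 20:37
When the configuration isn't found, the default 0.0.0.0:8032 is used instead. In other words, building the YarnConfiguration did not read yarn-site. ...

When a configuration file cannot be found, the cause is most often a permissions problem that prevents access.
There are other possibilities too, such as whether the path to your configuration file is correct and whether the configuration entries themselves are correct.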
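
The first two of these causes (wrong path, insufficient permissions) can be told apart with a quick check run as the same OS user that runs Kylin. This is a rough sketch using only the JDK; the class name ConfigFileAccessCheck and its Status enum are invented for illustration, and the path to yarn-site.xml has to be supplied by whoever runs it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ConfigFileAccessCheck {

    enum Status { MISSING, NOT_READABLE, READABLE }

    /** Distinguishes a wrong path from a permissions problem. */
    static Status check(Path file) {
        if (!Files.isRegularFile(file)) {
            // Path does not exist, or points at a directory or special file.
            return Status.MISSING;
        }
        return Files.isReadable(file) ? Status.READABLE : Status.NOT_READABLE;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java ConfigFileAccessCheck <path-to-yarn-site.xml>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file.toAbsolutePath() + ": " + check(file));
    }
}
```

A READABLE result rules out the path and permission explanations for that file, which leaves the classpath and the configuration entries themselves as the remaining suspects.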