Why does a Spark job submitted on YARN keep showing ACCEPTED?

# Problem description
When I submit a Spark job on YARN, the application keeps reporting ACCEPTED, and only after a long while does it fail with the error below. What is the cause, and how do I fix it?
# Error message screenshot

# Relevant course content screenshot

# Attempted solutions and results

# Full relevant code, with comments (please do not paste screenshots)

[root@BigData04 spark-3.0.1-bin-hadoop3.2]# bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster examples/jars/spark-examples_2.12-3.0.1.jar 2

2020-12-23 21:47:51,970 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

2020-12-23 21:47:52,089 INFO client.RMProxy: Connecting to ResourceManager at BigData01/192.168.93.128:8032

2020-12-23 21:47:52,479 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers

2020-12-23 21:47:53,146 INFO conf.Configuration: resource-types.xml not found

2020-12-23 21:47:53,147 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.

2020-12-23 21:47:53,204 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)

2020-12-23 21:47:53,205 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead

2020-12-23 21:47:53,205 INFO yarn.Client: Setting up container launch context for our AM

2020-12-23 21:47:53,206 INFO yarn.Client: Setting up the launch environment for our AM container

2020-12-23 21:47:53,219 INFO yarn.Client: Preparing resources for our AM container

2020-12-23 21:47:53,367 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.

2020-12-23 21:47:55,797 INFO yarn.Client: Uploading resource file:/tmp/spark-073dfd19-bf54-4ccf-9f93-9cfd7871e042/__spark_libs__6704468600548540756.zip -> hdfs://BigData01:9000/user/root/.sparkStaging/application_1608731015658_0001/__spark_libs__6704468600548540756.zip

2020-12-23 21:47:58,923 INFO yarn.Client: Uploading resource file:/data/software/spark-3.0.1-bin-hadoop3.2/examples/jars/spark-examples_2.12-3.0.1.jar -> hdfs://BigData01:9000/user/root/.sparkStaging/application_1608731015658_0001/spark-examples_2.12-3.0.1.jar

2020-12-23 21:47:59,274 INFO yarn.Client: Uploading resource file:/tmp/spark-073dfd19-bf54-4ccf-9f93-9cfd7871e042/__spark_conf__3726838920049808942.zip -> hdfs://BigData01:9000/user/root/.sparkStaging/application_1608731015658_0001/__spark_conf__.zip

2020-12-23 21:47:59,364 INFO spark.SecurityManager: Changing view acls to: root

2020-12-23 21:47:59,365 INFO spark.SecurityManager: Changing modify acls to: root

2020-12-23 21:47:59,365 INFO spark.SecurityManager: Changing view acls groups to: 

2020-12-23 21:47:59,366 INFO spark.SecurityManager: Changing modify acls groups to: 

2020-12-23 21:47:59,366 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()

2020-12-23 21:47:59,436 INFO yarn.Client: Submitting application application_1608731015658_0001 to ResourceManager

2020-12-23 21:47:59,831 INFO impl.YarnClientImpl: Submitted application application_1608731015658_0001

2020-12-23 21:48:00,838 INFO yarn.Client: Application report for application_1608731015658_0001 (state: ACCEPTED)

2020-12-23 21:48:00,841 INFO yarn.Client: 

client token: N/A

diagnostics: [Wed Dec 23 21:48:00 +0800 2020] Scheduler has assigned a container for AM, waiting for AM container to be launched

ApplicationMaster host: N/A

ApplicationMaster RPC port: -1

queue: default

start time: 1608731279575

final status: UNDEFINED

tracking URL: http://BigData01:8088/proxy/application_1608731015658_0001/

user: root

2020-12-23 21:48:01,845 INFO yarn.Client: Application report for application_1608731015658_0001 (state: ACCEPTED)

.....

2020-12-23 21:48:09,879 INFO yarn.Client: Application report for application_1608731015658_0001 (state: ACCEPTED)


2020-12-23 22:06:42,871 INFO yarn.Client: 

client token: N/A

diagnostics: Application application_1608731015658_0001 failed 2 times due to Error launching appattempt_1608731015658_0001_000002. Got exception: java.net.ConnectException: Call From localhost/127.0.0.1 to localhost:43799 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

at sun.reflect.GeneratedConstructorAccessor47.newInstance(Unknown Source)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:837)

at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:757)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1566)

at org.apache.hadoop.ipc.Client.call(Client.java:1508)

at org.apache.hadoop.ipc.Client.call(Client.java:1405)

at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:234)

at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:119)

at com.sun.proxy.$Proxy83.startContainers(Unknown Source)






1 Answer
徐老师 2020-12-23 22:37:09

1: Check the detailed logs in the YARN web UI on port 8088 and post a screenshot; the error message shown here is not very informative (a command-line sketch for pulling the full application log follows this answer).

2: For now I suggest using the Spark 2.4 build provided with the course. 2.x is still the most widely used and most stable line in industry, and since 2.x to 3.x is a major version change, you may otherwise also run into API-level differences from the way things are written in the course.

3: I will test tomorrow whether this problem can be reproduced on Spark 3.x; I am traveling today, so I will check it tomorrow.
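Not part of the original answer, but a minimal sketch of point 1 from the command line: assuming YARN log aggregation is enabled on this cluster, the container logs for the application id seen in the spark-submit output above can be pulled with `yarn logs`:

```bash
# Fetch the aggregated container logs for the failed application
# (application id copied from the spark-submit output above).
# If log aggregation is disabled, check the NodeManager's local
# container log directory on each node instead.
yarn logs -applicationId application_1608731015658_0001 | less
```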

  • Asker 源自我心 #1
    2020-12-23 23:22:29,709 ERROR yarn.YarnAllocator: Failed to launch executor 11 on container container_1608735746030_0002_01_000012
    org.apache.spark.SparkException: Exception while starting container container_1608735746030_0002_01_000012 on host localhost
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:129)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:68)
    at org.apache.spark.deploy.yarn.YarnAllocator.$anonfun$runAllocatedContainers$4(YarnAllocator.scala:570)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Caused by: java.net.ConnectException: Call From localhost.localdomain/127.0.0.1 to localhost:33737 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:207)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:125)
    ... 5 more
    Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
    at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)
    at org.apache.hadoop.ipc.Client.call(Client.java:1403)
    ... 21 more
    2020-12-23 23:22:29,709 ERROR yarn.YarnAllocator: Failed to launch executor 12 on container container_1608735746030_0002_01_000013
    org.apache.spark.SparkException: Exception while starting container container_1608735746030_0002_01_000013 on host localhost
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:129)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:68)
    at org.apache.spark.deploy.yarn.YarnAllocator.$anonfun$runAllocatedContainers$4(YarnAllocator.scala:570)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Caused by: java.net.ConnectException: Call From localhost.localdomain/127.0.0.1 to localhost:45498 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:125)
    ... 5 more
    Caused by: java.net.ConnectException: Connection refused


    2020-12-23 23:30:10
  • 徐老师 replied to asker 源自我心 #2
    The error keeps saying that the connection to localhost is refused. From the logs this looks like a distributed cluster, so it is most likely a configuration issue (a hostname-check sketch follows this thread). Are you in the IMOOC big-data group? Send me a message there and we can go through the details.
    2020-12-23 23:37:07
  • 徐老师 replied to asker 源自我心 #3
    When you get a chance, please message me in the group so we can discuss this problem in detail.
    2020-12-24 12:11:15
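Not part of the original thread, but since the diagnosis above points at the NodeManagers registering themselves as localhost, here is a minimal check-and-fix sketch, assuming a cluster like the one in the logs (the hostnames and IPs in the comments are placeholders, not values from the thread):

```bash
# Each node should report its real hostname, not localhost /
# localhost.localdomain:
hostname -f

# /etc/hosts on every node should map each cluster IP to its hostname,
# for example (placeholder values):
#   192.168.93.128  BigData01
#   192.168.93.129  BigData02
cat /etc/hosts

# After correcting the hostnames and restarting YARN, the NodeManagers
# should register with real addresses instead of localhost:
yarn node -list -all
```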