Why does a Spark job submitted on YARN keep showing ACCEPTED?

# Problem description
When I submit a Spark job on YARN, the application keeps reporting ACCEPTED, and only after a long while does it fail with the error below. What is the cause, and how do I fix it?
# Error message screenshot

# Relevant course content screenshot

# Attempted solutions and results

# Full relevant code, with comments (please do not paste screenshots)

[root@BigData04 spark-3.0.1-bin-hadoop3.2]# bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster examples/jars/spark-examples_2.12-3.0.1.jar 2

2020-12-23 21:47:51,970 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

2020-12-23 21:47:52,089 INFO client.RMProxy: Connecting to ResourceManager at BigData01/192.168.93.128:8032

2020-12-23 21:47:52,479 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers

2020-12-23 21:47:53,146 INFO conf.Configuration: resource-types.xml not found

2020-12-23 21:47:53,147 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.

2020-12-23 21:47:53,204 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)

2020-12-23 21:47:53,205 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead

2020-12-23 21:47:53,205 INFO yarn.Client: Setting up container launch context for our AM

2020-12-23 21:47:53,206 INFO yarn.Client: Setting up the launch environment for our AM container

2020-12-23 21:47:53,219 INFO yarn.Client: Preparing resources for our AM container

2020-12-23 21:47:53,367 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.

2020-12-23 21:47:55,797 INFO yarn.Client: Uploading resource file:/tmp/spark-073dfd19-bf54-4ccf-9f93-9cfd7871e042/__spark_libs__6704468600548540756.zip -> hdfs://BigData01:9000/user/root/.sparkStaging/application_1608731015658_0001/__spark_libs__6704468600548540756.zip

2020-12-23 21:47:58,923 INFO yarn.Client: Uploading resource file:/data/software/spark-3.0.1-bin-hadoop3.2/examples/jars/spark-examples_2.12-3.0.1.jar -> hdfs://BigData01:9000/user/root/.sparkStaging/application_1608731015658_0001/spark-examples_2.12-3.0.1.jar

2020-12-23 21:47:59,274 INFO yarn.Client: Uploading resource file:/tmp/spark-073dfd19-bf54-4ccf-9f93-9cfd7871e042/__spark_conf__3726838920049808942.zip -> hdfs://BigData01:9000/user/root/.sparkStaging/application_1608731015658_0001/__spark_conf__.zip

2020-12-23 21:47:59,364 INFO spark.SecurityManager: Changing view acls to: root

2020-12-23 21:47:59,365 INFO spark.SecurityManager: Changing modify acls to: root

2020-12-23 21:47:59,365 INFO spark.SecurityManager: Changing view acls groups to: 

2020-12-23 21:47:59,366 INFO spark.SecurityManager: Changing modify acls groups to: 

2020-12-23 21:47:59,366 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()

2020-12-23 21:47:59,436 INFO yarn.Client: Submitting application application_1608731015658_0001 to ResourceManager

2020-12-23 21:47:59,831 INFO impl.YarnClientImpl: Submitted application application_1608731015658_0001

2020-12-23 21:48:00,838 INFO yarn.Client: Application report for application_1608731015658_0001 (state: ACCEPTED)

2020-12-23 21:48:00,841 INFO yarn.Client: 

client token: N/A

diagnostics: [Wed Dec 23 21:48:00 +0800 2020] Scheduler has assigned a container for AM, waiting for AM container to be launched

ApplicationMaster host: N/A

ApplicationMaster RPC port: -1

queue: default

start time: 1608731279575

final status: UNDEFINED

tracking URL: http://BigData01:8088/proxy/application_1608731015658_0001/

user: root

2020-12-23 21:48:01,845 INFO yarn.Client: Application report for application_1608731015658_0001 (state: ACCEPTED)

.....

2020-12-23 21:48:09,879 INFO yarn.Client: Application report for application_1608731015658_0001 (state: ACCEPTED)


2020-12-23 22:06:42,871 INFO yarn.Client: 

client token: N/A

diagnostics: Application application_1608731015658_0001 failed 2 times due to Error launching appattempt_1608731015658_0001_000002. Got exception: java.net.ConnectException: Call From localhost/127.0.0.1 to localhost:43799 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

at sun.reflect.GeneratedConstructorAccessor47.newInstance(Unknown Source)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:837)

at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:757)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1566)

at org.apache.hadoop.ipc.Client.call(Client.java:1508)

at org.apache.hadoop.ipc.Client.call(Client.java:1405)

at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:234)

at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:119)

at com.sun.proxy.$Proxy83.startContainers(Unknown Source)






1 Answer
徐老师 2020-12-23 22:37:09

1: Check the detailed logs in the YARN web UI on port 8088 and post a screenshot; the error message shown here is not very informative (a command-line sketch for pulling the full application log follows this answer).

2: For now I suggest using the Spark 2.4 build provided with the course. 2.x is still the most widely used and most stable line in industry, and since 2.x to 3.x is a major version change, you may otherwise also run into API-level differences from the way things are written in the course.

3: I will test tomorrow whether this problem can be reproduced on Spark 3.x; I am traveling today, so I will check it tomorrow.
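Not part of the original answer, but a minimal sketch of point 1 from the command line: assuming YARN log aggregation is enabled on this cluster, the container logs for the application id seen in the spark-submit output above can be pulled with `yarn logs`:

```bash
# Fetch the aggregated container logs for the failed application
# (application id copied from the spark-submit output above).
# If log aggregation is disabled, check the NodeManager's local
# container log directory on each node instead.
yarn logs -applicationId application_1608731015658_0001 | less
```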

  • Asker 源自我心 #1
    2020-12-23 23:22:29,709 ERROR yarn.YarnAllocator: Failed to launch executor 11 on container container_1608735746030_0002_01_000012
    org.apache.spark.SparkException: Exception while starting container container_1608735746030_0002_01_000012 on host localhost
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:129)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:68)
    at org.apache.spark.deploy.yarn.YarnAllocator.$anonfun$runAllocatedContainers$4(YarnAllocator.scala:570)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Caused by: java.net.ConnectException: Call From localhost.localdomain/127.0.0.1 to localhost:33737 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:207)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:125)
    ... 5 more
    Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
    at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)
    at org.apache.hadoop.ipc.Client.call(Client.java:1403)
    ... 21 more
    2020-12-23 23:22:29,709 ERROR yarn.YarnAllocator: Failed to launch executor 12 on container container_1608735746030_0002_01_000013
    org.apache.spark.SparkException: Exception while starting container container_1608735746030_0002_01_000013 on host localhost
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:129)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:68)
    at org.apache.spark.deploy.yarn.YarnAllocator.$anonfun$runAllocatedContainers$4(YarnAllocator.scala:570)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Caused by: java.net.ConnectException: Call From localhost.localdomain/127.0.0.1 to localhost:45498 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:125)
    ... 5 more
    Caused by: java.net.ConnectException: Connection refused


    2020-12-23 23:30:10
  • 徐老师 replied to asker 源自我心 #2
    The error keeps saying that the connection to localhost is refused. From the logs this looks like a distributed cluster, so it is most likely a configuration issue (a hostname-check sketch follows this thread). Are you in the IMOOC big-data group? Send me a message there and we can go through the details.
    2020-12-23 23:37:07
  • 徐老师 replied to asker 源自我心 #3
    When you get a chance, please message me in the group so we can discuss this problem in detail.
    2020-12-24 12:11:15
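Not part of the original thread, but since the diagnosis above points at the NodeManagers registering themselves as localhost, here is a minimal check-and-fix sketch, assuming a cluster like the one in the logs (the hostnames and IPs in the comments are placeholders, not values from the thread):

```bash
# Each node should report its real hostname, not localhost /
# localhost.localdomain:
hostname -f

# /etc/hosts on every node should map each cluster IP to its hostname,
# for example (placeholder values):
#   192.168.93.128  BigData01
#   192.168.93.129  BigData02
cat /etc/hosts

# After correcting the hostnames and restarting YARN, the NodeManagers
# should register with real addresses instead of localhost:
yarn node -list -all
```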