Failed to submit Azure Synapse Analytics Spark Job

Anjan Pradhan 21 Reputation points
2021-04-14T22:32:21.357+00:00

I've created an Azure Synapse workspace (and related resources such as ADLS Gen2 storage and an Apache Spark pool). I uploaded the Spark Pi example JAR to the linked ADLS Gen2 storage and created a Spark job definition to run it. However, submitting the Spark job fails with an error.

On the Spark Job Definition page within Azure Synapse Studio, I am seeing these messages:

4:02:37 PM Submit Apache Spark job start.
           Submitting job "Spark job definition 1"...

4:09:00 PM Failed to submit the Spark job
           Spark monitoring URL: ...

On the Spark Application monitor page, these are the logs from Livy:

    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/04/14 21:04:07 INFO ShutdownHookManager: Shutdown hook called
21/04/14 21:04:07 INFO ShutdownHookManager: Deleting directory /tmp/spark-9d17628a-ed29-44a1-b7f2-0e895c53c519
21/04/14 21:04:07 INFO MetricsSystemImpl: Stopping azure-file-system metrics system...
21/04/14 21:04:07 INFO MetricsSystemImpl: azure-file-system metrics system stopped.
21/04/14 21:04:07 INFO MetricsSystemImpl: azure-file-system metrics system shutdown complete.

stderr: 

YARN Diagnostics: 
No YARN application is found with tag livy-batch-0-bpyt81ry in 300 seconds. This may be because 1) spark-submit fail to submit application to YARN; or 2) YARN cluster doesn't have enough resources to start the application in time. Please check Livy log and YARN log to know the details.

The Spark Application monitor page also says:
This application failed due to the total number of errors: 1. View error details

The error details page contains this error:

{"StatusCode":500,"Message":"Authorization property is not specified in http request header, and/or incoming traffic is not from private link.","ExceptionDetail":"System.Exception: Authorization property is not specified in http request header, and/or incoming traffic is not from private link.\r\n   at Microsoft.Analytics.Clusters.Common.Web.PubSubAuthorizationMiddleWare.AuthorizeAsync(HttpContext context) in C:\\source\\Shared\\Web\\PubSubAuthorizationMiddleWare.cs:line 175\r\n   at Microsoft.Analytics.Clusters.Common.Web.PubSubAuthorizationMiddleWare.InvokeAsync(HttpContext context) in C:\\source\\Shared\\Web\\PubSubAuthorizationMiddleWare.cs:line 73\r\n   at Microsoft.Analytics.Clusters.Common.Web.ExceptionMiddleware.InvokeAsync(HttpContext httpContext) in C:\\source\\Shared\\Web\\ExceptionMiddleware.cs:line 54","ErrorType":"None","ErrorNumber":0,"ErrorOn":"2021-04-14T21:09:48.9457637+00:00"}

The job definition is very simple, and I believe I've set up all the access control correctly.
Could you please explain why I am getting this error?

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
  1. Samara Soucy - MSFT 4,931 Reputation points
    2021-04-15T14:51:54.05+00:00

    The most common cause of authorization issues in Synapse is permissions on the ADLS storage account. Can you please check to ensure that you have the Storage Blob Data Contributor role assigned to you on the storage account?
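
One way to run that check outside the portal is to list role assignments with the Azure CLI (`az role assignment list --assignee <user> --scope <storage-account-resource-id> --output json`) and scan the result. The Python sketch below assumes each entry in that JSON carries a `roleDefinitionName` field. The function name `has_role` and the sample data are hypothetical and purely illustrative.

```python
import json

REQUIRED_ROLE = "Storage Blob Data Contributor"


def has_role(assignments_json: str, role: str = REQUIRED_ROLE) -> bool:
    """Return True if any role assignment in the JSON list has the given role name."""
    assignments = json.loads(assignments_json)
    return any(a.get("roleDefinitionName") == role for a in assignments)


# Hypothetical sample data standing in for the CLI output (not real assignments).
without_role = json.dumps([{"roleDefinitionName": "Reader"}])
with_role = json.dumps(
    [{"roleDefinitionName": "Reader"}, {"roleDefinitionName": REQUIRED_ROLE}]
)

print(has_role(without_role))  # False: only Reader is assigned
print(has_role(with_role))     # True: the required role is present
```

If the check returns False for your account on the linked storage account, assigning the role there is the first thing to try.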

    If that does not clear the error and the Spark pool is brand new, it may also be worthwhile to delete and recreate the pool in case something was not provisioned properly.

    Please let me know if either of these clears the issue or if we need to do further troubleshooting.

