From the article:
[
  {"classification": "spark", "properties": {"maximizeResourceAllocation": "true"}},
  {"classification": "spark-defaults", "properties": {"spark.network.timeout": "1500"}},
  {"classification": "hdfs-site", "properties": {"dfs.replication": "2"}},
  {"classification": "livy-conf", "properties": {"livy.server.session.timeout": "10h"}},
  {"classification": "emrfs-site", "properties": {"fs.s3.maxConnections": "100"}}
]
explanation:
"maximizeResourceAllocation":"true" -- Configures your executors to use the maximum resources available on each node in the cluster. This EMR-specific option calculates the maximum compute and memory resources available to an executor on an instance in the core instance group, then sets the corresponding spark-defaults settings accordingly. [reference]

"livy.server.session.timeout":"10h" -- Addresses errors from long-running Spark tasks in a Jupyter/EMR notebook that die after an hour of execution:

An error was encountered:
Invalid status code '400' from https://xxx.xx.x.xxx:18888/sessions/0/statements/20 with error payload: "requirement failed: Session isn't active." [reference]

"fs.s3.maxConnections":"100" -- Addresses the "Timeout waiting for connection from pool" error. [reference]
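One way to keep a configuration like this maintainable is to build it in Python and serialize it to a file that can then be attached at cluster creation (for example via `aws emr create-cluster --configurations file://...`). A minimal sketch, mirroring the article's JSON exactly; the filename `configurations.json` is an assumption:

```python
import json

# Classification -> properties mapping, taken verbatim from the article's JSON.
settings = {
    "spark": {"maximizeResourceAllocation": "true"},
    "spark-defaults": {"spark.network.timeout": "1500"},
    "hdfs-site": {"dfs.replication": "2"},
    "livy-conf": {"livy.server.session.timeout": "10h"},
    "emrfs-site": {"fs.s3.maxConnections": "100"},
}

# Expand into the list-of-objects shape shown above.
configurations = [
    {"classification": name, "properties": props}
    for name, props in settings.items()
]

# Write to a file for use at cluster-creation time (filename is illustrative).
with open("configurations.json", "w") as f:
    json.dump(configurations, f, indent=2)
```

Building the list from a dict keeps each classification's properties in one place and makes it harder to introduce a duplicate or malformed entry than editing the raw JSON by hand.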