Long-running Synapse Spark notebook fails with HTTP status code 400 error

heta desai 247 Reputation points
2022-09-30T07:21:37.717+00:00

Hi,

I have created a Synapse notebook in which, using PySpark, I join multiple Delta Lake tables and write the result to an Azure SQL table. The Delta Lake tables contain about 142 million records. The notebook is executed from a Synapse pipeline and fails with the error below:

{
"errorCode": "BadRequest",
"message": "Operation on target Write_AzureSQL failed: InvalidHttpRequestToLivy: Submission failed due to error content =[\"requirement failed: Session isn't active.\"] HTTP status code: 400. Trace ID: 9f6d18b6-3af0-432c-a5a2-1339fa4c55c7.",
"failureType": "UserError",
"target": "Phase 3",
"details": ""
}
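For context, the failing step is essentially a Delta-to-JDBC job along these lines (a minimal sketch; the table paths, join key, server name, and credentials below are placeholders, not the asker's actual values):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the Delta Lake tables (storage paths are placeholders)
orders = spark.read.format("delta").load(
    "abfss://data@account.dfs.core.windows.net/delta/orders")
customers = spark.read.format("delta").load(
    "abfss://data@account.dfs.core.windows.net/delta/customers")

# Join on an assumed common key
result = orders.join(customers, on="customer_id", how="inner")

# Write the joined result to an Azure SQL table over JDBC
(result.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
    .option("dbtable", "dbo.TargetTable")
    .option("user", "sqluser")
    .option("password", "********")
    .mode("overwrite")
    .save())
```

With 142 million rows, a job like this can run long enough for the Livy session to time out before the write completes, which matches the "Session isn't active" error above.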

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

1 answer

  1. Vinodh247-1375 11,301 Reputation points
    2022-09-30T07:31:32.56+00:00

    Hi

    Thanks for reaching out to Microsoft Q&A.

    This is a Spark error. Please see the link below; it explains the settings needed to avoid this error.

    Setting "livy.server.session.timeout": "10h" addresses errors from long-running Spark tasks in a Jupyter/EMR notebook that die after an hour of execution:

    An error was encountered:
    Invalid status code '400' from https://xxx.xx.x.xxx:18888/sessions/0/statements/20 with error payload: "requirement failed: Session isn't active." [reference]

    https://towardsdatascience.com/how-to-set-up-a-cost-effective-aws-emr-cluster-and-jupyter-notebooks-for-sparksql-552360ffd4bc
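    In an Azure Synapse notebook, Spark/Livy settings can be requested through the `%%configure` magic in the first cell, before the session starts. A sketch, assuming the Livy timeout property from the linked EMR article applies to the Synapse Livy endpoint as well:

    ```
    %%configure -f
    {
        "conf": {
            "livy.server.session.timeout": "10h"
        }
    }
    ```

    If the property is not honored by your Synapse Spark pool, the session timeout can also be adjusted in the Synapse notebook session settings; either way, the goal is the same: keep the Livy session alive longer than the 142-million-row join and write.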

    Please upvote and accept as answer if the reply was helpful; this will help other community members.