Long-running Synapse Spark notebook fails with HTTP status code 400 error

heta desai 247 Reputation points
2022-09-30T07:21:37.717+00:00

Hi,

I have created a Synapse notebook in which, using PySpark, I join multiple Delta Lake tables and write the result to an Azure SQL table. The Delta Lake tables contain about 142 million records. The notebook is executed from a Synapse pipeline and fails with the error below:

{
"errorCode": "BadRequest",
"message": "Operation on target Write_AzureSQL failed: InvalidHttpRequestToLivy: Submission failed due to error content =[\"requirement failed: Session isn't active.\"] HTTP status code: 400. Trace ID: 9f6d18b6-3af0-432c-a5a2-1339fa4c55c7.",
"failureType": "UserError",
"target": "Phase 3",
"details": ""
}
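For context, the failing step is essentially a Delta-to-JDBC job along these lines (a minimal sketch; the table paths, join key, server name, and credentials below are placeholders, not the asker's actual values):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the Delta Lake tables (storage paths are placeholders)
orders = spark.read.format("delta").load(
    "abfss://data@account.dfs.core.windows.net/delta/orders")
customers = spark.read.format("delta").load(
    "abfss://data@account.dfs.core.windows.net/delta/customers")

# Join on an assumed common key
result = orders.join(customers, on="customer_id", how="inner")

# Write the joined result to an Azure SQL table over JDBC
(result.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
    .option("dbtable", "dbo.TargetTable")
    .option("user", "sqluser")
    .option("password", "********")
    .mode("overwrite")
    .save())
```

With 142 million rows, a job like this can run long enough for the Livy session to time out before the write completes, which matches the "Session isn't active" error above.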

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

1 answer

  1. Vinodh247-1375 11,301 Reputation points
    2022-09-30T07:31:32.56+00:00

    Hi

    Thanks for reaching out to Microsoft Q&A.

    This is a Spark error. Please see the link below; it explains the settings needed to avoid this error.

    Setting "livy.server.session.timeout": "10h" addresses errors from long-running Spark tasks in a Jupyter/EMR notebook that die after an hour of execution:

    An error was encountered:
    Invalid status code '400' from https://xxx.xx.x.xxx:18888/sessions/0/statements/20 with error payload: "requirement failed: Session isn't active." [reference]

    https://towardsdatascience.com/how-to-set-up-a-cost-effective-aws-emr-cluster-and-jupyter-notebooks-for-sparksql-552360ffd4bc
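    In an Azure Synapse notebook, Spark/Livy settings can be requested through the `%%configure` magic in the first cell, before the session starts. A sketch, assuming the Livy timeout property from the linked EMR article applies to the Synapse Livy endpoint as well:

    ```
    %%configure -f
    {
        "conf": {
            "livy.server.session.timeout": "10h"
        }
    }
    ```

    If the property is not honored by your Synapse Spark pool, the session timeout can also be adjusted in the Synapse notebook session settings; either way, the goal is the same: keep the Livy session alive longer than the 142-million-row join and write.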

    Please upvote and accept as answer if the reply was helpful; this will help other community members.