Synapse Spark Notebook runs manually fine - times out in Pipeline. Why?

Mark Mace 71 Reputation points
2023-08-16T16:47:37.3833333+00:00

The Spark Notebook in question:

If run manually, it takes 2.5 minutes and finishes successfully.

If run in a pipeline, it errors out after 2 hours with the error message below:

"errorCode": "6002", "message": "Error: Job aborted.\norg.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError

Watching the pipeline run, the notebook appears to hang on its last step, which is an INSERT.

I've tried:

  • Running the Notebook manually (as noted, works fine; finishes in 2.5 minutes)
  • Running the Notebook manually as managed identity (works the same: fine)
  • Creating a brand new pipeline with only the troublesome Spark Notebook in it. That pipeline also fails after 2 hours (a timeout, I'm guessing).
  • Running the pipeline on a large Spark pool. It's at 24 minutes and counting, so I'm pretty sure this is going to fail too.
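
One further check that might help narrow this down is comparing the effective Spark settings of the manual run against the pipeline run. The following is only a minimal sketch: `diff_conf` is a hypothetical helper, and the sample values are made up for illustration, not taken from the real runs. In the notebook itself, the settings could be captured with `dict(spark.sparkContext.getConf().getAll())` and printed in each run.

```python
def diff_conf(manual, pipeline):
    """Return {key: (manual_value, pipeline_value)} for settings that differ.

    A key missing on one side shows up with None for that side.
    """
    keys = set(manual) | set(pipeline)
    return {
        k: (manual.get(k), pipeline.get(k))
        for k in sorted(keys)
        if manual.get(k) != pipeline.get(k)
    }


# In each run, the notebook could capture its settings with:
#   conf = dict(spark.sparkContext.getConf().getAll())
# The dictionaries below are hypothetical sample values, for illustration only.
manual_conf = {
    "spark.executor.instances": "2",
    "spark.sql.shuffle.partitions": "200",
}
pipeline_conf = {
    "spark.executor.instances": "1",
    "spark.sql.shuffle.partitions": "200",
}

for key, (m, p) in diff_conf(manual_conf, pipeline_conf).items():
    print(f"{key}: manual={m} pipeline={p}")
```

Any setting that shows up in the diff (pool size, executor count, session-level options) would be a candidate explanation for why the same INSERT behaves differently in the pipeline.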

Thoughts?

Similar Post: https://www.reddit.com/r/AZURE/comments/15sv3es/synapse_spark_noteobook_runs_fine_normally_times/

Azure Synapse Analytics