Hello Martin B,
Synapse creates a Spark session automatically when you start a notebook. This session is managed by the Synapse service and is typically intended to last for the entire duration of the notebook execution. Stopping and restarting the Spark session using spark.stop() in your code may disrupt the integration between the notebook and the underlying Synapse service, leading to incomplete or missing UI elements in the Spark History Server.
I have personally seen errors such as the driver running into out-of-memory situations after calling spark.stop().
Alternatives:
The cleanest way to separate your Spark jobs would be to use different notebooks or pipelines for each step. This ensures each step runs as a distinct Spark application, with its own Spark session and context. Yes, there is a startup cost, but this approach simplifies monitoring, logging, and troubleshooting. You can orchestrate these notebooks in an Azure Synapse Pipeline if you want to maintain the overall flow and use the built-in monitoring features.
If using separate notebooks isn't an option due to startup time concerns, another approach is to manage all the Spark jobs within a single session and tag related jobs (for example, with job groups and descriptions) so each logical step stays easy to tell apart in the Spark UI; see the sketch just below.
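Here is a minimal sketch of that idea, assuming the built-in `spark` session that Synapse creates for the notebook; the group names, storage paths, and column name are only placeholders for your own workload:

```python
# Minimal sketch, assuming the built-in `spark` session Synapse provides.
# Group names, paths, and column names below are placeholders.

# Tag every job triggered in this step as "step-1-ingest" so it is easy
# to find in the Spark UI / History Server.
spark.sparkContext.setJobGroup("step-1-ingest", "Load and clean the raw data")
raw_df = spark.read.parquet("abfss://<container>@<account>.dfs.core.windows.net/raw/")
clean_df = raw_df.dropna()
clean_df.cache().count()

# Re-tag before the next logical step; later jobs keep whatever group was
# set last, so set a new group at the start of each step.
spark.sparkContext.setJobGroup("step-2-aggregate", "Aggregate the cleaned data")
summary_df = clean_df.groupBy("some_key").count()
summary_df.show()

# A whole group can also be cancelled if a step misbehaves:
# spark.sparkContext.cancelJobGroup("step-2-aggregate")
```

The description you pass to setJobGroup shows up on the Jobs page of the Spark UI, so even though everything runs as one application you can still see which step produced which jobs.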
Please see the article below, which explains how multiple SparkSessions can be created under one SparkContext:
https://www.waitingforcode.com/apache-spark-sql/multiple-sparksession-one-sparkcontext/read
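In case a concrete illustration helps, here is a minimal sketch of that pattern, again assuming the built-in `spark` session: SparkSession.newSession() gives you a second session that shares the same SparkContext (so no new application is started) but keeps its own SQL configuration and temporary views.

```python
# Minimal sketch, assuming the `spark` session Synapse provides.
session_a = spark                 # the session created for the notebook
session_b = spark.newSession()    # same SparkContext, separate SQL conf and temp views

# SQL settings can differ per session without affecting the other one.
session_a.conf.set("spark.sql.shuffle.partitions", "8")
session_b.conf.set("spark.sql.shuffle.partitions", "200")

# Temporary views are scoped to the session that created them.
session_a.range(10).createOrReplaceTempView("numbers")
print([t.name for t in session_a.catalog.listTables()])  # includes 'numbers'
print([t.name for t in session_b.catalog.listTables()])  # 'numbers' is not visible here

# Both sessions still belong to the same Spark application.
print(session_a.sparkContext.applicationId == session_b.sparkContext.applicationId)  # True
```

Keep in mind that this only isolates SQL state; it is still one Spark application in the Spark History Server, so it won't give you one entry per step the way separate notebooks would.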
Also, the Microsoft documentation below provides guidance on how Spark is managed within Synapse:
I hope this helps.