Hi @Iwan
Welcome to Microsoft Q&A, and thank you for posting your question here.
As I understand it, you're facing a common issue: the notebook activity in your pipeline is delayed by the time it takes for the Spark cluster to start.
Currently, Azure Synapse pipelines offer no direct way to pre-warm or start the Spark cluster before the pipeline reaches the notebook activity.
However, as a workaround, you can start the cluster earlier in the pipeline so that it is already running when your notebook activity begins:
- Insert a dummy notebook activity at the start of your pipeline to start the cluster. This notebook can be very simple, for example a single cell that runs print("Pre-warming cluster"). The idea is to make sure the cluster is up and running by the time your main notebook activity is needed.
- Add a Wait activity after the dummy notebook to give the cluster enough time to start. Set this delay to roughly the time it takes for the cluster to start (e.g., 5 minutes).
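- After the Wait activity, proceed with your existing activities (running the SQL queries, copying the output to a Parquet file, and running the main notebook activity). Because a Notebook activity only completes after its Spark session has started, you may save more time by running the dummy notebook in parallel with the SQL and copy activities instead of before them, so the cluster starts while those activities run.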
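The steps above can be sketched as the activity list of a pipeline definition. This is a minimal sketch, assuming the usual Synapse pipeline JSON shape (`SynapseNotebook` and `Wait` activities chained with `dependsOn`). The pipeline, notebook, Spark pool and activity names are hypothetical placeholders, and your SQL and copy activities are left out. Python is used only to generate the JSON, which you can compare with your pipeline's code (JSON) view in Synapse Studio.

```python
import json

# Hypothetical names: replace them with your own notebooks, Spark pool and activities.
SPARK_POOL = "mySparkPool"          # assumption: name of your Synapse Spark pool
WARMUP_NOTEBOOK = "WarmUpNotebook"  # assumption: its only cell is print("Pre-warming cluster")
MAIN_NOTEBOOK = "MainNotebook"      # assumption: your existing main notebook
WAIT_SECONDS = 300                  # ~5 minutes; tune to your observed start-up time


def depends_on(*activities):
    """Each listed activity must succeed before this one starts."""
    return [{"activity": a, "dependencyConditions": ["Succeeded"]} for a in activities]


def notebook_activity(name, notebook, after=()):
    """A Notebook activity that runs `notebook` on the Spark pool."""
    return {
        "name": name,
        "type": "SynapseNotebook",
        "dependsOn": depends_on(*after),
        "typeProperties": {
            "notebook": {"referenceName": notebook, "type": "NotebookReference"},
            "sparkPool": {"referenceName": SPARK_POOL, "type": "BigDataPoolReference"},
        },
    }


def wait_activity(name, seconds, after=()):
    """A Wait activity that pauses the pipeline for `seconds`."""
    return {
        "name": name,
        "type": "Wait",
        "dependsOn": depends_on(*after),
        "typeProperties": {"waitTimeInSeconds": seconds},
    }


def build_pipeline():
    activities = [
        notebook_activity("WarmUpCluster", WARMUP_NOTEBOOK),
        wait_activity("WaitForCluster", WAIT_SECONDS, after=["WarmUpCluster"]),
        # In the real pipeline, the SQL query and Parquet copy activities sit here,
        # and the main notebook depends on them rather than directly on the Wait.
        notebook_activity("RunMainNotebook", MAIN_NOTEBOOK, after=["WaitForCluster"]),
    ]
    return {"name": "PipelineWithWarmUp", "properties": {"activities": activities}}


if __name__ == "__main__":
    print(json.dumps(build_pipeline(), indent=2))
```

The `dependsOn` entries are what enforce the order. If your existing SQL and copy activities do not depend on the warm-up steps, they run in parallel with the cluster start-up instead of after it.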
Hope this helps. Do let us know if you have any further queries.
If this answers your query, do click Accept Answer and Yes for "Was this answer helpful?".