Notebook clusters in pipelines

Iwan 65 Reputation points
2024-10-22T09:32:33.5133333+00:00

I have a pipeline that runs SQL queries, copies the output into a Parquet file, and then uses that output in a notebook activity.

It takes 3 minutes to run the SQL queries, then 5 minutes for the cluster to start, and then 2 minutes to run the notebook.

Is it possible to start the cluster in the pipeline in advance so I don't have to wait so long before running the notebook?

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Answer accepted by question author
  1. Smaran Thoomu 32,265 Reputation points Microsoft External Staff Moderator
    2024-10-22T13:34:27.7433333+00:00

    Hi @Iwan

    Welcome to Microsoft Q&A, and thank you for posting your question here.

    As I understand it, the notebook activity in your pipeline is delayed by the time it takes for the cluster to start, which is a common issue.

    Currently, in Azure Synapse Pipelines, there is no direct way to pre-warm or start the Spark cluster before reaching the notebook activity step.

    However, as a workaround you can trigger the cluster start-up earlier in the pipeline, which reduces the wait before your main notebook activity runs:

    • Insert a dummy notebook activity at the start of your pipeline that will effectively start the cluster. This notebook can be very simple, such as running a basic command like print("Pre-warming cluster"). The idea is to make sure the cluster is up and running by the time your main notebook activity is needed.
    • Add a Wait activity after the dummy notebook to ensure the cluster has enough time to start up. You can set this delay to the approximate time it takes for the cluster to start (e.g., 5 minutes).
    • After the Wait activity, proceed with your existing activities (running SQL queries, copying output to a Parquet file, running the main notebook activity).
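    The three steps above can be sketched as a pipeline definition. This is only a rough illustration built as a Python dictionary so the ordering is easy to read: the activity names, notebook names, and the `build_pipeline` helper are hypothetical, the SQL/copy step is a stub with its settings omitted, and the property names (`SynapseNotebook`, `Wait`, `waitTimeInSeconds`, `dependsOn`) are my understanding of the pipeline JSON, so please compare them against the JSON view of your own pipeline in Synapse Studio.

```python
import json


def depends_on(activity_name):
    """Run only after the named activity has succeeded."""
    return [{"activity": activity_name, "dependencyConditions": ["Succeeded"]}]


def notebook_activity(name, notebook_name, after=None):
    """Notebook activity; notebook_name is a hypothetical notebook in the workspace."""
    return {
        "name": name,
        "type": "SynapseNotebook",
        "dependsOn": depends_on(after) if after else [],
        "typeProperties": {
            "notebook": {"referenceName": notebook_name, "type": "NotebookReference"}
        },
    }


def wait_activity(name, seconds, after):
    """Wait activity that pauses the pipeline for a fixed number of seconds."""
    return {
        "name": name,
        "type": "Wait",
        "dependsOn": depends_on(after),
        "typeProperties": {"waitTimeInSeconds": seconds},
    }


def build_pipeline(wait_seconds=300):
    """Assemble the workaround: dummy notebook -> Wait -> existing steps -> main notebook."""
    activities = [
        # Step 1: dummy notebook containing only print("Pre-warming cluster").
        notebook_activity("PrewarmCluster", "prewarm_notebook"),
        # Step 2: wait roughly as long as the cluster takes to start (about 5 minutes).
        wait_activity("WaitForCluster", wait_seconds, after="PrewarmCluster"),
        # Step 3a: stand-in for the existing SQL-to-Parquet step (source/sink settings omitted).
        {
            "name": "CopySqlToParquet",
            "type": "Copy",
            "dependsOn": depends_on("WaitForCluster"),
            "typeProperties": {},
        },
        # Step 3b: the existing main notebook, run after the copy step.
        notebook_activity("RunMainNotebook", "main_notebook", after="CopySqlToParquet"),
    ]
    return {"name": "PrewarmThenRun", "properties": {"activities": activities}}


if __name__ == "__main__":
    print(json.dumps(build_pipeline(), indent=2))
```

    The `wait_seconds` value is just the 5-minute start-up time from the question expressed in seconds; adjust it to what you observe in your own runs.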

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful".

