How to forcibly restart the cluster when a notebook activity retries?

Tian, Xinyong 0 Reputation points
2023-03-20T16:16:05.4733333+00:00

Hi

I have a Synapse pipeline with a notebook activity. Occasionally the notebook fails, so I changed the retry option to 2. The notebook does retry when it fails and runs successfully on the second try. However, I find that the Spark application from the first notebook session does not stop when the retry starts; it keeps running alongside the second Spark application. The Spark pool is set to autoscale and is large enough to hold two notebooks running at the same time, so more nodes are allocated for the retry notebook. In the end, it costs more than double the normal amount. Is there a way to forcibly stop the first Spark application, or the Spark cluster, when the first notebook fails? I plan to turn off autoscaling on the Spark pool and set the number of nodes to only enough to run one notebook, but I am not sure whether that will work.

Thanks.

Azure Synapse Analytics
Azure Data Factory

2 answers

  1. Bhargava-MSFT 30,891 Reputation points Microsoft Employee
    2023-03-21T20:02:25.4866667+00:00

    Hello @Tian, Xinyong,

    Welcome to the MS Q&A platform.

    You can use the mssparkutils.session.stop() API to stop the current interactive session asynchronously in the background. It stops the Spark session and releases the resources occupied by the session so that they are available to other sessions in the same pool.

    You can add the mssparkutils.session.stop() command at the end of your notebook code. The Spark session will then be stopped and its resources released, regardless of whether the notebook execution succeeds or fails.

    Please note: this will stop the entire Spark session associated with the notebook run, which may impact other running jobs or notebooks that are using the same Spark cluster.
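
    For example, here is a minimal sketch of one way to wire in the stop call so that it runs even if the code above it raises an exception; the process_data() function below is a hypothetical placeholder for your actual notebook logic.

        # Synapse notebook cell (PySpark). mssparkutils is built into Synapse
        # notebooks; the explicit import below is optional there.
        from notebookutils import mssparkutils

        def process_data():
            # Hypothetical placeholder for the real notebook logic.
            df = spark.range(10)
            df.show()

        try:
            process_data()
        finally:
            # Stop the interactive session and release the pool nodes,
            # whether the work above succeeded or failed.
            mssparkutils.session.stop()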

    Alternatively, you can use the Spark Session - Cancel Spark Session REST API to cancel a running Spark session:

    DELETE {endpoint}/livyApi/versions/{livyApiVersion}/sparkPools/{sparkPoolName}/sessions/{sessionId}
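
    For reference, here is a rough Python sketch of calling that endpoint with the requests and azure-identity packages. The workspace name, Spark pool name, session ID, and livyApiVersion values are placeholders you would need to fill in from your own environment, and the caller needs the appropriate Synapse RBAC role on the workspace.

        # Sketch: cancel a running Spark session via the Synapse data-plane REST API.
        # Requires: pip install requests azure-identity
        import requests
        from azure.identity import DefaultAzureCredential

        workspace = "<your-workspace-name>"      # placeholder
        spark_pool = "<your-spark-pool-name>"    # placeholder
        session_id = 123                         # placeholder: Livy session ID to cancel
        livy_api_version = "<livyApiVersion>"    # placeholder: see the REST API reference

        # Acquire an Azure AD token for the Synapse data plane.
        credential = DefaultAzureCredential()
        token = credential.get_token("https://dev.azuresynapse.net/.default").token

        url = (
            f"https://{workspace}.dev.azuresynapse.net/livyApi/versions/{livy_api_version}"
            f"/sparkPools/{spark_pool}/sessions/{session_id}"
        )
        resp = requests.delete(url, headers={"Authorization": f"Bearer {token}"})
        resp.raise_for_status()
        print(f"Cancel request sent, HTTP status {resp.status_code}")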

    Reference documents:
    https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/microsoft-spark-utilities?pivots=programming-language-python#stop-an-interactive-session

    https://learn.microsoft.com/en-us/rest/api/synapse/data-plane/spark-session/cancel-spark-session?tabs=HTTP

    I hope this helps. Please let me know if you have any further questions.

    If this answers your question, please consider accepting it by clicking "Accept answer" and up-voting; this helps other community members find answers to similar questions.


  2. Tian, Xinyong 0 Reputation points
    2023-03-27T15:00:52.5+00:00

    @Bhargava-MSFT, thank you for your answer. I have not tried it yet, but I doubt it will work, because when the notebook fails it probably fails in the middle of the notebook, so putting mssparkutils.session.stop() at the end of the notebook probably won't have any effect. Instead, I increased the 'Retry interval' on the notebook activity from the previous 10 minutes to 1800 s (30 min). So far this works. I think the Spark pool auto-pause needs more time before it kicks in (auto-pause is set to 5 min).

