Connect pipeline to established Spark Pool

Yaroslaw Khomenko [C] 21 Reputation points
2021-05-23T08:54:45.687+00:00

Is there a way to speed up a pipeline that executes a notebook on a Spark pool?
The pipeline itself executes for 30-35 seconds, but cluster startup takes around 3 minutes, which is very inefficient.

For example, in Databricks you can run a job on a cluster that is already running, so you don't need to wait for the cluster to start. How can I achieve the same in Synapse?

Thanks in advance,
Yaro

Azure Synapse Analytics

Accepted answer
  1. Samara Soucy - MSFT 5,141 Reputation points
    2021-05-26T02:10:29.467+00:00

    I'm guessing you are asking about the idle pools feature in Databricks? You can prevent the pool from being deallocated by turning off auto-shutdown, and paying the related costs, but you currently can't avoid creating a fresh cluster for a given job.

    We always take requests like this from the forums back to the product teams, but you can also put a feature request on feedback.azure.com since the vote system there helps with feature prioritization.
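
To put the numbers from the question in perspective, here is a rough sketch in Python of how much of each run is spent waiting for the cluster. The timings are the approximate figures quoted in the question; the function name `startup_overhead_share` is made up for illustration and is not part of any Synapse API.

```python
def startup_overhead_share(startup_s: float, run_s: float) -> float:
    """Return the fraction of total wall-clock time spent on cluster startup."""
    return startup_s / (startup_s + run_s)


# Approximate figures from the question: ~3 minutes startup, 30-35 s of work.
STARTUP_S = 3 * 60

for run_s in (30, 35):
    share = startup_overhead_share(STARTUP_S, run_s)
    print(f"run={run_s}s -> {share:.0%} of total time is cluster startup")
```

With those figures, roughly 84 to 86 percent of each run is startup time, which is the overhead you would be weighing against the cost of keeping the pool allocated.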

