Understanding and Optimizing the Synapse Notebook (Apache Spark Pool)

Abinash Tumulu 56 Reputation points Microsoft Employee
2021-09-29T21:15:56.2+00:00

Hello Team,

We have a Synapse pipeline which has Notebook as an activity.

We ran the pipeline on several pool SKUs and recorded the timings below, but we cannot tell which SKU is the best choice for production.

Pool Size                                   Time Taken
Large (3-200 Nodes) – Auto Scale enabled    ~17 mins
Large (3 Nodes) – Auto Scale disabled       ~30 mins
Medium (3 Nodes) – Auto Scale disabled      ~45 mins

136386-notebook.jpg

Questions/Suggestions:

  1. Based on the timings above for the different pool sizes, which one is recommended for production workloads? How should we choose the most suitable one?
  2. In the execution details for one of the notebook sessions, the “Total Duration” and the “Playback” duration do not match. Why is there such a large difference? Is this expected?
  3. Is it good practice to customize the number of executors? If so, what is the best way to do it?
Azure Synapse Analytics

1 answer

  1. HimanshuSinha-msft 19,476 Reputation points Microsoft Employee
    2021-09-30T23:38:37.337+00:00

    Hello @Anonymous,
    Thanks for the question and for using the Microsoft Q&A platform.

    I would start with the workload itself: what are you trying to process, and is the data that Spark consumes partitioned or not? For example, if you are processing a 100 GB CSV file on a small cluster without partitioning, adding executors will not help. I would also turn autoscale ON in production. I think you will also need to look into the internal details of how the data is processed. Have you gone through this link?
    https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-history-server
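
To illustrate the point about partitioning, here is a rough back-of-the-envelope sketch in plain Python (no Spark required). It assumes each executor core runs one task at a time and each task handles one partition. The function name `task_waves`, the 8 cores per executor, and the 800-partition figure (100 GB split into roughly 128 MB chunks) are illustrative assumptions, not values taken from your pipeline.

```python
import math

def task_waves(num_partitions, num_executors, cores_per_executor):
    """Estimate how one Spark stage uses the cluster.

    Returns (slots, busy_slots, waves):
      slots      - tasks the cluster can run at once
      busy_slots - slots that actually get work in a wave
      waves      - rounds of tasks needed to finish the stage
    Assumes one task per core and one task per partition.
    """
    slots = num_executors * cores_per_executor
    busy_slots = min(slots, num_partitions)
    waves = math.ceil(num_partitions / slots)
    return slots, busy_slots, waves

# Hypothetical scenarios: (partitions, executors, cores per executor)
scenarios = [
    (1, 3, 8),      # one unsplit input, small cluster
    (1, 200, 8),    # one unsplit input, large cluster
    (800, 3, 8),    # ~100 GB in ~128 MB partitions, small cluster
    (800, 200, 8),  # same data, large cluster
]
for parts, execs, cores in scenarios:
    slots, busy, waves = task_waves(parts, execs, cores)
    print(f"partitions={parts:4d} executors={execs:3d} "
          f"slots={slots:5d} busy={busy:4d} waves={waves}")
```

With a single partition only one slot is ever busy, however many executors are added. Once the data is split into many partitions, more executors reduce the number of waves. The Stages and Executors tabs of the Spark history server linked above should show how many tasks each stage in your notebook actually ran.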

    Please let me know how it goes.
    Thanks
    Himanshu
