Use an existing active session in Synapse notebooks

eugenia apostolopoulou 76 Reputation points
2022-07-08T14:42:29.51+00:00

Hello,

I have created a pipeline that consists of 4 Synapse notebooks. After running it in debug mode, I checked the time consumed by each notebook and realised that each one starts a new session in order to be executed, which is time consuming.
Is there any way to keep the session that the first notebook starts up and running, and force the rest of the notebooks to use that active session?

Thanks in advance!

Tags: Azure Monitor, Azure Synapse Analytics, Azure Data Factory

Accepted answer
  1. PRADEEPCHEEKATLA 90,541 Reputation points
    2022-07-11T06:37:25.247+00:00

    Hello @eugenia apostolopoulou ,

    Thanks for the question and using MS Q&A platform.

    Yes, this is expected behaviour with Azure Synapse Analytics.

    Reason: Azure Synapse provides purpose-built engines for specific use cases. Apache Spark for Synapse is designed as a job service and not a cluster model. There are two scenarios where people ask for a multi-user cluster model.

    Scenario #1: Many users accessing a cluster for serving data for BI purposes.

    The easiest way of accomplishing this is to cook the data with Spark and then take advantage of the serving capabilities of Synapse SQL, so that Power BI can connect to those datasets.
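    A minimal PySpark sketch of this "cook with Spark, serve with SQL" pattern, assuming the built-in `spark` session of a Synapse notebook; the storage path and the database/table names are hypothetical placeholders:

    ```python
    from pyspark.sql import functions as F

    # Read raw data from the lake (path is a placeholder for illustration).
    raw = spark.read.parquet("abfss://data@mydatalake.dfs.core.windows.net/raw/sales/")

    # "Cook" the data: clean and aggregate it with Spark.
    curated = (
        raw.dropna(subset=["order_id"])
           .groupBy("region", "order_date")
           .agg(F.sum("amount").alias("total_amount"))
    )

    # Persist the result as a Spark (lake database) table; serverless SQL can then
    # serve it to Power BI without a running Spark session.
    spark.sql("CREATE DATABASE IF NOT EXISTS curated_db")
    curated.write.mode("overwrite").saveAsTable("curated_db.daily_sales")
    ```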

    Scenario #2: Having multiple developers on a single cluster to save money.

    To satisfy this scenario, you should give each developer a serverless Spark pool that is set to use a small number of Spark resources. Since serverless Spark pools don’t cost anything until they are actively used, this keeps the cost low when there are multiple developers. The pools share metadata (Spark tables), so developers can easily work with each other.
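    A minimal sketch of that shared-metadata behaviour, assuming the built-in `spark` session; the database and table names are hypothetical, and in practice the two halves would run in notebooks attached to different serverless Spark pools in the same workspace:

    ```python
    # Notebook attached to developer A's serverless Spark pool:
    df = spark.createDataFrame([(1, "widget"), (2, "gadget")], ["id", "name"])
    spark.sql("CREATE DATABASE IF NOT EXISTS team_db")
    df.write.mode("overwrite").saveAsTable("team_db.products")

    # Notebook attached to developer B's serverless Spark pool: the same Spark
    # table is visible because the pools share the workspace metadata.
    spark.read.table("team_db.products").show()
    ```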

    Spark instances are created when you connect to a Spark pool, create a session, and run a job. As multiple users may have access to a single Spark pool, a new Spark instance is created for each user that connects.
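    A quick way to observe this, assuming the built-in `spark` session: print the Spark application ID in each notebook of the pipeline. Because each notebook activity starts its own Spark instance, the IDs differ between notebooks, while cells within one session report the same ID.

    ```python
    # Unique per Spark instance/session; compare the output across the 4 notebooks.
    print(spark.sparkContext.applicationId)
    print(spark.sparkContext.appName)
    ```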

    For more details, refer to Azure Synapse Analytics frequently asked questions and Apache Spark in Azure Synapse Analytics Core Concepts - Examples.

    Hope this helps. Please let us know if you have any further queries.


0 additional answers
