Azure Synapse : Use single session for multiple notebooks in pipeline

Bitchiko Tchelidze 6 Reputation points
2021-12-22T10:52:05.49+00:00

I have multiple notebooks in a pipeline. The problem is that each notebook spins up its own Spark session, which takes time. I want to define some kind of "group" of notebooks so that all the notebooks in that group reuse the same session. Is that possible, or must every notebook run in its own dedicated session?

.NET
Microsoft Technologies based on the .NET software framework.
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

1 answer

  1. PRADEEPCHEEKATLA-MSFT 78,331 Reputation points Microsoft Employee
    2021-12-22T11:57:19.697+00:00

    Hello @BitchikoTchelidze-5514,

    Thanks for the question and for using the MS Q&A platform.

    Unfortunately, you cannot use a single Apache Spark session for multiple notebooks or users.

    Reason: Azure Synapse provides purpose-built engines for specific use cases. Apache Spark for Synapse is designed as a job service and not a cluster model. There are two scenarios where people ask for a multi-user cluster model.

    Scenario #1: Many users accessing a cluster for serving data for BI purposes.

    The easiest way to accomplish this is to prepare the data with Spark and then use the serving capabilities of Synapse SQL, so that users can connect Power BI to those datasets.

    Scenario #2: Having multiple developers on a single cluster to save money.

    To satisfy this scenario, give each developer a serverless Spark pool that is set to use a small number of Spark resources. Because serverless Spark pools cost nothing until they are actively used, this keeps costs low when there are multiple developers. The pools share metadata (Spark tables), so developers can easily work with each other.

    Spark instances are created when you connect to a Spark pool, create a session, and run a job. As multiple users may have access to a single Spark pool, a new Spark instance is created for each user that connects.

    For more details, refer to Azure Synapse Analytics frequently asked questions and Apache Spark in Azure Synapse Analytics Core Concepts - Examples.

    Hope this helps. Please let us know if you have any further queries.
