Should I use Dedicated SQL Pool or Serverless SQL Pool or Spark Notebooks for my use-case?

MADHUSUDAN PANWAR 101 Reputation points
2023-04-17T12:11:50.3133333+00:00

Hi, I have an Azure Synapse Analytics architect related question. I have gone through most of the documentations but still couldn't figure out what is best for me. I have a simple use case scenario. I am getting data in ADLS in parquet format and pushing it into Dedicated SQL Pool using Copy Data and Stored Procedures? and bit confused

  • Whether I should use ServerlessSQL Pool or Apache Spark Pool instead of Dedicated SQL Pool? Should I use Dataflows or Stored Procedures in case of Dedicated SQL Pool?
  1. Since me and my team have sole access to ADLS so should I create Lake Database and take advantage of Parquet format and do every analysis using serverless SQL Pool? In that case, I won't have to provision dedicated SQL Pool and would save on money and it would avoid duplication also. If not, then when should I use serverless sql in my scenario?
  2. Since the data is much cleaner and don't need much transformation other than few CASE statements, should I opt for Spark Pool? If No, then when should I use Spark Pool in my case?
  3. In case you suggest that the dedicated SQL Pools are best for my scenario, can you suggest whether I should do transformation using SP or DataFlow? Kind Regards, Madhusudan
Azure SQL Database
Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,378 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,651 questions
{count} votes

Accepted answer
  1. PRADEEPCHEEKATLA 90,651 Reputation points Moderator
    2023-04-18T06:49:13.4766667+00:00


    MADHUSUDAN PANWAR
    - Thanks for the question and using MS Q&A platform.

    Choosing the right type of pool in Azure Synapse Analytics depends on several factors, including the size and complexity of your data, the type of analysis you need to perform, and your budget. Here are some general guidelines that can help you decide which pool to use for your use case:

    1. Dedicated SQL Pool: If you have large amounts of structured data and need to perform complex queries and transformations, Dedicated SQL Pool is a good choice. It provides high performance and scalability, and allows you to use T-SQL to perform data transformations. You can use Stored Procedures or Dataflows to perform transformations in Dedicated SQL Pool.
    2. Serverless SQL Pool: If you have smaller amounts of data and need to perform ad-hoc queries and exploratory analysis, Serverless SQL Pool is a good choice. It provides on-demand compute and storage, and allows you to use T-SQL to perform data transformations. You can use Serverless SQL Pool to analyze data in ADLS in Parquet format.
    3. Apache Spark Pool: If you have large amounts of unstructured or semi-structured data and need to perform complex transformations and machine learning, Apache Spark Pool is a good choice. It provides distributed computing and allows you to use Spark Notebooks to perform data transformations.

    Based on your use case scenario, it seems like Dedicated SQL Pool is a good choice for you. You can use Stored Procedures or Dataflows to perform transformations in Dedicated SQL Pool. If your data is already in Parquet format, you can take advantage of the performance benefits of Parquet by using Dedicated SQL Pool. If you have smaller amounts of data and need to perform ad-hoc queries and exploratory analysis, you can use Serverless SQL Pool. If you have large amounts of unstructured or semi-structured data and need to perform complex transformations and machine learning, you can use Apache Spark Pool.

    In terms of whether to use Stored Procedures or Dataflows in Dedicated SQL Pool, it depends on the complexity of your transformations. If you have simple transformations that can be expressed in T-SQL, Stored Procedures are a good choice. If you have more complex transformations that require data wrangling and data flow operations, Dataflows are a good choice.

    I hope this helps you decide which pool to use for your use case. If you have any further questions or concerns, please let me know.

    1 person found this answer helpful.
    0 comments No comments

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.