Azure SQL Database elastic sharding vs. Synapse MPP data warehouse

Anonymous
2022-03-31T14:27:31.917+00:00

Hello, we are looking to horizontally scale our data warehouse, which currently runs on a single Azure SQL database. We are considering either 1) sharding via Elastic Database tools or 2) an Azure Synapse MPP data warehouse. Our requirements are: 1) concurrent ETL processes, and therefore concurrent writes to the data warehouse; 2) consumers of the data should use a single entry point/query delegator; 3) it should be straightforward to scale out further by adding more nodes; 4) performance.

What are the pros and cons of both architectures? What are the similarities and differences? I've searched, but I could not find a comparison of the two. Also, I'm assuming that moving from Azure SQL Database to sharding via the elastic tools would be less involved than moving to Azure Synapse?

Thanks so much in advance.

Azure SQL Database

1 answer

  1. Alberto Morillo 34,671 Reputation points MVP Volunteer Moderator
    2022-03-31T20:39:39.487+00:00

    Sharding on Azure SQL is a type of horizontal partitioning that splits a large database into smaller components, which are faster and easier to manage. A shard is an individual partition that lives on a separate database server instance to spread the load. Sharding is needed when a dataset is too big to be stored in a single database. As both the database size and the number of transactions increase, so does the response time for queries. The cost of maintaining a huge database can also skyrocket because of the number and quality of machines needed to handle the workload. Individual shards, on the other hand, have lower hardware and software requirements and can run on less expensive servers. For this type of sharding you can use Azure SQL Database, Azure Database for MySQL, or Azure Cosmos DB. Azure SQL offers Elastic Database tools to help implement sharding.
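
    To make the routing idea concrete, here is a minimal Python sketch of a range-based shard map, the kind of lookup a sharding library maintains for you. It is not the Elastic Database tools API: the class `RangeShardMap`, the helper `route_rows`, the key boundaries, and the shard names `dw_shard_0` to `dw_shard_2` are all hypothetical and exist only for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


class RangeShardMap:
    """Maps a sharding key to the database that owns its key range.

    Hypothetical illustration only; not the Elastic Database tools API.
    """

    def __init__(self, boundaries, shards):
        # boundaries[i] is the first key owned by shards[i + 1].
        if len(shards) != len(boundaries) + 1:
            raise ValueError("need exactly one more shard than boundaries")
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted ascending")
        self.boundaries = list(boundaries)
        self.shards = list(shards)

    def shard_for(self, key):
        # Binary search for the range that contains the key.
        return self.shards[bisect_right(self.boundaries, key)]


def route_rows(shard_map, rows, key_fn):
    """Group rows into one batch per shard so each batch can be written independently."""
    batches = defaultdict(list)
    for row in rows:
        batches[shard_map.shard_for(key_fn(row))].append(row)
    return dict(batches)


# Assumed layout: keys below 1000, keys 1000 to 1999, keys 2000 and above.
shard_map = RangeShardMap([1000, 2000], ["dw_shard_0", "dw_shard_1", "dw_shard_2"])

rows = [{"customer_id": 42}, {"customer_id": 1000}, {"customer_id": 2500}]
batches = route_rows(shard_map, rows, key_fn=lambda r: r["customer_id"])
# batches maps "dw_shard_0", "dw_shard_1" and "dw_shard_2" to one row each
```

    Note that in this model the application (or a client library) does the routing, so reads that span shards need a separate fan-out step; there is no built-in single query endpoint.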

    On the other hand, with Azure Synapse dedicated SQL pools the data is sharded into distributions to optimize the performance of the system. You choose which sharding pattern to use when you define the table. Three patterns are supported: hash, round robin, and replicate. Round robin is the best distribution for ETL loads: it spreads the rows randomly across the distributions, which makes it a good fit for temporary staging tables and for tables that have no obvious joining key with other tables. Use hash distribution on a column whose values are evenly distributed, and choose a column that won't be updated. Replicate keeps a full copy of the table on each compute node, which suits small dimension tables.

    Azure Synapse dedicated SQL pool is a petabyte-scale data warehouse, intended for relational table schemas and optimized for ETL.
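
    The difference between the hash and round-robin patterns can be sketched in a few lines of Python. This is a simulation under stated assumptions, not how Synapse is implemented: the MD5-based `hash_distribution` stands in for the service's internal hash function, the cyclic assignment in `round_robin_assign` simplifies the real row placement, and all function names and the sample data are hypothetical. The count of 60 distributions is the fixed number a dedicated SQL pool uses.

```python
import hashlib
from collections import Counter
from itertools import cycle

DISTRIBUTIONS = 60  # a dedicated SQL pool splits every table into 60 distributions


def hash_distribution(key):
    """Stand-in for the internal hash: the same key always maps to the same slot."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


def hash_assign(rows, key_fn):
    """Place each row in the distribution chosen by hashing its key."""
    return [(hash_distribution(key_fn(row)), row) for row in rows]


def round_robin_assign(rows):
    """Place rows in turn, ignoring their values (simplified to a strict cycle)."""
    slots = cycle(range(DISTRIBUTIONS))
    return [(next(slots), row) for row in rows]


def skew(assignments):
    """Rows in the fullest distribution divided by the ideal even share (1.0 = even)."""
    counts = Counter(slot for slot, _ in assignments)
    ideal = len(assignments) / DISTRIBUTIONS
    return max(counts.values()) / ideal


rows = [{"customer_id": i % 1200, "country": "US"} for i in range(6000)]

even = skew(round_robin_assign(rows))                      # 1.0: every slot gets 100 rows
by_customer = hash_assign(rows, lambda r: r["customer_id"])
by_country = skew(hash_assign(rows, lambda r: r["country"]))  # 60.0: one slot holds all rows
```

    Hashing on `customer_id` keeps all rows for one customer in the same distribution, which is what lets joins and aggregations on that key avoid moving data. Hashing on a column with a single dominant value, like `country` in this sample, piles everything into one distribution, which is why the distribution column should have many evenly spread values.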

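    For a data warehouse workload, Azure Synapse should come out ahead on both cost and performance. A dedicated SQL pool can be paused when not in use, because storage and compute are separate in Synapse, so you stop paying for compute while it is paused. A provisioned Azure SQL database cannot be paused (only the serverless compute tier auto-pauses). Within Synapse, hash distribution on a well-chosen column generally gives the best query performance for large tables.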

    Azure SQL Database sharding cannot offer you the massively parallel processing (MPP) engine that Synapse provides, in which a control node manages multiple compute nodes that all process the data in parallel.
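
    As a rough mental model of that control node / compute node split, the Python sketch below computes an average with a scatter-gather pattern: each "compute node" aggregates only its local rows, and the "control node" merges the partial results. The function names, the `amount` field, and the use of a thread pool are assumptions made for illustration; the real engine plans and moves data in far more sophisticated ways.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_partial(rows):
    """Work done on one compute node: aggregate only the rows stored locally."""
    total = 0.0
    count = 0
    for row in rows:
        total += row["amount"]
        count += 1
    return total, count


def control_node_average(rows_per_node, max_workers=4):
    """Fan the query out to every node, then merge the partial aggregates."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(compute_node_partial, rows_per_node))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Three hypothetical compute nodes, one of which holds no rows.
rows_per_node = [
    [{"amount": 10.0}, {"amount": 20.0}],
    [{"amount": 30.0}],
    [],
]
average = control_node_average(rows_per_node)  # 20.0
```

    The consumer only ever talks to the control node, which matches the "single point/query delegator" requirement in the question; with sharding on Azure SQL Database, the equivalent fan-out and merge logic has to live in your own application or tooling.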

    One more thing: Azure Synapse integrates with Power BI, has Azure Data Factory (ETL) built in, and integrates with Azure Machine Learning and Azure Purview (data governance).

