Synapse SQL pool service level and data distribution performance

Question


pmscorca 1,052

Hi,

Is there a relationship between the service level of a Synapse dedicated SQL pool and the performance of data distribution across the 60 distributions?

Could scaling up the service level improve this performance? Thanks


Answer accepted by question author


Answer 1

Hi pmscorca,

Thanks for reaching out to Microsoft Q&A.

Yes. While scaling up the service level can positively impact performance, it's essential to understand your data distribution, query patterns, and business requirements to make informed decisions. Always monitor and optimize based on actual performance metrics to achieve the best results.

Distributed Tables:

In Synapse dedicated SQL pools, table data is spread across 60 distributions. Each table appears as a single logical table, but its rows are actually stored across these 60 distributions.

Two common distribution methods are used:

Hash Distribution:

Rows are assigned to distributions based on a deterministic hash function. Identical values always hash to the same distribution. This approach minimizes data movement during queries, improving query performance for large fact tables.
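A minimal Python sketch of the idea follows. It assumes MD5 as a stand-in hash because Synapse's internal hash function is not exposed; `distribution_for` and `NUM_DISTRIBUTIONS` are hypothetical names for illustration. In T-SQL the choice itself is made with the `DISTRIBUTION = HASH(column_name)` option of `CREATE TABLE`.

```python
import hashlib

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool always has 60 distributions


def distribution_for(key: str) -> int:
    """Map a distribution-column value to one of the 60 distributions.

    Assumption: MD5 stands in for Synapse's real (internal) hash function.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS


# The same key always lands in the same distribution, so a fact row and a
# dimension row hashed on that key are co-located and a join on it needs
# no data movement.
fact_row_key = "CUST-0042"
dim_row_key = "CUST-0042"
assert distribution_for(fact_row_key) == distribution_for(dim_row_key)
print(distribution_for(fact_row_key))
```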

Round-Robin Distribution:

Rows are evenly distributed across all distributions. Unlike hash-distributed tables, rows with equal values are not guaranteed to be assigned to the same distribution. However, round-robin distribution is useful for improving loading speed.
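For contrast, here is a small sketch of round-robin placement (chosen in T-SQL with `DISTRIBUTION = ROUND_ROBIN`). As a simplification it cycles through the distributions in order, whereas Synapse does not guarantee which distribution a given row lands in; `round_robin_assign` is a hypothetical helper.

```python
from collections import Counter

NUM_DISTRIBUTIONS = 60


def round_robin_assign(rows):
    """Assign rows to distributions in turn, ignoring their values.

    Simplification: cycling in order illustrates the even spread; the real
    assignment is not value-based and not predictable.
    """
    return [(i % NUM_DISTRIBUTIONS, row) for i, row in enumerate(rows)]


rows = ["CUST-0042"] * 120 + [f"CUST-{n:04d}" for n in range(480)]
assignment = round_robin_assign(rows)

# 600 rows over 60 distributions: every distribution gets 10 rows.
counts = Counter(dist for dist, _ in assignment)
print(min(counts.values()), max(counts.values()))  # → 10 10

# The 120 rows with the same key end up in all 60 distributions, so a join
# or GROUP BY on that key would require data movement.
dists_for_key = {dist for dist, row in assignment if row == "CUST-0042"}
print(len(dists_for_key))  # → 60
```

This is why round-robin suits fast loads (no hashing, no skew), while hash distribution suits large fact tables that are joined or aggregated on the distribution column.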

Hash-distributed tables work well for large fact tables in a star schema, while round-robin distribution is beneficial during data loading.

Service Level and Compute Nodes:

The service level determines the compute resources available for your dedicated SQL pool. The maximum service level is DW30000c, which includes 60 compute nodes, with one distribution per compute node. For example, a 600 TB data warehouse at DW30000c processes approximately 10 TB per compute node. Adding compute nodes increases compute power and, because the same 60 distributions are shared among more nodes, reduces the number of distributions (and therefore the amount of data) each compute node has to process. More compute nodes therefore provide additional processing capacity for your queries.
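The arithmetic behind the 600 TB example can be sketched as below, assuming data is spread evenly (no skew). `per_node_load` is a hypothetical helper; the node counts used are 60 for DW30000c (from the text above) and 2 for DW1000c (from the follow-up discussion in this thread).

```python
NUM_DISTRIBUTIONS = 60


def per_node_load(total_tb: float, compute_nodes: int) -> tuple[int, float]:
    """Return (distributions per node, TB per node), assuming no data skew."""
    if NUM_DISTRIBUTIONS % compute_nodes != 0:
        raise ValueError("node count must divide the 60 distributions evenly")
    dists_per_node = NUM_DISTRIBUTIONS // compute_nodes
    return dists_per_node, total_tb * dists_per_node / NUM_DISTRIBUTIONS


print(per_node_load(600, 60))  # DW30000c → (1, 10.0)
print(per_node_load(600, 2))   # DW1000c  → (30, 300.0)
```

Fewer distributions per node is simply the other side of having more nodes: each node owns a smaller slice of the same 60 distributions.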

Scaling Up Service Level:

Scaling up (increasing) the service level can improve performance, especially if your queries involve large fact tables. By allocating more compute resources, you increase parallel processing capacity, which can lead to faster query execution. However, keep in mind that the effectiveness of scaling up depends on factors such as data distribution, query complexity, and workload patterns. If your hash-distributed tables are spread evenly across the 60 distributions (little data skew), scaling up can yield significant benefits.
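Why skew matters can be shown with a deliberately simplified model: assume a parallel step finishes only when its busiest node finishes, and use rows processed by that node as a proxy for elapsed time. This ignores data movement, memory, and concurrency, and `slowest_node_rows`/`speedup` are hypothetical helpers. In a real pool, `DBCC PDW_SHOWSPACEUSED` reports rows per distribution so you can check for skew.

```python
NUM_DISTRIBUTIONS = 60


def slowest_node_rows(rows_per_dist, compute_nodes):
    """Rows handled by the busiest node (proxy for elapsed time).

    Assumption: distributions are dealt to nodes in contiguous blocks.
    """
    per_node = NUM_DISTRIBUTIONS // compute_nodes
    return max(
        sum(rows_per_dist[n * per_node:(n + 1) * per_node])
        for n in range(compute_nodes)
    )


def speedup(rows_per_dist, compute_nodes):
    """Speedup of the busiest-node proxy versus a single compute node."""
    return slowest_node_rows(rows_per_dist, 1) / slowest_node_rows(rows_per_dist, compute_nodes)


even = [1_000_000] * NUM_DISTRIBUTIONS
# Skewed: one "hot" distribution holds half of all rows.
skewed = [29_500_000] + [500_000] * (NUM_DISTRIBUTIONS - 1)

print(speedup(even, 60))    # → 60.0
print(speedup(skewed, 60))  # → 2.0
```

Under this model, going from 1 to 60 nodes speeds up the evenly distributed table 60 times but the skewed one only 2 times, because the node holding the hot distribution still has to process half the data.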

And as always, evaluate your specific workload and consider the trade-offs between cost and performance when deciding to scale up.


pmscorca 1,052 Reputation points

2024-03-17T08:55:27.8666667+00:00

Hi, thanks for your reply.

Scaling up the service level means adding more compute nodes and therefore more compute power, which seems like a good thing.

On the other hand, scaling up the service level also means decreasing the number of distributions per compute node. That doesn't seem like a good thing, does it? Why?

Is there a right balance between the number of compute nodes and the number of distributions per compute node?

A small observation: at the DW1000c service level, the number of compute nodes increases from 1 to 2.

Thanks
Vinodh247-1375 43,181 Reputation points Volunteer Moderator

2024-03-18T09:33:14.3+00:00

On the other hand, scaling up the service level also means decreasing the number of distributions per compute node. That doesn't seem like a good thing, does it? Why?

As you increase the number of compute nodes, the number of distributions per compute node decreases. This means more compute power is available for your queries: with fewer distributions per node, each node processes a smaller portion of the data, and the work runs in parallel across more nodes. As a result, queries benefit from higher parallelism and can complete faster.

A small observation: at the DW1000c service level, the number of compute nodes increases from 1 to 2.

Yes. That is why running your workloads at DW1000c or above is recommended if you want real parallel processing across compute nodes: below DW1000c, everything runs on a single compute node.
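To see the whole progression, the sketch below tabulates compute nodes, distributions per node, and each node's share of the data per service level. The node counts are transcribed from Microsoft's published Gen2 "memory and concurrency limits" table; treat them as an assumption and check the current documentation. `COMPUTE_NODES` is a hypothetical name.

```python
NUM_DISTRIBUTIONS = 60

# Compute nodes per Gen2 service level (assumed to match Microsoft's
# published table; verify against current documentation).
COMPUTE_NODES = {
    "DW100c": 1, "DW200c": 1, "DW300c": 1, "DW400c": 1, "DW500c": 1,
    "DW1000c": 2, "DW1500c": 3, "DW2000c": 4, "DW2500c": 5, "DW3000c": 6,
    "DW5000c": 10, "DW6000c": 12, "DW7500c": 15, "DW10000c": 20,
    "DW15000c": 30, "DW30000c": 60,
}

for level, nodes in COMPUTE_NODES.items():
    dists = NUM_DISTRIBUTIONS // nodes
    share = dists / NUM_DISTRIBUTIONS  # fraction of each table one node handles
    print(f"{level:>9}: {nodes:>2} node(s), {dists:>2} distributions/node, "
          f"{share:.1%} of data per node")

# Levels where all 60 distributions sit on one compute node.
single_node = [lvl for lvl, n in COMPUTE_NODES.items() if n == 1]
print(single_node)  # → ['DW100c', 'DW200c', 'DW300c', 'DW400c', 'DW500c']
```

Every node count divides 60 evenly, so at each level the distributions are shared equally among the nodes.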
