Azure Data Warehouse:Understanding the DWU, workload management, resource and memory allocation
SQL Azure Data Warehouse is a MPP fully managed cloud based data warehouse. It also has a unique mechanism to control the concurrency level and the resource allocation. This workload management is very critical to understand and deliver the predictable performance.
All the allocation of the resources of your SQL Data Warehouse is measured in a unit called as DWU (Data Warehouse Unit), understanding the DWU and it resource allocation is highly critical to achieve the highest performance level at minimum cost.
The SQL data warehouse workload and resource management is based on the following:
- Concurrent queries
- Concurrency slots
- Resource class
- Memory allocation
Let's try to understand what are there building blocks and how they are important to maximize the performance. We all know that if we go for a higher level for DWU units we will get the highest performance. But we need to also understand how to differentiate a DWU100 with a DWU2000. Do I really need DWU100 or do I need DWU2000 or DWU6000?
If I try to compare the DWU Vs the Max No. of concurrent quires, this will look something like as shown in the graph below:
If you have noticed that DWU100 has max 4 concurrent queries, from DWU1000 till DWU6000 we are getting max 32 concurrent queries.
Now I will say, "Why I should consider DWU beyond 1000, when the max no. of concurrent queries remains the same as 32?"
Here comes the concept of concurrency slot, More no. of DWU more no. for Concurrency slots. A DWU6000 you will get upto 240 concurrent slots.
The next confusion "What are these slots and how they going to affect the performance of a concurrent query"? There comes another concept called resource class. A resource class will help you in defining the and controlling the memory allocation and CPU cycle allocated to a query. This can be controlled and assigned using the database roles. There are four resource classes smallrc, mediumrc, largerc, and xlargerc. So as the name suggest user with smallrc will be allocated smaller amount of memory but can have more concurrent slot.
The concurrency level may decrease when users are assigned to a higher resource class or when SQL Data Warehouse is configured with low DWU. Some queries, like DMV queries, are always allowed to run.
Lastly, we need to understand the concept of Distribution. To achieve the highest throughput and query performance a SQL Data warehouse stores the data into 60 different distribution. Each distribution is referred as a single database unit called as distribution.
Distribution setting are defined at table level in a SQL Data Warehouse selecting any of the two-method available: Round robin or Hash. By default, if you do not define a distribution, the table will be distributed using the round robin distribution method.
So how to find how much memory is allocated to the entire data warehouse?
Let's say I have Data warehouse with DWU1500 and the resource class allocated to it is xlargerc which is 3200 MB memory
Total memory allocated = 3200*60 distributions/1024 = 187.5 GB
Conclusion: This blog is to help in understanding the Data warehouse workload management and how the memory allocation taken care based on the resource class assigned to a user. In order to understand the SQL Data warehouse overview and architecture I strongly recommend to go through this What is Azure SQL Data Warehouse? Gaurav Agarwal