How do I get performance metrics for Azure SQL DB?

David Beavon 206 Reputation points
2020-06-09T15:01:31.873+00:00

Is Microsoft deliberately hiding useful metrics for monitoring the performance of SQL? In the metrics section of the Azure portal, I can only find very vague and unhelpful numbers like "% data IO" and "% log IO". These are way too abstract and are essentially meaningless for our purposes.

I'd like to see the throughput rates in bytes/sec. And the number of IOPS. And the latencies on those IOPS.

Any help would be greatly appreciated. I am using single databases (provisioned) with vCore licensing. I realize that there is resource governance going on that is limiting my performance in terms of memory and I/O. However, without being able to review the performance characteristics of the currently provisioned database, it is impossible to determine how much of an upgrade we need to get to our desired goals.

I have read a number of articles, but none of them are helpful in telling me how to get to the throughput, IOPS, or latencies.
Eg.
https://azure.microsoft.com/en-us/blog/resource-governance-in-azure-sql-database/
https://learn.microsoft.com/en-us/azure/azure-sql/database/monitor-tune-overview

Any help would be appreciated.

Azure SQL Database
No comments
{count} votes

Accepted answer
  1. David Beavon 206 Reputation points
    2020-06-18T21:44:18.643+00:00

    It appears that Microsoft has made it really hard to discover the underlying details related to SQL IOPS in Azure. I think they are assuming that most of their customers just want to know how close they are to reaching the "% of tier" limit. The problem is that if you are under the limit and your application still seems slow then the "% of tier" is essentially meaningless for any troubleshooting purposes. For that matter, even if you are at 100% you probably need to complement that information with the absolute number of IOPS you are achieving, along with the absolute throughput rate (ie. byte rate for reads/writes).

    Given that these hosted SQL databases have service tiers, the Azure limitations we run into are rarely due to hardware limitations but are due to the SQL resource governor. Because the governor is hyper-focused on "allowances" and "percentages-of-tier", that is probably the reason why our performance metrics are also being reported to us as percentages . The percentages are unhelpful, to say the least.

    Say, for example, you are attempting to compare your cloud performance to how things worked on-premises. There is virtually no way to make a comparison because "100%" means something totally different in each context (or rather, it means virtually nothing in either context).

    The good news is that I was able to open a case with Microsoft and they pointed me to some of the metrics I was looking for. The absolute IOPS and throughput rates are part of the resource governor's DMV. Here is the DMV that they pointed me to: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-resource-governor-resource-pools-history-ex-azure-sql-database?view=azuresqldb-current

    It is a bit obscure and I probably wouldn't have ever found it on my own. Thank you Microsoft support!

    I haven't tested with both of them, but I think this DMV will work properly with SQL database, and SQL elastic pool. We are currently deploying our own solution to SQL elastic pool. Note that SQL managed-instances have DMV's that are far easier to use and provide better information, so a managed-instance customer would probably not use the DMV that I'm referring to here.

    Below is a sample query, but I came up with this after playing for only an hour. You may want to tinker with it to suit your own needs. The fields within it are also a bit obscure. The fields are reported at intervals, by resource pool name. The pool names are a bit obsure as well.

    select   
    	snapshot_time ,   
    	name as resource_governor_resource_pool_name,  
    	  
    	duration_ms,  
    	   
    	cap_vcores_used_percent,  
    	avg_data_io_percent,    
      
      
    	avg_log_write_percent,  
      
    	delta_write_io_completed,  
    	delta_write_io_throttled,  
    	delta_write_bytes,  
    	delta_write_io_stall_ms,  
    	delta_write_io_stall_queued_ms,  
      
      
    	delta_read_io_completed,  
    	delta_read_io_throttled,  
    	delta_read_bytes,   
    	delta_read_io_stall_ms,  
    	delta_read_io_stall_queued_ms,  
      
      
      
    	delta_io_issue_delay_ms  
    	   
        from sys.dm_resource_governor_resource_pools_history_ex   
    	  
    	where name in ('SloSharedPool1', 'internal')  
    	order by snapshot_time desc  
      
    

    Hope this helps anyone else who needs actual IOP metrics rather than "% of tier". I'm excited to say that this Q&A question will probably be the best result in a google search for the DMV, ("dm_resource_governor_resource_pools_history_ex "). Let me know if I'm wrong!


2 additional answers

Sort by: Most helpful
  1. Ronen Ariely 13,781 Reputation points Microsoft MVP
    2020-06-09T15:35:07.297+00:00

    Good day,

    The following document provides all you need regarding monitoring the Azure SQL Database:

    https://learn.microsoft.com/en-us/azure/azure-sql/database/monitor-tune-overview


  2. David Beavon 206 Reputation points
    2020-06-15T14:03:38.123+00:00

    It appears that Azure SQL users who run "manage instance" may have the IO metrics. It looks like they can see the actual IOPS and throughput numbers for their databases (rather than the meaningless "percent of tier").

    See this DMV for example:
    https://learn.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-server-resource-stats-azure-sql-database?view=azuresqldb-current

    • io_request
    • io_byte_read
    • io_bytes_written

    Is there some equivalent for the other Azure SQL databases? A percentage is essentially a ratio that is calculated from two other numbers. Surely Microsoft understands how silly it is to give us their "IO percentage" without its two underlying metrics. It almost seems like there is a deliberate effort to hide the details. Why would IOPS and throughput be provided for "managed instance" and not for Azure SQL Database? Is there a way to back into the underlying metrics? I think there is additional detail that is available for individual queries, but I'm more interested in being able to monitor the database workload as a whole.

    No comments