Azure Managed disk performance analysis for SQL Server DB Backup

Question

We are currently trying to optimize cost in our Azure environment and are reviewing dedicated backup disks on servers. We have a few servers running a single Premium SSD P50 (4 TB) as the backup disk, but the actual backup files on these drives are often less than 1 TB. I'm unsure how to do the performance analysis to confirm the actual disk performance during backup. I need the results as a supporting case for changing the disk type (I would recommend Standard SSD). These are solely backup drives, so the workload should be mostly writes (or maybe not).

I have these questions:

I'm unsure which disk level metrics (or maybe VM level metrics) are used to measure throughput (likely Disk Write bytes/sec), IOPS, and latency for disks. Also, how can I interpret the metrics and make conclusions?
On the disks metrics window, I only see an average, not a maximum, so I'm not sure of any peaks (can only confirm when I zoom into one metric). I'm not sure if there is another way of doing this, so I thought to ask.

I need to gather correct information on the throughput, IOPS, and latency of the disks during the backup operation, which will inform our decision on which performance tier is best for the backup disks/drives.

Accepted Answer

Hello Abimbola Adeniran,

Greetings! Welcome to the Microsoft Q&A Platform.

1. Azure offers metrics in the Azure portal that provide insight into how your virtual machines (VMs) and disks perform. The metrics can also be retrieved through an API call. They fall into three groups:

Disk IO, throughput, queue depth and latency metrics - These metrics allow you to see the storage performance from the perspective of a disk and a virtual machine.
Disk bursting metrics - These metrics provide observability into the bursting feature on premium disks.
Storage IO utilization metrics - These metrics help diagnose bottlenecks in your storage performance with disks.

Here are the key metrics related to disk performance:

OS Disk Latency (Preview): The average time to complete IOs during monitoring for the OS disk (values in milliseconds).

OS Disk Queue Depth: The number of current outstanding IO requests waiting to be read from or written to the OS disk.

OS Disk Read Bytes/Sec: Bytes read per second from the OS disk (inclusive of cache if enabled).

OS Disk Read Operations/Sec: Input operations read per second from the OS disk (inclusive of cache if enabled).

OS Disk Write Bytes/Sec: Bytes written per second to the OS disk.

OS Disk Write Operations/Sec: Output operations written per second to the OS disk.

Data Disk Latency (Preview): Average time to complete IOs during monitoring for data disks.

Data Disk Queue Depth: Outstanding IO requests waiting to be read from or written to data disks.

Data Disk Read Bytes/Sec: Bytes read per second from data disks (inclusive of cache if enabled).

Data Disk Read Operations/Sec: Input operations read per second from data disks (inclusive of cache if enabled).

Data Disk Write Bytes/Sec: Bytes written per second to data disks.

Data Disk Write Operations/Sec: Output operations written per second to data disks.

Interpretation:

Latency: Lower latency values indicate better performance.

Throughput (Bytes/Sec): Higher throughput values mean faster data transfer.

IOPS: Higher IOPS values indicate better input/output performance.

IOPS (Input/Output Operations Per Second): IOPS measures the number of read or write operations a storage device can perform in a single second. Higher IOPS generally indicate better performance. Calculating IOPS involves dividing the combined read and write throughput by the block size. For example:

IOPS = (ReadThroughput + WriteThroughput) / BlockSize

Throughput: Throughput represents the amount of data transferred per unit of time. It’s typically measured in megabytes per second (MB/s). The relationship between IOPS and throughput depends on the block size. The formula is:

Throughput (MB/s) = Average IO size (in bytes) × IOPS / 1,000,000
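As a worked example of the two formulas above, here is a minimal Python sketch. The 200 MB/s write rate and the 1 MB transfer size are made-up figures for illustration, not measurements from your servers; substitute the values you read from the Data Disk Write Bytes/Sec and Data Disk Write Operations/Sec metrics.

```python
# Convert between IOPS, throughput, and I/O size.
# All figures below are illustrative assumptions, not measurements.

BYTES_PER_MB = 1_000_000  # decimal megabyte; use 1024 * 1024 for MiB


def iops_from_throughput(throughput_bytes_per_sec: float, io_size_bytes: float) -> float:
    """IOPS = throughput / I/O size."""
    return throughput_bytes_per_sec / io_size_bytes


def throughput_mb_per_sec(iops: float, io_size_bytes: float) -> float:
    """Throughput (MB/s) = I/O size (bytes) x IOPS / bytes per MB."""
    return iops * io_size_bytes / BYTES_PER_MB


# Hypothetical backup writing 200 MB/s in 1 MB transfers.
write_bytes_per_sec = 200 * BYTES_PER_MB
io_size = 1 * BYTES_PER_MB

iops = iops_from_throughput(write_bytes_per_sec, io_size)
print(f"IOPS: {iops:.0f}")
print(f"Throughput: {throughput_mb_per_sec(iops, io_size):.0f} MB/s")
```

With large transfers like this, a modest IOPS figure already corresponds to a high throughput, so for a backup drive the throughput limit of the disk is usually the first one worth checking.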

Latency: Latency is the average time taken to complete an I/O operation. Lower latency is desirable. It’s measured in milliseconds (ms). Key latency metrics include:

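Avg. Disk sec/Transfer: The Windows Performance Monitor counter for the time one read/write operation takes (disk latency). It is reported in seconds, so 0.010 means 10 ms. Ideally, it should be below 10 ms for high-load servers.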

Data Disk Latency: Average time for I/Os during monitoring for data disks.

Queue Depth: Queue depth indicates the number of outstanding I/O requests waiting to be read from or written to a disk. High queue depth may impact performance.
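Queue depth, IOPS, and latency are tied together by Little's law (average outstanding requests = arrival rate × time each request spends in the system), which gives a quick sanity check on the three metrics. The sketch below is illustrative only; the function name and sample values are hypothetical.

```python
def expected_queue_depth(iops: float, latency_ms: float) -> float:
    """Little's law: outstanding I/Os = I/Os per second x seconds per I/O."""
    return iops * (latency_ms / 1000.0)


# Hypothetical reading: 500 IOPS at an average latency of 10 ms.
depth = expected_queue_depth(iops=500, latency_ms=10)
print(f"Expected average queue depth: {depth:.1f}")
```

If the reported queue depth keeps rising while IOPS stay flat, latency is growing, which usually means the disk or the VM has reached a limit.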

Interpreting Metrics:

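Average vs. Maximum: Azure metric charts show the average by default. In Azure Monitor metrics explorer you can switch the aggregation to Max where the metric supports it, and narrow the time range and time granularity to the backup window so that short peaks are not averaged away. Tools such as Performance Co-Pilot (PCP) can also capture maximum values over time from inside the VM.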
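To see why the average alone can mislead, here is a small Python sketch that summarizes a series of samples. The sample list is invented for illustration (a mostly idle disk with a short backup burst) and stands in for per-minute Data Disk Write Bytes/Sec values that you would export from Azure Monitor and convert to MB/s.

```python
import math
import statistics


def summarize(samples):
    """Return the average, nearest-rank 95th percentile, and maximum."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank percentile position
    return {
        "avg": statistics.fmean(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


# Hypothetical per-minute write throughput in MB/s around a backup window.
samples_mb_s = [5, 4, 6, 5, 180, 210, 195, 6, 5, 4]

summary = summarize(samples_mb_s)
print(summary)
```

Here the average is far below the peak, so sizing the disk from the average would undersize it for the backup window. Size against the maximum or a high percentile taken over the backup window only.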

Thresholds: Set performance thresholds based on your workload requirements. For example, if latency exceeds a certain value, investigate further.

Bursting Metrics: Premium disks support bursting. For disks that use credit-based bursting, monitor the burst credit percentage to make sure credits are not exhausted during peak loads. On-demand bursting currently applies only to premium disk products; it lets a disk exceed its provisioned performance, up to the documented burst limits, on a pay-as-you-go model.

2. To identify peaks, consider these approaches:

Azure Monitor: Use Azure Monitor to set up alerts based on thresholds (e.g., high latency or low throughput).

Custom Metrics: You can create custom metrics to track specific thresholds (e.g., peak latency) using Azure Monitor.

Azure virtual machines have input/output operations per second (IOPS) and throughput performance limits based on the virtual machine type and size. OS disks and data disks can be attached to virtual machines. The disks have their own IOPS and throughput limits.

Azure can boost disk storage IOPS and MB/s performance beyond the baseline; this is referred to as bursting and is available for both virtual machines (VMs) and disks. Using VM and disk bursting together can improve peak performance on both.

Refer to https://learn.microsoft.com/en-us/azure/virtual-machines/disks-performance and https://learn.microsoft.com/en-us/azure/virtual-machines/disk-bursting for more details.

Azure Disk Backup is a native, cloud-based backup solution that protects your data in managed disks. It's a simple, secure, and cost-effective solution that enables you to configure protection for managed disks. Note that it is a separate service from SQL Server backups written to a data disk. Here are some factors to consider when analyzing the performance of a disk used for backups:

Ingress/Egress Limits: Ensure you’re not hitting data ingress/egress limits.
Storage Throttling: I/O is throttled when IOPS or throughput exceed the limits of the VM SKU or the disk SKU, whichever is lower.
VM Scalability Targets: Choose a VM SKU that meets your workload’s performance demands.
Cache Settings: On VMs that use Premium storage, configure disk caching to suit the workload.
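The throttling point above can be expressed as a simple rule: a single disk can never deliver more than the lower of the VM limit and the disk limit. A minimal sketch, with hypothetical limit values that you should replace with the documented figures for your VM size and disk SKU:

```python
def effective_cap(vm_limit: float, disk_limit: float) -> float:
    """The achievable rate for one disk is bounded by the lower of the two limits."""
    return min(vm_limit, disk_limit)


def bottleneck(vm_limit: float, disk_limit: float) -> str:
    """Name the layer that throttles first."""
    return "VM" if vm_limit < disk_limit else "disk"


# Hypothetical throughput limits in MB/s (placeholders, not real SKU figures).
vm_uncached_mb_s = 192
disk_mb_s = 250

print(effective_cap(vm_uncached_mb_s, disk_mb_s), bottleneck(vm_uncached_mb_s, disk_mb_s))
```

If the VM limit turns out to be the lower one, the backup has never been able to use the full throughput of the current disk, which strengthens the case for a cheaper tier.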

Refer to the following blog for detailed guidance on performance and cost-effective steps: https://bluexp.netapp.com/blog/azure-disk-performance-how-to-analyze-and-monitor-issues.

If your backup files are consistently less than 1 TB and the peak throughput and IOPS you measure during backups fit within Standard SSD limits, consider switching to Standard SSD disks. They offer cost savings while still providing reasonable performance.
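One way to turn the measurements into a decision is to compare the observed peak against the limits of the candidate tier, with some headroom, and to estimate how long the backup would take at the lower throughput. The sketch below is only an illustration: the class and function names are hypothetical, and the limit values are placeholders, so take the real IOPS and MB/s figures for each disk size from the current Azure managed disk documentation.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    iops: float
    mb_per_sec: float


def fits(peak: Limits, tier: Limits, headroom: float = 0.2) -> bool:
    """True if the observed peak stays within the tier limits minus the headroom."""
    usable_iops = tier.iops * (1 - headroom)
    usable_mb = tier.mb_per_sec * (1 - headroom)
    return peak.iops <= usable_iops and peak.mb_per_sec <= usable_mb


def backup_minutes(size_gb: float, mb_per_sec: float) -> float:
    """Rough duration estimate: size / sustained throughput."""
    return size_gb * 1000 / mb_per_sec / 60


# Placeholder numbers: an observed peak and a candidate tier's limits.
observed_peak = Limits(iops=220, mb_per_sec=210)
candidate_tier = Limits(iops=500, mb_per_sec=60)

print("Fits candidate tier:", fits(observed_peak, candidate_tier))
print(f"800 GB at 60 MB/s: about {backup_minutes(800, 60):.0f} minutes")
```

A peak that does not fit is not necessarily a blocker. The backup would simply run at the lower throughput, so the real question is whether the longer duration still fits your backup window.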

Hope this answer helps! Please let us know if you have any further queries. I’m happy to assist you further.

Please "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.
