Collect baseline: Performance best practices for SQL Server on Azure VM

Applies to: SQL Server on Azure VM

This article provides information to collect a performance baseline as a series of best practices and guidelines to optimize performance for your SQL Server on Azure Virtual Machines (VMs).

There is typically a trade-off between optimizing for costs and optimizing for performance. This performance best practices series is focused on getting the best performance for SQL Server on Azure Virtual Machines. If your workload is less demanding, you might not require every recommended optimization. Consider your performance needs, costs, and workload patterns as you evaluate these recommendations.

Overview

For a prescriptive approach, gather performance counters using PerfMon/LogMan and capture SQL Server wait statistics to better understand general pressures and potential bottlenecks of the source environment.

Start by collecting the CPU, memory, IOPS, throughput, and latency of the source workload at peak times following the application performance checklist.

Gather data during peak hours such as workloads during your typical business day, but also other high load processes such as end-of-day processing, and weekend ETL workloads. Consider scaling up your resources for atypically heavily workloads, such as end-of-quarter processing, and then scale done once the workload completes.

Use the performance analysis to select the VM Size that can scale to your workload's performance requirements.

Storage

SQL Server performance depends heavily on the I/O subsystem and storage performance is measured by IOPS and throughput. Unless your database fits into physical memory, SQL Server constantly brings database pages in and out of the buffer pool. The data files for SQL Server should be treated differently. Access to log files is sequential except when a transaction needs to be rolled back where data files, including tempdb, are randomly accessed. If you have a slow I/O subsystem, your users may experience performance issues such as slow response times and tasks that do not complete due to time-outs.

The Azure Marketplace virtual machines have log files on a physical disk that is separate from the data files by default. The tempdb data files count and size meet best practices and are targeted to the ephemeral D:\ drive.

The following PerfMon counters can help validate the IO throughput required by your SQL Server:

  • \LogicalDisk\Disk Reads/Sec (read IOPS)
  • \LogicalDisk\Disk Writes/Sec (write IOPS)
  • \LogicalDisk\Disk Read Bytes/Sec (read throughput requirements for the data, log, and tempdb files)
  • \LogicalDisk\Disk Write Bytes/Sec (write throughput requirements for the data, log, and tempdb files)

Using IOPS and throughput requirements at peak levels, evaluate VM sizes that match the capacity from your measurements.

If your workload requires 20K read IOPS and 10K write IOPS, you can either choose E16s_v3 (with up to 32K cached and 25600 uncached IOPS) or M16_s (with up to 20K cached and 10K uncached IOPS) with 2 P30 disks striped using Storage Spaces.

Make sure to understand both throughput and IOPS requirements of the workload as VMs have different scale limits for IOPS and throughput.

Memory

Track both external memory used by the OS as well as the memory used internally by SQL Server. Identifying pressure for either component will help size virtual machines and identify opportunities for tuning.

The following PerfMon counters can help validate the memory health of a SQL Server virtual machine:

Compute

Compute in Azure is managed differently than on-premises. On-premises servers are built to last several years without an upgrade due to the management overhead and cost of acquiring new hardware. Virtualization mitigates some of these issues but applications are optimized to take the most advantage of the underlying hardware, meaning any significant change to resource consumption requires rebalancing the entire physical environment.

This is not a challenge in Azure where a new virtual machine on a different series of hardware, and even in a different region, is easy to achieve.

In Azure, you want to take advantage of as much of the virtual machines resources as possible, therefore, Azure virtual machines should be configured to keep the average CPU as high as possible without impacting the workload.

The following PerfMon counters can help validate the compute health of a SQL Server virtual machine:

  • \Processor Information(_Total)% Processor Time
  • \Process(sqlservr)% Processor Time

Note

Ideally, try to aim for using 80% of your compute, with peaks above 90% but not reaching 100% for any sustained period of time. Fundamentally, you only want to provision the compute the application needs and then plan to scale up or down as the business requires.

Next steps

To learn more, see the other articles in this best practices series:

For security best practices, see Security considerations for SQL Server on Azure Virtual Machines.

Review other SQL Server Virtual Machine articles at SQL Server on Azure Virtual Machines Overview. If you have questions about SQL Server virtual machines, see the Frequently Asked Questions.