Checklist: Best practices for SQL Server on Azure VMs
Applies to: SQL Server on Azure VM
This article provides a quick checklist as a series of best practices and guidelines to optimize performance of your SQL Server on Azure Virtual Machines (VMs).
For comprehensive details, see the other articles in this series: VM size, Storage, Security, HADR configuration, Collect baseline.
Enable SQL Assessment for SQL Server on Azure VMs and your SQL Server will be evaluated against known best practices with results on the SQL VM management page of the Azure portal.
For videos about the latest features to optimize SQL Server VM performance and automate management, review the following Data Exposed videos:
- Caching and Storage Capping (Ep. 1)
- Automate Management with the SQL Server IaaS Agent extension (Ep. 2)
- Use Azure Monitor Metrics to Track VM Cache Health (Ep. 3)
- Get the best price-performance for your SQL Server workloads on Azure VM
- Using PerfInsights to Evaluate Resource Health and Troubleshoot (Ep. 5)
- Best Price-Performance with Ebdsv5 Series (Ep.6)
- Optimally Configure SQL Server on Azure Virtual Machines with SQL Assessment (Ep. 7)
- New and Improved SQL Server on Azure VM deployment and management experience (Ep.8)
While running SQL Server on Azure Virtual Machines, continue using the same database performance tuning options that are applicable to SQL Server in on-premises server environments. However, the performance of a relational database in a public cloud depends on many factors, such as the size of a virtual machine, and the configuration of the data disks.
There's typically a trade-off between optimizing for costs and optimizing for performance. This performance best practices series is focused on getting the best performance for SQL Server on Azure Virtual Machines. If your workload is less demanding, you might not require every recommended optimization. Consider your performance needs, costs, and workload patterns as you evaluate these recommendations.
The following is a quick checklist of VM size best practices for running your SQL Server on Azure VM:
- The new Ebdsv5-series provides the highest I/O throughput-to-vCore ratio in Azure along with a memory-to-vCore ratio of 8. This series offers the best price-performance for SQL Server workloads on Azure VMs. Consider this series first for most SQL Server workloads.
- Use VM sizes with 4 or more vCPUs like the E4ds_v5 or higher.
- Use memory optimized virtual machine sizes for the best performance of SQL Server workloads.
- The Edsv5 series, the M-, and the Mv2- series offer the optimal memory-to-vCore ratio required for OLTP workloads.
- The M series VMs offer the highest memory-to-vCore ratio in Azure. Consider these VMs for mission critical and data warehouse workloads.
- Leverage Azure Marketplace images to deploy your SQL Server Virtual Machines as the SQL Server settings and storage options are configured for optimal performance.
- Collect the target workload's performance characteristics and use them to determine the appropriate VM size for your business.
- Use the Data Migration Assistant and SKU recommendation tools to find the right VM size for your existing SQL Server workload.
- Use Azure Data Studio to migrate to Azure.
To learn more, see the comprehensive VM size best practices.
The following is a quick checklist of storage configuration best practices for running your SQL Server on Azure VM:
- Monitor the application and determine storage bandwidth and latency requirements for SQL Server data, log, and
tempdbfiles before choosing the disk type.
- To optimize storage performance, plan for highest uncached IOPS available and use data caching as a performance feature for data reads while avoiding virtual machine and disks capping/throttling.
- Place data, log, and
tempdbfiles on separate drives.
- For the data drive, use premium P30 and P40 or smaller disks to ensure the availability of cache support
- For the log drive plan for capacity and test performance versus cost while evaluating the premium P30 - P80 disks.
- If submillisecond storage latency is required, use Azure ultra disks for the transaction log.
- For M-series virtual machine deployments consider Write Accelerator over using Azure ultra disks.
- Place tempdb on the local ephemeral SSD (default
D:\) drive for most SQL Server workloads that aren't part of Failover Cluster Instance (FCI) after choosing the optimal VM size.
- If the capacity of the local drive isn't enough for
tempdb, consider sizing up the VM. See Data file caching policies for more information.
- For FCI place
tempdbon the shared storage.
- If the FCI workload is heavily dependent on
tempdbdisk performance, then as an advanced configuration place
tempdbon the local ephemeral SSD (default
D:\) drive, which isn't part of FCI storage. This configuration will need custom monitoring and action to ensure the local ephemeral SSD (default
D:\) drive is available all the time as any failures of this drive won't trigger action from FCI.
- Stripe multiple Azure data disks using Storage Spaces to increase I/O bandwidth up to the target virtual machine's IOPS and throughput limits.
- Set host caching to read-only for data file disks.
- Set host caching to none for log file disks.
- Don't enable read/write caching on disks that contain SQL Server data or log files.
- Always stop the SQL Server service before changing the cache settings of your disk.
- For development and test workloads consider using standard storage. It isn't recommended to use Standard HDD/SDD for production workloads.
- Credit-based Disk Bursting (P1-P20) should only be considered for smaller dev/test workloads and departmental systems.
- Provision the storage account in the same region as the SQL Server VM.
- Disable Azure geo-redundant storage (geo-replication) and use LRS (local redundant storage) on the storage account.
- Format your data disk to use 64-KB allocation unit size for all data files placed on a drive other than the temporary
D:\drive (which has a default of 4 KB). SQL Server VMs deployed through Azure Marketplace come with data disks formatted with allocation unit size and interleave for the storage pool set to 64 KB.
- Enable the SQL Best Practices Assessment to identify possible performance issues and evaluate that your SQL Server VM is configured to follow best practices.
- Review and monitor disk and VM limits using Storage IO utilization metrics.
To learn more, see the comprehensive Storage best practices.
SQL Server features
The following is a quick checklist of best practices for SQL Server configuration settings when running your SQL Server instances in an Azure virtual machine in production:
- Enable database page compression where appropriate.
- Enable backup compression.
- Enable instant file initialization for data files.
- Limit autogrowth of the database.
- Disable autoshrink of the database.
- Disable autoclose of the database.
- Move all databases to data disks, including system databases.
- Move SQL Server error log and trace file directories to data disks.
- Configure default backup and database file locations.
- Set max SQL Server memory limit to leave enough memory for the Operating System. (Leverage Memory\Available Bytes to monitor the operating system memory health).
- Enable lock pages in memory.
- Enable optimize for adhoc workloads for OLTP heavy environments.
- Evaluate and apply the latest cumulative updates for the installed versions of SQL Server.
- Enable Query Store on all production SQL Server databases following best practices.
- Enable automatic tuning on mission critical application databases.
- Ensure that all tempdb best practices are followed.
- Use the recommended number of files, using multiple
tempdbdata files starting with one file per core, up to eight files.
- Schedule SQL Server Agent jobs to run DBCC CHECKDB, index reorganize, index rebuild, and update statistics jobs.
- Monitor and manage the health and size of the SQL Server transaction log file.
- Take advantage of any new SQL Server features available for the version being used.
- Be aware of the differences in supported features between the editions you're considering deploying.
- Exclude SQL Server files from antivirus software scanning. This includes data files, log files, and backup files.
The following is a quick checklist of best practices for Azure-specific guidance when running your SQL Server on Azure VM:
- Register with the SQL IaaS Agent Extension to unlock a number of feature benefits.
- Leverage the best backup and restore strategy for your SQL Server workload.
- Ensure Accelerated Networking is enabled on the virtual machine.
- Leverage Microsoft Defender for Cloud to improve the overall security posture of your virtual machine deployment.
- Leverage Microsoft Defender for Cloud, integrated with Microsoft Defender for Cloud, for specific SQL Server VM coverage including vulnerability assessments, and just-in-time access, which reduces the attack service while allowing legitimate users to access virtual machines when necessary. To learn more, see vulnerability assessments, enable vulnerability assessments for SQL Server VMs and just-in-time access.
- Leverage Azure Advisor to address performance, cost, reliability, operational excellence, and security recommendations.
- Leverage Azure Monitor to collect, analyze, and act on telemetry data from your SQL Server environment. This includes identifying infrastructure issues with VM insights and monitoring data with Log Analytics for deeper diagnostics.
- Enable Autoshutdown for development and test environments.
- Implement a high availability and disaster recovery (HADR) solution that meets your business continuity SLAs, see the HADR options options available for SQL Server on Azure VMs.
- Use the Azure portal (support + troubleshooting) to evaluate resource health and history; submit new support requests when needed.
High availability and disaster recovery (HADR) features, such as the Always On availability group and the failover cluster instance rely on underlying Windows Server Failover Cluster technology. Review the best practices for modifying your HADR settings to better support the cloud environment.
For your Windows cluster, consider these best practices:
- Deploy your SQL Server VMs to multiple subnets whenever possible to avoid the dependency on an Azure Load Balancer or a distributed network name (DNN) to route traffic to your HADR solution. Review the various ways to deploy [Availability Groups]/azure/azure-sql/virtual-machines/windows/availability-group-overview#deployment-options) on Azure.
- Change the cluster to less aggressive parameters to avoid unexpected outages from transient network failures or Azure platform maintenance. To learn more, see heartbeat and threshold settings. For Windows Server 2012 and later, use the following recommended values:
- SameSubnetDelay: 1 second
- SameSubnetThreshold: 40 heartbeats
- CrossSubnetDelay: 1 second
- CrossSubnetThreshold: 40 heartbeats
- Place your VMs in an availability set or different availability zones. To learn more, see VM availability settings.
- Use a single NIC per cluster node and a single subnet.
- Configure cluster quorum voting to use 3 or more odd number of votes. Don't assign votes to DR regions.
- Carefully monitor resource limits to avoid unexpected restarts or failovers due to resource constraints.
- Ensure your OS, drivers, and SQL Server are at the latest builds.
- Optimize performance for SQL Server on Azure VMs. Review the other sections in this article to learn more.
- Reduce or spread out workload to avoid resource limits.
- Move to a VM or disk that his higher limits to avoid constraints.
For your SQL Server availability group or failover cluster instance, consider these best practices:
- If you're experiencing frequent unexpected failures, follow the performance best practices outlined in the rest of this article.
- If optimizing SQL Server VM performance doesn't resolve your unexpected failovers, consider relaxing the monitoring for the availability group or failover cluster instance. However, doing so may not address the underlying source of the issue and could mask symptoms by reducing the likelihood of failure. You may still need to investigate and address the underlying root cause. For Windows Server 2012 or higher, use the following recommended values:
- Lease timeout: Use this equation to calculate the maximum lease time-out value:
Lease timeout < (2 * SameSubnetThreshold * SameSubnetDelay).
- Start with 40 seconds. If you're using the relaxed
SameSubnetDelayvalues recommended previously, don't exceed 80 seconds for the lease timeout value.
- Max failures in a specified period: You can set this value to 6.
- Healthcheck timeout: You can set this value to 60000 initially, adjust as necessary.
- Lease timeout: Use this equation to calculate the maximum lease time-out value:
- When using the virtual network name (VNN) and Azure Load Balancer to connect to your HADR solution, specify
MultiSubnetFailover = truein the connection string, even if your cluster only spans one subnet.
- If the client doesn't support
MultiSubnetFailover = Trueyou may need to set
RegisterAllProvidersIP = 0and
HostRecordTTL = 300to cache client credentials for shorter durations. However, doing so may cause additional queries to the DNS server.
- If the client doesn't support
- To connect to your HADR solution using the distributed network name (DNN), consider the following:
- You must use a client driver that supports
MultiSubnetFailover = True, and this parameter must be in the connection string.
- Use a unique DNN port in the connection string when connecting to the DNN listener for an availability group.
- You must use a client driver that supports
- Use a database mirroring connection string for a basic availability group to bypass the need for a load balancer or DNN.
- Validate the sector size of your VHDs before deploying your high availability solution to avoid having misaligned I/Os. See KB3009974 to learn more.
- If the SQL Server database engine, Always On availability group listener, or failover cluster instance health probe are configured to use a port between 49,152 and 65,536 (the default dynamic port range for TCP/IP), add an exclusion for each port. Doing so will prevent other systems from being dynamically assigned the same port. The following example creates an exclusion for port 59999:
netsh int ipv4 add excludedportrange tcp startport=59999 numberofports=1 store=persistent
To learn more, see the comprehensive HADR best practices.
The checklist in this section covers the security best practices for SQL Server on Azure VMs.
SQL Server features and capabilities provide a method of security at the data level and is how you achieve defense-in-depth at the infrastructure level for cloud-based and hybrid solutions. In addition, with Azure security measures, it is possible to encrypt your sensitive data, protect virtual machines from viruses and malware, secure network traffic, identify and detect threats, meet compliance requirements, and provides a single method for administration and reporting for any security need in the hybrid cloud.
- Use Azure Security Center to evaluate and take action to improve the security posture of your data environment. Capabilities such as Azure Advanced Threat Protection (ATP) can be leveraged across your hybrid workloads to improve security evaluation and give the ability to react to risks. Registering your SQL Server VM with the SQL IaaS Agent extension surfaces Azure Security Center assessments within the SQL virtual machine resource of the Azure portal.
- Leverage Microsoft Defender for SQL to discover and mitigate potential database vulnerabilities, as well as detect anomalous activities that could indicate a threat to your SQL Server instance and database layer.
- Vulnerability Assessment is a part of Microsoft Defender for SQL that can discover and help remediate potential risks to your SQL Server environment. It provides visibility into your security state, and includes actionable steps to resolve security issues.
- Use Azure confidential VMs to reinforce protection of your data in-use, and data-at-rest against host operator access. Azure confidential VMs allow you to confidently store your sensitive data in the cloud and meet strict compliance requirements.
- If you're on SQL Server 2022, consider using Azure Active Directory authentication to connect to your instance of SQL Server.
- Azure Advisor analyzes your resource configuration and usage telemetry and then recommends solutions that can help you improve the cost effectiveness, performance, high availability, and security of your Azure resources.. Leverage Azure Advisor at the virtual machine, resource group, or subscription level to help identify and apply best practices to optimize your Azure deployments.
- Use Azure Disk Encryption when your compliance and security needs require you to encrypt the data end-to-end using your encryption keys, including encryption of the ephemeral (locally attached temporary) disk.
- Managed Disks are encrypted at rest by default using Azure Storage Service Encryption, where the encryption keys are Microsoft-managed keys stored in Azure.
- For a comparison of the managed disk encryption options review the managed disk encryption comparison chart
- Management ports should be closed on your virtual machines - Open remote management ports expose your VM to a high level of risk from internet-based attacks. These attacks attempt to brute force credentials to gain admin access to the machine.
- Turn on Just-in-time (JIT) access for Azure virtual machines
- Use Azure Bastion over Remote Desktop Protocol (RDP).
- Lock down ports and only allow the necessary application traffic using Azure Firewall which is a managed Firewall as a Service (FaaS) that grants/ denies server access based on the originating IP address.
- Use Network Security Groups (NSGs) to filter network traffic to, and from, Azure resources on Azure Virtual Networks
- Leverage Application Security Groups to group servers together with similar port filtering requirements, with similar functions, such as web servers and database servers.
- For web and application servers leverage Azure Distributed Denial of Service (DDoS) protection. DDoS attacks are designed to overwhelm and exhaust network resources, making apps slow or unresponsive. It is common for DDos attacks to target user interfaces. Azure DDoS protection sanitizes unwanted network traffic, before it impacts service availability
- Leverage VM extensions to help address anti-malware, desired state, threat detection, prevention, and remediation to address threats at the operating system, machine, and network levels:
- Guest Configuration extension performs audit and configuration operations inside virtual machines.
- Network Watcher Agent virtual machine extension for Windows and Linux monitors network performance, diagnostic, and analytics service that allows monitoring of Azure networks.
- Microsoft Antimalware Extension for Windows to help identify and remove viruses, spyware, and other malicious software, with configurable alerts.
- Evaluate third party extensions such as Symantec Endpoint Protection for Windows VM (/azure/virtual-machines/extensions/symantec)
- Leverage Azure Policy to create business rules that can be applied to your environment. Azure Policies evaluate Azure resources by comparing the properties of those resources against rules defined in JSON format.
- Azure Blueprints enables cloud architects and central information technology groups to define a repeatable set of Azure resources that implements and adheres to an organization's standards, patterns, and requirements. Azure Blueprints are different than Azure Policies.
To learn more, see the other articles in this best practices series:
Consider enabling SQL Assessment for SQL Server on Azure VMs.
Review other SQL Server Virtual Machine articles at SQL Server on Azure Virtual Machines Overview. If you have questions about SQL Server virtual machines, see the Frequently Asked Questions.
Submit and view feedback for