Optimize performance on Lsv3, Lasv3, and Lsv2-series Windows VMs

Applies to: ✔️ Windows VMs ✔️ Uniform scale sets

Lsv3, Lasv3, and Lsv2-series Azure Virtual Machines (Azure VMs) support various workloads that need high I/O and throughput on local storage across a wide range of applications and industries. The L-series is ideal for Big Data, SQL, NoSQL databases, data warehousing and large transactional databases, including Cassandra, MongoDB, Cloudera, and Redis.

Lsv3, Lasv3, and Lsv2-series VMs are tuned to work with Windows and Linux operating systems so that the hardware and software perform well together.

This software and hardware tuning resulted in an optimized version of Windows Server 2019 Datacenter, released to the Azure Marketplace. That version and later versions support maximum performance on the NVMe devices in L-series VMs.

This article provides tips and suggestions to ensure your workloads and applications achieve the maximum performance designed into the VMs.

AMD EPYC™ chipset architecture

Lasv3 and Lsv2-series VMs use AMD EPYC™ server processors based on the Zen microarchitecture. AMD developed Infinity Fabric (IF) for EPYC™ as a scalable interconnect for its NUMA model that can be used for on-die, on-package, and multi-package communications. Compared with QPI (QuickPath Interconnect) and UPI (Ultra Path Interconnect), used on modern Intel monolithic-die processors, AMD's many-NUMA small-die architecture can bring both performance benefits and challenges. The actual effects of memory bandwidth and latency constraints vary with the type of workload.

Tips for maximizing performance

  • To get maximum performance, run multiple jobs with a deep queue depth per device.

  • Avoid mixing NVMe admin commands (for example, an NVMe SMART info query) with NVMe I/O commands during active workloads. Lsv3, Lasv3, and Lsv2 NVMe devices are backed by Hyper-V NVMe Direct technology, which switches into "slow mode" whenever any NVMe admin commands are pending. When that happens, NVMe I/O performance might drop dramatically.

  • Lsv2 users shouldn't rely on the device NUMA information (all 0) reported from within the VM for data drives when deciding the NUMA affinity of their apps. For better performance, spread workloads across CPUs if possible.

  • The maximum supported queue depth per I/O queue pair for an Lsv3, Lasv3, or Lsv2 NVMe device is 1024. Limit (synthetic) benchmarking workloads to a queue depth of 1024 or lower to avoid triggering queue-full conditions, which can reduce performance.

  • The best performance is obtained when I/O goes directly to each raw NVMe device, with no partitioning, file system, or RAID configuration.
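
The queue depth guidance in the tips above can be sketched as a small planning helper for a synthetic benchmark. This is only an illustration: the function name `per_job_queue_depth` is hypothetical, and the sketch assumes that each benchmark job drives its own I/O queue pair, which depends on the benchmarking tool and driver.

```python
# Maximum supported queue depth per I/O queue pair, taken from the tips above.
MAX_QUEUE_DEPTH = 1024


def per_job_queue_depth(target_outstanding_io: int, jobs: int) -> int:
    """Split a target number of outstanding I/Os for one device across jobs.

    Assumption of this sketch: each job maps to its own I/O queue pair,
    so the per-job depth is what must stay at or below MAX_QUEUE_DEPTH.
    """
    if target_outstanding_io < 1 or jobs < 1:
        raise ValueError("target_outstanding_io and jobs must be positive")
    depth = -(-target_outstanding_io // jobs)  # ceiling division
    return min(depth, MAX_QUEUE_DEPTH)         # never exceed the supported limit
```

For example, a target of 256 outstanding I/Os spread over 8 jobs gives a depth of 32 per job, while a request that would exceed the limit is capped at 1024.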

Utilizing local NVMe storage

Local storage on the 1.92 TB NVMe disk on all Lsv3, Lasv3, and Lsv2 VMs is ephemeral. During a successful standard reboot of the VM, the data on the local NVMe disk persists. The data doesn't persist on the NVMe disk if the VM is redeployed, deallocated, or deleted. Data also doesn't persist if another issue causes the VM, or the hardware on which the VM is running, to become unhealthy. When this scenario happens, any data on the old host is securely erased.

There are also cases when the VM needs to be moved to a different host machine; for example, during a planned maintenance operation. Planned maintenance operations and some hardware failures can be anticipated with Scheduled Events. Use Scheduled Events to stay updated on any predicted maintenance and recovery operations.
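
As a minimal sketch of how an application on the VM might watch for such events, the following Python code queries the Scheduled Events endpoint of the Azure Instance Metadata Service and picks out events that could cost local NVMe data. The function names are hypothetical, the API version shown is one published version and might not be the latest, and the set of event types treated as risky is an assumption of this sketch rather than an official list for L-series VMs.

```python
import json
import urllib.request

# Scheduled Events endpoint of the Azure Instance Metadata Service.
# It is reachable only from inside an Azure VM.
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)

# Assumption of this sketch: these event types move or remove the VM,
# so data on the local NVMe disks should be treated as at risk.
DATA_LOSS_EVENT_TYPES = {"Redeploy", "Preempt", "Terminate"}


def fetch_scheduled_events(timeout: float = 5.0) -> dict:
    """Return the current Scheduled Events document as a dictionary."""
    request = urllib.request.Request(
        SCHEDULED_EVENTS_URL, headers={"Metadata": "true"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def events_risking_local_data(document: dict) -> list:
    """Filter a Scheduled Events document down to the risky events."""
    return [
        event
        for event in document.get("Events", [])
        if event.get("EventType") in DATA_LOSS_EVENT_TYPES
    ]
```

A service could call `fetch_scheduled_events()` periodically and start a backup or resynchronization plan whenever `events_risking_local_data` returns a non-empty list.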

If a planned maintenance event requires the VM to be recreated on a new host with empty local disks, the data needs to be resynchronized (again, with any data on the old host being securely erased). This scenario occurs because Lsv3, Lasv3, and Lsv2-series VMs don't currently support live migration of data on the local NVMe disk.

There are two modes for planned maintenance: standard VM customer-controlled maintenance and automatic maintenance.

For any upcoming service events, use the controlled maintenance process to select the time most convenient for you for the update. Before the event, back up your data to premium storage. After the maintenance event completes, return your data to the refreshed VM's local NVMe storage.

Scenarios that maintain data on local NVMe disks include when:

  • The VM is running and healthy.
  • The VM is rebooted in place by you or by Azure.
  • The VM is paused (stopped without deallocation).
  • Azure performs most planned maintenance servicing operations.

Scenarios that securely erase data to protect the customer include when:

  • The VM is redeployed, stopped (deallocated), or deleted by you.
  • The VM becomes unhealthy and must be service healed to another node because of a hardware issue.
  • Azure performs one of the few planned maintenance servicing operations that require the VM to be reallocated to another host.
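
The two lists above can be encoded as a lookup that automation scripts consult before acting on a VM. This is a sketch only: the enum and function names are hypothetical, and planned maintenance maps to `None` because the lists say that most, but not all, such operations keep the data.

```python
from enum import Enum
from typing import Optional


class VmOperation(Enum):
    REBOOT = "reboot in place"
    STOP_WITHOUT_DEALLOCATION = "paused (stopped without deallocation)"
    REDEPLOY = "redeploy"
    DEALLOCATE = "stop (deallocate)"
    DELETE = "delete"
    SERVICE_HEAL = "service heal to another node"
    PLANNED_MAINTENANCE = "planned maintenance"


# True: local NVMe data is kept. False: it is securely erased.
# None: the outcome depends on whether the VM must move to another host.
_LOCAL_DATA_SURVIVES = {
    VmOperation.REBOOT: True,
    VmOperation.STOP_WITHOUT_DEALLOCATION: True,
    VmOperation.REDEPLOY: False,
    VmOperation.DEALLOCATE: False,
    VmOperation.DELETE: False,
    VmOperation.SERVICE_HEAL: False,
    VmOperation.PLANNED_MAINTENANCE: None,
}


def local_data_survives(operation: VmOperation) -> Optional[bool]:
    """Look up whether local NVMe data is expected to survive an operation."""
    return _LOCAL_DATA_SURVIVES[operation]
```

Treating `None` the same as `False`, that is, backing up before any planned maintenance, is the conservative choice.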

Standard VM customer-controlled maintenance

In standard VM customer-controlled maintenance, the VM is moved to an updated host during a 30-day window.

Lsv3, Lasv3, and Lsv2 local storage data might be lost, so back up your data before the event.

Automatic maintenance

Automatic maintenance occurs if the customer doesn't execute customer-controlled maintenance. Automatic maintenance can also occur because of emergency procedures, such as a security zero-day event.

This type of maintenance is intended to preserve customer data, but there's a small risk of a VM freeze or reboot.

Lsv3, Lasv3, and Lsv2 local storage data might be lost, so back up your data before the event.

Frequently asked questions

The following are frequently asked questions about these series.

How do I start deploying L-series VMs?

As with any other VM, you can create an L-series VM by using the Azure portal, the Azure Command-Line Interface (Azure CLI), or PowerShell.

Does a single NVMe disk failure cause all VMs on the host to fail?

If a disk failure is detected on the hardware node, the node is considered to be in a failed state. When this problem occurs, all VMs on the node are automatically deallocated and moved to a healthy node. For Lsv3, Lasv3, and Lsv2-series VMs, this scenario means that the customer's data on the failing node is also securely erased. The customer needs to recreate the data on the new node.

Do I need to make polling adjustments in Windows Server 2012 or Windows Server 2016?

NVMe polling is only available on Windows Server 2019 and later versions on Azure.

Can I switch back to a traditional interrupt service routine (ISR) model?

Lsv3, Lasv3, and Lsv2-series VMs are optimized for NVMe polling. Updates are continuously provided to improve polling performance.

Can I adjust the polling settings in Windows Server 2019 or later versions?

The polling settings aren't user adjustable.

Next steps

See specifications for all VMs optimized for storage performance on Azure.