Important
Resilient virtual machine create and delete for Virtual Machine Scale Sets is currently in preview. Previews are made available to you on the condition that you agree to the supplemental terms of use. Some aspects of this feature may change prior to general availability (GA).
Resilient create and delete for Virtual Machine Scale Sets helps reduce Virtual Machine (VM) create and delete errors by automatically retrying failed operations. Failed VMs can accumulate and result in unusable capacity, requiring manual effort to detect and clean up. These errors are rare, but the Resilient create and delete mechanism is built for customers who are deploying or deleting large volumes of Virtual Machine Scale Sets or VMs.
Prerequisites
Before you use Resilient create and delete, complete the feature registration and make sure you're using API version 2023-07-01 or later.
Feature registration
Register for the ResilientVMScaleSetVMCreation and ReliableVMDeletion feature flags by using the az feature register command:
az feature register --namespace "Microsoft.Compute" --name "ResilientVMScaleSetVMCreation"
az feature register --namespace "Microsoft.Compute" --name "ReliableVMDeletion"
Registration takes a few moments to complete. Verify the registration status of each feature by using the az feature show command:
az feature show --namespace "Microsoft.Compute" --name "ResilientVMScaleSetVMCreation"
az feature show --namespace "Microsoft.Compute" --name "ReliableVMDeletion"
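Rather than rerunning az feature show by hand, you can poll until both flags report Registered. The following is a minimal sketch: the function names, POLL_SECONDS, and MAX_POLLS are hypothetical and illustrative, not documented limits. The final az provider register step follows the common Azure practice of refreshing the resource provider registration once a preview feature is registered; treat it as an assumption and confirm it against your own preview instructions.

```shell
#!/usr/bin/env bash
# Sketch: wait for both preview feature flags to reach "Registered",
# then refresh the Microsoft.Compute provider registration.
# POLL_SECONDS and MAX_POLLS are hypothetical values, not documented limits.
POLL_SECONDS="${POLL_SECONDS:-30}"
MAX_POLLS="${MAX_POLLS:-40}"

# Print the registration state of one Microsoft.Compute feature flag,
# for example "Registering" or "Registered".
feature_state() {
  az feature show --namespace "Microsoft.Compute" --name "$1" \
    --query "properties.state" --output tsv
}

# Poll one feature flag until it is Registered; return 1 if it never is.
wait_for_feature() {
  local name="$1" i=1 state
  while [ "$i" -le "$MAX_POLLS" ]; do
    state="$(feature_state "$name")"
    if [ "$state" = "Registered" ]; then
      echo "$name: Registered"
      return 0
    fi
    echo "$name: ${state:-unknown} (check $i of $MAX_POLLS)"
    sleep "$POLL_SECONDS"
    i=$((i + 1))
  done
  return 1
}

# Wait for both flags, then re-register the provider (assumed refresh step).
wait_for_resilient_features() {
  wait_for_feature "ResilientVMScaleSetVMCreation" &&
    wait_for_feature "ReliableVMDeletion" &&
    az provider register --namespace "Microsoft.Compute"
}
```

After sourcing the script, run wait_for_resilient_features. Because it stops at the first flag that never registers, the provider refresh runs only when both flags are ready.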
Resilient create
Resilient create runs on Virtual Machine Scale Sets during the initial creation of the scale set or during a scale-out.
Resilient create initiates retries for OS provisioning time-out and VM start time-out errors. A time-out occurs when a VM isn't provisioned within 20 minutes for Windows or 8 minutes for Linux.
Resilient create attempts the create operation for up to 30 total minutes. If unsuccessful, the VM remains in a failed state.
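To make the numbers above concrete, the following sketch models where a VM that's still being created sits relative to the per-OS time-out and the 30-minute Resilient create window. It's a rough model built only from the figures in this section; the function names are hypothetical, and the platform's actual retry timing within the window may differ.

```shell
#!/usr/bin/env bash
# Sketch: a rough timeline model of Resilient create, using only the numbers
# stated in this article. The platform's real retry timing may differ.

# Total time Resilient create keeps attempting the create, in minutes.
RESILIENT_CREATE_WINDOW_MINUTES=30

# Print the provisioning time-out, in minutes, for an OS type.
provisioning_timeout_minutes() {
  case "$1" in
    windows|Windows) echo 20 ;;
    linux|Linux) echo 8 ;;
    *) echo "unknown OS type: $1" >&2; return 1 ;;
  esac
}

# Describe where a VM sits in that timeline.
# $1 = OS type (windows or linux), $2 = whole minutes since the create started.
create_phase() {
  local limit
  limit="$(provisioning_timeout_minutes "$1")" || return 1
  if [ "$2" -lt "$limit" ]; then
    echo "within the ${limit}-minute provisioning time-out"
  elif [ "$2" -lt "$RESILIENT_CREATE_WINDOW_MINUTES" ]; then
    echo "past the time-out; Resilient create may be retrying"
  else
    echo "past the ${RESILIENT_CREATE_WINDOW_MINUTES}-minute window; a failed VM stays failed"
  fi
}
```

For example, create_phase linux 10 reports that a Linux VM is past its 8-minute time-out but still inside the window where Resilient create may retry.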
Resilient delete
Resilient delete initiates forced delete retries for any error that occurs during the delete process, for example InternalExecutionError, TransientFailure, or InternalOperationError.
Resilient delete attempts the forced delete operation up to five times per VM, with exponential backoff between attempts. If every attempt fails, the VM remains in a failed state. For example, if you delete a scale set of five VMs and each VM enters a failed delete state, the scale set initiates one delete call on itself to delete those five VMs again. If four of the five VMs are deleted on that first retry, the platform waits 10 minutes before initiating the next delete call for the remaining VM.
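The platform performs these retries for you, so you don't need to script them. To illustrate what "five attempts with exponential backoff" means, here is a generic retry loop that doubles its wait after each failure. This is not the platform's implementation: the function names, the base delay, and the doubling factor are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration of retry with exponential backoff, the technique Resilient
# delete is described as using. This is NOT the platform's implementation;
# the base delay and the doubling factor here are hypothetical.

# Delay, in seconds, to wait after failed attempt N (1-based): base * 2^(N-1).
backoff_delay() {
  echo $(( $1 * (1 << ($2 - 1)) ))
}

# Run a command up to MAX_ATTEMPTS times, doubling the wait after each failure.
# Usage: retry_with_backoff BASE_SECONDS MAX_ATTEMPTS command [args...]
retry_with_backoff() {
  local base="$1" max="$2" attempt=1 delay
  shift 2
  while :; do
    if "$@"; then
      echo "succeeded on attempt $attempt"
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      echo "failed after $attempt attempts"
      return 1
    fi
    delay="$(backoff_delay "$base" "$attempt")"
    echo "attempt $attempt failed; waiting ${delay}s"
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

With a hypothetical base of 60 seconds and five attempts, the waits between attempts would be 60, 120, 240, and 480 seconds; no wait follows the fifth attempt.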
To check the status of your VMs throughout the delete process, see Get status for Resilient create or delete.
Enable Resilient create and delete
You can enable Resilient create and delete on a new or existing Virtual Machine Scale Set.
Enable Resilient create and delete on a new scale set:
- In the Azure portal search bar, search for and select Virtual Machine Scale Sets.
- Select Create on the Virtual Machine Scale Sets page.
- Go through the steps of creating your scale set by making selections on the Basics, Spot, Disks, Networking, and Management tabs.
- On the Health tab, go to the Recovery section.
- Select the Resilient VM create (Preview) and Resilient VM delete (Preview) checkboxes.
- Finish creating your Virtual Machine Scale Set.
Enable Resilient create and delete on an existing scale set:
- Navigate to your Virtual Machine Scale Set in the Azure portal.
- Under Capabilities, select Health and repair.
- Under Recovery, enable Resilient VM create (Preview) and Resilient VM delete (Preview).
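If you prefer scripting to the portal, a REST PATCH against the scale set is one possible route. The following is a sketch only: the resiliencyPolicy, resilientVMCreationPolicy, and resilientVMDeletionPolicy property names are assumptions inferred from the policy names used in this article, and api-version 2024-07-01 is simply the version used by the GET examples in Get status. Verify both against the Microsoft.Compute REST reference before relying on this. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: enable both policies on an existing scale set with a REST PATCH.
# Assumption: the property names under "resiliencyPolicy" below; verify them
# against the Microsoft.Compute REST reference for your API version.
API_VERSION="2024-07-01"

# Build the JSON body; $1 and $2 are "true" or "false" for create and delete.
resiliency_policy_body() {
  printf '{"properties":{"resiliencyPolicy":{"resilientVMCreationPolicy":{"enabled":%s},"resilientVMDeletionPolicy":{"enabled":%s}}}}' "$1" "$2"
}

# PATCH the scale set. $1 subscription ID, $2 resource group, $3 scale set name.
enable_resilient_create_and_delete() {
  az rest --method patch \
    --url "https://management.azure.com/subscriptions/$1/resourceGroups/$2/providers/Microsoft.Compute/virtualMachineScaleSets/$3?api-version=$API_VERSION" \
    --body "$(resiliency_policy_body true true)"
}
```

Keeping the body in its own function makes it easy to inspect the exact JSON before sending it, or to disable one policy by passing false.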
Get status
Get the status of Resilient create and delete for your scale set.
- Resilient create: Your VM status is Creating while Resilient create is in progress.
- Resilient delete: While the delete attempt is in progress, the resource state is Deleting. If a delete retry fails on a particular VM, the VM falls back to the Failed or Running state. Those states only indicate that one delete retry failed; Resilient delete might still perform more retries. As a result, while Resilient delete is in progress, a VM's state can alternate between Deleting and Failed or Running.
REST API
To check the status of your VM during Resilient delete, retrieve the value of the ResilientVMDeletionStatus property through the REST API. Two API endpoints return ResilientVMDeletionStatus.
The following endpoint supports Virtual Machine Scale Sets with Uniform orchestration and Flexible orchestration.
GET https://management.azure.com/subscriptions/{{subscriptionId}}/resourceGroups/{{ResourceGroupName}}/providers/Microsoft.Compute/virtualMachineScaleSets/{{ResourceName}}/VirtualMachines/{{VMName}}?$expand=resiliencyView&api-version=2024-07-01
The following endpoint supports Virtual Machine Scale Sets with Uniform orchestration only.
GET https://management.azure.com/subscriptions/{{subscriptionId}}/resourceGroups/{{ResourceGroupName}}/providers/Microsoft.Compute/virtualMachineScaleSets/{{VMSSName}}/virtualMachines?api-version=2024-07-01
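The two GET requests above can be issued with az rest. The following sketch builds both URLs from placeholders and extracts the status from the response. The function names are hypothetical, and because this article doesn't state where ResilientVMDeletionStatus sits in the response JSON or its exact casing, the sketch uses a case-insensitive text match rather than a JSON path query.

```shell
#!/usr/bin/env bash
# Sketch: build the two GET URLs above and pull out ResilientVMDeletionStatus.
# Assumption: the property's exact position and casing in the JSON response
# aren't stated here, so this does a case-insensitive text match instead of a
# JSON path query.
ARM_ENDPOINT="https://management.azure.com"
API_VERSION="2024-07-01"

# Uniform and Flexible: one VM, with the resiliency view expanded.
# $1 subscription ID, $2 resource group, $3 scale set name, $4 VM name.
vm_resiliency_url() {
  printf '%s/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/virtualMachineScaleSets/%s/VirtualMachines/%s?$expand=resiliencyView&api-version=%s\n' \
    "$ARM_ENDPOINT" "$1" "$2" "$3" "$4" "$API_VERSION"
}

# Uniform only: every VM in the scale set.
# $1 subscription ID, $2 resource group, $3 scale set name.
vmss_vms_url() {
  printf '%s/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/virtualMachineScaleSets/%s/virtualMachines?api-version=%s\n' \
    "$ARM_ENDPOINT" "$1" "$2" "$3" "$API_VERSION"
}

# Call one of the URLs and print each ResilientVMDeletionStatus key/value pair.
deletion_statuses() {
  az rest --method get --url "$1" | grep -io '"resilientVMDeletionStatus": *"[^"]*"'
}
```

For example, deletion_statuses "$(vmss_vms_url "$SUBSCRIPTION_ID" myResourceGroup myScaleSet)" prints one status per VM for a Uniform scale set, where SUBSCRIPTION_ID, myResourceGroup, and myScaleSet are placeholders.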
The following ResilientVMDeletionStatus values indicate the progress of Resilient delete.
ResilientVMDeletionStatus | State of delete |
---|---|
Enabled | The resilientVMDeletion policy is set on your scale set. |
Disabled | Your scale set has the resilientVMDeletion policy set to false, has a resiliency policy without a resilientVMDeletion policy, or has no resiliency policy. |
In Progress | The resilientVMDeletion policy is enabled and the VM is currently being deleted or is marked for deletion. |
Failed | The resilientVMDeletion policy is enabled and has reached the maximum retry count. |
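In a monitoring script, you might translate each status into the meaning from the table. The following is a minimal sketch with a hypothetical function name; it accepts both "In Progress" and "InProgress" because the exact spelling returned by the API is an assumption here. The suggestion that a Failed VM needs manual attention follows from the earlier statement that a VM stays failed once retries are exhausted.

```shell
#!/usr/bin/env bash
# Sketch: translate a ResilientVMDeletionStatus value into the meaning given
# in the table above. Assumption: the in-progress value may be returned as
# "In Progress" or "InProgress", so both spellings are accepted.
describe_deletion_status() {
  case "$1" in
    Enabled) echo "Resilient delete policy is set on the scale set" ;;
    Disabled) echo "Resilient delete isn't enabled for this scale set" ;;
    "In Progress"|InProgress) echo "VM is being deleted or is marked for deletion; keep waiting" ;;
    Failed) echo "maximum retry count reached; the VM likely needs manual attention" ;;
    *) echo "unrecognized status: $1" >&2; return 1 ;;
  esac
}
```

Returning a nonzero exit code for unknown values lets a calling script stop rather than silently misreport a status it doesn't recognize.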
FAQ
What is the minimum API version to use this policy?
Use API version 2023-07-01 or later.
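If you generate templates or scripts that pin an api-version, a small check like the following can catch versions older than 2023-07-01. This is a sketch under an assumption: api-version strings start with a YYYY-MM-DD date, optionally followed by a suffix such as -preview, which the check ignores. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check that an api-version string is 2023-07-01 or later.
# Assumption: versions look like YYYY-MM-DD, optionally followed by a
# suffix such as "-preview", which is ignored for the comparison.
MIN_API_VERSION="2023-07-01"

# Turn "YYYY-MM-DD[...]" into the integer YYYYMMDD; fail on other formats.
api_version_number() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*)
      printf '%s\n' "$1" | cut -c1-10 | sed 's/-//g' ;;
    *) echo "unexpected api-version format: $1" >&2; return 1 ;;
  esac
}

# Succeed if $1 meets the minimum version.
api_version_supported() {
  local have want
  have="$(api_version_number "$1")" || return 1
  want="$(api_version_number "$MIN_API_VERSION")" || return 1
  [ "$have" -ge "$want" ]
}
```

Comparing the dates as YYYYMMDD integers avoids relying on string ordering, which plain POSIX test doesn't provide.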
What do I do if my virtual machine is in a 'Failed' state for a long time?
Resilient delete performs up to five retries on your VM, and between retries the VM can show a 'Failed' state even though Resilient delete is still operating on it. A 'Failed' state alone doesn't mean that Resilient delete has stopped. For more information, see Get status for Resilient create or delete.
Does Resilient create work when I attach a new virtual machine to my scale set?
No. Resilient create operates only during a scale-out of a scale set or when you create a new scale set.
Is the provisioning of my virtual machine accelerated with Resilient create?
No, Resilient create improves the odds of provisioning the virtual machine, but doesn't improve the speed of the provisioning itself.
Next steps
Once your virtual machine is successfully created, learn how to configure automatic instance repairs on your Virtual Machine Scale Sets.