1. There is no API that exposes real‑time capacity
Azure doesn’t expose real‑time, per‑SKU capacity availability via any public API. The behavior described matches documented behavior:
- SkuNotAvailable is raised when a VM size isn’t available in a region/zone at allocation time, even if the SKU is listed as available for that region and subscription.
- The Resource SKUs list only tells whether a SKU can be used in a region/subscription, not whether there is current capacity.
This is explicitly called out in the guidance for SkuNotAvailable and allocation failures: the recommendation is to choose another size or region, or retry later when capacity frees up.
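To make the distinction concrete, here is a minimal sketch of what a Resource SKUs entry can and cannot tell you. It assumes the entry is a plain dict shaped like one item of the Resource SKUs REST response (`locations`, `restrictions`, `restrictionInfo`); the helper name `sku_is_listed` and the sample data are illustrative, and the field names should be verified against the current API reference.

```python
from __future__ import annotations

from typing import Any, Optional


def sku_is_listed(sku: dict[str, Any], location: str, zone: Optional[str] = None) -> bool:
    """Return True if the SKU is listed for the location (and zone) without a restriction.

    A True result only means the SKU may be requested there; it says nothing
    about whether capacity exists at allocation time.
    """
    wanted = location.lower()
    if wanted not in [loc.lower() for loc in sku.get("locations", [])]:
        return False
    for restriction in sku.get("restrictions", []):
        info = restriction.get("restrictionInfo", {})
        if wanted not in [loc.lower() for loc in info.get("locations", [])]:
            continue  # restriction applies to some other location
        if restriction.get("type") == "Location":
            return False
        if restriction.get("type") == "Zone" and zone is not None:
            if zone in info.get("zones", []):
                return False
    return True


# Illustrative entry (assumed shape, not real data).
sample_sku = {
    "name": "Standard_D2s_v3",
    "locations": ["eastus"],
    "restrictions": [
        {
            "type": "Zone",
            "reasonCode": "NotAvailableForSubscription",
            "restrictionInfo": {"locations": ["eastus"], "zones": ["1"]},
        }
    ],
}
```

Even when this check passes, the deployment can still fail with SkuNotAvailable, so it is best treated as a cheap pre-filter rather than a capacity check.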
2. Recommended patterns and best practices
Because real‑time capacity isn’t exposed, the pattern has to be resilient to allocation failure rather than trying to avoid it completely.
2.1. Use On‑demand Capacity Reservations for critical SKUs
If the workload is important enough and uses supported VM series, use On‑demand Capacity Reservations:
- Reserve capacity in advance for a specific VM size/region (and optionally zone/fault domains).
- Once reserved, deployments against that reservation won’t hit transient capacity shortages (within the reserved quantity and constraints).
- Capacity reservations require quota and are limited to specific VM series and sizes.
This is the only way to get a strong guarantee that capacity will be available when the VM is created.
Relevant points from the documentation:
- Capacity reservations require quota just like VMs.
- Only certain VM series/sizes are supported; supported SKUs are advertised via the compute Resource SKUs list.
- There are limitations (no Spot, no Availability Sets, some constraints like PPG/UltraSSD not supported, max 3 fault domains, etc.).
For user‑facing flows where failure is very costly to UX, consider:
- Pre‑creating capacity reservations per region and VM family that the system offers.
- Associating the VM/VMSS explicitly with the reservation when deploying.
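As a rough illustration of the explicit association, the sketch below adds a capacity reservation group reference to a VM properties payload. The `capacityReservation.capacityReservationGroup.id` shape is my understanding of the VM resource schema and should be confirmed against the current REST reference; the helper name and the resource ID are placeholders.

```python
from __future__ import annotations

from typing import Any


def with_capacity_reservation(vm_properties: dict[str, Any], group_id: str) -> dict[str, Any]:
    """Return a copy of the VM properties that points at a capacity reservation group.

    Assumption: the VM schema accepts
    properties.capacityReservation.capacityReservationGroup.id.
    """
    updated = dict(vm_properties)  # shallow copy so the caller's dict is untouched
    updated["capacityReservation"] = {"capacityReservationGroup": {"id": group_id}}
    return updated


# Placeholder resource ID; substitute real subscription, group, and names.
GROUP_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Compute/capacityReservationGroups/<group-name>"
)

base_properties = {"hardwareProfile": {"vmSize": "Standard_D2s_v3"}}
reserved_properties = with_capacity_reservation(base_properties, GROUP_ID)
```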
2.2. Implement robust fallback and retry logic
For scenarios where capacity reservations are not used or not supported, design the backend to expect allocation failures and react quickly:
- Fast retry with same SKU/region
- If SkuNotAvailable or allocation failure occurs, perform a limited number of quick retries (e.g., 1–2) with short backoff. Sometimes capacity frees up quickly.
- Fallback SKUs in the same region
- Maintain a mapping of “equivalent” or “acceptable alternative” SKUs per region (e.g., B1s → B1ms → D2s_v3, etc., depending on your sizing logic).
- On SkuNotAvailable, automatically attempt the next SKU in the list in the same region.
- Surface to the user that an alternative size was used.
- Fallback regions or zones
- For user scenarios where region flexibility is acceptable, maintain a prioritized list of regions (e.g., primary + backup region(s)).
- On repeated failures in the primary region, automatically attempt deployment in a backup region.
- For zonal deployments, consider:
- Removing the zone constraint (regional VM) if acceptable.
- Or trying another zone in the same region.
- Handle overconstrained requests
- Allocation failures can also be caused by combinations of constraints (size + zone + PPG + Ultra disk + accelerated networking, etc.).
- Implement logic to relax non‑essential constraints when OverconstrainedAllocationRequest/OverconstrainedZonalAllocationRequest‑type failures occur:
- Try without PPG.
- Try without UltraSSD/PremiumSSDv2.
- Try without accelerated networking.
- Try as regional instead of zonal.
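The retry and fallback ordering above (quick retries, then alternative SKUs in the same region, then backup regions) can be sketched roughly as follows. This is not tied to any Azure SDK: `create_vm` is a hypothetical callable supplied by the caller, and `AllocationError` is a hypothetical exception that the caller raises when the platform reports SkuNotAvailable or another allocation failure.

```python
from __future__ import annotations

import time
from typing import Callable, Sequence


class AllocationError(Exception):
    """Hypothetical error raised by create_vm for SkuNotAvailable or allocation failures."""


def deploy_with_fallback(
    create_vm: Callable[[str, str], None],
    regions: Sequence[str],
    skus: Sequence[str],
    quick_retries: int = 1,
    backoff_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, str]:
    """Try regions in priority order, each SKU within a region, with a few quick retries.

    Returns the (region, sku) that succeeded so the caller can tell the user
    when a substitute size or region was used.
    """
    for region in regions:
        for sku in skus:
            for attempt in range(quick_retries + 1):
                try:
                    create_vm(region, sku)
                    return region, sku
                except AllocationError:
                    if attempt < quick_retries:
                        sleep(backoff_seconds)  # short pause before the same request
    raise AllocationError("all region/SKU combinations failed")
```

Passing `sleep` as a parameter keeps the function testable without real delays; in production the default `time.sleep` applies.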
2.3. Make VM creation the first heavy operation
To avoid creating many supporting resources before discovering capacity issues:
- In ARM/Bicep or template‑driven flows, define all resources in a single deployment, with the VM as part of the template. If the VM fails allocation, the deployment fails and dependent resources are rolled back.
- If using imperative API calls:
- Create the VM first with minimal dependencies (e.g., a basic VNet/subnet that can be reused or is pre‑created).
- Only after VM creation succeeds, create additional, more expensive or specific resources.
- Or use a “staging” resource group for the VM; if VM creation fails, delete the whole group to clean up.
This reduces the amount of orphaned resources and cleanup work when capacity issues occur.
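For the imperative case, a VM-first flow with a staging resource group might look like the sketch below. All four callables are hypothetical stand-ins for whatever SDK or REST calls the backend already uses; nothing here is an Azure API.

```python
from typing import Callable


def provision_vm_first(
    group: str,
    create_group: Callable[[str], None],
    create_vm: Callable[[str], None],
    create_extras: Callable[[str], None],
    delete_group: Callable[[str], None],
) -> bool:
    """Create the VM before anything expensive; delete the staging group if it fails.

    Returns True when the VM and the follow-up resources were created,
    False when VM creation failed and the group was cleaned up.
    """
    create_group(group)
    try:
        create_vm(group)
    except Exception:
        # Broad catch is deliberate in this sketch: any VM failure triggers cleanup.
        delete_group(group)
        return False
    create_extras(group)  # only reached once the VM exists
    return True
```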
2.4. User‑experience‑oriented design
To improve UX in the described journey:
- Pre‑validation and messaging
- Explain that capacity is dynamic and that, in rare cases, the system may automatically choose a nearby region or alternative size.
- Non‑blocking deployment
- Instead of making the user wait synchronously for 2–3 minutes, accept the request, return a “deployment in progress” state, and update status asynchronously.
- If the first attempt fails, the backend can transparently retry with fallbacks before informing the user.
- Configurable policies
- Allow tenants to configure:
- Whether automatic region fallback is allowed.
- Which alternative SKUs are acceptable.
- Whether to require exact size/region or allow substitutions.
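One way to represent such tenant policies is a small settings object that the backend consults when building its list of placements to try. The class and function names below are illustrative, not part of any Azure service.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FallbackPolicy:
    """Hypothetical per-tenant settings controlling substitutions."""

    allow_region_fallback: bool = False
    acceptable_skus: tuple[str, ...] = ()
    require_exact: bool = False


def candidate_placements(
    region: str,
    sku: str,
    backup_regions: list[str],
    policy: FallbackPolicy,
) -> list[tuple[str, str]]:
    """Return (region, sku) pairs to try, in priority order, as allowed by the policy."""
    if policy.require_exact:
        return [(region, sku)]
    skus = [sku] + [s for s in policy.acceptable_skus if s != sku]
    regions = [region]
    if policy.allow_region_fallback:
        regions += [r for r in backup_regions if r != region]
    # Exhaust SKU alternatives in one region before moving to the next region.
    return [(r, s) for r in regions for s in skus]
```

A tenant that sets `require_exact=True` gets exactly one attempt target, so a failure is reported to the user rather than silently substituted.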
3. Specific answers to the questions
- Is there any Azure API that provides real‑time VM capacity availability before deployment?
No public API exposes real‑time capacity per SKU/region/zone. Availability is only definitively checked at allocation time, which is when SkuNotAvailable and allocation failures are raised.
- Best practice to handle this in an automated system?
- Use On‑demand Capacity Reservations for critical, predictable workloads on supported SKUs.
- Otherwise, design for failure: implement retry, fallback SKUs, fallback regions/zones, and constraint‑relaxation logic, and minimize pre‑VM resource creation.
- Should the system implement fallback sizes/regions/retries/lightweight validation/Compute Fleet?
- Yes to:
- Fallback VM sizes.
- Retry logic with different regions or zones where acceptable.
- Template‑based or “VM‑first” deployments to avoid heavy pre‑work.
- There is no mention in the provided material of a “Compute Fleet” service for capacity handling; capacity reservations and standard allocation‑failure workarounds are the recommended mechanisms.
- Any new or preview APIs that address this limitation?
The provided material does not describe any new or preview API that exposes real‑time capacity. The documented guidance is to:
- Use On‑demand Capacity Reservations to guarantee capacity.
- Handle allocation failures with retries, alternative sizes, and/or regions.
4. Additional concrete techniques from the documentation
- When SkuNotAvailable occurs, the official guidance is to “try another size or deploy to a different location or zones.”
- For allocation failures in general, recommended workarounds include:
- Retry the allocation.
- Resize the VM to a different size.
- Change the region or zone.
- Adjust constraints (remove PPG, UltraSSD/PremiumSSDv2, accelerated networking, or zonal constraints) to reduce over‑constraint.
- To ensure capacity is always available for critical workloads, use On‑demand Capacity Reservations.
These patterns can be encoded directly into the automation logic to improve success rates and user experience.
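As one example of encoding the constraint adjustments, the sketch below produces progressively less constrained copies of a deployment request. The request is a plain dict with made-up keys (`proximity_placement_group`, `ultra_ssd`, `accelerated_networking`, `zone`), and the relaxation order is an assumption that should reflect which constraints matter least to the workload.

```python
from __future__ import annotations

from typing import Any

# Assumed order: drop the constraint considered least essential first.
RELAXATION_ORDER = (
    "proximity_placement_group",
    "ultra_ssd",
    "accelerated_networking",
    "zone",
)


def relaxation_steps(request: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the original request followed by copies with one more constraint removed each."""
    steps = [dict(request)]
    current = dict(request)
    for key in RELAXATION_ORDER:
        if current.get(key):  # skip constraints that are absent or disabled
            current = {k: v for k, v in current.items() if k != key}
            steps.append(current)
    return steps
```

The caller can iterate over the returned list and stop at the first request that allocates, reporting to the user which constraints were dropped.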