Hello Issac Chan,
Welcome to Microsoft Q&A. Thank you for reaching out and sharing the details.
The behavior you are seeing is understood, and there are indications of a broader service-side condition affecting this specific configuration in the selected region. This is currently under review, and the appropriate teams are actively working on it.
Regarding your question about whether there is a known issue or capacity limitation for GPU instances (A100) in Canada Central: high-end GPU configurations such as A100 are subject to regional capacity availability. Limited capacity or quota constraints within a region can affect provisioning for online endpoints, including deployments that remain in a transitional state for an extended period. Regional capacity constraints are a known consideration for GPU-backed workloads and vary by VM size and region.
While the review is ongoing, the following mitigations may help reduce the impact and let you make progress:
- Please consider deploying the workload using an alternate GPU SKU or a smaller VM size, if flexibility allows.
- Test the same deployment in a different nearby region to determine whether the behavior is region-specific.
- Retry the deployment after some time, as regional capacity conditions can change.
- Please confirm that sufficient GPU quota is available for the selected VM family and region.
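As a rough illustration of the mitigations above, the retry-and-fallback idea can be sketched in Python. This is only a sketch under stated assumptions: `first_successful_deployment` and the `deploy` callable are hypothetical names, not part of any Azure SDK, and `deploy` stands in for whatever you use to create the deployment (for example a wrapper around the CLI or the Python SDK) that returns True on success. The region and VM size values are placeholders.

```python
import time
from typing import Callable, Iterable, Optional, Tuple

Candidate = Tuple[str, str]  # (region, vm_size)


def first_successful_deployment(
    candidates: Iterable[Candidate],
    deploy: Callable[[str, str], bool],
    attempts_per_candidate: int = 3,
    base_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Candidate]:
    """Try each (region, vm_size) pair in order, retrying with exponential backoff.

    `deploy` is a hypothetical caller-supplied function that returns True on
    success and False when provisioning fails or stalls.
    """
    for region, vm_size in candidates:
        for attempt in range(attempts_per_candidate):
            if deploy(region, vm_size):
                return (region, vm_size)
            # Wait longer after each failure; skip the wait after the last try.
            if attempt < attempts_per_candidate - 1:
                sleep(base_delay_s * (2 ** attempt))
    return None


if __name__ == "__main__":
    # Placeholder candidates: preferred region first, then a nearby fallback.
    options = [
        ("canadacentral", "Standard_NC24ads_A100_v4"),
        ("eastus", "Standard_NC24ads_A100_v4"),
    ]
    # Stub that always fails, just to show the call shape without touching Azure.
    print(first_successful_deployment(options, lambda r, s: False, base_delay_s=0.0))
```

Ordering the candidates from most to least preferred keeps the fallback explicit, and the backoff gives regional capacity time to change between attempts.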
Endpoints remaining in a “Deleting” state typically indicate that backend cleanup is still in progress. Manual force deletion is not recommended, as it can leave residual resources in an inconsistent state. The supported approach is as follows:
- Allow sufficient time for backend cleanup to complete naturally.
- If the deletion state persists beyond a reasonable duration, assistance from support teams is recommended to safely complete the cleanup.
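The "wait, then escalate" approach above could be automated with a small polling loop. The following is a minimal sketch, not an official tool: `wait_for_deletion` and `get_state` are hypothetical names, and `get_state` is assumed to return the endpoint's current state string, or None once the endpoint no longer exists. With the Azure ML Python SDK v2 one might implement it by reading `provisioning_state` from `MLClient.online_endpoints.get(name)` and returning None on a not-found error, but treat that wiring as an assumption to verify against the SDK documentation.

```python
import time
from typing import Callable, Optional


def wait_for_deletion(
    get_state: Callable[[], Optional[str]],
    timeout_s: float = 3600.0,
    interval_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the endpoint is gone or the timeout elapses.

    Returns True if `get_state` reported None (endpoint deleted) and False
    if the endpoint still exists at the deadline, in which case opening a
    support request is the suggested next step.
    """
    deadline = clock() + timeout_s
    while True:
        if get_state() is None:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated states standing in for real endpoint lookups.
    states = iter(["Deleting", "Deleting", None])
    print(wait_for_deletion(lambda: next(states), timeout_s=10, interval_s=0))
```

The loop only observes the endpoint and never forces deletion, which matches the guidance to let backend cleanup finish on its own.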
Please check the following references for additional information:
- Manage resources and quotas - Azure Machine Learning | Microsoft Learn
- Service limits - Azure Machine Learning | Microsoft Learn
- Feature availability across cloud regions - Azure Machine Learning | Microsoft Learn
- Virtual machine sizes overview - Azure Virtual Machines | Microsoft Learn
- Troubleshoot online endpoint deployment - Azure Machine Learning | Microsoft Learn
- az ml online-endpoint | Microsoft Learn
Thank you
Please 'Upvote' (thumbs-up) and 'Accept' as answer if the response was helpful. This will benefit other community members who face the same issue.