Hello Pat,
Thank you for posting your question in the Microsoft Q&A forum.
When an Azure service becomes unresponsive and fails to stop via the portal, the root cause often lies in a combination of software, hardware, or platform-level issues. In your case, the service had been functioning normally for months before suddenly freezing, suggesting an underlying instability rather than a misconfiguration. The high CPU usage prior to the failure points to a potential process deadlock or resource exhaustion, which may have prevented the operating system from gracefully handling shutdown commands. Additionally, if the Azure host node encountered hardware issues or hypervisor-level delays, stop requests could have been queued or ignored until the platform resolved the underlying problem.
Is CLI More Reliable Than the Portal? - Yes, sometimes. The portal uses the same APIs as CLI/PowerShell, so if the issue was platform-side (e.g., Azure backend delays), CLI wouldn’t necessarily help. However, CLI offers:
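- Scriptable retries and non-blocking calls (e.g., az vm stop --no-wait --resource-group MyRG --name MyVM returns immediately instead of waiting for the operation to finish, and the command can be wrapped in a retry loop).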
- Better logging (errors appear directly in the terminal, and the --verbose or --debug flags add more detail).
- Avoids browser-related timeouts.
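Retrying a stop request from the CLI can be scripted. Below is a minimal Bash sketch, not an official Azure pattern: the retry helper, its attempt count, and its delay are hypothetical and chosen only for illustration, and MyRG and MyVM are the placeholder names used above.

```shell
#!/usr/bin/env bash
# retry: run a command up to MAX_ATTEMPTS times, sleeping DELAY seconds
# between failed attempts. Hypothetical helper, not part of the Azure CLI.
# Usage: retry MAX_ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  # Keep running the command until it exits with status 0.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempt(s): $*" >&2
      return 1
    fi
    echo "Attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Example (commented out so the sketch can be sourced safely):
# retry 3 30 az vm stop --resource-group MyRG --name MyVM
```

The example omits --no-wait on purpose: with that flag the command returns before the stop operation has finished, so the loop could not tell whether the stop actually succeeded.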
To diagnose such issues in the future, administrators should consult Azure Activity Logs to review failed stop operations, check Resource Health for platform-related incidents, and access the Serial Console (for VMs) to inspect OS-level freezes. Enabling diagnostic logging for CPU, memory, and disk metrics can also help identify pre-failure patterns. Proactive monitoring, such as alerts for sustained high CPU usage, could prevent similar incidents by allowing earlier intervention.
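Some of these checks can also be run from the CLI. The following is a sketch under assumptions: the function names are hypothetical, MyRG and MyVM are placeholder names, and the 90% threshold and time windows are illustrative values rather than Azure recommendations. It covers the Activity Log, the VM instance view, and a CPU alert; Resource Health and the Serial Console are not shown.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions wrapping Azure CLI diagnostics.

# List failed operations in a resource group from the Activity Log.
# $1 = resource group, $2 = look-back window (default: 1 day)
list_failed_operations() {
  az monitor activity-log list \
    --resource-group "$1" \
    --offset "${2:-1d}" \
    --status Failed \
    --output table
}

# Show the VM's current provisioning and power state.
# $1 = resource group, $2 = VM name
show_vm_state() {
  az vm get-instance-view \
    --resource-group "$1" \
    --name "$2" \
    --query "instanceView.statuses[].{code:code, status:displayStatus}" \
    --output table
}

# Create a metric alert that fires on sustained high CPU.
# $1 = resource group, $2 = VM name, $3 = CPU threshold in percent (default: 90)
create_cpu_alert() {
  local vm_id
  # Look up the VM's full resource ID to use as the alert scope.
  vm_id="$(az vm show --resource-group "$1" --name "$2" --query id --output tsv)" || return 1
  az monitor metrics alert create \
    --name "high-cpu-$2" \
    --resource-group "$1" \
    --scopes "$vm_id" \
    --condition "avg Percentage CPU > ${3:-90}" \
    --window-size 15m \
    --evaluation-frequency 5m \
    --description "Sustained high CPU on $2"
}

# Example (commented out; requires az login):
# list_failed_operations MyRG 2d
# show_vm_state MyRG MyVM
# create_cpu_alert MyRG MyVM 90
```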
If the above answer helped, please do not forget to "Accept Answer", as this may help other community members find this information if they face a similar issue. Your contribution to the Microsoft Q&A community is highly appreciated.