Azure Machine Learning Compute Instance Stuck in Starting State

Niket Kumar Singh
2024-05-14T06:15:21.99+00:00

I'm encountering an issue with an Azure Machine Learning compute instance that's stuck in the "Starting" state. After some time, it transitions to a "Stopped" state without successfully starting.

Error Message: An internal error has occurred while setting up the node.

Details:

  • Virtual Machine Size: Standard_D16s_v3 (16 cores, 64 GB RAM, 128 GB disk)
  • Region: Central India
    User's image

Any insights or suggestions on how to resolve this and get the compute instance to start successfully would be appreciated.

    Gowtham CP
    2024-05-14T08:41:27.1333333+00:00

    Hello @Niket Kumar Singh

    Thanks for reaching out on Microsoft Q&A!

    First, check the Azure Machine Learning "Known issues" documentation for a matching problem, and review the resource logs in the Azure portal for any error messages. Then work through these possible causes:

      • Confirm that the chosen VM size (Standard_D16s_v3) is available in your region (Central India).
      • Check resource usage for signs of insufficient disk space.
      • Retry starting the instance. If it still fails, consider deleting and re-creating (redeploying) it.
      • Enable SSH access to the instance for deeper troubleshooting.
      • If none of these steps help, contact Azure Support for further assistance.

    Working through these causes should help you find out why the instance falls back to "Stopped" and get it running again.
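The "retry starting the instance" step can be scripted instead of clicked through by hand. Below is a minimal, SDK-agnostic sketch of a retry-and-poll loop. The function name `start_with_retries`, its parameters, and the state names it checks ("Running", "Stopped", "Failed") are all hypothetical assumptions, so adjust them to whatever your tooling actually reports. With the Azure ML Python SDK v2 (`azure-ai-ml`), `start` could plausibly wrap `ml_client.compute.begin_start("my-instance").result()` and `get_state` could read `ml_client.compute.get("my-instance").state`, where `"my-instance"` is a placeholder name. Verify both calls against the SDK version you have installed.

```python
import time
from typing import Callable, Collection


def start_with_retries(
    start: Callable[[], None],
    get_state: Callable[[], str],
    max_attempts: int = 3,
    poll_interval_s: float = 30.0,
    timeout_s: float = 1200.0,
    failed_states: Collection[str] = ("Stopped", "Failed"),  # assumed state names
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Start an instance, retrying if it falls back to a failed state.

    Returns True as soon as get_state() reports "Running". Returns False if
    every attempt ends in one of failed_states or exceeds timeout_s.
    The state names are assumptions; match them to what your SDK reports.
    """
    for attempt in range(1, max_attempts + 1):
        start()  # request a start (e.g. a wrapper around your SDK call)
        waited = 0.0
        while waited < timeout_s:
            # Wait before reading the state, so a read taken right after
            # start() that still shows the old "Stopped" value is less
            # likely to be mistaken for a failed start.
            sleep(poll_interval_s)
            waited += poll_interval_s
            state = get_state()
            if state == "Running":
                return True
            if state in failed_states:
                print(f"Attempt {attempt}: instance ended in state {state!r}")
                break  # give up on this attempt and retry
        else:
            # The while loop ran out of time without a break or a return.
            print(f"Attempt {attempt}: timed out after {timeout_s} s")
    return False
```

Passing `sleep` as a parameter lets you test the loop with a no-op function instead of waiting in real time. If every attempt ends in "Stopped" with the same internal error, further retries are unlikely to help, and the remaining steps in the list above (logs, SSH, Azure Support) are the better path.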
