Manage an Azure Machine Learning compute instance

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

Learn how to manage a compute instance in your Azure Machine Learning workspace.

Use a compute instance as your fully configured and managed development environment in the cloud. For development and testing, you can also use the instance as a training compute target. A compute instance can run multiple jobs in parallel and has a job queue. As a development environment, a compute instance can't be shared with other users in your workspace.

In this article, you learn how to start, stop, restart, and delete a compute instance. To learn how to create a compute instance, see Create an Azure Machine Learning compute instance.

Note

The sections in this article use CLI v2. If you're still using CLI v1, see Create an Azure Machine Learning compute cluster CLI v1.

Prerequisites

Select the appropriate tab for the rest of the prerequisites based on your preferred method of managing your compute instance.

  • If you're not running your code on a compute instance, install the Azure Machine Learning Python SDK. This SDK is already installed for you on a compute instance.

  • Attach to the workspace in your Python script:

    Run this code to connect to your Azure Machine Learning workspace.

    Replace your Subscription ID, Resource Group name, and Workspace name in the following code. To find these values:

    1. Sign in to Azure Machine Learning studio.
    2. Open the workspace you wish to use.
    3. Select your workspace name in the upper right Azure Machine Learning studio toolbar.
    4. Copy the value for workspace, resource group, and subscription ID into the code.

    APPLIES TO: Python SDK azure-ai-ml v2 (current)

    # Enter details of your AML workspace
    subscription_id = "<SUBSCRIPTION_ID>"
    resource_group = "<RESOURCE_GROUP>"
    workspace = "<AML_WORKSPACE_NAME>"
    # get a handle to the workspace
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential
    
    ml_client = MLClient(
        DefaultAzureCredential(), subscription_id, resource_group, workspace
    )

    ml_client is a handle to the workspace that you use to manage other resources and jobs.
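
    As a quick check that the connection works, you can list the compute resources in the workspace. This is a minimal sketch that only assumes the ml_client handle created above:

    # List the compute targets in the workspace and print their names and types
    for compute in ml_client.compute.list():
        print(compute.name, compute.type)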

Manage

Start, stop, restart, and delete a compute instance. A compute instance doesn't always automatically scale down, so make sure to stop the resource to prevent ongoing charges. Stopping a compute instance deallocates it. Then start it again when you need it. While stopping the compute instance stops the billing for compute hours, you'll still be billed for disk, public IP, and standard load balancer.

You can enable automatic shutdown to automatically stop the compute instance after a specified time.
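As a minimal sketch, idle shutdown can be set through the ComputeInstance entity's idle_time_before_shutdown_minutes setting when you create or update the instance. The instance name, VM size, and 30-minute timeout below are placeholder values:

from azure.ai.ml.entities import ComputeInstance

# Create (or update) a compute instance that stops after 30 idle minutes
ci_idle = ComputeInstance(
    name="ci-idle-example",
    size="STANDARD_DS3_v2",
    idle_time_before_shutdown_minutes=30,
)
ml_client.compute.begin_create_or_update(ci_idle).result()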

You can also create a schedule for the compute instance to automatically start and stop based on a time and day of week.
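The following is a minimal sketch of defining such a schedule with the Python SDK, assuming the ComputeSchedules, ComputeStartStopSchedule, RecurrenceTrigger, and RecurrencePattern entities in azure.ai.ml.entities. The instance name, days, and times are placeholders:

from azure.ai.ml.entities import (
    ComputeInstance,
    ComputeSchedules,
    ComputeStartStopSchedule,
    RecurrenceTrigger,
    RecurrencePattern,
)

# Stop the instance every weekday at 18:30
stop_trigger = RecurrenceTrigger(
    frequency="week",
    interval=1,
    schedule=RecurrencePattern(
        week_days=["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
        hours=[18],
        minutes=[30],
    ),
)
stop_schedule = ComputeStartStopSchedule(trigger=stop_trigger, action="stop")

# Attach the schedule to the compute instance definition and apply it
ci_scheduled = ComputeInstance(
    name="ci-schedule-example",
    schedules=ComputeSchedules(compute_start_stop=[stop_schedule]),
)
ml_client.compute.begin_create_or_update(ci_scheduled).result()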

Tip

The compute instance has a 120 GB OS disk. If you run out of disk space, use the terminal to clear at least 1-2 GB before you stop or restart the compute instance. Don't stop the compute instance by issuing sudo shutdown from the terminal. The temp disk size depends on the VM size chosen and is mounted at /mnt.

APPLIES TO: Python SDK azure-ai-ml v2 (current)

In these examples, the name of the compute instance is stored in the variable ci_basic_name.

  • Get status

    from azure.ai.ml.entities import ComputeInstance, AmlCompute
    
    # Get compute
    ci_basic_state = ml_client.compute.get(ci_basic_name)
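    # The returned ComputeInstance object reports its status via the state field
    # (for example, Running or Stopped)
    print(ci_basic_state.state)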
  • Stop

    from azure.ai.ml.entities import ComputeInstance, AmlCompute
    
    # Stop compute
    ml_client.compute.begin_stop(ci_basic_name).wait()
  • Start

    from azure.ai.ml.entities import ComputeInstance, AmlCompute
    
    # Start compute
    ml_client.compute.begin_start(ci_basic_name).wait()
  • Restart

    from azure.ai.ml.entities import ComputeInstance, AmlCompute
    
    # Restart compute
    ml_client.compute.begin_restart(ci_basic_name).wait()
  • Delete

    from azure.ai.ml.entities import ComputeInstance, AmlCompute
    
    ml_client.compute.begin_delete(ci_basic_name).wait()

Azure RBAC allows you to control which users in the workspace can create, delete, start, stop, and restart a compute instance. All users with the workspace contributor or owner role can create, delete, start, stop, and restart compute instances across the workspace. However, only the creator of a specific compute instance, or the user assigned to it if it was created on their behalf, is allowed to access Jupyter, JupyterLab, and RStudio on that compute instance. A compute instance is dedicated to a single user who has root access. That user has access to Jupyter, JupyterLab, and RStudio running on the instance. The compute instance has single-user sign-in, and all actions use that user's identity for Azure RBAC and attribution of experiment jobs. SSH access is controlled through a public/private key mechanism.

These actions can be controlled by Azure RBAC:

  • Microsoft.MachineLearningServices/workspaces/computes/read
  • Microsoft.MachineLearningServices/workspaces/computes/write
  • Microsoft.MachineLearningServices/workspaces/computes/delete
  • Microsoft.MachineLearningServices/workspaces/computes/start/action
  • Microsoft.MachineLearningServices/workspaces/computes/stop/action
  • Microsoft.MachineLearningServices/workspaces/computes/restart/action
  • Microsoft.MachineLearningServices/workspaces/computes/updateSchedules/action

To create a compute instance, you need permissions for the following actions:

  • Microsoft.MachineLearningServices/workspaces/computes/write
  • Microsoft.MachineLearningServices/workspaces/checkComputeNameAvailability/action

Audit and observe compute instance version

Once a compute instance is deployed, it doesn't get automatically updated. Microsoft releases new VM images on a monthly basis. To understand the options for keeping up to date with the latest version, see vulnerability management.

To keep track of whether an instance's operating system version is current, you can query its version by using the CLI, SDK, or studio UI.

APPLIES TO: Python SDK azure-ai-ml v2 (current)

from azure.ai.ml.entities import ComputeInstance, AmlCompute

# Display operating system version
instance = ml_client.compute.get("myci")
print(instance.os_image_metadata)
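
The returned metadata can also be inspected field by field. As a sketch, assuming the image metadata exposes current_image_version, latest_image_version, and is_latest_os_image_version attributes:

# Check whether the instance is running the latest available image
metadata = instance.os_image_metadata
if not metadata.is_latest_os_image_version:
    print(
        f"Image {metadata.current_image_version} is outdated; "
        f"latest is {metadata.latest_image_version}."
    )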

For more information on the classes, methods, and parameters used in this example, see the Azure Machine Learning Python SDK (azure-ai-ml) reference documentation.

IT administrators can use Azure Policy to monitor the inventory of instances across workspaces in Azure Policy compliance portal. Assign the built-in policy Audit Azure Machine Learning Compute Instances with an outdated operating system on an Azure subscription or Azure management group scope.

Next steps