Create and manage an Azure Machine Learning compute instance

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (preview)

Learn how to create and manage a compute instance in your Azure Machine Learning workspace.

Use a compute instance as your fully configured and managed development environment in the cloud. For development and testing, you can also use the instance as a training compute target or an inference target. A compute instance can run multiple jobs in parallel and has a job queue. As a development environment, a compute instance can't be shared with other users in your workspace.

In this article, you learn how to:

  • Create a compute instance
  • Manage (start, stop, restart, delete) a compute instance
  • Create a schedule to automatically start and stop the compute instance (preview)

You can also use a setup script (preview) to create the compute instance with your own custom environment.

Compute instances can run jobs securely in a virtual network environment, without requiring enterprises to open up SSH ports. The job executes in a containerized environment and packages your model dependencies in a Docker container.

Note

This article shows CLI v2 in the sections below. If you are still using CLI v1, see Create an Azure Machine Learning compute cluster (CLI v1).

Prerequisites

Create

Important

Items marked (preview) below are currently in public preview. The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

Time estimate: Approximately 5 minutes.

Creating a compute instance is a one-time process for your workspace. You can reuse the compute as a development workstation or as a compute target for training. You can have multiple compute instances attached to your workspace.

The dedicated cores per region per VM family quota and total regional quota, which apply to compute instance creation, are unified and shared with the Azure Machine Learning training compute cluster quota. Stopping the compute instance doesn't release quota, which ensures you'll be able to restart the compute instance. It isn't possible to change the virtual machine size of a compute instance once it's created.
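
The Python examples in this article assume an authenticated MLClient handle named ml_client. The following is a minimal sketch of constructing one; the subscription, resource group, and workspace values are placeholders to replace with your own:

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Authenticate and connect to the workspace (placeholder values).
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<AML_WORKSPACE_NAME>",
)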

The following example demonstrates how to create a compute instance:

APPLIES TO: Python SDK azure-ai-ml v2 (preview)

# Compute Instances need to have a unique name across the region.
# Here we create a unique name with current datetime
from azure.ai.ml.entities import ComputeInstance, AmlCompute
import datetime

ci_basic_name = "basic-ci" + datetime.datetime.now().strftime("%Y%m%d%H%M")
ci_basic = ComputeInstance(name=ci_basic_name, size="STANDARD_DS3_v2")
ml_client.begin_create_or_update(ci_basic)
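
begin_create_or_update returns a long-running operation poller. If you want your script to block until provisioning finishes, capture the poller and wait on its result; a small sketch using the ci_basic object from above:

# Optionally block until the create operation completes and inspect the result.
created_ci = ml_client.begin_create_or_update(ci_basic).result()
print(created_ci.name, created_ci.state)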

For more information on the classes, methods, and parameters used in this example, see the following reference documents:

Configure auto-stop (preview)

To avoid getting charged for a compute instance that is switched on but inactive, you can configure auto-stop.

A compute instance is considered inactive if all of the following conditions are met:

  • No active Jupyter Kernel sessions (which translates to no Notebooks usage via Jupyter, JupyterLab or Interactive notebooks)
  • No active Jupyter terminal sessions
  • No active AzureML runs or experiments
  • No SSH connections
  • No VS Code connections; you must close your VS Code connection for your compute instance to be considered inactive. Sessions are auto-terminated if VS Code detects no activity for 3 hours.

Activity on custom applications installed on the compute instance isn't considered. There are also some basic bounds on the inactivity time period; the compute instance must be inactive for a minimum of 15 minutes and a maximum of three days.

This setting can be configured during compute instance creation or for existing compute instances via the following interfaces:

  • AzureML Studio

    Screenshot of the Advanced Settings page for creating a compute instance.

    Screenshot of the compute instance details page showing how to update an existing compute instance with idle shutdown.

  • REST API

    Endpoint:

    POST https://management.azure.com/subscriptions/{SUB_ID}/resourceGroups/{RG_NAME}/providers/Microsoft.MachineLearningServices/workspaces/{WS_NAME}/computes/{CI_NAME}/updateIdleShutdownSetting?api-version=2021-07-01
    

    Body:

    {
        "idleTimeBeforeShutdown": "PT30M" // this must be a string in ISO 8601 format
    }
    
  • CLI v2 (YAML): only configurable during new compute instance creation

    # Note that this is just a snippet for the idle shutdown property. Refer to the "Create" Azure CLI section for more information.
    idle_time_before_shutdown_minutes: 30
    
  • Python SDK v2: only configurable during new compute instance creation (a fuller sketch follows this list)

    ComputeInstance(name=ci_basic_name, size="STANDARD_DS3_v2", idle_time_before_shutdown_minutes=30)
    
  • ARM templates: only configurable during new compute instance creation

    // Note that this is just a snippet for the idle shutdown property in an ARM template
    {
        "idleTimeBeforeShutdown":"PT30M" // this must be a string in ISO 8601 format
    }
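
For a fuller version of the Python SDK v2 snippet shown above, the following sketch creates a new compute instance with a 30-minute idle timeout. It assumes the ml_client handle described earlier; the instance name is illustrative.

from azure.ai.ml.entities import ComputeInstance
import datetime

# Create a compute instance that auto-stops after 30 idle minutes.
ci_idle_name = "idle-ci" + datetime.datetime.now().strftime("%Y%m%d%H%M")
ci_idle = ComputeInstance(
    name=ci_idle_name,
    size="STANDARD_DS3_v2",
    idle_time_before_shutdown_minutes=30,
)
ml_client.begin_create_or_update(ci_idle)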
    

Azure policy support

Administrators can use a built-in Azure Policy definition to enforce auto-stop on all compute instances in a given subscription or resource group.

  1. Navigate to Azure Policy in the Azure portal.

  2. Under "Definitions", look for the idle shutdown policy.

    Screenshot for the idle shutdown policy in Azure portal.

  3. Assign policy to the necessary scope.

You can also create your own custom Azure policy. For example, if the following policy is assigned, all new compute instances will have auto-stop configured with a 60-minute inactivity period.

{
  "mode": "All",
  "policyRule": {
    "if": {
      "allOf": [
        {
          "field": "type",
          "equals": "Microsoft.MachineLearningServices/workspaces/computes"
        },
        {
          "field": "Microsoft.MachineLearningServices/workspaces/computes/computeType",
          "equals": "ComputeInstance"
        },
        {
          "anyOf": [
            {
              "field": "Microsoft.MachineLearningServices/workspaces/computes/idleTimeBeforeShutdown",
              "exists": false
            },
            {
              "value": "[empty(field('Microsoft.MachineLearningServices/workspaces/computes/idleTimeBeforeShutdown'))]",
              "equals": true
            }
          ]
        }
      ]
    },
    "then": {
      "effect": "append",
      "details": [
        {
          "field": "Microsoft.MachineLearningServices/workspaces/computes/idleTimeBeforeShutdown",
          "value": "PT60M"
        }
      ]
    }
  },
  "parameters": {}
}

Create on behalf of (preview)

As an administrator, you can create a compute instance on behalf of a data scientist and assign the instance to them. A Python SDK v2 sketch appears at the end of this section.

The data scientist you create the compute instance for needs the following Azure role-based access control (Azure RBAC) permissions:

  • Microsoft.MachineLearningServices/workspaces/computes/start/action
  • Microsoft.MachineLearningServices/workspaces/computes/stop/action
  • Microsoft.MachineLearningServices/workspaces/computes/restart/action
  • Microsoft.MachineLearningServices/workspaces/computes/applicationaccess/action
  • Microsoft.MachineLearningServices/workspaces/computes/updateSchedules/action

The data scientist can start, stop, and restart the compute instance. They can use the compute instance for:

  • Jupyter
  • JupyterLab
  • RStudio
  • Integrated notebooks
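
When creating the instance programmatically, the Python SDK v2 exposes this scenario at creation time. The following is a minimal sketch, assuming the preview AssignedUserConfiguration entity and the create_on_behalf_of parameter, with placeholder tenant and object IDs for the assigned data scientist:

from azure.ai.ml.entities import AssignedUserConfiguration, ComputeInstance

# The data scientist the instance is created for (placeholder IDs).
assigned_user = AssignedUserConfiguration(
    user_tenant_id="<TENANT_ID>",
    user_object_id="<OBJECT_ID>",
)

# The name is illustrative and must be unique in the region.
ci_obo = ComputeInstance(
    name="ci-on-behalf-of",
    size="STANDARD_DS3_v2",
    create_on_behalf_of=assigned_user,
)
ml_client.begin_create_or_update(ci_obo)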

Schedule automatic start and stop (preview)

Define multiple schedules for auto-shutdown and auto-start. For instance, create a schedule to start at 9 AM and stop at 6 PM Monday through Thursday, and a second schedule to start at 9 AM and stop at 4 PM on Friday. You can create a total of four schedules per compute instance.

Schedules can also be defined for compute instances created on behalf of another user. You can create a schedule that creates the compute instance in a stopped state, which is particularly useful when you create a compute instance on behalf of another user.
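
You can also define schedules when creating the instance with the Python SDK v2. The following is a minimal sketch, assuming the preview ComputeSchedules, ComputeStartStopSchedule, and CronTrigger entities (names and parameters may differ slightly across SDK versions):

from azure.ai.ml.entities import (
    ComputeInstance,
    ComputeSchedules,
    ComputeStartStopSchedule,
    CronTrigger,
)

# Stop the instance at 6 PM every day (cron: minute 0, hour 18).
stop_trigger = CronTrigger(
    expression="0 18 * * *",
    start_time="2021-03-10T21:21:07",
    time_zone="Pacific Standard Time",
)
stop_schedule = ComputeStartStopSchedule(trigger=stop_trigger, action="stop")

ci_scheduled = ComputeInstance(
    name="schedule-example-i",
    size="STANDARD_DS3_v2",
    schedules=ComputeSchedules(compute_start_stop=[stop_schedule]),
)
ml_client.begin_create_or_update(ci_scheduled)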

Create a schedule in studio

  1. Fill out the form.

  2. On the second page of the form, open Show advanced settings.

  3. Select Add schedule to add a new schedule.

    Screenshot: Add schedule in advanced settings.

  4. Select Start compute instance or Stop compute instance.

  5. Select the Time zone.

  6. Select the Startup time or Shutdown time.

  7. Select the days when this schedule is active.

    Screenshot: schedule a compute instance to shut down.

  8. Select Add schedule again if you want to create another schedule.

Once the compute instance is created, you can view, edit, or add new schedules from the compute instance details section.

Note

Time zone labels don't account for daylight saving time. For instance, (UTC+01:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna is actually UTC+02:00 during daylight saving time.

Create a schedule with CLI

APPLIES TO: Azure CLI ml extension v2 (current)

az ml compute create -f create-instance.yml

Where the file create-instance.yml is:

$schema: https://azuremlschemas.azureedge.net/latest/computeInstance.schema.json
name: schedule-example-i
type: computeinstance
size: STANDARD_DS3_v2
schedules:
  compute_start_stop:
    - action: stop
      trigger:
        type: cron
        start_time: "2021-03-10T21:21:07"
        time_zone: Pacific Standard Time
        expression: 0 18 * * *

Create a schedule with a Resource Manager template

You can schedule the automatic start and stop of a compute instance by using a Resource Manager template.

In the Resource Manager template, add:

"schedules": "[parameters('schedules')]"

Then use either cron or LogicApps expressions to define the schedule that starts or stops the instance in your parameter file:

        "schedules": {
        "value": {
        "computeStartStop": [
          {
            "triggerType": "Cron",
            "cron": {              
              "timeZone": "UTC",
              "expression": "0 18 * * *"
            },
            "action": "Stop",
            "status": "Enabled"
          },
          {
            "triggerType": "Cron",
            "cron": {              
              "timeZone": "UTC",
              "expression": "0 8 * * *"
            },
            "action": "Start",
            "status": "Enabled"
          },
          { 
            "triggerType": "Recurrence", 
            "recurrence": { 
              "frequency": "Day", 
              "interval": 1, 
              "timeZone": "UTC", 
              "schedule": { 
                "hours": [17], 
                "minutes": [0]
              } 
            }, 
            "action": "Stop", 
            "status": "Enabled" 
          } 
        ]
      }
    }
  • Action can have a value of "Start" or "Stop".

  • For trigger type of Recurrence, use the same syntax as Logic Apps, with this recurrence schema.

  • For trigger type of Cron, use standard cron syntax. For example, the expression 0 18 * * * used in the sample above stops the instance at 6 PM every day:

    // Crontab expression format: 
    // 
    // * * * * * 
    // - - - - - 
    // | | | | | 
    // | | | | +----- day of week (0 - 6) (Sunday=0) 
    // | | | +------- month (1 - 12) 
    // | | +--------- day of month (1 - 31) 
    // | +----------- hour (0 - 23) 
    // +------------- min (0 - 59) 
    // 
    // Star (*) in the value field above means all legal values as in 
    // braces for that column. The value column can have a * or a list 
    // of elements separated by commas. An element is either a number in 
    // the ranges shown above or two numbers in the range separated by a 
    // hyphen (meaning an inclusive range). 
    

Azure Policy support to default a schedule

Use Azure Policy to enforce that a shutdown schedule exists for every compute instance in a subscription, or to apply a default schedule if none exists. The following sample policy defaults to a shutdown schedule of 10 PM PST.

{
  "mode": "All",
  "policyRule": {
    "if": {
      "allOf": [
        {
          "field": "Microsoft.MachineLearningServices/workspaces/computes/computeType",
          "equals": "ComputeInstance"
        },
        {
          "field": "Microsoft.MachineLearningServices/workspaces/computes/schedules",
          "exists": "false"
        }
      ]
    },
    "then": {
      "effect": "append",
      "details": [
        {
          "field": "Microsoft.MachineLearningServices/workspaces/computes/schedules",
          "value": {
            "computeStartStop": [
              {
                "triggerType": "Cron",
                "cron": {
                  "startTime": "2021-03-10T21:21:07",
                  "timeZone": "Pacific Standard Time",
                  "expression": "0 22 * * *"
                },
                "action": "Stop",
                "status": "Enabled"
              }
            ]
          }
        }
      ]
    }
  }
}

Add custom applications such as RStudio (preview)

You can set up other applications, such as RStudio, when creating a compute instance. Follow these steps in studio to set up a custom application on your compute instance:

  1. Fill out the form to create a new compute instance.
  2. Select Next: Advanced Settings.
  3. Select Add application under the Custom application setup (RStudio Workbench, etc.) section.

Screenshot showing Custom Service Setup.

Set up RStudio Workbench

RStudio is one of the most popular IDEs among R developers for ML and data science projects. You can easily set up RStudio Workbench to run on your compute instance, using your own RStudio license, and access the rich feature set that RStudio Workbench offers.

  1. Follow the steps listed above to Add application when creating your compute instance.
  2. Select RStudio Workbench (bring your own license) in the Application dropdown and enter your RStudio Workbench license key in the License key field. You can get your RStudio Workbench license or trial license from RStudio.
  3. Select Create to add RStudio Workbench application to your compute instance.

Screenshot shows RStudio settings.

Important

If using a private link workspace, ensure that the docker image and ghcr.io are accessible. Also, use a published port in the range 8704-8993.

Note

  • Support for accessing your workspace file store from RStudio is not yet available.
  • When accessing multiple instances of RStudio, if you see a "400 Bad Request. Request Header Or Cookie Too Large" error, use a new browser or access from a browser in incognito mode.
  • Shiny applications are not currently supported on RStudio Workbench.

Set up RStudio open source

To use RStudio open source, set up a custom application as follows:

  1. Follow the steps listed above to Add application when creating your compute instance.

  2. Select Custom Application on the Application dropdown

  3. Configure the Application name you would like to use.

  4. Set up the application to run on Target port 8787 - the docker image for RStudio open source listed below needs to run on this Target port.

  5. Set up the application to be accessed on Published port 8787 - you can configure the application to be accessed on a different Published port if you wish.

  6. Point the Docker image to ghcr.io/azure/rocker-rstudio-ml-verse:latest.

  7. Use Bind mounts to add access to the files in your default storage account:

    • Specify /home/azureuser/cloudfiles for Host path.
    • Specify /home/azureuser/cloudfiles for the Container path.
    • Select Add to add this mounting. Because the files are mounted, changes you make to them will be available in other compute instances and applications.
  8. Select Create to set up RStudio as a custom application on your compute instance.

Screenshot shows form to set up RStudio as a custom application
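
If you prefer to create the instance programmatically, the sketch below is a rough Python SDK v2 equivalent of the steps above. It assumes the preview custom-application entities (CustomApplications, ImageSettings, EndpointsSettings, VolumeSettings) and their parameter names, which may differ in your SDK version:

from azure.ai.ml.entities import (
    ComputeInstance,
    CustomApplications,
    EndpointsSettings,
    ImageSettings,
    VolumeSettings,
)

# RStudio open source served on target and published port 8787.
rstudio_app = CustomApplications(
    name="rstudio-open-source",
    image=ImageSettings(reference="ghcr.io/azure/rocker-rstudio-ml-verse:latest"),
    endpoints=[EndpointsSettings(published=8787, target=8787)],
    # Mount the default storage account files into the container.
    bind_mounts=[
        VolumeSettings(
            source="/home/azureuser/cloudfiles",
            target="/home/azureuser/cloudfiles",
        )
    ],
)

# The instance name is illustrative and must be unique in the region.
ci_rstudio = ComputeInstance(
    name="ci-rstudio-oss",
    size="STANDARD_DS3_v2",
    custom_applications=[rstudio_app],
)
ml_client.begin_create_or_update(ci_rstudio)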

Important

If using a private link workspace, ensure that the docker image and ghcr.io are accessible. Also, use a published port in the range 8704-8993.

Set up other custom applications

Set up other custom applications on your compute instance by providing the application on a Docker image.

  1. Follow the steps listed above to Add application when creating your compute instance.
  2. Select Custom Application on the Application dropdown.
  3. Configure the Application name, the Target port you wish to run the application on, the Published port you wish to access the application on, and the Docker image that contains your application.
  4. Optionally, add Environment variables you wish to use for your application.
  5. Use Bind mounts to add access to the files in your default storage account:
    • Specify /home/azureuser/cloudfiles for Host path.
    • Specify /home/azureuser/cloudfiles for the Container path.
    • Select Add to add this mounting. Because the files are mounted, changes you make to them will be available in other compute instances and applications.
  6. Select Create to set up the custom application on your compute instance.

Screenshot show custom application settings.

Important

If using a private link workspace, ensure that the docker image and ghcr.io are accessible. Also, use a published port in the range 8704-8993.

Accessing custom applications in studio

Access the custom applications that you set up in studio:

  1. On the left, select Compute.
  2. On the Compute instance tab, see your applications under the Applications column.

Screenshot shows studio access for your custom applications.

Note

It might take a few minutes after setting up a custom application until you can access it via the links above. The amount of time taken will depend on the size of the image used for your custom application. If you see a 502 error message when trying to access the application, wait for some time for the application to be set up and try again.

Manage

Start, stop, restart, and delete a compute instance. A compute instance doesn't automatically scale down, so make sure to stop the resource to prevent ongoing charges. Stopping a compute instance deallocates it. Then start it again when you need it. While stopping the compute instance stops the billing for compute hours, you'll still be billed for disk, public IP, and standard load balancer.

You can create a schedule for the compute instance to automatically start and stop based on a time and day of week.

Tip

The compute instance has a 120 GB OS disk. If you run out of disk space, use the terminal to clear at least 1-2 GB before you stop or restart the compute instance. Please do not stop the compute instance by issuing sudo shutdown from the terminal. The temporary disk size on the compute instance depends on the VM size chosen and is mounted on /mnt.
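
To see how much space is left before cleaning up, you can run a quick check from a notebook or the terminal on the instance; a small sketch:

import shutil

# Report free space on the OS disk in GiB.
total, used, free = shutil.disk_usage("/")
print(f"OS disk free: {free / 2**30:.1f} GiB")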

APPLIES TO: Python SDK azure-ai-ml v2 (preview)

In the examples below, the name of the compute instance is stored in the variable ci_basic_name.

  • Get status

    from azure.ai.ml.entities import ComputeInstance, AmlCompute
    
    # Get compute
    ci_basic_state = ml_client.compute.get(ci_basic_name)
  • Stop

    from azure.ai.ml.entities import ComputeInstance, AmlCompute
    
    # Stop compute
    ml_client.compute.begin_stop(ci_basic_name)
  • Start

    from azure.ai.ml.entities import ComputeInstance, AmlCompute
    
    # Start compute
    ml_client.compute.begin_start(ci_basic_name)
  • Restart

    from azure.ai.ml.entities import ComputeInstance, AmlCompute
    
    # Restart compute
    ml_client.compute.begin_restart(ci_basic_name)
  • Delete

    from azure.ai.ml.entities import ComputeInstance, AmlCompute
    
    ml_client.compute.begin_delete(ci_basic_name)
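
The get call returns the compute instance entity, and the begin_* calls return long-running operation pollers. A small sketch of waiting for a stop operation to finish and then checking the instance state, using the ci_basic_name variable from above:

# Stop the instance, wait for the operation to finish, then check its state.
ml_client.compute.begin_stop(ci_basic_name).wait()
ci_stopped = ml_client.compute.get(ci_basic_name)
print(ci_stopped.state)  # e.g. "Stopped"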

Azure RBAC allows you to control which users in the workspace can create, delete, start, stop, or restart a compute instance. All users with the workspace contributor or owner role can create, delete, start, stop, and restart compute instances across the workspace. However, only the creator of a specific compute instance, or the user assigned to it if it was created on their behalf, is allowed to access Jupyter, JupyterLab, and RStudio on that compute instance. A compute instance is dedicated to a single user who has root access. That user has access to Jupyter, JupyterLab, and RStudio running on the instance. The compute instance has single-user sign-in, and all actions use that user's identity for Azure RBAC and attribution of experiment jobs. SSH access is controlled through a public/private key mechanism.

These actions can be controlled by Azure RBAC:

  • Microsoft.MachineLearningServices/workspaces/computes/read
  • Microsoft.MachineLearningServices/workspaces/computes/write
  • Microsoft.MachineLearningServices/workspaces/computes/delete
  • Microsoft.MachineLearningServices/workspaces/computes/start/action
  • Microsoft.MachineLearningServices/workspaces/computes/stop/action
  • Microsoft.MachineLearningServices/workspaces/computes/restart/action
  • Microsoft.MachineLearningServices/workspaces/computes/updateSchedules/action

To create a compute instance, you'll need permissions for the following actions:

  • Microsoft.MachineLearningServices/workspaces/computes/write
  • Microsoft.MachineLearningServices/workspaces/checkComputeNameAvailability/action

Next steps