Create and manage an Azure Machine Learning compute instance
APPLIES TO:
Azure CLI ml extension v2 (current)
Python SDK azure-ai-ml v2 (current)
Learn how to create and manage a compute instance in your Azure Machine Learning workspace.
Use a compute instance as your fully configured and managed development environment in the cloud. For development and testing, you can also use the instance as a training compute target. A compute instance can run multiple jobs in parallel and has a job queue. As a development environment, a compute instance can't be shared with other users in your workspace.
In this article, you learn how to:
- Create a compute instance
- Manage (start, stop, restart, delete) a compute instance
- Create a schedule to automatically start and stop the compute instance
- Enable idle shutdown
You can also use a setup script to create the compute instance with your own custom environment.
Compute instances can run jobs securely in a virtual network environment, without requiring enterprises to open up SSH ports. The job executes in a containerized environment and packages your model dependencies in a Docker container.
Note
This article shows CLI v2 in the sections below. If you're still using CLI v1, see Create an Azure Machine Learning compute cluster (CLI v1).
Prerequisites
An Azure Machine Learning workspace. For more information, see Create an Azure Machine Learning workspace. In the storage account, the "Allow storage account key access" option must be enabled for compute instance creation to be successful.
The Azure CLI extension for Machine Learning service (v2), Azure Machine Learning Python SDK (v2), or the Azure Machine Learning Visual Studio Code extension.
If using the Python SDK, set up your development environment with a workspace. Once your environment is set up, attach to the workspace in your Python script:
APPLIES TO:
Python SDK azure-ai-ml v2 (current)
Run this code to connect to your Azure ML workspace.
Replace your subscription ID, resource group name, and workspace name in the code below. To find these values:
- Sign in to Azure Machine Learning studio.
- Open the workspace you wish to use.
- In the upper right Azure Machine Learning studio toolbar, select your workspace name.
- Copy the value for workspace, resource group and subscription ID into the code.
- If you're using a notebook inside studio, you'll need to copy one value, close the area and paste, then come back for the next one.
```python
# Enter details of your AML workspace
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace = "<AML_WORKSPACE_NAME>"

# get a handle to the workspace
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)
```
ml_client is a handle to the workspace that you'll use to manage other resources and jobs.
Create
Time estimate: Approximately 5 minutes.
Creating a compute instance is a one-time process for your workspace. You can reuse the compute as a development workstation or as a compute target for training. You can have multiple compute instances attached to your workspace.
The dedicated cores per region per VM family quota and total regional quota, which applies to compute instance creation, is unified and shared with the Azure Machine Learning training compute cluster quota. Stopping the compute instance doesn't release quota, which ensures you'll be able to restart it. It isn't possible to change the virtual machine size of a compute instance after it's created.
The fastest way to create a compute instance is to follow the Create resources you need to get started article.
Or use the following examples to create a compute instance with more options:
APPLIES TO:
Python SDK azure-ai-ml v2 (current)
```python
# Compute Instances need to have a unique name across the region.
# Here we create a unique name with current datetime
from azure.ai.ml.entities import ComputeInstance, AmlCompute
import datetime

ci_basic_name = "basic-ci" + datetime.datetime.now().strftime("%Y%m%d%H%M")
ci_basic = ComputeInstance(name=ci_basic_name, size="STANDARD_DS3_v2")
ml_client.begin_create_or_update(ci_basic).result()
```
Create on behalf of
As an administrator, you can create a compute instance on behalf of a data scientist and assign the instance to them with:
Studio, using the Advanced settings
Azure Resource Manager template. For details on how to find the TenantID and ObjectID needed in this template, see Find identity object IDs for authentication configuration. You can also find these values in the Azure Active Directory portal.
REST API
The data scientist you create the compute instance for needs the following Azure role-based access control (Azure RBAC) permissions:
- Microsoft.MachineLearningServices/workspaces/computes/start/action
- Microsoft.MachineLearningServices/workspaces/computes/stop/action
- Microsoft.MachineLearningServices/workspaces/computes/restart/action
- Microsoft.MachineLearningServices/workspaces/computes/applicationaccess/action
- Microsoft.MachineLearningServices/workspaces/computes/updateSchedules/action
The data scientist can start, stop, and restart the compute instance. They can use the compute instance for:
- Jupyter
- JupyterLab
- RStudio
- Posit Workbench (formerly RStudio Workbench)
- Integrated notebooks
Enable idle shutdown
To avoid getting charged for a compute instance that is switched on but inactive, you can configure when to shut down your compute instance due to inactivity.
A compute instance is considered inactive if all of the following conditions are met:
- No active Jupyter Kernel sessions (which translates to no Notebooks usage via Jupyter, JupyterLab or Interactive notebooks)
- No active Jupyter terminal sessions
- No active Azure Machine Learning runs or experiments
- No SSH connections
- No VS Code connections; you must close your VS Code connection for your compute instance to be considered inactive. Sessions are auto-terminated if VS Code detects no activity for 3 hours.
- No custom applications are running on the compute
A compute instance won't be considered idle if any custom application is running. There are also some basic bounds on inactivity time periods: a compute instance must be inactive for a minimum of 15 minutes and a maximum of three days.
Also, if the idle shutdown settings are updated to a time shorter than the compute instance's current idle duration, the idle time clock resets to 0. For example, if the compute instance has already been idle for 20 minutes and the shutdown setting is updated to 15 minutes, the idle time clock resets to 0.
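Those bounds can be checked client-side before you submit a value — a minimal sketch; the service enforces the same limits itself, and the helper name is illustrative, not part of the Azure ML SDK:

```python
# Client-side sanity check for the idle-shutdown bounds described above.
# The service enforces the same limits; this just fails fast locally.
MIN_IDLE_MINUTES = 15
MAX_IDLE_MINUTES = 3 * 24 * 60  # three days

def validate_idle_minutes(minutes: int) -> int:
    """Return minutes unchanged if it's within the allowed range."""
    if not MIN_IDLE_MINUTES <= minutes <= MAX_IDLE_MINUTES:
        raise ValueError(
            f"idle shutdown must be between {MIN_IDLE_MINUTES} and "
            f"{MAX_IDLE_MINUTES} minutes, got {minutes}"
        )
    return minutes
```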
The setting can be configured during compute instance creation or for existing compute instances via the following interfaces:
APPLIES TO:
Python SDK azure-ai-ml v2 (current)
When creating a new compute instance, add the idle_time_before_shutdown_minutes parameter.

```python
# Note that idle_time_before_shutdown has been deprecated.
ComputeInstance(name=ci_basic_name, size="STANDARD_DS3_v2", idle_time_before_shutdown_minutes="30")
```
You cannot change the idle time of an existing compute instance with the Python SDK.
You can also change the idle time using:
REST API
Endpoint:
POST https://management.azure.com/subscriptions/{SUB_ID}/resourceGroups/{RG_NAME}/providers/Microsoft.MachineLearningServices/workspaces/{WS_NAME}/computes/{CI_NAME}/updateIdleShutdownSetting?api-version=2021-07-01
Body:

```json
{
    "idleTimeBeforeShutdown": "PT30M"  // this must be a string in ISO 8601 format
}
```
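From Python, the same call might be issued as follows — a sketch assuming you already have a bearer token for Azure Resource Manager; all identifiers are placeholders:

```python
def idle_shutdown_url(sub_id, rg_name, ws_name, ci_name):
    """Build the updateIdleShutdownSetting endpoint URL shown above."""
    return (
        f"https://management.azure.com/subscriptions/{sub_id}"
        f"/resourceGroups/{rg_name}/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{ws_name}/computes/{ci_name}"
        "/updateIdleShutdownSetting?api-version=2021-07-01"
    )

def set_idle_shutdown(token, sub_id, rg_name, ws_name, ci_name, duration="PT30M"):
    """POST the idle shutdown setting; token is an ARM bearer token."""
    import requests  # imported here so the URL helper stays dependency-free

    resp = requests.post(
        idle_shutdown_url(sub_id, rg_name, ws_name, ci_name),
        json={"idleTimeBeforeShutdown": duration},
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    return resp
```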
ARM templates: only configurable during new compute instance creation.

```json
// Note that this is just a snippet for the idle shutdown property in an ARM template
{
    "idleTimeBeforeShutdown": "PT30M"  // this must be a string in ISO 8601 format
}
```
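Both the REST body and the ARM property take the duration as an ISO 8601 string. If you track the value in minutes, a small helper can produce that format — a sketch using only the standard library; the function name is illustrative:

```python
def minutes_to_iso8601(minutes: int) -> str:
    """Convert a minute count to an ISO 8601 duration, e.g. 30 -> 'PT30M'."""
    hours, mins = divmod(minutes, 60)
    duration = "PT"
    if hours:
        duration += f"{hours}H"
    if mins or not hours:
        duration += f"{mins}M"
    return duration
```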
Azure policy support
Administrators can use a built-in Azure Policy definition to enforce auto-stop on all compute instances in a given subscription or resource group.

- Navigate to Azure Policy in the Azure portal.
- Under Definitions, look for the idle shutdown policy.
- Assign the policy to the necessary scope.

You can also create your own custom Azure policy. For example, if the following policy is assigned, all new compute instances will have auto-stop configured with a 60-minute inactivity period.
```json
{
    "mode": "All",
    "policyRule": {
        "if": {
            "allOf": [
                {
                    "field": "type",
                    "equals": "Microsoft.MachineLearningServices/workspaces/computes"
                },
                {
                    "field": "Microsoft.MachineLearningServices/workspaces/computes/computeType",
                    "equals": "ComputeInstance"
                },
                {
                    "anyOf": [
                        {
                            "field": "Microsoft.MachineLearningServices/workspaces/computes/idleTimeBeforeShutdown",
                            "exists": false
                        },
                        {
                            "value": "[empty(field('Microsoft.MachineLearningServices/workspaces/computes/idleTimeBeforeShutdown'))]",
                            "equals": true
                        }
                    ]
                }
            ]
        },
        "then": {
            "effect": "append",
            "details": [
                {
                    "field": "Microsoft.MachineLearningServices/workspaces/computes/idleTimeBeforeShutdown",
                    "value": "PT60M"
                }
            ]
        }
    },
    "parameters": {}
}
```
Schedule automatic start and stop
Define multiple schedules for auto-shutdown and auto-start. For instance, create a schedule to start at 9 AM and stop at 6 PM from Monday through Thursday, and a second schedule to start at 9 AM and stop at 4 PM on Friday. You can create a total of four schedules per compute instance.
Schedules can also be defined for compute instances created on behalf of another user. You can create a schedule that creates the compute instance in a stopped state, which is useful when you create a compute instance on behalf of another user.
Prior to a scheduled shutdown, users see a notification alerting them that the compute instance is about to shut down. At that point, the user can choose to dismiss the upcoming shutdown event, for example if they're in the middle of using the compute instance.
Create a schedule in studio
- When creating a new compute instance, on the second page of the form, open Show advanced settings.
- Select Add schedule to add a new schedule.
- Select Start compute instance or Stop compute instance.
- Select the Time zone.
- Select the Startup time or Shutdown time.
- Select the days when this schedule is active.
- Select Add schedule again if you want to create another schedule.
Once the compute instance is created, you can view, edit, or add new schedules from the compute instance details section.
Note
Time zone labels don't account for daylight saving time. For instance, (UTC+01:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna is actually UTC+02:00 during daylight saving time.
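You can observe this shift with the standard-library zoneinfo module (Python 3.9+), using Europe/Berlin — one of the cities in that label — as the example:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

berlin = ZoneInfo("Europe/Berlin")

# Same wall-clock time, different UTC offset depending on the season
winter = datetime(2023, 1, 15, 12, 0, tzinfo=berlin)
summer = datetime(2023, 7, 15, 12, 0, tzinfo=berlin)

print(winter.utcoffset())  # 1:00:00 (standard time, UTC+01:00)
print(summer.utcoffset())  # 2:00:00 (daylight saving time, UTC+02:00)
```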
Create a schedule with CLI
APPLIES TO:
Azure CLI ml extension v2 (current)
```azurecli
az ml compute create -f create-instance.yml
```
Where the file create-instance.yml is:
```yaml
$schema: https://azuremlschemas.azureedge.net/latest/computeInstance.schema.json
name: schedule-example-i
type: computeinstance
size: STANDARD_DS3_v2
schedules:
  compute_start_stop:
    - action: stop
      trigger:
        type: cron
        start_time: "2021-03-10T21:21:07"
        time_zone: Pacific Standard Time
        expression: 0 18 * * *
```
Create a schedule with SDK
APPLIES TO:
Python SDK azure-ai-ml v2 (current)
```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ComputeInstance,
    ComputeSchedules,
    ComputeStartStopSchedule,
    RecurrenceTrigger,
    RecurrencePattern,
)
from azure.ai.ml.constants import TimeZone
from azure.identity import DefaultAzureCredential

subscription_id = "sub-id"
resource_group = "rg-name"
workspace = "ws-name"

# get a handle to the workspace
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

ci_minimal_name = "ci-name"

# start the instance every Friday at 15:30 India Standard Time
rec_trigger = RecurrenceTrigger(
    start_time="yyyy-mm-ddThh:mm:ss",
    time_zone=TimeZone.INDIA_STANDARD_TIME,
    frequency="week",
    interval=1,
    schedule=RecurrencePattern(week_days=["Friday"], hours=15, minutes=[30]),
)
myschedule = ComputeStartStopSchedule(trigger=rec_trigger, action="start")
com_sch = ComputeSchedules(compute_start_stop=[myschedule])

my_compute = ComputeInstance(name=ci_minimal_name, schedules=com_sch)
ml_client.compute.begin_create_or_update(my_compute)
```
Create a schedule with a Resource Manager template
You can schedule the automatic start and stop of a compute instance by using a Resource Manager template.
In the Resource Manager template, add:
```json
"schedules": "[parameters('schedules')]"
```

Then use either cron or Logic Apps expressions to define the schedule that starts or stops the instance in your parameter file:
```json
"schedules": {
    "value": {
        "computeStartStop": [
            {
                "triggerType": "Cron",
                "cron": {
                    "timeZone": "UTC",
                    "expression": "0 18 * * *"
                },
                "action": "Stop",
                "status": "Enabled"
            },
            {
                "triggerType": "Cron",
                "cron": {
                    "timeZone": "UTC",
                    "expression": "0 8 * * *"
                },
                "action": "Start",
                "status": "Enabled"
            },
            {
                "triggerType": "Recurrence",
                "recurrence": {
                    "frequency": "Day",
                    "interval": 1,
                    "timeZone": "UTC",
                    "schedule": {
                        "hours": [17],
                        "minutes": [0]
                    }
                },
                "action": "Stop",
                "status": "Enabled"
            }
        ]
    }
}
```
Action can have a value of "Start" or "Stop".
For trigger type Recurrence, use the same syntax as a logic app, with this recurrence schema.
For trigger type Cron, use standard cron syntax:

```
// Crontab expression format:
//
// * * * * *
// - - - - -
// | | | | |
// | | | | +----- day of week (0 - 6) (Sunday=0)
// | | | +------- month (1 - 12)
// | | +--------- day of month (1 - 31)
// | +----------- hour (0 - 23)
// +------------- min (0 - 59)
//
// Star (*) in the value field above means all legal values as in
// braces for that column. The value column can have a * or a list
// of elements separated by commas. An element is either a number in
// the ranges shown above or two numbers in the range separated by a
// hyphen (meaning an inclusive range).
```
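To catch typos in an expression before deploying a template, you can check each field against the ranges above — a minimal sketch, not a full cron parser (it doesn't handle step values like */5 or month/day names):

```python
# Minimal validator for the five-field cron format described above.
# Handles *, single numbers, comma lists, and ranges (a-b) only.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]  # min, hour, dom, mon, dow

def is_valid_cron(expression: str) -> bool:
    fields = expression.split()
    if len(fields) != 5:
        return False
    for field, (lo, hi) in zip(fields, FIELD_RANGES):
        if field == "*":
            continue
        for element in field.split(","):
            bounds = element.split("-")
            if len(bounds) > 2 or not all(b.isdigit() for b in bounds):
                return False
            if not all(lo <= int(b) <= hi for b in bounds):
                return False
    return True
```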
Azure Policy support to default a schedule
Use Azure Policy to enforce that a shutdown schedule exists for every compute instance in a subscription, or to default to a schedule if none exists. The following is a sample policy to default a shutdown schedule at 10 PM PST.
```json
{
    "mode": "All",
    "policyRule": {
        "if": {
            "allOf": [
                {
                    "field": "Microsoft.MachineLearningServices/workspaces/computes/computeType",
                    "equals": "ComputeInstance"
                },
                {
                    "field": "Microsoft.MachineLearningServices/workspaces/computes/schedules",
                    "exists": "false"
                }
            ]
        },
        "then": {
            "effect": "append",
            "details": [
                {
                    "field": "Microsoft.MachineLearningServices/workspaces/computes/schedules",
                    "value": {
                        "computeStartStop": [
                            {
                                "triggerType": "Cron",
                                "cron": {
                                    "startTime": "2021-03-10T21:21:07",
                                    "timeZone": "Pacific Standard Time",
                                    "expression": "0 22 * * *"
                                },
                                "action": "Stop",
                                "status": "Enabled"
                            }
                        ]
                    }
                }
            ]
        }
    }
}
```
Assign managed identity
You can assign a system- or user-assigned managed identity to a compute instance, to authenticate against other Azure resources such as storage. Using managed identities for authentication helps improve workspace security and management. For example, you can allow users to access training data only when logged in to a compute instance. Or use a common user-assigned managed identity to permit access to a specific storage account.
You can create a compute instance with a managed identity from Azure Machine Learning studio:
- Fill out the form to create a new compute instance.
- Select Next: Advanced Settings.
- Enable Assign a managed identity.
- Select System-assigned or User-assigned under Identity type.
- If you selected User-assigned, select subscription and name of the identity.
You can use SDK v2 to create a compute instance with a system-assigned managed identity:
```python
import os

from azure.ai.ml import MLClient
from azure.identity import ManagedIdentityCredential

client_id = os.environ.get("DEFAULT_IDENTITY_CLIENT_ID", None)
credential = ManagedIdentityCredential(client_id=client_id)
ml_client = MLClient(credential, sub_id, rg_name, ws_name)
data = ml_client.data.get(name=data_name, version="1")
```
You can also use SDK v1:

```python
import os

from azureml.core import Workspace
from azureml.core.authentication import MsiAuthentication

client_id = os.environ.get("DEFAULT_IDENTITY_CLIENT_ID", None)
auth = MsiAuthentication(identity_config={"client_id": client_id})
workspace = Workspace.get(
    "<WORKSPACE_NAME>",
    auth=auth,
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group="<RESOURCE_GROUP>",
    location="<LOCATION>",
)
```
You can use the CLI v2 to create a compute instance with a system-assigned managed identity:

```azurecli
az ml compute create --name myinstance --identity-type SystemAssigned --type ComputeInstance --resource-group my-resource-group --workspace-name my-workspace
```
You can also use the CLI v2 with a YAML file, for example to create a compute instance with a user-assigned managed identity:

```azurecli
az ml compute create --file compute.yaml --resource-group my-resource-group --workspace-name my-workspace
```
The identity definition is contained in the compute.yaml file:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/computeInstance.schema.json
name: myinstance
type: computeinstance
identity:
  type: user_assigned
  user_assigned_identities:
    - resource_id: identity_resource_id
```
Once the managed identity is created, grant it at least the Storage Blob Data Reader role on the storage account of the datastore (see Accessing storage services). Then, when you work on the compute instance, the managed identity is used automatically to authenticate against datastores.
Note
The name of the created system managed identity will be in the format /workspace-name/computes/compute-instance-name in your Azure Active Directory.
You can also use the managed identity manually to authenticate against other Azure resources. The following example shows how to use it to get an Azure Resource Manager access token:
```python
import os

import requests

def get_access_token_msi(resource):
    client_id = os.environ.get("DEFAULT_IDENTITY_CLIENT_ID", None)
    resp = requests.get(
        f"{os.environ['MSI_ENDPOINT']}?resource={resource}&clientid={client_id}&api-version=2017-09-01",
        headers={"Secret": os.environ["MSI_SECRET"]},
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

arm_access_token = get_access_token_msi("https://management.azure.com")
```
To use Azure CLI with the managed identity for authentication, specify the identity client ID as the username when logging in:
```azurecli
az login --identity --username $DEFAULT_IDENTITY_CLIENT_ID
```
Note
You can't use azcopy with the managed identity; azcopy login --identity won't work.
Add custom applications such as RStudio or Posit Workbench
You can set up other applications, such as RStudio or Posit Workbench (formerly RStudio Workbench), when creating a compute instance. Follow these steps in studio to set up a custom application on your compute instance:
- Fill out the form to create a new compute instance
- Select Next: Advanced Settings
- Select Add application under the Custom application setup (RStudio Workbench, etc.) section
Set up Posit Workbench (formerly RStudio Workbench)
RStudio is one of the most popular IDEs among R developers for ML and data science projects. You can easily set up Posit Workbench, which provides access to RStudio along with other development tools, to run on your compute instance using your own Posit license, and access the rich feature set that Posit Workbench offers.
- Follow the steps listed above to Add application when creating your compute instance.
- Select Posit Workbench (bring your own license) in the Application dropdown and enter your Posit Workbench license key in the License key field. You can get your Posit Workbench license or trial license from Posit.
- Select Create to add Posit Workbench application to your compute instance.
Important
If using a private link workspace, ensure that the docker image, pkg-containers.githubusercontent.com and ghcr.io are accessible. Also, use a published port in the range 8704-8993. For Posit Workbench (formerly RStudio Workbench), ensure that the license is accessible by providing network access to https://www.wyday.com.
Note
- Support for accessing your workspace file store from Posit Workbench is not yet available.
- When accessing multiple instances of Posit Workbench, if you see a "400 Bad Request. Request Header Or Cookie Too Large" error, use a new browser or access from a browser in incognito mode.
Set up RStudio (open source)
To use RStudio, set up a custom application as follows:
- Follow the steps listed above to Add application when creating your compute instance.
- Select Custom Application in the Application dropdown.
- Configure the Application name you would like to use.
- Set up the application to run on Target port 8787 - the Docker image for RStudio open source listed below needs to run on this Target port.
- Set up the application to be accessed on Published port 8787 - you can configure the application to be accessed on a different Published port if you wish.
- Point the Docker image to ghcr.io/azure/rocker-rstudio-ml-verse:latest.
- Select Create to set up RStudio as a custom application on your compute instance.
Set up other custom applications
Set up other custom applications on your compute instance by providing the application on a Docker image.
- Follow the steps listed above to Add application when creating your compute instance.
- Select Custom Application on the Application dropdown.
- Configure the Application name, the Target port you wish to run the application on, the Published port you wish to access the application on and the Docker image that contains your application.
- Optionally, add Environment variables you wish to use for your application.
- Use Bind mounts to add access to the files in your default storage account:
- Specify /home/azureuser/cloudfiles for Host path.
- Specify /home/azureuser/cloudfiles for the Container path.
- Select Add to add this mounting. Because the files are mounted, changes you make to them will be available in other compute instances and applications.
- Select Create to set up the custom application on your compute instance.
Accessing custom applications in studio
Access the custom applications that you set up in studio:
- On the left, select Compute.
- On the Compute instance tab, see your applications under the Applications column.
Note
It might take a few minutes after setting up a custom application until you can access it via the links above. The amount of time taken will depend on the size of the image used for your custom application. If you see a 502 error message when trying to access the application, wait for some time for the application to be set up and try again.
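The wait-and-retry advice can be scripted — a sketch that polls the application until it stops returning 502. The fetch callable is injected (for example lambda: requests.get(app_url).status_code), and the retry count and delay are arbitrary placeholders:

```python
import time

def wait_for_app(fetch, retries=10, delay=30):
    """Poll a custom application until it stops answering 502.

    fetch is any callable returning an HTTP status code. Returns True
    once a non-502 status arrives, False if every attempt returns 502.
    """
    for attempt in range(retries):
        if fetch() != 502:
            return True
        if attempt < retries - 1:
            time.sleep(delay)
    return False
```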
Manage
Start, stop, restart, and delete a compute instance. A compute instance doesn't automatically scale down, so make sure to stop the resource to prevent ongoing charges. Stopping a compute instance deallocates it. Then start it again when you need it. While stopping the compute instance stops the billing for compute hours, you'll still be billed for disk, public IP, and standard load balancer.
You can create a schedule for the compute instance to automatically start and stop based on a time and day of week.
Tip
The compute instance has a 120 GB OS disk. If you run out of disk space, use the terminal to clear at least 1-2 GB before you stop or restart the compute instance. Don't stop the compute instance by issuing sudo shutdown from the terminal. The temp disk size on a compute instance depends on the VM size chosen and is mounted on /mnt.
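From the compute instance terminal or a notebook, you can check how close you are to that limit with the standard library — a sketch; the /mnt path for the temp disk follows the layout described above:

```python
import shutil

def free_gb(path="/"):
    """Return free disk space at path, in gigabytes."""
    return shutil.disk_usage(path).free / 1024**3

# On a compute instance you might check both disks:
# print(f"OS disk free:   {free_gb('/'):.1f} GB")
# print(f"Temp disk free: {free_gb('/mnt'):.1f} GB")
```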
APPLIES TO:
Python SDK azure-ai-ml v2 (current)
In the examples below, the name of the compute instance is stored in the variable ci_basic_name.
Get status
```python
from azure.ai.ml.entities import ComputeInstance, AmlCompute

# Get compute
ci_basic_state = ml_client.compute.get(ci_basic_name)
```
Stop
```python
from azure.ai.ml.entities import ComputeInstance, AmlCompute

# Stop compute
ml_client.compute.begin_stop(ci_basic_name).wait()
```
Start
```python
from azure.ai.ml.entities import ComputeInstance, AmlCompute

# Start compute
ml_client.compute.begin_start(ci_basic_name).wait()
```
Restart
```python
from azure.ai.ml.entities import ComputeInstance, AmlCompute

# Restart compute
ml_client.compute.begin_restart(ci_basic_name).wait()
```
Delete
```python
from azure.ai.ml.entities import ComputeInstance, AmlCompute

# Delete compute
ml_client.compute.begin_delete(ci_basic_name).wait()
```
Azure RBAC allows you to control which users in the workspace can create, delete, start, stop, and restart a compute instance. All users with the workspace contributor or owner role can create, delete, start, stop, and restart compute instances across the workspace. However, only the creator of a specific compute instance, or the user assigned to it if it was created on their behalf, is allowed to access Jupyter, JupyterLab, and RStudio on that compute instance. A compute instance is dedicated to a single user who has root access, and that user has access to Jupyter, JupyterLab, and RStudio running on the instance. The compute instance has single-user sign-in, and all actions use that user's identity for Azure RBAC and attribution of experiment jobs. SSH access is controlled through a public/private key mechanism.
These actions can be controlled by Azure RBAC:
- Microsoft.MachineLearningServices/workspaces/computes/read
- Microsoft.MachineLearningServices/workspaces/computes/write
- Microsoft.MachineLearningServices/workspaces/computes/delete
- Microsoft.MachineLearningServices/workspaces/computes/start/action
- Microsoft.MachineLearningServices/workspaces/computes/stop/action
- Microsoft.MachineLearningServices/workspaces/computes/restart/action
- Microsoft.MachineLearningServices/workspaces/computes/updateSchedules/action
To create a compute instance, you'll need permissions for the following actions:
- Microsoft.MachineLearningServices/workspaces/computes/write
- Microsoft.MachineLearningServices/workspaces/checkComputeNameAvailability/action
Audit and observe compute instance version
Once a compute instance is deployed, it doesn't get automatically updated. Microsoft releases new VM images on a monthly basis. To understand options for keeping current with the latest version, see vulnerability management.
To keep track of whether an instance's operating system version is current, you can query its version using the CLI, SDK, or studio UI.
APPLIES TO:
Python SDK azure-ai-ml v2 (current)
```python
from azure.ai.ml.entities import ComputeInstance, AmlCompute

# Display operating system version
instance = ml_client.compute.get("myci")
print(instance.os_image_metadata)
```
IT administrators can use Azure Policy to monitor the inventory of instances across workspaces in Azure Policy compliance portal. Assign the built-in policy Audit Azure Machine Learning Compute Instances with an outdated operating system on an Azure subscription or Azure management group scope.