
What is an Azure Machine Learning workspace?

Workspaces are places to collaborate with colleagues to create machine learning artifacts and to group related work, such as experiments, jobs, datasets, models, components, and inference endpoints. This article describes workspaces, how to manage access to them, and how to use them to organize your work.

Ready to get started? Create a workspace.

Tasks performed within a workspace

For machine learning teams, the workspace is a place to organize their work. Here are some of the tasks you can start from a workspace:

  • Create jobs - Jobs are training runs you use to build your models. You can group jobs into experiments to compare metrics.
  • Author pipelines - Pipelines are reusable workflows for training and retraining your model.
  • Register data assets - Data assets aid in management of the data you use for model training and pipeline creation.
  • Register models - Once you have a model you want to deploy, you create a registered model.
  • Deploy a model - Use the registered model and a scoring script to deploy a model.
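As a concrete sketch of the job-creation task, a training job can be described declaratively in an Azure Machine Learning CLI (v2) YAML file. The script path, environment, and compute names below are assumptions for illustration, not part of this article:

```yaml
# job.yml -- hypothetical names; adjust to your workspace.
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
experiment_name: my-experiment       # jobs are grouped under this experiment
command: python train.py --epochs 10
code: ./src                          # local folder uploaded as the job snapshot
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
compute: azureml:cpu-cluster         # an existing compute target in the workspace
```

Submitting the file (for example with az ml job create --file job.yml) creates a run whose logs and metrics appear under my-experiment in the workspace, ready to compare against other jobs in the same experiment.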

Besides grouping your machine learning results, workspaces also host resource configurations:

  • Compute targets are used to run your experiments.
  • Datastores define how you and others can connect to data sources when using data assets.
  • Security settings - Networking, identity and access control, and encryption settings.
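For example, a datastore pointing at a blob container can be registered declaratively with a CLI (v2) YAML file. The account and container names here are placeholders:

```yaml
# blobdatastore.yml -- placeholder account/container names.
$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: workspace_training_data
type: azure_blob
description: Raw training data shared by the team.
account_name: mystorageaccount
container_name: training-data
```

Once registered (for example with az ml datastore create --file blobdatastore.yml), everyone in the workspace can reference the data through the datastore name instead of handling connection details individually.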

Organizing workspaces

For machine learning team leads and administrators, workspaces serve as containers for access management, cost management, and data isolation. Here are some tips for organizing workspaces:

  • Use user roles for permission management: Assign roles in the workspace to control what each user can do, for example a data scientist, a machine learning engineer, or an admin.
  • Assign access to user groups: By using Microsoft Entra user groups, you don't have to add individual users to each workspace or to the other resources that the same group of users needs access to.
  • Create a workspace per project: While a workspace can be used for multiple projects, limiting it to one project per workspace allows for cost reporting accrued to a project level. It also allows you to manage configurations like datastores in the scope of each project.
  • Share Azure resources: Workspaces require you to create several associated resources. Share these resources between workspaces to save repetitive setup steps.
  • Enable self-serve: Precreate and secure associated resources as an IT admin, and use user roles to let data scientists create workspaces on their own.
  • Share assets: You can share assets between workspaces using Azure Machine Learning registries.
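To illustrate the workspace-per-project tip, a workspace definition can carry project-level tags so that Azure cost reports can be filtered down to the project. The names and tag values below are assumptions:

```yaml
# workspace-project-alpha.yml -- hypothetical project workspace.
$schema: https://azuremlschemas.azureedge.net/latest/workspace.schema.json
name: mlw-project-alpha
location: eastus
display_name: Project Alpha workspace
tags:
  project: alpha          # filter cost reports on this tag
  cost_center: "12345"
```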

How is my content stored in a workspace?

Your workspace keeps a history of all training runs, with logs, metrics, output, lineage metadata, and a snapshot of your scripts. As you perform tasks in Azure Machine Learning, artifacts are generated. Their metadata and data are stored in the workspace and on its associated resources.

Associated resources

When you create a new workspace, you're required to bring other Azure resources to store your data. If you don't provide them, Azure Machine Learning creates these resources automatically.

  • Azure Storage account. Stores machine learning artifacts such as job logs. By default, this storage account is used when you upload data to the workspace. Jupyter notebooks that are used with your Azure Machine Learning compute instances are stored here as well.

    Important

    You can't use an existing Azure Storage account if it is:

    • An account of type BlobStorage
    • A premium account (Premium_LRS and Premium_GRS)
    • An account with hierarchical namespace (used with Azure Data Lake Storage Gen2).

    You can use premium storage or hierarchical namespace as additional storage by creating a datastore.

    If you bring an existing general-purpose v1 storage account, you can upgrade it to general-purpose v2 after the workspace is created. Don't enable hierarchical namespace on the storage account after you upgrade to general-purpose v2.

  • Azure Container Registry (ACR). Stores Docker container images that are created when you build custom environments via Azure Machine Learning. Deploying AutoML models and creating data profiles also trigger the creation of custom environments.

    Workspaces can be created without ACR as a dependency if you don't need to build custom Docker containers. Azure Machine Learning can read from external container registries.

    ACR is automatically provisioned the first time you build a custom Docker image. Use Azure role-based access control (Azure RBAC) to prevent custom Docker containers from being built.

    Important

    If your subscription policy requires adding tags to resources under it, creation of the ACR instance by Azure Machine Learning fails, because Azure Machine Learning can't set tags on the ACR it creates.

  • Azure Application Insights. Helps you monitor and collect diagnostic information from your inference endpoints.

    For more information, see Monitor online endpoints.

  • Azure Key Vault. Stores secrets that are used by compute targets and other sensitive information that the workspace needs.
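When you want to bring (and share) existing associated resources instead of letting Azure Machine Learning create new ones, you can pass their Azure resource IDs in the workspace definition. A sketch with placeholder subscription and resource names:

```yaml
# workspace-existing-resources.yml -- placeholder resource IDs.
$schema: https://azuremlschemas.azureedge.net/latest/workspace.schema.json
name: mlw-shared-resources
location: eastus
storage_account: /subscriptions/<SUB_ID>/resourceGroups/my-rg/providers/Microsoft.Storage/storageAccounts/mystorageaccount
key_vault: /subscriptions/<SUB_ID>/resourceGroups/my-rg/providers/Microsoft.KeyVault/vaults/mykeyvault
application_insights: /subscriptions/<SUB_ID>/resourceGroups/my-rg/providers/Microsoft.Insights/components/myappinsights
container_registry: /subscriptions/<SUB_ID>/resourceGroups/my-rg/providers/Microsoft.ContainerRegistry/registries/myregistry
```

Pointing multiple workspaces at the same storage account, key vault, Application Insights instance, or container registry this way avoids repeating the setup for every workspace.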

Create a workspace

There are multiple ways to create a workspace. To get started, use one of the following options:

To automate workspace creation using your preferred security settings:

  • Use REST APIs directly in a scripting environment, for platform integration, or in MLOps workflows.
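As a sketch, creating a workspace over REST is a PUT against the Azure Resource Manager endpoint. The subscription ID, resource group, resource IDs, and API version below are placeholders, and the body is a minimal example rather than the full schema:

```http
PUT https://management.azure.com/subscriptions/<SUB_ID>/resourceGroups/<RG>/providers/Microsoft.MachineLearningServices/workspaces/<WORKSPACE_NAME>?api-version=<API_VERSION>
Authorization: Bearer <ARM_ACCESS_TOKEN>
Content-Type: application/json

{
  "location": "eastus",
  "identity": { "type": "SystemAssigned" },
  "properties": {
    "storageAccount": "<STORAGE_ACCOUNT_RESOURCE_ID>",
    "keyVault": "<KEY_VAULT_RESOURCE_ID>",
    "applicationInsights": "<APP_INSIGHTS_RESOURCE_ID>"
  }
}
```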

Tools for workspace interaction and management

Once your workspace is set up, you can interact with it through the Azure portal, Azure Machine Learning studio, the Python SDK, the Azure CLI, or the VS Code extension.

The following workspace management tasks are available in these interfaces:

  • Create a workspace
  • Manage workspace access
  • Create and manage compute resources
  • Create a compute instance

Warning

Moving your Azure Machine Learning workspace to a different subscription, or moving the owning subscription to a new tenant, is not supported. Doing so may cause errors.

Sub resources

When you create compute clusters and compute instances in Azure Machine Learning, sub resources are created.

  • VMs: provide computing power for compute instances and compute clusters, which you use to run jobs.
  • Load Balancer: a network load balancer is created for each compute instance and compute cluster to manage traffic even while the compute instance/cluster is stopped.
  • Virtual Network: these help Azure resources communicate with one another, the internet, and other on-premises networks.
  • Bandwidth: encapsulates all outbound data transfers across regions.

Next steps

To learn more about planning a workspace for your organization's requirements, see Organize and set up Azure Machine Learning.

To get started with Azure Machine Learning, see: