What is an Azure Machine Learning workspace?
Workspaces are places to collaborate with colleagues to create machine learning artifacts and group related work, such as experiments, jobs, datasets, models, components, and inference endpoints. This article describes workspaces, how to manage access to them, and how to use them to organize your work.
Ready to get started? Create a workspace.
Tasks performed within a workspace
For machine learning teams, the workspace is a place to organize their work. Below are some of the tasks you can start from a workspace:
- Create jobs - Jobs are training runs you use to build your models. You can group jobs into experiments to compare metrics.
- Author pipelines - Pipelines are reusable workflows for training and retraining your model.
- Register data assets - Data assets aid in management of the data you use for model training and pipeline creation.
- Register models - Once you have a model you want to deploy, you create a registered model.
- Create online endpoints - Use a registered model and a scoring script to create an online endpoint.
- Deploy a model - Use the registered model and a scoring script to deploy a model.
Besides grouping your machine learning results, workspaces also host resource configurations:
- Compute targets - Used to run your experiments.
- Datastores - Define how you and others can connect to data sources when using data assets.
- Security settings - Networking, identity and access control, and encryption settings.
For machine learning team leads and administrators, workspaces serve as containers for access management, cost management and data isolation. Below are some tips for organizing workspaces:
- Use user roles: Manage permissions in the workspace by assigning roles to users, for example data scientist, machine learning engineer, or admin.
- Assign access to user groups: By using Microsoft Entra user groups, you don't have to add individual users to each workspace, or to the other resources that the same group of users needs to access.
- Create a workspace per project: While a workspace can be used for multiple projects, limiting it to one project per workspace lets you report costs at the project level. It also lets you manage configurations like datastores in the scope of each project.
- Share Azure resources: Workspaces require you to create several associated resources. Share these resources between workspaces to save repetitive setup steps.
- Enable self-serve: Pre-create and secure associated resources as an IT admin, and use user roles to let data scientists create workspaces on their own.
- Share assets: You can share assets between workspaces using Azure Machine Learning registries.
How is my content stored in a workspace?
Your workspace keeps a history of all training runs, with logs, metrics, output, lineage metadata, and a snapshot of your scripts. As you perform tasks in Azure Machine Learning, artifacts are generated. Their metadata and data are stored in the workspace and on its associated resources.
When you create a new workspace, you're required to bring other Azure resources to store your data. If you don't provide these resources, Azure Machine Learning creates them automatically.
Azure Storage account. Stores machine learning artifacts such as job logs. By default, this storage account is used when you upload data to the workspace. Jupyter notebooks that are used with your Azure Machine Learning compute instances are stored here as well.
You can use an existing Azure Storage account only if it isn't of type BlobStorage, isn't a premium account (Premium_LRS and Premium_GRS), and doesn't have a hierarchical namespace (used with Azure Data Lake Storage Gen2). You can use premium storage or hierarchical namespace as additional storage by creating a datastore. If you bring an existing general-purpose v1 storage account, you can upgrade it to general-purpose v2 after the workspace has been created. Don't enable hierarchical namespace on the storage account after upgrading to general-purpose v2.
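The restrictions above can be expressed as a small pre-flight check. This is a minimal sketch, not part of any Azure SDK: the `StorageAccountInfo` class, its field names, and the `default_storage_problems` function are hypothetical and only encode the three rules stated in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageAccountInfo:
    """Hypothetical summary of a storage account; field names are illustrative."""
    kind: str                     # for example "StorageV2", "Storage", or "BlobStorage"
    sku: str                      # for example "Standard_LRS" or "Premium_LRS"
    hierarchical_namespace: bool  # True for Azure Data Lake Storage Gen2


def default_storage_problems(account: StorageAccountInfo) -> list[str]:
    """Return the reasons the account can't back a workspace (empty if none)."""
    problems = []
    if account.kind == "BlobStorage":
        problems.append("account kind is BlobStorage")
    if account.sku.startswith("Premium"):
        problems.append("account uses a premium SKU")
    if account.hierarchical_namespace:
        problems.append("hierarchical namespace is enabled")
    return problems


if __name__ == "__main__":
    candidate = StorageAccountInfo("StorageV2", "Premium_LRS", False)
    for problem in default_storage_problems(candidate):
        print("Not usable as workspace storage:", problem)
```

An account that fails this check can still be attached as additional storage through a datastore, as described above.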
Azure Container Registry. Stores the Docker containers that are created when you build custom environments through Azure Machine Learning. Scenarios that trigger creation of custom environments include AutoML when deploying models, and data profiling.
Workspaces can be created without Azure Container Registry as a dependency if you don't need to build custom Docker containers. To read container images, Azure Machine Learning also works with external container registries. Azure Container Registry is automatically provisioned when you build custom Docker images. Use Azure RBAC to prevent custom Docker containers from being built.
If your subscription setting requires adding tags to resources under it, creation of the Azure Container Registry (ACR) by Azure Machine Learning fails, because Azure Machine Learning can't set tags on the ACR.
Azure Application Insights. Helps you monitor and collect diagnostic information from your inference endpoints.
For more information, see Monitor online endpoints.
Azure Key Vault. Stores secrets that are used by compute targets and other sensitive information that's needed by the workspace.
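The provisioning behavior described in this section can be summarized in a short sketch. It is illustrative only: the `AssociatedResources` class and `plan_provisioning` function are hypothetical names, and the logic simply restates the text above (resources you don't provide are created for you, and Azure Container Registry is provisioned only when a custom Docker image is first built).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssociatedResources:
    """Resource IDs supplied at workspace creation; None means "not provided"."""
    storage_account: Optional[str] = None
    key_vault: Optional[str] = None
    application_insights: Optional[str] = None
    container_registry: Optional[str] = None


def plan_provisioning(provided: AssociatedResources) -> dict:
    """Describe what happens to each associated resource, per the text above."""
    plan = {}
    for name in ("storage_account", "key_vault", "application_insights"):
        supplied = getattr(provided, name) is not None
        plan[name] = "use existing" if supplied else "create with workspace"
    # The registry is optional and is provisioned lazily when needed.
    plan["container_registry"] = (
        "use existing"
        if provided.container_registry is not None
        else "create on first custom image build"
    )
    return plan


if __name__ == "__main__":
    # Hypothetical example: only a key vault is brought by the user.
    for resource, action in plan_provisioning(
        AssociatedResources(key_vault="<existing-key-vault-id>")
    ).items():
        print(f"{resource}: {action}")
```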
Create a workspace
There are multiple ways to create a workspace. To get started, use one of the following options:
- The Azure Machine Learning studio lets you quickly create a workspace with default settings.
- Use the Azure portal for a point-and-click interface with more security options.
- Use the VS Code extension if you work in Visual Studio Code.
To automate workspace creation using your preferred security settings:
- Azure Resource Manager / Bicep templates provide a declarative syntax to deploy Azure resources. An alternative option is to use Terraform. Also see How to create a secure workspace by using a template.
- Use REST APIs directly in a scripting environment, for platform integration, or in MLOps workflows.
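As an illustration of the REST option, the following sketch builds (but doesn't send) an Azure Resource Manager `PUT` request for a workspace using only the Python standard library. Treat the details as assumptions to verify against the REST API reference: the `API_VERSION` value is a placeholder, the payload property names reflect the commonly documented workspace schema, and the function name and its parameters are hypothetical.

```python
import json
import urllib.request

# Assumption: replace with an API version that the service currently supports.
API_VERSION = "2023-04-01"


def build_workspace_put(subscription_id, resource_group, workspace_name,
                        location, storage_id, key_vault_id,
                        app_insights_id, token):
    """Build a PUT request that would create or update a workspace."""
    url = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{workspace_name}"
        f"?api-version={API_VERSION}"
    )
    # Assumed payload shape: associated resources are passed by resource ID.
    body = {
        "location": location,
        "identity": {"type": "SystemAssigned"},
        "properties": {
            "storageAccount": storage_id,
            "keyVault": key_vault_id,
            "applicationInsights": app_insights_id,
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_workspace_put(
        "<subscription-id>", "<resource-group>", "<workspace-name>",
        "<region>", "<storage-id>", "<key-vault-id>",
        "<app-insights-id>", "<access-token>",
    )
    print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` requires a valid Microsoft Entra access token for Azure Resource Manager; obtaining one is outside the scope of this sketch.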
Tools for workspace interaction and management
Once your workspace is set up, you can interact with it in the following ways:
The following workspace management tasks are available in each interface.
- Create a workspace
- Manage workspace access
- Create and manage compute resources
- Create a compute instance
Moving your Azure Machine Learning workspace to a different subscription, or moving the owning subscription to a new tenant, is not supported. Doing so may cause errors.
When you create compute clusters and compute instances in Azure Machine Learning, the following sub-resources are created:
- VMs: provide computing power for compute instances and compute clusters, which you use to run jobs.
- Load Balancer: a network load balancer is created for each compute instance and compute cluster to manage traffic even while the compute instance/cluster is stopped.
- Virtual Network: helps Azure resources communicate with one another, the internet, and on-premises networks.
- Bandwidth: encapsulates all outbound data transfers across regions.
To learn more about planning a workspace for your organization's requirements, see Organize and set up Azure Machine Learning.
To get started with Azure Machine Learning, see: