Manage and increase quotas for resources with Azure Machine Learning
Azure uses limits and quotas to prevent budget overruns due to fraud, and to honor Azure capacity constraints. Consider these limits as you scale for production workloads. In this article, you learn about:
- Default limits on Azure resources related to Azure Machine Learning.
- Creating workspace-level quotas.
- Viewing your quotas and limits.
- Requesting quota increases.
A quota is a credit limit, not a capacity guarantee. If you have large-scale capacity needs, contact Azure support to increase your quota.
A quota is shared across all the services in your subscriptions, including Azure Machine Learning. Calculate usage across all services when you're evaluating capacity.
Azure Machine Learning compute is an exception. It has a separate quota from the core compute quota.
Default limits vary by offer category type, such as free trial or pay-as-you-go, and by virtual machine (VM) series, such as Dv2, F, and G.
Default resource quotas
In this section, you learn about the default and maximum quota limits for the following resources:
- Azure Machine Learning assets
- Azure Machine Learning compute
- Azure Machine Learning managed online endpoints
- Azure Machine Learning pipelines
- Virtual machines
- Azure Container Instances
- Azure Storage
Limits are subject to change. For the latest information, see Service limits in Azure Machine Learning.
Azure Machine Learning assets
The following limits on assets apply on a per-workspace basis.
In addition, the maximum run time is 30 days and the maximum number of metrics logged per run is 1 million.
Azure Machine Learning Compute
Azure Machine Learning Compute has a default quota limit on both the number of cores (per VM family and as a cumulative total) and the number of unique compute resources allowed per region in a subscription. This quota is separate from the VM core quota described in the Virtual machines section, because it applies only to the managed compute resources of Azure Machine Learning.
To raise the limits on VM family core quotas, total subscription core quotas, cluster quota, and the other resources in this section, request a quota increase.
Dedicated cores per region have a default limit of 24 to 300, depending on your subscription offer type. You can increase the number of dedicated cores per subscription for each VM family. Specialized VM families like NCv2, NCv3, or ND series start with a default of zero cores. GPUs also default to zero cores.
Low-priority cores per region have a default limit of 100 to 3,000, depending on your subscription offer type. The number of low-priority cores per subscription can be increased and is a single value across VM families.
Clusters per region have a default limit of 200. This limit is shared between training clusters, compute instances and MIR endpoint deployments. (A compute instance is considered a single-node cluster for quota purposes.) Cluster quota can be increased up to a value of 500 per region within a given subscription.
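To make the three kinds of AmlCompute quota concrete, here's a minimal Python sketch that checks a planned regional footprint against them: dedicated cores per VM family, a single low-priority core pool shared by all families, and the shared cluster quota. The `RegionQuota` and `Plan` classes and the `quota_problems` function are hypothetical illustrations, not part of any Azure SDK, and the quota values are placeholders you'd replace with the numbers from your own subscription.

```python
from dataclasses import dataclass, field

# Default cluster quota per region (can be raised to 500 on request).
DEFAULT_CLUSTER_QUOTA = 200


@dataclass
class RegionQuota:
    """Hypothetical snapshot of AmlCompute quota in one region."""
    dedicated_cores_per_family: dict[str, int]  # families not listed are treated as 0
    low_priority_cores: int                      # one value shared by all VM families
    clusters: int = DEFAULT_CLUSTER_QUOTA


@dataclass
class Plan:
    """Hypothetical planned usage in the same region."""
    dedicated_cores: dict[str, int] = field(default_factory=dict)
    low_priority_cores: dict[str, int] = field(default_factory=dict)
    training_clusters: int = 0
    compute_instances: int = 0     # each counts as a single-node cluster
    endpoint_deployments: int = 0


def quota_problems(quota: RegionQuota, plan: Plan) -> list[str]:
    """Return a description of every quota the plan would exceed."""
    problems = []
    # Dedicated cores are limited per VM family; a family missing from the
    # snapshot is treated as 0, like GPU families such as NCv3 by default.
    for family, cores in plan.dedicated_cores.items():
        limit = quota.dedicated_cores_per_family.get(family, 0)
        if cores > limit:
            problems.append(f"dedicated {family}: {cores} > {limit}")
    # Low-priority cores from all families draw on a single pool.
    low_priority = sum(plan.low_priority_cores.values())
    if low_priority > quota.low_priority_cores:
        problems.append(f"low-priority: {low_priority} > {quota.low_priority_cores}")
    # Training clusters, compute instances, and endpoint deployments share the cluster quota.
    clusters = plan.training_clusters + plan.compute_instances + plan.endpoint_deployments
    if clusters > quota.clusters:
        problems.append(f"clusters: {clusters} > {quota.clusters}")
    return problems


quota = RegionQuota(dedicated_cores_per_family={"Dv2": 24}, low_priority_cores=100)
plan = Plan(
    dedicated_cores={"Dv2": 16, "NCv3": 6},
    low_priority_cores={"Dv2": 64, "F": 48},
    training_clusters=3,
    compute_instances=5,
)
print(quota_problems(quota, plan))
# ['dedicated NCv3: 6 > 0', 'low-priority: 112 > 100']
```

In this example, the Dv2 dedicated cores fit, but the NCv3 request fails because that family starts at zero cores, and the low-priority cores fail even though neither family alone exceeds 100, because both draw on the same pool.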
To learn which VM family to request a quota increase for, see virtual machine sizes in Azure. For instance, GPU VM families start with an "N" in their family name (for example, the NCv3 series).
The following table shows additional platform limits. To request an exception, contact the Azure Machine Learning product team through a technical support ticket.
| Resource or action | Maximum limit |
|---|---|
| Workspaces per resource group | 800 |
| Nodes in a single Azure Machine Learning Compute (AmlCompute) cluster set up as a non-communication-enabled pool (that is, can't run MPI jobs) | 100 nodes, configurable up to 65,000 nodes |
| Nodes in a single Parallel Run Step run on an AmlCompute cluster | 100 nodes, configurable up to 65,000 nodes if your cluster is set up to scale as described in the previous row |
| Nodes in a single AmlCompute cluster set up as a communication-enabled pool | 300 nodes, configurable up to 4,000 nodes |
| Nodes in a single AmlCompute cluster set up as a communication-enabled pool on an RDMA-enabled VM family | 100 nodes |
| Nodes in a single MPI run on an AmlCompute cluster | 100 nodes, can be increased to 300 nodes |
| Job lifetime | 21 days¹ |
| Job lifetime on a low-priority node | 7 days² |
| Parameter servers per node | 1 |

¹ Maximum lifetime is the duration between when a job starts and when it finishes. Completed jobs persist indefinitely. Data for jobs not completed within the maximum lifetime isn't accessible.
² Jobs on a low-priority node can be preempted whenever there's a capacity constraint. We recommend that you implement checkpoints in your job.
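A minimal sketch of the checkpointing pattern, in plain Python with no Azure APIs: the loop saves its progress after each step and resumes from the last checkpoint when restarted, so a preempted job loses at most one step of work. The `train` function, its JSON checkpoint format, and the `stop_after` preemption simulation are all hypothetical. In a real job, the checkpoint would need to live in storage that outlives the preempted node, which this sketch doesn't cover.

```python
import json
import os
from typing import Optional


def train(total_steps: int, checkpoint_path: str, stop_after: Optional[int] = None) -> int:
    """Toy training loop that checkpoints after every step and resumes on restart.

    `stop_after` simulates preemption by ending this run after that many steps.
    Returns the number of completed steps recorded so far.
    """
    step, state = 0, 0.0
    # Resume from the last checkpoint if a previous run was interrupted.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            saved = json.load(f)
        step, state = saved["step"], saved["state"]

    steps_this_run = 0
    while step < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            break  # simulated preemption
        state += 1.0  # stand-in for one unit of real training work
        step += 1
        steps_this_run += 1
        # Write to a temporary file, then replace the checkpoint in one rename,
        # so an interruption mid-write doesn't leave a half-written checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"step": step, "state": state}, f)
        os.replace(tmp_path, checkpoint_path)
    return step
```

Against a fresh checkpoint path, calling `train(10, path, stop_after=4)` returns 4, and a following `train(10, path)` returns 10: the second call picks up at step 4 instead of starting over.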
Azure Machine Learning managed online endpoints
Azure Machine Learning managed online endpoints have limits described in the following table.
| Limit | Value |
|---|---|
| Endpoint name | Endpoint names must |
| Deployment name | Deployment names must |
| Number of endpoints per subscription | 50 |
| Number of deployments per subscription | 200 |
| Number of deployments per endpoint | 20 |
| Number of instances per deployment | 20² |
| Max request time-out at endpoint level | 90 seconds |
| Total requests per second at endpoint level for all deployments | 500³ |
| Total connections per second at endpoint level for all deployments | 500³ |
| Total connections active at endpoint level for all deployments | 500³ |
| Total bandwidth at endpoint level for all deployments | 5 MBPS³ |

¹ Single dashes, like my-endpoint-name, are accepted in endpoint and deployment names.
² We reserve 20% extra compute resources for performing upgrades. For example, if you request 10 instances in a deployment, you must have a quota for 12. Otherwise, you'll receive an error.
³ If you request a limit increase, be sure to calculate related limit increases you might need. For example, if you request a limit increase for requests per second, you might also want to compute the required connections and bandwidth limits and include these limit increases in the same request.
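The arithmetic behind the upgrade reserve and the related-limits advice can be sketched in Python. `required_instance_quota` adds the 20% upgrade reserve, assuming the reserve rounds up to a whole instance. `estimate_load` derives connection and bandwidth figures from a planned request rate, and `limits_to_request` lists every endpoint-level default that load would exceed. The function names, the reading of "5 MBPS" as megabytes per second, and the estimation model (Little's law for active connections, a fixed fraction of requests opening a new connection) are assumptions for illustration only.

```python
# Default endpoint-level limits from the table above.
ENDPOINT_DEFAULTS = {
    "requests_per_second": 500,
    "connections_per_second": 500,
    "active_connections": 500,
    "bandwidth_mb_per_second": 5,  # assumes "5 MBPS" means megabytes per second
}


def required_instance_quota(instances: int) -> int:
    """Instances plus the 20% upgrade reserve, rounded up (rounding is an assumption)."""
    reserve = (instances * 20 + 99) // 100  # integer ceiling of 20% of instances
    return instances + reserve


def estimate_load(rps: float, avg_payload_kb: float, avg_latency_s: float,
                  new_connection_fraction: float = 1.0) -> dict[str, float]:
    """Rough endpoint-level load for a target request rate (hypothetical model)."""
    return {
        "requests_per_second": rps,
        # Share of requests that open a new connection instead of reusing one.
        "connections_per_second": rps * new_connection_fraction,
        # Little's law: requests in flight = arrival rate x time in system,
        # assuming each in-flight request holds one connection.
        "active_connections": rps * avg_latency_s,
        "bandwidth_mb_per_second": rps * avg_payload_kb / 1000,
    }


def limits_to_request(load: dict[str, float]) -> dict[str, float]:
    """Every default the load exceeds, so all of them go into one request."""
    return {name: value for name, value in load.items()
            if value > ENDPOINT_DEFAULTS[name]}


print(required_instance_quota(10))  # 12, matching the example in the footnote
load = estimate_load(rps=800, avg_payload_kb=10, avg_latency_s=0.25,
                     new_connection_fraction=0.1)
print(limits_to_request(load))
# {'requests_per_second': 800, 'bandwidth_mb_per_second': 8.0}
```

In this example, a target of 800 requests per second with 10 KB payloads also exceeds the default bandwidth limit, so both increases belong in the same request.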
To determine the current usage for an endpoint, view the metrics.
To request an exception from the Azure Machine Learning product team, follow the steps in Request quota increases.
Azure Machine Learning pipelines
Azure Machine Learning pipelines have the following limits.
| Resource | Limit |
|---|---|
| Steps in a pipeline | 30,000 |
| Workspaces per resource group | 800 |
Azure Machine Learning integration with Synapse
Synapse Spark clusters have a default limit of 12 to 2,000 vCores, depending on your subscription offer type. To increase this limit, submit a support ticket and request a quota increase under the "Machine Learning Service: Spark vCore Quota" category.
Virtual machines
Each Azure subscription has a limit on the number of virtual machines across all services. Virtual machine cores have a regional total limit and a regional limit per size series. Both limits are separately enforced.
For example, consider a subscription with a US East total VM core limit of 30, an A series core limit of 30, and a D series core limit of 30. This subscription would be allowed to deploy 30 A1 VMs, or 30 D1 VMs, or a combination of the two that doesn't exceed a total of 30 cores.
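The example above can be checked with a short Python sketch that enforces both limits separately. `fits_core_limits` is a hypothetical helper, not an Azure API, and it assumes A1 and D1 VMs each use one core, as the example implies.

```python
from collections import Counter


def fits_core_limits(vms, regional_total_limit: int, series_limits: dict[str, int]) -> bool:
    """Check VMs in one region against per-series and regional total core limits.

    `vms` is an iterable of (series, cores_per_vm, count) tuples.
    """
    cores_by_series = Counter()
    for series, cores_per_vm, count in vms:
        cores_by_series[series] += cores_per_vm * count
    # Each limit is enforced separately: every series must fit its own limit...
    series_ok = all(cores <= series_limits.get(series, 0)
                    for series, cores in cores_by_series.items())
    # ...and the sum across all series must fit the regional total.
    total_ok = sum(cores_by_series.values()) <= regional_total_limit
    return series_ok and total_ok


series_limits = {"A": 30, "D": 30}
print(fits_core_limits([("A", 1, 30)], 30, series_limits))                 # True
print(fits_core_limits([("A", 1, 20), ("D", 1, 10)], 30, series_limits))  # True
print(fits_core_limits([("A", 1, 20), ("D", 1, 20)], 30, series_limits))  # False
```

The last case stays within each 30-core series limit, but its 40 cores exceed the regional total of 30, which is why both checks are needed.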
You can't raise the limits in the following table above the values shown.
| Resource | Limit |
|---|---|
| Subscriptions associated with an Azure Active Directory tenant | Unlimited |
| Coadministrators per subscription | Unlimited |
| Resource groups per subscription | 980 |
| Azure Resource Manager API request size | 4,194,304 bytes |
| Tags per subscription¹ | 50 |
| Unique tag calculations per subscription² | 80,000 |
| Subscription-level deployments per location | 800³ |
| Locations of subscription-level deployments | 10 |

¹ You can apply up to 50 tags directly to a subscription. However, the subscription can contain an unlimited number of tags that are applied to resource groups and resources within the subscription. The number of tags per resource or resource group is limited to 50.
² Resource Manager returns a list of tag names and values in the subscription only when the number of unique tags is 80,000 or less. A unique tag is defined by the combination of resource ID, tag name, and tag value. For example, two resources with the same tag name and value would be calculated as two unique tags. You can still find a resource by tag when the number exceeds 80,000.
³ Deployments are automatically deleted from the history as you near the limit. For more information, see Automatic deletions from deployment history.
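The definition of a unique tag can be expressed as a set of (resource ID, tag name, tag value) triples. This Python sketch, with a hypothetical `unique_tag_count` helper and made-up resource IDs, shows why the same tag on two resources counts twice toward the 80,000 threshold.

```python
MAX_LISTABLE_UNIQUE_TAGS = 80_000


def unique_tag_count(tags_by_resource: dict[str, dict[str, str]]) -> int:
    """Count unique tags as (resource ID, tag name, tag value) triples."""
    return len({
        (resource_id, name, value)
        for resource_id, tags in tags_by_resource.items()
        for name, value in tags.items()
    })


# Hypothetical resource IDs: two resources carry the same tag name and value.
tags = {
    "vm-1": {"env": "prod"},
    "vm-2": {"env": "prod"},
}
count = unique_tag_count(tags)
print(count)                              # 2
print(count <= MAX_LISTABLE_UNIQUE_TAGS)  # True: still under the listing threshold
```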
Azure Container Instances
For more information, see Container Instances limits.
Azure Storage
Azure Storage has a limit of 250 storage accounts per region, per subscription. This limit includes both Standard and Premium storage accounts.
To increase the limit, make a request through Azure Support. The Azure Storage team will review your case and can approve up to 250 storage accounts for a region.
Workspace-level quotas
Use workspace-level quotas to manage Azure Machine Learning compute target allocation between multiple workspaces in the same subscription.
By default, all workspaces share the same quota as the subscription-level quota for VM families. However, you can set a maximum quota for individual VM families on workspaces in a subscription. This lets you share capacity and avoid resource contention issues.
- Go to any workspace in your subscription.
- In the left pane, select Usages + quotas.
- Select the Configure quotas tab to view the quotas.
- Expand a VM family.
- Set a quota limit on any workspace listed under that VM family.
You can't set a negative value or a value higher than the subscription-level quota.
You need subscription-level permissions to set a quota at the workspace level.
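The rules above, together with the default of sharing the subscription-level quota, can be sketched as a small validation function. `workspace_limit` is a hypothetical helper that only mirrors the rules stated in this section; it doesn't call any Azure API.

```python
from typing import Optional


def workspace_limit(subscription_quota: int, workspace_cap: Optional[int] = None) -> int:
    """Effective quota for one VM family in one workspace (hypothetical helper).

    With no cap, the workspace shares the full subscription-level quota.
    """
    if workspace_cap is None:
        return subscription_quota
    if workspace_cap < 0:
        raise ValueError("a workspace quota can't be negative")
    if workspace_cap > subscription_quota:
        raise ValueError("a workspace quota can't exceed the subscription-level quota")
    return workspace_cap


print(workspace_limit(100))      # 100: default, shares the subscription quota
print(workspace_limit(100, 40))  # 40: capped for this workspace
```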
View quotas in the studio
When you create a new compute resource, the studio by default shows only the VM sizes that you already have quota to use. To see the others:
- Switch the view to Select from all options.
- Scroll down until you see the list of VM sizes you don't have quota for.
- Use the link to go directly to the online customer support request for more quota.
View your usage and quotas in the Azure portal
To view your quota for various Azure resources like virtual machines, storage, or network, use the Azure portal:
- On the left pane, select All services, and then select Subscriptions under the General category.
- From the list of subscriptions, select the subscription whose quota you're looking for.
- Select Usage + quotas to view your current quota limits and usage. Use the filters to select the provider and locations.
You manage the Azure Machine Learning compute quota on your subscription separately from other Azure quotas:
- Go to your Azure Machine Learning workspace in the Azure portal.
- On the left pane, in the Support + troubleshooting section, select Usage + quotas to view your current quota limits and usage.
- Select a subscription to view the quota limits. Filter to the region you're interested in.
You can switch between a subscription-level view and a workspace-level view.
Request quota increases
To raise the limit or quota above the default limit, open an online customer support request at no charge.
You can't raise limits above the maximum values shown in the preceding tables. If there's no maximum limit, you can't adjust the limit for the resource.
When you're requesting a quota increase, select the service that you have in mind. For example, select Azure Machine Learning, Container Instances, or Storage. For Azure Machine Learning compute, you can select the Request Quota button while viewing the quota in the preceding steps.
Free trial subscriptions are not eligible for limit or quota increases. If you have a free trial subscription, you can upgrade to a pay-as-you-go subscription. For more information, see Upgrade Azure free trial to pay-as-you-go and Azure free account FAQ.
Endpoint quota increases
When you open the support request for an endpoint quota increase, provide the following information:
- For Quota type, select Machine Learning Service: Endpoint Limits.
- On the Additional details tab, select Enter details, and then provide the quota you want to increase and its new value, the reason for the request, and the location(s) where you need the increase. Finally, select Save and continue.