Model Serving limits and regions
This article summarizes the limitations and region availability for Databricks Model Serving and supported endpoint types.
Limitations
Databricks Model Serving imposes default limits to ensure reliable performance. If you have feedback on these limits, please reach out to your Databricks account team.
The following table summarizes resource and payload limitations for model serving endpoints.
Feature | Granularity | Limit |
---|---|---|
Payload size | Per request | 16 MB |
Queries per second (QPS) | Per workspace | 200 by default. Can be increased to 3000 or more by reaching out to your Databricks account team. |
Model execution duration | Per request | 120 seconds |
CPU endpoint model memory usage | Per endpoint | 4 GB |
GPU endpoint model memory usage | Per endpoint | Greater than or equal to assigned GPU memory, depends on the GPU workload size |
Provisioned concurrency | Per workspace | 200. Can be increased by reaching out to your Databricks account team. |
Overhead latency | Per request | Less than 50 milliseconds |
Foundation Model APIs (pay-per-token) rate limits | Per workspace | Reach out to your Databricks account team to increase the following limits. * The DBRX Instruct model has a limit of 1 query per second. * Other chat and completion models have a default rate limit of 2 queries per second. * Embedding models have a default rate limit of 300 embedding inputs per second. |
Foundation Model APIs (provisioned throughput) rate limits | Per workspace | Same as Model Serving QPS limit listed above. |
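The per-request and per-workspace limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: the names `check_payload` and `Throttle` are hypothetical, the 16 MB limit is assumed to mean 16 × 1024 × 1024 bytes of JSON-encoded request body, and the constants are the defaults listed above, which may differ for your workspace.

```python
import json
import time

# Defaults copied from the limits table; your workspace may have different values.
MAX_PAYLOAD_BYTES = 16 * 1024 * 1024  # 16 MB per request (assumes binary megabytes)
MAX_EXECUTION_SECONDS = 120           # model execution duration per request


def payload_size_bytes(payload) -> int:
    """Return the size in bytes of the JSON-encoded payload (UTF-8)."""
    return len(json.dumps(payload).encode("utf-8"))


def check_payload(payload) -> None:
    """Raise ValueError if the encoded payload exceeds the per-request limit."""
    size = payload_size_bytes(payload)
    if size > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload is {size} bytes; limit is {MAX_PAYLOAD_BYTES}")


class Throttle:
    """Space calls at least 1/qps seconds apart (hypothetical client-side helper)."""

    def __init__(self, qps: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / qps
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may proceed

    def wait(self) -> float:
        """Block until the next call is allowed and return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay
```

A caller might create `Throttle(qps=2)` for a pay-per-token chat model and call `wait()` before each request. This only spaces requests from a single process; the documented limits apply per workspace, so several clients sharing a workspace can still exceed them.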
Model Serving endpoints are protected by access control and respect networking-related ingress rules configured on the workspace, like IP allowlists and Private Link.
Additional limitations exist as well:
- It is possible for a workspace to be deployed in a supported region, but be served by a control plane in a different region. These workspaces do not support Model Serving and result in an error message saying that your workspace is not supported. Reach out to your Azure Databricks account team for more information.
- Model Serving does not support init scripts.
- By default, Model Serving does not support Private Link to external endpoints (like Azure OpenAI). Support for this functionality is evaluated and implemented on a per-region basis. Reach out to your Azure Databricks account team for more information.
Foundation Model APIs limits
Note
As part of providing the Foundation Model APIs, Databricks may process your data outside of the region where your data originated, but not outside of the relevant geographical location.
The following are limits relevant to Foundation Model APIs workloads:
- Provisioned throughput supports the HIPAA compliance profile and should be used for workloads requiring compliance certifications. Pay-per-token workloads do not comply with HIPAA or the compliance security profile.
- For Foundation Model APIs endpoints, only workspace admins can change the governance settings, like the rate limits. To change rate limits, use the following steps:
  - Open the Serving UI in your workspace to see your serving endpoints.
  - From the kebab menu on the Foundation Model APIs endpoint you want to edit, select View details.
  - From the kebab menu on the upper-right side of the endpoint details page, select Change rate limit.
- To use the DBRX model architecture for a provisioned throughput workload, your serving endpoint must be in one of the following regions:
  - eastus
  - eastus2
  - westus
  - centralus
  - westeurope
  - northeurope
  - australiaeast
  - canadacentral
  - brazilsouth
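The DBRX region requirement can be checked before creating a provisioned throughput endpoint. The sketch below is illustrative only: the function names are hypothetical, and the region set is copied from the list above and may change over time.

```python
# Regions copied from the list above; treat this as a snapshot, not a live source.
DBRX_PROVISIONED_THROUGHPUT_REGIONS = frozenset({
    "eastus", "eastus2", "westus", "centralus", "westeurope",
    "northeurope", "australiaeast", "canadacentral", "brazilsouth",
})


def normalize_region(name: str) -> str:
    """Convert a display name such as 'East US 2' to the short form 'eastus2'."""
    return "".join(name.split()).lower()


def supports_dbrx_provisioned_throughput(region: str) -> bool:
    """Return True if the region appears in the DBRX provisioned throughput list."""
    return normalize_region(region) in DBRX_PROVISIONED_THROUGHPUT_REGIONS
```

For example, `supports_dbrx_provisioned_throughput("East US 2")` evaluates to `True`, while `"westus2"` does not appear in the list and evaluates to `False`.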
Region availability
Note
If you require an endpoint in an unsupported region, reach out to your Azure Databricks account team.
For provisioned throughput workloads that use DBRX models, see Foundation Model APIs limits for region availability.
Region | Location | Core Model Serving capability * | Foundation Model APIs (provisioned throughput) ** | Foundation Model APIs (pay-per-token) | External models |
---|---|---|---|---|---|
australiacentral | Australia Central | | | | |
australiacentral2 | Australia Central 2 | | | | |
australiaeast | Australia East | X | X | X | |
australiasoutheast | Australia Southeast | | | | |
brazilsouth | Brazil South | X | X | X | |
canadacentral | Canada Central | X | X | X | |
canadaeast | Canada East | | | | |
centralindia | Central India | X | X | X | |
centralus | Central US | X | X | X | X |
chinaeast2 | China East 2 | | | | |
chinaeast3 | China East 3 | | | | |
chinanorth2 | China North 2 | | | | |
chinanorth3 | China North 3 | | | | |
eastasia | East Asia | | | | |
eastus | East US | X | X | X | X |
eastus2 | East US 2 | X | X | X | X |
eastus2euap | East US 2 EUAP | | | | |
francecentral | France Central | | | | |
germanywestcentral | Germany West Central | | | | |
japaneast | Japan East | | | | |
japanwest | Japan West | | | | |
koreacentral | Korea Central | | | | |
northcentralus | North Central US | X | X | X | |
northeurope | North Europe | X | X | X | |
norwayeast | Norway East | | | | |
qatarcentral | Qatar Central | | | | |
southafricanorth | South Africa North | | | | |
southcentralus | South Central US | | | | |
southeastasia | Southeast Asia | X | X | | |
southindia | South India | | | | |
swedencentral | Sweden Central | | | | |
switzerlandnorth | Switzerland North | | | | |
switzerlandwest | Switzerland West | | | | |
uaenorth | UAE North | | | | |
uksouth | UK South | | | | |
ukwest | UK West | | | | |
westcentralus | West Central US | | | | |
westeurope | West Europe | X | X | X | |
westindia | West India | | | | |
westus | West US | X | X | X | X |
westus2 | West US 2 | X | | | |
westus3 | West US 3 | | | | |
\* Only CPU compute

\*\* Includes GPU support
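If you need to check region availability in code, the table can be encoded as a small lookup. The following sketch transcribes only four rows by hand for illustration; the `Capabilities` class, the `REGION_CAPABILITIES` dictionary, and the `regions_supporting` function are hypothetical names, and the data should be refreshed from the table rather than treated as authoritative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """One row of the availability table; True corresponds to an X in that column."""
    core: bool = False
    provisioned_throughput: bool = False
    pay_per_token: bool = False
    external_models: bool = False


# A hand-transcribed subset of the table above, not the full list.
REGION_CAPABILITIES = {
    "centralus": Capabilities(True, True, True, True),
    "southeastasia": Capabilities(core=True, provisioned_throughput=True),
    "westus2": Capabilities(core=True),
    "uksouth": Capabilities(),
}


def regions_supporting(feature: str) -> list[str]:
    """Return the sorted region names whose row has an X for the given feature."""
    return sorted(
        region
        for region, caps in REGION_CAPABILITIES.items()
        if getattr(caps, feature)
    )
```

Here `feature` is one of the `Capabilities` field names, such as `"core"` or `"external_models"`.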