The error indicates that the provisioned throughput feature is not enabled in the workspace, but the endpoint being created is configured to use provisioned throughput (a mode used mainly for Foundation Model APIs and fine‑tuned foundation models).
To resolve this, use one of these approaches depending on what is being deployed:
- If deploying a custom model (your own MLflow model):
  - Create a custom model serving endpoint, not a provisioned throughput endpoint.
  - In the Serving UI:
    - Go to Serving in the sidebar.
    - Click Create serving endpoint.
    - In Served entities → Entity, select My models – Unity Catalog or My models – Model Registry and choose the registered model and version.
    - Configure Compute Type (CPU or GPU) and Compute Scale-out (Small/Medium/Large) instead of any provisioned throughput options.
  - Do not set min_provisioned_throughput/max_provisioned_throughput in REST/SDK calls; instead, use the standard concurrency settings (min_provisioned_concurrency, max_provisioned_concurrency) if needed.
- If deploying a fine‑tuned foundation model with provisioned throughput:
  - Provisioned throughput endpoints are a specific feature of Foundation Model APIs.
  - In the Serving UI, provisioned throughput is only available for eligible models from Unity Catalog; when such a model is selected, the UI shows a Provisioned Throughput screen where the Up to dropdown is used to set tokens per second.
  - If the workspace does not have provisioned throughput enabled, this mode cannot be used. In that case, either:
    - Use the default pay‑per‑token Foundation Model API endpoints that appear at the top of the Serving list (no custom provisioned throughput endpoint required), or
    - Contact the Databricks account team to enable provisioned throughput for the workspace.
- If private connectivity (Azure Private Link) is required:
  - Private connectivity is only supported for provisioned throughput endpoints or custom model endpoints. Ensure the endpoint type matches this requirement.
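
For the custom model option, the request body can be sketched in Python as follows. This is a minimal sketch that only builds the JSON body for a `POST /api/2.0/serving-endpoints` call and does not send it. The endpoint name, model name, and version are placeholders, and `build_custom_endpoint_payload` is a hypothetical helper written for illustration, not part of any Databricks SDK.

```python
import json


def build_custom_endpoint_payload(endpoint_name, model_name, model_version,
                                  workload_size="Small", workload_type="CPU",
                                  scale_to_zero=True):
    """Build a request body for a custom model serving endpoint.

    Hypothetical helper: the body deliberately contains no
    min_provisioned_throughput / max_provisioned_throughput fields.
    """
    served_entity = {
        "entity_name": model_name,             # registered model (placeholder)
        "entity_version": str(model_version),  # version is sent as a string
        "workload_size": workload_size,        # Compute Scale-out: Small/Medium/Large
        "workload_type": workload_type,        # Compute Type, e.g. CPU
        "scale_to_zero_enabled": scale_to_zero,
    }
    return {
        "name": endpoint_name,
        "config": {"served_entities": [served_entity]},
    }


# Placeholder names; replace with the real endpoint and registered model.
payload = build_custom_endpoint_payload("my-endpoint", "main.default.my_model", 1)
print(json.dumps(payload, indent=2))
```

The printed JSON can then be sent with whichever HTTP client or SDK is already in use, along with the workspace URL and an access token.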
In summary, the error is not about GPU quota or Premium tier, but about trying to use provisioned throughput in a workspace where that feature is not enabled. Use a standard custom model serving endpoint (no provisioned throughput settings) or work with the Databricks account team to enable provisioned throughput if specifically required.
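
The condition described in the summary can be checked before a request is sent. The sketch below assumes the request body follows the `config.served_entities` layout of the serving-endpoints REST API; `find_provisioned_throughput_fields` is a hypothetical helper and the example body uses placeholder values.

```python
PROVISIONED_THROUGHPUT_KEYS = (
    "min_provisioned_throughput",
    "max_provisioned_throughput",
)


def find_provisioned_throughput_fields(payload):
    """Return (entity_index, key) pairs for each provisioned throughput field set.

    Hypothetical pre-flight check: an empty list means the body does not
    request provisioned throughput.
    """
    found = []
    entities = payload.get("config", {}).get("served_entities", [])
    for index, entity in enumerate(entities):
        for key in PROVISIONED_THROUGHPUT_KEYS:
            # "is not None" so that an explicit 0 is still reported.
            if entity.get(key) is not None:
                found.append((index, key))
    return found


# Placeholder body that would trigger the error in a workspace
# without provisioned throughput enabled.
example = {
    "name": "my-endpoint",
    "config": {
        "served_entities": [
            {
                "entity_name": "main.default.my_model",
                "entity_version": "1",
                "min_provisioned_throughput": 0,
                "max_provisioned_throughput": 100,
            }
        ]
    },
}

hits = find_provisioned_throughput_fields(example)
for index, key in hits:
    print(f"served_entities[{index}] sets {key}")
```

If the list is non-empty and the workspace does not have provisioned throughput enabled, remove those fields or switch to one of the alternatives listed above.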