Manage model serving endpoints

This article describes how to manage model serving endpoints using the Serving UI and REST API. See Serving endpoints in the REST API reference.

To create model serving endpoints, use one of the following:

Get the status of the model endpoint

In the Serving UI, you can check the status of an endpoint from the Serving endpoint state indicator at the top of your endpoint’s details page.

You can check the status and details of an endpoint programmatically using the REST API or the MLflow Deployments SDK.


GET /api/2.0/serving-endpoints/{name}

The following example gets the details of an endpoint that serves the first version of the ads1 model that is registered in the model registry. To specify a model from Unity Catalog, provide the full model name, including the parent catalog and schema, such as catalog.schema.example-model.

In the following example response, the state.ready field is READY, which means the endpoint is ready to receive traffic. The state.update_state field is NOT_UPDATING, and pending_config is no longer returned because the update finished successfully.

{
  "name": "workspace-model-endpoint",
  "creator": "",
  "creation_timestamp": 1666829055000,
  "last_updated_timestamp": 1666829055000,
  "state": {
    "ready": "READY",
    "update_state": "NOT_UPDATING"
  },
  "config": {
    "served_entities": [
      {
        "name": "ads1-1",
        "entity_name": "ads1",
        "entity_version": "1",
        "workload_size": "Small",
        "scale_to_zero_enabled": false,
        "state": {
          "deployment": "DEPLOYMENT_READY",
          "deployment_state_message": ""
        },
        "creator": "",
        "creation_timestamp": 1666829055000
      }
    ],
    "traffic_config": {
      "routes": [
        {
          "served_model_name": "ads1-1",
          "traffic_percentage": 100
        }
      ]
    },
    "config_version": 1
  },
  "id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "permission_level": "CAN_MANAGE"
}
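If you check status from a script, you can reduce this response to the two facts described above: whether the endpoint is ready, and how traffic is routed across served models. The following is a minimal sketch, assuming the response body has already been parsed into a Python dictionary (for example with json.loads); is_endpoint_ready and traffic_split are hypothetical helper names, not part of any Databricks library.

```python
from typing import Any, Dict


def is_endpoint_ready(details: Dict[str, Any]) -> bool:
    """Return True when the endpoint can take traffic and no update is pending."""
    state = details.get("state", {})
    return (
        state.get("ready") == "READY"
        and state.get("update_state") == "NOT_UPDATING"
        # pending_config is only returned while an update is still in progress.
        and "pending_config" not in details
    )


def traffic_split(details: Dict[str, Any]) -> Dict[str, int]:
    """Map each served model name to its share of traffic, in percent."""
    routes = details.get("config", {}).get("traffic_config", {}).get("routes", [])
    return {r["served_model_name"]: r["traffic_percentage"] for r in routes}


# Trimmed-down copy of the example response above.
example = {
    "name": "workspace-model-endpoint",
    "state": {"ready": "READY", "update_state": "NOT_UPDATING"},
    "config": {
        "traffic_config": {
            "routes": [{"served_model_name": "ads1-1", "traffic_percentage": 100}]
        }
    },
}
print(is_endpoint_ready(example))  # True
print(traffic_split(example))      # {'ads1-1': 100}
```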

MLflow Deployments SDK

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
endpoint = client.get_endpoint(endpoint="chat")

# The returned dictionary has the following shape (values abbreviated):
# {
#     "name": "chat",
#     "creator": "",
#     "creation_timestamp": 0,
#     "last_updated_timestamp": 0,
#     "state": {...},
#     "config": {...},
#     "tags": [...],
#     "id": "88fd3f75a0d24b0380ddc40484d7a31b",
# }
print(endpoint["state"])
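Because an endpoint only reports NOT_UPDATING after an update finishes, scripts that create or update endpoints often need to wait for that state. The following sketch polls any zero-argument callable that returns the endpoint details, such as lambda: client.get_endpoint(endpoint="chat") from the example above. It assumes the returned object behaves like a dictionary with the state fields shown in the REST response; wait_for_ready and its timeout defaults are hypothetical choices, not SDK features.

```python
import time
from typing import Any, Callable, Dict


def wait_for_ready(
    fetch: Callable[[], Dict[str, Any]],
    timeout_s: float = 1200.0,
    poll_interval_s: float = 30.0,
) -> Dict[str, Any]:
    """Call fetch() until the endpoint is READY and NOT_UPDATING, else raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        details = fetch()
        state = details.get("state", {})
        if state.get("ready") == "READY" and state.get("update_state") == "NOT_UPDATING":
            return details
        if time.monotonic() >= deadline:
            raise TimeoutError(f"endpoint not ready after {timeout_s} s; last state: {state}")
        time.sleep(poll_interval_s)


# Usage with the MLflow Deployments SDK client from the example above:
# details = wait_for_ready(lambda: client.get_endpoint(endpoint="chat"))
```

Taking fetch as a callable keeps the loop independent of whether you call the REST API or the MLflow Deployments SDK, and makes it easy to test with a fake.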

Delete a model serving endpoint

To disable serving for a model, you can delete the endpoint it’s served on.

You can delete an endpoint from the endpoint’s details page in the Serving UI.

  1. Click Serving on the sidebar.
  2. Click the endpoint you want to delete.
  3. Click the kebab menu at the top and select Delete.

Alternatively, you can delete a serving endpoint programmatically using the REST API or the MLflow Deployments SDK.


DELETE /api/2.0/serving-endpoints/{name}

MLflow Deployments SDK

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
# Delete the endpoint named "chat"
client.delete_endpoint(endpoint="chat")
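If you call the REST API directly instead of using the SDK, the DELETE request needs only the endpoint name in the path and a bearer token, as in the curl example later in this article. The following sketch uses only the Python standard library to build the request without sending it; WORKSPACE_URL and ACCESS_TOKEN are hypothetical placeholders that you must replace with your own workspace URL and token.

```python
import urllib.parse
import urllib.request

# Hypothetical placeholders: replace with your workspace URL and a real access token.
WORKSPACE_URL = "https://example.cloud.databricks.com"
ACCESS_TOKEN = "<access-token>"


def build_delete_request(endpoint_name: str) -> urllib.request.Request:
    """Build (but do not send) DELETE /api/2.0/serving-endpoints/{name}."""
    path = "/api/2.0/serving-endpoints/" + urllib.parse.quote(endpoint_name, safe="")
    return urllib.request.Request(
        WORKSPACE_URL + path,
        method="DELETE",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    )


req = build_delete_request("chat")
print(req.get_method(), req.full_url)
# To actually delete the endpoint, send the request: urllib.request.urlopen(req)
```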

Debug your model serving endpoint

To debug any issues with the endpoint, you can fetch:

  • Model server container build logs
  • Model server logs

These logs are also accessible from the Endpoints UI in the Logs tab.

To get the build logs for a served model, use the following request:

GET /api/2.0/serving-endpoints/{name}/served-models/{served-model-name}/build-logs
{
  "config_version": 1  // optional
}

To get the model server logs for a served model, use the following request:

GET /api/2.0/serving-endpoints/{name}/served-models/{served-model-name}/logs
{
  "config_version": 1  // optional
}
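Both log requests share the same path prefix and differ only in the last segment, so one helper can build either. The following is a minimal sketch using the Python standard library; it builds the request without sending it and mirrors the optional config_version body shown above. WORKSPACE_URL, ACCESS_TOKEN, and build_logs_request are hypothetical names.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical placeholders: replace with your workspace URL and a real access token.
WORKSPACE_URL = "https://example.cloud.databricks.com"
ACCESS_TOKEN = "<access-token>"


def build_logs_request(
    endpoint_name: str,
    served_model_name: str,
    kind: str = "logs",
    config_version: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) a GET request for model server logs or build logs.

    kind is "logs" for model server logs or "build-logs" for container build logs.
    """
    if kind not in ("logs", "build-logs"):
        raise ValueError(f"unsupported log kind: {kind!r}")
    path = "/api/2.0/serving-endpoints/{}/served-models/{}/{}".format(
        urllib.parse.quote(endpoint_name, safe=""),
        urllib.parse.quote(served_model_name, safe=""),
        kind,
    )
    # config_version is optional, so only attach a JSON body when it is given.
    body = None
    if config_version is not None:
        body = json.dumps({"config_version": config_version}).encode("utf-8")
    return urllib.request.Request(
        WORKSPACE_URL + path,
        data=body,
        method="GET",
        headers={
            "Authorization": f"Bearer {ACCESS_TOKEN}",
            "Content-Type": "application/json",
        },
    )


req = build_logs_request("workspace-model-endpoint", "ads1-1", kind="build-logs", config_version=1)
print(req.get_method(), req.full_url)
```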

Manage permissions on your model serving endpoint

You must have at least the CAN MANAGE permission on a serving endpoint to modify permissions. For more information on the permission levels, see Serving endpoint ACLs.

Get the list of permissions on the serving endpoint.

databricks permissions get servingendpoints <endpoint-id>

Grant a user the CAN QUERY permission on the serving endpoint.

databricks permissions update servingendpoints <endpoint-id> --json '{
  "access_control_list": [
    {
      "user_name": "",
      "permission_level": "CAN_QUERY"
    }
  ]
}'
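Writing the --json argument by hand is error-prone when you grant several users at once. The following sketch builds the same access_control_list body as the command above and quotes it for a POSIX shell with shlex.quote. The helper name, endpoint ID, and user name are hypothetical, and the sketch does not validate permission level names.

```python
import json
import shlex
from typing import Dict


def permissions_update_command(endpoint_id: str, grants: Dict[str, str]) -> str:
    """Return a shell command granting each user the given permission level.

    grants maps a user name (for example, an email address) to a level
    such as "CAN_QUERY".
    """
    body = {
        "access_control_list": [
            {"user_name": user, "permission_level": level}
            for user, level in sorted(grants.items())
        ]
    }
    return " ".join(
        [
            "databricks", "permissions", "update", "servingendpoints",
            shlex.quote(endpoint_id),
            "--json", shlex.quote(json.dumps(body)),
        ]
    )


# Hypothetical endpoint ID and user name, for illustration only.
cmd = permissions_update_command("0123abcd", {"someone@example.com": "CAN_QUERY"})
print(cmd)
```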

You can also modify serving endpoint permissions using the Permissions API.

Get a model serving endpoint schema


Support for serving endpoint query schemas is in Public Preview. This functionality is available in Model Serving regions.

A serving endpoint query schema is a formal description of the serving endpoint using the standard OpenAPI specification in JSON format. It contains information about the endpoint, including the endpoint path, the request and response body formats for querying the endpoint, and the data type of each field. This information can be helpful for reproducibility scenarios, or when you need information about an endpoint that you did not create and do not own.

To get the model serving endpoint schema, the served model must have a model signature logged and the endpoint must be in a READY state.

The following examples demonstrate how to programmatically get the model serving endpoint schema using the REST API. For feature serving endpoint schemas, see What is Databricks Feature Serving?.

The schema returned by the API is in the format of a JSON object that follows the OpenAPI specification.

ENDPOINT_NAME="<endpoint name>"

curl "$ENDPOINT_NAME/openapi" -H "Authorization: Bearer $ACCESS_TOKEN" -H "Content-Type: application/json"

Schema response details

The response is an OpenAPI specification in JSON format, typically including fields like openapi, info, servers and paths. Since the schema response is a JSON object, you can parse it using common programming languages, and generate client code from the specification using third-party tools. You can also visualize the OpenAPI specification using third-party tools like Swagger Editor.

The main fields of the response include:

  • The info.title field shows the name of the serving endpoint.
  • The servers field always contains one object, typically with a url field that holds the base URL of the endpoint.
  • The paths object in the response contains all supported paths for an endpoint. The keys in the object are the path URLs. Each path can support multiple input formats, which are listed in the oneOf field.
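The fields described above are enough to produce a quick summary of what an endpoint accepts. The following sketch assumes the schema has already been parsed into a Python dictionary and that each path exposes a POST operation with an application/json request body, as in the example response in this section. summarize_schema is a hypothetical helper name, and the sample spec is a trimmed-down version of that example.

```python
from typing import Any, Dict, List


def summarize_schema(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Return the endpoint name, base URL, and accepted input formats per path."""
    summary: Dict[str, Any] = {
        "endpoint": spec["info"]["title"],
        # The servers array holds a single object with the endpoint's base URL.
        "base_url": spec["servers"][0].get("url", ""),
        "paths": {},
    }
    for path, operations in spec.get("paths", {}).items():
        post = operations.get("post")
        if post is None:
            continue  # assumption: only POST operations carry an inference payload
        schema = post["requestBody"]["content"]["application/json"]["schema"]
        formats: List[str] = []
        # Each oneOf entry is one accepted input format; its top-level
        # properties other than "params" name that format.
        for option in schema.get("oneOf", [schema]):
            formats.extend(k for k in option.get("properties", {}) if k != "params")
        summary["paths"][path] = formats
    return summary


spec = {
    "openapi": "3.1.0",
    "info": {"title": "example-endpoint", "version": "2"},
    "servers": [{"url": ""}],
    "paths": {
        "/served-models/vanilla_simple_model-2/invocations": {
            "post": {
                "requestBody": {
                    "content": {
                        "application/json": {
                            "schema": {
                                "oneOf": [
                                    {"type": "object", "properties": {"dataframe_split": {}, "params": {}}},
                                    {"type": "object", "properties": {"dataframe_records": {}, "params": {}}},
                                ]
                            }
                        }
                    }
                }
            }
        }
    },
}
print(summarize_schema(spec)["paths"])
```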

The following is an example endpoint schema response:

{
  "openapi": "3.1.0",
  "info": {
    "title": "example-endpoint",
    "version": "2"
  },
  "servers": [{ "url": ""}],
  "paths": {
    "/served-models/vanilla_simple_model-2/invocations": {
      "post": {
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "oneOf": [
                  {
                    "type": "object",
                    "properties": {
                      "dataframe_split": {
                        "type": "object",
                        "properties": {
                          "columns": {
                            "description": "required fields: int_col",
                            "type": "array",
                            "items": {
                              "type": "string",
                              "enum": [...]
                            }
                          },
                          "data": {
                            "type": "array",
                            "items": {
                              "type": "array",
                              "prefixItems": [
                                {
                                  "type": "integer",
                                  "format": "int64"
                                },
                                {
                                  "type": "number",
                                  "format": "double"
                                },
                                {
                                  "type": "string"
                                }
                              ]
                            }
                          }
                        }
                      },
                      "params": {
                        "type": "object",
                        "properties": {
                          "sentiment": {
                            "type": "number",
                            "format": "double",
                            "default": "0.5"
                          }
                        }
                      }
                    },
                    "examples": [
                      {
                        "columns": [...],
                        "data": [...]
                      }
                    ]
                  },
                  {
                    "type": "object",
                    "properties": {
                      "dataframe_records": {
                        "type": "array",
                        "items": {
                          "required": [...],
                          "type": "object",
                          "properties": {
                            "int_col": {
                              "type": "integer",
                              "format": "int64"
                            },
                            "float_col": {
                              "type": "number",
                              "format": "double"
                            },
                            "string_col": {
                              "type": "string"
                            },
                            "becx_col": {
                              "type": "object",
                              "format": "unknown"
                            }
                          }
                        }
                      },
                      "params": {
                        "type": "object",
                        "properties": {
                          "sentiment": {
                            "type": "number",
                            "format": "double",
                            "default": "0.5"
                          }
                        }
                      }
                    }
                  }
                ]
              }
            }
          }
        },
        "responses": {
          "200": {
            "description": "Successful operation",
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": {
                    "predictions": {
                      "type": "array",
                      "items": {
                        "type": "number",
                        "format": "double"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
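The oneOf list in this schema offers two input formats that carry the same tabular data in different layouts: dataframe_split sends the column names once plus a list of row values, while dataframe_records sends one object per row. The following sketch converts the first layout into the second. split_to_records is a hypothetical helper, and the column values are made up for illustration rather than taken from the schema's examples.

```python
from typing import Any, Dict, List


def split_to_records(split: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Convert a dataframe_split payload into equivalent dataframe_records rows."""
    columns = split["columns"]
    rows = []
    for row in split["data"]:
        if len(row) != len(columns):
            raise ValueError(f"row {row!r} does not match columns {columns!r}")
        rows.append(dict(zip(columns, row)))
    return rows


# Hypothetical values for the int_col, float_col, and string_col columns.
split = {
    "columns": ["int_col", "float_col", "string_col"],
    "data": [[1, 0.5, "a"], [2, 1.5, "b"]],
}
# Request body using the dataframe_records format, plus the optional params object.
request_body = {"dataframe_records": split_to_records(split), "params": {"sentiment": 0.5}}
print(request_body["dataframe_records"][0])
```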