Query a chat model

In this article, you learn how to write query requests for foundation models that are optimized for chat and general purpose tasks and send them to your model serving endpoint.

The examples in this article apply to querying foundation models that are made available using either:

- Foundation Model APIs pay-per-token endpoints
- Model serving endpoints that serve external models

Requirements

Query examples

The examples in this section show how to query the Anthropic Claude Sonnet 4.5 model made available by the Foundation Model APIs pay-per-token endpoint, databricks-claude-sonnet-4-5, using the different client options.

OpenAI client

To use the OpenAI client, specify the model serving endpoint name as the model input. The following example assumes you have a Databricks API token and openai installed on your compute. You also need your Databricks workspace instance URL to connect the OpenAI client to Databricks.


from openai import OpenAI

# The base_url is your workspace instance URL followed by /serving-endpoints
client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

response = client.chat.completions.create(
    model="databricks-claude-sonnet-4-5",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "What is a mixture of experts model?"
        }
    ],
    max_tokens=256
)

REST API

Important

The following example uses REST API parameters for querying serving endpoints that serve external models. These parameters are in Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.

curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is a mixture of experts model?"
    }
  ]
}' \
https://<workspace_host>.databricks.com/serving-endpoints/<your-external-model-endpoint>/invocations
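
If you prefer to make the same call from Python, the following is a minimal sketch using the requests library. It assumes your token is available in the DATABRICKS_TOKEN environment variable and uses the same placeholder endpoint URL as the curl example:

import os

import requests

# Same placeholder endpoint as the curl example above
url = "https://<workspace_host>.databricks.com/serving-endpoints/<your-external-model-endpoint>/invocations"

# Assumes DATABRICKS_TOKEN is set in your environment
headers = {
    "Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}",
    "Content-Type": "application/json",
}

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is a mixture of experts model?"},
    ]
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json())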

MLflow Deployments SDK

Important

The following example uses the predict() API from the MLflow Deployments SDK.


import os

import mlflow.deployments

# Only required when running this example outside of a Databricks notebook
os.environ["DATABRICKS_HOST"] = "https://<workspace_host>.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "dapi-your-databricks-token"

client = mlflow.deployments.get_deploy_client("databricks")

chat_response = client.predict(
    endpoint="databricks-claude-sonnet-4-5",
    inputs={
        "messages": [
            {
              "role": "user",
              "content": "Hello!"
            },
            {
              "role": "assistant",
              "content": "Hello! How can I assist you today?"
            },
            {
              "role": "user",
              "content": "What is a mixture of experts model??"
            }
        ],
        "temperature": 0.1,
        "max_tokens": 20
    }
)
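
predict() returns the response as a dictionary in the chat completions format shown later in this article. Assuming that shape, you can read the generated reply like this:

# Assumes the response follows the chat completions format shown below
print(chat_response["choices"][0]["message"]["content"])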

Databricks Python SDK

This code must be run in a notebook in your workspace. See Use the Databricks SDK for Python from an Azure Databricks notebook.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

w = WorkspaceClient()
response = w.serving_endpoints.query(
    name="databricks-claude-sonnet-4-5",
    messages=[
        ChatMessage(
            role=ChatMessageRole.SYSTEM, content="You are a helpful assistant."
        ),
        ChatMessage(
            role=ChatMessageRole.USER, content="What is a mixture of experts model?"
        ),
    ],
    max_tokens=128,
)
print(f"RESPONSE:\n{response.choices[0].message.content}")

As an example, the following is the expected request format for a chat model when using the REST API. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.

{
  "messages": [
    {
      "role": "user",
      "content": "What is a mixture of experts model?"
    }
  ],
  "max_tokens": 100,
  "temperature": 0.1
}

The following is an expected response format for a request made using the REST API:

{
  "model": "databricks-claude-sonnet-4-5",
  "choices": [
    {
      "message": {},
      "index": 0,
      "finish_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 74,
    "total_tokens": 81
  },
  "object": "chat.completion",
  "id": null,
  "created": 1698824353
}
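
Assuming a response parsed into a Python dictionary with the shape above (the variable name data is hypothetical), you can inspect the token accounting and the finish reason:

# `data` is the parsed JSON response shown above (hypothetical name)
usage = data["usage"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])

# finish_reason indicates why generation stopped for each choice
print(data["choices"][0]["finish_reason"])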

Supported models

See Foundation model types for supported chat models.

Additional resources