Query model services

Important

This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Azure Databricks previews.

This page describes how to query model services in Unity Catalog.

Requirements

  • Unity AI Gateway preview enabled for your account. See Manage Azure Databricks previews.
  • An Azure Databricks workspace in a region that supports Unity AI Gateway.
  • The EXECUTE privilege on the model service, plus USE CATALOG on its parent catalog and USE SCHEMA on its parent schema. System-provided model services in system.ai grant EXECUTE to all account users by default.

Identify a model service

Identify a model service by its fully qualified name (catalog.schema.name), which you pass as the model slug in a request. For example, system.ai.databricks-claude-opus-4-6. You can query a model service from any workspace attached to the same metastore.

Each request must identify a workspace, which Azure Databricks uses for pay-per-token billing. Provide the workspace in one of the following ways:

  • Workspace URL: Send the request to your workspace URL, for example https://<workspace-url>/ai-gateway/mlflow/v1. The URL itself identifies the workspace.
  • Workspace header: If you send the request to a single account-level URL, add the x-databricks-workspace-id header to identify the workspace.
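
As a rough sketch of the workspace header option, the following Python snippet assembles a chat completions request that carries the workspace ID in the x-databricks-workspace-id header. The account-level base URL, the /chat/completions path under it, the workspace ID value, and the helper name build_chat_request are placeholders assumed for illustration; this page does not specify them. The snippet only builds the request so that you can inspect it before sending it with an HTTP client of your choice.

```python
import json


def build_chat_request(base_url, token, workspace_id, model, prompt):
    """Assemble (url, headers, body) for a chat completions request.

    Hypothetical helper for illustration. base_url stands in for an
    account-level URL, which this page does not specify.
    """
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        # Identifies the workspace used for pay-per-token billing.
        "x-databricks-workspace-id": str(workspace_id),
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body


url, headers, body = build_chat_request(
    base_url="https://<account-level-url>",  # placeholder, not a real URL
    token="<databricks-token>",              # placeholder
    workspace_id="1234567890",               # placeholder workspace ID
    model="system.ai.databricks-claude-opus-4-6",
    prompt="What is Databricks?",
)
print(headers["x-databricks-workspace-id"])
```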

Query with the OpenAI client

The following example queries a model service using the OpenAI client and the MLflow Chat Completions API:

Python

from openai import OpenAI
import os

# To get a Databricks token, see https://docs.databricks.com/dev-tools/auth/pat
DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://<workspace-url>/ai-gateway/mlflow/v1"
)

chat_completion = client.chat.completions.create(
  messages=[
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "What is Databricks?"},
  ],
  model="system.ai.databricks-claude-opus-4-6",
  max_tokens=256
)

print(chat_completion.choices[0].message.content)

REST API

curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "system.ai.databricks-claude-opus-4-6",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hello! How can I assist you today?"},
      {"role": "user", "content": "What is Databricks?"}
    ]
  }' \
  https://<workspace-url>/ai-gateway/mlflow/v1/chat/completions

Replace <workspace-url> with your Azure Databricks workspace URL.

Model services support the same unified and native APIs as Unity AI Gateway endpoints, such as the MLflow Chat Completions API and the Anthropic Messages API. For the full list of supported APIs and more examples, see Query Unity AI Gateway endpoints (legacy).

Query with ai_query

Use the ai_query function to run batch inference against Databricks-provided model services in system.ai from SQL or Python. The following SQL example summarizes a text column:

SELECT ai_query(
  'system.ai.databricks-claude-opus-4-6',
  'Summarize the following text: ' || text_column
) AS summary
FROM my_table
LIMIT 10

For full ai_query syntax, see ai_query function.
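
From Python, one way to run the same query is to pass an ai_query expression to a Spark DataFrame. The following is a minimal sketch that assumes a Databricks notebook or job where an active SparkSession is available. The helper names summary_expr and summarize_table are hypothetical, and my_table and text_column are the same illustrative identifiers used in the SQL example.

```python
MODEL = "system.ai.databricks-claude-opus-4-6"


def summary_expr(model, text_column):
    """Build the ai_query SQL expression from the SQL example."""
    return (
        f"ai_query('{model}', "
        f"'Summarize the following text: ' || {text_column}) AS summary"
    )


def summarize_table(spark, table_name, text_column, limit=10):
    """Return a DataFrame with one summary per input row.

    Assumes `spark` is an active SparkSession on Databricks, where the
    ai_query SQL function is available.
    """
    expr = summary_expr(MODEL, text_column)
    return spark.table(table_name).limit(limit).selectExpr(expr)


# Inspect the generated expression without needing a Spark session.
print(summary_expr(MODEL, "text_column"))
```

In a notebook, calling summarize_table(spark, "my_table", "text_column").show() would display the summaries.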

Backward compatibility with workspace endpoint names

For backward compatibility, when a request specifies a Databricks-hosted model by its short name instead of a fully qualified name, Azure Databricks resolves the name to the corresponding system-provided model service in system.ai. For example, Azure Databricks interprets databricks-claude-opus-4-6 as system.ai.databricks-claude-opus-4-6. This behavior lets existing workloads continue to run without code changes.
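
The mapping can be pictured with a small Python sketch. This illustrates the documented behavior only and is not how Azure Databricks implements it; the resolution happens on the service side. The function name resolve_model_name and the assumption that Databricks-hosted short names start with "databricks-" are both introduced here for illustration.

```python
SYSTEM_AI_PREFIX = "system.ai."


def resolve_model_name(name):
    """Map a Databricks-hosted short name to its system.ai model service.

    Assumption for this sketch: short names of Databricks-hosted models
    start with "databricks-". Any other name is returned unchanged.
    """
    if name.startswith("databricks-"):
        return SYSTEM_AI_PREFIX + name
    return name


print(resolve_model_name("databricks-claude-opus-4-6"))
```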

Next steps