Important
This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Azure Databricks previews.
This page describes how to query model services in Unity Catalog using supported APIs.
Requirements
- Unity AI Gateway preview enabled for your account. See Manage Azure Databricks previews.
- An Azure Databricks workspace in a Unity AI Gateway supported region.
- Unity Catalog enabled for your workspace. See Enable a workspace for Unity Catalog.
Supported APIs and integrations
Unity AI Gateway supports the following APIs and integrations:
- Unified APIs: OpenAI-compatible interfaces to query models on Azure Databricks. Seamlessly switch between models from different providers without changing how you query each model.
- Native APIs: Provider-specific interfaces to access the latest model and provider-specific features.
- Coding agents: Integrate your coding agents with Unity AI Gateway to add centralized governance and monitoring to your AI-assisted development workflows. See coding agent integration.
- Agents on Databricks Apps: Author and deploy AI agents on Databricks Apps that route LLM traffic through Unity AI Gateway. See Step 4. Govern LLM usage from your agents on Databricks Apps with Unity AI Gateway.
- ai_query: Use ai_query to query Azure Databricks-provided model services from SQL or Python for batch inference. See Query model services with ai_query.
Query model services with ai_query
You can use the ai_query function to query Azure Databricks-provided model services directly from SQL or Python. This allows you to capture usage tracking information for your batch inference workloads.
Note
- ai_query support for Unity AI Gateway is only available for Azure Databricks-provided model services (for example, databricks-gpt-5-4 or databricks-claude-sonnet-4). Model services that you create in Unity AI Gateway are not yet supported.
- Only usage tracking applies to ai_query batch inference workloads. Other Unity AI Gateway features such as rate limits, guardrails, inference tables, and fallbacks do not apply.
To get started:
- Enable the Unity AI Gateway preview for your account. See Manage Azure Databricks previews.
- Query an Azure Databricks-provided model service using ai_query:
SELECT ai_query(
  'databricks-gpt-5-4',
  'Summarize the following text: ' || text_column
) AS summary
FROM my_table
LIMIT 10
Requests made through ai_query to Azure Databricks-provided model services are captured in the usage tracking system table (system.ai_gateway.usage). These requests also appear in the built-in usage dashboard.
For full ai_query syntax and parameter reference, see ai_query function. For best practices and supported models, see Use ai_query.
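Because ai_query can also be called from Python, the statement above can be assembled in a small helper and then passed to spark.sql. The following is a minimal sketch rather than an official pattern: the helper name build_summary_query and its validation rules are illustrative assumptions, and my_table and text_column are the placeholder names from the example above.

```python
import re

# Hypothetical helper; the name and the validation rules are assumptions
# made for illustration, not part of any Databricks API.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}$")
_MODEL_SERVICE = re.compile(r"^[A-Za-z0-9_.-]+$")


def build_summary_query(model_service, table, text_column, limit=10):
    """Return the ai_query SELECT statement from the example as a string."""
    if not _MODEL_SERVICE.match(model_service):
        raise ValueError(f"unexpected model service name: {model_service!r}")
    for name in (table, text_column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT ai_query(\n"
        f"  '{model_service}',\n"
        f"  'Summarize the following text: ' || {text_column}\n"
        ") AS summary\n"
        f"FROM {table}\n"
        f"LIMIT {limit}"
    )


query = build_summary_query("databricks-gpt-5-4", "my_table", "text_column")
print(query)

# In a Databricks notebook, where a SparkSession named `spark` is predefined:
# summaries = spark.sql(query)
```

Rejecting unexpected names, instead of trying to escape them, keeps the sketch short and avoids building a statement from untrusted input.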
Query model services with unified APIs
Unified APIs offer an OpenAI-compatible interface to query models on Azure Databricks. Use unified APIs to seamlessly switch between models from different providers without changing your code.
MLflow Chat Completions API
Python
from openai import OpenAI
import os
DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
client = OpenAI(
    api_key=DATABRICKS_TOKEN,
    base_url="https://<workspace-url>/ai-gateway/mlflow/v1"
)
chat_completion = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hello! How can I assist you today?"},
        {"role": "user", "content": "What is Databricks?"},
    ],
    model="<model-service>",
    max_tokens=256
)
print(chat_completion.choices[0].message.content)
REST API
curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-service>",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hello! How can I assist you today?"},
      {"role": "user", "content": "What is Databricks?"}
    ]
  }' \
  https://<workspace-url>/ai-gateway/mlflow/v1/chat/completions
Replace <workspace-url> with your Azure Databricks workspace URL and <model-service> with the fully qualified name of your model service.
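As a complement to the SDK and curl examples, the same request can be assembled with only the Python standard library, which makes it explicit that switching model services changes only the model field. This is a sketch under assumptions: the helper name build_chat_request, the host example.azuredatabricks.net, and the model service names main.default.model_a and main.default.model_b are hypothetical, and the token is sent as a Bearer header, which is what the OpenAI SDK does with api_key.

```python
import json
import urllib.request


def build_chat_request(workspace_url, token, model_service, messages, max_tokens=256):
    """Build (but do not send) a chat completions request for a model service."""
    url = f"https://{workspace_url}/ai-gateway/mlflow/v1/chat/completions"
    payload = {"model": model_service, "max_tokens": max_tokens, "messages": messages}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


messages = [{"role": "user", "content": "What is Databricks?"}]

# Hypothetical host and model service names, for illustration only.
requests_by_model = {
    name: build_chat_request("example.azuredatabricks.net", "<token>", name, messages)
    for name in ("main.default.model_a", "main.default.model_b")
}

for name, request in requests_by_model.items():
    body = json.loads(request.data)
    print(request.full_url, body["model"])

# To send one of the requests: urllib.request.urlopen(request)
```

Both requests share the same URL, headers, and messages; only the model value differs.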
MLflow Embeddings API
Python
from openai import OpenAI
import os
DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
client = OpenAI(
    api_key=DATABRICKS_TOKEN,
    base_url="https://<workspace-url>/ai-gateway/mlflow/v1"
)
embeddings = client.embeddings.create(
    input="What is Databricks?",
    model="<model-service>"
)
print(embeddings.data[0].embedding)
REST API
curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-service>",
    "input": "What is Databricks?"
  }' \
  https://<workspace-url>/ai-gateway/mlflow/v1/embeddings
Replace <workspace-url> with your Azure Databricks workspace URL and <model-service> with the fully qualified name of your model service.
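A common next step after retrieving embeddings is to compare two vectors. The sketch below assumes you have already obtained two vectors as lists of floats, for example from two calls like the one above; the helper name cosine_similarity is illustrative, and the short vectors are toy values, not model output.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for two embeddings.
first = [1.0, 0.0, 1.0]
second = [1.0, 1.0, 0.0]
print(cosine_similarity(first, second))
```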
Supervisor API
The Supervisor API (/mlflow/v1/responses) is an OpenResponses-compatible, provider-agnostic API for building agents. It is in Beta; account admins can enable access from the Previews page. See Manage Azure Databricks previews. Use it to pick the best model for your agent use case across providers without changing your code.
Python
from openai import OpenAI
import os
DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
client = OpenAI(
    api_key=DATABRICKS_TOKEN,
    base_url="https://<workspace-url>/ai-gateway/mlflow/v1"
)
response = client.responses.create(
    model="<model-service>",
    input=[{"role": "user", "content": "What is Databricks?"}]
)
print(response.output_text)
REST API
curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-service>",
    "input": [
      {"role": "user", "content": "What is Databricks?"}
    ]
  }' \
  https://<workspace-url>/ai-gateway/mlflow/v1/responses
Replace <workspace-url> with your Azure Databricks workspace URL and <model-service> with the fully qualified name of your model service.
Query model services with native APIs
Native APIs offer provider-specific interfaces to query models on Azure Databricks. Use native APIs to access the latest provider-specific features.
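Each API family on this page uses its own base path under /ai-gateway. As a quick orientation, the sketch below collects the base URLs that appear in the examples on this page. The dictionary BASE_PATHS and the helper base_url are illustrative names, and example.azuredatabricks.net is a hypothetical workspace host.

```python
# Base paths copied from the examples on this page.
BASE_PATHS = {
    "mlflow": "/ai-gateway/mlflow/v1",     # unified APIs
    "openai": "/ai-gateway/openai/v1",     # OpenAI Responses API
    "anthropic": "/ai-gateway/anthropic",  # Anthropic Messages API
    "gemini": "/ai-gateway/gemini",        # Google Gemini API
}


def base_url(workspace_url, api):
    """Return the SDK base URL for one API family (hypothetical helper)."""
    if api not in BASE_PATHS:
        raise ValueError(f"unknown API family: {api!r}")
    return f"https://{workspace_url.rstrip('/')}{BASE_PATHS[api]}"


for api in BASE_PATHS:
    print(api, base_url("example.azuredatabricks.net", api))
```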
OpenAI Responses API
Python
from openai import OpenAI
import os
DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
client = OpenAI(
    api_key=DATABRICKS_TOKEN,
    base_url="https://<workspace-url>/ai-gateway/openai/v1"
)
response = client.responses.create(
    model="<model-service>",
    max_output_tokens=256,
    input=[
        {
            "role": "user",
            "content": [{"type": "input_text", "text": "Hello!"}]
        },
        {
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Hello! How can I assist you today?"}]
        },
        {
            "role": "user",
            "content": [{"type": "input_text", "text": "What is Databricks?"}]
        }
    ]
)
print(response.output)
REST API
curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-service>",
    "max_output_tokens": 256,
    "input": [
      {
        "role": "user",
        "content": [{"type": "input_text", "text": "Hello!"}]
      },
      {
        "role": "assistant",
        "content": [{"type": "output_text", "text": "Hello! How can I assist you today?"}]
      },
      {
        "role": "user",
        "content": [{"type": "input_text", "text": "What is Databricks?"}]
      }
    ]
  }' \
  https://<workspace-url>/ai-gateway/openai/v1/responses
Replace <workspace-url> with your Azure Databricks workspace URL and <model-service> with the fully qualified name of your model service.
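The Responses examples above wrap each turn in typed content parts: input_text for user turns and output_text for assistant turns. If you already hold a conversation in the simpler chat format, a small converter can produce that shape. This is a minimal sketch; the helper name to_responses_input is hypothetical, and it only handles the user and assistant roles with plain string content, as in the examples.

```python
def to_responses_input(messages):
    """Convert chat-style messages into the typed input items shown above."""
    part_types = {"user": "input_text", "assistant": "output_text"}
    items = []
    for message in messages:
        role = message["role"]
        if role not in part_types:
            raise ValueError(f"unsupported role in this sketch: {role!r}")
        items.append(
            {
                "role": role,
                "content": [{"type": part_types[role], "text": message["content"]}],
            }
        )
    return items


chat_messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "What is Databricks?"},
]
responses_input = to_responses_input(chat_messages)
print(responses_input)
```

The result can be passed as the input argument of client.responses.create or as the "input" field of the REST request body.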
Anthropic Messages API
Python
import anthropic
import os
DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
client = anthropic.Anthropic(
    api_key="unused",
    base_url="https://<workspace-url>/ai-gateway/anthropic",
    default_headers={
        "Authorization": f"Bearer {DATABRICKS_TOKEN}",
    },
)
message = client.messages.create(
    model="<model-service>",
    max_tokens=256,
    messages=[
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hello! How can I assist you today?"},
        {"role": "user", "content": "What is Databricks?"},
    ],
)
print(message.content[0].text)
REST API
curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-service>",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hello! How can I assist you today?"},
      {"role": "user", "content": "What is Databricks?"}
    ]
  }' \
  https://<workspace-url>/ai-gateway/anthropic/v1/messages
Replace <workspace-url> with your Azure Databricks workspace URL and <model-service> with the fully qualified name of your model service.
Google Gemini API
Python
from google import genai
from google.genai import types
import os
DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
client = genai.Client(
    api_key="databricks",
    http_options=types.HttpOptions(
        base_url="https://<workspace-url>/ai-gateway/gemini",
        headers={
            "Authorization": f"Bearer {DATABRICKS_TOKEN}",
        },
    ),
)
response = client.models.generate_content(
    model="<model-service>",
    contents=[
        types.Content(
            role="user",
            parts=[types.Part(text="Hello!")],
        ),
        types.Content(
            role="model",
            parts=[types.Part(text="Hello! How can I assist you today?")],
        ),
        types.Content(
            role="user",
            parts=[types.Part(text="What is Databricks?")],
        ),
    ],
    config=types.GenerateContentConfig(
        max_output_tokens=256,
    ),
)
print(response.text)
REST API
curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{"text": "Hello!"}]
      },
      {
        "role": "model",
        "parts": [{"text": "Hello! How can I assist you today?"}]
      },
      {
        "role": "user",
        "parts": [{"text": "What is Databricks?"}]
      }
    ],
    "generationConfig": {
      "maxOutputTokens": 256
    }
  }' \
  https://<workspace-url>/ai-gateway/gemini/v1beta/models/<model-service>:generateContent
Replace <workspace-url> with your Azure Databricks workspace URL and <model-service> with the fully qualified name of your model service.
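The Gemini request differs from the other examples in two ways: assistant turns use the role model, and the model service name is part of the URL rather than the body. The sketch below builds the REST body and URL from chat-style messages to make that mapping explicit. The helper names, the host example.azuredatabricks.net, and the model service name main.default.my_gemini are hypothetical.

```python
import json


def to_gemini_contents(messages):
    """Map chat-style messages to the contents list used in the REST example."""
    role_map = {"user": "user", "assistant": "model"}
    contents = []
    for message in messages:
        role = message["role"]
        if role not in role_map:
            raise ValueError(f"unsupported role in this sketch: {role!r}")
        contents.append({"role": role_map[role], "parts": [{"text": message["content"]}]})
    return contents


def build_generate_content(workspace_url, model_service, messages, max_output_tokens=256):
    """Return the (url, body) pair for a generateContent request."""
    url = (
        f"https://{workspace_url}/ai-gateway/gemini/v1beta/models/"
        f"{model_service}:generateContent"
    )
    body = {
        "contents": to_gemini_contents(messages),
        "generationConfig": {"maxOutputTokens": max_output_tokens},
    }
    return url, body


url, body = build_generate_content(
    "example.azuredatabricks.net",  # hypothetical host
    "main.default.my_gemini",       # hypothetical model service name
    [
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hello! How can I assist you today?"},
        {"role": "user", "content": "What is Databricks?"},
    ],
)
print(url)
print(json.dumps(body, indent=2))
```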
Tag requests for usage tracking
You can attach custom key-value tags to individual requests using the Databricks-Ai-Gateway-Request-Tags HTTP header. Request tags are logged to the request_tags column in both the usage tracking system table and inference tables, enabling you to track costs, attribute usage, and filter analytics by project, team, environment, or any other dimension.
The header value must be a JSON object mapping string keys to string values. For example:
{ "project": "chatbot", "team": "ml-platform", "environment": "production" }
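Because the header value must map string keys to string values, it can help to validate tags before serializing them. The following is a minimal sketch; the helper name request_tag_headers is hypothetical, and the check only enforces the string-to-string rule stated above.

```python
import json

HEADER_NAME = "Databricks-Ai-Gateway-Request-Tags"


def request_tag_headers(tags):
    """Validate tags and return a headers dict carrying them as JSON."""
    if not isinstance(tags, dict):
        raise TypeError("tags must be a dict")
    for key, value in tags.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(f"tag {key!r} must map a string key to a string value")
    return {HEADER_NAME: json.dumps(tags)}


headers = request_tag_headers({"project": "chatbot", "team": "ml-platform"})
print(headers)
```

The returned dictionary can be passed as extra_headers on an OpenAI SDK call or merged into a client's default_headers.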
To attach tags to a request, use the per-request extra_headers parameter or client-level default_headers (Python), or pass the header directly (REST API):
Python (OpenAI SDK)
from openai import OpenAI
import json
import os
DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
client = OpenAI(
    api_key=DATABRICKS_TOKEN,
    base_url="https://<workspace-url>/ai-gateway/mlflow/v1"
)
request_tags = {"project": "chatbot", "team": "ml-platform"}
chat_completion = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "What is Databricks?"},
    ],
    model="<model-service>",
    max_tokens=256,
    extra_headers={
        "Databricks-Ai-Gateway-Request-Tags": json.dumps(request_tags)
    }
)
Python (Anthropic SDK)
import anthropic
import json
import os
DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
request_tags = {"project": "chatbot", "team": "ml-platform"}
client = anthropic.Anthropic(
    api_key="unused",
    base_url="https://<workspace-url>/ai-gateway/anthropic",
    default_headers={
        "Authorization": f"Bearer {DATABRICKS_TOKEN}",
        "Databricks-Ai-Gateway-Request-Tags": json.dumps(request_tags),
    },
)
message = client.messages.create(
    model="<model-service>",
    max_tokens=256,
    messages=[
        {"role": "user", "content": "What is Databricks?"},
    ],
)
REST API
curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -H 'Databricks-Ai-Gateway-Request-Tags: {"project": "chatbot", "team": "ml-platform"}' \
  -d '{
    "model": "<model-service>",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "What is Databricks?"}
    ]
  }' \
  https://<workspace-url>/ai-gateway/mlflow/v1/chat/completions
Replace <workspace-url> with your Azure Databricks workspace URL and <model-service> with the fully qualified name of your model service.