Query Unity AI Gateway endpoints

Important

This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Azure Databricks previews.

This page describes how to query Unity AI Gateway endpoints using supported APIs.

Requirements

  • The Unity AI Gateway preview must be enabled for your account. See Manage Azure Databricks previews.
  • A Databricks personal access token to authenticate your requests. The examples on this page read it from the DATABRICKS_TOKEN environment variable.
  • The URL of your Azure Databricks workspace.

Supported APIs and integrations

Unity AI Gateway supports the following APIs and integrations:

  • ai_query for batch inference against Azure Databricks-provided endpoints
  • Unified APIs: MLflow Chat Completions API, MLflow Embeddings API, and the Supervisor API
  • Native APIs: OpenAI Responses API, Anthropic Messages API, and Google Gemini API

Query endpoints with ai_query

You can use the ai_query function to query Azure Databricks-provided Unity AI Gateway endpoints directly from SQL or Python. This allows you to capture usage tracking information for your batch inference workloads.

Note

  • ai_query support for Unity AI Gateway is only available for Azure Databricks-provided endpoints (for example, databricks-gpt-5-4 or databricks-claude-sonnet-4). Endpoints that you create in Unity AI Gateway are not yet supported.
  • Only usage tracking applies to ai_query batch inference workloads. Other Unity AI Gateway features such as rate limits, guardrails, inference tables, and fallbacks do not apply.

To get started:

  1. Enable the Unity AI Gateway preview for your account. See Manage Azure Databricks previews.
  2. Query an Azure Databricks-provided endpoint using ai_query:
SELECT ai_query(
  'databricks-gpt-5-4',
  'Summarize the following text: ' || text_column
) AS summary
FROM my_table
LIMIT 10
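
You can run the same query from Python by issuing the SQL through spark.sql. The following is a minimal sketch; it assumes a Databricks notebook that provides the spark session and a table named my_table with a string column text_column:

# Minimal sketch: run ai_query from Python in a Databricks notebook.
# Assumes the notebook-provided `spark` session and a table `my_table`
# with a string column `text_column`.
summaries = spark.sql("""
  SELECT ai_query(
    'databricks-gpt-5-4',
    'Summarize the following text: ' || text_column
  ) AS summary
  FROM my_table
  LIMIT 10
""")
summaries.show(truncate=False)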

Requests made through ai_query to Azure Databricks-provided endpoints are captured in the usage tracking system table (system.ai_gateway.usage). These requests also appear in the built-in usage dashboard.

For full ai_query syntax and parameter reference, see ai_query function. For best practices and supported models, see Use ai_query.

Query endpoints with unified APIs

Unified APIs offer an OpenAI-compatible interface to query models on Azure Databricks. Use unified APIs to seamlessly switch between models from different providers without changing your code.

MLflow Chat Completions API

Python

from openai import OpenAI
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://<workspace-url>/ai-gateway/mlflow/v1"
)

chat_completion = client.chat.completions.create(
  messages=[
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "What is Databricks?"},
  ],
  model="<ai-gateway-endpoint>",
  max_tokens=256
)

print(chat_completion.choices[0].message.content)

REST API

curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<ai-gateway-endpoint>",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hello! How can I assist you today?"},
      {"role": "user", "content": "What is Databricks?"}
    ]
  }' \
  https://<workspace-url>/ai-gateway/mlflow/v1/chat/completions

Replace <workspace-url> with your Azure Databricks workspace URL and <ai-gateway-endpoint> with your Unity AI Gateway endpoint name.

MLflow Embeddings API

Python

from openai import OpenAI
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://<workspace-url>/ai-gateway/mlflow/v1"
)

embeddings = client.embeddings.create(
  input="What is Databricks?",
  model="<ai-gateway-endpoint>"
)

print(embeddings.data[0].embedding)

REST API

curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<ai-gateway-endpoint>",
    "input": "What is Databricks?"
  }' \
  https://<workspace-url>/ai-gateway/mlflow/v1/embeddings

Replace <workspace-url> with your Azure Databricks workspace URL and <ai-gateway-endpoint> with your Unity AI Gateway endpoint name.

Supervisor API

The Supervisor API (/mlflow/v1/responses) is an OpenResponses-compatible, provider-agnostic API for building agents. It lets you pick the best model for your agent use case across providers without changing your code. This API is in Beta; account admins can enable access from the Previews page. See Manage Azure Databricks previews.

Python

from openai import OpenAI
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://<workspace-url>/ai-gateway/mlflow/v1"
)

response = client.responses.create(
  model="<ai-gateway-endpoint>",
  input=[{"role": "user", "content": "What is Databricks?"}]
)

print(response.output_text)

REST API

curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<ai-gateway-endpoint>",
    "input": [
      {"role": "user", "content": "What is Databricks?"}
    ]
  }' \
  https://<workspace-url>/ai-gateway/mlflow/v1/responses

Replace <workspace-url> with your Azure Databricks workspace URL and <ai-gateway-endpoint> with your Unity AI Gateway endpoint name.

Query endpoints with native APIs

Native APIs offer provider-specific interfaces to query models on Azure Databricks. Use native APIs to access the latest provider-specific features.

OpenAI Responses API

Python

from openai import OpenAI
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://<workspace-url>/ai-gateway/openai/v1"
)

response = client.responses.create(
  model="<ai-gateway-endpoint>",
  max_output_tokens=256,
  input=[
    {
      "role": "user",
      "content": [{"type": "input_text", "text": "Hello!"}]
    },
    {
      "role": "assistant",
      "content": [{"type": "output_text", "text": "Hello! How can I assist you today?"}]
    },
    {
      "role": "user",
      "content": [{"type": "input_text", "text": "What is Databricks?"}]
    }
  ]
)

print(response.output)

REST API

curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<ai-gateway-endpoint>",
    "max_output_tokens": 256,
    "input": [
      {
        "role": "user",
        "content": [{"type": "input_text", "text": "Hello!"}]
      },
      {
        "role": "assistant",
        "content": [{"type": "output_text", "text": "Hello! How can I assist you today?"}]
      },
      {
        "role": "user",
        "content": [{"type": "input_text", "text": "What is Databricks?"}]
      }
    ]
  }' \
  https://<workspace-url>/ai-gateway/openai/v1/responses

Replace <workspace-url> with your Azure Databricks workspace URL and <ai-gateway-endpoint> with your Unity AI Gateway endpoint name.

Anthropic Messages API

Python

import anthropic
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = anthropic.Anthropic(
  api_key="unused",
  base_url="https://<workspace-url>/ai-gateway/anthropic",
  default_headers={
    "Authorization": f"Bearer {DATABRICKS_TOKEN}",
  },
)

message = client.messages.create(
  model="<ai-gateway-endpoint>",
  max_tokens=256,
  messages=[
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "What is Databricks?"},
  ],
)

print(message.content[0].text)

REST API

curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<ai-gateway-endpoint>",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hello! How can I assist you today?"},
      {"role": "user", "content": "What is Databricks?"}
    ]
  }' \
  https://<workspace-url>/ai-gateway/anthropic/v1/messages

Replace <workspace-url> with your Azure Databricks workspace URL and <ai-gateway-endpoint> with your Unity AI Gateway endpoint name.

Google Gemini API

Python

from google import genai
from google.genai import types
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = genai.Client(
  api_key="databricks",
  http_options=types.HttpOptions(
    base_url="https://<workspace-url>/ai-gateway/gemini",
    headers={
      "Authorization": f"Bearer {DATABRICKS_TOKEN}",
    },
  ),
)

response = client.models.generate_content(
  model="<ai-gateway-endpoint>",
  contents=[
    types.Content(
      role="user",
      parts=[types.Part(text="Hello!")],
    ),
    types.Content(
      role="model",
      parts=[types.Part(text="Hello! How can I assist you today?")],
    ),
    types.Content(
      role="user",
      parts=[types.Part(text="What is Databricks?")],
    ),
  ],
  config=types.GenerateContentConfig(
    max_output_tokens=256,
  ),
)

print(response.text)

REST API

curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{"text": "Hello!"}]
      },
      {
        "role": "model",
        "parts": [{"text": "Hello! How can I assist you today?"}]
      },
      {
        "role": "user",
        "parts": [{"text": "What is Databricks?"}]
      }
    ],
    "generationConfig": {
      "maxOutputTokens": 256
    }
  }' \
  https://<workspace-url>/ai-gateway/gemini/v1beta/models/<ai-gateway-endpoint>:generateContent

Replace <workspace-url> with your Azure Databricks workspace URL and <ai-gateway-endpoint> with your Unity AI Gateway endpoint name.

Tag requests for usage tracking

You can attach custom key-value tags to individual requests using the Databricks-Ai-Gateway-Request-Tags HTTP header. Request tags are logged to the request_tags column in both the usage tracking system table and inference tables, enabling you to track costs, attribute usage, and filter analytics by project, team, environment, or any other dimension.

The header value must be a JSON object mapping string keys to string values. For example:

{ "project": "chatbot", "team": "ml-platform", "environment": "production" }

Use the extra_headers parameter (Python) or pass the header directly (REST API) to attach tags to a request:

Python (OpenAI SDK)

from openai import OpenAI
import json
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://<workspace-url>/ai-gateway/mlflow/v1"
)

request_tags = {"project": "chatbot", "team": "ml-platform"}

chat_completion = client.chat.completions.create(
  messages=[
    {"role": "user", "content": "What is Databricks?"},
  ],
  model="<ai-gateway-endpoint>",
  max_tokens=256,
  extra_headers={
    "Databricks-Ai-Gateway-Request-Tags": json.dumps(request_tags)
  }
)

Python (Anthropic SDK)

import anthropic
import json
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

request_tags = {"project": "chatbot", "team": "ml-platform"}

client = anthropic.Anthropic(
  api_key="unused",
  base_url="https://<workspace-url>/ai-gateway/anthropic",
  default_headers={
    "Authorization": f"Bearer {DATABRICKS_TOKEN}",
    "Databricks-Ai-Gateway-Request-Tags": json.dumps(request_tags),
  },
)

message = client.messages.create(
  model="<ai-gateway-endpoint>",
  max_tokens=256,
  messages=[
    {"role": "user", "content": "What is Databricks?"},
  ],
)

REST API

curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -H 'Databricks-Ai-Gateway-Request-Tags: {"project": "chatbot", "team": "ml-platform"}' \
  -d '{
    "model": "<ai-gateway-endpoint>",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "What is Databricks?"}
    ]
  }' \
  https://<workspace-url>/ai-gateway/mlflow/v1/chat/completions

Replace <workspace-url> with your Azure Databricks workspace URL and <ai-gateway-endpoint> with your Unity AI Gateway endpoint name.
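
After tagged requests are logged, you can filter the usage tracking system table by tag to attribute usage. The following is a minimal sketch from a Databricks notebook; it assumes the notebook-provided spark session, read access to system.ai_gateway.usage, and that the request_tags column can be accessed by key (for example, as a map of string keys to string values; the exact column type may differ in your workspace):

# Minimal sketch: attribute usage by request tag.
# Assumes the notebook-provided `spark` session, read access to
# system.ai_gateway.usage, and that request_tags is key-addressable
# (for example, a map of string keys to string values).
tagged_usage = spark.sql("""
  SELECT *
  FROM system.ai_gateway.usage
  WHERE request_tags['project'] = 'chatbot'
""")
tagged_usage.show(truncate=False)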

Next steps