Query Unity AI Gateway endpoints

Important

This feature is in Beta. Account admins can control access to this feature from the Previews page of the account console. See Manage Azure Databricks previews.

This page describes how to query Unity AI Gateway endpoints using the supported APIs.

Requirements

The Unity AI Gateway preview enabled for your account. See Manage Azure Databricks previews.
An Azure Databricks workspace in a region that supports Unity AI Gateway.
Unity Catalog enabled for your workspace. See Enable a workspace for Unity Catalog.

Supported APIs and integrations

Unity AI Gateway supports the following APIs and integrations:

Unified APIs: OpenAI-compatible interfaces for querying models on Azure Databricks. Switch between models from different providers without changing how you query each model.
Native APIs: Provider-specific interfaces for accessing the latest model-specific and provider-specific features.
Coding agents: Integrate your coding agents with Unity AI Gateway to add centralized governance and monitoring to AI-assisted development workflows. See Integrate with coding agents.
Agents on Databricks Apps: Build and deploy AI agents on Databricks Apps that route LLM traffic through Unity AI Gateway. See Step 4. Govern LLM usage from your agents on Databricks Apps with Unity AI Gateway.

Query endpoints with unified APIs

Unified APIs provide an OpenAI-compatible interface for querying models on Azure Databricks. Use unified APIs to switch between models from different providers without changing your code.

MLflow Chat Completions API

Python

from openai import OpenAI
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://<workspace-url>/ai-gateway/mlflow/v1"
)

chat_completion = client.chat.completions.create(
  messages=[
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "What is Databricks?"},
  ],
  model="<ai-gateway-endpoint>",
  max_tokens=256
)

print(chat_completion.choices[0].message.content)

REST API

curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<ai-gateway-endpoint>",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hello! How can I assist you today?"},
      {"role": "user", "content": "What is Databricks?"}
    ]
  }' \
  https://<workspace-url>/ai-gateway/mlflow/v1/chat/completions

Replace <workspace-url> with your Azure Databricks workspace URL and <ai-gateway-endpoint> with the name of your Unity AI Gateway endpoint.
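As an illustration of switching models without changing the calling code, the following sketch builds (but does not send) the same chat completions request for two endpoints. Only the `model` field differs. The helper name `build_chat_request`, the workspace host, and the endpoint names are hypothetical, and sending the token as a bearer header is an assumption modeled on the headers used in the Anthropic and Gemini examples on this page.

```python
import json
import urllib.request


def build_chat_request(workspace_url, endpoint, token, prompt, max_tokens=256):
    """Build (without sending) a chat completions request for one endpoint."""
    # Same URL for every endpoint; the endpoint is selected by the "model" field.
    url = f"https://{workspace_url}/ai-gateway/mlflow/v1/chat/completions"
    payload = {
        "model": endpoint,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token authentication with a Databricks token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Hypothetical workspace host and endpoint names, for illustration only.
req_a = build_chat_request("example.azuredatabricks.net", "endpoint-a", "dummy-token", "What is Databricks?")
req_b = build_chat_request("example.azuredatabricks.net", "endpoint-b", "dummy-token", "What is Databricks?")

body_a = json.loads(req_a.data)
body_b = json.loads(req_b.data)
print(req_a.full_url == req_b.full_url)  # the URL does not depend on the endpoint
print(body_a["model"], body_b["model"])
```

To actually send a request, pass it to `urllib.request.urlopen` with a real workspace URL, endpoint name, and token.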

MLflow Embeddings API

Python

from openai import OpenAI
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://<workspace-url>/ai-gateway/mlflow/v1"
)

embeddings = client.embeddings.create(
  input="What is Databricks?",
  model="<ai-gateway-endpoint>"
)

print(embeddings.data[0].embedding)

REST API

curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<ai-gateway-endpoint>",
    "input": "What is Databricks?"
  }' \
  https://<workspace-url>/ai-gateway/mlflow/v1/embeddings

Replace <workspace-url> with your Azure Databricks workspace URL and <ai-gateway-endpoint> with the name of your Unity AI Gateway endpoint.
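When you call the REST API directly, you parse the JSON response yourself. The sketch below assumes the response follows the OpenAI-compatible embeddings shape (a `data` list whose items carry `index` and `embedding`), which mirrors the `embeddings.data[0].embedding` access in the Python example. The sample body is hypothetical and shortened, and `extract_embeddings` is an illustrative helper, not part of any SDK.

```python
import json

# Hypothetical, shortened response body; real embedding vectors are much longer.
sample_body = """
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.12, -0.03, 0.27]}
  ],
  "model": "<ai-gateway-endpoint>"
}
"""


def extract_embeddings(body):
    """Return the embedding vectors from a response body, ordered by index."""
    parsed = json.loads(body)
    items = sorted(parsed["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


vectors = extract_embeddings(sample_body)
print(len(vectors), len(vectors[0]))
```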

Supervisor API

The Supervisor API (/mlflow/v1/responses) is a provider-agnostic, OpenResponses-compatible API for building agents. It is in Beta. Account admins can enable access from the Previews page. See Manage Azure Databricks previews. Choose the best model for your agent use case across providers without changing your code.

Python

from openai import OpenAI
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://<workspace-url>/ai-gateway/mlflow/v1"
)

response = client.responses.create(
  model="<ai-gateway-endpoint>",
  input=[{"role": "user", "content": "What is Databricks?"}]
)

print(response.output_text)

REST API

curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<ai-gateway-endpoint>",
    "input": [
      {"role": "user", "content": "What is Databricks?"}
    ]
  }' \
  https://<workspace-url>/ai-gateway/mlflow/v1/responses

Replace <workspace-url> with your Azure Databricks workspace URL and <ai-gateway-endpoint> with the name of your Unity AI Gateway endpoint.

Query endpoints with native APIs

Native APIs provide provider-specific interfaces for querying models on Azure Databricks. Use native APIs to access the latest provider-specific features.

OpenAI Responses API

Python

from openai import OpenAI
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://<workspace-url>/ai-gateway/openai/v1"
)

response = client.responses.create(
  model="<ai-gateway-endpoint>",
  max_output_tokens=256,
  input=[
    {
      "role": "user",
      "content": [{"type": "input_text", "text": "Hello!"}]
    },
    {
      "role": "assistant",
      "content": [{"type": "output_text", "text": "Hello! How can I assist you today?"}]
    },
    {
      "role": "user",
      "content": [{"type": "input_text", "text": "What is Databricks?"}]
    }
  ]
)

print(response.output)

REST API

curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<ai-gateway-endpoint>",
    "max_output_tokens": 256,
    "input": [
      {
        "role": "user",
        "content": [{"type": "input_text", "text": "Hello!"}]
      },
      {
        "role": "assistant",
        "content": [{"type": "output_text", "text": "Hello! How can I assist you today?"}]
      },
      {
        "role": "user",
        "content": [{"type": "input_text", "text": "What is Databricks?"}]
      }
    ]
  }' \
  https://<workspace-url>/ai-gateway/openai/v1/responses

Replace <workspace-url> with your Azure Databricks workspace URL and <ai-gateway-endpoint> with the name of your Unity AI Gateway endpoint.

Anthropic Messages API

Python

import anthropic
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = anthropic.Anthropic(
  api_key="unused",
  base_url="https://<workspace-url>/ai-gateway/anthropic",
  default_headers={
    "Authorization": f"Bearer {DATABRICKS_TOKEN}",
  },
)

message = client.messages.create(
  model="<ai-gateway-endpoint>",
  max_tokens=256,
  messages=[
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "What is Databricks?"},
  ],
)

print(message.content[0].text)

REST API

curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<ai-gateway-endpoint>",
    "max_tokens": 256,
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hello! How can I assist you today?"},
      {"role": "user", "content": "What is Databricks?"}
    ]
  }' \
  https://<workspace-url>/ai-gateway/anthropic/v1/messages

Replace <workspace-url> with your Azure Databricks workspace URL and <ai-gateway-endpoint> with the name of your Unity AI Gateway endpoint.

Google Gemini API

Python

from google import genai
from google.genai import types
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = genai.Client(
  api_key="databricks",
  http_options=types.HttpOptions(
    base_url="https://<workspace-url>/ai-gateway/gemini",
    headers={
      "Authorization": f"Bearer {DATABRICKS_TOKEN}",
    },
  ),
)

response = client.models.generate_content(
  model="<ai-gateway-endpoint>",
  contents=[
    types.Content(
      role="user",
      parts=[types.Part(text="Hello!")],
    ),
    types.Content(
      role="model",
      parts=[types.Part(text="Hello! How can I assist you today?")],
    ),
    types.Content(
      role="user",
      parts=[types.Part(text="What is Databricks?")],
    ),
  ],
  config=types.GenerateContentConfig(
    max_output_tokens=256,
  ),
)

print(response.text)

REST API

curl \
  -u token:$DATABRICKS_TOKEN \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{"text": "Hello!"}]
      },
      {
        "role": "model",
        "parts": [{"text": "Hello! How can I assist you today?"}]
      },
      {
        "role": "user",
        "parts": [{"text": "What is Databricks?"}]
      }
    ],
    "generationConfig": {
      "maxOutputTokens": 256
    }
  }' \
  https://<workspace-url>/ai-gateway/gemini/v1beta/models/<ai-gateway-endpoint>:generateContent

Replace <workspace-url> with your Azure Databricks workspace URL and <ai-gateway-endpoint> with the name of your Unity AI Gateway endpoint.
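The REST paths used in the examples on this page can be collected in one place. The sketch below only assembles URLs from those paths. The helper `rest_url`, the dictionary keys, the workspace host, and the endpoint name are hypothetical names chosen for illustration. Note that the Gemini path embeds the endpoint name in the URL, while the other APIs pass it in the request body as `model`.

```python
# REST paths taken from the curl examples on this page.
API_PATHS = {
    "mlflow-chat": "/ai-gateway/mlflow/v1/chat/completions",
    "mlflow-embeddings": "/ai-gateway/mlflow/v1/embeddings",
    "supervisor": "/ai-gateway/mlflow/v1/responses",
    "openai-responses": "/ai-gateway/openai/v1/responses",
    "anthropic-messages": "/ai-gateway/anthropic/v1/messages",
    "gemini": "/ai-gateway/gemini/v1beta/models/{endpoint}:generateContent",
}


def rest_url(workspace_url, api, endpoint=""):
    """Return the full REST URL for one of the APIs shown on this page."""
    if api not in API_PATHS:
        raise ValueError(f"Unknown API: {api}")
    if api == "gemini" and not endpoint:
        # Only the Gemini path carries the endpoint name in the URL.
        raise ValueError("The Gemini URL requires an endpoint name")
    path = API_PATHS[api].format(endpoint=endpoint)
    return f"https://{workspace_url}{path}"


# Hypothetical workspace host and endpoint name, for illustration only.
print(rest_url("example.azuredatabricks.net", "anthropic-messages"))
print(rest_url("example.azuredatabricks.net", "gemini", "my-endpoint"))
```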

Next steps

Supervisor API (Beta): Run multi-turn agent workflows with hosted tools through /mlflow/v1/responses
Step 4. Govern LLM usage from your agents on Databricks Apps with Unity AI Gateway: Route LLM calls from Databricks Apps agents through Unity AI Gateway
Monitor usage for Unity AI Gateway endpoints
Monitor models using inference tables
Configure rate limits for Unity AI Gateway endpoints

Last updated on 2026-05-03