Interrogare modelli audio e video

Questa pagina descrive come inviare input audio e video ai modelli di base Gemini su Azure Databricks. È possibile fornire media come un URL o come dati inline codificati in Base64 usando l'API Completamento chat o l'API Google Gemini.

Requisiti

Metodi di input

È possibile fornire input audio e video usando due metodi:

  • URL: passare un URL accessibile pubblicamente al file multimediale. Per i video, sono supportati anche gli URL di YouTube.
  • Dati inline Base64: codificare il file come stringa base64 e passarlo come URI dati (ad esempio, data:video/mp4;base64,<encoded_data>).

API di completamento per chat

L'API di completamento della chat consente di passare l'input video e audio. Usare i tipi di contenuto video_url e audio_url nell'array messages per passare input multimediali. Ogni elemento di contenuto include un url campo che accetta un URL Web o un URI dati base64.

Input video

Python

import os
import base64
from openai import OpenAI

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
DATABRICKS_BASE_URL = os.environ.get('DATABRICKS_BASE_URL')

client = OpenAI(
    api_key=DATABRICKS_TOKEN,
    base_url=DATABRICKS_BASE_URL
)

# Encode a local video file as base64
with open("video.mp4", "rb") as f:
    video_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="databricks-gemini-3-1-pro",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize what happens in these videos."},
            {
                "type": "video_url",
                "video_url": {"url": "https://example.com/sample-video.mp4"}
            },
            {
                "type": "video_url",
                "video_url": {"url": f"data:video/mp4;base64,{video_b64}"}
            },
        ]
    }],
    max_tokens=1024
)

print(response.choices[0].message.content)

REST API

curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Summarize what happens in these videos."},
      {
        "type": "video_url",
        "video_url": {"url": "https://example.com/sample-video.mp4"}
      },
      {
        "type": "video_url",
        "video_url": {"url": "data:video/mp4;base64,<base64_encoded_data>"}
      }
    ]
  }],
  "max_tokens": 1024
}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-gemini-3-1-pro/invocations

Input audio

Python

import os
import base64
from openai import OpenAI

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
DATABRICKS_BASE_URL = os.environ.get('DATABRICKS_BASE_URL')

client = OpenAI(
    api_key=DATABRICKS_TOKEN,
    base_url=DATABRICKS_BASE_URL
)

# Encode a local audio file as base64
with open("audio.mp3", "rb") as f:
    audio_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="databricks-gemini-3-1-pro",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe this audio and summarize the key points."},
            {
                "type": "audio_url",
                "audio_url": {"url": "https://example.com/sample-audio.mp3"}
            },
            {
                "type": "audio_url",
                "audio_url": {"url": f"data:audio/mp3;base64,{audio_b64}"}
            },
        ]
    }],
    max_tokens=1024
)

print(response.choices[0].message.content)

REST API

curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Transcribe this audio and summarize the key points."},
      {
        "type": "audio_url",
        "audio_url": {"url": "https://example.com/sample-audio.mp3"}
      },
      {
        "type": "audio_url",
        "audio_url": {"url": "data:audio/mp3;base64,<base64_encoded_data>"}
      }
    ]
  }],
  "max_tokens": 1024
}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-gemini-3-1-pro/invocations

Google Gemini API

Usare l'API Google Gemini per passare i contenuti multimediali come inlineData (codificati in base64) o fileData (riferimento URL) all'interno della matrice parts.

Input video

Python

from google import genai
from google.genai import types
import base64
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = genai.Client(
    api_key="databricks",
    http_options=types.HttpOptions(
        base_url="https://example.staging.cloud.databricks.com/serving-endpoints/gemini",
        headers={
            "Authorization": f"Bearer {DATABRICKS_TOKEN}",
        },
    ),
)

# Encode a local video file as base64
with open("video.mp4", "rb") as f:
    video_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.models.generate_content(
    model="databricks-gemini-3-1-pro",
    contents=[
        types.Content(
            role="user",
            parts=[
                types.Part(text="Summarize what happens in these videos."),
                types.Part(
                    file_data=types.FileData(
                        mime_type="video/mp4",
                        file_uri="https://example.com/sample-video.mp4",
                    )
                ),
                types.Part(
                    inline_data=types.Blob(
                        mime_type="video/mp4",
                        data=video_b64,
                    )
                ),
            ],
        ),
    ],
    config=types.GenerateContentConfig(
        max_output_tokens=1024,
    ),
)

print(response.text)

REST API

curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
  "contents": [{
    "role": "user",
    "parts": [
      {"text": "Summarize what happens in these videos."},
      {
        "fileData": {
          "mimeType": "video/mp4",
          "fileUri": "https://example.com/sample-video.mp4"
        }
      },
      {
        "inlineData": {
          "mimeType": "video/mp4",
          "data": "<base64_encoded_data>"
        }
      }
    ]
  }]
}' \
https://<workspace_host>.databricks.com/serving-endpoints/gemini/v1beta/models/databricks-gemini-3-1-pro:generateContent

Input audio

Python

from google import genai
from google.genai import types
import base64
import os

DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')

client = genai.Client(
    api_key="databricks",
    http_options=types.HttpOptions(
        base_url="https://example.staging.cloud.databricks.com/serving-endpoints/gemini",
        headers={
            "Authorization": f"Bearer {DATABRICKS_TOKEN}",
        },
    ),
)

# Encode a local audio file as base64
with open("audio.mp3", "rb") as f:
    audio_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.models.generate_content(
    model="databricks-gemini-3-1-pro",
    contents=[
        types.Content(
            role="user",
            parts=[
                types.Part(text="Transcribe this audio and summarize the key points."),
                types.Part(
                    file_data=types.FileData(
                        mime_type="audio/mp3",
                        file_uri="https://example.com/sample-audio.mp3",
                    )
                ),
                types.Part(
                    inline_data=types.Blob(
                        mime_type="audio/mp3",
                        data=audio_b64,
                    )
                ),
            ],
        ),
    ],
    config=types.GenerateContentConfig(
        max_output_tokens=1024,
    ),
)

print(response.text)

REST API

curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
  "contents": [{
    "role": "user",
    "parts": [
      {"text": "Transcribe this audio and summarize the key points."},
      {
        "fileData": {
          "mimeType": "audio/mp3",
          "fileUri": "https://example.com/sample-audio.mp3"
        }
      },
      {
        "inlineData": {
          "mimeType": "audio/mp3",
          "data": "<base64_encoded_data>"
        }
      }
    ]
  }]
}' \
https://<workspace_host>.databricks.com/serving-endpoints/gemini/v1beta/models/databricks-gemini-3-1-pro:generateContent

Modelli supportati

Gli input audio e video sono supportati nei seguenti modelli di base Gemini con tariffazione per token. Vedere Modelli di base ospitati in Databricks disponibili nelle API del modello di base per la disponibilità dell'area.

  • databricks-gemini-3-1-pro
  • databricks-gemini-3-pro
  • databricks-gemini-2-5-pro
  • databricks-gemini-3-1-flash-lite
  • databricks-gemini-3-flash
  • databricks-gemini-2-5-flash

Limitazioni

  • Gli input audio e video sono disponibili solo nei modelli di base gemini con pagamento per token. Gli endpoint di velocità effettiva con provisioning non sono supportati.
  • È possibile includere più input audio o video in una singola richiesta, ma file di grandi dimensioni aumentano la latenza e l'utilizzo dei token.

Risorse aggiuntive