Structured outputs on Azure Databricks

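This article describes structured outputs on Azure Databricks and how to use them as part of your generative AI application workflows. Structured outputs can be used with OpenAI models that support structured outputs.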

What are structured outputs?

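Structured outputs let you generate structured data, in the form of JSON objects, from your input data. You can choose to generate text, unstructured JSON objects, or JSON objects that adhere to a specific JSON schema. Structured outputs are supported for chat models served using Foundation Model APIs pay-per-token and provisioned throughput endpoints.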

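Databricks recommends using structured outputs for the following scenarios:

Extracting data from a large number of documents. For example, identifying and classifying product review feedback as negative, positive, or neutral.
Batch inference tasks that require outputs in a specified format.
Data processing, such as converting unstructured data into structured data.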
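
As a rough sketch of the first scenario, the review classification task could be described by a response_format value like the one below and sent with a chat request. The schema name review_sentiment, the field names, and the helper is_valid_label are illustrative assumptions, not part of the Databricks API. The label set is checked on the client side rather than through schema keywords.

```python
import json

# Hypothetical label set for the review classification scenario.
SENTIMENT_LABELS = ("negative", "positive", "neutral")

# Hypothetical schema: one label plus a short justification.
review_response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "review_sentiment",
        "schema": {
            "type": "object",
            "properties": {
                "sentiment": {"type": "string"},
                "reason": {"type": "string"},
            },
        },
        "strict": True,
    },
}


def is_valid_label(raw_json: str) -> bool:
    """Return True if the model's JSON text carries one of the expected labels."""
    try:
        parsed = json.loads(raw_json)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and parsed.get("sentiment") in SENTIMENT_LABELS
```

The client-side check keeps the schema simple, which matches the advice on this page to prefer simpler JSON schemas.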

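Use structured outputs

Specify your structured outputs using the response_format option in your chat request. See the Foundation model REST API reference.

The following is an example of extracting data from a research paper into a specific JSON schema.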

import os
import json
from openai import OpenAI

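# Read the token and base URL from environment variables.
# 'YOUR_DATABRICKS_TOKEN' and 'YOUR_DATABRICKS_BASE_URL' are placeholder variable names.
DATABRICKS_TOKEN = os.environ.get('YOUR_DATABRICKS_TOKEN')
DATABRICKS_BASE_URL = os.environ.get('YOUR_DATABRICKS_BASE_URL')

# The OpenAI client sends requests to the endpoint given by base_url.
client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url=DATABRICKS_BASE_URL
)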

response_format = {
      "type": "json_schema",
      "json_schema": {
        "name": "research_paper_extraction",
        "schema": {
          "type": "object",
          "properties": {
            "title": { "type": "string" },
            "authors": {
              "type": "array",
              "items": { "type": "string" }
            },
            "abstract": { "type": "string" },
            "keywords": {
              "type": "array",
              "items": { "type": "string" }
            }
          },
        },
        "strict": True
      }
    }

messages = [{
        "role": "system",
        "content": "You are an expert at structured data extraction. You will be given unstructured text from a research paper and should convert it into the given structure."
      },
      {
        "role": "user",
        "content": "..."
      }]

response = client.chat.completions.create(
    model="databricks-gpt-oss-20b",
    messages=messages,
    response_format=response_format
)

print(json.dumps(response.choices[0].message.model_dump()['content'], indent=2))

The following is an example of JSON extraction where the JSON schema is not known in advance.

import os
import json
from openai import OpenAI

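# Read the token and base URL from environment variables.
# 'YOUR_DATABRICKS_TOKEN' and 'YOUR_DATABRICKS_BASE_URL' are placeholder variable names.
DATABRICKS_TOKEN = os.environ.get('YOUR_DATABRICKS_TOKEN')
DATABRICKS_BASE_URL = os.environ.get('YOUR_DATABRICKS_BASE_URL')

# The OpenAI client sends requests to the endpoint given by base_url.
client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url=DATABRICKS_BASE_URL
)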

response_format = {
      "type": "json_object",
    }

messages = [
      {
        "role": "user",
        "content": "Extract the name, size, price, and color from this product description as a JSON object:\n<description>\nThe SmartHome Mini is a compact smart home assistant available in black or white for only $49.99. It's 5 inches wide.\n</description>"
      }]

response = client.chat.completions.create(
    model="databricks-gpt-oss-20b",
    messages=messages,
    response_format=response_format
)

print(json.dumps(response.choices[0].message.model_dump()['content'], indent=2))
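
Because json_object does not tie the output to a schema, it can help to check the returned object before using it. The following is a minimal sketch, assuming the message content arrives as a JSON string. The helper name parse_product and the expected key set are hypothetical and simply mirror the fields requested in the prompt above.

```python
import json

# Keys the prompt asked for; json_object mode does not guarantee them.
EXPECTED_KEYS = {"name", "size", "price", "color"}


def parse_product(content: str) -> dict:
    """Parse the model's JSON text and fail if any requested key is missing."""
    parsed = json.loads(content)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    missing = EXPECTED_KEYS - set(parsed)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return parsed
```

A caller might wrap parse_product in a retry loop, or fall back to a json_schema request when keys are repeatedly missing.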

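JSON schema

Foundation Model APIs broadly support the structured outputs accepted by OpenAI. However, using simpler JSON schemas in JSON schema definitions results in higher quality JSON generation. To promote higher quality generation, Foundation Model APIs only support a subset of the JSON schema specification.

The following function call definition keys are not supported:

Regular expressions using pattern.
Complex nested or schema composition and validation using anyOf, oneOf, allOf, prefixItems, or $ref.
Lists of types, except for the special case of [type, "null"], where one type in the list is a valid JSON type and the other is "null".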
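
The restrictions above can be checked before a request is sent. The following is a minimal sketch of such a pre-flight check. The function find_unsupported and its path notation are hypothetical, and it only walks properties and items, so it is not a full JSON schema validator.

```python
from typing import Any, List

# Keywords listed above as unsupported.
UNSUPPORTED_KEYWORDS = {"pattern", "anyOf", "oneOf", "allOf", "prefixItems", "$ref"}


def find_unsupported(schema: Any, path: str = "$") -> List[str]:
    """Walk a JSON schema and return the paths of unsupported constructs."""
    problems: List[str] = []
    if not isinstance(schema, dict):
        return problems
    for key, value in schema.items():
        child = f"{path}.{key}"
        if key in UNSUPPORTED_KEYWORDS:
            problems.append(child)
        elif key == "type" and isinstance(value, list):
            # Only a two-element list with exactly one "null" is allowed.
            if not (len(value) == 2 and value.count("null") == 1):
                problems.append(child)
        elif key == "properties" and isinstance(value, dict):
            # Property names are user data, so only their subschemas are checked.
            for name, subschema in value.items():
                problems.extend(find_unsupported(subschema, f"{child}.{name}"))
        elif key == "items":
            problems.extend(find_unsupported(value, child))
    return problems
```

An empty list means none of the listed constructs were found. It does not guarantee that the endpoint accepts the schema.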

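Token usage

Prompt injection and other techniques are used to enhance the quality of structured outputs. Doing so affects the number of input and output tokens consumed by the model, which in turn has billing implications.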
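
To observe this effect, the usage block of a response can be inspected. The following is a minimal sketch, assuming the response exposes the prompt_tokens and completion_tokens fields of the OpenAI Python client. The helper name usage_summary is hypothetical.

```python
def usage_summary(response) -> dict:
    """Collect input and output token counts from a chat completion response."""
    usage = response.usage
    return {
        "input_tokens": usage.prompt_tokens,
        "output_tokens": usage.completion_tokens,
        "total_tokens": usage.prompt_tokens + usage.completion_tokens,
    }
```

Comparing the summary for the same prompt with and without response_format gives a rough idea of the added token cost.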

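Limitations

The maximum number of keys specified in the JSON schema is 64.
Foundation Model APIs do not enforce length or size constraints for objects and arrays.
- This includes keywords like maxProperties, minProperties, and maxLength.
Heavily nested JSON schemas result in lower quality generation. If possible, try flattening the JSON schema for better results.
Anthropic Claude models can only accept json_schema structured outputs. json_object is not supported.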
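
The key limit and the nesting advice can be estimated locally before sending a schema. The following is a minimal sketch. How the service counts keys is not specified here, so counting every property name at every nesting level is an assumption, and the names MAX_KEYS, count_keys, and nesting_depth are hypothetical.

```python
# Limit stated above; the counting rule below is an assumption.
MAX_KEYS = 64


def count_keys(schema: dict) -> int:
    """Count property names across all nesting levels of a JSON schema."""
    total = 0
    for subschema in schema.get("properties", {}).values():
        total += 1 + count_keys(subschema)
    items = schema.get("items")
    if isinstance(items, dict):
        total += count_keys(items)
    return total


def nesting_depth(schema: dict) -> int:
    """Return how many object levels are nested (a flat object has depth 1)."""
    children = list(schema.get("properties", {}).values())
    items = schema.get("items")
    if isinstance(items, dict):
        children.append(items)
    own = 1 if schema.get("type") == "object" else 0
    return own + max((nesting_depth(child) for child in children), default=0)
```

A schema whose count_keys result exceeds MAX_KEYS, or whose depth is more than two or three levels, is a candidate for flattening.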

Last updated on 2025-10-28