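Use vision-enabled chat models

Vision-enabled chat models are large multimodal models (LMMs) developed by OpenAI that can analyze images and provide text responses to questions about them. They combine natural language processing with visual understanding. The current vision-enabled models are the o-series reasoning models, the GPT-5 series, the GPT-4.1 series, GPT-4.5, and the GPT-4o series.

Vision-enabled models can answer general questions about what is present in the images you upload.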

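Tip

To use vision-enabled models, call the Chat Completion API on a supported model that you have deployed. If you're not familiar with the Chat Completion API, see the vision-enabled chat how-to guide.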

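Call the Chat Completion API

The following command shows the most basic way to use a vision-enabled chat model with code. If this is your first time using these models programmatically, we recommend starting with the chat with images quickstart.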

REST
Python

Send a POST request to https://{RESOURCE_NAME}.openai.azure.com/openai/v1/chat/completions, where

RESOURCE_NAME is the name of your Azure OpenAI resource

Required headers:

Content-Type: application/json
api-key: {API_KEY}

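Body: The following is a sample request body. The format is the same as the chat completions API for GPT-4o, except that the message content can be an array containing text and images (either a valid, publicly accessible HTTP or HTTPS URL to an image, or a base-64-encoded image).

Important

Remember to set a "max_tokens" or max_completion_tokens value; otherwise the returned output will be cut off.

Important

When uploading images, there is a limit of 10 images per chat request.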

{
    "model": "MODEL-DEPLOYMENT-NAME",
    "messages": [ 
        {
            "role": "system", 
            "content": "You are a helpful assistant." 
        },
        {
            "role": "user", 
            "content": [
                {
                    "type": "text",
                    "text": "Describe this picture:"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "<image URL>"
                    }
                }
            ]
        }
    ],
    "max_tokens": 100, 
    "stream": false 
}
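
As a rough illustration of the REST call described above, the following Python sketch assembles the same POST request using only the standard library. It builds the request without sending it. The function name build_chat_request and the placeholder values (my-resource, the example.com image URL) are assumptions made for this example and are not part of the API.

```python
import json
import urllib.request


def build_chat_request(resource_name, api_key, deployment, image_url, prompt):
    """Build (but do not send) a chat completions POST request with one image."""
    url = f"https://{resource_name}.openai.azure.com/openai/v1/chat/completions"
    body = {
        "model": deployment,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                # Message content is an array mixing text and image parts.
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
        ],
        "max_tokens": 100,
        "stream": False,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Placeholder values (assumptions); replace them with your own.
request = build_chat_request(
    "my-resource",
    "<API_KEY>",
    "MODEL-DEPLOYMENT-NAME",
    "https://example.com/picture.jpg",
    "Describe this picture:",
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen and parse the JSON response body.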

Define your Azure OpenAI base_url and api-key.

Create a client object using those values.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
)

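Then call the client's create method. The following code shows a sample request body. The format is the same as the chat completions API for GPT-4o, except that the message content can be an array containing text and images (either a valid HTTP or HTTPS URL to an image, or a base-64-encoded image).

Important

Remember to set a "max_tokens" or max_completion_tokens value; otherwise the returned output will be cut off.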

response = client.chat.completions.create(
    model="MODEL-DEPLOYMENT-NAME",
    messages=[
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": [  
            { 
                "type": "text", 
                "text": "Describe this picture:" 
            },
            { 
                "type": "image_url",
                "image_url": {
                    "url": "<image URL>"
                }
            }
        ] } 
    ],
    max_tokens=2000 
)
print(response)
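
Because the message content is an array, a single request can carry several images, up to the limit of 10 images per chat request noted in this article. The sketch below shows one possible way to build such a content array and enforce that limit on the client side. The helper name build_user_content and the constant MAX_IMAGES_PER_REQUEST are hypothetical names chosen for this example.

```python
MAX_IMAGES_PER_REQUEST = 10  # limit stated in this article


def build_user_content(prompt, image_urls):
    """Return a content array with one text part followed by one part per image."""
    if len(image_urls) > MAX_IMAGES_PER_REQUEST:
        raise ValueError(
            f"Got {len(image_urls)} images; at most "
            f"{MAX_IMAGES_PER_REQUEST} are allowed per chat request."
        )
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return content


# Placeholder URLs (assumptions) used only to show the resulting shape.
content = build_user_content(
    "Compare these pictures:",
    ["https://example.com/a.jpg", "https://example.com/b.jpg"],
)
print(len(content))
```

The returned list can be passed as the "content" value of the user message in the create call.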

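Tip

Use a local image

If you want to use a local image, you can use the following Python code to convert it to base64 so it can be passed to the API. Alternative file conversion tools are available online.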

import base64
from mimetypes import guess_type

# Function to encode a local image into data URL 
def local_image_to_data_url(image_path):
    # Guess the MIME type of the image based on the file extension
    mime_type, _ = guess_type(image_path)
    if mime_type is None:
        mime_type = 'application/octet-stream'  # Default MIME type if none is found

    # Read and encode the image file
    with open(image_path, "rb") as image_file:
        base64_encoded_data = base64.b64encode(image_file.read()).decode('utf-8')

    # Construct the data URL
    return f"data:{mime_type};base64,{base64_encoded_data}"

# Example usage
image_path = '<path_to_image>'
data_url = local_image_to_data_url(image_path)
print("Data URL:", data_url)

When your base64 image data is ready, you can pass it to the API in the request body like this:

...
"type": "image_url",
"image_url": {
   "url": "data:image/jpeg;base64,<your_image_data>"
}
...
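
The two steps above (encoding the file, then placing the data URL in an image_url part) can be combined into one helper. This is a minimal sketch under the same assumptions as the encoding function shown earlier; the name image_part_from_file is hypothetical.

```python
import base64
from mimetypes import guess_type


def image_part_from_file(image_path):
    """Read a local image and return an image_url content part holding a data URL."""
    # Guess the MIME type from the file extension, with a generic fallback.
    mime_type, _ = guess_type(image_path)
    if mime_type is None:
        mime_type = "application/octet-stream"

    # Read the raw bytes and base64-encode them.
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")

    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# Example usage (the path is a placeholder):
# part = image_part_from_file("<path_to_image>")
```

The returned dictionary can be appended to the content array of the user message alongside a text part.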

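Detail parameter settings

You can optionally define a "detail" parameter in the "image_url" field. Choose one of three values, low, high, or auto, to adjust the way the model interprets and processes images.

auto setting: The default setting. The model decides between low and high based on the size of the image input.
low setting: The model does not activate "high res" mode and instead processes a lower-resolution 512x512 version, resulting in quicker responses and reduced token consumption in scenarios where fine detail isn't crucial.
high setting: The model activates "high res" mode. Here, the model first views the low-resolution image and then generates detailed 512x512 segments from the input image. Each segment uses double the token budget, allowing for a more detailed interpretation of the image.

You set the value using the format shown in this example: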

{ 
    "type": "image_url",
    "image_url": {
        "url": "<image URL>",
        "detail": "high"
    }
}
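
When the detail value comes from configuration or user input, it can help to validate it before building the request. The following sketch checks the value against the three settings listed above and produces an image part in the format just shown. The helper name image_part and the constant VALID_DETAIL_LEVELS are assumptions made for this example.

```python
VALID_DETAIL_LEVELS = ("low", "high", "auto")  # the three values listed above


def image_part(url, detail="auto"):
    """Return an image_url content part with a validated "detail" setting."""
    if detail not in VALID_DETAIL_LEVELS:
        raise ValueError(
            f"detail must be one of {VALID_DETAIL_LEVELS}, got {detail!r}"
        )
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}


part = image_part("<image URL>", detail="low")
print(part)
```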

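For details on how the image parameters affect token usage and pricing, see the "Image tokens" section of What is Azure OpenAI?

Output

The API response should look like the following.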

{
    "id": "chatcmpl-8VAVx58veW9RCm5K1ttmxU6Cm4XDX",
    "object": "chat.completion",
    "created": 1702439277,
    "model": "gpt-4o",
    "prompt_filter_results": [
        {
            "prompt_index": 0,
            "content_filter_results": {
                "hate": {
                    "filtered": false,
                    "severity": "safe"
                },
                "self_harm": {
                    "filtered": false,
                    "severity": "safe"
                },
                "sexual": {
                    "filtered": false,
                    "severity": "safe"
                },
                "violence": {
                    "filtered": false,
                    "severity": "safe"
                }
            }
        }
    ],
    "choices": [
        {
            "finish_reason":"stop",
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The picture shows an individual dressed in formal attire, which includes a black tuxedo with a black bow tie. There is an American flag on the left lapel of the individual's jacket. The background is predominantly blue with white text that reads \"THE KENNEDY PROFILE IN COURAGE AWARD\" and there are also visible elements of the flag of the United States placed behind the individual."
            },
            "content_filter_results": {
                "hate": {
                    "filtered": false,
                    "severity": "safe"
                },
                "self_harm": {
                    "filtered": false,
                    "severity": "safe"
                },
                "sexual": {
                    "filtered": false,
                    "severity": "safe"
                },
                "violence": {
                    "filtered": false,
                    "severity": "safe"
                }
            }
        }
    ],
    "usage": {
        "prompt_tokens": 1156,
        "completion_tokens": 80,
        "total_tokens": 1236
    }
}

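Every response includes a "finish_reason" field. It has the following possible values:

stop: The API returned complete model output.
length: The model output is incomplete because of the max_tokens input parameter or the model's token limit.
content_filter: Content was omitted because of a flag from the content filters.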
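
As an illustration of how a caller might act on these values, the sketch below inspects one entry of the "choices" list from the parsed JSON of a response. It is not an official client helper; the function name summarize_choice and the wording of the notes are assumptions. Because one sample response in this article carries finish_reason as an object with a "type" key, the sketch accepts that form as well as a plain string.

```python
def summarize_choice(choice):
    """Return (text, note) for one entry of a response's "choices" list."""
    reason = choice.get("finish_reason")
    # Accept both a plain string and an object with a "type" key.
    if isinstance(reason, dict):
        reason = reason.get("type")

    # The message may be missing or empty when content was filtered.
    text = (choice.get("message") or {}).get("content")

    if reason == "stop":
        note = "complete output"
    elif reason == "length":
        note = "output truncated; consider raising max_tokens"
    elif reason == "content_filter":
        note = "content omitted by content filtering"
    else:
        note = f"unrecognized finish_reason: {reason!r}"
    return text, note


# A trimmed-down sample choice in the shape shown above.
sample_choice = {
    "finish_reason": "length",
    "index": 0,
    "message": {"role": "assistant", "content": "The picture shows"},
}
text, note = summarize_choice(sample_choice)
print(note)
```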

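Output

The chat responses you receive from the model should now include enhanced information about the image, such as object labels and bounding boxes, and OCR results. The API response should look like the following.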

{
    "id": "chatcmpl-8UyuhLfzwTj34zpevT3tWlVIgCpPg",
    "object": "chat.completion",
    "created": 1702394683,
    "model": "gpt-4o",
    "choices":
    [
        {
            "finish_reason": {
                "type": "stop",
                "stop": "<|fim_suffix|>"
            },
            "index": 0,
            "message":
            {
                "role": "assistant",
                "content": "The image shows a close-up of an individual with dark hair and what appears to be a short haircut. The person has visible ears and a bit of their neckline. The background is a neutral light color, providing a contrast to the dark hair."
            }
        }
    ],
    "usage":
    {
        "prompt_tokens": 816,
        "completion_tokens": 49,
        "total_tokens": 865
    }
}


Last updated on 2025-11-08
