Quickstart: Get started using GPT-4 Turbo with Vision on your images and videos in Azure AI Studio

09/03/2024

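Important

Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.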

Use this article to get started with Azure AI Studio to deploy and test the GPT-4 Turbo with Vision model.

GPT-4 Turbo with Vision and Azure AI Vision offer advanced functionality, including:

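Optical character recognition (OCR): Extracts text from images and combines it with the user's prompt and image to expand the context.
Object grounding: Complements the GPT-4 Turbo with Vision text response with object grounding and outlines the salient objects in the input image.
Video prompts: GPT-4 Turbo with Vision answers questions by retrieving the video frames most relevant to the user's prompt.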

Extra usage fees might apply when you use GPT-4 Turbo with Vision together with Azure AI Vision functionality.

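Prerequisites

An Azure subscription - Create one for free.
Once you have your Azure subscription, create an Azure OpenAI resource.
An AI Studio hub with your Azure OpenAI resource added as a connection.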

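Prepare your media

You need an image to complete the image quickstarts. You can use this sample image or any other image you have available.

For video prompts, you need a video that's under three minutes long.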

Deploy a GPT-4 Turbo with Vision model

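Sign in to Azure AI Studio and select the hub you want to work in.
On the left navigation menu, select [AI Services]. Select the [Try out GPT-4 Turbo] panel.
On the gpt-4 page, select [Deploy]. In the window that appears, select your Azure OpenAI resource. Select vision-preview as the model version.
Select [Deploy].
Next, go to your new model's page and select [Open in playground]. In the chat playground, the GPT-4 deployment you created should already be selected in the [Deployment] dropdown.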

In this chat session, you instruct the assistant to help you understand images that you input.

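In the [System message] text box on the [System message] tab, provide this prompt to guide the assistant: "You're an AI assistant that helps people find information." You can tailor the prompt to your image or scenario.
Select [Apply changes] to save your changes.
In the chat session pane, select the attachment button and then [Upload image]. Choose your image.
Add the following question in the chat field: "Describe this image", and then select the right-arrow icon to send it.
The right-arrow icon is replaced by a [Stop] button. If you select it, the assistant stops processing your request. For this quickstart, let the assistant finish its reply.
The assistant replies with a description of the image.
Ask a follow-up question related to the analysis of your image. You could enter: "What should I highlight about this image to my insurance company?"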

You should receive a relevant response similar to the following:

When reporting the incident to your insurance company, you should highlight the following key points from the image:  

1. **Location of Damage**: Clearly state that the front end of the car, particularly the driver's side, is damaged. Mention the crumpled hood, broken front bumper, and the damaged left headlight.  

2. **Point of Impact**: Indicate that the car has collided with a guardrail, which may suggest that no other vehicles were involved in the accident.  

3. **Condition of the Car**: Note that the damage seems to be concentrated on the front end, and there is no visible damage to the windshield or rear of the car from this perspective.  

4. **License Plate Visibility**: Mention that the license plate is intact and can be used for identification purposes.  

5. **Environment**: Report that the accident occurred near a roadside with a guardrail, possibly in a rural or semi-rural area, which might help in establishing the accident location and context.  

6. **Other Observations**: If there were any other circumstances or details not visible in the image that may have contributed to the accident, such as weather conditions, road conditions, or any other relevant information, be sure to include those as well.  

Remember to be factual and descriptive, avoiding speculation about the cause of the accident, as the insurance company will conduct its own investigation.
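
Outside the playground, the same system message, question, and image can be expressed as a chat message list. The following is a minimal sketch, assuming the OpenAI-style chat message format in which a user message carries a text part and an `image_url` part holding a base64 data URL. The helper name `build_image_chat_messages` is hypothetical, and the placeholder bytes stand in for a real image file; the sketch builds the payload only and does not call any service.

```python
import base64


def build_image_chat_messages(system_prompt, question, image_bytes, mime_type="image/jpeg"):
    """Build chat messages that pair a text question with an inline image.

    Hypothetical helper: it mirrors the playground steps (system message,
    uploaded image, question) but sends nothing over the network.
    """
    # Encode the image as a base64 data URL so it can travel inside the JSON payload.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        },
    ]


if __name__ == "__main__":
    # Placeholder bytes stand in for a real file read with open(path, "rb").
    messages = build_image_chat_messages(
        "You're an AI assistant that helps people find information.",
        "Describe this image",
        b"placeholder-image-bytes",
    )
    print(messages[1]["content"][0]["text"])
```

A follow-up question, such as the insurance question above, would be appended to the list as another user message after the assistant's reply.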

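In this chat session, you instruct the assistant to help you understand images that you input. Try out the capabilities of the augmented vision model.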

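In the [Enhancements] pane to the left of the chat window, turn on the [Vision] option. In the window that appears, select your Azure Computer Vision resource.
In the [System message] text box on the [System message] tab, provide this prompt to guide the assistant: "You're an AI assistant that helps people find information." You can tailor the prompt to your image or scenario. Select [Apply changes] to save your changes.
In the chat session pane, select the attachment button and then [Upload image]. Choose your image.
Add the following question in the chat field: "Describe this image", and then select the right-arrow icon to send it.
The right-arrow icon is replaced by a [Stop] button. If you select it, the assistant stops processing your request. For this quickstart, let the assistant finish its reply.
The assistant replies with a description of the image. This enhancement also uses the Azure AI Vision service to extract more details from the image you uploaded.
Ask a follow-up question related to the analysis of your image. Enter: "What should I highlight about this image to my insurance company?", and then select the right-arrow icon to send it.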

You should receive a relevant response similar to the following:

When reporting the incident to your insurance company, you should highlight the following key points from the image:  

1. **Location of Damage**: Clearly state that the front end of the car, particularly the driver's side, is damaged. Mention the crumpled hood, broken front bumper, and the damaged left headlight.  

2. **Point of Impact**: Indicate that the car has collided with a guardrail, which may suggest that no other vehicles were involved in the accident.  

3. **Condition of the Car**: Note that the damage seems to be concentrated on the front end, and there is no visible damage to the windshield or rear of the car from this perspective.  

4. **License Plate Visibility**: Mention that the license plate is intact and can be used for identification purposes.  

5. **Environment**: Report that the accident occurred near a roadside with a guardrail, possibly in a rural or semi-rural area, which might help in establishing the accident location and context.  

6. **Other Observations**: If there were any other circumstances or details not visible in the image that may have contributed to the accident, such as weather conditions, road conditions, or any other relevant information, be sure to include those as well.  

Remember to be factual and descriptive, avoiding speculation about the cause of the accident, as the insurance company will conduct its own investigation.

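In this chat session, you instruct the assistant to help you understand videos that you input. The assistant extracts several frames from the video and uses them to answer your questions.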

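In the [Enhancements] pane to the left of the chat window, turn on the [Vision] option. In the window that appears, select your Azure Computer Vision resource.
In the [System message] text box on the [System message] tab, provide this prompt to guide the assistant: "You're an AI assistant that helps people find information." You can tailor the prompt to your image or scenario.
Select [Apply changes] to save your changes.
In the chat session pane, select the attachment button and then [Upload video]. Choose your video.
Enter a text prompt, such as "Provide details about this video", and then select the right-arrow icon to send it.
The right-arrow icon is replaced by a [Stop] button. If you select it, the assistant stops processing your request. For this quickstart, let the assistant finish its reply.
The assistant should reply with a description of the video.
Feel free to ask any follow-up questions related to the analysis of your video.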

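Limitations

The following are known limitations of the video prompt enhancement.

Low resolution: Frames are analyzed using the "low resolution" setting of GPT-4 Turbo with Vision, which can reduce recognition accuracy for small objects and text in the video.
Video file limits: MP4 and MOV file types are supported. In the Azure AI Studio playground, videos must be less than 3 minutes long. When you use the API, there's no such limit.
Prompt limits: A video prompt contains only one video and no images. In the playground, you can clear the session to try another video or image.
Limited frame selection: The service currently selects 20 frames from the entire video, so it might not capture every important moment or detail. Depending on the prompt, the selected frames are either spread roughly evenly across the video or concentrated on specific frames by a video retrieval query.
Language support: The service currently supports primarily English for grounding with transcripts. Transcripts don't provide accurate information about song lyrics.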
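
To get a feel for these limits before uploading, the following is a minimal sketch that checks a video against the playground constraints and shows what 20 evenly spread frames would look like. The helper names are hypothetical, the duration is assumed to be known already (for example, from your media tool), and the midpoint-of-segment sampling is only an illustration of "evenly spread"; the service's actual frame selection method is not documented here.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp4", ".mov"}  # file types listed in the limitations
PLAYGROUND_MAX_SECONDS = 180             # playground videos must be under 3 minutes
FRAME_COUNT = 20                         # frames currently selected per video


def check_playground_video(filename, duration_seconds):
    """Return a list of problems that would block a playground upload."""
    problems = []
    if Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("unsupported file type")
    if duration_seconds >= PLAYGROUND_MAX_SECONDS:
        problems.append("video is 3 minutes or longer")
    return problems


def evenly_spaced_timestamps(duration_seconds, count=FRAME_COUNT):
    """Return the midpoints (in seconds) of `count` equal segments of the video.

    Illustrative assumption only: this is one simple way to spread frames
    evenly, not the service's own algorithm.
    """
    segment = duration_seconds / count
    return [segment * (i + 0.5) for i in range(count)]


if __name__ == "__main__":
    print(check_playground_video("demo.mov", 100))
    print(evenly_spaced_timestamps(100)[:3])
```

Under this assumption, a 100-second video is sampled once every 5 seconds, so an event shorter than that can fall between frames and go unnoticed.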

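View and export code

At any point in the chat session, you can turn on the [Show raw JSON] switch at the top of the chat window to see the conversation formatted as JSON. Here's what it looks like at the beginning of the quickstart chat session: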

[
	{
		"role": "system",
		"content": [
			"You are an AI assistant that helps people find information."
		]
	}
]
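
If you copy the raw JSON out of the playground, a short script can turn it into a readable transcript. The following is a minimal sketch, assuming the conversation is a JSON array of objects with `role` and `content` keys, as in the sample above; the helper name `summarize_conversation` is hypothetical. Note that Python's `json.loads` is strict and rejects trailing commas, so the copied text must be valid JSON.

```python
import json


def summarize_conversation(raw_json):
    """Return (role, text) pairs from a conversation exported as raw JSON."""
    summary = []
    for message in json.loads(raw_json):
        content = message["content"]
        # The sample stores content as a list of strings; accept a bare string too.
        parts = [content] if isinstance(content, str) else content
        # Join the string parts and skip anything that is not plain text.
        text = " ".join(part for part in parts if isinstance(part, str))
        summary.append((message["role"], text))
    return summary


if __name__ == "__main__":
    raw = (
        '[{"role": "system", "content": '
        '["You are an AI assistant that helps people find information."]}]'
    )
    for role, text in summarize_conversation(raw):
        print(f"{role}: {text}")
```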

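Clean up resources

To avoid incurring unnecessary Azure costs, delete the resources you created in this quickstart if you no longer need them. You can manage resources in the Azure portal.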

Next steps

Create a project.
Learn more about Azure AI Vision.
Learn more about Azure OpenAI models.
