
Quickstart: Get started using GPT-4 Turbo with Vision on your images and videos in Azure AI Studio

09/03/2024

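Important

Items marked "(preview)" in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

Follow this article to get started deploying and testing the GPT-4 Turbo with Vision model in Azure AI Studio.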

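GPT-4 Turbo with Vision and Azure AI Vision offer advanced functionality, including:

Optical character recognition (OCR): Extracts text from images and combines it with the user's prompt and image to expand the context.
Object grounding: Complements the GPT-4 Turbo with Vision text response with object grounding and outlines salient objects in the input images.
Video prompts: GPT-4 Turbo with Vision can answer questions by retrieving the video frames most relevant to the user's prompt.

Extra usage fees might apply when you use GPT-4 Turbo with Vision and Azure AI Vision functionality.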

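Prerequisites

An Azure subscription - create one for free.
Once you have your Azure subscription, create an Azure OpenAI resource.
An AI Studio hub with your Azure OpenAI resource added as a connection.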

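Prepare your media

You need an image to complete the image quickstarts. You can use the sample image provided with this quickstart or any other image you have available.

For video prompts, you need a video that's shorter than three minutes.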
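Before uploading, it can help to check a video against the playground constraints stated in this article (MP4 or MOV file type, shorter than three minutes). The following is a minimal sketch, not part of Azure AI Studio. The function name `video_fits_playground` is hypothetical, and the duration is assumed to be supplied by the caller, because the Python standard library doesn't read video metadata.

```python
from pathlib import Path

# Limits taken from this article's "Limitations" section (playground only).
SUPPORTED_VIDEO_TYPES = {".mp4", ".mov"}
MAX_PLAYGROUND_SECONDS = 3 * 60


def video_fits_playground(path: str, duration_seconds: float) -> bool:
    """Return True if the file type and length fit the playground limits.

    duration_seconds must come from the caller (for example, from a media
    tool of your choice); this helper doesn't inspect the file itself.
    """
    suffix = Path(path).suffix.lower()
    has_supported_type = suffix in SUPPORTED_VIDEO_TYPES
    is_short_enough = 0 < duration_seconds < MAX_PLAYGROUND_SECONDS
    return has_supported_type and is_short_enough


if __name__ == "__main__":
    print(video_fits_playground("demo.mp4", 95.0))
```

The length check applies only to the playground; the article notes that the API has no such limit.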

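Deploy a GPT-4 Turbo with Vision model

Sign in to Azure AI Studio and select the hub you'd like to work in.
On the left navigation menu, select "AI Services". Select the "Try out GPT-4 Turbo" panel.
On the gpt-4 page, select "Deploy". In the window that appears, select your Azure OpenAI resource. Select vision-preview as the model version.
Select "Deploy".
Next, go to your new model's page and select "Open in playground". In the chat playground, the GPT-4 deployment you created should be selected in the "Deployment" dropdown.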

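In this chat session, you instruct the assistant to help you understand images that you input.

On the "System message" tab, in the "System message" text box, provide this prompt to guide the assistant: "You're an AI assistant that helps people find information." You can tailor the prompt to your image or scenario.
Select "Apply changes" to save your changes.
In the chat session pane, select the attachment button and then "Upload image". Choose your image.
Add the following question in the chat field: "Describe this image", and then select the right arrow icon to send it.
The right arrow icon is replaced by a stop button. If you select it, the assistant stops processing your request. For this quickstart, let the assistant finish its reply.
The assistant replies with a description of the image.
Ask a follow-up question related to the analysis of your image. You could enter "What should I highlight about this image to my insurance company?".

You should receive a relevant response similar to what's shown here: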

When reporting the incident to your insurance company, you should highlight the following key points from the image:  

1. **Location of Damage**: Clearly state that the front end of the car, particularly the driver's side, is damaged. Mention the crumpled hood, broken front bumper, and the damaged left headlight.  

2. **Point of Impact**: Indicate that the car has collided with a guardrail, which may suggest that no other vehicles were involved in the accident.  

3. **Condition of the Car**: Note that the damage seems to be concentrated on the front end, and there is no visible damage to the windshield or rear of the car from this perspective.  

4. **License Plate Visibility**: Mention that the license plate is intact and can be used for identification purposes.  

5. **Environment**: Report that the accident occurred near a roadside with a guardrail, possibly in a rural or semi-rural area, which might help in establishing the accident location and context.  

6. **Other Observations**: If there were any other circumstances or details not visible in the image that may have contributed to the accident, such as weather conditions, road conditions, or any other relevant information, be sure to include those as well.  

Remember to be factual and descriptive, avoiding speculation about the cause of the accident, as the insurance company will conduct its own investigation.
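For readers who later want to reproduce this exchange in code, the sketch below builds an equivalent message list with the image embedded as a base64 data URL. It's an illustration under assumptions: the text and `image_url` content parts follow the chat completions message convention for vision-capable models, the helper name `build_image_messages` is hypothetical, and nothing here is sent to a service. Check the API reference for your deployment before relying on this shape.

```python
import base64

DEFAULT_SYSTEM_PROMPT = "You're an AI assistant that helps people find information."


def build_image_messages(image_bytes: bytes, mime_type: str, question: str,
                         system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> list:
    """Build a system message plus one user turn that carries text and an image."""
    # Encode the raw image bytes so they can travel inside a JSON payload.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        },
    ]


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file read with open(path, "rb").
    messages = build_image_messages(b"abc", "image/jpeg", "Describe this image")
    print(messages[1]["content"][0]["text"])
```

A follow-up question, such as the insurance question above, would be appended as another user message after the assistant's reply.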

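In this chat session, you instruct the assistant to help you understand images that you input. Try out the capabilities of the augmented vision model.

In the "Enhancements" pane to the left of the chat window, turn on the "Vision" option. In the window that appears, select your Azure Computer Vision resource.
On the "System message" tab, in the "System message" text box, provide this prompt to guide the assistant: "You're an AI assistant that helps people find information." You can tailor the prompt to your image or scenario. Select "Apply changes" to save your changes.
In the chat session pane, select the attachment button and then "Upload image". Choose your image.
Add the following question in the chat field: "Describe this image", and then select the right arrow icon to send it.
The right arrow icon is replaced by a stop button. If you select it, the assistant stops processing your request. For this quickstart, let the assistant finish its reply.
The assistant replies with a description of the image. It uses the Azure AI Vision service to extract more detail from the uploaded image.
Ask a follow-up question related to the analysis of your image. Enter "What should I highlight about this image to my insurance company?", and then select the right arrow icon to send it.

You should receive a relevant response similar to what's shown here: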

When reporting the incident to your insurance company, you should highlight the following key points from the image:  

1. **Location of Damage**: Clearly state that the front end of the car, particularly the driver's side, is damaged. Mention the crumpled hood, broken front bumper, and the damaged left headlight.  

2. **Point of Impact**: Indicate that the car has collided with a guardrail, which may suggest that no other vehicles were involved in the accident.  

3. **Condition of the Car**: Note that the damage seems to be concentrated on the front end, and there is no visible damage to the windshield or rear of the car from this perspective.  

4. **License Plate Visibility**: Mention that the license plate is intact and can be used for identification purposes.  

5. **Environment**: Report that the accident occurred near a roadside with a guardrail, possibly in a rural or semi-rural area, which might help in establishing the accident location and context.  

6. **Other Observations**: If there were any other circumstances or details not visible in the image that may have contributed to the accident, such as weather conditions, road conditions, or any other relevant information, be sure to include those as well.  

Remember to be factual and descriptive, avoiding speculation about the cause of the accident, as the insurance company will conduct its own investigation.

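In this chat session, you instruct the assistant to help you understand a video that you input. The assistant extracts several frames from the video and uses them to answer your questions.

In the "Enhancements" pane to the left of the chat window, turn on the "Vision" option. In the window that appears, select your Azure Computer Vision resource.
On the "System message" tab, in the "System message" text box, provide this prompt to guide the assistant: "You're an AI assistant that helps people find information." You can tailor the prompt to your video or scenario.
Select "Apply changes" to save your changes.
In the chat session pane, select the attachment button and then select "Upload video". Choose your video.
Enter a text prompt, such as "Provide details about this video", and then select the right arrow icon to send it.
The right arrow icon is replaced by a stop button. If you select it, the assistant stops processing your request. For this quickstart, let the assistant finish its reply.
The assistant should reply with a description of the video.
Feel free to ask follow-up questions related to the analysis of your video.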

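Limitations

The following are known limitations of the video prompt enhancement.

Low resolution: Frames are analyzed using the "low resolution" setting of GPT-4 Turbo with Vision, which might reduce the accuracy of small object and text recognition in the video.
Video file limits: MP4 and MOV file types are supported. In the Azure AI Studio playground, videos must be shorter than three minutes. This limit doesn't apply when you use the API.
Prompt limits: A video prompt contains only one video and no images. In the playground, you can clear the session to try another video or image.
Limited frame selection: Currently the system selects 20 frames from the entire video, which might not capture all the critical moments or details. Depending on the prompt, frame selection can be spread evenly through the video or focused by a specific video retrieval query.
Language support: Currently the system primarily supports English for grounding with transcripts. Transcripts don't provide accurate information about song lyrics.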
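To get a feel for how coarse a 20-frame sample is, the sketch below picks evenly spaced frame indices from a video of a given length. It only illustrates the "spread evenly" case described above. The service's actual selection logic isn't documented in this article, and the function name `evenly_spaced_frames` is hypothetical.

```python
def evenly_spaced_frames(total_frames: int, count: int = 20) -> list:
    """Return up to `count` frame indices spread evenly across the video.

    Each index is taken from the middle of one of `count` equal segments,
    so the first and last moments of the video are not over-represented.
    """
    if total_frames <= 0 or count <= 0:
        return []
    count = min(count, total_frames)  # can't pick more frames than exist
    step = total_frames / count
    return [int(step * i + step / 2) for i in range(count)]


if __name__ == "__main__":
    # A 3-minute clip at 30 frames per second has 5,400 frames.
    indices = evenly_spaced_frames(3 * 60 * 30)
    print(len(indices), indices[:3])
```

With 20 samples over a clip just under three minutes, consecutive samples are roughly nine seconds apart, so a short event can fall entirely between two sampled frames.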

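View and export code

At any point in the chat session, you can turn on the "Show raw JSON" switch at the top of the chat window to see the conversation formatted as JSON. Here's what it looks like at the beginning of the quickstart chat session: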

[
	{
		"role": "system",
		"content": [
			"You are an AI assistant that helps people find information."
		]
	}
]
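If you copy the raw JSON out of the playground, you can inspect it with any JSON parser. The sketch below is a minimal example that pulls the system prompts out of a conversation shaped like the one shown above. The helper name `system_prompts` is hypothetical, and the code assumes the copied text is strict JSON (no trailing commas).

```python
import json

# A conversation in the same shape as the playground's raw JSON view.
RAW_CONVERSATION = """
[
    {
        "role": "system",
        "content": [
            "You are an AI assistant that helps people find information."
        ]
    }
]
"""


def system_prompts(raw_json: str) -> list:
    """Return every system prompt string found in the conversation."""
    prompts = []
    for message in json.loads(raw_json):
        if message.get("role") != "system":
            continue
        content = message.get("content", [])
        # Content may be a single string or a list of strings.
        if isinstance(content, str):
            content = [content]
        prompts.extend(content)
    return prompts


if __name__ == "__main__":
    for prompt in system_prompts(RAW_CONVERSATION):
        print(prompt)
```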

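Clean up resources

To avoid incurring unnecessary Azure costs, delete the resources you created in this quickstart if you no longer need them. You can manage resources in the Azure portal.

Next steps

Create a project
Learn more about Azure AI Vision.
Learn more about Azure OpenAI models.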
