
Image captions (version 4.0)

01/29/2024

Image captions in Image Analysis 4.0 are available through the "Caption" and "Dense Captions" features.

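"Caption" generates a one-sentence description of the contents of the whole image. "Dense Captions" provides more detail: in addition to describing the whole image, it generates one-sentence descriptions of up to 10 regions of the image. Dense Captions also returns the bounding box coordinates of each described region. Both features use the latest Florence-based AI models.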

Image captioning is currently available in English only.

Important

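Image captioning in Image Analysis 4.0 is available only in the following Azure data center regions: East US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, West US, and East Asia. You must use a Vision resource located in one of these regions to get results from the "Caption" and "Dense Captions" features.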

If you need to generate image captions with a Vision resource outside these regions, use Image Analysis 3.2, which is available in all Azure AI Vision regions.
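The region rule above can be expressed as a small lookup. This is a minimal sketch, not an official helper. It assumes the resource's region is available as an Azure region name. The lowercase identifiers (for example `eastus`) are our mapping of the region names listed above. The function name `pick_image_analysis_version` is hypothetical.

```python
# Hypothetical helper: choose Image Analysis 4.0 or 3.2 for captioning based on
# where the Vision resource lives. The identifiers below are assumed to be the
# standard Azure region names for the regions listed in the note above.
CAPTION_40_REGIONS = frozenset({
    "eastus",         # East US
    "francecentral",  # France Central
    "koreacentral",   # Korea Central
    "northeurope",    # North Europe
    "southeastasia",  # Southeast Asia
    "westeurope",     # West Europe
    "westus",         # West US
    "eastasia",       # East Asia
})


def pick_image_analysis_version(region: str) -> str:
    """Return "4.0" if Caption/Dense Captions are listed for the region, else "3.2"."""
    # Accept both display names ("West Europe") and identifiers ("westeurope").
    normalized = region.replace(" ", "").lower()
    return "4.0" if normalized in CAPTION_40_REGIONS else "3.2"


print(pick_image_analysis_version("West Europe"))
print(pick_image_analysis_version("centralindia"))
```

Keeping the list in one set makes it easy to update if the supported regions change.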

Use Vision Studio to try out the image captioning features quickly and easily in your browser.


Gender-neutral captions

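By default, captions contain gendered terms ("man", "woman", "boy", "girl"). You can choose to replace these terms with "person" in the results and receive gender-neutral captions. To do so, set the optional API request parameter gender-neutral-caption to true in the request URL.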
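This is a minimal sketch of adding the gender-neutral-caption parameter to a request URL with Python's standard library. Several values here are illustrative assumptions, not confirmed by this article:

- the `/computervision/imageanalysis:analyze` path
- the `api-version` value `2023-10-01`
- the lowercase `caption` feature name
- the hypothetical endpoint host `my-resource`

Check the Image Analysis 4.0 REST reference for the values that apply to your resource.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_caption_url(endpoint: str, gender_neutral: bool = True) -> str:
    """Build an Analyze Image URL that requests captions.

    Assumptions: the path and api-version are illustrative values for
    Image Analysis 4.0; `endpoint` is your Vision resource endpoint.
    """
    params = {
        "api-version": "2023-10-01",  # assumed API version
        "features": "caption",
        # Query strings carry booleans as lowercase "true"/"false".
        "gender-neutral-caption": str(gender_neutral).lower(),
    }
    base = endpoint.rstrip("/")
    return f"{base}/computervision/imageanalysis:analyze?{urlencode(params)}"


# Hypothetical endpoint used only for illustration.
url = build_caption_url("https://my-resource.cognitiveservices.azure.com/")
query = parse_qs(urlsplit(url).query)
print(url)
print(query["gender-neutral-caption"])
```

Leaving the parameter out, or passing `gender_neutral=False`, keeps the default gendered wording.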

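Caption and Dense Captions examples

The following JSON response shows what the Analysis 4.0 API returns when captioning the example image based on its visual features.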

Photo of a man pointing at a screen

"captions": [
    {
        "text": "a man pointing at a screen",
        "confidence": 0.4891590476036072
    }
]
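A caption response like the one above can be read with the standard `json` module. This is a minimal sketch. It embeds the example as a complete JSON object (the example shows only the `"captions"` member) and picks the highest-confidence entry. The function name `best_caption` is hypothetical.

```python
import json

# The caption example from above, embedded as a complete JSON object.
response_text = """
{
  "captions": [
    {"text": "a man pointing at a screen", "confidence": 0.4891590476036072}
  ]
}
"""


def best_caption(payload: str) -> tuple[str, float]:
    """Return the (text, confidence) pair with the highest confidence."""
    captions = json.loads(payload)["captions"]
    top = max(captions, key=lambda c: c["confidence"])
    return top["text"], top["confidence"]


text, confidence = best_caption(response_text)
print(f"{text} ({confidence:.2f})")  # a man pointing at a screen (0.49)
```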

The following JSON response shows what the Analysis 4.0 API returns when generating dense captions for the example image.

Photo of a tractor on a farm

{
  "denseCaptionsResult": {
    "values": [
      {
        "text": "a man driving a tractor in a farm",
        "confidence": 0.535620927810669,
        "boundingBox": {
          "x": 0,
          "y": 0,
          "w": 850,
          "h": 567
        }
      },
      {
        "text": "a man driving a tractor in a field",
        "confidence": 0.5428450107574463,
        "boundingBox": {
          "x": 132,
          "y": 266,
          "w": 209,
          "h": 219
        }
      },
      {
        "text": "a blurry image of a tree",
        "confidence": 0.5139822363853455,
        "boundingBox": {
          "x": 147,
          "y": 126,
          "w": 76,
          "h": 131
        }
      },
      {
        "text": "a man riding a tractor",
        "confidence": 0.4799223840236664,
        "boundingBox": {
          "x": 206,
          "y": 264,
          "w": 64,
          "h": 97
        }
      },
      {
        "text": "a blue sky above a hill",
        "confidence": 0.35495415329933167,
        "boundingBox": {
          "x": 0,
          "y": 0,
          "w": 837,
          "h": 166
        }
      },
      {
        "text": "a tractor in a field",
        "confidence": 0.47338250279426575,
        "boundingBox": {
          "x": 0,
          "y": 243,
          "w": 838,
          "h": 311
        }
      }
    ]
  },
  "modelVersion": "2024-02-01",
  "metadata": {
    "width": 850,
    "height": 567
  }
}
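The dense captions response can be post-processed to keep only confident regions and convert each box to corner coordinates. This is a minimal sketch under two assumptions. First, `x`/`y` in `boundingBox` is the top-left corner of the region, in pixels of the image described by `metadata`. Second, a 0.5 confidence cutoff is used, which is an arbitrary illustrative threshold. The sample copies three entries from the example above, and `regions_above` is a hypothetical helper name.

```python
import json

# Three entries copied from the dense captions example above (trimmed for brevity).
response_text = """
{
  "denseCaptionsResult": {
    "values": [
      {"text": "a man driving a tractor in a farm", "confidence": 0.535620927810669,
       "boundingBox": {"x": 0, "y": 0, "w": 850, "h": 567}},
      {"text": "a man driving a tractor in a field", "confidence": 0.5428450107574463,
       "boundingBox": {"x": 132, "y": 266, "w": 209, "h": 219}},
      {"text": "a blue sky above a hill", "confidence": 0.35495415329933167,
       "boundingBox": {"x": 0, "y": 0, "w": 837, "h": 166}}
    ]
  },
  "metadata": {"width": 850, "height": 567}
}
"""

Region = tuple[str, float, tuple[int, int, int, int]]


def regions_above(payload: str, threshold: float) -> list[Region]:
    """Return (text, confidence, (left, top, right, bottom)) for each dense caption
    at or above `threshold`, highest confidence first.

    Assumes x/y is the top-left corner and w/h are width/height in pixels.
    """
    data = json.loads(payload)
    results = []
    for item in data["denseCaptionsResult"]["values"]:
        if item["confidence"] < threshold:
            continue
        box = item["boundingBox"]
        corners = (box["x"], box["y"], box["x"] + box["w"], box["y"] + box["h"])
        results.append((item["text"], item["confidence"], corners))
    return sorted(results, key=lambda r: r[1], reverse=True)


for text, conf, corners in regions_above(response_text, threshold=0.5):
    print(f"{conf:.2f} {corners} {text}")
```

Note that the first entry in the example covers the whole image (0, 0, 850, 567), which matches the `metadata` size. That entry is the whole-image caption; the others describe individual regions.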

Use the API


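The image captioning feature is part of the Analyze Image API. Include Caption in the features query parameter. Then, when you get the full JSON response, parse the string for the contents of the "captionResult" section.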
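These steps can be sketched with Python's standard library. The sketch builds, but does not send, a POST request for the caption feature, then reads `captionResult` from a response string. Several details are assumptions based on common Azure AI services REST conventions, not on this article:

- the endpoint path and `api-version`
- the `Ocp-Apim-Subscription-Key` header
- the `{"url": ...}` request body

The assumed `captionResult` shape (`text` plus `confidence`) mirrors the caption example above. The endpoint, key, image URL, and sample response are hypothetical placeholders.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_analyze_request(endpoint: str, key: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) an Analyze Image request that asks for captions.

    Assumed values: the path, api-version, header name, and body shape.
    """
    query = urlencode({"api-version": "2023-10-01", "features": "caption"})
    url = f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
        method="POST",
    )


def parse_caption_result(response_text: str) -> tuple[str, float]:
    """Extract the caption text and confidence from the "captionResult" section."""
    caption = json.loads(response_text)["captionResult"]
    return caption["text"], caption["confidence"]


request = build_analyze_request(
    "https://my-resource.cognitiveservices.azure.com",  # hypothetical endpoint
    "<your-key>",                                        # placeholder key
    "https://example.com/image.jpg",                     # hypothetical image URL
)
# Sending would look like: urllib.request.urlopen(request).read().decode("utf-8")

# Hypothetical sample response with the assumed captionResult shape.
sample = '{"captionResult": {"text": "a man pointing at a screen", "confidence": 0.49}}'
print(parse_caption_result(sample))
```

Splitting request construction from parsing lets you test the parsing logic without network access or a live key.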
