影像標題 (4.0 版)

文章
09/27/2024

影像分析 4.0 中的影像標題可透過 [標題] 與 [密集標題] 功能取得。

標題功能會為所有影像內容產生單句描述。除了描述整張影像之外，「密集字幕」還可以產生最多 10 個不同影像區域的單句描述來提供更多詳細資料。密集輔助字幕也會傳回所描述影像區域的週框方塊座標。這兩個功能都使用最新的 Florence 型 AI 模型。

影像標題建立功能僅適用於英文。

重要

影像分析 4.0 中的影像標題僅適用於特定 Azure 資料中心區域：請參閱區域可用性。您必須使用位於下列其中一個區域的 [Azure AI 視覺] 資源，從 [標題] 與 [密集標題] 功能取得結果。

如果您需要使用這些區域之外的 [視覺] 資源來產生影像標題，請使用所有 Azure AI 視覺區域中都可用的影像分析 3.2。

使用 Vision Studio 快速且輕鬆地在瀏覽器中試用影像字幕功能。

試用 Vision Studio

性別中性字幕

根據預設，所有標題都包含性別詞彙 (「男人」、「女人」、「男孩」與「女孩」)。您可以選擇在結果中以「人」取代這些詞彙，並接收性別中性字幕。若要這樣做，您可以在要求 URL 中將選擇性 API 要求參數 gender-neutral-caption 設定為 true。

下列 JSON 回應說明在根據視覺功能描述範例影像時，「影像分析 4.0 API」傳回的內容。

一個男人指向螢幕的相片

"captions": [
    {
        "text": "a man pointing at a screen",
        "confidence": 0.4891590476036072
    }
]

下列 JSON 回應說明「影像分析 4.0 API」為範例影像產生密集字幕時所傳回的內容。

農田中拖拉機的相片

{
  "denseCaptionsResult": {
    "values": [
      {
        "text": "a man driving a tractor in a farm",
        "confidence": 0.535620927810669,
        "boundingBox": {
          "x": 0,
          "y": 0,
          "w": 850,
          "h": 567
        }
      },
      {
        "text": "a man driving a tractor in a field",
        "confidence": 0.5428450107574463,
        "boundingBox": {
          "x": 132,
          "y": 266,
          "w": 209,
          "h": 219
        }
      },
      {
        "text": "a blurry image of a tree",
        "confidence": 0.5139822363853455,
        "boundingBox": {
          "x": 147,
          "y": 126,
          "w": 76,
          "h": 131
        }
      },
      {
        "text": "a man riding a tractor",
        "confidence": 0.4799223840236664,
        "boundingBox": {
          "x": 206,
          "y": 264,
          "w": 64,
          "h": 97
        }
      },
      {
        "text": "a blue sky above a hill",
        "confidence": 0.35495415329933167,
        "boundingBox": {
          "x": 0,
          "y": 0,
          "w": 837,
          "h": 166
        }
      },
      {
        "text": "a tractor in a field",
        "confidence": 0.47338250279426575,
        "boundingBox": {
          "x": 0,
          "y": 243,
          "w": 838,
          "h": 311
        }
      }
    ]
  },
  "modelVersion": "2024-02-01",
  "metadata": {
    "width": 850,
    "height": 567
  }
}

使用 API

影像字幕
密集字幕

影像字幕建立功能是分析影像 API 的一部分。在 features 查詢參數中包含 Caption。然後，當您取得完整的 JSON 回應時，請剖析 "captionResult" 區段內容的字串。

分享方式：

影像標題 (4.0 版)

性別中性字幕

字幕與密集字幕範例

使用 API

下一步

意見反映

更多資源