Azure AI Video Indexer搭配LLM提示

2025-06-10

Azure AI Video Indexer 與大型語言模型 (LLMs) 進行整合。大型語言模型（LLM）是自然語言人工智慧模型，您可以用來詢問有關影片內容等方面的問題。提取 Azure AI Video Indexer 的見解，並將其轉換為適合大型語言模型（LLM）使用的提示格式。無需重新索引影片即可製作出準備好的影片格式。

您可以在雲端中使用 LLM 提示與 Azure AI 視訊索引器，或在您的資料中心使用由 Arc 啟用的 Azure AI 視訊索引器。

使用案例

生成影片摘要：您可以請求大型語言模型（LLM）生成整個影片或影片片段的摘要。您可以結合這些區段來建立多種類型的摘要，例如資訊摘要、預告或其他摘要，視您的需求而定。

可搜尋性：將影片內容轉換成基於文字的、可快速查詢的格式，您可以在影片內容中進行詳細的自然語言搜尋。它可以根據特定查詢，大幅改善大型影片庫內的可探索性。

內容創作：您可以從影片庫中查詢與特定情緒或事件相關的視頻片段。例如，您可以從影片系列中擷取有趣或悲傷的時刻，並用它們來製作精華片段或亮點。同樣地，您可以檢索與特定有興趣事件相關的時刻，例如「過去十年的地震」。

教育目的：從講課視頻中創建摘要，以便學生更容易回顧和理解材料。學生也可以提出與講座內容相關的具體問題。你可以參考影片中有討論到文章的確切部分，讓學習過程更加有效率。

互動體驗：您可以創建互動體驗，例如基於視頻的聊天機器人或虛擬助理，這些都能根據視頻內容回應用戶查詢。

運作方式

為了讓輸出準備好提示，視頻被劃分成符合視頻精髓和提示大小的連貫部分。根據 Azure AI 影片索引器的場景分段和其他洞察，將這些部分進行劃分。提示內容的結果會分段獨立整合並生成。例如：

見解

下表包含用於生成提示的洞察。

VI 深入解析	標籤與格式
影片標題	[影片標題] <影片標題>
物件偵測	[檢測到的對象] <對象 1>, <對象 2>, ...
標籤	[可視標籤] <標籤 1>, <標籤 2>, ...
光學字符識別 (OCR)	[OCR] <ocr cluster1><ocr cluster2> ...
文字記錄和講者	[Transcript] <說話者名稱>: <逐字稿內容>\n<說話者名稱>: <逐字稿內容>\n ...
臉孔	[已知人物] <人臉 1>， <人臉 2>， ...
音訊效果（AED）	[音效] < 效果 1>、<效果 2>、...
視頻中片段的位置	[標籤] [開場, 中間, 結束, 片尾字幕]

為影片創建提示內容

使用提示內容 API 對您的索引影片進行處理，以獲得每個片段的提示就緒格式。

注意事項

提示內容洞察取決於用來索引影片的特定預覽設置。

若要產生提示內容 API，請使用 POST 建立提示內容 API 要求。
若要檢視提示內容，請使用取得 PromptContent API 要求。

範例請求

使用你的 AVI 帳戶 ID 和影片 ID。

POST https://api.videoindexer.ai/trial/Accounts/{accountId}/Videos/{videoId}/PromptContent

範例回應

index
{
  "algoVersion": "2.0.0",
  "schemaVersion": "0.0.1",
  "partition": null,
  "name": "10_best_dressed_grammy",
  "sections": [
    {
      "id": 0,
      "start": "0:00:00",
      "end": "0:00:40.915875",
      "content": "[Video title] 10_best_dressed_grammy\n[Detected objects] necktie\n[Visual labels] human face, clothing, person, woman, suit, wedding dress, dress, indoor, wall, carpet, rug, fashion, lady, long hair, fashion accessory, fashion design\n[OCR] TROPHy, LIFE, SPECIAL, EDITION, news FEED, BY

 CLEVVER, CLEVVER, @NazPerez, BEST DRESSED CELEBS AT 2018 GRAMMYS\n[Transcript] Check out the 10 best dressed celebs from the 2018 Grammy Awards and don't forget to subscribe to our channel to get all the latest celebrity updates.\nFrom white roses to white hot looks, this year's Grammy Awards was a feast of fashion thanks to so many celebs bringing their A game to the show.\nSo let's kick off this list of the best dress from the red carpet, starting with Lady Gaga.\nGaga looked like a gothic Princess in her dramatic all black ball gown.\nThe Armani Preve dress featured A Lacy bodysuit and billowing black skirt with a huge train.\nAga's black heeled boots were also some of the highest we've ever seen, like ever, but we wouldn't expect anything less from Mama Monster.\nAnother look we love from the carpet was Anna Kendrick's sexy suit by Belmont."
    },
    {
      "id": 1,
      "start": "0:00:40.915875",
      "end": "0:01:17.202125",
      "content": "[Video title] 10_best_dressed_grammy\n[Detected objects] remote\n[Visual labels] human face, clothing, person, dress, carpet, rug, fashion, lady, furniture, female person, fashion model, model, haute couture, smile\n[OCR] TROPHy, LIFE, news FEED, BEST DRESSED CELEBS AT 2018 GRAMMYS, D CELEBS AT 2018 GRAMMYS, BEST DRESSED\n[Transcript] Anna gave the structured look a sexy feminine touch by wearing a Lacy strapless top underneath and some pale pink stilettos.\nHer suit may have said business, but her relaxed WAVY hairstyle said I came to get down.\nNext on our list is the literally red hot Camila Cabello.\nCamila was all glitzing glam in her strapless Vivian Westwood gown.\nThat humped her curves perfectly.\nCamila opted to wear her hair up and accessorized with some serious bling, but it's that plunging neckline that has this unable to look away.\nAnother look we loved came courtesy of Miley Cyrus, who absolutely slayed in this black velvet bodysuit.\nMiley looked beyond chic, from her classic Hollywood hairstyle to her glitter heels."
    },
}

查看工作狀態

完成這個提示工作需要幾分鐘。如果您想要檢查作業狀態，您可以使用取得作業狀態 API 要求。

使用關鍵影格以視覺方式提示 LLM

提示內容請求支持可以在提示中使用視覺輸入的語言模型。選擇 GPT-4V 模型時，您可以將關鍵影格作為提供給模型的提示的一部分。提示內容回應中傳回的影格代表影片的關鍵影格。對於影片中具有有限或沒有文字記錄的影片，或想要為語言模型提供更多內容以改善結果時，建議使用此功能。

建立並傳送提示內容要求

如先前所述，提示的文字內容位於 JSON 回應中。 JSON 回應中「框架」部分的每個字串都是主要畫面格的標識碼。使用取得視訊縮圖的 ThumbnailId 是來自提示內容的 FrameId。一旦你擁有文本內容和關鍵幀工件，你就可以將它們結合起來，作為你選擇的人工智慧模型的提示。

局限性

提示功能已針對包含盡可能多見解的影片進行優化。

Azure AI 影片索引器文件