How to use batch synthesis for text to speech avatar

Article
09/11/2024

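The batch synthesis API for text to speech avatar synthesizes text asynchronously into a talking avatar, delivered as a video file. Publishers and video content platforms can use this API to create avatar video content in batches. This approach suits use cases such as training materials, presentations, and advertisements.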

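After the system receives the text input, the avatar video is generated asynchronously. In batch synthesis mode, the generated video output is available for download. You submit text for synthesis, poll for the synthesis status, and download the video output once the status indicates success. The text input must be plain text or Speech Synthesis Markup Language (SSML) text.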

This diagram provides a high-level overview of the workflow.

To perform batch synthesis, you can use the following REST API operations.

Operation	Method	REST API call
Create batch synthesis	PUT	avatar/batchsyntheses/{SynthesisId}?api-version=2024-08-01
Get batch synthesis	GET	avatar/batchsyntheses/{SynthesisId}?api-version=2024-08-01
List batch synthesis	GET	avatar/batchsyntheses/?api-version=2024-08-01
Delete batch synthesis	DELETE	avatar/batchsyntheses/{SynthesisId}?api-version=2024-08-01
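
The four operations in the table differ only in the HTTP method, the optional SynthesisId path segment, and the query string. As a minimal sketch, the URL for each one could be assembled as follows. The host pattern is copied from the curl examples in this article; the function name batch_synthesis_url and the constant API_VERSION are hypothetical helpers, not part of the service.

```python
from urllib.parse import urlencode

# API version used throughout this article.
API_VERSION = "2024-08-01"


def batch_synthesis_url(region, synthesis_id=None, **query):
    """Build a batch synthesis URL (hypothetical helper).

    With synthesis_id, the URL targets one job (create, get, delete).
    Without it, the URL targets the collection (list).
    Extra keyword arguments become query parameters.
    """
    base = f"https://{region}.api.cognitive.microsoft.com/avatar/batchsyntheses"
    if synthesis_id is not None:
        base += f"/{synthesis_id}"
    # Keep caller-supplied parameters first and api-version last.
    params = {**query, "api-version": API_VERSION}
    return f"{base}?{urlencode(params)}"
```

The same function serves PUT, GET, and DELETE on a single job, because only the HTTP method changes between those calls.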

You can refer to the code samples on GitHub.

Create a batch synthesis request

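When you create a new batch synthesis job, some properties in JSON format are required. Other properties are optional. The batch synthesis response includes additional properties that provide information about the synthesis status and results. For example, the outputs.result property contains the location from which you can download the video file containing the avatar video. From outputs.summary, you can access the summary and debug details.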

To submit a batch synthesis request, construct the HTTP PUT request body according to the following instructions:

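Set the required inputKind property.
If the inputKind property is set to PlainText, you must also set the voice property in synthesisConfig. In the following example, inputKind is set to SSML, so synthesisConfig isn't set.
Set the required SynthesisId property. Choose a SynthesisId that is unique within the same Speech resource. The SynthesisId can be a string of 3 to 64 characters (letters, numbers, '-', or '_'), provided that it starts and ends with a letter or a number.
Set the required talkingAvatarCharacter and talkingAvatarStyle properties. You can find the supported avatar characters and styles here.
Optionally, you can set videoFormat, backgroundColor, and other properties. For more information, see batch synthesis properties.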
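
The rules above can be checked before a request is sent. The following is a minimal sketch, assuming that "letters" means ASCII letters; the names is_valid_synthesis_id and build_request_body are hypothetical helpers introduced for illustration. The body layout follows the curl example in this article.

```python
import re

# 3 to 64 characters from letters, digits, '-' and '_';
# the first and last character must be a letter or digit.
# Assumption: "letters" means ASCII letters.
_SYNTHESIS_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{1,62}[A-Za-z0-9]")


def is_valid_synthesis_id(synthesis_id):
    """Return True if the ID satisfies the stated length and character rules."""
    return _SYNTHESIS_ID.fullmatch(synthesis_id) is not None


def build_request_body(content, character, style, input_kind="SSML", voice=None):
    """Assemble the JSON body as a dict (hypothetical helper)."""
    if input_kind == "PlainText" and not voice:
        # PlainText input requires a voice in synthesisConfig.
        raise ValueError("voice is required when inputKind is PlainText")
    body = {
        "inputKind": input_kind,
        "inputs": [{"content": content}],
        "avatarConfig": {
            "talkingAvatarCharacter": character,
            "talkingAvatarStyle": style,
        },
    }
    if input_kind == "PlainText":
        body["synthesisConfig"] = {"voice": voice}
    return body
```

With SSML input the voice is named inside the markup itself, which is why the sketch adds synthesisConfig only for PlainText.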

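Note

The maximum accepted JSON payload size is 500 KB.

Each Speech resource can have up to 200 batch synthesis jobs running concurrently.

The maximum length of the output video is currently 20 minutes and might increase in the future.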
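
Because an oversized payload is rejected, it can help to measure the serialized body locally first. The sketch below assumes that 500 KB means 500 × 1024 bytes of UTF-8 encoded JSON; the article does not state the exact unit, and the names MAX_PAYLOAD_BYTES, payload_size_bytes, and fits_payload_limit are hypothetical.

```python
import json

# Assumption: 1 KB is counted as 1024 bytes.
MAX_PAYLOAD_BYTES = 500 * 1024


def payload_size_bytes(body):
    """Size of the request body once serialized to UTF-8 JSON."""
    return len(json.dumps(body, ensure_ascii=False).encode("utf-8"))


def fits_payload_limit(body, limit=MAX_PAYLOAD_BYTES):
    """Return True if the serialized body is within the size limit."""
    return payload_size_bytes(body) <= limit
```

The measured size depends on how the client serializes the JSON (whitespace, escaping), so a body close to the limit is safer split across several jobs.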

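To make an HTTP PUT request, use the URI format shown in the following example. Replace YourSpeechKey with your Speech resource key and YourSpeechRegion with your Speech resource region, and set the request body properties as described above.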

curl -v -X PUT -H "Ocp-Apim-Subscription-Key: YourSpeechKey" -H "Content-Type: application/json" -d '{
    "inputKind": "SSML",
    "inputs": [
        {
         "content": "<speak version='\''1.0'\'' xml:lang='\''en-US'\''><voice name='\''en-US-AvaMultilingualNeural'\''>The rainbow has seven colors.</voice></speak>"
        }
    ],
    "avatarConfig": {
        "talkingAvatarCharacter": "lisa",
        "talkingAvatarStyle": "graceful-sitting"
    }
}'  "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/my-job-01?api-version=2024-08-01"

You should receive a response body in the following format:

{
    "id": "my-job-01",
    "internalId": "5a25b929-1358-4e81-a036-33000e788c46",
    "status": "NotStarted",
    "createdDateTime": "2024-03-06T07:34:08.9487009Z",
    "lastActionDateTime": "2024-03-06T07:34:08.9487012Z",
    "inputKind": "SSML",
    "customVoices": {},
    "properties": {
        "timeToLiveInHours": 744
    },
    "avatarConfig": {
        "talkingAvatarCharacter": "lisa",
        "talkingAvatarStyle": "graceful-sitting",
        "videoFormat": "Mp4",
        "videoCodec": "hevc",
        "subtitleType": "soft_embedded",
        "bitrateKbps": 2000,
        "customized": false
    }
}

The status property should progress from NotStarted to Running, and finally to Succeeded or Failed. You can call the GET batch synthesis API periodically until the returned status is Succeeded or Failed.
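
The polling described above can be sketched as a small loop. This is an illustration under stated assumptions, not a definitive client: the function names get_job and wait_for_job, the 5-second interval, and the attempt limit are all hypothetical choices. The request header matches the one used by the curl examples in this article.

```python
import json
import time
import urllib.request

# Terminal states named in this article.
TERMINAL_STATES = {"Succeeded", "Failed"}


def get_job(url, speech_key):
    """Fetch one batch synthesis job as a dict (performs a network call)."""
    request = urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": speech_key}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(fetch, interval_seconds=5.0, max_attempts=120, sleep=time.sleep):
    """Call fetch() until the job reaches a terminal state.

    fetch is any zero-argument callable returning the job dict,
    for example: lambda: get_job(url, speech_key).
    """
    for attempt in range(max_attempts):
        job = fetch()
        if job.get("status") in TERMINAL_STATES:
            return job
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("job did not reach Succeeded or Failed in time")
```

Passing the fetch step in as a callable keeps the loop independent of the HTTP library and makes it easy to exercise without a network connection.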

Get batch synthesis

To retrieve the status of a batch synthesis job, make an HTTP GET request using the URI as shown in the following example.

Replace YourSynthesisId with your batch synthesis ID, YourSpeechKey with your Speech resource key, and YourSpeechRegion with your Speech resource region.

curl -v -X GET "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/YourSynthesisId?api-version=2024-08-01" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"

You should receive a response body in the following format:

{
    "id": "my-job-01",
    "internalId": "5a25b929-1358-4e81-a036-33000e788c46",
    "status": "Succeeded",
    "createdDateTime": "2024-03-06T07:34:08.9487009Z",
    "lastActionDateTime": "2024-03-06T07:34:12.5698769",
    "inputKind": "SSML",
    "customVoices": {},
    "properties": {
        "timeToLiveInHours": 744,
        "sizeInBytes": 344460,
        "durationInMilliseconds": 2520,
        "succeededCount": 1,
        "failedCount": 0,
        "billingDetails": {
            "neuralCharacters": 29,
            "talkingAvatarDurationSeconds": 2
        }
    },
    "avatarConfig": {
        "talkingAvatarCharacter": "lisa",
        "talkingAvatarStyle": "graceful-sitting",
        "videoFormat": "Mp4",
        "videoCodec": "hevc",
        "subtitleType": "soft_embedded",
        "bitrateKbps": 2000,
        "customized": false
    },
    "outputs": {
        "result": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/xxxxx/xxxxx/0001.mp4?SAS_Token",
        "summary": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/xxxxx/xxxxx/summary.json?SAS_Token"
    }
}

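From the outputs.result field, you can download the video file containing the avatar video. The outputs.summary field lets you download the summary and debug details. For more information about batch synthesis results, see batch synthesis results.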

List batch synthesis

To list all batch synthesis jobs for your Speech resource, make an HTTP GET request using the URI as shown in the following example.

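Replace YourSpeechKey with your Speech resource key and YourSpeechRegion with your Speech resource region. Optionally, you can set the skip and maxpagesize (page size, up to 100) query parameters in the URL. The default value for skip is 0, and the default value for maxpagesize is 100.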

curl -v -X GET "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses?skip=0&maxpagesize=2&api-version=2024-08-01" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"

You receive a response body in the following format:

{
    "value": [
        {
            "id": "my-job-02",
            "internalId": "14c25fcf-3cb6-4f46-8810-ecad06d956df",
            "status": "Succeeded",
            "createdDateTime": "2024-03-06T07:52:23.9054709Z",
            "lastActionDateTime": "2024-03-06T07:52:29.3416944",
            "inputKind": "SSML",
            "customVoices": {},
            "properties": {
                "timeToLiveInHours": 744,
                "sizeInBytes": 502676,
                "durationInMilliseconds": 2950,
                "succeededCount": 1,
                "failedCount": 0,
                "billingDetails": {
                    "neuralCharacters": 32,
                    "talkingAvatarDurationSeconds": 2
                }
            },
            "avatarConfig": {
                "talkingAvatarCharacter": "lisa",
                "talkingAvatarStyle": "casual-sitting",
                "videoFormat": "Mp4",
                "videoCodec": "h264",
                "subtitleType": "soft_embedded",
                "bitrateKbps": 2000,
                "customized": false
            },
            "outputs": {
                "result": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/xxxxx/xxxxx/0001.mp4?SAS_Token",
                "summary": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/xxxxx/xxxxx/summary.json?SAS_Token"
            }
        },
        {
            "id": "my-job-01",
            "internalId": "5a25b929-1358-4e81-a036-33000e788c46",
            "status": "Succeeded",
            "createdDateTime": "2024-03-06T07:34:08.9487009Z",
            "lastActionDateTime": "2024-03-06T07:34:12.5698769",
            "inputKind": "SSML",
            "customVoices": {},
            "properties": {
                "timeToLiveInHours": 744,
                "sizeInBytes": 344460,
                "durationInMilliseconds": 2520,
                "succeededCount": 1,
                "failedCount": 0,
                "billingDetails": {
                    "neuralCharacters": 29,
                    "talkingAvatarDurationSeconds": 2
                }
            },
            "avatarConfig": {
                "talkingAvatarCharacter": "lisa",
                "talkingAvatarStyle": "graceful-sitting",
                "videoFormat": "Mp4",
                "videoCodec": "hevc",
                "subtitleType": "soft_embedded",
                "bitrateKbps": 2000,
                "customized": false
            },
            "outputs": {
                "result": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/xxxxx/xxxxx/0001.mp4?SAS_Token",
                "summary": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/xxxxx/xxxxx/summary.json?SAS_Token"
            }
        }
    ],
    "nextLink": "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/?api-version=2024-08-01&skip=2&maxpagesize=2"
}

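From outputs.result, you can download the video file containing the avatar video. From outputs.summary, you can access the summary and debug details. For more information, see batch synthesis results.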

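The value property in the JSON response lists your synthesis requests. The list is paginated, with a maximum page size of 100. The nextLink property is provided when needed, so that you can get the next page of the paginated list.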
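
Following nextLink until it is absent yields every job across all pages. A minimal sketch, assuming each page is a dict with a value list and an optional nextLink as in the sample response; the name iter_jobs and the fetch_page callable are hypothetical.

```python
def iter_jobs(first_url, fetch_page):
    """Yield every job across all pages of the list operation.

    fetch_page is any callable that takes a URL and returns the parsed
    JSON page as a dict, for example a function that sends the GET
    request with the Ocp-Apim-Subscription-Key header.
    """
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("value", [])
        # nextLink is present only when another page exists.
        url = page.get("nextLink")
```

Because the function is a generator, a caller can stop early (for example, after finding one job ID) without requesting the remaining pages.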

Get batch synthesis results file

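Once a batch synthesis job has a status of Succeeded, you can download the video output results. Use the URL from the outputs.result property of the get batch synthesis response.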

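To get the batch synthesis results file, make an HTTP GET request using the URI as shown in the following example. Replace YourOutputsResultUrl with the URL from the outputs.result property of the get batch synthesis response. Replace YourSpeechKey with your Speech resource key.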

curl -v -X GET "YourOutputsResultUrl" -H "Ocp-Apim-Subscription-Key: YourSpeechKey" > output.mp4

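To get the batch synthesis summary file, make an HTTP GET request using the URI as shown in the following example. Replace YourOutputsSummaryUrl with the URL from the outputs.summary property of the get batch synthesis response. Replace YourSpeechKey with your Speech resource key.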

curl -v -X GET "YourOutputsSummaryUrl" -H "Ocp-Apim-Subscription-Key: YourSpeechKey" > summary.json

The summary file contains the synthesis results for each text input. Here's an example summary.json file:

{
  "jobID": "5a25b929-1358-4e81-a036-33000e788c46",
  "status": "Succeeded",
  "results": [
    {
      "texts": [
        "<speak version='1.0' xml:lang='en-US'><voice name='en-US-AvaMultilingualNeural'>The rainbow has seven colors.</voice></speak>"
      ],
      "status": "Succeeded",
      "videoFileName": "244a87c294b94ddeb3dbaccee8ffa7eb/5a25b929-1358-4e81-a036-33000e788c46/0001.mp4",
      "TalkingAvatarCharacter": "lisa",
      "TalkingAvatarStyle": "graceful-sitting"
    }
  ]
}
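
Once downloaded, the summary file can be inspected programmatically. The sketch below assumes the structure of the example above (a results list whose items carry status and videoFileName); the function names load_summary, result_status_counts, and succeeded_video_files are hypothetical.

```python
import json
from collections import Counter


def load_summary(path):
    """Read a downloaded summary.json file into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def result_status_counts(summary):
    """Count how many inputs ended in each status."""
    return Counter(item.get("status") for item in summary.get("results", []))


def succeeded_video_files(summary):
    """List videoFileName for every input that succeeded."""
    return [
        item["videoFileName"]
        for item in summary.get("results", [])
        if item.get("status") == "Succeeded" and "videoFileName" in item
    ]
```

A typical use would be result_status_counts(load_summary("summary.json")) to see at a glance whether any input failed.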

Delete batch synthesis

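After you retrieve the video output results and no longer need the batch synthesis job history, you can delete it. The Speech service keeps each synthesis history for up to 31 days, or for the duration of the request's timeToLiveInHours property, whichever comes sooner. For synthesis jobs with a status of Succeeded or Failed, the date and time of automatic deletion is calculated as the sum of the lastActionDateTime and timeToLiveInHours properties.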
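
The automatic deletion time can be estimated from a job response. This is a sketch under several assumptions: timestamps without a trailing Z (as in some sample responses in this article) are treated as UTC, fractional seconds are truncated to microseconds, and the 31-day cap is applied as a reading of the retention rule above. The names parse_utc and auto_delete_time are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches timestamps such as 2024-03-06T07:34:12.5698769 with optional Z.
_TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z?")

# Assumption: retention never exceeds 31 days.
MAX_RETENTION_HOURS = 31 * 24


def parse_utc(value):
    """Parse a service timestamp, assuming UTC when no zone is given."""
    match = _TIMESTAMP.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognized timestamp: {value!r}")
    base, fraction = match.groups()
    moment = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    if fraction:
        # datetime keeps six fractional digits; extra digits are dropped.
        moment = moment.replace(microsecond=int(fraction[:6].ljust(6, "0")))
    return moment


def auto_delete_time(job):
    """Estimate when a Succeeded or Failed job is deleted automatically."""
    hours = min(job["properties"]["timeToLiveInHours"], MAX_RETENTION_HOURS)
    return parse_utc(job["lastActionDateTime"]) + timedelta(hours=hours)
```

For the sample job in this article, 744 hours equals 31 days, so the estimate falls 31 days after lastActionDateTime.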

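To delete a batch synthesis job, make an HTTP DELETE request using the following URI format. Replace YourSynthesisId with your batch synthesis ID, YourSpeechKey with your Speech resource key, and YourSpeechRegion with your Speech resource region.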

curl -v -X DELETE "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/YourSynthesisId?api-version=2024-08-01" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"

If the delete request is successful, the response headers include HTTP/1.1 204 No Content.
