Quickstart: Real-time conversation transcription multichannel diarization (preview)

Article
10/16/2024

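Note

This feature is currently in public preview. The preview is provided without a service-level agreement and isn't recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

With conversation transcription multichannel diarization, you stream audio to the Speech service and transcribe meetings with the ability to add, remove, and identify multiple participants. You first create a voice signature for each participant by using the REST API, and then use the voice signatures with the Speech SDK to transcribe meetings. For more information, see the conversation transcription overview.

Important

Conversation transcription multichannel diarization (preview) is retiring on March 28, 2025. For information about migrating to other speech to text features, see Migrate away from conversation transcription multichannel diarization.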

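Limitations

Only available in the following subscription regions: centralus, eastasia, eastus, westeurope
Requires a 7-mic circular multi-microphone array. The microphone array should meet our specification.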
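
The region limitation can be checked before any request is sent. The following is a minimal Python sketch, not part of the Speech SDK: the function name signature_endpoint and the constant SUPPORTED_REGIONS are hypothetical, and the URL pattern is the one used by the REST samples later in this article.

```python
# Regions listed under Limitations in this article.
SUPPORTED_REGIONS = {"centralus", "eastasia", "eastus", "westeurope"}


def signature_endpoint(region: str,
                       operation: str = "GenerateVoiceSignatureFromByteArray") -> str:
    """Return the voice signature REST endpoint for a supported region.

    Hypothetical helper: raises ValueError when the region isn't one of the
    regions listed in this article.
    """
    normalized = region.strip().lower()
    if normalized not in SUPPORTED_REGIONS:
        raise ValueError(
            f"Region {region!r} is not in the supported list: "
            f"{sorted(SUPPORTED_REGIONS)}"
        )
    return (
        f"https://signature.{normalized}.cts.speech.microsoft.com"
        f"/api/v1/Signature/{operation}"
    )
```

Failing early on an unsupported region gives a clearer error than a failed HTTP request later.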

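Note

For the conversation transcription multichannel diarization feature, use MeetingTranscriber instead of ConversationTranscriber, and use CreateMeetingAsync instead of CreateConversationAsync.

Prerequisites

An Azure subscription. You can create one for free.
Create a Speech resource in the Azure portal.
Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys.

Set up the environment

Before you can do anything, you need to install the Speech SDK for JavaScript. If you only need the package name, run npm install microsoft-cognitiveservices-speech-sdk. For guided installation instructions, see the SDK installation guide.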

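Create voice signatures

If you want to enroll user profiles, the first step is to create voice signatures for the meeting participants so that they can be identified as unique speakers. This step is optional if you don't want to use pre-enrolled user profiles to identify specific participants.

The input .wav audio file for creating a voice signature must be 16-bit, with a 16-kHz sample rate, in single channel (mono) format. The recommended length for each audio sample is between 30 seconds and two minutes. An audio sample that is too short reduces accuracy when recognizing the speaker. The .wav file should be a sample of one person's voice so that a unique voice profile is created.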
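
The audio requirements above can be verified locally before a sample is uploaded. The following is a minimal Python sketch that uses only the standard library wave module. The function name check_enrollment_wav and its parameters are hypothetical, and the duration limits simply encode the recommended range stated above; the service itself might apply different checks.

```python
import wave


def check_enrollment_wav(path, min_seconds=30.0, max_seconds=120.0):
    """Return a list of problems; an empty list means the file matches
    the stated format (mono, 16-bit, 16 kHz) and the recommended length."""
    problems = []
    # wave.open raises wave.Error if the file is not a PCM .wav file.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        rate = wav.getframerate()
        frames = wav.getnframes()

    if channels != 1:
        problems.append(f"expected 1 channel (mono), found {channels}")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, found {sample_width * 8}-bit")
    if rate != 16000:
        problems.append(f"expected 16000 Hz sample rate, found {rate}")

    duration = frames / rate if rate else 0.0
    if not (min_seconds <= duration <= max_seconds):
        problems.append(
            f"duration {duration:.1f}s is outside {min_seconds}-{max_seconds}s"
        )
    return problems
```

Returning a list rather than raising on the first mismatch lets a caller report every issue with a sample at once.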

The following example shows how to create a voice signature by using the REST API in JavaScript. You must insert your subscriptionKey, your region, and the path to a sample .wav file.

const fs = require('fs');
const axios = require('axios');
const formData = require('form-data');
 
const subscriptionKey = 'your-subscription-key';
const region = 'your-region';
 
async function createProfile() {
    let form = new formData();
    form.append('file', fs.createReadStream('path-to-voice-sample.wav'));
    let headers = form.getHeaders();
    headers['Ocp-Apim-Subscription-Key'] = subscriptionKey;
 
    let url = `https://signature.${region}.cts.speech.microsoft.com/api/v1/Signature/GenerateVoiceSignatureFromFormData`;
    let response = await axios.post(url, form, { headers: headers });
    
    // get signature from response, serialize to json string
    return JSON.stringify(response.data.Signature);
}
 
async function main() {
    // use this voiceSignature string with meeting transcription calls below
    let voiceSignatureString = await createProfile();
    console.log(voiceSignatureString);
}
main();

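Running this script returns a voice signature string in the variable voiceSignatureString. Run the function twice so that you have two strings to use as input to the variables voiceSignatureStringUser1 and voiceSignatureStringUser2 below.

Note

Voice signatures can only be created by using the REST API.

Transcribe meetings

The following sample code demonstrates how to transcribe meetings in real time for two speakers. It assumes that you created a voice signature string for each speaker as shown above. Substitute real information for subscriptionKey, region, and filepath, the path to the audio you want to transcribe.

If you don't use pre-enrolled user profiles, it takes a few more seconds to complete the first recognition of unknown users as speaker1, speaker2, and so on.

Note

Make sure that the same subscriptionKey is used in your application and for signature creation, or you will encounter errors.

The sample code does the following:

Creates an AudioConfig from the sample .wav file to transcribe.
Creates a Meeting by using createMeetingAsync().
Creates a MeetingTranscriber by using the constructor.
Adds participants to the meeting. The strings voiceSignatureStringUser1 and voiceSignatureStringUser2 should be the output of the steps above.
Registers for events and begins transcription.
If you want to differentiate speakers without providing voice samples, enable the DifferentiateGuestSpeakers feature as described in the meeting transcription overview.

If speaker identification or differentiation is enabled, the service keeps evaluating results by using accumulated audio information even after you receive transcribed results. If the service finds that a previous result was assigned an incorrect speakerId, it sends a nearly identical Transcribed result again, in which only the speakerId and UtteranceId differ. Because the UtteranceId format is {index}_{speakerId}_{Offset}, when you receive a transcribed result you can use UtteranceId to determine whether the current result corrects a previous one. Your client or UI logic can decide how to behave, for example by overwriting the previous output or by ignoring the latest result.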

(function() {
    "use strict";
    var sdk = require("microsoft-cognitiveservices-speech-sdk");
    var fs = require("fs");
    
    var subscriptionKey = "your-subscription-key";
    var region = "your-region";
    var filepath = "audio-file-to-transcribe.wav"; // 8-channel audio
    
    var speechTranslationConfig = sdk.SpeechTranslationConfig.fromSubscription(subscriptionKey, region);
    var audioConfig = sdk.AudioConfig.fromWavFileInput(fs.readFileSync(filepath));
    speechTranslationConfig.setProperty("ConversationTranscriptionInRoomAndOnline", "true");

    // en-US by default. Change this value to specify another language, such as zh-CN.
    speechTranslationConfig.speechRecognitionLanguage = "en-US";
    
    // create meeting and transcriber
    var meeting = sdk.Meeting.createMeetingAsync(speechTranslationConfig, "myMeeting");
    var transcriber = new sdk.MeetingTranscriber(audioConfig);
    
    // attach the transcriber to the meeting
    transcriber.joinMeetingAsync(meeting,
        function () {
            // add first participant using voiceSignature created in enrollment step
            var user1 = sdk.Participant.From("user1@example.com", "en-us", voiceSignatureStringUser1);
            meeting.addParticipantAsync(user1,
                function () {
                    // add second participant using voiceSignature created in enrollment step
                    var user2 = sdk.Participant.From("user2@example.com", "en-us", voiceSignatureStringUser2);
                    meeting.addParticipantAsync(user2,
                        function () {
                            transcriber.sessionStarted = function (s, e) {
                                console.log("(sessionStarted)");
                            };
                            transcriber.sessionStopped = function (s, e) {
                                console.log("(sessionStopped)");
                            };
                            transcriber.canceled = function (s, e) {
                                console.log("(canceled)");
                            };
                            transcriber.transcribed = function (s, e) {
                                console.log("(transcribed) text: " + e.result.text);
                                console.log("(transcribed) speakerId: " + e.result.speakerId);
                            };

                            // begin meeting transcription
                            transcriber.startTranscribingAsync(
                                function () { },
                                function (err) {
                                    console.trace("err - starting transcription: " + err);
                                });
                        },
                        function (err) {
                            // error callback of addParticipantAsync(user2, ...)
                            console.trace("err - adding user2: " + err);
                        });
                },
                function (err) {
                    // error callback of addParticipantAsync(user1, ...)
                    console.trace("err - adding user1: " + err);
                });
        },
        function (err) {
            // error callback of joinMeetingAsync(...)
            console.trace("err - joining meeting: " + err);
        });
}());

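Prerequisites

An Azure subscription. You can create one for free.
Create a Speech resource in the Azure portal.
Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys.

Set up the environment

The Speech SDK is available as a NuGet package and implements .NET Standard 2.0. You install the Speech SDK later in this guide, but first check the platform-specific installation instructions for any additional requirements.

Create voice signatures

The following example shows how to create a voice signature by using the REST API in C#. You must insert your subscriptionKey, your region, and the path to a sample .wav file.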

using System;
using System.IO;
using System.Net.Http;
using System.Runtime.Serialization;
using System.Threading.Tasks;
using Newtonsoft.Json;

[DataContract]
internal class VoiceSignature
{
    [DataMember]
    public string Status { get; private set; }

    [DataMember]
    public VoiceSignatureData Signature { get; private set; }

    [DataMember]
    public string Transcription { get; private set; }
}

[DataContract]
internal class VoiceSignatureData
{
    internal VoiceSignatureData()
    { }

    internal VoiceSignatureData(int version, string tag, string data)
    {
        this.Version = version;
        this.Tag = tag;
        this.Data = data;
    }

    [DataMember]
    public int Version { get; private set; }

    [DataMember]
    public string Tag { get; private set; }

    [DataMember]
    public string Data { get; private set; }
}

private static async Task<string> GetVoiceSignatureString()
{
    var subscriptionKey = "your-subscription-key";
    var region = "your-region";

    byte[] fileBytes = File.ReadAllBytes("path-to-voice-sample.wav");
    var content = new ByteArrayContent(fileBytes);
    var client = new HttpClient();
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
    var response = await client.PostAsync($"https://signature.{region}.cts.speech.microsoft.com/api/v1/Signature/GenerateVoiceSignatureFromByteArray", content);
    
    var jsonData = await response.Content.ReadAsStringAsync();
    var result = JsonConvert.DeserializeObject<VoiceSignature>(jsonData);
    return JsonConvert.SerializeObject(result.Signature);
}

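Running the function GetVoiceSignatureString() returns a voice signature string in the correct format. Run the function twice so that you have two strings to use as input to the variables voiceSignatureStringUser1 and voiceSignatureStringUser2 below.

Note

Voice signatures can only be created by using the REST API.

Transcribe meetings

If you don't use pre-enrolled user profiles, it takes a few more seconds to complete the first recognition of unknown users as speaker1, speaker2, and so on.

Note

Make sure that the same subscriptionKey is used in your application and for signature creation, or you will encounter errors.

The sample code does the following:

Creates an AudioConfig from the sample .wav file to transcribe.
Creates a Meeting by using CreateMeetingAsync().
Creates a MeetingTranscriber by using the constructor, and subscribes to the necessary events.
Adds participants to the meeting. The strings voiceSignatureStringUser1 and voiceSignatureStringUser2 should be the output of the function GetVoiceSignatureString() from the steps above.
Joins the meeting and begins transcription.
If you want to differentiate speakers without providing voice samples, enable the DifferentiateGuestSpeakers feature as described in the meeting transcription overview.

Note

AudioStreamReader is a helper class that you can get on GitHub.

If speaker identification or differentiation is enabled, the service keeps evaluating results by using accumulated audio information even after you receive Transcribed results. If the service finds that a previous result was assigned an incorrect UserId, it sends a nearly identical Transcribed result again, in which only the UserId and UtteranceId differ. Because the UtteranceId format is {index}_{UserId}_{Offset}, when you receive a Transcribed result you can use UtteranceId to determine whether the current result corrects a previous one. Your client or UI logic can decide how to behave, for example by overwriting the previous output or by ignoring the latest result.

Call the function TranscribeMeetingsAsync() to start meeting transcription.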

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Transcription;

class TranscribeMeeting
{
    // all your other code

    public static async Task TranscribeMeetingsAsync(string voiceSignatureStringUser1, string voiceSignatureStringUser2)
    {
        var subscriptionKey = "your-subscription-key";
        var region = "your-region";
        var filepath = "audio-file-to-transcribe.wav";

        var config = SpeechConfig.FromSubscription(subscriptionKey, region);
        config.SetProperty("ConversationTranscriptionInRoomAndOnline", "true");

        // en-us by default. Uncomment the next line to specify another language, such as zh-cn.
        // config.SpeechRecognitionLanguage = "zh-cn";
        var stopRecognition = new TaskCompletionSource<int>();

        using (var audioInput = AudioConfig.FromWavFileInput(filepath))
        {
            var meetingID = Guid.NewGuid().ToString();
            using (var meeting = await Meeting.CreateMeetingAsync(config, meetingID))
            {
                // create a meeting transcriber using audio stream input
                using (var meetingTranscriber = new MeetingTranscriber(audioInput))
                {
                    meetingTranscriber.Transcribing += (s, e) =>
                    {
                        Console.WriteLine($"TRANSCRIBING: Text={e.Result.Text} SpeakerId={e.Result.UserId}");
                    };

                    meetingTranscriber.Transcribed += (s, e) =>
                    {
                        if (e.Result.Reason == ResultReason.RecognizedSpeech)
                        {
                            Console.WriteLine($"TRANSCRIBED: Text={e.Result.Text} SpeakerId={e.Result.UserId}");
                        }
                        else if (e.Result.Reason == ResultReason.NoMatch)
                        {
                            Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                        }
                    };

                    meetingTranscriber.Canceled += (s, e) =>
                    {
                        Console.WriteLine($"CANCELED: Reason={e.Reason}");

                        if (e.Reason == CancellationReason.Error)
                        {
                            Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                            Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                            Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
                            stopRecognition.TrySetResult(0);
                        }
                    };

                    meetingTranscriber.SessionStarted += (s, e) =>
                    {
                        Console.WriteLine($"\nSession started event. SessionId={e.SessionId}");
                    };

                    meetingTranscriber.SessionStopped += (s, e) =>
                    {
                        Console.WriteLine($"\nSession stopped event. SessionId={e.SessionId}");
                        Console.WriteLine("\nStop recognition.");
                        stopRecognition.TrySetResult(0);
                    };

                    // Add participants to the meeting.
                    var speaker1 = Participant.From("User1", "en-US", voiceSignatureStringUser1);
                    var speaker2 = Participant.From("User2", "en-US", voiceSignatureStringUser2);
                    await meeting.AddParticipantAsync(speaker1);
                    await meeting.AddParticipantAsync(speaker2);

                    // Join to the meeting and start transcribing
                    await meetingTranscriber.JoinMeetingAsync(meeting);
                    await meetingTranscriber.StartTranscribingAsync().ConfigureAwait(false);

                    // waits for completion, then stop transcription
                    Task.WaitAny(new[] { stopRecognition.Task });
                    await meetingTranscriber.StopTranscribingAsync().ConfigureAwait(false);
                }
            }
        }
    }
}

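Prerequisites

An Azure subscription. You can create one for free.
Create a Speech resource in the Azure portal.
Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys.

Set up the environment

Before you can do anything, you need to install the Speech SDK for Python. You can install the Speech SDK from PyPI by running pip install azure-cognitiveservices-speech.

Create voice signatures

The following example shows how to create voice signatures by using the REST API in Python. You must insert your speech_key, your service_region, and the paths to the sample .wav files.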

import requests
import json

speech_key, service_region = "your-subscription-key", "your-region"
endpoint = f"https://signature.{service_region}.cts.speech.microsoft.com/api/v1/Signature/GenerateVoiceSignatureFromByteArray"

#Enrollment audio for each speaker. In this example, two speaker enrollment audio files are added.
enrollment_audio_speaker1 = "enrollment-audio-speaker1.wav"
enrollment_audio_speaker2 = "enrollment-audio-speaker2.wav"

def voice_data_converter(enrollment_audio):
  with open(enrollment_audio, "rb") as wav_file:
    input_wav = wav_file.read()
  return input_wav
  
def voice_signature_creator(endpoint, speech_key, enrollment_audio):
  data = voice_data_converter(enrollment_audio)
  headers = {"Ocp-Apim-Subscription-Key":speech_key}
  r = requests.post(url = endpoint,headers = headers, data = data)
  voice_signature_string = json.dumps(r.json()['Signature'])
  return voice_signature_string

voice_signature_user1 = voice_signature_creator(endpoint, speech_key, enrollment_audio_speaker1)
voice_signature_user2 = voice_signature_creator(endpoint, speech_key, enrollment_audio_speaker2)

Each call returns a voice signature string. The two results, voice_signature_user1 and voice_signature_user2, are used as input in the sample code later.
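
The REST sample above indexes the response with ['Signature'] directly, which raises a bare KeyError if the request failed. The following is a minimal sketch of a more defensive extraction. The helper name extract_voice_signature is hypothetical, and the field names Status, Signature, Version, Tag, and Data are taken from the C# data contract shown earlier in this article; whether the service always returns all of them is an assumption.

```python
import json


def extract_voice_signature(response_body: dict) -> str:
    """Return the Signature object of a parsed response as a JSON string.

    Assumption: the response has the shape of the C# VoiceSignature class
    in this article (Status, Signature{Version, Tag, Data}, Transcription).
    """
    signature = response_body.get("Signature")
    if not isinstance(signature, dict):
        raise ValueError(
            "Response has no Signature object "
            f"(Status={response_body.get('Status')!r})"
        )
    missing = [key for key in ("Version", "Tag", "Data") if key not in signature]
    if missing:
        raise ValueError(f"Signature is missing fields: {missing}")
    # The SDK samples pass the signature as a serialized JSON string.
    return json.dumps(signature)
```

In the sample above, the call would be extract_voice_signature(r.json()) in place of json.dumps(r.json()['Signature']).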

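Note

Voice signatures can only be created by using the REST API.

Transcribe meetings

The following sample code demonstrates how to transcribe meetings in real time for two speakers. It assumes that you created a voice signature string for each speaker as shown previously. Substitute real information for speech_key, service_region, and meetingfilename, the path to the audio you want to transcribe.

If you don't use pre-enrolled user profiles, it takes a few more seconds to complete the first recognition of unknown users as speaker1, speaker2, and so on.

Note

Make sure that the same subscription key is used in your application and for signature creation, or you will encounter errors.

Here's what the sample does:

Creates a speech configuration with your subscription information.
Creates an audio configuration that uses a push stream.
Creates a MeetingTranscriber and subscribes to the events fired by meeting transcription.
Creates a meeting identifier and uses it to create the meeting.
Adds participants to the meeting. The strings voice_signature_user1 and voice_signature_user2 should be the output of the steps above.
Reads the whole wave file at once, streams it to the SDK, and begins transcription.
If you want to differentiate speakers without providing voice samples, enable the DifferentiateGuestSpeakers feature as described in the meeting transcription overview.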

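If speaker identification or differentiation is enabled, the service keeps evaluating results by using accumulated audio information even after you receive transcribed results. If the service finds that a previous result was assigned an incorrect speakerId, it sends a nearly identical Transcribed result again, in which only the speakerId and UtteranceId differ. Because the UtteranceId format is {index}_{speakerId}_{Offset}, when you receive a transcribed result you can use UtteranceId to determine whether the current result corrects a previous one. Your client or UI logic can decide how to behave, for example by overwriting the previous output or by ignoring the latest result.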
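
One way a client could apply such corrections is sketched below in plain Python, independent of the SDK. The names parse_utterance_id and Transcript are hypothetical. The sketch assumes that a corrected result keeps the same index and Offset as the result it replaces and changes only the speaker part; the article states the format but doesn't spell out that rule, so treat it as an assumption. The sample identifiers in the usage note are made up.

```python
from typing import Dict, List, NamedTuple, Tuple


class UtteranceKey(NamedTuple):
    index: str
    speaker_id: str
    offset: str


def parse_utterance_id(utterance_id: str) -> UtteranceKey:
    """Split '{index}_{speakerId}_{Offset}' into its three parts.

    The first and last underscores are used as separators so that a
    speaker ID that itself contains underscores stays intact.
    """
    index, rest = utterance_id.split("_", 1)
    speaker_id, offset = rest.rsplit("_", 1)
    return UtteranceKey(index, speaker_id, offset)


class Transcript:
    """Keeps one entry per (index, offset) and overwrites it on correction."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], Tuple[str, str]] = {}

    def apply(self, utterance_id: str, text: str) -> bool:
        """Store a result; return True if it corrected an earlier speaker."""
        key = parse_utterance_id(utterance_id)
        slot = (key.index, key.offset)
        previous = self._entries.get(slot)
        # Overwriting keeps the entry at its original position in the dict.
        self._entries[slot] = (key.speaker_id, text)
        return previous is not None and previous[0] != key.speaker_id

    def lines(self) -> List[str]:
        return [f"{speaker}: {text}" for speaker, text in self._entries.values()]
```

For example, applying "0_Guest_1_1200000" and later "0_user1@example.com_1200000" with the same text leaves a single line attributed to user1@example.com. A UI that prefers to ignore late corrections could skip the overwrite whenever the slot already exists.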

import azure.cognitiveservices.speech as speechsdk
import time
import uuid
from scipy.io import wavfile

speech_key, service_region="your-subscription-key","your-region"
meetingfilename= "audio-file-to-transcribe.wav" # 8 channel, 16 bits, 16kHz audio

def meeting_transcription():
    
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    speech_config.set_property_by_name("ConversationTranscriptionInRoomAndOnline", "true")
    # If you want to differentiate speakers without providing voice samples, uncomment the following line.
    # speech_config.set_property_by_name("DifferentiateGuestSpeakers", "true")

    channels = 8
    bits_per_sample = 16
    samples_per_second = 16000
    
    wave_format = speechsdk.audio.AudioStreamFormat(samples_per_second, bits_per_sample, channels)
    stream = speechsdk.audio.PushAudioInputStream(stream_format=wave_format)
    audio_config = speechsdk.audio.AudioConfig(stream=stream)

    transcriber = speechsdk.transcription.MeetingTranscriber(audio_config)

    meeting_id = str(uuid.uuid4())
    meeting = speechsdk.transcription.Meeting(speech_config, meeting_id)
    done = False

    def stop_cb(evt: speechsdk.SessionEventArgs):
        """callback that signals to stop continuous transcription upon receiving an event `evt`"""
        print('CLOSING {}'.format(evt))
        nonlocal done
        done = True
        
    transcriber.transcribed.connect(lambda evt: print('TRANSCRIBED: {}'.format(evt)))
    transcriber.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    transcriber.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    transcriber.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    # stop continuous transcription on either session stopped or canceled events
    transcriber.session_stopped.connect(stop_cb)
    transcriber.canceled.connect(stop_cb)

    # Note user voice signatures are not required for speaker differentiation.
    # Use voice signatures when adding participants when more enhanced speaker identification is required.
    user1 = speechsdk.transcription.Participant("user1@example.com", "en-us", voice_signature_user1)
    user2 = speechsdk.transcription.Participant("user2@example.com", "en-us", voice_signature_user2)

    meeting.add_participant_async(user1).get()
    meeting.add_participant_async(user2).get()
    transcriber.join_meeting_async(meeting).get()
    transcriber.start_transcribing_async()
    
    sample_rate, wav_data = wavfile.read(meetingfilename)
    stream.write(wav_data.tobytes())
    stream.close()
    while not done:
        time.sleep(.5)

    # wait for the stop request to complete before returning
    transcriber.stop_transcribing_async().get()
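
The sample above reads the whole wave file into memory and writes it to the push stream in one call. For long recordings, feeding the stream in smaller pieces can be preferable. The following is a minimal sketch that uses only the standard library; feed_wav_in_chunks and chunk_ms are hypothetical names, and the write argument is any callable that accepts bytes, such as the write method of the push stream created above. Note that the wave module only reads PCM files, and multichannel files stored as WAVE_FORMAT_EXTENSIBLE are only readable in recent Python versions.

```python
import wave
from typing import Callable


def feed_wav_in_chunks(path: str,
                       write: Callable[[bytes], None],
                       chunk_ms: int = 100) -> int:
    """Pass the PCM frames of `path` to `write` in chunks of about
    `chunk_ms` milliseconds. Returns the total number of bytes written."""
    total = 0
    with wave.open(path, "rb") as wav:
        # Whole frames only, so every chunk stays aligned across channels.
        frames_per_chunk = max(1, wav.getframerate() * chunk_ms // 1000)
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            write(data)
            total += len(data)
    return total
```

In the sample above, this would replace the wavfile.read and stream.write lines with feed_wav_in_chunks(meetingfilename, stream.write), followed by stream.close() as before.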
