Quickstart: Real-time meeting transcription

Article
08/05/2024

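You can transcribe meetings, with the ability to add, remove, and identify multiple participants, by streaming audio to the Speech service. You first create a voice signature for each participant by using the REST API, and then use the voice signatures with the Speech SDK to transcribe the meeting. For more information, see the meeting transcription overview.

Limitations

Only available in the following subscription regions: centralus, eastasia, eastus, westeurope
Requires a 7-microphone circular multi-microphone array. The microphone array should meet our specification.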
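
Because the region becomes part of the REST endpoint used to create voice signatures later in this article, it can help to reject an unsupported region before making any request. The following is a minimal sketch under that assumption; the helper name signature_endpoint is hypothetical, and the region list is simply the one stated above.

```python
# Regions listed in the Limitations section of this article.
SUPPORTED_REGIONS = ("centralus", "eastasia", "eastus", "westeurope")


def signature_endpoint(region):
    """Build the voice signature endpoint URL for a supported region.

    Hypothetical helper: it only combines the region list above with the
    URL pattern used by the samples in this article.
    """
    normalized = region.strip().lower()
    if normalized not in SUPPORTED_REGIONS:
        raise ValueError(
            f"Region '{region}' is not in the supported list: "
            + ", ".join(SUPPORTED_REGIONS)
        )
    return (
        f"https://signature.{normalized}.cts.speech.microsoft.com"
        "/api/v1/Signature/GenerateVoiceSignatureFromByteArray"
    )
```

Calling signature_endpoint("westus") raises a ValueError immediately, which is usually easier to diagnose than a failed HTTP request.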

Note

The Speech SDK for C++, Java, Objective-C, and Swift supports meeting transcription, but no corresponding guide is available yet.

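Prerequisites

An Azure subscription. You can create one for free.
Create a Speech resource in the Azure portal.
Get the Speech resource key and region. After your Speech resource is deployed, select Go to resource to view and manage keys.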

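Set up the environment

Before you can do anything, you need to install the Speech SDK for JavaScript. If you just want the package name to install, run npm install microsoft-cognitiveservices-speech-sdk. For guided installation instructions, see the SDK installation guide.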

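Create voice signatures

If you want to enroll user profiles, the first step is to create voice signatures for the meeting participants so that they can be identified as unique speakers. This step is optional if you don't want to use pre-enrolled user profiles to identify specific participants.

The input .wav audio file for creating a voice signature must be 16-bit, with a 16-kHz sample rate, in single-channel (mono) format. The recommended length for each audio sample is between 30 seconds and two minutes. An audio sample that is too short reduces accuracy when recognizing the speaker. The .wav file should be a sample of one person's voice so that a unique voice profile is created.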
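
Before uploading an enrollment sample, you might want to check it against the format described above. The following is a minimal sketch that uses only the Python standard library; the function name check_enrollment_wav is hypothetical, and the duration check reflects the recommended range rather than a hard requirement of the service.

```python
import wave


def check_enrollment_wav(path):
    """Return a list of problems; an empty list means the file matches
    the format described in this article."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / sample_rate  # seconds

    problems = []
    if channels != 1:
        problems.append(f"expected mono audio, found {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, found {8 * sample_width}-bit")
    if sample_rate != 16000:
        problems.append(f"expected 16000 Hz, found {sample_rate} Hz")
    if not 30 <= duration <= 120:
        # Recommended length only: 30 seconds to two minutes.
        problems.append(f"length {duration:.1f} s is outside the recommended 30-120 s")
    return problems
```

A file that passes returns an empty list, so a caller can simply test `if check_enrollment_wav(path):` to decide whether to warn the user.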

The following example shows how to create a voice signature by using the REST API in JavaScript. You must insert your subscriptionKey, region, and the path to a sample .wav file.

const fs = require('fs');
const axios = require('axios');
const formData = require('form-data');
 
const subscriptionKey = 'your-subscription-key';
const region = 'your-region';
 
async function createProfile() {
    let form = new formData();
    form.append('file', fs.createReadStream('path-to-voice-sample.wav'));
    let headers = form.getHeaders();
    headers['Ocp-Apim-Subscription-Key'] = subscriptionKey;
 
    let url = `https://signature.${region}.cts.speech.microsoft.com/api/v1/Signature/GenerateVoiceSignatureFromFormData`;
    let response = await axios.post(url, form, { headers: headers });
    
    // get signature from response, serialize to json string
    return JSON.stringify(response.data.Signature);
}
 
async function main() {
    // use this voiceSignature string with meeting transcription calls below
    let voiceSignatureString = await createProfile();
    console.log(voiceSignatureString);
}
main();

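Running this script returns a voice signature string in the variable voiceSignatureString. Run the function twice so that you have two strings to use as input to the variables voiceSignatureStringUser1 and voiceSignatureStringUser2 below.

Note

Voice signatures can only be created by using the REST API.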

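Transcribe meetings

The following sample code demonstrates how to transcribe a meeting in real time for two speakers. It assumes that you've already created a voice signature string for each speaker, as shown above. Substitute real information for subscriptionKey, region, and filepath (the path to the audio you want to transcribe).

If you don't use pre-enrolled user profiles, it takes a few more seconds to complete the first recognition of unknown users as speaker1, speaker2, and so on.

Note

Make sure that the same subscriptionKey is used across your application for signature creation, or you will encounter errors.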

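This sample code does the following:

Creates an AudioConfig from the sample .wav file to transcribe.
Creates a Meeting by using createMeetingAsync().
Creates a MeetingTranscriber by using the constructor.
Adds participants to the meeting. The strings voiceSignatureStringUser1 and voiceSignatureStringUser2 should come from the output of the steps above.
Registers for events and begins transcription.
If you want to differentiate speakers without providing voice samples, enable the DifferentiateGuestSpeakers feature as described in the meeting transcription overview.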

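If speaker identification or differentiation is enabled, the service keeps evaluating results by using accumulated audio information, even after you have received a transcribed result. If the service finds that any previous result was assigned an incorrect speakerId, it sends a nearly identical transcribed result again, in which only the speakerId and UtteranceId differ. Because the UtteranceId format is {index}_{speakerId}_{Offset}, when you receive a transcribed result you can use UtteranceId to determine whether it corrects a previous one. Your client or UI logic can decide how to behave, for example by overwriting the previous output or by ignoring the latest result.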
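
The "overwrite previous output" option can be sketched on the client side as follows. This is an illustration, not SDK code. It assumes that a corrected result keeps the same index and Offset as the result it replaces, that neither of those two parts contains an underscore, and that the speaker ID in the middle might. The class and function names, and the IDs used in the usage note, are hypothetical.

```python
from typing import NamedTuple


class UtteranceKey(NamedTuple):
    index: str
    speaker_id: str
    offset: str


def parse_utterance_id(utterance_id):
    """Split an ID of the form {index}_{speakerId}_{Offset}.

    Assumption: index and Offset contain no underscores, so the first and
    last underscores delimit the speaker ID.
    """
    index, rest = utterance_id.split("_", 1)
    speaker_id, offset = rest.rsplit("_", 1)
    return UtteranceKey(index, speaker_id, offset)


class TranscriptLog:
    """Keeps one entry per utterance so that a correction replaces the
    earlier speaker assignment instead of adding a duplicate line."""

    def __init__(self):
        self._entries = {}  # (index, offset) -> (speaker_id, text)

    def add(self, utterance_id, text):
        """Store a result; return True if it corrected an earlier speaker."""
        key = parse_utterance_id(utterance_id)
        slot = (key.index, key.offset)
        previous = self._entries.get(slot)
        self._entries[slot] = (key.speaker_id, text)
        return previous is not None and previous[0] != key.speaker_id

    def lines(self):
        return [f"{speaker}: {text}" for speaker, text in self._entries.values()]
```

For example, adding "3_Guest_0_1200000" and then "3_user1@example.com_1200000" with the same text leaves a single line attributed to user1@example.com, and the second call to add returns True.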

(function() {
    "use strict";
    var sdk = require("microsoft-cognitiveservices-speech-sdk");
    var fs = require("fs");
    
    var subscriptionKey = "your-subscription-key";
    var region = "your-region";
    var filepath = "audio-file-to-transcribe.wav"; // 8-channel audio
    
    var speechTranslationConfig = sdk.SpeechTranslationConfig.fromSubscription(subscriptionKey, region);
    var audioConfig = sdk.AudioConfig.fromWavFileInput(fs.readFileSync(filepath));
    speechTranslationConfig.setProperty("ConversationTranscriptionInRoomAndOnline", "true");

    // en-us by default. Adding this code to specify other languages, like zh-cn.
    speechTranslationConfig.speechRecognitionLanguage = "en-US";
    
    // create meeting and transcriber
    var meeting = sdk.Meeting.createMeetingAsync(speechTranslationConfig, "myMeeting");
    var transcriber = new sdk.MeetingTranscriber(audioConfig);
    
    // attach the transcriber to the meeting
    transcriber.joinMeetingAsync(meeting,
    function () {
        // add first participant using voiceSignature created in enrollment step
        var user1 = sdk.Participant.From("user1@example.com", "en-us", voiceSignatureStringUser1);
        meeting.addParticipantAsync(user1,
        function () {
            // add second participant using voiceSignature created in enrollment step
            var user2 = sdk.Participant.From("user2@example.com", "en-us", voiceSignatureStringUser2);
            meeting.addParticipantAsync(user2,
            function () {
                transcriber.sessionStarted = function(s, e) {
                    console.log("(sessionStarted)");
                };
                transcriber.sessionStopped = function(s, e) {
                    console.log("(sessionStopped)");
                };
                transcriber.canceled = function(s, e) {
                    console.log("(canceled)");
                };
                transcriber.transcribed = function(s, e) {
                    console.log("(transcribed) text: " + e.result.text);
                    console.log("(transcribed) speakerId: " + e.result.speakerId);
                };

                // begin meeting transcription
                transcriber.startTranscribingAsync(
                function () { },
                function (err) {
                    console.trace("err - starting transcription: " + err);
                });
            },
            function (err) {
                // error callback for addParticipantAsync(user2)
                console.trace("err - adding user2: " + err);
            });
        },
        function (err) {
            // error callback for addParticipantAsync(user1)
            console.trace("err - adding user1: " + err);
        });
    },
    function (err) {
        // error callback for joinMeetingAsync
        console.trace("err - " + err);
    });
}());

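Set up the environment

The Speech SDK is available as a NuGet package and implements .NET Standard 2.0. You install the Speech SDK later in this guide, but first check the platform-specific installation instructions for any additional requirements.

Create voice signatures

The following example shows how to create a voice signature by using the REST API in C#. You must insert your subscriptionKey, region, and the path to a sample .wav file.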

using System;
using System.IO;
using System.Net.Http;
using System.Runtime.Serialization;
using System.Threading.Tasks;
using Newtonsoft.Json;

[DataContract]
internal class VoiceSignature
{
    [DataMember]
    public string Status { get; private set; }

    [DataMember]
    public VoiceSignatureData Signature { get; private set; }

    [DataMember]
    public string Transcription { get; private set; }
}

[DataContract]
internal class VoiceSignatureData
{
    internal VoiceSignatureData()
    { }

    internal VoiceSignatureData(int version, string tag, string data)
    {
        this.Version = version;
        this.Tag = tag;
        this.Data = data;
    }

    [DataMember]
    public int Version { get; private set; }

    [DataMember]
    public string Tag { get; private set; }

    [DataMember]
    public string Data { get; private set; }
}

private static async Task<string> GetVoiceSignatureString()
{
    var subscriptionKey = "your-subscription-key";
    var region = "your-region";

    byte[] fileBytes = File.ReadAllBytes("path-to-voice-sample.wav");
    var content = new ByteArrayContent(fileBytes);
    var client = new HttpClient();
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
    var response = await client.PostAsync($"https://signature.{region}.cts.speech.microsoft.com/api/v1/Signature/GenerateVoiceSignatureFromByteArray", content);
    
    var jsonData = await response.Content.ReadAsStringAsync();
    var result = JsonConvert.DeserializeObject<VoiceSignature>(jsonData);
    return JsonConvert.SerializeObject(result.Signature);
}

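Running the function GetVoiceSignatureString() returns a voice signature string in the correct format. Run the function twice so that you have two strings to use as input to the variables voiceSignatureStringUser1 and voiceSignatureStringUser2 below.

Note

Voice signatures can only be created by using the REST API.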

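Transcribe meetings

If you don't use pre-enrolled user profiles, it takes a few more seconds to complete the first recognition of unknown users as speaker1, speaker2, and so on.

Note

Make sure that the same subscriptionKey is used across your application for signature creation, or you will encounter errors.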

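This sample code does the following:

Creates an AudioConfig from the sample .wav file to transcribe.
Creates a Meeting by using CreateMeetingAsync().
Creates a MeetingTranscriber by using the constructor, and subscribes to the necessary events.
Adds participants to the meeting. The strings voiceSignatureStringUser1 and voiceSignatureStringUser2 should come from the output of the function GetVoiceSignatureString() in the steps above.
Joins the meeting and begins transcription.
If you want to differentiate speakers without providing voice samples, enable the DifferentiateGuestSpeakers feature as described in the meeting transcription overview.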

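Note

AudioStreamReader is a helper class that you can get on GitHub.

If speaker identification or differentiation is enabled, the service keeps evaluating results by using accumulated audio information, even after you have received a Transcribed result. If the service finds that any previous result was assigned an incorrect UserId, it sends a nearly identical Transcribed result again, in which only the UserId and UtteranceId differ. Because the UtteranceId format is {index}_{UserId}_{Offset}, when you receive a Transcribed result you can use UtteranceId to determine whether it corrects a previous one. Your client or UI logic can decide how to behave, for example by overwriting the previous output or by ignoring the latest result.

Call the function TranscribeMeetingsAsync() to start meeting transcription.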

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Transcription;

class TranscribeMeeting
{
    // all your other code

    public static async Task TranscribeMeetingsAsync(string voiceSignatureStringUser1, string voiceSignatureStringUser2)
    {
        var subscriptionKey = "your-subscription-key";
        var region = "your-region";
        var filepath = "audio-file-to-transcribe.wav";

        var config = SpeechConfig.FromSubscription(subscriptionKey, region);
        config.SetProperty("ConversationTranscriptionInRoomAndOnline", "true");

        // en-us by default. Adding this code to specify other languages, like zh-cn.
        // config.SpeechRecognitionLanguage = "zh-cn";
        var stopRecognition = new TaskCompletionSource<int>();

        using (var audioInput = AudioConfig.FromWavFileInput(filepath))
        {
            var meetingID = Guid.NewGuid().ToString();
            using (var meeting = await Meeting.CreateMeetingAsync(config, meetingID))
            {
                // create a meeting transcriber using audio stream input
                using (var meetingTranscriber = new MeetingTranscriber(audioInput))
                {
                    meetingTranscriber.Transcribing += (s, e) =>
                    {
                        Console.WriteLine($"TRANSCRIBING: Text={e.Result.Text} SpeakerId={e.Result.UserId}");
                    };

                    meetingTranscriber.Transcribed += (s, e) =>
                    {
                        if (e.Result.Reason == ResultReason.RecognizedSpeech)
                        {
                            Console.WriteLine($"TRANSCRIBED: Text={e.Result.Text} SpeakerId={e.Result.UserId}");
                        }
                        else if (e.Result.Reason == ResultReason.NoMatch)
                        {
                            Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                        }
                    };

                    meetingTranscriber.Canceled += (s, e) =>
                    {
                        Console.WriteLine($"CANCELED: Reason={e.Reason}");

                        if (e.Reason == CancellationReason.Error)
                        {
                            Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                            Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                            Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
                            stopRecognition.TrySetResult(0);
                        }
                    };

                    meetingTranscriber.SessionStarted += (s, e) =>
                    {
                        Console.WriteLine($"\nSession started event. SessionId={e.SessionId}");
                    };

                    meetingTranscriber.SessionStopped += (s, e) =>
                    {
                        Console.WriteLine($"\nSession stopped event. SessionId={e.SessionId}");
                        Console.WriteLine("\nStop recognition.");
                        stopRecognition.TrySetResult(0);
                    };

                    // Add participants to the meeting.
                    var speaker1 = Participant.From("User1", "en-US", voiceSignatureStringUser1);
                    var speaker2 = Participant.From("User2", "en-US", voiceSignatureStringUser2);
                    await meeting.AddParticipantAsync(speaker1);
                    await meeting.AddParticipantAsync(speaker2);

                    // Join to the meeting and start transcribing
                    await meetingTranscriber.JoinMeetingAsync(meeting);
                    await meetingTranscriber.StartTranscribingAsync().ConfigureAwait(false);

                    // waits for completion, then stop transcription
                    Task.WaitAny(new[] { stopRecognition.Task });
                    await meetingTranscriber.StopTranscribingAsync().ConfigureAwait(false);
                }
            }
        }
    }
}

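Set up the environment

Before you can do anything, you need to install the Speech SDK for Python. You can install the Speech SDK from PyPI by running pip install azure-cognitiveservices-speech.

Create voice signatures

The following example shows how to create a voice signature by using the REST API in Python. You must insert your subscription key (speech_key), your region (service_region), and the paths to the sample .wav files.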

import json
import requests

speech_key, service_region = "your-subscription-key", "your-region"
endpoint = f"https://signature.{service_region}.cts.speech.microsoft.com/api/v1/Signature/GenerateVoiceSignatureFromByteArray"

#Enrollment audio for each speaker. In this example, two speaker enrollment audio files are added.
enrollment_audio_speaker1 = "enrollment-audio-speaker1.wav"
enrollment_audio_speaker2 = "enrollment-audio-speaker2.wav"

def voice_data_converter(enrollment_audio):
  with open(enrollment_audio, "rb") as wav_file:
    input_wav = wav_file.read()
  return input_wav
  
def voice_signature_creator(endpoint, speech_key, enrollment_audio):
  data = voice_data_converter(enrollment_audio)
  headers = {"Ocp-Apim-Subscription-Key":speech_key}
  r = requests.post(url = endpoint,headers = headers, data = data)
  voice_signature_string = json.dumps(r.json()['Signature'])
  return voice_signature_string

voice_signature_user1 = voice_signature_creator(endpoint, speech_key, enrollment_audio_speaker1)
voice_signature_user2 = voice_signature_creator(endpoint, speech_key, enrollment_audio_speaker2)

You can use these two voice signature strings as input to the variables voice_signature_user1 and voice_signature_user2 in the sample code later.
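
Before passing a voice signature string to the transcription code, a quick sanity check can catch a failed enrollment early. The following is a minimal sketch. The field names Version, Tag, and Data are taken from the C# data contract shown earlier in this article, and the function name is hypothetical; the check does not verify that the service will accept the signature.

```python
import json

# Field names as they appear in the VoiceSignatureData class in the C# sample.
REQUIRED_FIELDS = ("Version", "Tag", "Data")


def is_valid_voice_signature(signature_string):
    """Return True if the string is a JSON object with the expected fields."""
    try:
        signature = json.loads(signature_string)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON.
        return False
    return isinstance(signature, dict) and all(
        field in signature for field in REQUIRED_FIELDS
    )
```

For instance, if the response carried no usable signature and the serialized value is the string "null", the function returns False.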

Note

Voice signatures can only be created by using the REST API.

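Transcribe meetings

If you don't use pre-enrolled user profiles, it takes a few more seconds to complete the first recognition of unknown users as speaker1, speaker2, and so on.

Note

Make sure that the same subscriptionKey is used across your application for signature creation, or you will encounter errors.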
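
One way to keep the key consistent, as the note above requires, is to load it once and pass the same value to both the signature request and the Speech SDK configuration. The following is a minimal sketch; the environment variable names SPEECH_KEY and SPEECH_REGION and the SpeechSettings class are assumptions made for illustration, not something the samples in this article define.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechSettings:
    key: str
    region: str


def load_speech_settings(env=os.environ):
    """Read the key and region once so every call site shares the same values.

    SPEECH_KEY and SPEECH_REGION are hypothetical variable names.
    """
    key = env.get("SPEECH_KEY", "")
    region = env.get("SPEECH_REGION", "")
    if not key or not region:
        raise RuntimeError("Set SPEECH_KEY and SPEECH_REGION before running the samples")
    return SpeechSettings(key=key, region=region)
```

You could then call load_speech_settings() once and use settings.key and settings.region wherever the samples use speech_key and service_region.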

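Here's what the sample does:

Creates a speech configuration with your subscription information.
Creates an audio configuration by using a push stream.
Creates a MeetingTranscriber and subscribes to the events fired by meeting transcription.
Creates a meeting identifier to use when creating the meeting.
Adds participants to the meeting. The strings voice_signature_user1 and voice_signature_user2 should come from the output of the steps above.
Reads the whole wave file at once, streams it to the SDK, and begins transcription.
If you want to differentiate speakers without providing voice samples, enable the DifferentiateGuestSpeakers feature as described in the meeting transcription overview.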

import azure.cognitiveservices.speech as speechsdk
import time
import uuid
from scipy.io import wavfile

speech_key, service_region="your-subscription-key","your-region"
meetingfilename= "audio-file-to-transcribe.wav" # 8 channel, 16 bits, 16kHz audio

def meeting_transcription():
    
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    speech_config.set_property_by_name("ConversationTranscriptionInRoomAndOnline", "true")
    # If you want to differentiate speakers without providing voice samples, uncomment the following line.
    # speech_config.set_property_by_name("DifferentiateGuestSpeakers", "true")

    channels = 8
    bits_per_sample = 16
    samples_per_second = 16000
    
    wave_format = speechsdk.audio.AudioStreamFormat(samples_per_second, bits_per_sample, channels)
    stream = speechsdk.audio.PushAudioInputStream(stream_format=wave_format)
    audio_config = speechsdk.audio.AudioConfig(stream=stream)

    transcriber = speechsdk.transcription.MeetingTranscriber(audio_config)

    meeting_id = str(uuid.uuid4())
    meeting = speechsdk.transcription.Meeting(speech_config, meeting_id)
    done = False

    def stop_cb(evt: speechsdk.SessionEventArgs):
        """callback that signals to stop continuous transcription upon receiving an event `evt`"""
        print('CLOSING {}'.format(evt))
        nonlocal done
        done = True
        
    transcriber.transcribed.connect(lambda evt: print('TRANSCRIBED: {}'.format(evt)))
    transcriber.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    transcriber.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    transcriber.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    # stop continuous transcription on either session stopped or canceled events
    transcriber.session_stopped.connect(stop_cb)
    transcriber.canceled.connect(stop_cb)

    # Note user voice signatures are not required for speaker differentiation.
    # Use voice signatures when adding participants when more enhanced speaker identification is required.
    user1 = speechsdk.transcription.Participant("user1@example.com", "en-us", voice_signature_user1)
    user2 = speechsdk.transcription.Participant("user2@example.com", "en-us", voice_signature_user2)

    meeting.add_participant_async(user1).get()
    meeting.add_participant_async(user2).get()
    transcriber.join_meeting_async(meeting).get()
    transcriber.start_transcribing_async()
    
    sample_rate, wav_data = wavfile.read(meetingfilename)
    stream.write(wav_data.tobytes())
    stream.close()
    while not done:
        time.sleep(.5)

    transcriber.stop_transcribing_async()
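
The sample above reads the whole file into memory before writing it to the push stream. For longer recordings, a chunked write might be preferable. The following is a minimal sketch that uses the standard wave module. The function name and the chunk size are hypothetical, and stream can be any object with write(bytes) and close() methods, which is how the sample above uses its push stream.

```python
import wave


def stream_wav_in_chunks(path, stream, expected_channels=8, frames_per_chunk=3200):
    """Write the PCM frames of a WAV file to `stream` chunk by chunk.

    The stream is closed when writing finishes or fails, to signal end of
    audio. Returns the number of bytes written.
    """
    total_bytes = 0
    try:
        with wave.open(path, "rb") as wav:
            if wav.getnchannels() != expected_channels:
                raise ValueError(
                    f"expected {expected_channels} channels, "
                    f"found {wav.getnchannels()}"
                )
            while True:
                frames = wav.readframes(frames_per_chunk)
                if not frames:
                    break
                stream.write(frames)
                total_bytes += len(frames)
    finally:
        stream.close()
    return total_bytes
```

In the sample, this would stand in for the wavfile.read, stream.write, and stream.close lines, for example stream_wav_in_chunks(meetingfilename, stream).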

Next steps

Asynchronous meeting transcription
