分享方式:


快速入門:建立即時自動分段標記

參考文件 | 套件 (NuGet) | GitHub 上的其他範例

在本快速入門中,您會使用即時自動分段標記來執行語音轉換文字謄寫的應用程式。 自動分段標記針對參與交談的不同說話者進行區分。 語音服務提供有關哪位說話者正在講出所謄寫語音之特定部分的資訊。

說話者資訊包含在說話者識別碼欄位內的結果中。 說話者識別碼是服務在所提供的音訊內容中識別不同說話者時指派給每位交談參與者的通用識別碼。

提示

您可以在 Speech Studio 中嘗試即時語音轉換文字,而不需要註冊或撰寫任何程式碼。 不過,Speech Studio 尚不支援自動分段標記。

必要條件

設定環境

語音 SDK 可以 NuGet 套件的形式取得,並且實作 .NET Standard 2.0。 您會在本指南的後續部分中安裝語音 SDK,但請先參閱 SDK 安裝指南,了解其他需求。

設定環境變數

您必須驗證應用程式以存取 Azure AI 服務。 本文說明如何使用環境變數來儲存您的認證。 然後,您可以從程式碼存取環境變數,以驗證您的應用程式。 針對實際執行環境,使用更安全的方法來儲存和存取您的認證。

重要

我們建議使用適用於 Azure 資源的受控識別搭配 Microsoft Entra ID 驗證,以避免使用在雲端執行的應用程式儲存認證。

如果您使用 API 金鑰,請將其安全地儲存在別處,例如 Azure Key Vault。 請勿在程式碼中直接包含 API 金鑰,且切勿公開張貼金鑰。

如需 AI 服務安全性的詳細資訊,請參閱驗證對 Azure AI 服務的要求

若要設定語音資源索引鍵和區域的環境變數,請開啟主控台視窗,並遵循作業系統和開發環境的指示進行。

  • 若要設定 SPEECH_KEY 環境變數,請以您其中一個資源金鑰取代 your-key
  • 若要設定 SPEECH_REGION 環境變數,請以您的其中一個資源區域取代 your-region
setx SPEECH_KEY your-key
setx SPEECH_REGION your-region

注意

如果您只需要存取目前主控台的環境變數,您可以使用 set (而不是 setx) 來設定環境變數。

新增環境變數之後,您可能需要重新啟動任何需要讀取環境變數的程式,包括主控台視窗。 例如,如果正在使用 Visual Studio 作為編輯器,請您在執行範例前重新啟動 Visual Studio。

使用交談謄寫從檔案實作自動分段標記

遵循下列步驟來建立主控台應用程式並安裝語音 SDK。

  1. 在資料夾開啟要新增專案的命令提示字元視窗。 執行此命令以使用 .NET CLI 建立主控台應用程式。

    dotnet new console
    

    此命令會在您的專案目錄中建立 Program.cs 檔案。

  2. 使用 .NET CLI 在新專案中安裝語音 SDK。

    dotnet add package Microsoft.CognitiveServices.Speech
    
  3. 以下列程式碼取代 Program.cs 的內容。

    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;
    using Microsoft.CognitiveServices.Speech.Transcription;
    
    class Program 
    {
        // This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
        static string speechKey = Environment.GetEnvironmentVariable("SPEECH_KEY");
        static string speechRegion = Environment.GetEnvironmentVariable("SPEECH_REGION");
    
        async static Task Main(string[] args)
        {
            var filepath = "katiesteve.wav";
            var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);        
            speechConfig.SpeechRecognitionLanguage = "en-US";
            speechConfig.SetProperty(PropertyId.SpeechServiceResponse_DiarizeIntermediateResults, "true"); 
    
            var stopRecognition = new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);
    
            // Create an audio stream from a wav file or from the default microphone
            using (var audioConfig = AudioConfig.FromWavFileInput(filepath))
            {
                // Create a conversation transcriber using audio stream input
                using (var conversationTranscriber = new ConversationTranscriber(speechConfig, audioConfig))
                {
                    conversationTranscriber.Transcribing += (s, e) =>
                    {
                        Console.WriteLine($"TRANSCRIBING: Text={e.Result.Text} Speaker ID={e.Result.SpeakerId}");
                    };
    
                    conversationTranscriber.Transcribed += (s, e) =>
                    {
                        if (e.Result.Reason == ResultReason.RecognizedSpeech)
                        {
                            Console.WriteLine();
                            Console.WriteLine($"TRANSCRIBED: Text={e.Result.Text} Speaker ID={e.Result.SpeakerId}");
                            Console.WriteLine();
                        }
                        else if (e.Result.Reason == ResultReason.NoMatch)
                        {
                            Console.WriteLine($"NOMATCH: Speech could not be transcribed.");
                        }
                    };
    
                    conversationTranscriber.Canceled += (s, e) =>
                    {
                        Console.WriteLine($"CANCELED: Reason={e.Reason}");
    
                        if (e.Reason == CancellationReason.Error)
                        {
                            Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                            Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                            Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
                            stopRecognition.TrySetResult(0);
                        }
    
                        stopRecognition.TrySetResult(0);
                    };
    
                    conversationTranscriber.SessionStopped += (s, e) =>
                    {
                        Console.WriteLine("\n    Session stopped event.");
                        stopRecognition.TrySetResult(0);
                    };
    
                    await conversationTranscriber.StartTranscribingAsync();
    
                    // Waits for completion. Use Task.WaitAny to keep the task rooted.
                    Task.WaitAny(new[] { stopRecognition.Task });
    
                    await conversationTranscriber.StopTranscribingAsync();
                }
            }
        }
    }
    
  4. 取得範例音訊檔案 (英文) 或使用您自己的 .wav 檔案。 將 katiesteve.wav 取代為您 .wav 檔案的路徑和名稱。

    應用程式會辨識交談中多個參與者的語音。 您的音訊檔案應該包含多個說話者。

  5. 若要變更語音辨識語言,請以另一種支援的語言取代 en-US。 例如,es-ES 代表西班牙文 (西班牙)。 如果您未指定語言,則預設語言為 en-US。 如需詳細了解如何識別所可能說出的多種語言之一,請參閱語言識別

  6. 執行主控台應用程式以開始交談謄寫:

    dotnet run
    

重要

確保您已設定 SPEECH_KEYSPEECH_REGION 環境變數。 如果您未設定這些變數,則範例會失敗,並顯示錯誤訊息。

謄寫的交談應輸出為文字:

TRANSCRIBING: Text=good morning steve Speaker ID=Unknown
TRANSCRIBING: Text=good morning steve how are Speaker ID=Guest-1
TRANSCRIBING: Text=good morning steve how are you doing today Speaker ID=Guest-1

TRANSCRIBED: Text=Good morning, Steve. How are you doing today? Speaker ID=Guest-1

TRANSCRIBING: Text=good morning katie Speaker ID=Unknown
TRANSCRIBING: Text=good morning katie i hope Speaker ID=Guest-2
TRANSCRIBING: Text=good morning katie i hope you're having a great Speaker ID=Guest-2
TRANSCRIBING: Text=good morning katie i hope you're having a great start to Speaker ID=Guest-2
TRANSCRIBING: Text=good morning katie i hope you're having a great start to your day Speaker ID=Guest-2

TRANSCRIBED: Text=Good morning, Katie. I hope you're having a great start to your day. Speaker ID=Guest-2

TRANSCRIBING: Text=have you tried Speaker ID=Unknown
TRANSCRIBING: Text=have you tried the latest Speaker ID=Unknown
TRANSCRIBING: Text=have you tried the latest real time Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech service Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech service which Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech service which can Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech service which can tell you Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech service which can tell you who said Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech service which can tell you who said what Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech service which can tell you who said what in Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech service which can tell you who said what in real time Speaker ID=Guest-1

TRANSCRIBED: Text=Have you tried the latest real time diarization in Microsoft Speech Service which can tell you who said what in real time? Speaker ID=Guest-1

TRANSCRIBING: Text=not yet Speaker ID=Unknown
TRANSCRIBING: Text=not yet i Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch trans Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization Speaker ID=Guest-2  
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization function Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produc Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces di Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature Speaker ID=Guest-2  
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to di Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to diarize Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to diarize in real Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to diarize in real time Speaker ID=Guest-2

TRANSCRIBED: Text=Not yet. I've been using the batch transcription with diarization functionality, but it produces diarization results after the whole audio is processed. Is the new feature able to diarize in real time? Speaker ID=Guest-2

TRANSCRIBING: Text=absolutely Speaker ID=Unknown
TRANSCRIBING: Text=absolutely i Speaker ID=Unknown
TRANSCRIBING: Text=absolutely i recom Speaker ID=Guest-1
TRANSCRIBING: Text=absolutely i recommend Speaker ID=Guest-1
TRANSCRIBING: Text=absolutely i recommend you give it a try Speaker ID=Guest-1

TRANSCRIBED: Text=Absolutely, I recommend you give it a try. Speaker ID=Guest-1

TRANSCRIBING: Text=that's exc Speaker ID=Unknown
TRANSCRIBING: Text=that's exciting Speaker ID=Unknown
TRANSCRIBING: Text=that's exciting let me try Speaker ID=Guest-2
TRANSCRIBING: Text=that's exciting let me try it right now Speaker ID=Guest-2

TRANSCRIBED: Text=That's exciting. Let me try it right now. Speaker ID=Guest-2

根據交談中的說話者數目,說話者會被識別為 Guest-1、Guest-2 等等。

注意

當尚未識別說話者時,您可能會在一些早期中繼結果中看到 Speaker ID=Unknown。 若沒有中繼的自動分段標記結果 (如果您未將 PropertyId.SpeechServiceResponse_DiarizeIntermediateResults 屬性設定為 “true”),則說話者識別碼一律為 “Unknown”。

清除資源

您可以使用 Azure 入口網站Azure 命令列介面 (CLI) 來移除您所建立的語音資源。

參考文件 | 套件 (NuGet) | GitHub 上的其他範例

在本快速入門中,您會使用即時自動分段標記來執行語音轉換文字謄寫的應用程式。 自動分段標記針對參與交談的不同說話者進行區分。 語音服務提供有關哪位說話者正在講出所謄寫語音之特定部分的資訊。

說話者資訊包含在說話者識別碼欄位內的結果中。 說話者識別碼是服務在所提供的音訊內容中識別不同說話者時指派給每位交談參與者的通用識別碼。

提示

您可以在 Speech Studio 中嘗試即時語音轉換文字,而不需要註冊或撰寫任何程式碼。 不過,Speech Studio 尚不支援自動分段標記。

必要條件

設定環境

語音 SDK 可以 NuGet 套件的形式取得,並且實作 .NET Standard 2.0。 您會在本指南的後續部分中安裝語音 SDK,但請先參閱 SDK 安裝指南,了解其他需求。

設定環境變數

您必須驗證應用程式以存取 Azure AI 服務。 本文說明如何使用環境變數來儲存您的認證。 然後,您可以從程式碼存取環境變數,以驗證您的應用程式。 針對實際執行環境,使用更安全的方法來儲存和存取您的認證。

重要

我們建議使用適用於 Azure 資源的受控識別搭配 Microsoft Entra ID 驗證,以避免使用在雲端執行的應用程式儲存認證。

如果您使用 API 金鑰,請將其安全地儲存在別處,例如 Azure Key Vault。 請勿在程式碼中直接包含 API 金鑰,且切勿公開張貼金鑰。

如需 AI 服務安全性的詳細資訊,請參閱驗證對 Azure AI 服務的要求

若要設定語音資源索引鍵和區域的環境變數,請開啟主控台視窗,並遵循作業系統和開發環境的指示進行。

  • 若要設定 SPEECH_KEY 環境變數,請以您其中一個資源金鑰取代 your-key
  • 若要設定 SPEECH_REGION 環境變數,請以您的其中一個資源區域取代 your-region
setx SPEECH_KEY your-key
setx SPEECH_REGION your-region

注意

如果您只需要存取目前主控台的環境變數,您可以使用 set (而不是 setx) 來設定環境變數。

新增環境變數之後,您可能需要重新啟動任何需要讀取環境變數的程式,包括主控台視窗。 例如,如果正在使用 Visual Studio 作為編輯器,請您在執行範例前重新啟動 Visual Studio。

使用交談謄寫從檔案實作自動分段標記

遵循下列步驟來建立主控台應用程式並安裝語音 SDK。

  1. Visual Studio Community 2022 中建立名為 ConversationTranscription 的新 C++ 主控台專案。

  2. 選取 [工具]>[Nuget 套件管理員]>[套件管理員主控台]。 在套件管理員主控台中,執行此命令。

    Install-Package Microsoft.CognitiveServices.Speech
    
  3. 以下列程式碼取代 ConversationTranscription.cpp 的內容。

    #include <iostream> 
    #include <stdlib.h>
    #include <speechapi_cxx.h>
    #include <future>
    
    using namespace Microsoft::CognitiveServices::Speech;
    using namespace Microsoft::CognitiveServices::Speech::Audio;
    using namespace Microsoft::CognitiveServices::Speech::Transcription;
    
    std::string GetEnvironmentVariable(const char* name);
    
    int main()
    {
        // This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
        auto speechKey = GetEnvironmentVariable("SPEECH_KEY");
        auto speechRegion = GetEnvironmentVariable("SPEECH_REGION");
    
        if ((size(speechKey) == 0) || (size(speechRegion) == 0)) {
            std::cout << "Please set both SPEECH_KEY and SPEECH_REGION environment variables." << std::endl;
            return -1;
        }
    
        auto speechConfig = SpeechConfig::FromSubscription(speechKey, speechRegion);
        speechConfig->SetProperty(PropertyId::SpeechServiceResponse_DiarizeIntermediateResults, "true"); 
    
        speechConfig->SetSpeechRecognitionLanguage("en-US");
    
        auto audioConfig = AudioConfig::FromWavFileInput("katiesteve.wav");
        auto conversationTranscriber = ConversationTranscriber::FromConfig(speechConfig, audioConfig);
    
        // promise for synchronization of recognition end.
        std::promise<void> recognitionEnd;
    
        // Subscribes to events.
        conversationTranscriber->Transcribing.Connect([](const ConversationTranscriptionEventArgs& e)
            {
                std::cout << "TRANSCRIBING:" << e.Result->Text << std::endl;
                std::cout << "Speaker ID=" << e.Result->SpeakerId << std::endl;
            });
    
        conversationTranscriber->Transcribed.Connect([](const ConversationTranscriptionEventArgs& e)
            {
                if (e.Result->Reason == ResultReason::RecognizedSpeech)
                {
                    std::cout << "\n" << "TRANSCRIBED: Text=" << e.Result->Text << std::endl;
                    std::cout << "Speaker ID=" << e.Result->SpeakerId << "\n" << std::endl;
                }
                else if (e.Result->Reason == ResultReason::NoMatch)
                {
                    std::cout << "NOMATCH: Speech could not be transcribed." << std::endl;
                }
            });
    
        conversationTranscriber->Canceled.Connect([&recognitionEnd](const ConversationTranscriptionCanceledEventArgs& e)
            {
                auto cancellation = CancellationDetails::FromResult(e.Result);
                std::cout << "CANCELED: Reason=" << (int)cancellation->Reason << std::endl;
    
                if (cancellation->Reason == CancellationReason::Error)
                {
                    std::cout << "CANCELED: ErrorCode=" << (int)cancellation->ErrorCode << std::endl;
                    std::cout << "CANCELED: ErrorDetails=" << cancellation->ErrorDetails << std::endl;
                    std::cout << "CANCELED: Did you set the speech resource key and region values?" << std::endl;
                }
                else if (cancellation->Reason == CancellationReason::EndOfStream)
                {
                    std::cout << "CANCELED: Reach the end of the file." << std::endl;
                }
            });
    
        conversationTranscriber->SessionStopped.Connect([&recognitionEnd](const SessionEventArgs& e)
            {
                std::cout << "Session stopped.";
                recognitionEnd.set_value(); // Notify to stop recognition.
            });
    
        conversationTranscriber->StartTranscribingAsync().wait();
    
        // Waits for recognition end.
        recognitionEnd.get_future().wait();
    
        conversationTranscriber->StopTranscribingAsync().wait();
    }
    
    std::string GetEnvironmentVariable(const char* name)
    {
    #if defined(_MSC_VER)
        size_t requiredSize = 0;
        (void)getenv_s(&requiredSize, nullptr, 0, name);
        if (requiredSize == 0)
        {
            return "";
        }
        auto buffer = std::make_unique<char[]>(requiredSize);
        (void)getenv_s(&requiredSize, buffer.get(), requiredSize, name);
        return buffer.get();
    #else
        auto value = getenv(name);
        return value ? value : "";
    #endif
    }
    
  4. 取得範例音訊檔案 (英文) 或使用您自己的 .wav 檔案。 將 katiesteve.wav 取代為您 .wav 檔案的路徑和名稱。

    應用程式會辨識交談中多個參與者的語音。 您的音訊檔案應該包含多個說話者。

  5. 若要變更語音辨識語言,請以另一種支援的語言取代 en-US。 例如,es-ES 代表西班牙文 (西班牙)。 如果您未指定語言,則預設語言為 en-US。 如需詳細了解如何識別所可能說出的多種語言之一,請參閱語言識別

  6. 執行並執行您的應用程式,以開始交談謄寫:

    重要

    確保您已設定 SPEECH_KEYSPEECH_REGION 環境變數。 如果您未設定這些變數,則範例會失敗,並顯示錯誤訊息。

謄寫的交談應輸出為文字:

TRANSCRIBING:good morning
Speaker ID=Unknown
TRANSCRIBING:good morning steve
Speaker ID=Unknown
TRANSCRIBING:good morning steve how are you doing
Speaker ID=Guest-1
TRANSCRIBING:good morning steve how are you doing today
Speaker ID=Guest-1

TRANSCRIBED: Text=Good morning, Steve. How are you doing today?
Speaker ID=Guest-1

TRANSCRIBING:good
Speaker ID=Unknown
TRANSCRIBING:good morning
Speaker ID=Unknown
TRANSCRIBING:good morning kat
Speaker ID=Unknown
TRANSCRIBING:good morning katie i hope you're having a
Speaker ID=Guest-2
TRANSCRIBING:good morning katie i hope you're having a great start to your day
Speaker ID=Guest-2

TRANSCRIBED: Text=Good morning, Katie. I hope you're having a great start to your day.
Speaker ID=Guest-2

TRANSCRIBING:have you
Speaker ID=Unknown
TRANSCRIBING:have you tried
Speaker ID=Unknown
TRANSCRIBING:have you tried the latest
Speaker ID=Unknown
TRANSCRIBING:have you tried the latest real
Speaker ID=Guest-1
TRANSCRIBING:have you tried the latest real time
Speaker ID=Guest-1
TRANSCRIBING:have you tried the latest real time diarization
Speaker ID=Guest-1
TRANSCRIBING:have you tried the latest real time diarization in
Speaker ID=Guest-1
TRANSCRIBING:have you tried the latest real time diarization in microsoft
Speaker ID=Guest-1
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech
Speaker ID=Guest-1
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech service
Speaker ID=Guest-1
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech service which
Speaker ID=Guest-1
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech service which can
Speaker ID=Guest-1
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech service which can tell you
Speaker ID=Guest-1
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech service which can tell you who said
Speaker ID=Guest-1
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech service which can tell you who said what
Speaker ID=Guest-1
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech service which can tell you who said what in
Speaker ID=Guest-1
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech service which can tell you who said what in real time
Speaker ID=Guest-1

TRANSCRIBED: Text=Have you tried the latest real time diarization in Microsoft Speech Service which can tell you who said what in real time?
Speaker ID=Guest-1

TRANSCRIBING:not yet
Speaker ID=Unknown
TRANSCRIBING:not yet i
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch trans
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization function
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces di
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to di
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to diarize
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to diarize in real
Speaker ID=Guest-2
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to diarize in real time
Speaker ID=Guest-2

TRANSCRIBED: Text=Not yet. I've been using the batch transcription with diarization functionality, but it produces diarization results after the whole audio is processed. Is the new feature able to diarize in real time?
Speaker ID=Guest-2

TRANSCRIBING:absolutely
Speaker ID=Unknown
TRANSCRIBING:absolutely i
Speaker ID=Unknown
TRANSCRIBING:absolutely i recom
Speaker ID=Guest-1
TRANSCRIBING:absolutely i recommend
Speaker ID=Guest-1
TRANSCRIBING:absolutely i recommend you
Speaker ID=Guest-1
TRANSCRIBING:absolutely i recommend you give it a try
Speaker ID=Guest-1

TRANSCRIBED: Text=Absolutely, I recommend you give it a try.
Speaker ID=Guest-1

TRANSCRIBING:that's exc
Speaker ID=Unknown
TRANSCRIBING:that's exciting
Speaker ID=Unknown
TRANSCRIBING:that's exciting let me
Speaker ID=Guest-2
TRANSCRIBING:that's exciting let me try
Speaker ID=Guest-2
TRANSCRIBING:that's exciting let me try it right now
Speaker ID=Guest-2

TRANSCRIBED: Text=That's exciting. Let me try it right now.
Speaker ID=Guest-2

根據交談中的說話者數目,說話者會被識別為 Guest-1、Guest-2 等等。

注意

當尚未識別說話者時,您可能會在一些早期中繼結果中看到 Speaker ID=Unknown。 若沒有中繼的自動分段標記結果 (如果您未將 PropertyId::SpeechServiceResponse_DiarizeIntermediateResults 屬性設定為 “true”),則說話者識別碼一律為 “Unknown”。

清除資源

您可以使用 Azure 入口網站Azure 命令列介面 (CLI) 來移除您所建立的語音資源。

參考文件 | 套件 (Go) | GitHub 上的其他範例

適用於 Go 程式設計語言的語音 SDK 不支援交談轉譯。 請選取其他程式設計語言,或本文開頭的 Go 參考和樣本連結。

參考文件 | GitHub 上的其他範例

在本快速入門中,您會使用即時自動分段標記來執行語音轉換文字謄寫的應用程式。 自動分段標記針對參與交談的不同說話者進行區分。 語音服務提供有關哪位說話者正在講出所謄寫語音之特定部分的資訊。

說話者資訊包含在說話者識別碼欄位內的結果中。 說話者識別碼是服務在所提供的音訊內容中識別不同說話者時指派給每位交談參與者的通用識別碼。

提示

您可以在 Speech Studio 中嘗試即時語音轉換文字,而不需要註冊或撰寫任何程式碼。 不過,Speech Studio 尚不支援自動分段標記。

必要條件

設定環境

若要設定環境,請 安裝語音 SDK。 本快速入門中的範例適用於 JAVA 執行階段

  1. 安裝 Apache Maven。 然後執行 mvn -v 以確認安裝成功。

  2. 在專案的根目錄中建立新 pom.xml 檔案,並將下列內容複製到其中:

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
        <groupId>com.microsoft.cognitiveservices.speech.samples</groupId>
        <artifactId>quickstart-eclipse</artifactId>
        <version>1.0.0-SNAPSHOT</version>
        <build>
            <sourceDirectory>src</sourceDirectory>
            <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.7.0</version>
                <configuration>
                <source>1.8</source>
                <target>1.8</target>
                </configuration>
            </plugin>
            </plugins>
        </build>
        <dependencies>
            <dependency>
            <groupId>com.microsoft.cognitiveservices.speech</groupId>
            <artifactId>client-sdk</artifactId>
            <version>1.40.0</version>
            </dependency>
        </dependencies>
    </project>
    
  3. 安裝語音 SDK 和相依性。

    mvn clean dependency:copy-dependencies
    

設定環境變數

您必須驗證應用程式以存取 Azure AI 服務。 本文說明如何使用環境變數來儲存您的認證。 然後,您可以從程式碼存取環境變數,以驗證您的應用程式。 針對實際執行環境,使用更安全的方法來儲存和存取您的認證。

重要

我們建議使用適用於 Azure 資源的受控識別搭配 Microsoft Entra ID 驗證,以避免使用在雲端執行的應用程式儲存認證。

如果您使用 API 金鑰,請將其安全地儲存在別處,例如 Azure Key Vault。 請勿在程式碼中直接包含 API 金鑰,且切勿公開張貼金鑰。

如需 AI 服務安全性的詳細資訊,請參閱驗證對 Azure AI 服務的要求

若要設定語音資源索引鍵和區域的環境變數,請開啟主控台視窗,並遵循作業系統和開發環境的指示進行。

  • 若要設定 SPEECH_KEY 環境變數,請以您其中一個資源金鑰取代 your-key
  • 若要設定 SPEECH_REGION 環境變數,請以您的其中一個資源區域取代 your-region
setx SPEECH_KEY your-key
setx SPEECH_REGION your-region

注意

如果您只需要存取目前主控台的環境變數,您可以使用 set (而不是 setx) 來設定環境變數。

新增環境變數之後,您可能需要重新啟動任何需要讀取環境變數的程式,包括主控台視窗。 例如,如果正在使用 Visual Studio 作為編輯器,請您在執行範例前重新啟動 Visual Studio。

使用交談謄寫從檔案實作自動分段標記

遵循下列步驟來建立主控台應用程式進行交談謄寫。

  1. 在相同的專案根目錄中建立名為 ConversationTranscription.java 的新檔案。

  2. 將下列程式碼複製到 ConversationTranscription.java

    import com.microsoft.cognitiveservices.speech.*;
    import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
    import com.microsoft.cognitiveservices.speech.transcription.*;
    
    import java.util.concurrent.Semaphore;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.Future;
    
    public class ConversationTranscription {
        // This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
        private static String speechKey = System.getenv("SPEECH_KEY");
        private static String speechRegion = System.getenv("SPEECH_REGION");
    
        public static void main(String[] args) throws InterruptedException, ExecutionException {
    
            SpeechConfig speechConfig = SpeechConfig.fromSubscription(speechKey, speechRegion);
            speechConfig.setSpeechRecognitionLanguage("en-US");
            AudioConfig audioInput = AudioConfig.fromWavFileInput("katiesteve.wav");
            speechConfig.setProperty(PropertyId.SpeechServiceResponse_DiarizeIntermediateResults, "true");
    
            Semaphore stopRecognitionSemaphore = new Semaphore(0);
    
            ConversationTranscriber conversationTranscriber = new ConversationTranscriber(speechConfig, audioInput);
            {
                // Subscribes to events.
                conversationTranscriber.transcribing.addEventListener((s, e) -> {
                    System.out.println("TRANSCRIBING: Text=" + e.getResult().getText() + " Speaker ID=" + e.getResult().getSpeakerId() );
                });
    
                conversationTranscriber.transcribed.addEventListener((s, e) -> {
                    if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
                        System.out.println();
                        System.out.println("TRANSCRIBED: Text=" + e.getResult().getText() + " Speaker ID=" + e.getResult().getSpeakerId() );
                        System.out.println();
                    }
                    else if (e.getResult().getReason() == ResultReason.NoMatch) {
                        System.out.println("NOMATCH: Speech could not be transcribed.");
                    }
                });
    
                conversationTranscriber.canceled.addEventListener((s, e) -> {
                    System.out.println("CANCELED: Reason=" + e.getReason());
    
                    if (e.getReason() == CancellationReason.Error) {
                        System.out.println("CANCELED: ErrorCode=" + e.getErrorCode());
                        System.out.println("CANCELED: ErrorDetails=" + e.getErrorDetails());
                        System.out.println("CANCELED: Did you update the subscription info?");
                    }
    
                    stopRecognitionSemaphore.release();
                });
    
                conversationTranscriber.sessionStarted.addEventListener((s, e) -> {
                    System.out.println("\n    Session started event.");
                });
    
                conversationTranscriber.sessionStopped.addEventListener((s, e) -> {
                    System.out.println("\n    Session stopped event.");
                });
    
                conversationTranscriber.startTranscribingAsync().get();
    
                // Waits for completion.
                stopRecognitionSemaphore.acquire();
    
                conversationTranscriber.stopTranscribingAsync().get();
            }
    
            speechConfig.close();
            audioInput.close();
            conversationTranscriber.close();
    
            System.exit(0);
        }
    }
    
  3. 取得範例音訊檔案 (英文) 或使用您自己的 .wav 檔案。 將 katiesteve.wav 取代為您 .wav 檔案的路徑和名稱。

    應用程式會辨識交談中多個參與者的語音。 您的音訊檔案應該包含多個說話者。

  4. 若要變更語音辨識語言,請以另一種支援的語言取代 en-US。 例如,es-ES 代表西班牙文 (西班牙)。 如果您未指定語言,則預設語言為 en-US。 如需詳細了解如何識別所可能說出的多種語言之一,請參閱語言識別

  5. 執行新的主控台應用程式,以開始交談謄寫:

    javac ConversationTranscription.java -cp ".;target\dependency\*"
    java -cp ".;target\dependency\*" ConversationTranscription
    

重要

確保您已設定 SPEECH_KEYSPEECH_REGION 環境變數。 如果您未設定這些變數,則範例會失敗,並顯示錯誤訊息。

謄寫的交談應輸出為文字:

TRANSCRIBING: Text=good morning Speaker ID=Unknown
TRANSCRIBING: Text=good morning steve Speaker ID=Unknown
TRANSCRIBING: Text=good morning steve how Speaker ID=Guest-1
TRANSCRIBING: Text=good morning steve how are you doing Speaker ID=Guest-1
TRANSCRIBING: Text=good morning steve how are you doing today Speaker ID=Guest-1

TRANSCRIBED: Text=Good morning, Steve. How are you doing today? Speaker ID=Guest-1

TRANSCRIBING: Text=good morning katie i hope Speaker ID=Guest-2
TRANSCRIBING: Text=good morning katie i hope you're having a Speaker ID=Guest-2
TRANSCRIBING: Text=good morning katie i hope you're having a great Speaker ID=Guest-2
TRANSCRIBING: Text=good morning katie i hope you're having a great start to Speaker ID=Guest-2
TRANSCRIBING: Text=good morning katie i hope you're having a great start to your day Speaker ID=Guest-2

TRANSCRIBED: Text=Good morning, Katie. I hope you're having a great start to your day. Speaker ID=Guest-2

TRANSCRIBING: Text=have you tried Speaker ID=Unknown
TRANSCRIBING: Text=have you tried the latest Speaker ID=Unknown
TRANSCRIBING: Text=have you tried the latest real Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech service Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech service which Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech service which can Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech service which can tell you Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech service which can tell you who said Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech service which can tell you who said what Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech service which can tell you who said what in Speaker ID=Guest-1
TRANSCRIBING: Text=have you tried the latest real time diarization in microsoft speech service which can tell you who said what in real time Speaker ID=Guest-1

TRANSCRIBED: Text=Have you tried the latest real time diarization in Microsoft Speech Service which can tell you who said what in real time? Speaker ID=Guest-1

TRANSCRIBING: Text=not yet Speaker ID=Unknown
TRANSCRIBING: Text=not yet i Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch trans Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization function Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces di Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to di Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to diarize in real Speaker ID=Guest-2
TRANSCRIBING: Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to diarize in real time Speaker ID=Guest-2

TRANSCRIBED: Text=Not yet. I've been using the batch transcription with diarization functionality, but it produces diarization results after the whole audio is processed. Is the new feature able to diarize in real time? Speaker ID=Guest-2

TRANSCRIBING: Text=absolutely Speaker ID=Unknown
TRANSCRIBING: Text=absolutely i recom Speaker ID=Guest-1
TRANSCRIBING: Text=absolutely i recommend Speaker ID=Guest-1
TRANSCRIBING: Text=absolutely i recommend you Speaker ID=Guest-1
TRANSCRIBING: Text=absolutely i recommend you give it a try Speaker ID=Guest-1

TRANSCRIBED: Text=Absolutely, I recommend you give it a try. Speaker ID=Guest-1

TRANSCRIBING: Text=that's exc Speaker ID=Unknown
TRANSCRIBING: Text=that's exciting Speaker ID=Unknown
TRANSCRIBING: Text=that's exciting let me try Speaker ID=Guest-2
TRANSCRIBING: Text=that's exciting let me try it right now Speaker ID=Guest-2

TRANSCRIBED: Text=That's exciting. Let me try it right now. Speaker ID=Guest-2

根據交談中的說話者數目,說話者會被識別為 Guest-1、Guest-2 等等。

注意

當尚未識別說話者時,您可能會在一些早期中繼結果中看到 Speaker ID=Unknown。 若沒有中繼的自動分段標記結果 (如果您未將 PropertyId.SpeechServiceResponse_DiarizeIntermediateResults 屬性設定為 “true”),則說話者識別碼一律為 “Unknown”。

清除資源

您可以使用 Azure 入口網站Azure 命令列介面 (CLI) 來移除您所建立的語音資源。

參考文件 | 套件 (npm) | GitHub 上的其他範例 | 程式庫原始程式碼

在本快速入門中,您會使用即時自動分段標記來執行語音轉換文字謄寫的應用程式。 自動分段標記針對參與交談的不同說話者進行區分。 語音服務提供有關哪位說話者正在講出所謄寫語音之特定部分的資訊。

說話者資訊包含在說話者識別碼欄位內的結果中。 說話者識別碼是服務在所提供的音訊內容中識別不同說話者時指派給每位交談參與者的通用識別碼。

提示

您可以在 Speech Studio 中嘗試即時語音轉換文字,而不需要註冊或撰寫任何程式碼。 不過,Speech Studio 尚不支援自動分段標記。

必要條件

設定環境

若要設定環境,請安裝適用於 JavaScript 的語音 SDK。 如果您只想要安裝套件名稱,請執行 npm install microsoft-cognitiveservices-speech-sdk。 如需引導式安裝指南,請參閱 SDK 安裝指南

設定環境變數

您必須驗證應用程式以存取 Azure AI 服務。 本文說明如何使用環境變數來儲存您的認證。 然後,您可以從程式碼存取環境變數,以驗證您的應用程式。 針對實際執行環境,使用更安全的方法來儲存和存取您的認證。

重要

我們建議使用適用於 Azure 資源的受控識別搭配 Microsoft Entra ID 驗證,以避免使用在雲端執行的應用程式儲存認證。

如果您使用 API 金鑰,請將其安全地儲存在別處,例如 Azure Key Vault。 請勿在程式碼中直接包含 API 金鑰,且切勿公開張貼金鑰。

如需 AI 服務安全性的詳細資訊,請參閱驗證對 Azure AI 服務的要求

若要設定語音資源索引鍵和區域的環境變數,請開啟主控台視窗,並遵循作業系統和開發環境的指示進行。

  • 若要設定 SPEECH_KEY 環境變數,請以您其中一個資源金鑰取代 your-key
  • 若要設定 SPEECH_REGION 環境變數,請以您的其中一個資源區域取代 your-region
setx SPEECH_KEY your-key
setx SPEECH_REGION your-region

注意

如果您只需要存取目前主控台的環境變數,您可以使用 set (而不是 setx) 來設定環境變數。

新增環境變數之後,您可能需要重新啟動任何需要讀取環境變數的程式,包括主控台視窗。 例如,如果正在使用 Visual Studio 作為編輯器,請您在執行範例前重新啟動 Visual Studio。

使用交談謄寫從檔案實作自動分段標記

遵循下列步驟來建立新的主控台應用程式進行交談謄寫。

  1. 開啟要新增專案的命令提示字元視窗,然後建立名為 ConversationTranscription.js 的新檔案。

  2. 安裝適用於 JavaScript 的語音 SDK:

    npm install microsoft-cognitiveservices-speech-sdk
    
  3. 將下列程式碼複製到 ConversationTranscription.js

    const fs = require("fs");
    const sdk = require("microsoft-cognitiveservices-speech-sdk");
    
    // This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
    const speechConfig = sdk.SpeechConfig.fromSubscription(process.env.SPEECH_KEY, process.env.SPEECH_REGION);
    
    function fromFile() {
        const filename = "katiesteve.wav";
    
        let audioConfig = sdk.AudioConfig.fromWavFileInput(fs.readFileSync(filename));
        let conversationTranscriber = new sdk.ConversationTranscriber(speechConfig, audioConfig);
    
        var pushStream = sdk.AudioInputStream.createPushStream();
    
        fs.createReadStream(filename).on('data', function(arrayBuffer) {
            pushStream.write(arrayBuffer.slice());
        }).on('end', function() {
            pushStream.close();
        });
    
        console.log("Transcribing from: " + filename);
    
        conversationTranscriber.sessionStarted = function(s, e) {
            console.log("SessionStarted event");
            console.log("SessionId:" + e.sessionId);
        };
        conversationTranscriber.sessionStopped = function(s, e) {
            console.log("SessionStopped event");
            console.log("SessionId:" + e.sessionId);
            conversationTranscriber.stopTranscribingAsync();
        };
        conversationTranscriber.canceled = function(s, e) {
            console.log("Canceled event");
            console.log(e.errorDetails);
            conversationTranscriber.stopTranscribingAsync();
        };
        conversationTranscriber.transcribed = function(s, e) {
            console.log("TRANSCRIBED: Text=" + e.result.text + " Speaker ID=" + e.result.speakerId);
        };
    
        // Start conversation transcription
        conversationTranscriber.startTranscribingAsync(
            function () {},
            function (err) {
                console.trace("err - starting transcription: " + err);
            }
        );
    
    }
    fromFile();
    
  4. 取得範例音訊檔案 (英文) 或使用您自己的 .wav 檔案。 將 katiesteve.wav 取代為您 .wav 檔案的路徑和名稱。

    應用程式會辨識交談中多個參與者的語音。 您的音訊檔案應該包含多個說話者。

  5. 若要變更語音辨識語言,請以另一種支援的語言取代 en-US。 例如,es-ES 代表西班牙文 (西班牙)。 如果您未指定語言,則預設語言為 en-US。 如需詳細了解如何識別所可能說出的多種語言之一,請參閱語言識別

  6. 執行新的主控台應用程式,以從檔案啟動語音辨識:

    node.exe ConversationTranscription.js
    

重要

確保您已設定 SPEECH_KEYSPEECH_REGION 環境變數。 如果您未設定這些變數,則範例會失敗,並顯示錯誤訊息。

謄寫的交談應輸出為文字:

SessionStarted event
SessionId:E87AFBA483C2481985F6C9AF719F616B
TRANSCRIBED: Text=Good morning, Steve. Speaker ID=Unknown
TRANSCRIBED: Text=Good morning, Katie. Speaker ID=Unknown
TRANSCRIBED: Text=Have you tried the latest real time diarization in Microsoft Speech Service which can tell you who said what in real time? Speaker ID=Guest-1
TRANSCRIBED: Text=Not yet. I've been using the batch transcription with diarization functionality, but it produces diarization result until whole audio get processed. Speaker ID=Guest-2
TRANSCRIBED: Text=Is the new feature can diarize in real time? Speaker ID=Guest-2
TRANSCRIBED: Text=Absolutely. Speaker ID=Guest-1
TRANSCRIBED: Text=That's exciting. Let me try it right now. Speaker ID=Guest-2
Canceled event
undefined
SessionStopped event
SessionId:E87AFBA483C2481985F6C9AF719F616B

根據交談中的說話者數目,說話者會被識別為 Guest-1、Guest-2 等等。

清除資源

您可以使用 Azure 入口網站Azure 命令列介面 (CLI) 來移除您所建立的語音資源。

參考文件 | 套件 (下載) | GitHub 上的其他範例

適用於 Objective-C 的語音 SDK 支援交談謄寫,但目前尚未提供相對應的指南。 請選取另一種程式設計語言來開始使用並了解概念,或參閱本文開頭連結的 Objective-C 參考和範例。

參考文件 | 套件 (下載) | GitHub 上的其他範例

適用於 Swift-C 的語音 SDK 支援交談謄寫,但目前尚未提供相對應的指南。 請選取另一種程式設計語言來開始使用並了解概念,或參閱本文開頭連結的 Swift 參考和範例。

參考文件 | 套件 (PyPi) | GitHub 上的其他範例

在本快速入門中,您會使用即時自動分段標記來執行語音轉換文字謄寫的應用程式。 自動分段標記針對參與交談的不同說話者進行區分。 語音服務提供有關哪位說話者正在講出所謄寫語音之特定部分的資訊。

說話者資訊包含在說話者識別碼欄位內的結果中。 說話者識別碼是服務在所提供的音訊內容中識別不同說話者時指派給每位交談參與者的通用識別碼。

提示

您可以在 Speech Studio 中嘗試即時語音轉換文字,而不需要註冊或撰寫任何程式碼。 不過,Speech Studio 尚不支援自動分段標記。

必要條件

設定環境

適用於 Python 的語音 SDK 會以 Python 套件索引 (PyPI) 模組的形式提供。 適用於 Python 的語音 SDK 與 Windows、Linux 和 macOS 相容。

安裝 Python 從 3.7 或更新的版本。 請先檢查 SDK 安裝指南以了解更多需求。

設定環境變數

您必須驗證應用程式以存取 Azure AI 服務。 本文說明如何使用環境變數來儲存您的認證。 然後,您可以從程式碼存取環境變數,以驗證您的應用程式。 針對實際執行環境,使用更安全的方法來儲存和存取您的認證。

重要

我們建議使用適用於 Azure 資源的受控識別搭配 Microsoft Entra ID 驗證,以避免使用在雲端執行的應用程式儲存認證。

如果您使用 API 金鑰,請將其安全地儲存在別處,例如 Azure Key Vault。 請勿在程式碼中直接包含 API 金鑰,且切勿公開張貼金鑰。

如需 AI 服務安全性的詳細資訊,請參閱驗證對 Azure AI 服務的要求

若要設定語音資源索引鍵和區域的環境變數,請開啟主控台視窗,並遵循作業系統和開發環境的指示進行。

  • 若要設定 SPEECH_KEY 環境變數,請以您其中一個資源金鑰取代 your-key
  • 若要設定 SPEECH_REGION 環境變數,請以您的其中一個資源區域取代 your-region
setx SPEECH_KEY your-key
setx SPEECH_REGION your-region

注意

如果您只需要存取目前主控台的環境變數,您可以使用 set (而不是 setx) 來設定環境變數。

新增環境變數之後,您可能需要重新啟動任何需要讀取環境變數的程式,包括主控台視窗。 例如,如果正在使用 Visual Studio 作為編輯器,請您在執行範例前重新啟動 Visual Studio。

使用交談謄寫從檔案實作自動分段標記

請遵循下列步驟以建立新的主控台應用程式。

  1. 開啟要新增專案的命令提示字元視窗,然後建立名為 conversation_transcription.py 的新檔案。

  2. 執行此命令以安裝語音 SDK:

    pip install azure-cognitiveservices-speech
    
  3. 將下列程式碼複製到 conversation_transcription.py

    import os
    import time
    import azure.cognitiveservices.speech as speechsdk
    
    def conversation_transcriber_recognition_canceled_cb(evt: speechsdk.SessionEventArgs):
        print('Canceled event')
    
    def conversation_transcriber_session_stopped_cb(evt: speechsdk.SessionEventArgs):
        print('SessionStopped event')
    
    def conversation_transcriber_transcribed_cb(evt: speechsdk.SpeechRecognitionEventArgs):
        print('\nTRANSCRIBED:')
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            print('\tText={}'.format(evt.result.text))
            print('\tSpeaker ID={}\n'.format(evt.result.speaker_id))
        elif evt.result.reason == speechsdk.ResultReason.NoMatch:
            print('\tNOMATCH: Speech could not be TRANSCRIBED: {}'.format(evt.result.no_match_details))
    
    def conversation_transcriber_transcribing_cb(evt: speechsdk.SpeechRecognitionEventArgs):
        print('TRANSCRIBING:')
        print('\tText={}'.format(evt.result.text))
        print('\tSpeaker ID={}'.format(evt.result.speaker_id))
    
    def conversation_transcriber_session_started_cb(evt: speechsdk.SessionEventArgs):
        print('SessionStarted event')
    
    def recognize_from_file():
        # This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
        speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
        speech_config.speech_recognition_language="en-US"
        speech_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceResponse_DiarizeIntermediateResults, value='true')
    
        audio_config = speechsdk.audio.AudioConfig(filename="katiesteve.wav")
        conversation_transcriber = speechsdk.transcription.ConversationTranscriber(speech_config=speech_config, audio_config=audio_config)
    
        transcribing_stop = False
    
        def stop_cb(evt: speechsdk.SessionEventArgs):
            #"""callback that signals to stop continuous recognition upon receiving an event `evt`"""
            print('CLOSING on {}'.format(evt))
            nonlocal transcribing_stop
            transcribing_stop = True
    
        # Connect callbacks to the events fired by the conversation transcriber
        conversation_transcriber.transcribed.connect(conversation_transcriber_transcribed_cb)
        conversation_transcriber.transcribing.connect(conversation_transcriber_transcribing_cb)
        conversation_transcriber.session_started.connect(conversation_transcriber_session_started_cb)
        conversation_transcriber.session_stopped.connect(conversation_transcriber_session_stopped_cb)
        conversation_transcriber.canceled.connect(conversation_transcriber_recognition_canceled_cb)
        # stop transcribing on either session stopped or canceled events
        conversation_transcriber.session_stopped.connect(stop_cb)
        conversation_transcriber.canceled.connect(stop_cb)
    
        conversation_transcriber.start_transcribing_async()
    
        # Waits for completion.
        while not transcribing_stop:
            time.sleep(.5)
    
        conversation_transcriber.stop_transcribing_async()
    
    # Main
    
    try:
        recognize_from_file()
    except Exception as err:
        print("Encountered exception. {}".format(err))
    
  4. 取得範例音訊檔案 (英文) 或使用您自己的 .wav 檔案。 將 katiesteve.wav 取代為您 .wav 檔案的路徑和名稱。

    應用程式會辨識交談中多個參與者的語音。 您的音訊檔案應該包含多個說話者。

  5. 若要變更語音辨識語言,請以另一種支援的語言取代 en-US。 例如,es-ES 代表西班牙文 (西班牙)。 如果您未指定語言,則預設語言為 en-US。 如需詳細了解如何識別所可能說出的多種語言之一,請參閱語言識別

  6. 執行新的主控台應用程式,以開始交談謄寫:

    python conversation_transcription.py
    

重要

確保您已設定 SPEECH_KEYSPEECH_REGION 環境變數。 如果您未設定這些變數,則範例會失敗,並顯示錯誤訊息。

謄寫的交談應輸出為文字:

TRANSCRIBING:
        Text=good morning
        Speaker ID=Unknown
TRANSCRIBING:
        Text=good morning steve
        Speaker ID=Unknown
TRANSCRIBING:
        Text=good morning steve how are
        Speaker ID=Guest-1
TRANSCRIBING:
        Text=good morning steve how are you doing today
        Speaker ID=Guest-1

TRANSCRIBED:
        Text=Good morning, Steve. How are you doing today?
        Speaker ID=Guest-1

TRANSCRIBING:
        Text=good morning katie
        Speaker ID=Unknown
TRANSCRIBING:
        Text=good morning katie i hope you're having a
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=good morning katie i hope you're having a great start to
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=good morning katie i hope you're having a great start to your day
        Speaker ID=Guest-2

TRANSCRIBED:
        Text=Good morning, Katie. I hope you're having a great start to your day.
        Speaker ID=Guest-2

TRANSCRIBING:
        Text=have you
        Speaker ID=Unknown
TRANSCRIBING:
        Text=have you tried
        Speaker ID=Unknown
TRANSCRIBING:
        Text=have you tried the latest
        Speaker ID=Unknown
TRANSCRIBING:
        Text=have you tried the latest real
        Speaker ID=Guest-1
TRANSCRIBING:
        Text=have you tried the latest real time
        Speaker ID=Guest-1
TRANSCRIBING:
        Text=have you tried the latest real time diarization
        Speaker ID=Guest-1
TRANSCRIBING:
        Text=have you tried the latest real time diarization in
        Speaker ID=Guest-1
TRANSCRIBING:
        Text=have you tried the latest real time diarization in microsoft
        Speaker ID=Guest-1
TRANSCRIBING:
        Text=have you tried the latest real time diarization in microsoft speech
        Speaker ID=Guest-1
TRANSCRIBING:
        Text=have you tried the latest real time diarization in microsoft speech service
        Speaker ID=Guest-1
TRANSCRIBING:
        Text=have you tried the latest real time diarization in microsoft speech service which
        Speaker ID=Guest-1
TRANSCRIBING:
        Text=have you tried the latest real time diarization in microsoft speech service which can
        Speaker ID=Guest-1
TRANSCRIBING:
        Text=have you tried the latest real time diarization in microsoft speech service which can tell you
        Speaker ID=Guest-1
TRANSCRIBING:
        Text=have you tried the latest real time diarization in microsoft speech service which can tell you who said        
        Speaker ID=Guest-1
TRANSCRIBING:
        Text=have you tried the latest real time diarization in microsoft speech service which can tell you who said what   
        Speaker ID=Guest-1
TRANSCRIBING:
        Text=have you tried the latest real time diarization in microsoft speech service which can tell you who said what in
        Speaker ID=Guest-1
TRANSCRIBING:
        Text=have you tried the latest real time diarization in microsoft speech service which can tell you who said what in real time
        Speaker ID=Guest-1

TRANSCRIBED:
        Text=Have you tried the latest real time diarization in Microsoft Speech Service which can tell you who said what in real time?
        Speaker ID=Guest-1

TRANSCRIBING:
        Text=not yet
        Speaker ID=Unknown
TRANSCRIBING:
        Text=not yet i
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch trans
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization function
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it produces
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it produces di
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization     
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to di
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to diarize in real
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to diarize in real time
        Speaker ID=Guest-2

TRANSCRIBED:
        Text=Not yet. I've been using the batch transcription with diarization functionality, but it produces diarization results after the whole audio is processed. Is the new feature able to diarize in real time?
        Speaker ID=Guest-2

TRANSCRIBING:
        Text=absolutely
        Speaker ID=Unknown
TRANSCRIBING:
        Text=absolutely i
        Speaker ID=Unknown
TRANSCRIBING:
        Text=absolutely i recom
        Speaker ID=Guest-1
TRANSCRIBING:
        Text=absolutely i recommend
        Speaker ID=Guest-1
TRANSCRIBING:
        Text=absolutely i recommend you give it a try
        Speaker ID=Guest-1

TRANSCRIBED:
        Text=Absolutely, I recommend you give it a try.
        Speaker ID=Guest-1

TRANSCRIBING:
        Text=that's exc
        Speaker ID=Unknown
TRANSCRIBING:
        Text=that's exciting
        Speaker ID=Unknown
TRANSCRIBING:
        Text=that's exciting let me
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=that's exciting let me try
        Speaker ID=Guest-2
TRANSCRIBING:
        Text=that's exciting let me try it right now
        Speaker ID=Guest-2

TRANSCRIBED:
        Text=That's exciting. Let me try it right now.
        Speaker ID=Guest-2

根據交談中的說話者數目,說話者會被識別為 Guest-1、Guest-2 等等。

注意

當尚未識別說話者時,您可能會在一些早期中繼結果中看到 Speaker ID=Unknown。 若沒有中繼的自動分段標記結果 (如果您未將 PropertyId.SpeechServiceResponse_DiarizeIntermediateResults 屬性設定為 “true”),則說話者識別碼一律為 “Unknown”。

清除資源

您可以使用 Azure 入口網站Azure 命令列介面 (CLI) 來移除您所建立的語音資源。

語音轉換文字 REST API 參考 | 適用於簡短音訊的語音轉換文字 REST API 參考 | GitHub 上的其他範例

REST API 不支援交談謄寫。 請從此頁面頂端選取另一個程式設計語言或工具。

語音 CLI 不支援交談謄寫。 請從此頁面頂端選取另一個程式設計語言或工具。

後續步驟