Asynchronous conversation transcription multichannel diarization

Article
10/16/2024

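Note

This feature is currently in public preview. This preview is provided without a service-level agreement, and it isn't recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.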

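This article demonstrates asynchronous conversation transcription multichannel diarization by using the RemoteMeetingTranscriptionClient API. If you configured conversation transcription to do asynchronous transcription and you have a meetingId, you can use the RemoteMeetingTranscriptionClient API to obtain the transcription associated with that meetingId.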

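Important

Conversation transcription multichannel diarization (preview) is retiring on March 28, 2025. For more information about migrating to other speech to text features, see Migrate away from conversation transcription multichannel diarization.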

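Asynchronous vs. real-time + asynchronous

With asynchronous transcription, you stream the meeting audio, but you don't need a transcription returned in real time. Instead, after the audio is sent, use the meetingId of the Meeting to query for the status of the asynchronous transcription. When the asynchronous transcription is ready, you get a RemoteMeetingTranscriptionResult.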

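With real-time plus asynchronous, you get the transcription in real time, and you can also get the transcription by querying with the meetingId (similar to the asynchronous scenario).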
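Both modes end with the same "query by meetingId until the transcription is ready" loop. The following is a minimal, stdlib-only sketch of that loop. It doesn't call the Speech SDK: the `Status` values, the `pollUntilTerminal` method, and the `scripted` simulated service are hypothetical names introduced only for illustration, and the real service reports its own status values.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

/** Stdlib-only sketch of a "query until ready" loop; no Speech SDK types are used. */
final class PollingSketch {
    /** Hypothetical status values, not the service's actual status names. */
    enum Status { NOT_STARTED, RUNNING, SUCCEEDED, FAILED }

    /**
     * Calls the probe until it reports a terminal status or maxAttempts is reached.
     * Returns the last status that was observed.
     */
    static Status pollUntilTerminal(Supplier<Status> probe, int maxAttempts, long delayMillis) {
        Status last = Status.NOT_STARTED;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            last = probe.get();
            if (last == Status.SUCCEEDED || last == Status.FAILED) {
                return last;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ex) {
                    // Preserve the interrupt flag and stop polling early.
                    Thread.currentThread().interrupt();
                    return last;
                }
            }
        }
        return last;
    }

    /** Simulated service: replays the given statuses, then repeats the final one. */
    static Supplier<Status> scripted(Status... statuses) {
        Iterator<Status> it = Arrays.asList(statuses).iterator();
        Status[] current = { Status.NOT_STARTED };
        return () -> {
            if (it.hasNext()) {
                current[0] = it.next();
            }
            return current[0];
        };
    }

    public static void main(String[] args) {
        Supplier<Status> probe = scripted(Status.RUNNING, Status.RUNNING, Status.SUCCEEDED);
        System.out.println(pollUntilTerminal(probe, 10, 50L));
    }
}
```

In a real application, the probe would wrap a status query made with the meetingId. Capping the number of attempts keeps a client from waiting indefinitely on a transcription that never completes.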

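Two steps are required to accomplish asynchronous transcription. The first step is to upload the audio, choosing either asynchronous only or real-time plus asynchronous. The second step is to get the transcription results.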
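The choice made in the first step comes down to a single string passed as the `transcriptionMode` service property. As a small sketch, the two values used in this article (`async` and `RealTimeAndAsync`) can be wrapped in an enum so that the choice is made in one place. The `TranscriptionMode` type and its methods are hypothetical and aren't part of the Speech SDK.

```java
/** Sketch: the two transcriptionMode values used in this article, wrapped in an enum. */
enum TranscriptionMode {
    ASYNC_ONLY("async", false),
    REAL_TIME_AND_ASYNC("RealTimeAndAsync", true);

    private final String serviceValue;
    private final boolean realTimeResults;

    TranscriptionMode(String serviceValue, boolean realTimeResults) {
        this.serviceValue = serviceValue;
        this.realTimeResults = realTimeResults;
    }

    /** The string to pass as the "transcriptionMode" service property. */
    String serviceValue() {
        return serviceValue;
    }

    /** True when the mode also returns transcriptions while the audio is streaming. */
    boolean deliversRealTimeResults() {
        return realTimeResults;
    }

    /** Picks a mode from a single application-level decision. */
    static TranscriptionMode choose(boolean needRealTimeResults) {
        return needRealTimeResults ? REAL_TIME_AND_ASYNC : ASYNC_ONLY;
    }

    public static void main(String[] args) {
        for (TranscriptionMode mode : values()) {
            System.out.println(mode + " -> transcriptionMode=" + mode.serviceValue());
        }
    }
}
```

With this helper, the upload code would pass `TranscriptionMode.choose(...).serviceValue()` as the property value instead of commenting and uncommenting lines.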

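Upload the audio

The first step for asynchronous transcription is to send the audio to the conversation transcription service by using the Speech SDK.

This C# example code shows how to use conversation transcription in asynchronous-only mode. To stream audio to the transcriber, you need to add audio streaming code derived from the real-time conversation transcription quickstart.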

async Task CompleteContinuousRecognition(MeetingTranscriber recognizer, string meetingId)
{
    var finishedTaskCompletionSource = new TaskCompletionSource<int>();

    recognizer.SessionStopped += (s, e) =>
    {
        finishedTaskCompletionSource.TrySetResult(0);
    };

    recognizer.Canceled += (s, e) => 
    {
        Console.WriteLine($"CANCELED: Reason={e.Reason}");
        if (e.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
            Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
            Console.WriteLine($"CANCELED: Did you update the subscription info?");
            throw new System.ApplicationException($"{e.ErrorDetails}");
        }
        finishedTaskCompletionSource.TrySetResult(0);
    };

    await recognizer.StartTranscribingAsync().ConfigureAwait(false);
    
    // Wait for the session to stop or be canceled without blocking the calling thread.
    await finishedTaskCompletionSource.Task.ConfigureAwait(false);
    
    await recognizer.StopTranscribingAsync().ConfigureAwait(false);
}

async Task<List<string>> GetRecognizerResult(MeetingTranscriber recognizer, string meetingId)
{
    List<string> recognizedText = new List<string>();
    recognizer.Transcribed += (s, e) =>
    {
        if (e.Result.Text.Length > 0)
        {
            recognizedText.Add(e.Result.Text);
        }
    };

    await CompleteContinuousRecognition(recognizer, meetingId);

    recognizer.Dispose();
    return recognizedText;
}

async Task UploadAudio()
{
    // Create the speech config object
    // Substitute real information for "YourSubscriptionKey" and "Region"
    SpeechConfig speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "Region");
    speechConfig.SetProperty("ConversationTranscriptionInRoomAndOnline", "true");

    // Set the property for asynchronous transcription
    speechConfig.SetServiceProperty("transcriptionMode", "async", ServicePropertyChannel.UriQueryParameter);

    // Alternatively: set the property for real-time plus asynchronous transcription
    // speechConfig.SetServiceProperty("transcriptionMode", "RealTimeAndAsync", ServicePropertyChannel.UriQueryParameter);

    // Create an audio stream from a wav file or from the default microphone if you want to stream live audio from the supported devices
    // Replace with your own audio file name and Helper class which implements AudioConfig using PullAudioInputStreamCallback
    PullAudioInputStreamCallback wavfilePullStreamCallback = Helper.OpenWavFile("16kHz16Bits8channelsOfRecordedPCMAudio.wav");
    // Create an audio stream format assuming the file used above is 16kHz, 16 bits and 8 channel pcm wav file
    AudioStreamFormat audioStreamFormat = AudioStreamFormat.GetWaveFormatPCM(16000, 16, 8);
    // Create an input stream
    AudioInputStream audioStream = AudioInputStream.CreatePullStream(wavfilePullStreamCallback, audioStreamFormat);

    // Ensure the meetingId for a new meeting is a truly unique GUID
    String meetingId = Guid.NewGuid().ToString();

    // Create a Meeting
    using (var meeting = await Meeting.CreateMeetingAsync(speechConfig, meetingId))
    {
        using (var meetingTranscriber = new MeetingTranscriber(AudioConfig.FromStreamInput(audioStream)))
        {
            await meetingTranscriber.JoinMeetingAsync(meeting);
            // Helper function to get the real-time transcription results
            var result = await GetRecognizerResult(meetingTranscriber, meetingId);
        }
    }
}

If you want real-time plus asynchronous, comment and uncomment the appropriate lines of code as follows:

// Set the property for asynchronous transcription
// speechConfig.SetServiceProperty("transcriptionMode", "async", ServicePropertyChannel.UriQueryParameter);

// Alternatively: set the property for real-time plus asynchronous transcription
speechConfig.SetServiceProperty("transcriptionMode", "RealTimeAndAsync", ServicePropertyChannel.UriQueryParameter);

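Get transcription results

Install Microsoft.CognitiveServices.Speech.Remotemeeting version 1.13.0 or later via NuGet.

Sample transcription code

After you have the meetingId, create a remote meeting transcription client, RemoteMeetingTranscriptionClient, in the client application to query the status of the asynchronous transcription. Create a RemoteMeetingTranscriptionOperation object to get a long-running Operation object. You can check the status of the operation or wait for it to complete.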

// Create the speech config
SpeechConfig config = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");
// Create the speech client
RemoteMeetingTranscriptionClient client = new RemoteMeetingTranscriptionClient(config);
// Create the remote operation
RemoteMeetingTranscriptionOperation operation = 
                            new RemoteMeetingTranscriptionOperation(meetingId, client);

// Wait for operation to finish
await operation.WaitForCompletionAsync(TimeSpan.FromSeconds(10), CancellationToken.None);
// Get the result of the long running operation
var val = operation.Value.MeetingTranscriptionResults;
// Print the fields from the results
foreach (var item in val)
{
    Console.WriteLine($"{item.Text}, {item.ResultId}, {item.Reason}, {item.UserId}, {item.OffsetInTicks}, {item.Duration}");
    Console.WriteLine($"{item.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult)}");
}
Console.WriteLine("Operation completed");

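Upload the audio

Before asynchronous conversation transcription can be performed, you need to send the audio to the conversation transcription service by using the Speech SDK.

This Java example code shows how to use conversation transcription in asynchronous-only mode. To stream audio to the transcriber, you need to add audio streaming code derived from the real-time conversation transcription quickstart. See the limitations section of that topic for the supported platforms and language APIs.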

// Create the speech config object
// Substitute real information for "YourSubscriptionKey" and "Region"
SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "Region");
speechConfig.setProperty("ConversationTranscriptionInRoomAndOnline", "true");

// Set the property for asynchronous transcription
speechConfig.setServiceProperty("transcriptionMode", "async", ServicePropertyChannel.UriQueryParameter);

// Set the property for real-time plus asynchronous transcription
//speechConfig.setServiceProperty("transcriptionMode", "RealTimeAndAsync", ServicePropertyChannel.UriQueryParameter);

// pick a meeting Id that is a GUID.
String meetingId = UUID.randomUUID().toString();

// Create a Meeting
Future<Meeting> meetingFuture = Meeting.createMeetingAsync(speechConfig, meetingId);
Meeting meeting = meetingFuture.get();

// Create an audio stream from a wav file or from the default microphone if you want to stream live audio from the supported devices
// Replace with your own audio file name and Helper class which implements AudioConfig using PullAudioInputStreamCallback
PullAudioInputStreamCallback wavfilePullStreamCallback = Helper.OpenWavFile("16kHz16Bits8channelsOfRecordedPCMAudio.wav");
// Create an audio stream format assuming the file used above is 16kHz, 16 bits and 8 channel pcm wav file
AudioStreamFormat audioStreamFormat = AudioStreamFormat.getWaveFormatPCM((long)16000, (short)16,(short)8);
// Create an input stream
AudioInputStream audioStream = AudioInputStream.createPullStream(wavfilePullStreamCallback, audioStreamFormat);

// Create a meeting transcriber
MeetingTranscriber transcriber = new MeetingTranscriber(AudioConfig.fromStreamInput(audioStream));

// Join the meeting and wait for the join to complete
transcriber.joinMeetingAsync(meeting).get();

// Add the event listener for the real-time events
transcriber.transcribed.addEventListener((o, e) -> {
    System.out.println("Meeting transcriber Recognized:" + e.toString());
});

transcriber.canceled.addEventListener((o, e) -> {
    System.out.println("Meeting transcriber canceled:" + e.toString());
    try {
        transcriber.stopTranscribingAsync().get();
    } catch (InterruptedException ex) {
        ex.printStackTrace();
    } catch (ExecutionException ex) {
        ex.printStackTrace();
    }
});

transcriber.sessionStopped.addEventListener((o, e) -> {
    System.out.println("Meeting transcriber stopped:" + e.toString());

    try {
        transcriber.stopTranscribingAsync().get();
    } catch (InterruptedException ex) {
        ex.printStackTrace();
    } catch (ExecutionException ex) {
        ex.printStackTrace();
    }
});

// start the transcription.
Future<?> future = transcriber.startTranscribingAsync();
...
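The sample above assumes a 16 kHz, 16-bit, 8-channel PCM WAV file. As a worked example of what those three numbers imply, the following sketch computes the size of one audio frame, the data rate, and the playing time of a given amount of audio data. The `PcmFormatSketch` class is hypothetical and independent of the Speech SDK. It can serve as a sanity check that a file matches the format passed to `getWaveFormatPCM`.

```java
/** Sketch: arithmetic for an uncompressed PCM format; not a Speech SDK type. */
final class PcmFormatSketch {
    final int samplesPerSecond;
    final int bitsPerSample;
    final int channels;

    PcmFormatSketch(int samplesPerSecond, int bitsPerSample, int channels) {
        if (samplesPerSecond <= 0 || channels <= 0 || bitsPerSample <= 0 || bitsPerSample % 8 != 0) {
            throw new IllegalArgumentException("expected positive values and a whole number of bytes per sample");
        }
        this.samplesPerSecond = samplesPerSecond;
        this.bitsPerSample = bitsPerSample;
        this.channels = channels;
    }

    /** Bytes in one frame, where a frame holds one sample for every channel. */
    int bytesPerFrame() {
        return (bitsPerSample / 8) * channels;
    }

    /** Bytes of audio data produced per second. */
    long bytesPerSecond() {
        return (long) samplesPerSecond * bytesPerFrame();
    }

    /** Playing time, in seconds, of the given number of audio data bytes (header excluded). */
    double secondsOfAudio(long dataBytes) {
        return (double) dataBytes / bytesPerSecond();
    }

    public static void main(String[] args) {
        // Same values as the sample: 16 kHz, 16 bits per sample, 8 channels.
        PcmFormatSketch format = new PcmFormatSketch(16000, 16, 8);
        System.out.println("Bytes per frame:  " + format.bytesPerFrame());
        System.out.println("Bytes per second: " + format.bytesPerSecond());
        System.out.println("Seconds in 15,360,000 bytes: " + format.secondsOfAudio(15_360_000L));
    }
}
```

For the format in the sample, one frame is 16 bytes and one second of audio is 256,000 bytes, so one minute of meeting audio is about 15.4 MB of data to stream.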

If you want real-time plus asynchronous, comment and uncomment the appropriate lines of code as follows:

// Set the property for asynchronous transcription
//speechConfig.setServiceProperty("transcriptionMode", "async", ServicePropertyChannel.UriQueryParameter);

// Set the property for real-time plus asynchronous transcription
speechConfig.setServiceProperty("transcriptionMode", "RealTimeAndAsync", ServicePropertyChannel.UriQueryParameter);
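Retrieving the results depends on the meetingId that was chosen during upload, so the application has to keep that value and pass it along unchanged. The following sketch shows one way to generate the ID and to check a stored value before it's used in a query. The `MeetingIds` helper is hypothetical. It relies only on `java.util.UUID` and assumes the ID is kept in the canonical 36-character form that `UUID.toString()` produces.

```java
import java.util.UUID;

/** Sketch: create and validate meeting IDs by using only java.util.UUID. */
final class MeetingIds {
    private MeetingIds() {
    }

    /** Returns a new random GUID string to use as the meetingId of a new meeting. */
    static String newMeetingId() {
        return UUID.randomUUID().toString();
    }

    /**
     * Returns true only if the candidate is a GUID in canonical form.
     * The round-trip comparison rejects the shortened forms that
     * UUID.fromString accepts on some Java versions.
     */
    static boolean isCanonicalGuid(String candidate) {
        if (candidate == null) {
            return false;
        }
        try {
            return UUID.fromString(candidate).toString().equalsIgnoreCase(candidate);
        } catch (IllegalArgumentException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        String meetingId = newMeetingId();
        System.out.println(meetingId + " valid: " + isCanonicalGuid(meetingId));
        System.out.println("not-a-guid valid: " + isCanonicalGuid("not-a-guid"));
    }
}
```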

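Get transcription results

For the code shown here, you need remote-meeting version 1.8.0, which is supported only for Java (1.8.0 or later) on Windows and Linux.

Obtain the asynchronous meeting client SDK

You can obtain remote-meeting by editing your pom.xml file as follows.

At the end of the file, before the closing tag </project>, create a repositories element with a reference to the Maven repository for the Speech SDK: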

<repositories>
  <repository>
    <id>maven-cognitiveservices-speech</id>
    <name>Microsoft Cognitive Services Speech Maven Repository</name>
    <url>https://azureai.azureedge.net/maven/</url>
  </repository>
</repositories>

Also add a dependencies element, with remotemeeting-client-sdk 1.8.0 as a dependency:

<dependencies>
  <dependency>
    <groupId>com.microsoft.cognitiveservices.speech.remotemeeting</groupId>
    <artifactId>remote-meeting</artifactId>
    <version>1.8.0</version>
  </dependency>
</dependencies>

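Save the changes.

Sample transcription code

After you have the meetingId, create a remote meeting transcription client, RemoteMeetingTranscriptionClient, in the client application to query the status of the asynchronous transcription. Use the GetTranscriptionOperation method in RemoteMeetingTranscriptionClient to get a PollerFlux object. The PollerFlux object contains information about the remote operation status, RemoteMeetingTranscriptionOperation, and the final result, RemoteMeetingTranscriptionResult. After the operation is finished, call getFinalResult on a SyncPoller to get the RemoteMeetingTranscriptionResult. In this code, we print the contents of the result to system output.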

// Create the speech config object
SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "Region");

// Create a remote Meeting Transcription client
RemoteMeetingTranscriptionClient client = new RemoteMeetingTranscriptionClient(speechConfig);

// Get the PollerFlux for the remote operation
PollerFlux<RemoteMeetingTranscriptionOperation, RemoteMeetingTranscriptionResult> remoteTranscriptionOperation = client.GetTranscriptionOperation(meetingId);

// Subscribe to PollerFlux to get the remote operation status
remoteTranscriptionOperation.subscribe(
        pollResponse -> {
            System.out.println("Poll response status : " + pollResponse.getStatus());
            System.out.println("Service status : " + pollResponse.getValue().getServiceStatus());
        }
);

// Obtain the blocking operation using getSyncPoller
SyncPoller<RemoteMeetingTranscriptionOperation, RemoteMeetingTranscriptionResult> blockingOperation =  remoteTranscriptionOperation.getSyncPoller();

// Wait for the operation to finish
blockingOperation.waitForCompletion();

// Get the final result response
RemoteMeetingTranscriptionResult resultResponse = blockingOperation.getFinalResult();

// Print the result
if(resultResponse != null) {
    if(resultResponse.getMeetingTranscriptionResults() != null) {
        for (int i = 0; i < resultResponse.getMeetingTranscriptionResults().size(); i++) {
            MeetingTranscriptionResult result = resultResponse.getMeetingTranscriptionResults().get(i);
            System.out.println(result.getProperties().getProperty(PropertyId.SpeechServiceResponse_JsonResult.name()));
            System.out.println(result.getProperties().getProperty(PropertyId.SpeechServiceResponse_JsonResult));
            System.out.println(result.getOffset());
            System.out.println(result.getDuration());
            System.out.println(result.getUserId());
            System.out.println(result.getReason());
            System.out.println(result.getResultId());
            System.out.println(result.getText());
            System.out.println(result.toString());
        }
    }
}

System.out.println("Operation finished");
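The offset and duration values printed above are raw numbers. The following sketch converts such values to a readable timestamp, assuming they're expressed in 100-nanosecond ticks, as the `OffsetInTicks` name in the C# sample suggests. Verify that assumption, and the numeric type that `getOffset()` and `getDuration()` return, against the SDK reference before relying on it. The `TickTimes` helper is hypothetical.

```java
import java.time.Duration;
import java.util.Locale;

/** Sketch: convert 100-nanosecond ticks (an assumption, see the text) to readable times. */
final class TickTimes {
    static final long NANOS_PER_TICK = 100L;

    private TickTimes() {
    }

    /** Converts a tick count to a Duration; throws ArithmeticException on overflow. */
    static Duration fromTicks(long ticks) {
        return Duration.ofNanos(Math.multiplyExact(ticks, NANOS_PER_TICK));
    }

    /** Formats a non-negative duration as HH:mm:ss.SSS. */
    static String format(Duration duration) {
        long millis = duration.toMillis();
        return String.format(Locale.ROOT, "%02d:%02d:%02d.%03d",
                millis / 3_600_000L,
                (millis / 60_000L) % 60L,
                (millis / 1_000L) % 60L,
                millis % 1_000L);
    }

    public static void main(String[] args) {
        // 36,615,000,000 ticks is 3,661.5 seconds, which is 1 hour, 1 minute, and 1.5 seconds.
        System.out.println(format(fromTicks(36_615_000_000L)));
    }
}
```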
