Asynchronous meeting transcription

01/18/2024

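This article shows how to transcribe meetings asynchronously by using the RemoteMeetingTranscriptionClient API. If you have configured meeting transcription to run asynchronously and you have a meetingId, you can use the RemoteMeetingTranscriptionClient API to retrieve the transcription associated with that meetingId.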

Asynchronous vs. real-time plus asynchronous

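With asynchronous transcription, you stream the meeting audio but don't need the transcription returned in real time. Instead, after the audio is sent, you use the meetingId of the Meeting to query the status of the asynchronous transcription. When the asynchronous transcription is ready, you receive a RemoteMeetingTranscriptionResult.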

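With real-time plus asynchronous, you receive the transcription in real time, and you can also retrieve the transcription by querying with the meetingId, as in the asynchronous scenario.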
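The two modes correspond to the two values that the samples in this article pass for the transcriptionMode service property: "async" and "RealTimeAndAsync". As a minimal sketch, the choice could be captured in a small enum. The enum and its method names are hypothetical and are not part of the Speech SDK; only the two string values come from the samples.

```java
public class TranscriptionModeSketch {
    /** Hypothetical enum; only the two string values are taken from this article's samples. */
    public enum TranscriptionMode {
        ASYNC_ONLY("async"),
        REAL_TIME_AND_ASYNC("RealTimeAndAsync");

        private final String serviceValue;

        TranscriptionMode(String serviceValue) {
            this.serviceValue = serviceValue;
        }

        /** Value to pass for the "transcriptionMode" service property. */
        public String serviceValue() {
            return serviceValue;
        }

        /** True only for the mode that also returns results while audio is streaming. */
        public boolean deliversRealTimeResults() {
            return this == REAL_TIME_AND_ASYNC;
        }
    }

    public static void main(String[] args) {
        for (TranscriptionMode mode : TranscriptionMode.values()) {
            System.out.println(mode + " -> transcriptionMode=" + mode.serviceValue()
                    + ", real-time results: " + mode.deliversRealTimeResults());
        }
    }
}
```

In both modes the transcription can later be retrieved by meetingId; the modes differ only in whether results are also delivered while the audio is streaming.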

Asynchronous transcription takes two steps. The first step is to upload the audio, choosing either asynchronous only or real-time plus asynchronous. The second step is to get the transcription results.
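The second step amounts to asking the service repeatedly whether the transcription for a meetingId is ready. The following Java sketch illustrates that poll-until-ready pattern in isolation. It does not call the Speech SDK: the status check is a hypothetical stand-in that returns an empty Optional while the transcription is not ready, and the attempt count and interval are arbitrary illustrative values.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

public class PollUntilReadySketch {
    /**
     * Calls the check up to maxAttempts times, sleeping between attempts.
     * Returns the result as soon as one is present, or empty if it never arrives.
     */
    public static <T> Optional<T> pollUntilReady(Supplier<Optional<T>> check,
                                                 int maxAttempts,
                                                 Duration interval) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = check.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(interval.toMillis());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for the service: not ready twice, then a transcript.
        Iterator<Optional<String>> responses = List.of(
                Optional.<String>empty(),
                Optional.<String>empty(),
                Optional.of("transcript for the meeting")).iterator();

        Optional<String> transcript = pollUntilReady(responses::next, 5, Duration.ofMillis(10));
        System.out.println(transcript.orElse("not ready"));
    }
}
```

In the samples later in this article, the SDK's operation and poller objects take care of this loop, so application code normally only waits for completion.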

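Upload the audio

The first step for asynchronous transcription is to send the audio to the meeting transcription service by using the Speech SDK.

This C# code example shows how to create a MeetingTranscriber for asynchronous-only mode. To stream audio to the transcriber, add the audio streaming code from the article on transcribing meetings in real time with the Speech SDK.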

async Task CompleteContinuousRecognition(MeetingTranscriber recognizer, string meetingId)
{
    var finishedTaskCompletionSource = new TaskCompletionSource<int>();

    recognizer.SessionStopped += (s, e) =>
    {
        finishedTaskCompletionSource.TrySetResult(0);
    };

    recognizer.Canceled += (s, e) => 
    {
        Console.WriteLine($"CANCELED: Reason={e.Reason}");
        if (e.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
            Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
            Console.WriteLine($"CANCELED: Did you update the subscription info?");
            throw new System.ApplicationException($"{e.ErrorDetails}");
        }
        finishedTaskCompletionSource.TrySetResult(0);
    };

    await recognizer.StartTranscribingAsync().ConfigureAwait(false);
    
    // Waits for completion.
    // Use Task.WaitAny to keep the task rooted.
    Task.WaitAny(new[] { finishedTaskCompletionSource.Task });
    
    await recognizer.StopTranscribingAsync().ConfigureAwait(false);
}

async Task<List<string>> GetRecognizerResult(MeetingTranscriber recognizer, string meetingId)
{
    List<string> recognizedText = new List<string>();
    recognizer.Transcribed += (s, e) =>
    {
        if (e.Result.Text.Length > 0)
        {
            recognizedText.Add(e.Result.Text);
        }
    };

    await CompleteContinuousRecognition(recognizer, meetingId);

    recognizer.Dispose();
    return recognizedText;
}

async Task UploadAudio()
{
    // Create the speech config object
    // Substitute real information for "YourSubscriptionKey" and "Region"
    SpeechConfig speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "Region");
    speechConfig.SetProperty("ConversationTranscriptionInRoomAndOnline", "true");

    // Set the property for asynchronous transcription
    speechConfig.SetServiceProperty("transcriptionMode", "async", ServicePropertyChannel.UriQueryParameter);

    // Alternatively: set the property for real-time plus asynchronous transcription
    // speechConfig.SetServiceProperty("transcriptionMode", "RealTimeAndAsync", ServicePropertyChannel.UriQueryParameter);

    // Create an audio stream from a wav file or from the default microphone if you want to stream live audio from the supported devices
    // Replace with your own audio file name and Helper class which implements AudioConfig using PullAudioInputStreamCallback
    PullAudioInputStreamCallback wavfilePullStreamCallback = Helper.OpenWavFile("16kHz16Bits8channelsOfRecordedPCMAudio.wav");
    // Create an audio stream format assuming the file used above is 16kHz, 16 bits and 8 channel pcm wav file
    AudioStreamFormat audioStreamFormat = AudioStreamFormat.GetWaveFormatPCM(16000, 16, 8);
    // Create an input stream
    AudioInputStream audioStream = AudioInputStream.CreatePullStream(wavfilePullStreamCallback, audioStreamFormat);

    // Ensure the meetingId for a new meeting is a truly unique GUID
    String meetingId = Guid.NewGuid().ToString();

    // Create a Meeting
    using (var meeting = await Meeting.CreateMeetingAsync(speechConfig, meetingId))
    {
        using (var meetingTranscriber = new MeetingTranscriber(AudioConfig.FromStreamInput(audioStream)))
        {
            await meetingTranscriber.JoinMeetingAsync(meeting);
            // Helper function to get the real-time transcription results
            var result = await GetRecognizerResult(meetingTranscriber, meetingId);
        }
    }
}

To use real-time plus asynchronous mode instead, comment and uncomment the appropriate lines of code as follows.

// Set the property for asynchronous transcription
// speechConfig.SetServiceProperty("transcriptionMode", "async", ServicePropertyChannel.UriQueryParameter);

// Alternatively: set the property for real-time plus asynchronous transcription
speechConfig.SetServiceProperty("transcriptionMode", "RealTimeAndAsync", ServicePropertyChannel.UriQueryParameter);

Get transcription results

Install Microsoft.CognitiveServices.Speech.Remoteconversation version 1.13.0 or later by using NuGet.

Sample transcription code

After you have the meetingId, create a remote meeting transcription client, RemoteMeetingTranscriptionClient, in the client application to query the status of the asynchronous transcription. Create a RemoteMeetingTranscriptionOperation object to get a long-running Operation object. You can check the status of the operation or wait for it to complete.

// Create the speech config
SpeechConfig config = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");
// Create the speech client
RemoteMeetingTranscriptionClient client = new RemoteMeetingTranscriptionClient(config);
// Create the remote operation
RemoteMeetingTranscriptionOperation operation = 
                            new RemoteMeetingTranscriptionOperation(meetingId, client);

// Wait for operation to finish
await operation.WaitForCompletionAsync(TimeSpan.FromSeconds(10), CancellationToken.None);
// Get the result of the long running operation
var val = operation.Value.MeetingTranscriptionResults;
// Print the fields from the results
foreach (var item in val)
{
    Console.WriteLine($"{item.Text}, {item.ResultId}, {item.Reason}, {item.UserId}, {item.OffsetInTicks}, {item.Duration}");
    Console.WriteLine($"{item.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult)}");
}
Console.WriteLine("Operation completed");

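Upload the audio

Before asynchronous transcription can be performed, you need to send the audio to the meeting transcription service by using the Speech SDK.

This Java code example shows how to create a meeting transcriber for asynchronous-only mode. To stream audio to the transcriber, you need to add the audio streaming code from the article on transcribing meetings in real time with the Speech SDK. See the limitations section of that article for the supported platforms and language APIs.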

// Create the speech config object
// Substitute real information for "YourSubscriptionKey" and "Region"
SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "Region");
speechConfig.setProperty("ConversationTranscriptionInRoomAndOnline", "true");

// Set the property for asynchronous transcription
speechConfig.setServiceProperty("transcriptionMode", "async", ServicePropertyChannel.UriQueryParameter);

// Set the property for real-time plus asynchronous transcription
//speechConfig.setServiceProperty("transcriptionMode", "RealTimeAndAsync", ServicePropertyChannel.UriQueryParameter);

// pick a meeting Id that is a GUID.
String meetingId = UUID.randomUUID().toString();

// Create a Meeting
Future<Meeting> meetingFuture = Meeting.createMeetingAsync(speechConfig, meetingId);
Meeting meeting = meetingFuture.get();

// Create an audio stream from a wav file or from the default microphone if you want to stream live audio from the supported devices
// Replace with your own audio file name and Helper class which implements AudioConfig using PullAudioInputStreamCallback
PullAudioInputStreamCallback wavfilePullStreamCallback = Helper.OpenWavFile("16kHz16Bits8channelsOfRecordedPCMAudio.wav");
// Create an audio stream format assuming the file used above is 16kHz, 16 bits and 8 channel pcm wav file
AudioStreamFormat audioStreamFormat = AudioStreamFormat.getWaveFormatPCM((long)16000, (short)16,(short)8);
// Create an input stream
AudioInputStream audioStream = AudioInputStream.createPullStream(wavfilePullStreamCallback, audioStreamFormat);

// Create a meeting transcriber
MeetingTranscriber transcriber = new MeetingTranscriber(AudioConfig.fromStreamInput(audioStream));

// join a meeting
transcriber.joinMeetingAsync(meeting);

// Add the event listener for the real-time events
transcriber.transcribed.addEventListener((o, e) -> {
    System.out.println("Meeting transcriber Recognized:" + e.toString());
});

transcriber.canceled.addEventListener((o, e) -> {
    System.out.println("Meeting transcriber canceled:" + e.toString());
    try {
        transcriber.stopTranscribingAsync().get();
    } catch (InterruptedException ex) {
        ex.printStackTrace();
    } catch (ExecutionException ex) {
        ex.printStackTrace();
    }
});

transcriber.sessionStopped.addEventListener((o, e) -> {
    System.out.println("Meeting transcriber stopped:" + e.toString());

    try {
        transcriber.stopTranscribingAsync().get();
    } catch (InterruptedException ex) {
        ex.printStackTrace();
    } catch (ExecutionException ex) {
        ex.printStackTrace();
    }
});

// start the transcription.
Future<?> future = transcriber.startTranscribingAsync();
...
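The Java snippet above starts transcription and then elides what follows, whereas the C# sample waits on a TaskCompletionSource that its SessionStopped and Canceled handlers complete. One way to get a similar wait in Java, sketched here under assumptions, is a CountDownLatch that the sessionStopped and canceled listeners count down. The sketch below is self-contained and does not use the Speech SDK: a plain thread stands in for the SDK's event callback, and the timeout value is arbitrary.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class WaitForSessionStopSketch {
    /** Blocks until the latch is released or the timeout elapses; returns true if released. */
    public static boolean awaitStop(CountDownLatch stopped, long timeout, TimeUnit unit)
            throws InterruptedException {
        return stopped.await(timeout, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch stopped = new CountDownLatch(1);

        // Hypothetical stand-in for the SDK's event thread. In real code, the
        // sessionStopped and canceled listeners would call stopped.countDown().
        Thread eventThread = new Thread(() -> {
            try {
                Thread.sleep(50); // pretend the audio takes a moment to finish
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
            stopped.countDown();
        });
        eventThread.start();

        boolean finished = awaitStop(stopped, 5, TimeUnit.SECONDS);
        System.out.println(finished ? "Session stopped" : "Timed out waiting for the session to stop");
    }
}
```

Because countDown() has no effect once the count reaches zero, it is safe for both listeners to call it. The timeout keeps the caller from blocking forever if neither event fires.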

To use real-time plus asynchronous mode instead, comment and uncomment the appropriate lines of code as follows.

// Set the property for asynchronous transcription
//speechConfig.setServiceProperty("transcriptionMode", "async", ServicePropertyChannel.UriQueryParameter);

// Set the property for real-time plus asynchronous transcription
speechConfig.setServiceProperty("transcriptionMode", "RealTimeAndAsync", ServicePropertyChannel.UriQueryParameter);

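Get transcription results

The code shown here requires remote-meeting version 1.8.0, which is supported only for Java (1.8.0 or later) on Windows and Linux.

Obtain the asynchronous meeting client SDK

You can get remote-meeting by editing your pom.xml file as follows.

At the end of the file, before the closing tag </project>, create a repositories element with a reference to the Maven repository for the Speech SDK.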

<repositories>
  <repository>
    <id>maven-cognitiveservices-speech</id>
    <name>Microsoft Cognitive Services Speech Maven Repository</name>
    <url>https://azureai.azureedge.net/maven/</url>
  </repository>
</repositories>

Also add a dependencies element, with remotemeeting-client-sdk 1.8.0 as a dependency.

<dependencies>
  <dependency>
    <groupId>com.microsoft.cognitiveservices.speech.remotemeeting</groupId>
    <artifactId>remote-meeting</artifactId>
    <version>1.8.0</version>
  </dependency>
</dependencies>

Save the changes.

Sample transcription code

After you have the meetingId, create a remote meeting transcription client, RemoteMeetingTranscriptionClient, in the client application to query the status of the asynchronous transcription. Use the GetTranscriptionOperation method of RemoteMeetingTranscriptionClient to get a PollerFlux object. The PollerFlux object carries information about the status of the remote operation, RemoteMeetingTranscriptionOperation, and the final result, RemoteMeetingTranscriptionResult. After the operation finishes, get the RemoteMeetingTranscriptionResult by calling getFinalResult on a SyncPoller. This code simply prints the contents of the result to system output.

// Create the speech config object
SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "Region");

// Create a remote Meeting Transcription client
RemoteMeetingTranscriptionClient client = new RemoteMeetingTranscriptionClient(speechConfig);

// Get the PollerFlux for the remote operation
PollerFlux<RemoteMeetingTranscriptionOperation, RemoteMeetingTranscriptionResult> remoteTranscriptionOperation = client.GetTranscriptionOperation(meetingId);

// Subscribe to PollerFlux to get the remote operation status
remoteTranscriptionOperation.subscribe(
        pollResponse -> {
            System.out.println("Poll response status : " + pollResponse.getStatus());
            System.out.println("Service status : " + pollResponse.getValue().getServiceStatus());
        }
);

// Obtain the blocking operation using getSyncPoller
SyncPoller<RemoteMeetingTranscriptionOperation, RemoteMeetingTranscriptionResult> blockingOperation =  remoteTranscriptionOperation.getSyncPoller();

// Wait for the operation to finish
blockingOperation.waitForCompletion();

// Get the final result response
RemoteMeetingTranscriptionResult resultResponse = blockingOperation.getFinalResult();

// Print the result
if(resultResponse != null) {
    if(resultResponse.getMeetingTranscriptionResults() != null) {
        for (int i = 0; i < resultResponse.getMeetingTranscriptionResults().size(); i++) {
            MeetingTranscriptionResult result = resultResponse.getMeetingTranscriptionResults().get(i);
            System.out.println(result.getProperties().getProperty(PropertyId.SpeechServiceResponse_JsonResult.name()));
            System.out.println(result.getProperties().getProperty(PropertyId.SpeechServiceResponse_JsonResult));
            System.out.println(result.getOffset());
            System.out.println(result.getDuration());
            System.out.println(result.getUserId());
            System.out.println(result.getReason());
            System.out.println(result.getResultId());
            System.out.println(result.getText());
            System.out.println(result.toString());
        }
    }
}

System.out.println("Operation finished");
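The offset and duration printed above are raw numbers and are hard to read as positions in the recording. The following sketch converts such a value to a timestamp. It rests on two assumptions that this article does not state: that getOffset() and getDuration() return BigInteger tick counts, and that one tick is 100 nanoseconds. The class, the helper names, and the sample value are hypothetical.

```java
import java.math.BigInteger;
import java.time.Duration;
import java.util.Locale;

public class TickConversionSketch {
    // Assumption: one tick is 100 nanoseconds.
    private static final BigInteger NANOS_PER_TICK = BigInteger.valueOf(100);

    /** Converts a tick count to a Duration; throws ArithmeticException if it overflows a long. */
    public static Duration ticksToDuration(BigInteger ticks) {
        return Duration.ofNanos(ticks.multiply(NANOS_PER_TICK).longValueExact());
    }

    /** Formats a Duration as HH:mm:ss.SSS. */
    public static String formatTimestamp(Duration d) {
        return String.format(Locale.ROOT, "%02d:%02d:%02d.%03d",
                d.toHours(), d.toMinutesPart(), d.toSecondsPart(), d.toMillisPart());
    }

    public static void main(String[] args) {
        // Hypothetical offset: 12,345,000,000 ticks = 1,234.5 seconds.
        BigInteger offset = BigInteger.valueOf(12_345_000_000L);
        System.out.println(formatTimestamp(ticksToDuration(offset))); // prints 00:20:34.500
    }
}
```

If the assumptions hold, the same helper could be applied to result.getOffset() and result.getDuration() inside the loop that prints the results.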

Next steps

Explore the samples on GitHub
