Asynchronous conversation transcription multichannel diarization

Note

This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

This article demonstrates asynchronous conversation transcription multichannel diarization using the RemoteMeetingTranscriptionClient API. If you configured conversation transcription for asynchronous transcription and have a meetingId, you can use the RemoteMeetingTranscriptionClient API to obtain the transcription associated with that meetingId.

Important

Conversation transcription multichannel diarization (preview) is retiring on March 28, 2025. For more information about migrating to other speech to text features, see Migrate away from conversation transcription multichannel diarization.

Asynchronous vs. real-time + asynchronous

With asynchronous transcription, you stream the meeting audio but don't need a transcription returned in real time. Instead, after the audio is sent, use the meetingId of the Meeting to query for the status of the asynchronous transcription. When the asynchronous transcription is ready, you get a RemoteMeetingTranscriptionResult.

With real-time plus asynchronous, you get the transcription in real time, but you also get the transcription by querying with the meetingId (similar to the asynchronous scenario).

Two steps are required to accomplish asynchronous transcription. The first step is to upload the audio, choosing either asynchronous only or real-time plus asynchronous. The second step is to get the transcription results.

Upload the audio

The first step for asynchronous transcription is to send the audio to the conversation transcription service using the Speech SDK.

This example code shows how to use conversation transcription in asynchronous-only mode. In order to stream audio to the transcriber, you need to add audio streaming code derived from the real-time conversation transcription quickstart.

async Task CompleteContinuousRecognition(MeetingTranscriber recognizer, string meetingId)
{
    var finishedTaskCompletionSource = new TaskCompletionSource<int>();

    recognizer.SessionStopped += (s, e) =>
    {
        finishedTaskCompletionSource.TrySetResult(0);
    };

    recognizer.Canceled += (s, e) => 
    {
        Console.WriteLine($"CANCELED: Reason={e.Reason}");
        if (e.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
            Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
            Console.WriteLine($"CANCELED: Did you update the subscription info?");
            throw new System.ApplicationException($"{e.ErrorDetails}");
        }
        finishedTaskCompletionSource.TrySetResult(0);
    };

    await recognizer.StartTranscribingAsync().ConfigureAwait(false);
    
    // Waits for completion.
    // Use Task.WaitAny to keep the task rooted.
    Task.WaitAny(new[] { finishedTaskCompletionSource.Task });
    
    await recognizer.StopTranscribingAsync().ConfigureAwait(false);
}

async Task<List<string>> GetRecognizerResult(MeetingTranscriber recognizer, string meetingId)
{
    List<string> recognizedText = new List<string>();
    recognizer.Transcribed += (s, e) =>
    {
        if (e.Result.Text.Length > 0)
        {
            recognizedText.Add(e.Result.Text);
        }
    };

    await CompleteContinuousRecognition(recognizer, meetingId);

    recognizer.Dispose();
    return recognizedText;
}

async Task UploadAudio()
{
    // Create the speech config object
    // Substitute real information for "YourSubscriptionKey" and "Region"
    SpeechConfig speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "Region");
    speechConfig.SetProperty("ConversationTranscriptionInRoomAndOnline", "true");

    // Set the property for asynchronous transcription
    speechConfig.SetServiceProperty("transcriptionMode", "async", ServicePropertyChannel.UriQueryParameter);

    // Alternatively: set the property for real-time plus asynchronous transcription
    // speechConfig.SetServiceProperty("transcriptionMode", "RealTimeAndAsync", ServicePropertyChannel.UriQueryParameter);

    // Create an audio stream from a WAV file, or from the default microphone if you want to stream live audio from a supported device
    // Replace with your own audio file name and a Helper class that implements AudioConfig using PullAudioInputStreamCallback
    PullAudioInputStreamCallback wavfilePullStreamCallback = Helper.OpenWavFile("16kHz16Bits8channelsOfRecordedPCMAudio.wav");
    // Create an audio stream format, assuming the file above is a 16 kHz, 16-bit, 8-channel PCM WAV file
    AudioStreamFormat audioStreamFormat = AudioStreamFormat.GetWaveFormatPCM(16000, 16, 8);
    // Create an input stream
    AudioInputStream audioStream = AudioInputStream.CreatePullStream(wavfilePullStreamCallback, audioStreamFormat);

    // Ensure the meetingId for a new meeting is a truly unique GUID
    string meetingId = Guid.NewGuid().ToString();

    // Create a Meeting
    using (var meeting = await Meeting.CreateMeetingAsync(speechConfig, meetingId))
    {
        using (var meetingTranscriber = new MeetingTranscriber(AudioConfig.FromStreamInput(audioStream)))
        {
            await meetingTranscriber.JoinMeetingAsync(meeting);
            // Helper function to get the real-time transcription results
            var result = await GetRecognizerResult(meetingTranscriber, meetingId);
        }
    }
}
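
The Helper class in the preceding sample isn't part of the Speech SDK; it stands in for your own code that opens the WAV file and streams its PCM bytes. As a reference, here's a minimal sketch of such a callback. The class name and the 44-byte header skip are illustrative assumptions (a canonical WAV header); adjust them for your own files.

// Illustrative sketch of the Helper callback used above: it pulls raw PCM
// bytes from a WAV file. The 44-byte skip assumes a canonical WAV header.
public class WavFilePullStreamCallback : PullAudioInputStreamCallback
{
    private readonly System.IO.FileStream fileStream;

    public WavFilePullStreamCallback(string fileName)
    {
        fileStream = System.IO.File.OpenRead(fileName);
        fileStream.Seek(44, System.IO.SeekOrigin.Begin); // skip the WAV header
    }

    // Called by the SDK whenever it needs more audio; returning 0 signals end of stream.
    public override int Read(byte[] dataBuffer, uint size)
    {
        return fileStream.Read(dataBuffer, 0, (int)size);
    }

    public override void Close()
    {
        fileStream.Dispose();
        base.Close();
    }
}

With a class like this in place, a Helper.OpenWavFile method could simply return new WavFilePullStreamCallback(fileName).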

If you want real-time plus asynchronous, comment and uncomment the appropriate lines of code as follows:

// Set the property for asynchronous transcription
// speechConfig.SetServiceProperty("transcriptionMode", "async", ServicePropertyChannel.UriQueryParameter);

// Alternatively: set the property for real-time plus asynchronous transcription
speechConfig.SetServiceProperty("transcriptionMode", "RealTimeAndAsync", ServicePropertyChannel.UriQueryParameter);

Get transcription results

Install Microsoft.CognitiveServices.Speech.Remotemeeting version 1.13.0 or above via NuGet.
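
For example, using the .NET CLI:

dotnet add package Microsoft.CognitiveServices.Speech.Remotemeeting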

Sample transcription code

After you have the meetingId, create a RemoteMeetingTranscriptionClient in your client application to query the status of the asynchronous transcription. Then create a RemoteMeetingTranscriptionOperation object, which gives you a long-running operation. You can check the status of the operation or wait for it to complete.

// Create the speech config
SpeechConfig config = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");
// Create the speech client
RemoteMeetingTranscriptionClient client = new RemoteMeetingTranscriptionClient(config);
// Create the remote operation
RemoteMeetingTranscriptionOperation operation = 
                            new RemoteMeetingTranscriptionOperation(meetingId, client);

// Wait for operation to finish
await operation.WaitForCompletionAsync(TimeSpan.FromSeconds(10), CancellationToken.None);
// Get the result of the long running operation
var val = operation.Value.MeetingTranscriptionResults;
// Print the fields from the results
foreach (var item in val)
{
    Console.WriteLine($"{item.Text}, {item.ResultId}, {item.Reason}, {item.UserId}, {item.OffsetInTicks}, {item.Duration}");
    Console.WriteLine($"{item.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult)}");
}
Console.WriteLine("Operation completed");

Upload the audio

Before asynchronous conversation transcription can be performed, you need to send the audio to the conversation transcription service using the Speech SDK.

This example code shows how to use conversation transcription in asynchronous-only mode. To stream audio to the transcriber, you need to add audio streaming code derived from the real-time conversation transcription quickstart. Refer to the Limitations section of that topic to see the supported platforms, languages, and APIs.

// Create the speech config object
// Substitute real information for "YourSubscriptionKey" and "Region"
SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "Region");
speechConfig.setProperty("ConversationTranscriptionInRoomAndOnline", "true");

// Set the property for asynchronous transcription
speechConfig.setServiceProperty("transcriptionMode", "async", ServicePropertyChannel.UriQueryParameter);

// Set the property for real-time plus asynchronous transcription
//speechConfig.setServiceProperty("transcriptionMode", "RealTimeAndAsync", ServicePropertyChannel.UriQueryParameter);

// Pick a meetingId that is a GUID.
String meetingId = UUID.randomUUID().toString();

// Create a Meeting
Future<Meeting> meetingFuture = Meeting.createMeetingAsync(speechConfig, meetingId);
Meeting meeting = meetingFuture.get();

// Create an audio stream from a WAV file, or from the default microphone if you want to stream live audio from a supported device
// Replace with your own audio file name and a Helper class that implements AudioConfig using PullAudioInputStreamCallback
PullAudioInputStreamCallback wavfilePullStreamCallback = Helper.OpenWavFile("16kHz16Bits8channelsOfRecordedPCMAudio.wav");
// Create an audio stream format, assuming the file above is a 16 kHz, 16-bit, 8-channel PCM WAV file
AudioStreamFormat audioStreamFormat = AudioStreamFormat.getWaveFormatPCM((long)16000, (short)16, (short)8);
// Create an input stream
AudioInputStream audioStream = AudioInputStream.createPullStream(wavfilePullStreamCallback, audioStreamFormat);

// Create a meeting transcriber
MeetingTranscriber transcriber = new MeetingTranscriber(AudioConfig.fromStreamInput(audioStream));

// join a meeting
transcriber.joinMeetingAsync(meeting);

// Add the event listener for the real-time events
transcriber.transcribed.addEventListener((o, e) -> {
    System.out.println("Meeting transcriber Recognized:" + e.toString());
});

transcriber.canceled.addEventListener((o, e) -> {
    System.out.println("Meeting transcriber canceled:" + e.toString());
    try {
        transcriber.stopTranscribingAsync().get();
    } catch (InterruptedException ex) {
        ex.printStackTrace();
    } catch (ExecutionException ex) {
        ex.printStackTrace();
    }
});

transcriber.sessionStopped.addEventListener((o, e) -> {
    System.out.println("Meeting transcriber stopped:" + e.toString());

    try {
        transcriber.stopTranscribingAsync().get();
    } catch (InterruptedException ex) {
        ex.printStackTrace();
    } catch (ExecutionException ex) {
        ex.printStackTrace();
    }
});

// start the transcription.
Future<?> future = transcriber.startTranscribingAsync();
...
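
The sample stops after starting the transcription. One common way to block until the session ends, borrowed from the real-time quickstart pattern, is to release a semaphore from the stop events; the semaphore wiring below is an illustrative sketch, not part of the SDK (exception handling omitted for brevity):

// Illustrative sketch of the elided wait: release a semaphore from the stop
// events (in addition to the listeners above) and block on it.
java.util.concurrent.Semaphore stopSemaphore = new java.util.concurrent.Semaphore(0);
transcriber.sessionStopped.addEventListener((o, e) -> stopSemaphore.release());
transcriber.canceled.addEventListener((o, e) -> stopSemaphore.release());

// Wait here until the session stops; the listeners above already stop the transcriber.
stopSemaphore.acquire();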

If you want real-time plus asynchronous, comment and uncomment the appropriate lines of code as follows:

// Set the property for asynchronous transcription
//speechConfig.setServiceProperty("transcriptionMode", "async", ServicePropertyChannel.UriQueryParameter);

// Set the property for real-time plus asynchronous transcription
speechConfig.setServiceProperty("transcriptionMode", "RealTimeAndAsync", ServicePropertyChannel.UriQueryParameter);

Get transcription results

For the code shown here, you need the remote-meeting package version 1.8.0, which is supported only for Java (1.8.0 or later) on Windows and Linux.

Obtaining the async meeting client SDK

You can obtain remote-meeting by editing your pom.xml file as follows.

  1. At the end of the file, before the closing tag </project>, create a repositories element with a reference to the Maven repository for the Speech SDK:

    <repositories>
      <repository>
        <id>maven-cognitiveservices-speech</id>
        <name>Microsoft Cognitive Services Speech Maven Repository</name>
        <url>https://azureai.azureedge.net/maven/</url>
      </repository>
    </repositories>
    
  2. Also add a dependencies element, with the remote-meeting client SDK version 1.8.0 as a dependency:

    <dependencies>
      <dependency>
        <groupId>com.microsoft.cognitiveservices.speech.remotemeeting</groupId>
        <artifactId>remote-meeting</artifactId>
        <version>1.8.0</version>
      </dependency>
    </dependencies>
    
  3. Save the changes.

Sample transcription code

After you have the meetingId, create a RemoteMeetingTranscriptionClient in your client application to query the status of the asynchronous transcription. Use the GetTranscriptionOperation method of RemoteMeetingTranscriptionClient to get a PollerFlux object. The PollerFlux object carries the remote operation status (RemoteMeetingTranscriptionOperation) and the final result (RemoteMeetingTranscriptionResult). Once the operation finishes, get the RemoteMeetingTranscriptionResult by calling getFinalResult on a SyncPoller. This code prints the result contents to the system output.

// Create the speech config object
SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "Region");

// Create a remote Meeting Transcription client
RemoteMeetingTranscriptionClient client = new RemoteMeetingTranscriptionClient(speechConfig);

// Get the PollerFlux for the remote operation
PollerFlux<RemoteMeetingTranscriptionOperation, RemoteMeetingTranscriptionResult> remoteTranscriptionOperation = client.GetTranscriptionOperation(meetingId);

// Subscribe to PollerFlux to get the remote operation status
remoteTranscriptionOperation.subscribe(
        pollResponse -> {
            System.out.println("Poll response status : " + pollResponse.getStatus());
            System.out.println("Poll response status : " + pollResponse.getValue().getServiceStatus());
        }
);

// Obtain the blocking operation using getSyncPoller
SyncPoller<RemoteMeetingTranscriptionOperation, RemoteMeetingTranscriptionResult> blockingOperation =  remoteTranscriptionOperation.getSyncPoller();

// Wait for the operation to finish
blockingOperation.waitForCompletion();

// Get the final result response
RemoteMeetingTranscriptionResult resultResponse = blockingOperation.getFinalResult();

// Print the result
if(resultResponse != null) {
    if(resultResponse.getMeetingTranscriptionResults() != null) {
        for (int i = 0; i < resultResponse.getMeetingTranscriptionResults().size(); i++) {
            MeetingTranscriptionResult result = resultResponse.getMeetingTranscriptionResults().get(i);
            System.out.println(result.getProperties().getProperty(PropertyId.SpeechServiceResponse_JsonResult));
            System.out.println(result.getOffset());
            System.out.println(result.getDuration());
            System.out.println(result.getUserId());
            System.out.println(result.getReason());
            System.out.println(result.getResultId());
            System.out.println(result.getText());
            System.out.println(result.toString());
        }
    }
}

System.out.println("Operation finished");