I want to identify speakers in my speech translation code (using MS Cognitive Services), similar to how the speech transcription code below does it by reading the SpeakerId property:
conversationTranscriber.Transcribed += (s, e) =>
{
    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        Console.WriteLine($"TRANSCRIBED: Text={e.Result.Text} Speaker ID={e.Result.SpeakerId}");
    }
    else if (e.Result.Reason == ResultReason.NoMatch)
    {
        Console.WriteLine($"NOMATCH: Speech could not be transcribed.");
    }
};
https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-stt-diarization?tabs=windows&pivots=programming-language-csharp
My current code does speech translation in a way similar to:
https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-speech-translation?tabs=windows%2Cterminal&pivots=programming-language-csharp
Is there a way of modifying the code in the speech translation quickstart above to get the SpeakerId property (with only one call to MS Azure)? Or, is there an alternative way of achieving this with only one call to MS Azure?
NOTE: I would prefer to avoid making two calls from my code to MS Azure, i.e., first transcribing the speech (and reading the SpeakerId property) and then making a second call to machine-translate the transcribed text. I’m developing a real-time app, and making two calls to MS Azure would likely be inefficient; I want to translate the speech and identify the speaker in a single call.
NOTE: I asked the virtual assistant (Q&A Assist) and got the following answer, but I would like to check with a human professional:
"Unfortunately, it is not possible to get speaker identification in speech translation code using only one call to MS Cognitive Services. Speaker identification is only available in the speech-to-text transcription service, which is a separate service from the speech translation service. Therefore, you would need to make two separate calls to the MS Cognitive Services API to achieve both speaker identification and speech translation."