I am using Azure Cognitive Services to generate audio files, and I would like to get speech marks for the generated audio. As per the following links, the WordBoundary event should give me the data I am looking for:
https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesizer.wordboundary?view=azure-dotnet
https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesiswordboundaryeventargs?view=azure-dotnet
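For clarity, this is the shape of the data I am trying to produce from each event. The SpeechMark type below is just my own illustration of the fields described in the event-args documentation, not something from the SDK:

// Illustration only: "SpeechMark" is my own type, not part of the Speech SDK.
// It holds the fields I want to capture from each SpeechSynthesisWordBoundaryEventArgs.
public sealed class SpeechMark
{
    public ulong AudioOffsetMs { get; set; } // e.AudioOffset is in ticks (100 ns); divide by 10,000 for milliseconds
    public uint TextOffset { get; set; }     // character offset of the word within the input text
    public uint WordLength { get; set; }     // length of the word, in characters
}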
Additionally, I have found a sample that explains how to subscribe to the WordBoundary event and read the speech-mark data:
https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_synthesis_samples.cs
Method name: SynthesisWordBoundaryEventAsync
Based on this, I have created an Azure Function to return the required data; however, the WordBoundary event is not firing. The only difference I can see is that the sample code was written for a console app, while I am calling it from an Azure Function.
Any feedback would be helpful.
Here is my updated function, based on the sample provided on GitHub.
text = the text, in the required language, from which to generate the audio file
config = the Azure Speech service configuration with the language and voice settings
private static async Task SynthesisWordBoundaryEventAsync(string text, SpeechConfig config)
{
    // Creates a speech synthesizer with a null output stream.
    // This means the audio output data will not be written to any stream.
    // You can just get the audio from the result.
    using (var synthesizer = new SpeechSynthesizer(config, null as AudioConfig))
    {
        // Subscribes to the word boundary event.
        synthesizer.WordBoundary += (s, e) =>
        {
            // e.AudioOffset is in ticks (1 tick = 100 nanoseconds); adding 5,000 and
            // dividing by 10,000 rounds it to the nearest millisecond.
            Console.WriteLine($"Word boundary event received. Audio offset: " +
                $"{(e.AudioOffset + 5000) / 10000}ms, text offset: {e.TextOffset}, word length: {e.WordLength}.");
        };

        using (var result = await synthesizer.SpeakTextAsync(text))
        {
            if (result.Reason == ResultReason.SynthesizingAudioCompleted)
            {
                Console.WriteLine($"Speech synthesized for text [{text}].");
                var audioData = result.AudioData;
                Console.WriteLine($"{audioData.Length} bytes of audio data received for text [{text}]");
            }
            else if (result.Reason == ResultReason.Canceled)
            {
                var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
                Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

                if (cancellation.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                    Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
                    Console.WriteLine($"CANCELED: Did you update the subscription info?");
                }
            }
        }
    }
}
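For context, this is roughly how the method is wired into the function. It is a simplified sketch of my setup: the function name, route, voice name, and subscription key/region are placeholders rather than my exact code.

// Simplified sketch of the HTTP-triggered Azure Function wrapper (in-process model).
// "<subscription-key>", "<region>", the voice name, and the function name are placeholders.
// Requires: Microsoft.Azure.WebJobs, Microsoft.Azure.WebJobs.Extensions.Http,
// Microsoft.AspNetCore.Http, Microsoft.AspNetCore.Mvc, Microsoft.CognitiveServices.Speech, System.IO.
[FunctionName("GenerateSpeechMarks")]
public static async Task<IActionResult> Run(
    [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req)
{
    // The text to synthesize is sent in the request body.
    string text = await new StreamReader(req.Body).ReadToEndAsync();

    // Speech service configuration with the language/voice settings.
    var config = SpeechConfig.FromSubscription("<subscription-key>", "<region>");
    config.SpeechSynthesisVoiceName = "en-US-JennyNeural";

    // Calls the method above; I expect the WordBoundary handler to fire while this runs.
    await SynthesisWordBoundaryEventAsync(text, config);

    return new OkResult();
}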
Thanks,
Samir