
Samir-6734 asked YulinLi-5027 answered

Azure Text to Speech Synthesizer.WordBoundary method not working to get word audio duration

I am using Azure Cognitive Services to generate audio files, and I would like to get speech marks for the generated audio. According to the following links, the WordBoundary event should give me the data I am looking for.

https://docs.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesizer.wordboundary?view=azure-dotnet
https://docs.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesiswordboundaryeventargs?view=azure-dotnet

Additionally, I found a sample that shows how to subscribe to the WordBoundary event and get speech marks from it.
https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_synthesis_samples.cs

Method name: SynthesisWordBoundaryEventAsync

Based on this, I created an Azure Function to return the required data; however, the WordBoundary event is not firing. The only difference I see is that the sample code was written for a console app, whereas I am using an Azure Function.

Any feedback would be helpful.

Here is my updated function, based on the sample provided on GitHub.
text = the text, in the given language, to synthesize into an audio file
config = the Azure service config with the given language and voice settings

private static async Task SynthesisWordBoundaryEventAsync(string text, SpeechConfig config)
{

         // Creates a speech synthesizer with a null output stream.
         // This means the audio output data will not be written to any stream.
         // You can just get the audio from the result.
         using (var synthesizer = new SpeechSynthesizer(config, null as AudioConfig))
         {
             // Subscribes to word boundary event
             synthesizer.WordBoundary += (s, e) =>
             {
                 // The unit of e.AudioOffset is tick (1 tick = 100 nanoseconds), divide by 10,000 to convert to milliseconds.
                 Console.WriteLine($"Word boundary event received. Audio offset: " +
                         $"{(e.AudioOffset + 5000) / 10000}ms, text offset: {e.TextOffset}, word length: {e.WordLength}.");
             };

             using (var result = await synthesizer.SpeakTextAsync(text))
             {
                 if (result.Reason == ResultReason.SynthesizingAudioCompleted)
                 {
                     Console.WriteLine($"Speech synthesized for text [{text}].");
                     var audioData = result.AudioData;
                     Console.WriteLine($"{audioData.Length} bytes of audio data received for text [{text}]");
                 }
                 else if (result.Reason == ResultReason.Canceled)
                 {
                     var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
                     Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

                     if (cancellation.Reason == CancellationReason.Error)
                     {
                         Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                         Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
                         Console.WriteLine($"CANCELED: Did you update the subscription info?");
                     }
                 }
             }
         }
     }
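
Since the goal is a per-word audio duration, here is a minimal sketch of how the values printed by the handler above could be turned into speech marks once the events do arrive. It assumes the offsets are collected into a list instead of written to the console, and that TextOffset and WordLength index into the same string that was passed to SpeakTextAsync. The SpeechMarks class, its method names, and the totalAudioMs parameter are hypothetical and not part of the Speech SDK. The duration is estimated as the gap to the next word's start, so it includes any pause between words.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the Speech SDK.
public static class SpeechMarks
{
    // 1 tick = 100 ns, so 10,000 ticks = 1 ms. Adding 5,000 rounds to the nearest ms,
    // the same arithmetic used in the WordBoundary handler above.
    public static long TicksToMs(ulong ticks) => (long)((ticks + 5000) / 10000);

    // Builds (word, start, duration) entries from collected word boundary values.
    // totalAudioMs is an assumed input: the length of the whole clip, used for the last word.
    public static List<(string Word, long StartMs, long DurationMs)> Build(
        string text,
        IReadOnlyList<(ulong AudioOffsetTicks, int TextOffset, int WordLength)> marks,
        long totalAudioMs)
    {
        var result = new List<(string Word, long StartMs, long DurationMs)>(marks.Count);
        for (int i = 0; i < marks.Count; i++)
        {
            long start = TicksToMs(marks[i].AudioOffsetTicks);
            // A word is assumed to last until the next word starts (or the clip ends).
            long end = i + 1 < marks.Count
                ? TicksToMs(marks[i + 1].AudioOffsetTicks)
                : totalAudioMs;
            string word = text.Substring(marks[i].TextOffset, marks[i].WordLength);
            result.Add((word, start, Math.Max(0, end - start)));
        }
        return result;
    }
}
```

Inside the handler, each event would be appended with something like marks.Add((e.AudioOffset, (int)e.TextOffset, (int)e.WordLength)), and Build would be called after SpeakTextAsync returns.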


Thanks,
Samir

azure-cognitive-services azure-speech

@Samir-6734 Did you try the sample in a console app to check whether the word boundaries are returned? I couldn't find a sample that shows how to set this up with Azure Functions. I am not an expert on Azure Functions, but I am checking internally whether there is a sample that could help with running TTS in Azure Functions.


@romungi-MSFT

No, I haven't tried the sample in a console app. Let me try it tomorrow, and I will share my findings with you.

Thanks,
Samir


@romungi-MSFT I just tried it, and it did not work from a console app either.


1 Answer

YulinLi-5027 answered

Hi @Samir-6734, I checked your code and it looks good.

Could you share your SDK log with us? (https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-logging)

If you cannot upload a file here, you can open an issue in our GitHub sample repo.
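
For reference, a minimal sketch of how SDK file logging is typically switched on before the synthesizer is created, following the logging page linked above. The key, region, and log file name are placeholders (assumptions). In an Azure Function the path would need to point to a writable location, which is why the temp folder is used here.

```csharp
using System.IO;
using Microsoft.CognitiveServices.Speech;

public static class SpeechLoggingSketch
{
    // Returns a SpeechConfig with SDK file logging enabled.
    // "YourSubscriptionKey" and "YourServiceRegion" are placeholders, not real values.
    public static SpeechConfig CreateConfigWithLogging()
    {
        var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

        // Assumed file name; any writable path should work.
        string logPath = Path.Combine(Path.GetTempPath(), "speech-sdk.log");

        // Must be set before the SpeechSynthesizer is created from this config.
        config.SetProperty(PropertyId.Speech_LogFilename, logPath);
        return config;
    }
}
```

The resulting log file is what can be attached here or to a GitHub issue.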
