Azure Text to Speech: SpeechSynthesizer.WordBoundary event not firing when trying to get word audio duration

Samir 21 Reputation points
2022-05-14T01:45:03.343+00:00

I am using Azure Cognitive Services to generate audio files, and I would like to get speech marks for the generated audio. According to the following links, the WordBoundary event should give me the data I am looking for.

https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesizer.wordboundary?view=azure-dotnet
https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesiswordboundaryeventargs?view=azure-dotnet

Additionally, I found a sample that shows how to subscribe to the WordBoundary event and get speech marks from it.
https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_synthesis_samples.cs

Method name: SynthesisWordBoundaryEventAsync

Based on this, I created an Azure Function to return the required data; however, the WordBoundary event is not firing. The only difference I can see is that the sample code was written for a console app, while I am running it in an Azure Function.

Any feedback would be helpful.

Here is my function, adapted from the sample on GitHub.
text = the text, in the selected language, to synthesize into audio
config = the Azure Speech service configuration, with the language and voice already set

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

private static async Task SynthesisWordBoundaryEventAsync(string text, SpeechConfig config)
{
    // Creates a speech synthesizer with a null output stream.
    // This means the audio output data will not be written to any stream.
    // You can just get the audio from the result.
    using (var synthesizer = new SpeechSynthesizer(config, null as AudioConfig))
    {
        // Subscribes to the word boundary event before synthesis starts.
        synthesizer.WordBoundary += (s, e) =>
        {
            // The unit of e.AudioOffset is the tick (1 tick = 100 nanoseconds); divide by 10,000 to convert to milliseconds.
            // Adding 5,000 before dividing rounds to the nearest millisecond.
            Console.WriteLine($"Word boundary event received. Audio offset: " +
                    $"{(e.AudioOffset + 5000) / 10000}ms, text offset: {e.TextOffset}, word length: {e.WordLength}.");
        };

        using (var result = await synthesizer.SpeakTextAsync(text))
        {
            if (result.Reason == ResultReason.SynthesizingAudioCompleted)
            {
                Console.WriteLine($"Speech synthesized for text [{text}].");
                var audioData = result.AudioData;
                Console.WriteLine($"{audioData.Length} bytes of audio data received for text [{text}]");
            }
            else if (result.Reason == ResultReason.Canceled)
            {
                var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
                Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

                if (cancellation.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                    Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
                    Console.WriteLine($"CANCELED: Did you update the subscription info?");
                }
            }
        }
    }
}
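
For reference, the offset arithmetic in the handler can be checked on its own, without the Speech SDK. The sketch below is only an illustration: the class WordTiming and both of its methods are hypothetical names, not part of the SDK. It assumes the offsets were collected in ascending order from e.AudioOffset, and it approximates each word's duration as the gap to the next word's offset, so any pause after a word is counted as part of that word.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the Speech SDK.
public static class WordTiming
{
    // 1 tick = 100 ns, so 10,000 ticks = 1 ms.
    // Adding 5,000 before the integer division rounds to the nearest millisecond.
    public static ulong TicksToMilliseconds(ulong ticks) => (ticks + 5000) / 10000;

    // Approximates each word's duration as the distance to the next word's offset.
    // The last word runs until totalAudioTicks (assumed to be the full audio length in ticks).
    public static List<ulong> ApproximateDurationsMs(IReadOnlyList<ulong> offsetsInTicks, ulong totalAudioTicks)
    {
        var durations = new List<ulong>(offsetsInTicks.Count);
        for (int i = 0; i < offsetsInTicks.Count; i++)
        {
            ulong start = offsetsInTicks[i];
            ulong end = i + 1 < offsetsInTicks.Count ? offsetsInTicks[i + 1] : totalAudioTicks;
            // Guard against unsigned underflow if the offsets are not ascending.
            ulong span = end > start ? end - start : 0;
            durations.Add(TicksToMilliseconds(span));
        }
        return durations;
    }
}
```

In the function above, the handler would append e.AudioOffset to a list instead of only printing it, and the list would be passed to ApproximateDurationsMs after SpeakTextAsync completes.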

Thanks,
Samir

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. Yulin Li 6 Reputation points Microsoft Employee
    2022-05-19T10:09:31.507+00:00

    Hi @Samir, I checked your code and it looks good.

    Could you share your SDK log with us? (https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-logging)

    If you cannot upload a file here, you can open an issue in our GitHub sample repo.
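
    For anyone collecting the log requested above, file logging in the C# SDK is turned on by setting a property on the SpeechConfig before the synthesizer is created. The following is a minimal configuration sketch, not a definitive setup: the class SpeechLogging, the method CreateConfigWithLogging, and the file name speech-sdk.log are hypothetical, key and region are placeholders, and it assumes the temp directory is writable in the hosting environment.

    ```csharp
    using System.IO;
    using Microsoft.CognitiveServices.Speech;

    // Hypothetical helper for illustration only.
    public static class SpeechLogging
    {
        public static SpeechConfig CreateConfigWithLogging(string key, string region)
        {
            var config = SpeechConfig.FromSubscription(key, region);

            // Assumption: the temp directory is writable where the function runs.
            string logPath = Path.Combine(Path.GetTempPath(), "speech-sdk.log");

            // Asks the SDK to write its diagnostic log to the given file.
            config.SetProperty(PropertyId.Speech_LogFilename, logPath);
            return config;
        }
    }
    ```

    The returned config would then be passed to SynthesisWordBoundaryEventAsync in place of the original one.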

    1 person found this answer helpful.
