Reduce latency in text to speech with the Microsoft Speech SDK

Nas 0 Reputation points
2024-03-31T23:45:40.74+00:00

I am using microsoft-cognitiveservices-speech-sdk in a React codebase. What is the best way to reduce latency and get output as fast as possible?

When I give it text, it takes too many seconds before the output starts playing, and I need a way to make this more real-time. Is there a way to start playing the sound while the synthesis is still in progress, instead of waiting for the entire text to be synthesized?

  
   speechSynthesizer.synthesizing = () => {
     // Start playing audio
   };

Right now I play the sound with the sample code below. It works, but it takes a long time to start speaking:

 speechSynthesizerRef.current.speakTextAsync(
   text,
   (result) => {
     audioContext.current.decodeAudioData(result.audioData, (buffer) => {
       if (result.reason === ResultReason.SynthesizingAudioCompleted) {
         const newBufferSource = audioContext.current.createBufferSource();
         newBufferSource.connect(gainNode);
         gainNode.connect(audioContext.current.destination);

         newBufferSource.buffer = buffer;
         newBufferSource.start(0);
       }
     });
   },
   (err) => {
     console.error('Speech synthesis error:', err);
   }
 );
Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. navba-MSFT 20,810 Reputation points Microsoft Employee
    2024-04-01T04:52:41.5966667+00:00

    @Nas Welcome to the Microsoft Q&A Forum, and thank you for posting your query here!

    Please note, Microsoft does not publish any SLA for latency. Latency is a combination of many factors, including your network and client performance, especially when accessing lesser-used voices in text-to-speech.

    Suggestions:

    • Try the most recent version of the SDK and check whether you still encounter the same issue.
    • Measure the latency: the Speech SDK provides properties for this. SpeechServiceResponse_SynthesisFirstByteLatencyMs reports the delay between the start of the synthesis task and receipt of the first chunk of audio data. Similarly, SpeechServiceResponse_SynthesisFinishLatencyMs reports the delay between the start of the synthesis task and receipt of the whole synthesized audio data.
    // speakTextAsync in the JavaScript SDK reports its result through callbacks
    synthesizer.speakTextAsync(text, (result) => {
      console.log(`first byte latency: ${result.properties.getProperty(PropertyId.SpeechServiceResponse_SynthesisFirstByteLatencyMs)} ms`);
      console.log(`finish latency: ${result.properties.getProperty(PropertyId.SpeechServiceResponse_SynthesisFinishLatencyMs)} ms`);
    }, (err) => console.error(err));
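If those properties come back empty in your SDK version, the same two numbers can be approximated on the client. The following is only a sketch: `createLatencyTracker` is a hypothetical helper, not part of the Speech SDK, and it assumes the `synthesizing` event fires once per received audio chunk, as the snippet in the question suggests.

```javascript
// Client-side latency tracker (hypothetical helper, not part of the Speech SDK).
// `now` is injectable so the logic can be exercised without a real clock.
function createLatencyTracker(now = () => Date.now()) {
  let startedAt = null;
  let firstChunkAt = null;
  let finishedAt = null;

  return {
    // Call right before speakTextAsync / speakSsmlAsync.
    start() {
      startedAt = now();
      firstChunkAt = null;
      finishedAt = null;
    },
    // Call from the `synthesizing` handler; only the first call is recorded.
    chunk() {
      if (startedAt !== null && firstChunkAt === null) firstChunkAt = now();
    },
    // Call from the speakTextAsync success callback.
    finish() {
      if (startedAt !== null) finishedAt = now();
    },
    // Elapsed milliseconds, or null for stages that have not happened yet.
    report() {
      return {
        firstChunkMs: firstChunkAt === null ? null : firstChunkAt - startedAt,
        finishMs: finishedAt === null ? null : finishedAt - startedAt,
      };
    },
  };
}

// Wiring sketch (assumes `synthesizer` is an existing SpeechSynthesizer):
//   const tracker = createLatencyTracker(() => performance.now());
//   synthesizer.synthesizing = () => tracker.chunk();
//   tracker.start();
//   synthesizer.speakTextAsync(text, () => {
//     tracker.finish();
//     console.log(tracker.report());
//   });
```

A large gap between `firstChunkMs` and `finishMs` suggests that most of the wait comes from playing only after the full result arrives, rather than from the service itself.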
    
    

    The Speech SDK documentation lists best practices for lowering synthesis latency and getting the best performance for your end users. Please follow the recommendations available here:

    https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-lower-speech-synthesis-latency?pivots=programming-language-csharp
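As a rough illustration of the streaming idea raised in the question (start playback while synthesis is still running), the sketch below schedules each incoming chunk on a Web Audio context as it arrives. It rests on several assumptions that you should verify in your setup: the output format is raw 16-bit mono PCM at 24 kHz (so chunks carry no header), the `synthesizing` event delivers each chunk in `event.result.audioData`, and the synthesizer was created without its own speaker output so the audio is not played twice. `pcm16ToFloat32` and `createChunkPlayer` are hypothetical helper names.

```javascript
// Convert little-endian 16-bit PCM bytes to Float32 samples in [-1, 1).
function pcm16ToFloat32(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(bytes.byteLength >> 1);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}

// Schedules PCM chunks back to back on a Web Audio context.
// sampleRate must match the synthesizer's output format (assumed 24000 here).
function createChunkPlayer(audioContext, sampleRate = 24000) {
  let nextStartTime = 0;            // when the next chunk should begin
  let leftover = new Uint8Array(0); // odd trailing byte from the previous chunk

  return {
    enqueue(arrayBuffer) {
      // Prepend any dangling byte so samples stay aligned across chunks.
      const incoming = new Uint8Array(arrayBuffer);
      const bytes = new Uint8Array(leftover.length + incoming.length);
      bytes.set(leftover, 0);
      bytes.set(incoming, leftover.length);

      const usable = bytes.length - (bytes.length % 2);
      leftover = bytes.slice(usable);
      if (usable === 0) return;

      const samples = pcm16ToFloat32(bytes.subarray(0, usable));
      const buffer = audioContext.createBuffer(1, samples.length, sampleRate);
      buffer.getChannelData(0).set(samples);

      const source = audioContext.createBufferSource();
      source.buffer = buffer;
      source.connect(audioContext.destination);

      // Play the first chunk immediately, then queue the rest without gaps.
      const startAt = Math.max(nextStartTime, audioContext.currentTime);
      source.start(startAt);
      nextStartTime = startAt + buffer.duration;
    },
  };
}

// Wiring sketch (assumes `sdk`, `speechConfig`, `synthesizer`, and the
// question's `audioContext` ref already exist):
//   speechConfig.speechSynthesisOutputFormat =
//     sdk.SpeechSynthesisOutputFormat.Raw24Khz16BitMonoPcm;
//   const player = createChunkPlayer(audioContext.current);
//   synthesizer.synthesizing = (_sender, event) =>
//     player.enqueue(event.result.audioData);
```

With this approach the time to first sound depends on the first chunk rather than on the full synthesis, at the cost of handling raw PCM yourself instead of calling decodeAudioData on the complete result.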

    If the above suggestions don't help, you can enable JS SDK logging as shown below:

    sdk.Diagnostics.SetLoggingLevel(sdk.LogLevel.Debug);
    sdk.Diagnostics.SetLogOutputPath("LogfilePathAndName");
    

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.
