Reducing Latency in Azure Communication Services Speech Recognition

Question

Ashley 60 Reputation points

I am testing the Azure Communication Services Call Automation OpenAI sample locally for a similar use case. The initial message is promptly played when the call is connected. However, I'm experiencing a latency of 7 seconds from when speech is input during a call to when it is recognized in the RecognizeCompleted event. What steps can be taken to reduce this latency?

Siva Nair 2,420 Reputation points Microsoft External Staff Moderator

2025-06-16T22:01:37.79+00:00

Hi Ashley,

Just checking back to see if the above response was helpful. If you have any questions or concerns, please feel free to post back.

Thanks
Ashley 60 Reputation points

2025-06-17T02:42:52.6266667+00:00

I'll test and get back. Thanks!!
Sampath 3,750 Reputation points Microsoft External Staff Moderator

2025-06-17T10:29:09.3733333+00:00

Hi Ashley, Thank you for the follow-up and update. Please get back to us. Thank you.
Ashley 60 Reputation points

2025-06-17T12:01:03.9833333+00:00

Hi @Bhargavi Naragani, can you please share some more details on:
"To avoid cold-start latency, trigger a dummy recognition request (even silent) at the beginning of the session. This primes the speech model."
I tried implementing the configurations for CallMediaRecognizeSpeechOptions that you suggested in your answer, but the Microsoft.Communication.RecognizeFailed event is triggered with the following reason: {"code":400,"subCode":8510,"message":"Action failed, initial silence timeout reached."}
After this event, nothing else is recognized. Passing empty text for playback throws an exception during playback. Additionally, I do not want a custom message to be played every time the RecognizeFailed event occurs.
Kindly check.
Sampath 3,750 Reputation points Microsoft External Staff Moderator

2025-06-19T15:26:45.52+00:00

Hi Ashley,

Just checking back to see if the above response was helpful. If you have any questions or concerns, please feel free to post back.

Thanks

Answer 1

Hi @Ashley,

The 7-second latency is likely caused by a few configurable factors: silence timeouts, prompt playback interrupt handling, and speech model initialization. Here’s what you can do to improve speech recognition response time:

In Java, using CallMediaRecognizeSpeechOptions:
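```
import com.azure.communication.callautomation.models.CallMediaRecognizeSpeechOptions;
import java.time.Duration;

// The second constructor argument is the end silence timeout.
CallMediaRecognizeSpeechOptions options =
    new CallMediaRecognizeSpeechOptions(targetParticipant, Duration.ofMillis(500))
        .setInitialSilenceTimeout(Duration.ofSeconds(2))
        .setInterruptPrompt(true)
        .setSpeechLanguage("en-US");
```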
- initialSilenceTimeout => how long to wait for the caller to start speaking before the recognize action fails (e.g. 2 seconds).
- endSilenceTimeout (passed in the constructor) => how long to wait after the caller stops speaking before the result is finalized (e.g. 500 ms). https://learn.microsoft.com/en-us/javascript/api/%40azure/communication-call-automation/callmediarecognizespeechoptions?view=azure-node-latest
Call setInterruptPrompt(true) so that recognition starts as soon as the caller speaks over the prompt, instead of waiting for the prompt to finish playing.
For faster and more accurate recognition:
```
   options.setSpeechRecognitionModelEndpointId("<your-custom-model-endpoint>");
```
- This uses a custom or pre-tuned speech model. https://learn.microsoft.com/en-us/azure/communication-services/how-tos/call-automation/recognize-action?pivots=programming-language-javascript
To avoid cold-start latency, trigger a dummy recognition request (even silent) at the beginning of the session. This primes the speech model.
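
Before and after changing these settings, it can help to measure the delay so you can tell which change made a difference. The following is a minimal sketch using only the Java standard library, not part of the Azure SDK. The class name RecognizeLatencyTracker and its methods are hypothetical, and it assumes you set an operation context on each recognize request and read the same value back from the completion event. Note that it times the whole round trip from issuing the request to receiving the event, which includes the time the caller spends speaking.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical helper: times the gap between issuing a recognize request and its result event. */
final class RecognizeLatencyTracker {
    private final Map<String, Long> startedAtNanos = new ConcurrentHashMap<>();
    private final LongSupplier nanoClock;

    /** The clock is injectable so the tracker can be exercised without real waiting. */
    RecognizeLatencyTracker(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    RecognizeLatencyTracker() {
        this(System::nanoTime);
    }

    /** Call right before starting recognition, with the operation context of that request. */
    void markStarted(String operationContext) {
        startedAtNanos.put(operationContext, nanoClock.getAsLong());
    }

    /**
     * Call from the RecognizeCompleted (or RecognizeFailed) handler.
     * Returns the elapsed time, or empty if no start was recorded for this context.
     */
    Optional<Duration> markFinished(String operationContext) {
        Long start = startedAtNanos.remove(operationContext);
        if (start == null) {
            return Optional.empty();
        }
        return Optional.of(Duration.ofNanos(nanoClock.getAsLong() - start));
    }
}
```

Logging the returned duration for each turn gives a baseline to compare against after adjusting endSilenceTimeout or enabling setInterruptPrompt(true).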

Reference:
https://learn.microsoft.com/en-us/azure/communication-services/concepts/call-automation/recognize-ai-action
https://learn.microsoft.com/en-us/java/api/com.azure.communication.callautomation.models.callmediarecognizespeechoptions?view=azure-java-stable
https://learn.microsoft.com/en-us/java/api/overview/azure/communication-callautomation-readme?view=azure-java-stable

Hope this information is helpful. If you have any further concerns or queries, please feel free to reach out to us.

Answer 2

Hi @Ashley,
The RecognizeFailed with subCode 8510 indicates that the initial silence timeout was reached before any speech was detected. Once this happens, the recognizer session ends and won’t listen again unless you explicitly restart it. That’s why nothing else is recognized afterward.

Here’s how you can work around this while still priming the speech model:

Option 1: Use a short, non-silent prompt with interruptPrompt=true

Instead of a silent or empty prompt (which throws), use a very short audio clip (e.g., “Hello” or a 100 ms tone) and set:

```
.setInterruptPrompt(true)
.setInitialSilenceTimeout(Duration.ofSeconds(1))
```

This primes the recognizer and allows the user to interrupt immediately. It avoids the RecognizeFailed event while still warming up the model.

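Option 2: Catch and recover from RecognizeFailed gracefully

If you still want to send a dummy recognition request without a prompt, wrap it in a try-catch and immediately follow it with a real recognition request. The try-catch only covers exceptions thrown while submitting the request; the silence timeout itself is reported later through the RecognizeFailed event, so handle that event quietly in your event handler as well.

```
// Dummy recognize to warm up
try {
    startDummyRecognition(); // expect it to fail silently
} catch (Exception e) {
    // Log and ignore
}

// Immediately start real recognition
startActualRecognition();
```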

This avoids user-facing errors and keeps the recognizer active.
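
Because the recognizer stops listening after subCode 8510 until it is restarted, one way to recover without playing a message is to restart recognition from the RecognizeFailed handler, with a cap so that a silent line does not loop forever. The following is a minimal sketch of that decision logic using only the Java standard library. The class SilenceTimeoutRecovery and the restartRecognition hook are hypothetical; in a real application the hook would issue your actual recognize request, and the sub code would come from the event you receive.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: decides whether to restart recognition after a RecognizeFailed event. */
final class SilenceTimeoutRecovery {
    /** Sub code reported when the initial silence timeout is reached (from the error in this thread). */
    static final int INITIAL_SILENCE_TIMEOUT_SUBCODE = 8510;

    private final int maxRestarts;
    private final Runnable restartRecognition;
    private final AtomicInteger restarts = new AtomicInteger();

    SilenceTimeoutRecovery(int maxRestarts, Runnable restartRecognition) {
        this.maxRestarts = maxRestarts;
        this.restartRecognition = restartRecognition;
    }

    /**
     * Call from the RecognizeFailed handler with the event's sub code.
     * Restarts recognition only for the initial silence timeout, and only
     * while the consecutive restart count is within the cap.
     * Returns true if recognition was restarted.
     */
    boolean onRecognizeFailed(int subCode) {
        if (subCode != INITIAL_SILENCE_TIMEOUT_SUBCODE) {
            return false; // other failures are left to the caller to handle
        }
        if (restarts.incrementAndGet() > maxRestarts) {
            return false; // cap reached; stop retrying
        }
        restartRecognition.run();
        return true;
    }

    /** Call from the RecognizeCompleted handler so the cap applies per run of consecutive timeouts. */
    void onRecognizeCompleted() {
        restarts.set(0);
    }
}
```

When onRecognizeFailed returns false, the application can decide what to do next (for example, end the call or prompt once), which keeps a custom message from being played on every timeout.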

Option 3: Use a custom speech endpoint

If latency is critical, consider using a custom speech model hosted in your region. These models often initialize faster and are more responsive to domain-specific vocabulary.
