Hi @Ashley,
The 7-second latency is most likely caused by a few configurable factors: silence timeouts, prompt interrupt handling, and speech model initialization. Here is what you can do to improve speech recognition response time:
- In Java, configure these settings through CallMediaRecognizeSpeechOptions:
    CallMediaRecognizeSpeechOptions options =
        new CallMediaRecognizeSpeechOptions(targetParticipant, Duration.ofMillis(500))
            .setInitialSilenceTimeout(Duration.ofSeconds(2))
            .setInterruptPrompt(true)
            .setSpeechLanguage("en-US");
  - InitialSilenceTimeout: how long the service waits for the caller to start speaking (e.g. 2 seconds).
  - endSilenceTimeout (passed in the constructor): how long the service waits after the caller stops speaking before returning the result (e.g. 500 ms). https://learn.microsoft.com/en-us/javascript/api/%40azure/communication-call-automation/callmediarecognizespeechoptions?view=azure-node-latest
- Set setInterruptPrompt(true) so that recognition starts as soon as the caller speaks over the prompt, instead of waiting for the complete prompt playback.
- For faster and more accurate recognition, point the options at a custom or pre-tuned speech model:
    options.setSpeechRecognitionModelEndpointId("<your-custom-model-endpoint>");
  https://learn.microsoft.com/en-us/azure/communication-services/how-tos/call-automation/recognize-action?pivots=programming-language-javascript
- To avoid cold-start latency, trigger a dummy recognition request (even a silent one) at the beginning of the session. This primes the speech model.
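To get a feel for how these settings add up, here is a rough sketch of a latency budget in plain Java. It is a simplified model and not SDK behavior: the class RecognizeLatencyBudget, its method names, and all the numbers are hypothetical, and it assumes the perceived delay is roughly the prompt time the caller has to sit through, plus the end-silence timeout, plus some fixed processing overhead.

```java
import java.time.Duration;

/**
 * Simplified latency model (an assumption for illustration, not SDK behavior).
 * It does not call Azure Communication Services; it only adds up durations.
 */
final class RecognizeLatencyBudget {

    /** Time the caller waits before the service starts listening to them. */
    static Duration promptWait(Duration promptRemaining, boolean interruptPrompt) {
        // With interruptPrompt enabled, speech over the prompt is assumed to be picked up right away.
        return interruptPrompt ? Duration.ZERO : promptRemaining;
    }

    /** Estimated delay between the caller being ready to answer and the result arriving. */
    static Duration perceivedDelay(Duration promptRemaining, boolean interruptPrompt,
                                   Duration endSilenceTimeout, Duration processingOverhead) {
        return promptWait(promptRemaining, interruptPrompt)
                .plus(endSilenceTimeout)
                .plus(processingOverhead);
    }

    public static void main(String[] args) {
        Duration promptRemaining = Duration.ofSeconds(3); // hypothetical value
        Duration overhead = Duration.ofMillis(300);       // hypothetical value

        // No barge-in and a long end-silence timeout.
        Duration before = perceivedDelay(promptRemaining, false, Duration.ofSeconds(2), overhead);
        // Barge-in enabled and a 500 ms end-silence timeout, as suggested above.
        Duration after = perceivedDelay(promptRemaining, true, Duration.ofMillis(500), overhead);

        System.out.println("Before tuning: " + before.toMillis() + " ms"); // prints 5300 ms
        System.out.println("After tuning:  " + after.toMillis() + " ms");  // prints 800 ms
    }
}
```

Logging timestamps around your own recognize call and plugging the measured values into a model like this can help show which setting contributes most in your case.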
Reference:
https://learn.microsoft.com/en-us/azure/communication-services/concepts/call-automation/recognize-ai-action
https://learn.microsoft.com/en-us/java/api/com.azure.communication.callautomation.models.callmediarecognizespeechoptions?view=azure-java-stable
https://learn.microsoft.com/en-us/java/api/overview/azure/communication-callautomation-readme?view=azure-java-stable
Hope this information is helpful. If you have any further concerns or queries, please feel free to reach out to us.