Reducing Latency in Azure Communication Services Speech Recognition

Ashley 60 Reputation points
2025-06-16T14:26:28.1066667+00:00

I am testing the Azure Communication Services Call Automation OpenAI sample locally for a similar use case. The initial message is promptly played when the call is connected. However, I'm experiencing a latency of 7 seconds from when speech is input during a call to when it is recognized in the RecognizeCompleted event. What steps can be taken to reduce this latency?

Azure Communication Services
An Azure communication platform for deploying applications across devices and platforms.

2 answers

  1. Bhargavi Naragani 5,350 Reputation points Microsoft External Staff Moderator
    2025-06-16T16:18:45.0366667+00:00

    Hi @Ashley,

    A 7-second delay is most likely caused by a few configurable factors: silence timeouts, prompt playback interrupt handling, or speech model initialization. Here’s what you can do to improve speech recognition response time:

    1. Shorten the silence timeouts. In Java, using CallMediaRecognizeSpeechOptions (the second constructor argument is the end-silence timeout):
         CallMediaRecognizeSpeechOptions options =
             new CallMediaRecognizeSpeechOptions(targetParticipant, Duration.ofMillis(500))
                 .setInitialSilenceTimeout(Duration.ofSeconds(2))
                 .setInterruptPrompt(true)
                 .setSpeechLanguage("en-US");

    2. Set setInterruptPrompt(true) so that recognition starts as soon as the caller speaks over the prompt, instead of waiting for the prompt to finish playing.
    3. For faster and more accurate recognition, point the options at a custom speech model endpoint:
         options.setSpeechRecognitionModelEndpointId("<your-custom-model-endpoint>");

    4. To avoid cold-start latency, trigger a dummy recognition request (even a silent one) at the beginning of the session. This primes the speech model.
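
To check whether each of the settings above actually shortens the delay, it can help to time the gap between issuing the recognize request and receiving RecognizeCompleted. The following is a minimal sketch and not part of the Azure SDK: RecognizeLatencyTracker and its methods are hypothetical names, and the clock is injected only so the class can be exercised without a live call. Note that the measured interval also includes the time the caller spends speaking.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical helper: times the gap between a recognize request and its completion event. */
class RecognizeLatencyTracker {
    private final LongSupplier nanoClock;                  // e.g. System::nanoTime
    private final Map<String, Long> startedAt = new HashMap<>();

    RecognizeLatencyTracker(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    /** Call right after issuing the recognize request for this call connection. */
    void markStarted(String callConnectionId) {
        startedAt.put(callConnectionId, nanoClock.getAsLong());
    }

    /** Call from the RecognizeCompleted handler; returns null if no start was recorded. */
    Duration markCompleted(String callConnectionId) {
        Long start = startedAt.remove(callConnectionId);
        return start == null ? null : Duration.ofNanos(nanoClock.getAsLong() - start);
    }
}
```

In an application this would be constructed as new RecognizeLatencyTracker(System::nanoTime), with the returned Duration logged alongside the timeout values in use, so that runs with different settings can be compared.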

    Reference:
    https://learn.microsoft.com/en-us/azure/communication-services/concepts/call-automation/recognize-ai-action
    https://learn.microsoft.com/en-us/java/api/com.azure.communication.callautomation.models.callmediarecognizespeechoptions?view=azure-java-stable
    https://learn.microsoft.com/en-us/java/api/overview/azure/communication-callautomation-readme?view=azure-java-stable

    I hope this information is helpful. If you have any further concerns or questions, please feel free to reach out to us.


  2. Sampath 3,750 Reputation points Microsoft External Staff Moderator
    2025-06-18T13:07:00.78+00:00

    Hi @Ashley,
    A RecognizeFailed event with subCode 8510 indicates that the initial silence timeout was reached before any speech was detected. Once this happens, the recognizer session ends and won’t listen again unless you explicitly restart it. That’s why nothing else is recognized afterward.
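
Because the session ends after an initial-silence timeout, one way to keep listening is to restart recognition from the RecognizeFailed handler. The following is a minimal sketch rather than SDK code: RecognitionStarter and RecognizeRestartPolicy are hypothetical names, and startRecognition() stands in for your own code that issues the recognize request. The restart cap is an assumption, added so that a silent line does not loop forever.

```java
/** Hypothetical stand-in for the application code that issues a recognize request. */
interface RecognitionStarter {
    void startRecognition();
}

/** Restarts recognition after an initial-silence timeout, up to a fixed number of times. */
class RecognizeRestartPolicy {
    // Sub-code quoted in this answer for "initial silence timeout reached".
    static final int INITIAL_SILENCE_TIMEOUT_SUBCODE = 8510;

    private final RecognitionStarter starter;
    private final int maxRestarts;
    private int restarts = 0;

    RecognizeRestartPolicy(RecognitionStarter starter, int maxRestarts) {
        this.starter = starter;
        this.maxRestarts = maxRestarts;
    }

    /** Call from the RecognizeFailed handler; returns true if a restart was issued. */
    boolean onRecognizeFailed(int subCode) {
        if (subCode != INITIAL_SILENCE_TIMEOUT_SUBCODE || restarts >= maxRestarts) {
            return false; // different failure, or restart budget used up
        }
        restarts++;
        starter.startRecognition();
        return true;
    }

    /** Call from the RecognizeCompleted handler so the budget resets for the next turn. */
    void onRecognizeCompleted() {
        restarts = 0;
    }
}
```

When onRecognizeFailed returns false, the application would fall back to whatever it normally does on failure, such as playing a goodbye prompt or hanging up.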

    Here’s how you can work around this while still priming the speech model:

    Option 1: Use a short, non-silent prompt with interruptPrompt=true

    Instead of a silent or empty prompt (which throws), use a very short audio clip (e.g., “Hello” or a 100ms tone) and set:

        .setInterruptPrompt(true)
        .setInitialSilenceTimeout(Duration.ofSeconds(1))

    This primes the recognizer and allows the user to interrupt immediately. It avoids the RecognizeFailed event while still warming up the model.

    Option 2: Catch and recover from RecognizeFailed gracefully

    If you still want to send a dummy recognition request without a prompt, wrap it in a try-catch and immediately follow it with a real recognition request:

        // Dummy recognize to warm up
        try {
            startDummyRecognition(); // expect it to fail silently
        } catch (Exception e) {
            // Log and ignore
        }

        // Immediately start real recognition
        startActualRecognition();

    This avoids user-facing errors and keeps the recognizer active.
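
If the warm-up outcome reaches your application as a callback event rather than as a thrown exception (the RecognizeFailed mentioned above is reported as an event), the same idea can be expressed as a small sequencer that waits for the warm-up to finish before starting the real request. This is a sketch under assumptions: RecognizeClient and WarmUpSequencer are hypothetical names, and it assumes each request is tagged with an operation context string that is echoed back on its events, so warm-up results can be told apart from real ones.

```java
/** Hypothetical wrapper around the code that issues a recognize request with a context tag. */
interface RecognizeClient {
    void startRecognize(String operationContext);
}

/** Issues a warm-up recognition, then the real one once the warm-up has ended. */
class WarmUpSequencer {
    static final String WARM_UP = "warm-up"; // illustrative context labels
    static final String REAL = "real";

    private final RecognizeClient client;
    private boolean realStarted = false;

    WarmUpSequencer(RecognizeClient client) {
        this.client = client;
    }

    /** Call once when the call is connected. */
    void onCallConnected() {
        client.startRecognize(WARM_UP);
    }

    /**
     * Call for both RecognizeCompleted and RecognizeFailed events.
     * Returns true when the event belongs to the warm-up and should be ignored by the call flow.
     */
    boolean onRecognizeEnded(String operationContext) {
        if (!WARM_UP.equals(operationContext)) {
            return false; // a real result; let normal handling proceed
        }
        if (!realStarted) {
            realStarted = true;
            client.startRecognize(REAL);
        }
        return true;
    }
}
```

Compared with the try-catch version, this variation does not start the real recognition until the warm-up has reported back, which avoids having two recognize requests in flight on the same call.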

    Option 3: Use a custom speech endpoint

    If latency is critical, consider using a custom speech model hosted in your region. These models often initialize faster and are more responsive to domain-specific vocabulary.

