30-second timeout on Azure Speech to Text

Nandhu TS 0 Reputation points
2025-06-19T06:33:44.1833333+00:00

Hello,

I'm experiencing an issue with Azure Speech-to-Text where, in continuous recognition mode, it outputs a RECOGNIZED result every 30 seconds, regardless of whether speech has stopped. Adjusting settings like Speech_SegmentationSilenceTimeoutMs has not resolved the problem.

I found a few posts describing a similar issue, but no solutions were provided. I'm using an S0 Speech resource and there is no concurrency, as no one else is using it. I'm using the Node.js SDK and, for now, a standalone script to test this scenario. Can anyone help me figure out what is causing this?
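
For context, a minimal Node.js script that reproduces this setup (continuous recognition with the microsoft-cognitiveservices-speech-sdk package and Speech_SegmentationSilenceTimeoutMs set on the config) looks roughly like the sketch below; the key, region, and audio file are placeholders:

```javascript
// Minimal repro sketch (key, region, and file name are placeholders).
const fs = require("fs");
const sdk = require("microsoft-cognitiveservices-speech-sdk");

const speechConfig = sdk.SpeechConfig.fromSubscription("<SPEECH_KEY>", "<SPEECH_REGION>");
speechConfig.speechRecognitionLanguage = "en-US";

// Attempted tweak that does not change the 30-second behaviour (example value).
speechConfig.setProperty(sdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "1000");

// Stream a WAV file in as if it were live audio.
const pushStream = sdk.AudioInputStream.createPushStream();
fs.createReadStream("test.wav")
  .on("data", (chunk) => pushStream.write(chunk.slice()))
  .on("end", () => pushStream.close());

const audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);
const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

recognizer.recognized = (_s, e) => {
  // This fires roughly every 30 seconds during continuous speech,
  // regardless of the segmentation timeout set above.
  console.log("RECOGNIZED:", e.result.text);
};

recognizer.startContinuousRecognitionAsync();
```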

Thanks and Regards,

Nandhu TS

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. Ravada Shivaprasad 535 Reputation points Microsoft External Staff Moderator
    2025-06-19T23:22:11.17+00:00

    Hi Nandhu TS

    You're encountering a known issue with Azure Speech-to-Text in continuous recognition mode where the service emits a RECOGNIZED result roughly every 30 seconds, even when no speech is present. Other users on Microsoft Q&A have reported the same symptoms with the same setup: the Node.js SDK against an S0 Speech resource, no concurrent usage, and attempts to adjust Speech_SegmentationSilenceTimeoutMs proving ineffective. Those reports confirm that the issue persists across similar setups and that no documented solution exists so far.

    The underlying cause appears to be tied to how the SDK handles silence and segmentation in continuous recognition mode. While the Speech_SegmentationSilenceTimeoutMs setting is intended to control silence-based segmentation, the SDK may still enforce a default periodic segmentation interval, possibly for responsiveness or internal buffering reasons. This behavior is not fully documented, and similar challenges have been noted in other SDKs and languages, such as Python and C#, where users struggle to implement continuous recognition without unexpected segmentation. For example, a Stack Overflow thread, "Azure speech-to-text continuous recognition", discusses the lack of clarity around implementing continuous recognition across SDKs.
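
    As a diagnostic step while this remains undocumented, you could log the offset, duration, and reason of each RECOGNIZED result to confirm the roughly 30-second cadence, alongside the segmentation timeout you already tried. A rough sketch with the Node.js SDK follows (the property value and file name are examples only, not a confirmed fix):

    ```javascript
    const fs = require("fs");
    const sdk = require("microsoft-cognitiveservices-speech-sdk");

    const speechConfig = sdk.SpeechConfig.fromSubscription("<SPEECH_KEY>", "<SPEECH_REGION>");
    speechConfig.speechRecognitionLanguage = "en-US";

    // Segmentation timeout you already experimented with (example value).
    speechConfig.setProperty(sdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "800");

    const audioConfig = sdk.AudioConfig.fromWavFileInput(fs.readFileSync("test.wav"));
    const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

    recognizer.recognized = (_s, e) => {
      // offset and duration are reported in 100-nanosecond ticks; logging them
      // makes the segmentation cadence visible (segments capped near 30 s).
      console.log(
        `RECOGNIZED reason=${sdk.ResultReason[e.result.reason]}`,
        `offset=${e.result.offset / 10000000}s`,
        `duration=${e.result.duration / 10000000}s`,
        e.result.text
      );
    };

    recognizer.canceled = (_s, e) => console.log("CANCELED:", e.errorDetails);
    recognizer.sessionStopped = () => recognizer.stopContinuousRecognitionAsync();

    recognizer.startContinuousRecognitionAsync();
    ```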

    Additionally, latency and segmentation issues have been raised in GitHub issues for the Azure-Samples Speech SDK, particularly in streaming scenarios using recognize_once, which may share internal mechanisms with continuous recognition; see the GitHub issue "High latency in speech to text".

    Hope it helps!

    Thanks

