Hi Nandhu TS,
You're encountering a known issue with Azure Speech-to-Text in continuous recognition mode, where the service emits a RECOGNIZED result every 30 seconds even when no speech is present. This behavior has been reported by other users, including a recent post on Microsoft Q&A that describes the same symptoms: the Node.js SDK with an S0 speech service, no concurrency, and attempts to adjust Speech_SegmentationSilenceTimeoutMs proving ineffective. That post confirms the issue persists across similar setups and has no documented solution so far.
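For reference, the sketch below shows roughly where that property sits in a Node.js setup. The key, region, and push-stream source are placeholders for your own configuration, and, as noted above, setting this property alone has not resolved the periodic results for the users who reported the issue.

```javascript
// Minimal sketch (Node.js, microsoft-cognitiveservices-speech-sdk): where the
// segmentation silence timeout is normally applied. Key, region, and the push
// stream source below are placeholders.
const sdk = require("microsoft-cognitiveservices-speech-sdk");

const speechConfig = sdk.SpeechConfig.fromSubscription("<your-key>", "<your-region>");
speechConfig.speechRecognitionLanguage = "en-US";

// Lengthen the silence window before the service closes a segment.
// As noted above, this property alone has not reliably suppressed the
// periodic RECOGNIZED events in the reported setups.
speechConfig.setProperty(sdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "2000");

// Feed audio through a push stream (typical for server-side streaming scenarios).
const pushStream = sdk.AudioInputStream.createPushStream();
const audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);
const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
```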
The underlying cause appears to be tied to how the SDK handles silence and segmentation in continuous recognition mode. While the Speech_SegmentationSilenceTimeoutMs setting is intended to control silence-based segmentation, the SDK may still enforce a default periodic segmentation interval, possibly for responsiveness or internal buffering reasons. This behavior is not fully documented, and similar challenges have been noted in other SDKs and languages, such as Python and C#, where users struggle to implement continuous recognition without unexpected segmentation. For example, a Stack Overflow thread discusses the lack of clarity around continuous recognition across SDKs: Stack Overflow - Azure speech-to-text continuous recognition
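In the meantime, one pragmatic mitigation is to filter these events on the client side rather than relying on the segmentation property: it does not stop the service from emitting them, but it keeps the empty segments out of your downstream processing. A sketch, assuming the recognizer from the snippet above:

```javascript
// Client-side filter: only act on RECOGNIZED events that actually contain speech.
// Assumes `recognizer` and `sdk` from the previous snippet.
recognizer.recognized = (sender, event) => {
  const result = event.result;

  // Periodic no-speech segments typically arrive as NoMatch or with empty text.
  if (result.reason !== sdk.ResultReason.RecognizedSpeech) {
    return;
  }
  if (!result.text || result.text.trim().length === 0) {
    return;
  }

  console.log(`Recognized: ${result.text}`);
};

recognizer.canceled = (sender, event) => {
  console.error(`Recognition canceled: ${event.errorDetails || event.reason}`);
};

recognizer.startContinuousRecognitionAsync(
  () => console.log("Continuous recognition started."),
  (err) => console.error(`Failed to start recognition: ${err}`)
);
```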
Additionally, latency and segmentation issues have been raised in GitHub issues for the Azure-Samples speech SDK, particularly in streaming scenarios using recognize_once, which may share internal mechanisms with continuous recognition: GitHub Issue - High latency in speech to text
Hope it helps!
Thanks