Azure Speech Services, including Speech-to-Text, typically bills by the duration of audio the service processes, not by how long a connection or session stays open. So if you keep a push stream open for transcription but write no audio data to it, you should not be charged for that idle time. Keep in mind that silence you do write to the stream is still audio data, so it would likely count toward the billed duration. The pricing page has the details:
https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/