@DevKya Billing for speech to text is based on audio hours. It starts as soon as the service receives audio after you call start_continuous_recognition_async() or the corresponding REST API. Speech or other meaningful audio might start later in the stream or audio file. From the service's perspective, however, billing is calculated from the full audio duration, from the start to the end of the audio stream or file. In the case above, all audio passed between start_continuous_recognition_async() and stop_continuous_recognition_async() is billed. I hope this helps!
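As a rough illustration of "duration, not speech, is what counts", the sketch below estimates audio hours from a WAV file's total length using Python's standard `wave` module. It assumes the entire file is streamed between the start and stop calls, so leading or trailing silence is included. The helper names are hypothetical. The sketch does not model the service's actual billing granularity or rounding, so check the Speech service pricing page for those details.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Total duration of a WAV payload in seconds, silence included."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def audio_hours(seconds: float) -> float:
    """Convert an audio duration in seconds to audio hours."""
    return seconds / 3600.0


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Hypothetical test input: mono 16-bit PCM WAV containing only silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    # A 90 s file counts as 90 s of audio even if speech only begins at 60 s.
    wav = make_silent_wav(90)
    secs = wav_duration_seconds(wav)
    print(secs, audio_hours(secs))
```

The silent test file makes the point directly: a file with no speech at all still has a non-zero duration, and that duration is what gets counted.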