Is Azure Speech-to-Text API Charging Based on Audio Length or Duration When No Audio Data is Sent During Pauses?

Question

Hieu Phan 0

I am using WebSocket to capture audio input from the user's microphone.

This audio data is then sent for continuous recognition (speech-to-text).

My application has the functionality to pause the microphone. When paused, since Azure Speech-to-Text doesn’t provide a pause function, I stop the recognition. After the user clicks the continue button, I restart the recognition.

However, restarting the recognition takes about 3-5 seconds, causing a delay for the user.

I am considering not stopping the recognition API. Instead, I might not send any audio data during the pause. In this case, would the API charge me based on the length of audio processed, or the duration from start to stop?
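To make the idea concrete, here is a minimal sketch of the "keep recognition running, stop forwarding audio" approach. It is not tied to any SDK: `PausableAudioGate` and `sink` are hypothetical names, and `sink` stands for whatever function hands bytes to the recognizer (for example, the write method of a push-style audio input stream). While paused, chunks are dropped rather than forwarded, so no audio reaches the service.

```python
from typing import Callable


class PausableAudioGate:
    """Forwards microphone chunks to a sink unless paused.

    `sink` is a hypothetical callable that delivers bytes to the recognizer.
    """

    def __init__(self, sink: Callable[[bytes], None]) -> None:
        self._sink = sink
        self._paused = False
        self.bytes_forwarded = 0  # audio actually handed to the recognizer
        self.bytes_dropped = 0    # audio discarded while paused

    def pause(self) -> None:
        self._paused = True

    def resume(self) -> None:
        self._paused = False

    def on_chunk(self, chunk: bytes) -> None:
        # Called for every chunk received from the microphone/WebSocket.
        if self._paused:
            self.bytes_dropped += len(chunk)
            return
        self._sink(chunk)
        self.bytes_forwarded += len(chunk)


if __name__ == "__main__":
    received = []
    gate = PausableAudioGate(received.append)
    gate.on_chunk(b"\x00" * 3200)  # forwarded
    gate.pause()
    gate.on_chunk(b"\x00" * 3200)  # dropped while paused
    gate.resume()
    gate.on_chunk(b"\x00" * 3200)  # forwarded again
    print(gate.bytes_forwarded, gate.bytes_dropped)
```

The `bytes_forwarded` counter is kept only so the amount of audio actually submitted can be compared with the session's wall-clock time.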

2 answers

Answer 1

Hello Hieu Phan,

Welcome to the Microsoft Q&A and thank you for posting your questions here.

I understand that you would like to verify whether the Azure Speech-to-Text API charges based on audio length or on session duration when no audio data is sent during pauses.

Azure Speech-to-Text API charges are based on the duration of audio the service processes, not on how long the recognition session stays open. This means that if no audio data is sent during pauses, you will not be charged for that time. Flexible pricing gives you the power and control you need: pay for only what you use, with no upfront costs. - https://azure.microsoft.com/en-gb/products/ai-services/ai-speech#Pricing

With Speech, pay as you go based on:

The number of hours of audio you transcribe or translate for speech to text and speech translation.
The number of characters you convert to audio for text to speech.
The number of transactions for Speaker Recognition.
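As a rough illustration of what "hours of audio" means for a streamed session, the sketch below converts the number of raw PCM bytes submitted into audio hours. The function names are hypothetical, and the 16 kHz, 16-bit, mono format is an assumption about the stream; the actual metering and rounding rules are defined by Azure and may differ from this estimate.

```python
def pcm_seconds(num_bytes: int, sample_rate: int = 16000,
                bits_per_sample: int = 16, channels: int = 1) -> float:
    """Duration in seconds of a raw PCM byte count (assumed format)."""
    bytes_per_second = sample_rate * (bits_per_sample // 8) * channels
    return num_bytes / bytes_per_second


def estimated_audio_hours(num_bytes: int) -> float:
    """Estimated hours of audio submitted, using the assumed PCM format."""
    return pcm_seconds(num_bytes) / 3600


if __name__ == "__main__":
    # Example: a 10-minute session in which the user was paused for 4 minutes,
    # so only 6 minutes of audio (6 * 60 * 32000 bytes) were actually sent.
    sent_bytes = 6 * 60 * 32000
    print(pcm_seconds(sent_bytes))            # seconds of audio submitted
    print(estimated_audio_hours(sent_bytes))  # hours of audio submitted
```

Under these assumptions, the estimate depends only on the bytes submitted, not on the 10 minutes the session was open.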

I hope this is helpful! Do not hesitate to let me know if you have any other questions.

Please don't forget to close the thread by upvoting this answer and accepting it if it is helpful.

Answer 2

kothapally Snigdha 3,020 Microsoft External Staff Moderator

Hello Hieu Phan,

Thanks for reaching out to the Microsoft Q&A forum.

When using Azure Speech-to-Text with the start continuous recognition method, you are charged for the entire duration of audio processed, not just the recognized speech. Azure AI Speech pricing is based on the following (please refer to the pricing page linked here):

The number of hours of audio you transcribe or translate for speech to text and speech translation.
The number of characters you convert to audio for text to speech.
The number of transactions for speaker recognition.

If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.

kothapally Snigdha 3,020 Reputation points Microsoft External Staff Moderator

2024-10-28T16:26:45.5233333+00:00

Hello Hieu Phan,

Following up to see if the above response was helpful.

Thank you!

kothapally Snigdha 3,020 Reputation points Microsoft External Staff Moderator

2024-10-29T08:50:06.6333333+00:00

Hello Hieu Phan,

We haven’t heard from you since the last response and are just checking back to see if you have a resolution yet. If you do have a resolution, please share it with the community, as it can be helpful to others.

Thank you!
