Does the Azure Speech-to-Text API Charge Based on Audio Length or Session Duration When No Audio Data Is Sent During Pauses?

Hieu Phan 0 Reputation points
2024-10-25T09:14:23.9133333+00:00

I am using a WebSocket to stream audio input from the user's microphone.

This audio data is then sent for continuous recognition (speech-to-text).

My application lets the user pause the microphone. Because Azure Speech-to-Text doesn't provide a pause function, I stop recognition when the user pauses and restart it when the user clicks the Continue button.

However, restarting recognition takes about 3-5 seconds, which causes a noticeable delay for the user.

Instead of stopping recognition, I am considering keeping it running and simply not sending any audio data during the pause. In that case, would the API charge me for the length of audio actually processed, or for the whole duration from start to stop?
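If recognition is kept running, the pause can be handled entirely on the client by not forwarding microphone chunks while paused. Below is a minimal, SDK-independent sketch, assuming the audio is 16 kHz, 16-bit mono PCM (the Speech SDK's default input format). `PauseGate` and its `sink` callable are hypothetical names; with the Speech SDK the sink might be the `write` method of a `PushAudioInputStream` that stays open across pauses. The gate also counts how many seconds of audio were actually sent, which is the quantity the question is about.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed input format: 16 kHz, 16-bit (2-byte) samples, mono PCM.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2


@dataclass
class PauseGate:
    """Forwards microphone chunks to `sink` only while not paused.

    `sink` is a hypothetical callable standing in for whatever feeds the
    open recognition session (e.g. a push stream's write method).
    """

    sink: Callable[[bytes], None]
    paused: bool = False
    bytes_forwarded: int = 0

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> None:
        self.paused = False

    def on_chunk(self, chunk: bytes) -> None:
        # While paused, drop the chunk: nothing (not even silence) is sent.
        if self.paused:
            return
        self.sink(chunk)
        self.bytes_forwarded += len(chunk)

    def seconds_forwarded(self) -> float:
        # Duration of audio actually sent, independent of wall-clock time.
        return self.bytes_forwarded / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


# Demo: 1 s of audio, a 5 s pause, then 1 s more, in 100 ms chunks.
sent: list[bytes] = []
gate = PauseGate(sink=sent.append)
chunk = bytes(3200)  # 1,600 samples x 2 bytes = 100 ms at 16 kHz
for _ in range(10):
    gate.on_chunk(chunk)
gate.pause()
for _ in range(50):
    gate.on_chunk(chunk)  # dropped while paused
gate.resume()
for _ in range(10):
    gate.on_chunk(chunk)
print(gate.seconds_forwarded())  # 2.0
```

Seven seconds of wall-clock time pass in the demo, but only two seconds of audio reach the sink.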

Azure AI Speech

2 answers

  1. Sina Salam 22,031 Reputation points Volunteer Moderator
    2024-10-25T11:52:38.1366667+00:00

    Hello Hieu Phan,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you would like to know whether the Azure Speech-to-Text API charges based on the length of audio processed or on session duration when no audio data is sent during pauses.

    Azure Speech-to-Text charges are based on the duration of audio processed, not on how long the recognition session stays open. This means that if no audio data is sent during pauses, you will not be charged for that time. As the pricing page puts it: "Flexible pricing gives you the power and control you need to pay for only what you use, with no upfront costs." (https://azure.microsoft.com/en-gb/products/ai-services/ai-speech#Pricing)

    With Speech, pay as you go based on:

    • The number of hours of audio you transcribe or translate for speech to text and speech translation.
    • The number of characters you convert to audio for text to speech.
    • The number of transactions for Speaker Recognition.
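    To make the arithmetic concrete, here is a minimal sketch, assuming cost scales linearly with the hours of audio processed, as listed above. `estimate_stt_cost` and `HYPOTHETICAL_RATE` are illustrative names; the real per-hour rate depends on region and tier and must be taken from the pricing page, and billing granularity and free-tier allowances are ignored. The example contrasts the estimate for the audio actually streamed with what a wall-clock-based charge would give.

```python
def estimate_stt_cost(audio_seconds: float, price_per_audio_hour: float) -> float:
    """Estimate speech-to-text cost from the duration of audio processed.

    Assumes cost is proportional to hours of audio processed; billing
    granularity and free-tier allowances are ignored.
    """
    if audio_seconds < 0 or price_per_audio_hour < 0:
        raise ValueError("inputs must be non-negative")
    return audio_seconds / 3600 * price_per_audio_hour


# A 60-minute session: 40 minutes of audio streamed, 20 minutes paused
# with no audio sent. Only the streamed minutes enter the estimate.
HYPOTHETICAL_RATE = 1.0  # placeholder per-hour price, NOT a real Azure rate
streamed_seconds = 40 * 60
session_seconds = 60 * 60
cost_streamed = estimate_stt_cost(streamed_seconds, HYPOTHETICAL_RATE)
cost_wall_clock = estimate_stt_cost(session_seconds, HYPOTHETICAL_RATE)
print(round(cost_streamed, 4), round(cost_wall_clock, 4))  # 0.6667 1.0
```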

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close the thread by upvoting and accepting this answer if it is helpful.


  2. kothapally Snigdha 3,020 Reputation points Microsoft External Staff Moderator
    2024-10-25T12:51:14.2466667+00:00

    Hello Hieu Phan,

    Thanks for reaching out on the Microsoft Q&A forum.

    When using Azure Speech-to-Text with continuous recognition (the start continuous recognition method), you are charged for the entire duration of audio sent to and processed by the service, including silence, not just the portions that contain recognized speech. Azure AI Speech pricing is based on the following (please refer to the Azure AI Speech pricing page):

    • The number of hours of audio you transcribe or translate for speech to text and speech translation.
    • The number of characters you convert to audio for text to speech.
    • The number of transactions for speaker recognition.

    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful?". If you have any further questions, do let us know.
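    This distinction matters for the pause design in the question: silence that is streamed is still audio processed, whereas a pause during which nothing is written adds no audio at all. A minimal sketch, assuming 16 kHz, 16-bit mono PCM input; `bytes_sent`, `pcm_duration_seconds`, and the timeline are hypothetical, and the billing interpretation follows the answers in this thread rather than an official formula.

```python
# Assumed stream format: 16 kHz, 16-bit (2-byte) samples, mono PCM.
BYTES_PER_SECOND = 16_000 * 2 * 1


def bytes_sent(timeline: list[tuple[bool, float]], stream_silence_when_paused: bool) -> int:
    """Total bytes written to the recognizer's input stream.

    `timeline` is a list of (is_paused, seconds) segments. Silence is just
    zero-valued samples, so it takes as many bytes as speech.
    """
    total = 0
    for is_paused, seconds in timeline:
        if is_paused and not stream_silence_when_paused:
            continue  # nothing is written during the pause
        total += int(seconds * BYTES_PER_SECOND)
    return total


def pcm_duration_seconds(num_bytes: int) -> float:
    """Audio duration represented by raw PCM bytes, whatever they contain."""
    return num_bytes / BYTES_PER_SECOND


# 30 s of speech, a 2-minute pause, then 45 s of speech.
timeline = [(False, 30.0), (True, 120.0), (False, 45.0)]
with_silence = pcm_duration_seconds(bytes_sent(timeline, True))
without_audio = pcm_duration_seconds(bytes_sent(timeline, False))
print(with_silence, without_audio)  # 195.0 75.0
```

    Streaming silence through the pause sends 195 seconds of audio; sending nothing sends 75.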
