Hi imgkj,
Whisper and the GPT-4o transcription models in Azure OpenAI are billed differently, so you track and calculate costs differently for each.
For Whisper:
- You are charged based on the length of your audio file.
- The cost is calculated per minute of audio. For example, if you transcribe a 10-minute audio file, you pay for 10 minutes.
- You don’t need to worry about the number of words or characters in the transcript; only the audio duration matters.
- You can check the exact pricing here: Azure AI Speech Pricing.
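As a rough illustration of the per-minute calculation above, here is a minimal Python sketch. The function name `whisper_cost` and the price used in the example are placeholders, not real Azure rates; take the actual per-minute price from the pricing page. The sketch also assumes cost scales linearly with duration and does not model any rounding Azure may apply.

```python
def whisper_cost(duration_seconds: float, price_per_minute: float) -> float:
    """Estimate the Whisper cost for one audio file.

    price_per_minute is a placeholder you must fill in from the pricing page.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    minutes = duration_seconds / 60
    return minutes * price_per_minute


# Example with a made-up price of 0.5 per minute (NOT a real rate):
# a 10-minute file is billed as 10 minutes.
estimated = whisper_cost(duration_seconds=600, price_per_minute=0.5)
print(f"Estimated Whisper cost: {estimated:.2f}")
```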
For GPT-4o Transcription:
- You are charged based on the number of tokens in the transcript.
- A token is about 4 characters of text (for English). For example, the word “hello” is one token.
- To estimate your cost, count the number of characters in your transcript and divide by 4 to get the approximate number of tokens.
- Multiply the number of tokens by the price per token. Prices are usually quoted per 1,000 or 1,000,000 tokens, so divide accordingly.
- You can check the exact pricing here: Azure OpenAI Service Pricing.
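The estimate described above can be sketched in Python as follows. This is only an approximation based on the 4-characters-per-token rule of thumb for English, so the real token count may differ. The function names and the per-million-token price in the example are assumptions for illustration, not real Azure rates.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb for English text, not an exact tokenizer


def estimate_tokens(transcript: str) -> int:
    """Approximate the token count as characters / 4, rounded up."""
    return math.ceil(len(transcript) / CHARS_PER_TOKEN)


def estimate_gpt4o_cost(transcript: str, price_per_million_tokens: float) -> float:
    """Estimate the cost from the transcript length.

    price_per_million_tokens is a placeholder; use the rate from the pricing page.
    """
    tokens = estimate_tokens(transcript)
    return tokens / 1_000_000 * price_per_million_tokens


# Example with a made-up price of 2.0 per 1,000,000 tokens (NOT a real rate).
sample = "Thanks for calling. How can I help you today?"
print(estimate_tokens(sample), "tokens (approx.)")
print(estimate_gpt4o_cost(sample, price_per_million_tokens=2.0))
```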
How to Track Costs:
For Whisper: Just keep track of the total minutes of audio you process.
For GPT-4o: Keep track of the number of characters in your transcripts, divide by 4 to estimate tokens, and multiply by the token price to calculate your cost.
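If you want to keep running totals in your own application, one possible approach is a small tracker like the sketch below. The `UsageTracker` class, its method names, and the prices in the example are hypothetical and not part of any Azure SDK; the token figure is the same characters-divided-by-4 estimate, so treat the result as a ballpark and compare it against your actual Azure bill.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates usage for both models. Prices are placeholders you supply."""
    whisper_price_per_minute: float
    gpt4o_price_per_million_tokens: float
    audio_seconds: float = 0.0   # total audio sent to Whisper
    transcript_chars: int = 0    # total characters returned by GPT-4o transcription

    def record_whisper(self, duration_seconds: float) -> None:
        self.audio_seconds += duration_seconds

    def record_gpt4o(self, transcript: str) -> None:
        self.transcript_chars += len(transcript)

    def whisper_cost(self) -> float:
        # Billed by audio duration: total minutes times the per-minute price.
        return self.audio_seconds / 60 * self.whisper_price_per_minute

    def gpt4o_cost(self) -> float:
        # Approximate tokens as characters / 4, then apply the token price.
        tokens = self.transcript_chars / 4
        return tokens / 1_000_000 * self.gpt4o_price_per_million_tokens


# Example with made-up prices (NOT real rates).
tracker = UsageTracker(whisper_price_per_minute=0.5,
                       gpt4o_price_per_million_tokens=2.0)
tracker.record_whisper(120)   # a 2-minute file
tracker.record_whisper(180)   # a 3-minute file
tracker.record_gpt4o("Hello, this is a short transcript.")
print(tracker.whisper_cost(), tracker.gpt4o_cost())
```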
For more information:
Azure OpenAI Service Pricing
Azure AI Speech Pricing
Azure OpenAI Models Overview
Hope this helps. Do let us know if you have any further queries.
If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".