Azure OpenAI Realtime API: Token usage vs Billing metrics

Question

momotaimamura-5310 20

I have a few questions regarding token usage and billing for the Azure OpenAI Realtime API.

The Realtime API’s response.done message includes fields indicating the amount of tokens used such as usage.input_token_details.text_tokens and input_token_details.cached_tokens. On the other hand, the Azure portal shows usage metrics such as processed_prompt and generated_completion. Are these values directly related or equivalent?
Which of these token counts (Realtime API usage vs Azure metrics) are used for billing? Or are there other ways to get more accurate or detailed information about token usage and associated costs? I’m trying to estimate billing based on token counts and pricing data from the official page: https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/

Alex Burlachenko 18,570 Reputation points Volunteer Moderator

2025-07-22T11:17:04.1733333+00:00
Hi momotaimamura-5310,

Thanks for posting this question. It's relevant for anyone working with the Azure OpenAI API.

First off, the Realtime API gives you token details right in its response.done message, including usage.input_token_details.text_tokens and usage.input_token_details.cached_tokens. These show how many tokens you used for the request. The Azure portal, on the other hand, shows processed_prompt and generated_completion. Are they the same thing? They are related, but not exactly equivalent.
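
For reference, here is a minimal Python sketch of pulling those fields out of a response.done message. It assumes the usage object sits under response.usage in the event, as described in the OpenAI Realtime API reference. The sample payload and the extract_usage helper are made up for illustration, so check the field names against the events your deployment actually sends.

```python
import json
from typing import Optional

# Hypothetical sample message; the numbers are invented for illustration.
SAMPLE_MESSAGE = json.dumps({
    "type": "response.done",
    "response": {
        "usage": {
            "total_tokens": 275,
            "input_tokens": 127,
            "output_tokens": 148,
            "input_token_details": {
                "cached_tokens": 64,
                "text_tokens": 119,
                "audio_tokens": 8,
            },
            "output_token_details": {"text_tokens": 36, "audio_tokens": 112},
        }
    },
})


def extract_usage(raw_message: str) -> Optional[dict]:
    """Return a flat dict of token counts, or None for other event types."""
    event = json.loads(raw_message)
    if event.get("type") != "response.done":
        return None
    # Assumption: usage is nested under "response"; missing fields count as 0.
    usage = (event.get("response") or {}).get("usage") or {}
    in_details = usage.get("input_token_details") or {}
    out_details = usage.get("output_token_details") or {}
    return {
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "input_text_tokens": in_details.get("text_tokens", 0),
        "input_audio_tokens": in_details.get("audio_tokens", 0),
        "cached_tokens": in_details.get("cached_tokens", 0),
        "output_text_tokens": out_details.get("text_tokens", 0),
        "output_audio_tokens": out_details.get("audio_tokens", 0),
    }


if __name__ == "__main__":
    print(extract_usage(SAMPLE_MESSAGE))
```

Flattening the nested details into one dict makes the counts easier to log or sum later.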

The Realtime API's token counts are what you actually used for that specific call. The Azure portal metrics are more like aggregated summaries, and they might include extra overhead or batch processing. For billing, Microsoft usually goes by what the API reports, not just the portal numbers, so trust the Realtime API's token counts more for cost estimation.

The official pricing page for Azure OpenAI Service is here: https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/ It's worth a look if you want to dig into the details.

Cached tokens can throw off your calculations. If your request hits cached data, you might see fewer tokens billed. That's a good thing, though: less money out the door.
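
To make the effect of caching concrete, here is a rough sketch of a text-only cost estimate in Python. It rests on two assumptions that are not confirmed in this thread: that cached tokens are a subset of the text input tokens, and that they are billed at a separate (lower) rate rather than dropped. The Prices class, the function name, and every price below are placeholders, not real Azure rates, so take the real numbers from the pricing page. Audio tokens would need their own rates and are left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """USD per 1M tokens. Placeholder values only, NOT real Azure prices."""
    text_input: float
    cached_text_input: float
    text_output: float


def estimate_text_cost(text_in: int, cached: int, text_out: int,
                       prices: Prices) -> float:
    """Estimate the cost in USD of one response from its text token counts.

    Assumption: `cached` is a subset of `text_in`, so only the remainder
    is charged at the full input rate.
    """
    uncached = max(text_in - cached, 0)
    total = (
        uncached * prices.text_input
        + cached * prices.cached_text_input
        + text_out * prices.text_output
    )
    return total / 1_000_000


if __name__ == "__main__":
    placeholder = Prices(text_input=4.0, cached_text_input=2.0, text_output=16.0)
    cost = estimate_text_cost(text_in=1000, cached=400, text_out=500,
                              prices=placeholder)
    print(f"estimated cost: ${cost:.4f}")
```

With the placeholder rates, 1000 input tokens (400 of them cached) and 500 output tokens come to (600 × 4 + 400 × 2 + 500 × 16) / 1,000,000 = 0.0112 USD.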

You can use the Azure Cost Management tools to track spending. They give you more granularity than the basic portal metrics, and the same approach helps with keeping an eye on cloud costs across other services too.

Set up alerts for when your token usage spikes. Azure lets you do that, so you don't get a nasty surprise when the bill arrives: https://docs.microsoft.com/en-us/azure/cost-management-billing/costs/cost-mgt-alerts-monitor-usage-spending

One more thing: if you're scripting this, grab the usage data from the API response and log it somewhere. That way you can cross-check it against the Azure billing reports later, for peace of mind.
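
A minimal sketch of that logging idea in Python, assuming you already have a dict of token counts for each response (for example, taken from the usage object of response.done). The log_usage and total_usage helpers and the JSON Lines file layout are my own choices for illustration, not anything Azure prescribes.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path


def log_usage(usage: dict, log_path: Path) -> None:
    """Append one usage record, with a UTC timestamp, as a JSON line."""
    record = {"logged_at": datetime.now(timezone.utc).isoformat(), **usage}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def total_usage(log_path: Path) -> dict:
    """Sum every integer field across all logged records."""
    totals = Counter()
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            for key, value in record.items():
                # Skip the timestamp and any other non-count fields.
                if isinstance(value, int) and not isinstance(value, bool):
                    totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    path = Path("realtime_usage.jsonl")  # hypothetical file name
    log_usage({"input_tokens": 127, "output_tokens": 148}, path)
    print(total_usage(path))
```

The totals over a given period are what you would then compare by hand against the portal metrics or the billing report for the same period.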

Always double-check the pricing tiers. The per-token cost can change depending on the model you're using. The pricing page I linked earlier has the details.

Hope this clears things up.

Best regards,

Alex

I'd also appreciate a follow on Q&A, thanks. P.S. If my answer helps you, please accept it.

https://ctrlaltdel.blog/
Pavankumar Purilla 11,480 Reputation points Microsoft External Staff Moderator

2025-07-24T06:36:44.99+00:00

Hi momotaimamura-5310,
Did you get a chance to check the response? Thank you!
Pavankumar Purilla 11,480 Reputation points Microsoft External Staff Moderator

2025-07-25T02:45:12.95+00:00

Hi momotaimamura-5310,
Just following up to see if you had a chance to review the above response. Thank you!

Answer accepted by question author

Answer 1

Hi momotaimamura-5310,

When using the Azure OpenAI Realtime API, the response.done message provides detailed token usage information, such as usage.input_token_details.text_tokens (text input tokens) and usage.input_token_details.cached_tokens (input tokens served from cache), along with usage.output_tokens (tokens generated in the response). These fields help you understand how the model processes each request.

However, for billing purposes, Azure uses a different set of metrics that are visible in the Azure portal under the Metrics section, specifically processed_prompt_tokens and generated_completion_tokens. These portal metrics represent the actual number of input and output tokens that are billed, excluding any tokens that were cached or otherwise not processed by the model. So while the Realtime API gives granular usage insights, only the metrics shown in the Azure portal are used to calculate costs. To estimate billing accurately, refer to these Azure metrics and apply the corresponding pricing from the official Azure OpenAI pricing page.
