
Azure OpenAI Response API: Input Token Counting with previous_response_id and Support for Server-Side Context Compaction

dao quan 0 Reputation points
2026-02-28T04:23:26.3133333+00:00

How can I count input tokens when using the OpenAI Response API with previous_response_id? Is there an API similar to OpenAI’s /responses/input_tokens on Azure? Additionally, I want to know whether the Azure OpenAI Response API supports server-side compaction via the context_management parameter like OpenAI does.

Azure OpenAI Service

An Azure service that provides access to OpenAI's models with enterprise capabilities.


1 answer

Sort by: Most helpful
  1. Vinodh247 41,486 Reputation points MVP Volunteer Moderator
    2026-03-01T00:56:20.3433333+00:00

Hi,

    Thanks for reaching out to Microsoft Q&A.

    In Azure OpenAI, you can chain turns using previous_response_id, and the service maintains the conversation state for you, but all prior messages still count toward input tokens for billing and limits. Azure does not provide a standalone token-estimation endpoint similar to OpenAI’s /responses/input_tokens; instead, token usage is reported only after the call in the response metadata. Likewise, Azure’s implementation of the responses API does not currently expose OpenAI’s server-side context compaction controls (such as context_management), so if you want history trimming or summarisation you must implement that logic in your application layer.
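Since the `context_management` controls are not available, history trimming has to live in your application. Below is a minimal client-side sketch: it keeps only the most recent messages that fit an input-token budget, using a rough ~4-characters-per-token heuristic (an assumption for illustration; a real tokenizer such as tiktoken gives exact counts). The function names here are hypothetical, not part of any SDK.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (~4 chars/token for English text).

    This is a heuristic, not an exact count; swap in a tokenizer
    such as tiktoken if you need precise budgeting.
    """
    return max(1, len(text) // 4)


def trim_history(messages: list[dict], budget_tokens: int) -> list[dict]:
    """Keep the most recent messages whose estimated tokens fit the budget.

    Walks the history newest-first, accumulating estimated cost, and
    stops as soon as the next (older) message would exceed the budget.
    Returns the kept messages in their original chronological order.
    """
    kept: list[dict] = []
    total = 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if total + cost > budget_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))
```

You would run `trim_history` over your stored transcript before each call instead of relying on `previous_response_id` for long conversations. For the actual billed count, read the usage metadata returned with each response (the Responses API reports `input_tokens`, `output_tokens`, and `total_tokens` in the `usage` object) after the call completes.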

    https://learn.microsoft.com/en-in/azure/foundry/openai/how-to/responses

Please 'Upvote' (Thumbs-up) and 'Accept' as answer if the reply was helpful. This will benefit other community members who face the same issue.

