Share via

Response API with previous_response_id to reference a stored response.

Cortaxiom 20 Reputation points
2026-05-13T18:32:03.7866667+00:00

When using the Response API with previous_response_id to reference a stored response, how are tokens from the referenced response billed? To be a bit more specific, if I store a system prompt of around 19k tokens via store: true and then reference it via the previous_response_id on all subsequent requests, are those 19k tokens billed at the standard input rate, the cached input rate, or not billed at all on the follow-up requests? And one other thing, do you know what the default retention period is for stored responses, and whether it can be configured?

Azure OpenAI in Foundry Models
0 comments No comments

1 answer

Sort by: Most helpful
  1. Divyesh Govaerdhanan 10,900 Reputation points MVP Volunteer Moderator
    2026-05-13T21:42:10.66+00:00

    Hello Cortaxiom,

    Welcome to Microsoft Q&A,

    When you pass previous_response_id, Azure OpenAI reconstitutes the full conversation context server-side and injects all prior turns into the model's input window. Those tokens, including your 19k system prompt, are counted as input tokens on every follow-up request. They are not excluded from billing.

    However, they are very likely to qualify for automatic prompt caching, which is where your cost savings come from.

    Because the prefix of the reconstructed prompt is identical across calls (same system prompt, same prior turns), Azure OpenAI's prompt caching kicks in automatically with no configuration required. Tokens that match a cached prefix are billed at the cached input token rate, which is a discount over the standard input rate for Standard deployments, and can be up to a 100% discount on Provisioned deployments.

    The exact discount varies by model and is listed on the Azure OpenAI pricing page.

    1. The repeated prefix must be at least 1,024 tokens long. At 19k tokens, your system prompt clears this threshold with room to spare.
    2. Cache hits occur in increments of 128 tokens after the initial 1,024.
    3. A single character change in the first 1,024 tokens causes a cache miss.
    4. Caches are typically cleared within 5-10 minutes of inactivity and are always removed within one hour of last use.

    By default, response data is stored via store: true is retained for 30 days. There is no documented parameter to configure a shorter or longer retention window. If you need to remove a stored response before the 30-day window expires, you can delete it explicitly.

    To extend prompt cache retention beyond the default in-memory window, Azure OpenAI also supports extended prompt caching via the prompt_cache_retention parameter on the Responses API.

    https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/prompt-caching

    Please Upvote and accept the answer if it helps!!

    Was this answer helpful?


Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.