Cache responses to Azure OpenAI API requests

APPLIES TO: All API Management tiers

The azure-openai-semantic-cache-store policy caches responses to Azure OpenAI Chat Completion API requests to a configured external cache. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.

Note

This policy must have a corresponding Get cached responses to Azure OpenAI API requests policy.
For prerequisites and steps to enable semantic caching, see Enable semantic caching for Azure OpenAI APIs in Azure API Management.

Note

Set the policy's elements and child elements in the order provided in the policy statement. Learn more about how to set or edit API Management policies.

Supported Azure OpenAI in Azure AI Foundry models

Use the policy with APIs added to API Management from Azure OpenAI in Azure AI Foundry models of the following types:

API type	Supported models
Chat completion	`gpt-3.5` `gpt-4` `gpt-4o` `gpt-4o-mini` `o1` `o3`
Embeddings	`text-embedding-3-large` `text-embedding-3-small` `text-embedding-ada-002`
Responses (preview)	`gpt-4o` (Versions: `2024-11-20`, `2024-08-06`, `2024-05-13`) `gpt-4o-mini` (Version: `2024-07-18`) `gpt-4.1` (Version: `2025-04-14`) `gpt-4.1-nano` (Version: `2025-04-14`) `gpt-4.1-mini` (Version: `2025-04-14`) `gpt-image-1` (Version: `2025-04-15`) `o3` (Version: `2025-04-16`) `o4-mini` (Version: `2025-04-16`)

Note

Traditional completion APIs are only available with legacy model versions and support is limited.

For current information about the models and their capabilities, see Azure OpenAI in Foundry Models.

Policy statement

<azure-openai-semantic-cache-store duration="seconds"/>

Attributes

Attribute	Description	Required	Default
duration	Time-to-live of the cached entries, specified in seconds. Policy expressions are allowed.	Yes	N/A
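
Because duration accepts policy expressions, the time-to-live can be computed per request instead of being fixed. The following is a minimal sketch, not a recommended configuration. It assumes the expression may evaluate to an integer number of seconds and that context.Subscription is null for calls made without a subscription. The values 120 and 30 are illustrative only.

```xml
<!-- Illustrative values: cache longer for subscribed callers, shorter for anonymous ones -->
<azure-openai-semantic-cache-store duration="@(context.Subscription != null ? 120 : 30)" />
```

A fixed value such as duration="60" is simpler and is usually enough unless different callers need different cache lifetimes.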

Usage

Policy sections: outbound
Policy scopes: global, product, API, operation
Gateways: classic, v2, consumption, self-hosted

Usage notes

This policy can only be used once in a policy section.
If the cache lookup fails, the API call that uses the cache-related operation doesn't raise an error, and the cache operation completes successfully.
We recommend configuring a rate-limit policy (or rate-limit-by-key policy) immediately after any cache lookup. This helps keep your backend service from getting overloaded if the cache isn't available.
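
The rate-limiting recommendation above could also be applied per caller with rate-limit-by-key, placed directly after the lookup in the inbound section. This is a sketch under stated assumptions: the backend ID, call limit, and renewal period are placeholder values, and keying the counter on the subscription ID assumes the API requires a subscription.

```xml
<inbound>
    <base />
    <azure-openai-semantic-cache-lookup
        score-threshold="0.05"
        embeddings-backend-id="azure-openai-backend"
        embeddings-backend-auth="system-assigned">
        <vary-by>@(context.Subscription.Id)</vary-by>
    </azure-openai-semantic-cache-lookup>
    <!-- Placeholder limits: at most 10 calls per 60 seconds for each subscription -->
    <rate-limit-by-key calls="10"
        renewal-period="60"
        counter-key="@(context.Subscription.Id)" />
</inbound>
```

Requests that miss the cache, or that arrive while the cache is unavailable, still pass through the rate limit before reaching the backend.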

Examples

Example with corresponding azure-openai-semantic-cache-lookup policy

The following example shows how to use the azure-openai-semantic-cache-lookup policy along with the azure-openai-semantic-cache-store policy to retrieve semantically similar cached responses with a similarity score threshold of 0.05. Cached values are partitioned by the subscription ID of the caller.

Note

Add a rate-limit policy (or rate-limit-by-key policy) after the cache lookup to help limit the number of calls and prevent overload on the backend service in case the cache isn't available.

<policies>
    <inbound>
        <base />
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="azure-openai-backend"
            embeddings-backend-auth="system-assigned">
            <vary-by>@(context.Subscription.Id)</vary-by>
        </azure-openai-semantic-cache-lookup>
        <rate-limit calls="10" renewal-period="60" />
    </inbound>
    <outbound>
        <azure-openai-semantic-cache-store duration="60" />
        <base />
    </outbound>
</policies>

Related content

For more information about working with policies, see:

Tutorial: Transform and protect your API
Policy reference for a full list of policy statements and their settings
Policy expressions
Set or edit policies
Reuse policy configurations
Policy snippets repo
Policy playground repo
Azure API Management policy toolkit
Get Copilot assistance to create, explain, and troubleshoot policies

Last updated on 2025-09-11
