Cache responses to Azure OpenAI API requests
APPLIES TO: All API Management tiers
The azure-openai-semantic-cache-store policy caches responses to Azure OpenAI Chat Completion API and Completion API requests in a configured external cache. Response caching reduces the bandwidth and processing load on the backend Azure OpenAI API and lowers the latency that API consumers experience.
Note
- This policy must be paired with a corresponding Get cached responses to Azure OpenAI API requests (azure-openai-semantic-cache-lookup) policy.
- For prerequisites and steps to enable semantic caching, see Enable semantic caching for Azure OpenAI APIs in Azure API Management.
- Currently, this policy is in preview.
Note
Set the policy's elements and child elements in the order provided in the policy statement. Learn more about how to set or edit API Management policies.
Supported Azure OpenAI Service models
The policy is used with APIs of the following types that are added to API Management from the Azure OpenAI Service:
API type | Supported models |
---|---|
Chat completion | gpt-3.5, gpt-4 |
Completion | gpt-3.5-turbo-instruct |
Embeddings | text-embedding-3-large, text-embedding-3-small, text-embedding-ada-002 |
For more information, see Azure OpenAI Service models.
Policy statement
```xml
<azure-openai-semantic-cache-store duration="seconds" />
```
Attributes
Attribute | Description | Required | Default |
---|---|---|---|
duration | Time-to-live of the cached entries, specified in seconds. Policy expressions are allowed. | Yes | N/A |
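Because `duration` accepts policy expressions, the time-to-live can be computed per request. The following outbound fragment is a minimal sketch rather than an example from the reference: it assumes a product named `Premium` (a hypothetical name) whose callers get a one-hour cache lifetime, while all other traffic gets 60 seconds. It still requires a matching azure-openai-semantic-cache-lookup policy in the inbound section.

```xml
<outbound>
    <!-- Hypothetical rule: 3600-second TTL for the "Premium" product, 60 seconds otherwise. -->
    <!-- context.Product can be null when the call isn't tied to a product, hence the ?. operator. -->
    <azure-openai-semantic-cache-store duration="@(context.Product?.Name == "Premium" ? 3600 : 60)" />
    <base />
</outbound>
```

Computing the value in one expression also keeps the configuration within the rule that this policy can appear only once in a policy section.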
Usage
- Policy sections: outbound
- Policy scopes: global, product, API, operation
- Gateways: v2
Usage notes
- This policy can only be used once in a policy section.
- If the cache lookup fails, the API call that uses the cache-related operation doesn't raise an error, and the cache operation completes successfully.
Examples
Example with corresponding azure-openai-semantic-cache-lookup policy
```xml
<policies>
    <inbound>
        <base />
        <!-- Look up a semantically similar cached response, partitioned by subscription -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="azure-openai-backend"
            embeddings-backend-auth="system-assigned">
            <vary-by>@(context.Subscription.Id)</vary-by>
        </azure-openai-semantic-cache-lookup>
    </inbound>
    <outbound>
        <!-- Cache the backend response for 60 seconds -->
        <azure-openai-semantic-cache-store duration="60" />
        <base />
    </outbound>
</policies>
```
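To partition cached responses more finely, the lookup policy can take more than one vary-by element; its own reference page describes the values as being concatenated into a single partition key. The variant below is a sketch that assumes that behavior and keys the cache on both the subscription and the API, so identical prompts that one subscriber sends to different APIs don't share entries. The backend ID `azure-openai-backend` is reused from the example above and must match a backend you have configured; the 300-second duration is an arbitrary illustrative value.

```xml
<policies>
    <inbound>
        <base />
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="azure-openai-backend"
            embeddings-backend-auth="system-assigned">
            <!-- Assumed: multiple vary-by values combine into one cache partition -->
            <vary-by>@(context.Subscription.Id)</vary-by>
            <vary-by>@(context.Api.Id)</vary-by>
        </azure-openai-semantic-cache-lookup>
    </inbound>
    <outbound>
        <!-- Cache entries for 300 seconds (5 minutes); pairs with the lookup policy above -->
        <azure-openai-semantic-cache-store duration="300" />
        <base />
    </outbound>
</policies>
```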
Related policies
Related content
For more information about working with policies, see:
- Tutorial: Transform and protect your API
- Policy reference for a full list of policy statements and their settings
- Policy expressions
- Set or edit policies
- Reuse policy configurations
- Policy snippets repo
- Author policies using Microsoft Copilot in Azure