Cache responses to Azure OpenAI API requests

APPLIES TO: Basic v2 | Standard v2

The azure-openai-semantic-cache-store policy caches responses to Azure OpenAI Chat Completion API and Completion API requests to a configured external cache. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.

Note

Set the policy's elements and child elements in the order provided in the policy statement. Learn more about how to set or edit API Management policies.

Policy statement

<azure-openai-semantic-cache-store duration="seconds"/>

Attributes

Attribute | Description | Required | Default
duration | Time-to-live of the cached entries, specified in seconds. Policy expressions are allowed. | Yes | N/A
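
Because duration accepts policy expressions, the time-to-live can be computed per request rather than fixed. The following is a minimal sketch, not a confirmed configuration: the X-Cache-Tier header name and the 300/60 second values are hypothetical and chosen only for illustration. It assumes the expression is evaluated in the outbound section, where the original request headers remain readable through context.Request.

```xml
<outbound>
    <!-- Hypothetical: cache for 300 seconds when the caller sent "X-Cache-Tier: long",
         otherwise fall back to 60 seconds. The header name is an assumption. -->
    <azure-openai-semantic-cache-store
        duration="@(context.Request.Headers.GetValueOrDefault("X-Cache-Tier", "") == "long" ? 300 : 60)" />
    <base />
</outbound>
```

The expression evaluates to an integer number of seconds, matching the unit the attribute expects. Verify the expression against your own gateway before relying on it.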

Usage

Usage notes

  • This policy can only be used once in a policy section.
  • If the cache lookup fails, the API call that uses the cache-related operation doesn't raise an error, and the cache operation completes successfully.

Examples

Example with corresponding azure-openai-semantic-cache-lookup policy

<policies>
    <inbound>
        <base />
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="azure-openai-backend"
            embeddings-backend-auth="system-assigned">
            <vary-by>@(context.Subscription.Id)</vary-by>
        </azure-openai-semantic-cache-lookup>
    </inbound>
    <outbound>
        <azure-openai-semantic-cache-store duration="60" />
        <base />
    </outbound>
</policies>

For more information about working with policies, see: