Get cached responses of Azure OpenAI API requests

APPLIES TO: Basic v2 | Standard v2

Use the azure-openai-semantic-cache-lookup policy to look up cached responses to Azure OpenAI Chat Completion API and Completion API requests in a configured external cache. The lookup is based on the vector proximity of the prompt to previous requests and on a specified similarity score threshold. Response caching reduces the bandwidth and processing load on the backend Azure OpenAI API and lowers the latency perceived by API consumers.

Note

Set the policy's elements and child elements in the order provided in the policy statement. Learn more about how to set or edit API Management policies.

Policy statement

<azure-openai-semantic-cache-lookup
    score-threshold="similarity score threshold"
    embeddings-backend-id="backend entity ID for embeddings API"
    embeddings-backend-auth="system-assigned"
    ignore-system-messages="true | false"
    max-message-count="count">
    <vary-by>"expression to partition caching"</vary-by>
</azure-openai-semantic-cache-lookup>

Attributes

Attribute | Description | Required | Default
score-threshold | Similarity score threshold used to determine whether to return a cached response to a prompt. Value is a decimal between 0.0 and 1.0. Learn more. | Yes | N/A
embeddings-backend-id | Backend ID for OpenAI embeddings API call. | Yes | N/A
embeddings-backend-auth | Authentication used for Azure OpenAI embeddings API backend. | Yes. Must be set to system-assigned. | N/A
ignore-system-messages | Boolean. If set to true, removes system messages from a GPT chat completion prompt before assessing cache similarity. | No | false
max-message-count | If specified, number of remaining dialog messages after which caching is skipped. | No | N/A
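As an illustration of the optional attributes, the following sketch sets ignore-system-messages and max-message-count alongside the required attributes. The threshold and backend ID are reused from the example in this article. The value 10 for max-message-count is an arbitrary, assumed value and not a recommendation.

```xml
<!-- Illustrative values only; max-message-count="10" is an assumed example value -->
<azure-openai-semantic-cache-lookup
    score-threshold="0.05"
    embeddings-backend-id="azure-openai-backend"
    embeddings-backend-auth="system-assigned"
    ignore-system-messages="true"
    max-message-count="10">
    <vary-by>@(context.Subscription.Id)</vary-by>
</azure-openai-semantic-cache-lookup>
```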

Elements

Name | Description | Required
vary-by | A custom expression determined at runtime whose value partitions caching. If multiple vary-by elements are added, values are concatenated to create a unique combination. | No
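A minimal sketch of partitioning the cache with more than one vary-by element, so that a cached response is shared only among requests with the same combination of values. The x-tenant-id header is a hypothetical name chosen for illustration; the threshold and backend ID are reused from the example in this article.

```xml
<azure-openai-semantic-cache-lookup
    score-threshold="0.05"
    embeddings-backend-id="azure-openai-backend"
    embeddings-backend-auth="system-assigned">
    <!-- Partition by subscription -->
    <vary-by>@(context.Subscription.Id)</vary-by>
    <!-- Hypothetical header: also partition by a caller-supplied tenant ID -->
    <vary-by>@(context.Request.Headers.GetValueOrDefault("x-tenant-id", ""))</vary-by>
</azure-openai-semantic-cache-lookup>
```

With this configuration, two requests with similar prompts would share a cache partition only if both the subscription ID and the header value match.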

Usage

Usage notes

  • This policy can only be used once in a policy section.

Examples

Example with corresponding azure-openai-semantic-cache-store policy

<policies>
    <inbound>
        <base />
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="azure-openai-backend"
            embeddings-backend-auth="system-assigned">
            <vary-by>@(context.Subscription.Id)</vary-by>
        </azure-openai-semantic-cache-lookup>
    </inbound>
    <outbound>
        <azure-openai-semantic-cache-store duration="60" />
        <base />
    </outbound>
</policies>

For more information about working with policies, see: