Cache responses to large language model API requests

Article
08/28/2024

APPLIES TO: All API Management tiers

The llm-semantic-cache-store policy caches responses to chat completion API and completion API requests in a configured external cache. Response caching reduces the bandwidth and processing load on the backend LLM API and lowers the latency perceived by API consumers.

Note

This policy must have a corresponding Get cached responses to large language model API requests policy.
For prerequisites and steps to enable semantic caching, see Enable semantic caching for Azure OpenAI APIs in Azure API Management.
Currently, this policy is in preview.

Note

Set the policy's elements and child elements in the order provided in the policy statement. Learn more about how to set or edit API Management policies.

Supported models

Use the policy with LLM APIs added to Azure API Management that are available through the Azure AI Model Inference API.

Policy statement

<llm-semantic-cache-store duration="seconds"/>

Attributes

Attribute	Description	Required	Default
duration	Time-to-live of the cached entries, specified in seconds. Policy expressions are allowed.	Yes	N/A
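
Because duration accepts policy expressions, the time-to-live can be computed per request rather than fixed. The following is a minimal sketch, assuming the expression may evaluate to an integer number of seconds and that context.Product is null for calls made outside a product. The rule and both TTL values (300 and 60) are illustrative assumptions, not recommendations.

```xml
<!-- Illustrative only: the condition and the TTL values are assumptions. -->
<outbound>
    <!-- Keep entries for 300 seconds when the call is associated with a product, otherwise 60 seconds. -->
    <llm-semantic-cache-store duration="@(context.Product != null ? 300 : 60)" />
    <base />
</outbound>
```

A fixed value such as duration="60" is simpler and is likely sufficient unless different callers need different cache lifetimes.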

Usage

Policy sections: outbound
Policy scopes: global, product, API, operation
Gateways: v2

Usage notes

This policy can only be used once in a policy section.
If the cache lookup fails, the API call that uses the cache-related operation doesn't raise an error, and the cache operation completes successfully.

Examples

Example with corresponding llm-semantic-cache-lookup policy

<policies>
    <inbound>
        <base />
        <llm-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="llm-backend"
            embeddings-backend-auth="system-assigned">
            <vary-by>@(context.Subscription.Id)</vary-by>
        </llm-semantic-cache-lookup>
    </inbound>
    <outbound>
        <llm-semantic-cache-store duration="60" />
        <base />
    </outbound>
</policies>

Related content

For more information about working with policies, see:

Tutorial: Transform and protect your API
Policy reference for a full list of policy statements and their settings
Policy expressions
Set or edit policies
Reuse policy configurations
Policy snippets repo
Author policies using Microsoft Copilot in Azure
