Hello, it looks like semantic caching in Azure API Management is not working because the cacheKey includes a UUID that changes on each request, even when the inputs are exactly the same. Let's work out what is happening and how to fix it.
Although the same input is sent each time, the cacheKey generated for each request ends with a different UUID, so no two keys match and every lookup is a cache miss.
The UUID most likely comes from a dynamic value in your request or in your policy configuration. Some possibilities:
Dynamic request content:
- Variables in the request body: if the body contains fields that change on every call (a timestamp, an ID, a token) and the body is part of the cacheKey, the key will differ every time.
- Dynamic headers or query parameters used in the cacheKey cause the same problem.
Policy configuration:
- Dynamic variables in the cacheKey: if the policy builds the key from context.RequestId (a GUID that is unique per request), or from any expression that returns a new value per request, every request gets a different key.
- Request body not preserved: if the policy reads the request body without preserving it, later reads of the body can return unexpected values.
To fix this, change your caching policy so the cacheKey is built only from parts of the request that do not change between calls. Identical requests will then produce the same cacheKey.
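Check your policy configuration, in particular how the cacheKey is built. A custom key is passed through the key attribute of cache-lookup-value and cache-store-value; the plain cache-lookup and cache-store policies do not take a key attribute and derive their key from the request URL plus their vary-by settings.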
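One way to see what ends up in the key is to compute it once in a variable and write it to the request trace. The fragment below is a minimal sketch, not your actual policy: the variable name cacheKey and the trace source name cache-key-debug are made up for illustration, and it assumes the request body is JSON with a messages field.

```xml
<inbound>
    <base />
    <!-- Hypothetical debugging aid: build the key once so it can be inspected -->
    <set-variable name="cacheKey" value="@{
        // preserveContent keeps the body readable for later policies and the backend
        var body = context.Request.Body.As<JObject>(preserveContent: true);
        return body["messages"].ToString();
    }" />
    <!-- Write the computed key to the trace; "cache-key-debug" is an arbitrary label -->
    <trace source="cache-key-debug" severity="information">
        <message>@((string)context.Variables["cacheKey"])</message>
    </trace>
</inbound>
```

If two identical requests show two different values in the trace, the expression that builds the key still contains something dynamic.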
Example of a faulty lookup:
<cache-lookup-value key="@("ChatCompletions_Create." + (string)context.Request.Body.As<JObject>(preserveContent: true)["id"])" variable-name="cachedResponse" />
If the id field in the request body is different on every call, the cacheKey changes every time.
Change the cacheKey to use only the stable elements of the request.
Example adjustment:
<cache-lookup-value key="@{
    var body = context.Request.Body.As<JObject>(preserveContent: true);
    return body["messages"].ToString();
}" variable-name="cachedResponse" />
Here the cacheKey is built from the messages field of the request body, which is the same for identical inputs.
Make sure you preserve the request body when you read it in the policy. Without preserveContent: true, the body is consumed by the first read, so later reads and the call to the backend no longer see it:
context.Request.Body.As<JObject>(preserveContent: true);
Exclude fields that can differ on each request, such as:
- Timestamps
- Unique Identifiers
- Session tokens
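The exclusions above can be sketched as a key built from a copy of the body with the volatile fields removed. This is only an illustration: the field names id, timestamp and session_token are hypothetical and should be replaced with whatever actually varies in your requests, and the variable name cacheKey is likewise an assumption.

```xml
<set-variable name="cacheKey" value="@{
    // Work on a parsed copy; preserveContent leaves the original request body intact
    var body = context.Request.Body.As<JObject>(preserveContent: true);
    // Hypothetical volatile fields: drop them so they cannot influence the key
    body.Remove("id");
    body.Remove("timestamp");
    body.Remove("session_token");
    // Whatever remains is the same for identical inputs
    return body.ToString();
}" />
```

Removing fields from the parsed object only affects the key, because the request sent to the backend is not rewritten unless a set-body policy is used.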
Updated example policy:
<policies>
    <inbound>
        <base />
        <!-- Build the cache key from the stable part of the request only -->
        <set-variable name="cacheKey" value="@{
            var body = context.Request.Body.As<JObject>(preserveContent: true);
            return body["messages"].ToString();
        }" />
        <!-- Lookup in cache -->
        <cache-lookup-value key="@((string)context.Variables["cacheKey"])" variable-name="cachedResponse" />
        <!-- Return cached response on hit -->
        <choose>
            <when condition="@(context.Variables.GetValueOrDefault<string>("cachedResponse") != null)">
                <return-response>
                    <set-status code="200" reason="OK" />
                    <set-header name="Content-Type" exists-action="override">
                        <value>application/json</value>
                    </set-header>
                    <set-body>@(context.Variables.GetValueOrDefault<string>("cachedResponse"))</set-body>
                </return-response>
            </when>
        </choose>
    </inbound>
    <backend>
        <!-- Send request to backend service -->
        <base />
    </backend>
    <outbound>
        <base />
        <!-- Store successful responses under the same key for one hour -->
        <choose>
            <when condition="@(context.Response.StatusCode == 200)">
                <cache-store-value key="@((string)context.Variables["cacheKey"])" value="@(context.Response.Body.As<string>(preserveContent: true))" duration="3600" />
            </when>
        </choose>
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
If the issue persists, share your policy configuration (with sensitive information redacted) and we can help you locate the problem.
Please let me know whether this helps, or if you have any further questions.
Best regards,
Thierry