Adventure Works routes a shipping status lookup request to GPT-4o-mini (Tier 1). The model returns a response, but its quality score falls below the Tier 1 quality floor of 75, so the quality floor protection triggers. What happens next?
The request fails with an error and the customer sees an apology message, since the smaller model couldn't handle the request.
The request is retried with the next capability tier (GPT-4o), and if the quality passes the threshold, that response is returned to the customer.
The request is logged as a failure and added to the routing analysis report for manual review next month.
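For readers who want to picture the mechanism in this scenario, the following is a minimal sketch of quality-floor escalation, not the implementation Adventure Works uses. The `TIERS` table, the `call_model` and `score_response` callables, and the Tier 2 floor value are all hypothetical; only the Tier 1 floor of 75 comes from the scenario.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical tier table: (model name, quality floor).
# Only the Tier 1 floor of 75 is from the scenario; the Tier 2 floor is assumed.
TIERS: List[Tuple[str, float]] = [
    ("gpt-4o-mini", 75.0),  # Tier 1
    ("gpt-4o", 75.0),       # Tier 2 (assumed floor)
]


def route_with_quality_floor(
    prompt: str,
    call_model: Callable[[str, str], str],
    score_response: Callable[[str], float],
) -> Optional[Tuple[str, str]]:
    """Try each tier in order and return the first response that meets its floor.

    Returns (model, response), or None if every tier falls below its floor.
    """
    for model, floor in TIERS:
        response = call_model(model, prompt)
        if score_response(response) >= floor:
            return model, response
        # Below the floor: fall through and retry with the next tier.
    return None
```

In this sketch a response that scores below the floor is never returned to the caller; the request moves to the next capability tier instead.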
Adventure Works implements semantic caching for the product search agent. The similarity threshold is set at 0.60 (cosine similarity). A customer asks 'Do you have running shoes in size 10?' and another customer asks 'Are there hiking boots available in size 12?' Both receive the same cached response. What does this indicate?
The threshold is too high—it should be reduced to below 0.50 to prevent false cache hits.
The threshold is too low—0.60 cosine similarity allows queries with different intents to be treated as equivalent, causing false cache hits. Raise the threshold to require higher semantic similarity before returning cached results.
This is expected cache behavior—the two queries are semantically similar enough as footwear requests to share the same cached response.
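To see how the similarity threshold controls cache hits, here is a minimal sketch of a semantic cache lookup. The two-dimensional vectors are made-up stand-ins for real embeddings, and `cache_lookup` is a hypothetical helper rather than part of any specific caching library.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cache_lookup(
    query_vec: Sequence[float],
    cache: List[Tuple[Sequence[float], str]],
    threshold: float,
) -> Optional[str]:
    """Return the most similar cached response, or None if it is below the threshold."""
    best_score, best_response = -1.0, None
    for cached_vec, response in cache:
        score = cosine_similarity(query_vec, cached_vec)
        if score > best_score:
            best_score, best_response = score, response
    return best_response if best_score >= threshold else None


# Illustrative (invented) embeddings: the two queries have cosine similarity of about 0.8.
cache = [([1.0, 0.0], "Cached answer about running shoes, size 10")]
hiking_boots_query = [0.8, 0.6]

loose = cache_lookup(hiking_boots_query, cache, threshold=0.60)   # cache hit
strict = cache_lookup(hiking_boots_query, cache, threshold=0.90)  # cache miss
```

With the looser threshold the hiking boots query receives the running shoes answer; with the stricter threshold it misses the cache and would go to the model.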
Adventure Works defines P95 end-to-end latency < 3 seconds for Gold customers and < 12 seconds for Bulk API customers. The same agent processes requests from both. A cost optimization measure reduces Bulk API latency to 4 seconds but inadvertently increases Gold customer latency to 4.5 seconds. What should happen?
Accept the tradeoff—Gold customers now get 4.5 seconds, which is still better than the Bulk API target.
Roll back the cost optimization measure, since it violated the Gold customer SLA. Investigate alternative optimizations that improve Bulk API cost without affecting Gold customer performance.
Notify Gold customers of the temporary SLA degradation and proceed with the optimization, since the overall system cost savings benefit all customers long-term.
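The per-tier check implied by this scenario can be sketched as follows. This is an illustration under stated assumptions: the `SLA_P95_SECONDS` table mirrors the targets in the question, the tier keys are hypothetical names, and P95 is computed with the nearest-rank method, which is one of several common percentile definitions.

```python
import math
from typing import Dict, List

# P95 targets from the scenario: latency must be strictly below these values.
# The tier keys are hypothetical names chosen for this sketch.
SLA_P95_SECONDS: Dict[str, float] = {"gold": 3.0, "bulk_api": 12.0}


def p95(samples: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("p95 requires at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def violated_tiers(latencies: Dict[str, List[float]]) -> List[str]:
    """Return the tiers whose measured P95 does not meet their own target."""
    return [
        tier
        for tier, samples in latencies.items()
        if p95(samples) >= SLA_P95_SECONDS[tier]
    ]


# Measurements after the cost optimization, simplified to constant latencies.
after_change = {"gold": [4.5] * 20, "bulk_api": [4.0] * 20}
breaches = violated_tiers(after_change)
```

Each tier is compared against its own target, so an improvement for Bulk API customers does not offset a breach for Gold customers.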