Hi Alistair Thomson,
You're encountering a "Rate Limit Exceeded" issue in Azure OpenAI Playground when connecting an AI Search service to your GPT-4o-mini model, even though you've set the max TPM (30k) and RPM (300).
Possible Causes and solutions:
Cause:
AI Search Queries Increasing Token Usage
When using an AI Search data source, each prompt triggers additional queries to retrieve relevant data.
These queries consume extra tokens, reducing your effective tokens per minute (TPM) budget.
The third prompt failing suggests that token usage is exceeding the 30k TPM limit due to AI Search queries.
Solution:
Reduce Token Usage per Query
Limit document chunk size in AI Search.
Reduce the number of documents retrieved per query.
Try query filters instead of broad searches to minimize tokens consumed per response.
Cause:
Latency and Queuing Delays
AI Search introduces latency because it first retrieves data before passing it to the model.
Azure OpenAI might queue up requests, leading to a request-per-minute (RPM) limit breach.
Solution:
Increase RPM or Optimize Requests
Lower search frequency (e.g., cache recent responses).
Use batching: Combine multiple small queries into a single request.
Monitor RPM via Azure Metrics to check actual request volume.
Cause:
Background Queries Consuming Quota
Even when idle, AI Search may be performing background queries.
Other workloads (e.g., testing or indexing) might also be consuming OpenAI or AI Search resources.
Solution:
Isolate AI Search Load
Run AI Search queries separately from the chat interface.
Limit query triggers to specific cases instead of every message.
Check Azure AI Search logs to see how many queries are executed.
Cause:
AI Search Tier Might Still Be Insufficient
Upgrading to Standard Tier may improve search capacity but does not increase OpenAI TPM/RPM limits.
AI Search’s QPS (Queries per Second) limit may still throttle responses.
Solution:
Optimize AI Search Configuration
Check AI Search query limits: Azure AI Search Limits.
If needed, upgrade to a higher Standard tier with more query-per-second (QPS) capacity.
Hope this helps. Do let us know if you any further queries.
-------------
If this answers your query, do click Accept Answer
and Yes
for was this answer helpful.
Thank you.