OpenAI Playground Rate Limit Exceeded with two prompts when AI Search service connected

Alistair Thomson 30 Reputation points
2025-03-05T08:08:03.8766667+00:00

Using Azure OpenAI, I have deployed a gpt-4o-mini model.

Tokens per minute set to 30k (the maximum allowed in my case). Corresponding requests per minute = 300.

In AI playground, when chatting without an AI Search data source connected, it behaves OK.

As soon as AI Search is connected, I can send only one or two prompts before receiving the message "Rate Limit Exceeded".

The AI Search service is on the Basic pricing tier, and I cannot see anything under Monitoring that suggests an issue.

Please explain the cause of the rate limit being exceeded. The message in the playground links to the deployed model, which already has its settings at the maximum.

There is nothing in the AI search service to suggest limits are being reached.

As it stands, this is unusable in an application where we want to use the model's NLP to serve content from documents indexed by the AI search service.


3 answers

  1. Alistair Thomson 30 Reputation points
    2025-03-10T07:55:31.9466667+00:00

    For anybody else having this issue: we increased the maximum allowed TPM for the model deployed in Azure OpenAI in the region where it is deployed (UK South).

    To do this we had to make a request directly to Microsoft, indicating the subscription, model and region.

    To make this request, we used this link: https://aka.ms/oai/stuquotarequest

    As a result, the max allowed TPM for this model was increased from 30K TPM to 2M TPM. We have set it around 250K and it is now working fine.

    In my opinion this suggests that the default upper limit of 30K TPM is insufficient for AI workloads with the gpt-4o-mini model connected to an AI Search data source.
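    Until a quota increase like this is granted, intermittent 429 responses can also be smoothed over client-side with exponential backoff. A minimal sketch, assuming a generic `call` that raises on a rate-limit response (the function and parameter names here are illustrative, not part of any Azure SDK):

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0, is_rate_limited=None):
    """Retry `call` with exponential backoff plus jitter when it raises a
    rate-limit error. `is_rate_limited(exc)` decides which exceptions are
    retryable; by default every exception is treated as retryable."""
    if is_rate_limited is None:
        is_rate_limited = lambda exc: True
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            if not is_rate_limited(exc) or attempt == max_retries - 1:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... plus jitter.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

    Backoff does not raise the quota, but it turns a hard "Rate Limit Exceeded" failure into a short delay once traffic fits under the TPM/RPM ceiling on average.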

    1 person found this answer helpful.

  2. Alistair Thomson 30 Reputation points
    2025-03-06T08:01:51.11+00:00

    Hi,

    Thanks for the response.

    I've created a new Azure AI Search service on the Standard pricing tier and recreated the index. I've attached it as the data source in the chat playground.

    Unfortunately the suggestion made no difference, other than changing the specific error message returned.

    This error occurred at the third prompt in the chat playground.

    Please advise.


  3. Prashanth Veeragoni 1,520 Reputation points Microsoft External Staff
    2025-03-10T07:31:10.2933333+00:00

    Hi Alistair Thomson,

    You're encountering a "Rate Limit Exceeded" issue in Azure OpenAI Playground when connecting an AI Search service to your GPT-4o-mini model, even though you've set the max TPM (30k) and RPM (300).

    Possible Causes and solutions:

    Cause:

    AI Search Queries Increasing Token Usage

    When using an AI Search data source, each prompt triggers additional queries to retrieve relevant data.

    These queries consume extra tokens, reducing your effective tokens per minute (TPM) budget.

    The third prompt failing suggests that token usage is exceeding the 30k TPM limit due to AI Search queries.
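    A quick back-of-the-envelope calculation shows why a 30k TPM budget disappears fast when every prompt carries retrieved chunks. A rough sketch using the common ~4-characters-per-token heuristic (exact counts require a real tokenizer; the function names are illustrative):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # For exact counts, use a real tokenizer (e.g. tiktoken).
    return len(text) // 4

def prompt_budget(question: str, chunks: list[str], system: str = "") -> int:
    """Estimate the input tokens for one RAG prompt: the system message,
    the user question, and every retrieved document chunk prepended to it."""
    return sum(estimate_tokens(t) for t in [system, question, *chunks])
```

    With, say, five 1,500-character chunks per query, each prompt already costs close to 1,900 input tokens before the response or any chat history is counted, so a 30k TPM budget supports only a handful of prompts per minute.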

    Solution:

    Reduce Token Usage per Query

    Limit document chunk size in AI Search.

    Reduce the number of documents retrieved per query.

    Try query filters instead of broad searches to minimize tokens consumed per response.
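    With the Azure OpenAI "On Your Data" chat-completions API, retrieval volume can be capped in the request body. A hedged sketch of the relevant fields, following the documented `azure_search` data-source schema (the endpoint and index name are placeholders; verify parameter names and defaults against the current API reference):

```python
# Sketch of a chat-completions request body that limits retrieval volume.
# `top_n_documents` caps how many chunks are injected into the prompt;
# `strictness` raises the relevance bar for what gets included.
request_body = {
    "messages": [
        {"role": "user", "content": "What does the contract say about renewal?"}
    ],
    "data_sources": [
        {
            "type": "azure_search",
            "parameters": {
                "endpoint": "https://<your-search-service>.search.windows.net",
                "index_name": "<your-index>",
                "top_n_documents": 3,  # fewer chunks per prompt = fewer tokens
                "strictness": 4,       # 1-5; higher filters out weak matches
                "authentication": {"type": "system_assigned_managed_identity"},
            },
        }
    ],
    "max_tokens": 400,  # also cap the response length
}
```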

    Cause:

    Latency and Queuing Delays

    AI Search introduces latency because it first retrieves data before passing it to the model.

    Azure OpenAI might queue up requests, leading to a request-per-minute (RPM) limit breach.

    Solution:

    Increase RPM or Optimize Requests

    Lower search frequency (e.g., cache recent responses).

    Use batching: Combine multiple small queries into a single request.

    Monitor RPM via Azure Metrics to check actual request volume.
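    The caching idea above can be sketched as a small wrapper that returns a stored answer for repeated questions instead of spending another request against the TPM/RPM quota (`make_cached` is an illustrative helper; the wrapped `call` stands in for the real model request):

```python
import functools

def make_cached(call, maxsize=256):
    """Wrap a question -> answer function so that repeated questions return
    the stored response instead of consuming another model request."""
    @functools.lru_cache(maxsize=maxsize)
    def cached(question: str) -> str:
        return call(question)
    return cached
```

    This only helps when identical questions recur; it is a cheap way to keep playground-style testing, where the same prompt is often resent, from burning quota.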

    Cause:

    Background Queries Consuming Quota

    Even when idle, AI Search may be performing background queries.

    Other workloads (e.g., testing or indexing) might also be consuming OpenAI or AI Search resources.

    Solution:

    Isolate AI Search Load

    Run AI Search queries separately from the chat interface.

    Limit query triggers to specific cases instead of every message.

    Check Azure AI Search logs to see how many queries are executed.

    Cause:

    AI Search Tier Might Still Be Insufficient

    Upgrading to Standard Tier may improve search capacity but does not increase OpenAI TPM/RPM limits.

    AI Search’s QPS (Queries per Second) limit may still throttle responses.

     Solution:

    Optimize AI Search Configuration

    Check AI Search query limits: Azure AI Search Limits.

    If needed, upgrade to a higher Standard tier with more query-per-second (QPS) capacity.

    Hope this helps. Do let us know if you have any further queries.

    ------------- 

    If this answers your query, do click Accept Answer and Yes for was this answer helpful.

    Thank you.

