OpenAI Playground Rate Limit Exceeded with two prompts when AI Search service connected

Alistair Thomson 30 Reputation points
2025-03-05T08:08:03.8766667+00:00

Using Azure OpenAI, I have deployed a gpt-4o-mini model.

Tokens per minute set to 30k (the maximum allowed in my case). Corresponding requests per minute = 300.

In AI playground, when chatting without an AI Search data source connected, it behaves OK.

As soon as AI Search is connected, I can send only one or two prompts before receiving the message "Rate Limit Exceeded".

The AI Search service is on the Basic pricing tier, and I cannot see anything under Monitoring that suggests an issue.

Please explain the cause of the rate limit being exceeded. The message in the playground links to the deployed model, which already has its settings at the maximum.

There is nothing in the AI search service to suggest limits are being reached.

As it stands, this is unusable in an application where we want to use the model's NLP to serve content from documents indexed by the AI search service.


3 answers

  1. Alistair Thomson 30 Reputation points
    2025-03-10T07:55:31.9466667+00:00

    For anybody else having this issue: we increased the maximum allowed TPM for the model deployed in Azure OpenAI in the region where it is deployed (UK South).

    To do this we had to make a request directly to Microsoft, indicating the subscription, model and region.

    To make this request, we used this link: https://aka.ms/oai/stuquotarequest

    As a result, the max allowed TPM for this model was increased from 30K TPM to 2M TPM. We have set it around 250K and it is now working fine.

    In my opinion this suggests that the default upper limit of 30K TPM is insufficient for AI workloads with the gpt-4o-mini model connected to an AI Search data source.
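    Until a quota increase like this is granted, intermittent 429 responses can also be smoothed over client-side with exponential backoff. A minimal sketch, assuming a generic `call` that raises on a rate-limit response (the function and parameter names here are illustrative, not part of any Azure SDK):

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0, is_rate_limited=None):
    """Retry `call` with exponential backoff plus jitter when it raises a
    rate-limit error. `is_rate_limited(exc)` decides which exceptions are
    retryable; by default every exception is treated as retryable."""
    if is_rate_limited is None:
        is_rate_limited = lambda exc: True
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            if not is_rate_limited(exc) or attempt == max_retries - 1:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... plus jitter.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

    Backoff does not raise the quota, but it turns a hard "Rate Limit Exceeded" failure into a short delay once traffic fits under the TPM/RPM ceiling on average.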

    1 person found this answer helpful.

  2. Alistair Thomson 30 Reputation points
    2025-03-06T08:01:51.11+00:00

    Hi,

    Thanks for the response.

    I've created a new Azure AI Search service on the Standard pricing tier and recreated the index. I've attached it as the data source in the chat playground.

    Unfortunately the suggestion made no difference, other than changing the specific error message returned.

    This error occurred at the third prompt in the chat playground.

    Please advise.


  3. Prashanth Veeragoni 1,520 Reputation points Microsoft External Staff
    2025-03-10T07:31:10.2933333+00:00

    Hi Alistair Thomson,

    You're encountering a "Rate Limit Exceeded" issue in Azure OpenAI Playground when connecting an AI Search service to your GPT-4o-mini model, even though you've set the max TPM (30k) and RPM (300).

    Possible Causes and solutions:

    Cause:

    AI Search Queries Increasing Token Usage

    When using an AI Search data source, each prompt triggers additional queries to retrieve relevant data.

    These queries consume extra tokens, reducing your effective tokens per minute (TPM) budget.

    The third prompt failing suggests that token usage is exceeding the 30k TPM limit due to AI Search queries.
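    A quick back-of-the-envelope calculation shows why a 30k TPM budget disappears fast when every prompt carries retrieved chunks. A rough sketch using the common ~4-characters-per-token heuristic (exact counts require a real tokenizer; the function names are illustrative):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # For exact counts, use a real tokenizer (e.g. tiktoken).
    return len(text) // 4

def prompt_budget(question: str, chunks: list[str], system: str = "") -> int:
    """Estimate the input tokens for one RAG prompt: the system message,
    the user question, and every retrieved document chunk prepended to it."""
    return sum(estimate_tokens(t) for t in [system, question, *chunks])
```

    With, say, five 1,500-character chunks per query, each prompt already costs close to 1,900 input tokens before the response or any chat history is counted, so a 30k TPM budget supports only a handful of prompts per minute.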

    Solution:

    Reduce Token Usage per Query

    Limit document chunk size in AI Search.

    Reduce the number of documents retrieved per query.

    Try query filters instead of broad searches to minimize tokens consumed per response.
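    With the Azure OpenAI "On Your Data" chat-completions API, retrieval volume can be capped in the request body. A hedged sketch of the relevant fields, following the documented `azure_search` data-source schema (the endpoint and index name are placeholders; verify parameter names and defaults against the current API reference):

```python
# Sketch of a chat-completions request body that limits retrieval volume.
# `top_n_documents` caps how many chunks are injected into the prompt;
# `strictness` raises the relevance bar for what gets included.
request_body = {
    "messages": [
        {"role": "user", "content": "What does the contract say about renewal?"}
    ],
    "data_sources": [
        {
            "type": "azure_search",
            "parameters": {
                "endpoint": "https://<your-search-service>.search.windows.net",
                "index_name": "<your-index>",
                "top_n_documents": 3,  # fewer chunks per prompt = fewer tokens
                "strictness": 4,       # 1-5; higher filters out weak matches
                "authentication": {"type": "system_assigned_managed_identity"},
            },
        }
    ],
    "max_tokens": 400,  # also cap the response length
}
```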

    Cause:

    Latency and Queuing Delays

    AI Search introduces latency because it first retrieves data before passing it to the model.

    Azure OpenAI might queue up requests, leading to a request-per-minute (RPM) limit breach.

    Solution:

    Increase RPM or Optimize Requests

    Lower search frequency (e.g., cache recent responses).

    Use batching: Combine multiple small queries into a single request.

    Monitor RPM via Azure Metrics to check actual request volume.
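    The caching idea above can be sketched as a small wrapper that returns a stored answer for repeated questions instead of spending another request against the TPM/RPM quota (`make_cached` is an illustrative helper; the wrapped `call` stands in for the real model request):

```python
import functools

def make_cached(call, maxsize=256):
    """Wrap a question -> answer function so that repeated questions return
    the stored response instead of consuming another model request."""
    @functools.lru_cache(maxsize=maxsize)
    def cached(question: str) -> str:
        return call(question)
    return cached
```

    This only helps when identical questions recur; it is a cheap way to keep playground-style testing, where the same prompt is often resent, from burning quota.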

    Cause:

    Background Queries Consuming Quota

    Even when idle, AI Search may be performing background queries.

    Other workloads (e.g., testing or indexing) might also be consuming OpenAI or AI Search resources.

    Solution:

    Isolate AI Search Load

    Run AI Search queries separately from the chat interface.

    Limit query triggers to specific cases instead of every message.

    Check Azure AI Search logs to see how many queries are executed.

    Cause:

    AI Search Tier Might Still Be Insufficient

    Upgrading to Standard Tier may improve search capacity but does not increase OpenAI TPM/RPM limits.

    AI Search’s QPS (Queries per Second) limit may still throttle responses.

     Solution:

    Optimize AI Search Configuration

    Check AI Search query limits: Azure AI Search Limits.

    If needed, upgrade to a higher Standard tier with more query-per-second (QPS) capacity.

    Hope this helps. Do let us know if you have any further queries.

    ------------- 

    If this answers your query, do click Accept Answer and Yes for was this answer helpful.

    Thank you.

