Custom rate limits in API Management for Azure OpenAI?

Question

JxH 20

Hi,

Hitting a few issues figuring out how to set up custom rate limits on calls through API Management to Azure OpenAI backends. We want to be able to track the number of tokens in each call and limit each consumer to a max amount per month.

I know that API Management has some functions like rate-limit and rate-limit-by-key, but I'm not sure how we could use these for this purpose.

Any help here would be great.

J

Accepted answer

Answer 1

Hello @JxH - We received your recent feedback that the answer by the Q&A-assist didn't help with your question. In researching your use case, I've come across the following solution by Preston H:

Open AI Cost Gateway Pattern

I've yet to use it myself to determine whether it fully covers your exact use case, but it does call out the following:

Regarding rate-limiting by the "number of tokens":

Streaming responses do not include token information, so it must be calculated
Prompt tokens are calculated using an additional Python function API wrapper that uses TikToken
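
As a rough illustration of these two points (not taken from the linked solution), the sketch below reads the token count from the `usage` object that a non-streaming response carries, and falls back to tokenizing the text itself when that object is missing, as with streamed responses. The names `tokens_for_call` and `rough_encode` are hypothetical, and the four-characters-per-token fallback is only a placeholder assumption for a real tokenizer.

```python
import json
from typing import Callable, Sequence


def rough_encode(text: str) -> Sequence[int]:
    """Crude stand-in tokenizer (assumption): roughly 4 characters per token."""
    return range((len(text) + 3) // 4)


def tokens_for_call(
    response_body: str,
    prompt_text: str,
    completion_text: str,
    encode: Callable[[str], Sequence[int]] = rough_encode,
) -> int:
    """Return the total tokens to charge for one call.

    Uses usage.total_tokens when the response body carries it; otherwise
    estimates by tokenizing the prompt and the completion text.
    """
    try:
        usage = json.loads(response_body).get("usage")
    except (json.JSONDecodeError, AttributeError):
        # Streamed (server-sent event) bodies are not a single JSON object.
        usage = None
    if isinstance(usage, dict) and "total_tokens" in usage:
        return int(usage["total_tokens"])
    return len(encode(prompt_text)) + len(encode(completion_text))
```

With tiktoken installed, `tiktoken.encoding_for_model("gpt-3.5-turbo").encode` could be passed as `encode` instead of the placeholder. Even then the result is an estimate, since chat requests add a few formatting tokens per message.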

Regarding "limiting each consumer to max amount per month":

Solution uses APIM Product Subscription Keys but can also be used against individual IDs, header values, etc.
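
To make the per-consumer monthly limit concrete, here is a minimal in-memory sketch of a token budget keyed by consumer (a subscription key, an ID, or a header value). The class `MonthlyTokenQuota` and its methods are hypothetical and are not part of APIM or of the linked solution; a real deployment would need a shared, persistent store rather than a Python dictionary.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


class MonthlyTokenQuota:
    """In-memory monthly token budget per consumer key (illustrative only)."""

    def __init__(self, monthly_limit: int) -> None:
        self.monthly_limit = monthly_limit
        # (consumer key, "YYYY-MM") -> tokens used in that calendar month
        self._used: Dict[Tuple[str, str], int] = defaultdict(int)

    @staticmethod
    def _period(now: Optional[datetime]) -> str:
        now = now or datetime.now(timezone.utc)
        return now.strftime("%Y-%m")

    def remaining(self, key: str, now: Optional[datetime] = None) -> int:
        used = self._used[(key, self._period(now))]
        return max(0, self.monthly_limit - used)

    def allow(self, key: str, now: Optional[datetime] = None) -> bool:
        """True while the consumer still has budget left this month."""
        return self.remaining(key, now) > 0

    def record(self, key: str, tokens: int, now: Optional[datetime] = None) -> None:
        """Add the tokens consumed by a completed call to the consumer's total."""
        self._used[(key, self._period(now))] += tokens
```

The check happens before the call and the recording after it, because the token count is only known once the response is back. A consumer can therefore overshoot the limit by at most one call.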

Based on the remarks above, it seems that you can accomplish your scenario using the above solution with some adjustments. Could you review and let me know if they are helpful?

Please "Accept Answer" if the answer is helpful so that others in the community may benefit from your experience.

JxH 20 Reputation points

2023-12-07T11:06:03.4266667+00:00

Hi Mike,

Interesting, I did not find this while searching. I will take a look at this in more detail; it does seem to be a good starting point anyway.

It's a pity that the Azure OpenAI endpoints don't have this functionality built in, as they are already counting the tokens to track the TPM/PTU usage.

Thanks, I'll accept this as the answer.

J

Answer 2

Azure API Management provides rate limits and quotas to protect and add value to your API service. Rate limits are usually used to protect against short, intense bursts of traffic, while quotas are usually used for controlling call rates over a longer period of time. To implement rate limits, you can use the rate-limit-by-key policy to cap how many calls a given key can make within a short window. To implement quotas, you can set the total number of calls that a particular subscriber can make within a given month.

To limit each consumer to a max amount per month, you can use product-based throttling. This allows you to apply limits on the developers who have signed up to use your API. You can also throttle individual end users of the API by using other parts of the message such as user agent, URL path fragments, and message size. Note that these built-in limits count calls rather than tokens, so the number of tokens in each call would still need to be tracked separately.
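
To show what throttling by key means in practice, the sketch below implements a simple fixed-window call limiter whose counter key is derived from the request headers. It is an illustration of the idea only, not how API Management implements its policies. The names `counter_key` and `FixedWindowLimiter` are hypothetical; the only APIM-specific detail assumed is the `Ocp-Apim-Subscription-Key` request header.

```python
import time
from typing import Callable, Dict, Mapping, Tuple


def counter_key(headers: Mapping[str, str]) -> str:
    """Pick the throttling key: subscription key if present, else user agent."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return (
        lowered.get("ocp-apim-subscription-key")
        or lowered.get("user-agent")
        or "anonymous"
    )


class FixedWindowLimiter:
    """Allow at most `calls` requests per key in each `period`-second window."""

    def __init__(
        self,
        calls: int,
        period: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.calls, self.period, self.clock = calls, period, clock
        # key -> (window index, calls counted in that window)
        self._windows: Dict[str, Tuple[int, int]] = {}

    def allow(self, key: str) -> bool:
        window = int(self.clock() // self.period)
        start, count = self._windows.get(key, (window, 0))
        if start != window:  # a new window has begun, so reset the counter
            count = 0
        if count >= self.calls:
            return False
        self._windows[key] = (window, count + 1)
        return True
```

A short window with a small `calls` value behaves like a rate limit, while a month-long window with a large value behaves like a quota.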

I hope this helps!
