Custom rate limits in API Management for Azure OpenAI?

JxH 20 Reputation points
2023-12-06T16:49:26.8+00:00

Hi,

Hitting a few issues figuring out how to set up custom rate limits on calls through API Management to Azure OpenAI backends. We want to be able to track the number of tokens in each call and limits each consumer to a max amount per month.

I know that API Management has some functions like rate-limit and rate-limit-by-key, but I'm not sure how we could use these for this purpose.

Any help here would be great.

J

Azure API Management
Azure API Management
An Azure service that provides a hybrid, multi-cloud management platform for APIs.
2,446 questions
0 comments No comments
{count} votes

Accepted answer
  1. MikeUrnun 9,777 Reputation points Moderator
    2023-12-06T21:47:07.72+00:00

    Hello @JxH - We received your recent feedback that the answer by the Q&A-assist didn't help your question. In researching your use case, I've come across the following solution by Preston H:

    I've yet to use it myself to determine whether it fully covers your exact use case but it does call out the following:

    • Regarding rate-limiting by the "number of tokens":

    Streaming responses do not include Token Information, that must be calculated
    Prompt Tokens are calcuated using Additional Python Function API wrapper that uses TikToken

    • Regarding "limiting each consumer to max amount per month":

    Solution uses APIM Product Subscription Keys but can also be used against individual ID's, header value, etc

    Based on the remarks above, it seems that you can accomplish your scenario using the above solution with some adjustments. Could you review and let me know if they are helpful?


    Please "Accept Answer" if the answer is helpful so that others in the community may benefit from your experience.

    1 person found this answer helpful.

1 additional answer

Sort by: Most helpful
  1. Q&A Assist
    2023-12-06T16:49:26.8066667+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist. Learn more

    Azure API Management provides rate and quota throttling to protect and add value to your API service. Rate limits are usually used to protect against short and intense volume bursts, while quotas are usually used for controlling call rates over a longer period of time. To implement rate limits, you can use the rate-limit-by-key policy to not allow high call volume. To implement quotas, you can set the total number of calls that a particular subscriber can make within a given month.

    To track the number of tokens in each call and limit each consumer to a max amount per month, you can use product-based throttling. This allows you to apply limits on the developers who have signed up to use your API. You can also throttle individual end users of the API by using other parts of the message such as user agent, URL path fragments, and message size.

    I hope this helps!


    References:

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.