Hello @JxH - We received your recent feedback that the Q&A Assist answer didn't resolve your question. While researching your use case, I came across the following solution by Preston H:
I haven't used it myself yet, so I can't confirm that it fully covers your exact use case, but it does call out the following:
- Regarding rate-limiting by the "number of tokens":
Streaming responses do not include token information, so it must be calculated.
Prompt tokens are calculated using an additional Python Function API wrapper that uses tiktoken.
- Regarding "limiting each consumer to max amount per month":
The solution uses APIM product subscription keys, but the limit can also be applied per individual ID, header value, etc.
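
To make the two points above more concrete, here is a minimal Python sketch of the general idea. It is not Preston H's implementation. The `MonthlyTokenQuota` class, its method names, and the in-memory storage are hypothetical and for illustration only. The only tiktoken calls used are `tiktoken.get_encoding` and `Encoding.encode`, and the `cl100k_base` encoding is an assumption that you would need to match to your model. Counting the raw text this way is an approximation, since chat-formatted requests add a few tokens of overhead per message.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Callable, Iterable, Optional


def tiktoken_counter(encoding_name: str = "cl100k_base") -> Callable[[str], int]:
    """Return a token counter backed by tiktoken (requires `pip install tiktoken`).

    The encoding name is an assumption; pick the one that matches your model.
    """
    import tiktoken  # imported lazily so the quota logic works without it

    enc = tiktoken.get_encoding(encoding_name)
    return lambda text: len(enc.encode(text))


class MonthlyTokenQuota:
    """Hypothetical in-memory monthly token budget, tracked per consumer key."""

    def __init__(self, limit: int, count_tokens: Callable[[str], int]):
        self.limit = limit
        self.count_tokens = count_tokens
        # (consumer_key, "YYYY-MM") -> tokens used in that calendar month
        self.used = defaultdict(int)

    def _bucket(self, consumer_key: str, now: Optional[datetime] = None):
        now = now or datetime.now(timezone.utc)
        return (consumer_key, now.strftime("%Y-%m"))

    def remaining(self, consumer_key: str, now: Optional[datetime] = None) -> int:
        return max(0, self.limit - self.used[self._bucket(consumer_key, now)])

    def is_allowed(self, consumer_key: str, now: Optional[datetime] = None) -> bool:
        """Check before forwarding a request: is any budget left this month?"""
        return self.remaining(consumer_key, now) > 0

    def record(self, consumer_key: str, prompt: str,
               streamed_chunks: Iterable[str],
               now: Optional[datetime] = None) -> int:
        """Count prompt plus streamed completion text and add it to the month's usage.

        Streaming responses carry no usage totals, so the chunks are joined
        and tokenized here. Returns the tokens counted for this call.
        """
        completion = "".join(streamed_chunks)
        total = self.count_tokens(prompt) + self.count_tokens(completion)
        self.used[self._bucket(consumer_key, now)] += total
        return total
```

In practice you would pass `tiktoken_counter()` as `count_tokens`, use the APIM subscription key (or another ID or header value) as `consumer_key`, and keep the counters in a durable store rather than in memory.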
Based on the remarks above, it seems you can accomplish your scenario with this solution after some adjustments. Could you review it and let me know whether it is helpful?
Please "Accept Answer" if the answer is helpful so that others in the community may benefit from your experience.