Hi Vitor Cavaco,
You've done the math correctly on the theoretical maximum, but the real-world implementation has some important nuances.
Let's clarify the most important point first. Microsoft does not impose a separate, hard monthly token limit on top of the provisioned throughput unit (PTU). The PTU model is designed for predictable performance, not for capping monthly volume, so your theoretical calculation of ~13.14 billion tokens is the intended capacity.
However, you are right to ask about throttling. The 5,000 tokens per second figure is the key limit, and it is a performance throttle, not a monthly quota. If you try to send more than 5,000 tokens in a single second, the excess requests will be throttled and fail. If you spread your 13 billion tokens evenly across the month, you should not hit any throttle.
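For reference, the ~13.14 billion figure can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the 5,000 tokens per second rate from this thread is sustained without interruption, and it assumes an "average" month of 365/12 days, which is what yields 13.14 billion (a 30-day month gives slightly less). The function name is made up for illustration.

```python
TOKENS_PER_SECOND = 5_000  # throughput figure quoted in this thread

# Assumption: an "average" month of 365/12 days (about 30.42 days).
AVERAGE_MONTH_DAYS = 365 / 12


def monthly_token_capacity(tokens_per_second: int, days: float) -> int:
    """Tokens processed if the rate is sustained nonstop for `days` days."""
    seconds = days * 24 * 60 * 60
    return round(tokens_per_second * seconds)


average_month = monthly_token_capacity(TOKENS_PER_SECOND, AVERAGE_MONTH_DAYS)
thirty_day_month = monthly_token_capacity(TOKENS_PER_SECOND, 30)

print(f"{average_month:,}")     # 13,140,000,000
print(f"{thirty_day_month:,}")  # 12,960,000,000
```

Any second in which you send fewer than 5,000 tokens lowers the real monthly total, so treat these numbers as a ceiling rather than a forecast.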
These limits are applied per PTU, per model, per region. If you have one PTU for GPT-4o in East US, that's a separate pool of throughput from another PTU you might have for a different model or in a different region.
So no, there is no hidden monthly token cap. The only limit is the per-second throughput of your provisioned PTU. As long as your traffic stays under 5,000 tokens per second, without bursts above that rate, you can use the full theoretical monthly capacity.
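One way to keep traffic under the per-second limit is to pace requests on the client side. Below is a minimal token-bucket sketch, not anything provided by Azure: the `TokenPacer` class and its methods are hypothetical names, the 5,000 figure is the one from this thread, and the way the service itself accounts for tokens when throttling may differ from this simple model.

```python
import time


class TokenPacer:
    """Client-side token bucket (hypothetical helper, not an Azure API).

    The budget refills at `rate` tokens per second, up to `capacity`.
    """

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the pacer can be tested
        self.available = capacity   # start with a full budget
        self.last = clock()

    def _refill(self) -> None:
        # Add the budget earned since the last call, capped at capacity.
        now = self.clock()
        earned = (now - self.last) * self.rate
        self.available = min(self.capacity, self.available + earned)
        self.last = now

    def try_acquire(self, tokens: int) -> bool:
        """Reserve `tokens` and return True if the budget allows, else False."""
        self._refill()
        if tokens <= self.available:
            self.available -= tokens
            return True
        return False
```

Typical use would be to create `TokenPacer(rate=5_000, capacity=5_000)`, estimate the token count of each request, and send it only when `try_acquire(estimated_tokens)` returns `True`, otherwise waiting briefly and retrying. Setting `capacity` equal to `rate` means no more than one second's worth of tokens can go out in a single burst.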
Good luck with your high-scale application. It sounds like you are pushing the boundaries in a great way.
Regards,
Alex
P.S. If my answer helped you, please accept it and click "Yes". If you would also follow me on Q&A, I would personally appreciate it. Thanks.