@Bania RABIA I think you should be able to request the quota you need for the above use case. Once you create an Azure OpenAI resource, you can create deployments of base models under the Standard deployment type. These deployments come with a soft quota limit to ensure the models are used optimally. You can request an increase from the quota page in the Azure OpenAI portal, and once the request is approved, the deployment will use the increased quota for future requests.
To learn more about the models, check the models page to see which regions they are available in and what their limits are. If you need additional, guaranteed throughput, you can instead use provisioned deployments, which offer dedicated capacity.
For example, gpt-4o-mini might already have a default quota of 2,000K tokens per minute set on the account.
Requesting additional quota does not incur any charge; for pay-as-you-go models you are billed only on usage, that is, on the input, cached input, and output tokens consumed. If you have cost constraints, you can set up usage reports or budgets in the Azure portal to monitor spend and configure alerts. I hope this helps!
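To make the token-based billing concrete, here is a minimal sketch of how a pay-as-you-go charge is derived from the three token counts. The per-million-token prices below are placeholders I made up for illustration, not real Azure OpenAI rates; always check the official pricing page for your model and region.

```python
# Illustrative only: prices are hypothetical, NOT real Azure OpenAI rates.
PRICE_PER_1M = {
    "input": 0.60,         # hypothetical $ per 1M input tokens
    "cached_input": 0.30,  # hypothetical $ per 1M cached input tokens
    "output": 2.40,        # hypothetical $ per 1M output tokens
}

def estimate_cost(input_tokens: int,
                  cached_input_tokens: int,
                  output_tokens: int,
                  prices: dict = PRICE_PER_1M) -> float:
    """Estimate the pay-as-you-go charge for one request from its token counts."""
    total = (
        input_tokens * prices["input"]
        + cached_input_tokens * prices["cached_input"]
        + output_tokens * prices["output"]
    )
    return total / 1_000_000  # prices are quoted per 1M tokens

# Example: a request using 10k input, 5k cached input, and 2k output tokens
print(round(estimate_cost(10_000, 5_000, 2_000), 6))
```

Note that only usage is billed this way; the quota (tokens per minute) you request is a rate limit, not something you pay for.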
If this answers your query, do click Accept Answer and Yes for "Was this answer helpful". And if you have any further queries, do let us know.