Thanks for reaching out to us. The maximum request (token) size depends on the model; please see the models below and the reference document for the exact per-model limits.
Reference document - https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits
- GPT-3.5
- GPT-3.5 Turbo
(The per-model token limits are listed in the models table of the reference document above.)
More information that may help is below -
System message -
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/use-your-data?tabs=ai-search
Give the model instructions about how it should behave and any context it should reference when generating a response. You can describe the assistant's personality, what it should and shouldn't answer, and how to format responses. There's no token limit for the system message itself, but it will be included with every API call and counted against the overall token limit. The system message will be truncated if it's longer than 400 tokens.
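If you want to check whether your system message risks hitting that 400-token truncation threshold before sending it, here is a minimal sketch using the tiktoken library; the message text is just a placeholder example:

```python
# Minimal sketch (not an official sample): count the tokens in a system
# message with tiktoken so you know whether it exceeds the 400-token
# truncation threshold mentioned above.
import tiktoken

system_message = (
    "You are a helpful assistant. Answer only questions about our product "
    "documentation, and respond in short bullet points."
)

# GPT-3.5 Turbo models use the cl100k_base encoding.
encoding = tiktoken.get_encoding("cl100k_base")
token_count = len(encoding.encode(system_message))

print(f"System message uses {token_count} tokens")
if token_count > 400:
    print("Warning: the system message may be truncated to 400 tokens.")
```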
Can I use quota to increase the max token limit of a model?
You can refer to the document - https://learn.microsoft.com/en-us/azure/ai-services/openai/faq#can-i-use-quota-to-increase-the-max-token-limit-of-a-model-
No, quota Tokens-Per-Minute (TPM) allocation isn't related to the max input token limit of a model. Model input token limits are defined in the models table and aren't impacted by changes made to TPM.
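To make the distinction concrete, here is a hedged sketch using the openai Python SDK against an Azure OpenAI deployment; the endpoint, API version, and deployment name are placeholders you would replace with your own values. Note that max_tokens only caps the completion length per request and is always bounded by the model's fixed limit, no matter how much TPM quota the deployment has:

```python
# Minimal sketch (placeholder endpoint/key/deployment): max_tokens is a
# per-request cap on the completion length. It is bounded by the model's
# fixed token limit and is unrelated to the TPM quota of the deployment.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="<your-gpt-35-turbo-deployment>",  # deployment name, not model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the token limit rules."},
    ],
    max_tokens=256,  # must fit within the model's limit; TPM quota cannot raise it
)
print(response.choices[0].message.content)
```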
I hope this helps. Please let me know if you need any more information.
Regards,
Yutong
-Please accept the answer and vote 'Yes' if you found it helpful, to support the community. Thanks a lot.