Thanks for reaching out to us. Please be aware that Azure OpenAI is operated separately from OpenAI, so retirement dates may differ due to operational considerations.
- Model Retirement Dates: The retirement dates listed on Azure's website refer specifically to Azure OpenAI models. While Azure OpenAI and OpenAI work closely together, they may operate on different schedules. For the most accurate information about OpenAI model retirement dates, consult OpenAI's documentation or contact OpenAI directly.
- Streamed Responses: The Azure OpenAI API supports streamed responses in the same way as the original OpenAI API: set the `stream` parameter to true in the request, and the response is returned incrementally as the model generates it, instead of in one block once processing has finished. If this feature doesn't behave as expected in your case, please share more details about your scenario and we can check with the product team.
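Whether the chunks arrive from a true server stream or are assembled on the client side, the accumulation logic is the same. Here is a minimal Python sketch; the chunk/delta field names below mirror the shape used by the OpenAI SDK's streaming responses but are mocked locally, so treat them as assumptions rather than the exact SDK types:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Delta:
    content: Optional[str]  # incremental text; may be None for some chunks


@dataclass
class Choice:
    delta: Delta


@dataclass
class Chunk:
    choices: List[Choice]


def collect_stream(chunks: Iterable[Chunk]) -> str:
    """Concatenate the incremental delta contents into the full reply."""
    parts = []
    for chunk in chunks:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


# Simulated stream; with the real SDK you would iterate the response object instead.
stream = [
    Chunk([Choice(Delta("Hello"))]),
    Chunk([Choice(Delta(", "))]),
    Chunk([Choice(Delta(None))]),  # some chunks carry no content
    Chunk([Choice(Delta("world"))]),
]
print(collect_stream(stream))  # -> Hello, world
```

With a real streamed call you would loop over the response object returned by the SDK and display each piece as it arrives, rather than waiting for the full string.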
- Using Azure OpenAI with SaaS Applications: Azure OpenAI can certainly be integrated into SaaS applications. If you're running into issues with TPM (tokens-per-minute) limits or quotas, there are a few potential solutions. First, you could batch requests together to reduce the total number of calls made. Second, you could request a quota increase from Azure; increases are often granted based on your needs and usage patterns. Finally, you could put a queue or another form of load balancing in front of the service, so that your application distributes requests across multiple instances of Azure OpenAI. Review Azure's documentation and support resources for more detailed guidance.
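To make the batching and load-balancing ideas above concrete, here is a small Python sketch: a helper that chunks work into batches, and a round-robin picker that spreads requests across several Azure OpenAI instances. The endpoint URLs are hypothetical placeholders; wire in your real deployments and SDK calls where indicated:

```python
import itertools
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield successive batches of at most `size` items, reducing request count."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


class RoundRobin:
    """Cycle through multiple endpoints so no single instance absorbs all traffic."""

    def __init__(self, endpoints: Sequence[str]):
        self._cycle = itertools.cycle(endpoints)

    def next_endpoint(self) -> str:
        return next(self._cycle)


# Example: two hypothetical regional deployments (placeholder URLs).
rr = RoundRobin(["https://east.example.openai.azure.com",
                 "https://west.example.openai.azure.com"])
prompts = ["p1", "p2", "p3", "p4", "p5"]
for batch in batched(prompts, 2):
    endpoint = rr.next_endpoint()
    # Send `batch` to `endpoint` here; e.g., an embeddings call accepts a list input.
    print(endpoint, batch)
```

In production you would typically also add retry-with-backoff on 429 responses, but the two helpers above cover the core batching and distribution logic.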
I hope this helps! Please let us know if you have any further questions.
Regards,
Yutong
- Please kindly accept the answer if you found it helpful, to support the community. Thanks a lot.