Episode
FastTrack for Azure Season 3 Ep10: Load Balancing Azure OpenAI instances using APIM and Container
with Andre Dewes, Srini Padala, Chris Ayers
This session shows how to load balance Azure OpenAI instances to mitigate throttling caused by tokens-per-minute (TPM) and requests-per-minute (RPM) limits, using API Management custom policies.
It also covers load balancing Azure OpenAI instances using a container deployed via Azure Container Apps.
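As a rough illustration of the idea described above, and not the policy or container code shown in the session, the sketch below rotates requests across several Azure OpenAI endpoints and skips any endpoint that has recently been throttled. The `Backend` and `RoundRobinBalancer` names and the example URLs are hypothetical, and the sketch assumes the caller reads the retry delay from the throttled response (typically an HTTP 429 with a `Retry-After` header) and passes it in.

```python
import time
from dataclasses import dataclass


@dataclass
class Backend:
    """One Azure OpenAI endpoint (hypothetical name and URL)."""
    name: str
    url: str
    throttled_until: float = 0.0  # monotonic timestamp; 0 means available


@dataclass
class RoundRobinBalancer:
    """Rotates across backends, skipping those still cooling down."""
    backends: list
    _next: int = 0

    def pick(self, now=None):
        """Return the next backend that is not cooling down, or None."""
        now = time.monotonic() if now is None else now
        for offset in range(len(self.backends)):
            idx = (self._next + offset) % len(self.backends)
            if self.backends[idx].throttled_until <= now:
                # Advance the cursor so the following call starts after this one.
                self._next = (idx + 1) % len(self.backends)
                return self.backends[idx]
        return None  # every backend is currently throttled

    def mark_throttled(self, backend, retry_after_s, now=None):
        """Record a throttled response so the backend is skipped for a while."""
        now = time.monotonic() if now is None else now
        backend.throttled_until = now + retry_after_s


# Usage: two hypothetical endpoints; the first one gets throttled for 10 s.
east = Backend("east", "https://aoai-east.example.invalid")
west = Backend("west", "https://aoai-west.example.invalid")
balancer = RoundRobinBalancer([east, west])
balancer.mark_throttled(east, retry_after_s=10)
print(balancer.pick().name)
```

A gateway such as API Management can apply the same decision per request, so client applications keep calling a single endpoint.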
Learning objectives
- Discover strategies to enhance the performance and reliability of Azure OpenAI while minimizing throttling due to quota limitations.
Chapters
- 00:00 - Welcome and introductions
- 01:29 - Learning objectives
- 02:50 - Tokens
- 05:36 - Azure OpenAI Service quotas and limits
- 11:16 - Tokens Per Minute (TPM)
- 17:58 - Requests Per Minute (RPM)
- 20:43 - Dynamic Quota
- 24:35 - Best practices
- 27:30 - Challenges
- 30:24 - Load balancing multiple AOAI instances
- 33:03 - Review challenges
- 36:38 - Load balancing strategies
- 40:10 - Load balancing AOAI with Azure API Management
- 42:05 - Demo
- 01:22:47 - Summary and conclusion
Recommended resources
Related episodes
- Full series: Learn Live: FastTrack for Azure Season 3
Connect
- Andre Dewes | LinkedIn: /in/andre-dewes-480b5b62
- Srini Padala | LinkedIn: /in/srinivasa-padala
- Chris Ayers | Twitter: @Chris_L_Ayers | LinkedIn: /in/chris-l-ayers