Episode

FastTrack for Azure Season 3 Ep10: Load Balancing Azure OpenAI instances using APIM and Container

with Andre Dewes, Srini Padala, Chris Ayers

In this session, we show how to load balance Azure OpenAI instances with API Management custom policies to mitigate throttling caused by tokens-per-minute (TPM) and requests-per-minute (RPM) limits.

We also cover load balancing Azure OpenAI instances with a container deployed on Azure Container Apps.
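As a rough illustration of the idea, and not the policy or container shown in the session, the sketch below simulates priority-based failover in plain Python. It assumes each instance signals throttling with an HTTP 429 status and a retry-after hint in seconds. The `Backend` class, the `call` callback, and the backend names are hypothetical and stand in for real HTTP requests.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Backend:
    name: str                     # hypothetical label for one Azure OpenAI instance
    priority: int                 # lower number = preferred
    throttled_until: float = 0.0  # timestamp before which the backend is skipped


def pick_backend(backends: List[Backend], now: float) -> Optional[Backend]:
    """Return the most preferred backend that is not cooling down."""
    available = [b for b in backends if b.throttled_until <= now]
    return min(available, key=lambda b: b.priority) if available else None


def send_with_failover(
    backends: List[Backend],
    call: Callable[[Backend], Tuple[int, float]],
    now: float,
) -> Tuple[int, Optional[str]]:
    """Try backends in priority order, parking any that answer with 429.

    `call` stands in for the real HTTP request (an assumption of this sketch)
    and returns (status_code, retry_after_seconds).
    """
    while True:
        backend = pick_backend(backends, now)
        if backend is None:
            return 429, None  # every instance is currently throttled
        status, retry_after = call(backend)
        if status != 429:
            return status, backend.name
        # Park the throttled backend for at least one second, then try the next.
        backend.throttled_until = now + max(retry_after, 1.0)
```

A caller would pass the current time (for example `time.time()`) as `now` and a function that performs the actual request as `call`. Keeping the clock as a parameter makes the selection logic easy to test without waiting for real cooldowns.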

Learning objectives

Discover strategies to enhance the performance and reliability of Azure OpenAI while minimizing throttling due to quota limitations.

Chapters

00:00 - Welcome and introductions
01:29 - Learning objectives
02:50 - Tokens
05:36 - Azure OpenAI Service quotas and limits
11:16 - Tokens Per Minute (TPM)
17:58 - Requests Per Minute (RPM)
20:43 - Dynamic Quota
24:35 - Best practices
27:30 - Challenges
30:24 - Load balancing multiple AOAI instances
33:03 - Review challenges
36:38 - Load balancing strategies
40:10 - Load balancing AOAI with Azure API Management
42:05 - Demo
01:22:47 - Summary and conclusion

Recommended resources

Session documentation

Full series: Learn Live: FastTrack for Azure Season 3

Connect

Andre Dewes | LinkedIn: /in/andre-dewes-480b5b62
Srini Padala | LinkedIn: /in/srinivasa-padala
Chris Ayers | Twitter: @Chris_L_Ayers | LinkedIn: /in/chris-l-ayers

Intermediate

AI Engineer

Developer

Azure API Management

Azure Container Apps
