Load Balancing Calls to Azure OpenAI Services with Private Endpoints

Question

AdityaSa

What is the general pattern for load balancing calls from a custom app to multiple Azure OpenAI services that use private endpoints? Is there authoritative guidance in our documentation, such as the Architecture Center, on load balancing and on implementing retries based on token limits and usage? I found a link in the Architecture Center, but it covers logging and monitoring. Is there any other information on load balancing, for example with Azure API Management (APIM)?

Ramr-msft

2023-08-16T01:50:47.67+00:00

Thanks for the question. Can you please add more details about the token size (input + output)? That will help in picking the best architecture.

Answer 1

Ramr-msft

Thanks for the question. You can call two different Azure OpenAI (AOAI) endpoints directly from your own app and route requests between them by checking the token information returned in each response.
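
A minimal Python sketch of that client-side approach, assuming each AOAI deployment is already reachable from the app through its private endpoint. All names here (`Backend`, `Response`, `TokenAwareBalancer`, `send`) are hypothetical; `send` stands in for whatever HTTP call your app already makes. The sketch assumes the service returns the `x-ratelimit-remaining-tokens` and `Retry-After` headers and a `usage.total_tokens` field, prefers the endpoint reporting the most remaining tokens, and fails over to the next one on HTTP 429 or 5xx.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Mapping, Optional


@dataclass
class Response:
    """Minimal view of an HTTP response (hypothetical type for this sketch)."""
    status: int
    headers: Mapping[str, str]
    body: dict = field(default_factory=dict)


@dataclass
class Backend:
    """One AOAI endpoint, reached through its private endpoint."""
    url: str
    remaining_tokens: Optional[int] = None  # last x-ratelimit-remaining-tokens seen
    cooldown_until: float = 0.0             # clock time before which this backend is skipped
    tokens_used: int = 0                    # running total of usage.total_tokens


# Hypothetical placeholder for the app's own HTTP call to a backend URL.
Sender = Callable[[str, dict], Response]


class TokenAwareBalancer:
    def __init__(self, backends: List[Backend], send: Sender,
                 clock: Callable[[], float] = time.monotonic,
                 default_cooldown: float = 10.0):
        self.backends = backends
        self.send = send
        self.clock = clock
        self.default_cooldown = default_cooldown  # assumed wait when Retry-After is absent

    def _candidates(self) -> List[Backend]:
        now = self.clock()
        ready = [b for b in self.backends if b.cooldown_until <= now]
        # Most remaining tokens first; backends not yet observed count as unlimited.
        return sorted(ready, key=lambda b: -(float("inf") if b.remaining_tokens is None
                                              else b.remaining_tokens))

    def complete(self, payload: dict) -> Response:
        for backend in self._candidates():
            resp = self.send(backend.url, payload)
            headers = {k.lower(): v for k, v in resp.headers.items()}
            if "x-ratelimit-remaining-tokens" in headers:
                backend.remaining_tokens = int(headers["x-ratelimit-remaining-tokens"])
            if resp.status == 429:
                # Throttled: skip this backend for Retry-After seconds
                # (assumes the delta-seconds form of the header).
                wait = float(headers.get("retry-after", self.default_cooldown))
                backend.cooldown_until = self.clock() + wait
                continue
            if resp.status >= 500:
                continue  # transient server error: fail over to the next backend
            # Success or a non-retryable 4xx: record usage and hand it back to the caller.
            backend.tokens_used += resp.body.get("usage", {}).get("total_tokens", 0)
            return resp
        raise RuntimeError("all backends are throttled or failing; retry later")
```

Keeping this logic in the app means each client process keeps its own view of the backends' token headroom; putting it in APIM, as in the blog below, shares one routing point across clients.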

Here is a blog post on load balancing Azure OpenAI services using APIM:

https://journeyofthegeek.com/2023/05/31/load-balancing-in-azure-openai-service/
