Load Balancing Calls to Azure OpenAI Services with Private Endpoints

AdityaSa 801 Reputation points
2023-08-16T01:40:28.0766667+00:00

What is the general pattern for load balancing calls from a custom app to multiple Azure OpenAI services that sit behind private endpoints? Is there any authoritative guidance in our documentation, such as the Azure Architecture Center, on load balancing and on implementing retries based on token limits and usage? The link I found in the Architecture Center covers only logging and monitoring. Is there other guidance on load balancing, for example using Azure API Management (APIM)?

Azure AI services

Accepted answer
    Ramr-msft 17,821 Reputation points
    2023-08-16T01:53:13.7066667+00:00

    Thanks for the question. Without a gateway, your app can call two (or more) Azure OpenAI (AOAI) endpoints directly and decide where to send each request by checking the token information returned in each response.
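
    A minimal Python sketch of that client-side approach follows. It assumes each response carries an `x-ratelimit-remaining-tokens` header (Azure OpenAI returns one, but confirm the name against the current service documentation) and that header names arrive lower-cased. `TokenAwareBalancer`, `send_fn`, and the endpoint URLs are hypothetical; the HTTP call is injected as a function so the selection logic can be shown without a network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Mapping, Tuple

# Hypothetical: sends one request to an endpoint and returns
# (status_code, response_headers). In a real app this wraps an HTTP client
# call to the Azure OpenAI API through its private endpoint.
SendFn = Callable[[str, dict], Tuple[int, Mapping[str, str]]]


@dataclass
class TokenAwareBalancer:
    endpoints: List[str]
    # Last known remaining-token budget per endpoint. Unknown endpoints start
    # at infinity so each one is tried at least once.
    remaining: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for ep in self.endpoints:
            self.remaining.setdefault(ep, float("inf"))

    def pick(self) -> str:
        # Prefer the endpoint that last reported the most remaining tokens;
        # ties go to the earlier endpoint in the list.
        return max(self.endpoints, key=lambda ep: self.remaining[ep])

    def record(self, endpoint: str, status: int, headers: Mapping[str, str]) -> None:
        # Assumed header name; headers are assumed to be lower-cased here.
        value = headers.get("x-ratelimit-remaining-tokens")
        if value is not None:
            self.remaining[endpoint] = float(value)
        elif status == 429:
            # Throttled without a budget header: treat as exhausted.
            self.remaining[endpoint] = 0.0

    def send(self, payload: dict, send_fn: SendFn) -> Tuple[str, int]:
        endpoint = self.pick()
        status, headers = send_fn(endpoint, payload)
        self.record(endpoint, status, headers)
        return endpoint, status
```

    The remaining-token figures go stale as each deployment's per-minute quota refills, so a real implementation would expire them after a time window rather than trust them indefinitely.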

    Here is a blog post on load balancing Azure OpenAI Service instances using Azure API Management (APIM):

    https://journeyofthegeek.com/2023/05/31/load-balancing-in-azure-openai-service/
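
    For the retry part of the question, one common client-side pattern is to fail over to the next endpoint on HTTP 429 and, when every endpoint is throttled, wait for the shortest advertised retry interval before trying again. This Python sketch assumes 429 responses carry a `retry-after-ms` or `Retry-After` (seconds) header, with header names lower-cased. `call_with_failover` and `retry_after_seconds` are hypothetical names, not part of any SDK, and this is not the APIM policy from the linked blog.

```python
import time
from typing import Callable, List, Mapping, Optional, Tuple

# Hypothetical: send(endpoint, payload) -> (status_code, response_headers).
SendFn = Callable[[str, dict], Tuple[int, Mapping[str, str]]]


def retry_after_seconds(headers: Mapping[str, str], default: float) -> float:
    """Read the wait time from a 429 response, falling back to a default."""
    # Assumed header names; milliseconds first because it is more precise.
    for name, divisor in (("retry-after-ms", 1000.0), ("retry-after", 1.0)):
        value = headers.get(name)
        if value is not None:
            try:
                return float(value) / divisor
            except ValueError:
                pass  # e.g. the HTTP-date form of Retry-After; not handled here
    return default


def call_with_failover(
    endpoints: List[str],
    payload: dict,
    send_fn: SendFn,
    max_rounds: int = 3,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, int]:
    """Try endpoints in order, moving to the next one on 429.

    If every endpoint is throttled in a round, sleep for the shortest
    retry interval seen, then start another round.
    """
    last_status = 0
    for round_no in range(max_rounds):
        shortest: Optional[float] = None
        for endpoint in endpoints:
            status, headers = send_fn(endpoint, payload)
            last_status = status
            if status != 429:
                # Success or a non-429 status; the caller decides what to
                # do with other errors such as 5xx.
                return endpoint, status
            wait = retry_after_seconds(headers, default_wait)
            shortest = wait if shortest is None else min(shortest, wait)
        if round_no < max_rounds - 1:
            sleep(shortest if shortest is not None else default_wait)
    raise RuntimeError(
        f"all endpoints throttled after {max_rounds} rounds (last status {last_status})"
    )
```

    Injecting `sleep` keeps the backoff testable. Ordering `endpoints` by preference gives a simple primary/secondary setup, while shuffling them per call spreads load more evenly.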
