Multiple OpenAI deployments in same region for reliability

Irwin 20 Reputation points
2024-04-27T22:51:37.8366667+00:00

Hello,

I often see elevated error rates on a particular GPT-4 deployment that sometimes last a week. We can implement fallbacks with deployments in other regions where we happen to have quota available for GPT-4, but that isn't always ideal. For example, the backup deployment might be on the other side of the planet.

Is there any advantage to creating multiple deployments of an OpenAI model in the same region and load balancing requests across them, with fallback to another deployment if one fails?

Thanks,

Irwin

Azure OpenAI Service
Accepted answer
  1. Aki Nishikawa 720 Reputation points Microsoft Employee
    2024-04-28T03:50:36.06+00:00

    Hello @Irwin, from a reliability point of view, load balancing between regions is much better, but load balancing across multiple instances in the same region is still reasonable. If you'd like to increase TPM to avoid HTTP 429 errors, however, load balancing between instances in the same region is not appropriate, because:

    • TPM quota is defined per model and region.
    • Even with multiple instances in the same region, the total TPM for that region and model does not change.
    • If all instances are in the same region and the same model is deployed onto them, you have to split the TPM quota across the instances (the split ratio is up to you).

    https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits#quotas-and-limits-reference
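    The round-robin-with-fallback pattern described above could be sketched roughly as follows. This is a minimal illustration, not a production implementation: the deployment names and the `send` callable are hypothetical placeholders for however you invoke each Azure OpenAI deployment, and in practice this is often handled by a gateway layer rather than application code.

    ```python
    class DeploymentPool:
        """Round-robin load balancer with failover across a list of
        deployment identifiers (hypothetical names; adapt to your setup)."""

        def __init__(self, deployments):
            self.deployments = list(deployments)
            self._next = 0  # index of the deployment to try first

        def call(self, send):
            """Try each deployment in round-robin order.

            `send(deployment)` should raise an exception on failures
            such as HTTP 429/5xx so the pool can fail over to the next
            deployment in the list.
            """
            errors = []
            n = len(self.deployments)
            for i in range(n):
                idx = (self._next + i) % n
                deployment = self.deployments[idx]
                try:
                    result = send(deployment)
                    # Advance the round-robin pointer past the
                    # deployment that served this request.
                    self._next = (idx + 1) % n
                    return result
                except Exception as exc:
                    errors.append((deployment, exc))
            raise RuntimeError(f"all deployments failed: {errors}")
    ```

    With two same-region deployments (each holding part of the region's TPM quota split), a request that hits a 429 on one deployment transparently retries on the other; only if all deployments fail does the caller see an error.
    
    
    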

