Multiple OpenAI deployments in same region for reliability

Question

Hello,

I often see elevated error rates on a particular GPT-4 deployment, sometimes lasting as long as a week. We can implement fallbacks to deployments in other regions where we happen to have GPT-4 quota available, but that isn't always ideal. For example, the backup deployment might be on the other side of the planet.

Is there any advantage to creating multiple deployments of an OpenAI model in the same region and load balancing requests across them, falling back to another deployment if one fails?

Thanks,

Irwin

Accepted Answer

Hello @Irwin, from a reliability point of view, load balancing across regions is much better, but load balancing across multiple instances in the same region is still reasonable. If your goal is to increase TPM (tokens per minute) to avoid HTTP 429 errors, however, load balancing between instances in the same region is not appropriate, because:

TPM quota is defined per model and per region.
Adding more instances in the same region does not change the total TPM available for that model in that region.
If all instances are in the same region and the same model is deployed on each of them, you have to split that TPM among the instances (the split ratio is up to you).
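
To make the quota point concrete, here is a minimal sketch of the arithmetic. The 80,000 TPM figure, the deployment names, and the `split_quota` helper are all hypothetical, chosen only for illustration; check the quota page for your actual limits.

```python
def split_quota(regional_tpm: int, weights: dict[str, float]) -> dict[str, int]:
    """Divide one regional TPM quota across deployments in proportion to weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Round down so the shares can never add up to more than the regional quota.
    return {name: int(regional_tpm * w / total) for name, w in weights.items()}


# Hypothetical: 80,000 TPM for one model in one region, shared by two deployments.
allocation = split_quota(80_000, {"gpt4-a": 1, "gpt4-b": 1})

# However many deployments share the region, the total stays within the same quota.
assert sum(allocation.values()) <= 80_000
```

Adding a third deployment to the dictionary only makes each share smaller; it does not raise the ceiling.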

https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits#quotas-and-limits-reference
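
For the reliability side of the question, a fallback chain could look roughly like the sketch below. It is not tied to any SDK: `send` stands in for whatever function you already use to call a single deployment, and the deployment names and `AllDeploymentsFailed` exception are hypothetical. It assumes a failed call raises an exception.

```python
from typing import Callable, Sequence


class AllDeploymentsFailed(Exception):
    """Raised when every deployment in the chain has failed."""


def call_with_fallback(deployments: Sequence[str], send: Callable[[str], str]) -> str:
    """Try each deployment in order and return the first successful response."""
    errors = []
    for name in deployments:
        try:
            return send(name)
        except Exception as exc:  # in practice, catch only retryable errors
            errors.append(f"{name}: {exc}")
    raise AllDeploymentsFailed("; ".join(errors))


# Hypothetical usage: same-region deployment first, remote region as last resort.
def fake_send(name: str) -> str:
    if name == "gpt4-primary":
        raise RuntimeError("HTTP 500")
    return f"response from {name}"


result = call_with_fallback(["gpt4-primary", "gpt4-same-region", "gpt4-remote"], fake_send)
```

Ordering the list with same-region deployments first keeps latency low in the common case, while the remote region still covers a full regional outage.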
