Hi @Jerzy Czopek, thanks for posting your question here. This is an interesting scenario.
When setting up load balancing with Azure APIM across multiple instances of the Azure OpenAI service, you only provide the backend IDs as pool members for load balancing. Please refer to the load-balancing implementation sample outlined in the article Using Azure API Management Circuit Breaker and Load balancing with Azure OpenAI Service.
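As a minimal sketch, assuming you have already created a backend entity of type Pool (named `openai-backend-pool` here purely for illustration) whose members are your individual Azure OpenAI backends, the API's inbound policy only needs to reference the pool and APIM distributes the calls across its members:

```xml
<policies>
    <inbound>
        <base />
        <!-- "openai-backend-pool" is a placeholder ID for a backend pool
             whose members are the individual Azure OpenAI backends. -->
        <set-backend-service backend-id="openai-backend-pool" />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
</policies>
```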
This approach does not expose the underlying OpenAI model deployed at each endpoint.
Please refer to the document Intelligent Load Balancing with APIM for OpenAI for the different load-balancing options available and their respective approaches.
It is also advised to use the same OpenAI model across all endpoints when setting up load balancing, as the currently available distribution options (round-robin, random, priority, weighted) do not provide a way to configure what you are seeking.
However, if you decide to load balance across multiple versions of OpenAI models, you may consider using the set-backend-service policy to direct an incoming API request to an alternate backend. This approach uses conditional logic to route requests based on parameters such as location, the gateway that was called, or other expressions such as the API version. Once this is in place, you can add a check on the instance endpoint and route incoming calls accordingly. Please refer to the documentation Reference backend using set-backend-service policy to learn more.
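As an illustrative sketch only (the backend IDs `openai-backend-gpt4` and `openai-backend-gpt35` and the version value are hypothetical; substitute the identifiers and condition that match your setup), a policy could inspect the `api-version` query parameter and route to a matching backend:

```xml
<policies>
    <inbound>
        <base />
        <choose>
            <!-- Route requests for a newer API version to the backend hosting
                 the newer model. Backend IDs below are placeholders. -->
            <when condition="@(context.Request.Url.Query.GetValueOrDefault("api-version", "") == "2024-06-01")">
                <set-backend-service backend-id="openai-backend-gpt4" />
            </when>
            <otherwise>
                <!-- Default route for all other requests -->
                <set-backend-service backend-id="openai-backend-gpt35" />
            </otherwise>
        </choose>
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
</policies>
```

The same `<choose>` pattern works for any request attribute available in the policy expression context, such as a header or a path segment identifying the deployment.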
Hope this answers your question. Please let us know if you need any additional information.