Thanks! To elaborate further on this query: as always, your configuration is going to depend on the specific traffic pattern that your users generate against the service, CDN, and StreamingEndpoint(s). One simple example to think about is whether you expect to have 10 videos each generating 1 million simultaneous views or 1 million videos each generating 10 simultaneous views. Both generate 10 million simultaneous views, but the caching efficiency is very different for the two cases.
Your scenario is very likely somewhere in the middle, but the more unique videos your users are watching, the more load you will have on the StreamingEndpoints, even if we assume perfect caching (which isn’t realistic). The better an idea you have of how much load your users are putting on the CDN, and in turn how much load the CDN is putting on the StreamingEndpoint, the better you will be able to do capacity planning.
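To put rough numbers on the two extremes, here is a minimal Python sketch. It assumes a single shared cache with perfect caching (each unique segment is fetched from the origin exactly once) and that every viewer requests every segment once. The function name and the `segments_per_video` value are made up for illustration; real CDNs have many edge locations and expiring caches, so actual origin load will be higher.

```python
def origin_load(unique_videos, views_per_video, segments_per_video=5):
    """Return (total_requests, origin_fetches, cache_hit_ratio).

    Assumes perfect caching in one shared cache: the origin serves
    each unique segment exactly once, the CDN serves the rest.
    """
    total_requests = unique_videos * views_per_video * segments_per_video
    origin_fetches = unique_videos * segments_per_video
    hit_ratio = 1 - origin_fetches / total_requests
    return total_requests, origin_fetches, hit_ratio


# 10 videos x 1 million views vs. 1 million videos x 10 views
few_popular = origin_load(10, 1_000_000)
long_tail = origin_load(1_000_000, 10)

print(few_popular)  # same total requests, only 50 origin fetches
print(long_tail)    # same total requests, 5,000,000 origin fetches
```

Both cases produce the same number of viewer requests, but under these assumptions the long-tail case sends 100,000 times more requests to the StreamingEndpoint.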
The Streaming Endpoints provide Azure Monitor Metrics which you can use to get an idea of the load these resources are handling:
Monitoring Media Services | Microsoft Learn
Monitoring Media Services data reference | Microsoft Learn
Speaking in broad terms, for handling millions of concurrent streams at scale, we would generally recommend using one or more Premium StreamingEndpoints with CDN enabled (and CDN protections in place). The default limit is 10 units per Premium StreamingEndpoint and 2 StreamingEndpoints per account, but both of these limits can be raised via a support request.
The article "Streaming Endpoints (Origin) - Azure Media Services v3 | Microsoft Learn" explains the differences in capabilities of the Premium and Standard StreamingEndpoints.
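As a rough way to check an estimate against the default limits mentioned above (10 units per Premium StreamingEndpoint, 2 StreamingEndpoints per account), something like the following Python sketch could help. The function name is hypothetical, and the `mbps_per_unit` default of 200 is only a placeholder assumption; please take the actual per-unit throughput from the Streaming Endpoints documentation and your own measurements. The input is the egress the CDN pulls from the origin, not the total viewer traffic.

```python
import math


def plan_streaming_units(origin_egress_mbps,
                         mbps_per_unit=200.0,      # placeholder assumption
                         units_per_endpoint=10,    # default limit
                         endpoints_per_account=2): # default limit
    """Return (units_needed, endpoints_needed, fits_default_limits)."""
    units = math.ceil(origin_egress_mbps / mbps_per_unit)
    endpoints = math.ceil(units / units_per_endpoint)
    fits = endpoints <= endpoints_per_account
    return units, endpoints, fits


print(plan_streaming_units(3000))  # fits within the default limits
print(plan_streaming_units(5000))  # would need a limit increase
```

If the last value comes back `False`, that is the point where a support request to raise the limits would be needed.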
Encoding all of your content with a single bitrate will reduce the number of URLs that the CDN needs to cache (and the StreamingEndpoint needs to serve), so it will improve your cache performance compared with having multiple bitrates. That said, having only one bitrate reduces the effectiveness of adaptive bitrate streaming protocols like HLS and DASH. The client player has only one layer to choose from and thus cannot adapt the content to the network conditions (high bitrate content when the network is good, lower bitrate content when the network is congested). You will have to decide if the single bitrate encode fits the user experience you want to deliver to your customers. On the other hand, if your content is very short (15 to 20 seconds as you said), the client isn’t going to have a lot of time to adapt anyway.
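To see how the number of bitrates multiplies the URLs the CDN has to cache, here is a small Python sketch. The function name, the 2-second segment length, and the catalog size are assumptions for illustration only; manifests and separate audio tracks are ignored.

```python
import math


def cacheable_urls(videos, renditions, duration_s, segment_s=2.0):
    """Estimate the number of distinct media segment URLs.

    Each rendition (bitrate) of each video is split into
    ceil(duration / segment length) segments.
    """
    segments = math.ceil(duration_s / segment_s)
    return videos * renditions * segments


single = cacheable_urls(1_000_000, renditions=1, duration_s=20)
ladder = cacheable_urls(1_000_000, renditions=5, duration_s=20)

print(single)  # 10000000
print(ladder)  # 50000000
```

Under these assumptions, a five-bitrate ladder means five times as many objects competing for cache space, which is the trade-off against the adaptive experience.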
Also, I would highly recommend that you consider making your solution highly available. Regional outages happen. Internet routing issues happen. A truly redundant solution would have the ability to process, serve, and cache the videos in a redundant fashion. Redundancy comes with a cost, so you will have to decide the business impact of an outage in each of your various dependencies and then evaluate the cost of mitigating it.
The "High Availability with Media Services Video on Demand - Azure Media Services v3 | Microsoft Learn" sample shows an example of using multiple Azure Media Services accounts in multiple regions to process and serve content.
The sample would need some tweaks to work with multiple CDNs or to have a single CDN pull from multiple StreamingEndpoints, but it should give you a starting point for thinking through what a highly available solution looks like.
Hope this helps. Let us know.