@Girish Luckhun (RAPP) Thanks for reaching out. APIM has a built-in request queue: requests that cannot be processed immediately are held in the queue until the gateway is free to handle them; they are not cached. When an APIM instance reaches its physical capacity (capacity = request queue + memory + CPU), it behaves like any overloaded web server that cannot keep up with incoming requests: latency increases, connections get dropped, timeout errors occur, and so on. It is therefore important to monitor the capacity metric of your APIM instance and consider scaling out or upgrading when the value stays above a certain threshold for a sustained period: https://learn.microsoft.com/en-us/azure/api-management/api-management-capacity
The built-in cache and an external cache are separate and do not combine into one larger cache. So if you have a 1 GB built-in cache and add a 5 GB external Redis cache, you end up with two separate caches, one of 1 GB and one of 5 GB, not a single 6 GB cache. You can configure individual APIs and operations in APIM to use either the built-in cache or the external cache as needed: https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-cache-external
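For illustration, here is a minimal policy sketch (applied to an API or operation) that points response caching at the external cache via the caching-type attribute of the cache-lookup policy, whose valid values are internal, external, and prefer-external. The duration and vary-by settings below are just example values:

```xml
<policies>
    <inbound>
        <base />
        <!-- Look up the response in the external (Redis) cache;
             use caching-type="internal" to target the built-in cache instead -->
        <cache-lookup vary-by-developer="false"
                      vary-by-developer-groups="false"
                      caching-type="external"
                      downstream-caching-type="none">
            <vary-by-header>Accept</vary-by-header>
        </cache-lookup>
    </inbound>
    <outbound>
        <base />
        <!-- Store the backend response for 1 hour (3600 seconds) -->
        <cache-store duration="3600" />
    </outbound>
</policies>
```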
APIM also provides throttling, which lets you limit the number of requests made to your APIs. Throttling policies can be applied at the global, product, API, or operation scope, as shown in the sketch below: https://learn.microsoft.com/en-us/azure/api-management/api-management-sample-flexible-throttling
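As a sketch, the flexible-throttling policies described in the article above could look like this in the inbound section at the scope you choose; the call counts, renewal periods, and the choice of caller IP address as the counter key are illustrative:

```xml
<policies>
    <inbound>
        <base />
        <!-- Rate limit: at most 10 calls per 60 seconds per caller IP address -->
        <rate-limit-by-key calls="10"
                           renewal-period="60"
                           counter-key="@(context.Request.IpAddress)" />
        <!-- Quota: at most 1,000,000 calls per week (604,800 seconds) per caller IP address -->
        <quota-by-key calls="1000000"
                      renewal-period="604800"
                      counter-key="@(context.Request.IpAddress)" />
    </inbound>
</policies>
```

The simpler rate-limit and quota policies work the same way but count per subscription, whereas the *-by-key variants let you pick any counter key (IP address, user identity, a request header, and so on).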
I hope this answers your question. Let me know if you have any other questions!