Asynchronous request reply pattern for Cosmos using Python

Chand, Anupam SBOBNG-ITA/RX 461 Reputation points
2021-06-14T11:32:21.32+00:00

Hi,

We are building a Python API to read data from Cosmos DB. We are expecting high concurrency of requests, between 500 and 1000.
The API has been built on Azure Functions using a Premium App Service plan (P1v2).
When we tested at low concurrency, the response times were consistent and within our expectations. However, when we started load testing, we found that the response times gradually increased, ranging from 2 s all the way to 60 s.
The CPU% and memory did not show much change, and even the HTTP queue length was low (<10). Our understanding is that the requests were somehow being queued up and served eventually, but with a delay. We are using a singleton for the Cosmos client and have set the parameters in the app configuration to allow maximum concurrency for Python apps.
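For context, the singleton pattern mentioned above usually means constructing the Cosmos client once per worker process and reusing it across function invocations, rather than creating a new client (and new connection pool) per request. A minimal sketch of that caching pattern is below; `FakeCosmosClient` is a hypothetical stand-in for `azure.cosmos.CosmosClient`, which in a real Function App would be constructed from the account URL and key.

```python
from functools import lru_cache

class FakeCosmosClient:
    """Hypothetical stand-in for azure.cosmos.CosmosClient.

    The real client would be built once, e.g.:
        CosmosClient(url=os.environ["COSMOS_URL"], credential=os.environ["COSMOS_KEY"])
    (COSMOS_URL / COSMOS_KEY are assumed setting names, not official ones.)
    """
    pass

@lru_cache(maxsize=1)
def get_client() -> FakeCosmosClient:
    # Constructed once per worker process; every call after the first
    # returns the same cached instance, so connections are reused.
    return FakeCosmosClient()
```

Calling `get_client()` from the function body then always returns the same instance within a worker process, which avoids per-request connection setup.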

The questions I have are as follows:

  1. What metric should we use to scale the App Service up automatically? CPU, memory, and HTTP queue length were all still low; the only metric that showed a significant jump was the socket count for inbound requests. Is this what we should use to auto-scale our App Service?
  2. I have also read that we should follow the asynchronous request/reply pattern. However, I found an existing issue stating that the Cosmos SDK for Python does not support async methods (https://github.com/Azure/azure-sdk-for-python/issues/8636). Is this true? If so, what is the alternative for our use case? Is there some other way to read from Cosmos DB asynchronously?
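One common workaround when an SDK only exposes blocking calls is to offload each call to a thread pool from async code, so the event loop stays free to accept other requests. This is a generic asyncio pattern, not something from the Cosmos SDK itself; `blocking_point_read` below is a hypothetical stand-in for a synchronous SDK call such as `container.read_item(...)`.

```python
import asyncio

def blocking_point_read(item_id: str) -> dict:
    # Hypothetical stand-in for a synchronous Cosmos point read, e.g.
    # container.read_item(item=item_id, partition_key=item_id)
    return {"id": item_id}

async def read_item_async(item_id: str) -> dict:
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default thread-pool executor so the
    # event loop is not blocked while the SDK waits on the network.
    return await loop.run_in_executor(None, blocking_point_read, item_id)

result = asyncio.run(read_item_async("doc-1"))
```

This does not make the SDK call itself faster, but it lets one worker overlap many in-flight reads instead of serving them strictly one at a time.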

Any help would be appreciated.

Regards,
Anupam

Azure Cosmos DB

Accepted answer
  Saurabh Sharma 23,806 Reputation points, Microsoft Employee
  2021-06-16T22:09:42.087+00:00

    Hi anonymous user,

    Here are the updates I have received internally:

    1. There’s not enough detail to be sure whether this applies to your scenario, but another user had difficulty scale testing because Client Affinity was enabled on their load balancer: all of the load-testing clients were bound to the first couple of instances, and the newly scaled-out instances went unused. If you look at the CPU usage of each individual App Service instance and they are severely unbalanced, you might be affected by this.
    2. Yes, it’s a known limitation, and there is no ETA or workaround for it right now. How many rows/documents will each API call return? How much data?
      I believe point reads shouldn’t be a problem, and you would adjust the RUs to support the necessary concurrency.
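To make the RU-sizing suggestion concrete: Cosmos DB documents a baseline charge of roughly 1 RU for a point read of a 1 KB item, so a first estimate of required throughput is peak request rate times RU cost per request, plus some buffer. The 20% headroom below is a hypothetical choice; the actual RU charge per operation should be verified from the response headers of your own workload.

```python
# Back-of-envelope RU sizing for point reads.
# Assumes the documented baseline of ~1 RU per 1 KB point read;
# check the actual request charge returned by your queries.
RU_PER_POINT_READ = 1.0
peak_requests_per_second = 1000
headroom = 1.2  # hypothetical 20% buffer for spikes

required_rus = int(peak_requests_per_second * RU_PER_POINT_READ * headroom)
print(required_rus)  # 1200
```

Larger items, queries, or cross-partition reads cost more RUs per call, so the same arithmetic with the measured per-call charge gives a more realistic provisioning target.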

    Please let me know if you have any other questions.

    Thanks
    Saurabh

    1 person found this answer helpful.

0 additional answers

