How to investigate timeouts from Azure Function to Mongo CosmosDB?

Pierre Segalen

Hello everyone,

My API is hosted in Azure Functions and usually works fine; however, some functions occasionally time out when querying my Cosmos DB account through its MongoDB API.
Query complexity or heavy load can't be the cause, since it also happens on Sundays, when there is literally no activity on the whole platform. (The only thing running then is a CRON job that runs once an hour and queries Cosmos DB to check whether a document exists; this is the function that times out on Sundays.)
In the Cosmos DB metrics, I can see "Failed Client Requests" that correspond to the timeouts reported in the Azure Functions logs, but I don't see any related CPU load or other problems anywhere in the metrics.

How can I investigate the problem and prevent it from happening in the future?

More about my stack: I use Node.js Functions written in TypeScript with the official "mongodb" client. The timeouts happen in both HTTP-triggered and timer-triggered functions. My Cosmos DB account is in serverless mode.

Tags: Azure Functions, Azure Cosmos DB

Accepted answer
  MughundhanRaveendran-MSFT

    @Pierre Segalen,

    Thanks for reaching out to Q&A.

    To investigate function app timeouts, I suggest following the steps below:

    1. You have already checked for high CPU, but also run the "Function app down or reporting errors" detector, available in the Diagnose and solve problems blade of your function app in the portal, to see whether any other error or exception caused the Functions host to go down and led to the timeouts. Run the detector over a window that covers the timeout (for example, if the issue was seen at 11:00 AM UTC, start the window at 10:30 AM UTC).
    2. Check whether there were any server errors during that time. This is also available in the Diagnose and solve problems blade.
    3. The steps above will tell you whether it is a problem with the Function App platform. To troubleshoot from the code and Cosmos DB side, add logging that identifies which part of the code takes longer to execute. For example, if the code has two methods, custom logging around each one will show which of them took longer.
    4. For Node.js functions, the app can become large because of many npm dependencies, and importing them increases load times, which can result in unexpected timeouts. Dependencies are loaded both explicitly and implicitly: a single module loaded by your code may load its own additional modules.
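
As a minimal sketch of the custom logging suggested in step 3, the TypeScript helper below wraps an async step and logs how long it took, whether it succeeded or threw. `timed`, `checkDocumentExists`, and `hourlyCheck` are hypothetical names for illustration, not part of the Azure Functions or `mongodb` APIs; in a real function you might pass the invocation's logger (such as `context.log`) instead of the default `console.log`.

```typescript
// Logs how long an async step took, whether it succeeded or threw.
// `log` defaults to console.log; inside an Azure Function you could pass the
// invocation's logger (e.g. context.log) instead.
async function timed<T>(
  label: string,
  fn: () => Promise<T>,
  log: (message: string) => void = console.log,
): Promise<T> {
  const start = Date.now();
  try {
    const result = await fn();
    log(`${label} succeeded in ${Date.now() - start} ms`);
    return result;
  } catch (err) {
    log(`${label} failed after ${Date.now() - start} ms: ${String(err)}`);
    throw err; // rethrow so the caller and the Functions runtime still see the failure
  }
}

// Hypothetical placeholder for your own query (e.g. a findOne() against Cosmos DB).
async function checkDocumentExists(): Promise<boolean> {
  return true;
}

// Hypothetical usage: wrap each step separately so the logs show which one is slow.
async function hourlyCheck(): Promise<void> {
  const exists = await timed("checkDocumentExists", checkDocumentExists);
  console.log(`document exists: ${exists}`);
}
```

Logging failures with their duration matters here: a query that fails after roughly the client's configured timeout points at connectivity, while one that fails quickly points at something else.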

    Mitigation / Recommendation

    • Enable Application Insights and check whether, for example, a call to a third-party REST API is taking a long time. Look at the Dependencies section.
    • Check the stack trace, identify which line is causing the issue, and focus on that.
    • Have a retry mechanism in place.
    • Implement granular logging (for example, start and end times) to isolate which functionality is consuming the most time, as mentioned in the steps above.
    • Try setting WEBSITE_RUN_FROM_PACKAGE, which can reduce cold-start times, particularly for JavaScript functions with large npm package trees.
    • Always check for high CPU and high memory usage.
    • Follow the Azure Functions best practices.
    • Collect the relevant code that is causing the timeout.
    • For testing purposes, try increasing the timeout value (the functionTimeout setting in host.json) if you are sure the execution is genuinely time-consuming.
    • Move from the Consumption plan to a Premium or App Service plan if the Consumption plan's 10-minute maximum timeout is not enough to complete the operation.
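
The retry recommendation could look something like the following sketch: a generic helper that retries an async operation with exponential backoff and gives up on any single attempt after a fixed timeout. `withRetry`, `withTimeout`, `RetryOptions`, and `checkDocumentExists` are hypothetical names, and the option values in the usage comment are illustrative assumptions, not tuned recommendations. Only wrap idempotent operations, such as the read-only existence check, since a timed-out attempt may still complete on the server.

```typescript
// Options for the hypothetical withRetry helper below.
interface RetryOptions {
  attempts: number; // total tries, including the first one
  baseDelayMs: number; // wait before the 2nd try; doubled after each failure
  attemptTimeoutMs: number; // stop waiting for a single attempt after this long
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Rejects if `promise` has not settled within `ms`. The underlying request is
// not cancelled; it is simply no longer awaited.
async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Retries `operation` with exponential backoff; rethrows the last error if
// every attempt fails or times out. Only use it for idempotent operations.
async function withRetry<T>(
  operation: () => Promise<T>,
  { attempts, baseDelayMs, attemptTimeoutMs }: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await withTimeout(operation(), attemptTimeoutMs);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        await sleep(baseDelayMs * 2 ** (attempt - 1)); // e.g. 500, 1000, 2000 ms...
      }
    }
  }
  throw lastError;
}

// Hypothetical usage in the hourly timer function (values are illustrative):
// const exists = await withRetry(() => checkDocumentExists(), {
//   attempts: 3, baseDelayMs: 500, attemptTimeoutMs: 5000,
// });
```

Keep `attempts × attemptTimeoutMs`, plus the backoff delays, well below the function's own timeout; otherwise the Functions runtime may stop the invocation before the retries finish. Any timeouts configured on the database client itself still apply underneath.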

    I hope this helps!
