Thanks for reaching out to Q&A.
To investigate function app timeouts, I would suggest following the steps below:
- You have already checked for high CPU, but also try running the "Function app down or reporting errors" detector, available in the Diagnose and solve problems blade of your function app in the portal, to see whether any other error or exception caused the function host to go down and led to the timeout. Run this detector over the time window when the timeout occurred (for example, if the issue was seen at 11:00 AM UTC, run the detector starting from 10:30 AM UTC).
- Check whether there were any server errors during that time window. This information is also available in the Diagnose and solve problems blade.
- The steps above will tell us whether it's a problem with the Function app platform. To troubleshoot the issue from the code and Cosmos DB perspective, I would suggest adding logging to the code that identifies which part of it takes longer to execute. For example, if the code has two methods and custom logging is in place, we will be able to identify which method took longer to execute.
- For Node.js functions, a function can become large because of many Node.js dependencies. Importing dependencies can also cause increased load times that result in unexpected timeouts. Dependencies are loaded both explicitly and implicitly. A single module loaded by your code may load its own additional modules.
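To illustrate the custom logging suggestion, here is a minimal TypeScript sketch, assuming an async Node.js function. The `timed` helper and the `loadOrder` and `saveToCosmos` methods are hypothetical names standing in for your own code. In a real function you would pass the invocation's logger (for example `context.log`) instead of `console.log`, so each step's duration appears in your logs.

```typescript
// Hypothetical helper: times one awaited step and logs how long it took.
async function timed<T>(
  label: string,
  step: () => Promise<T>,
  log: (message: string) => void = console.log,
): Promise<T> {
  const start = Date.now();
  try {
    return await step();
  } finally {
    // Runs on success and on failure, so slow steps that throw are still logged.
    log(`${label} took ${Date.now() - start} ms`);
  }
}

// Two hypothetical methods standing in for your own code.
async function loadOrder(id: string): Promise<string> {
  return `order-${id}`;
}
async function saveToCosmos(order: string): Promise<number> {
  return order.length; // placeholder for the real Cosmos DB write
}

// Usage sketch: each step's duration is logged separately.
async function handler(id: string): Promise<void> {
  const order = await timed("loadOrder", () => loadOrder(id));
  await timed("saveToCosmos", () => saveToCosmos(order));
}
```

Comparing the logged durations for each step shows which method is consuming the time.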
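Regarding dependency load times, one common mitigation (a general Node.js technique) is to defer loading a heavy module until the code path that needs it actually runs. This is a minimal sketch; `lazy` is a hypothetical helper, and the module name in the usage comment is a placeholder.

```typescript
// Hypothetical helper: defer an expensive initialization (such as importing a
// large dependency) until first use, then reuse the cached result.
function lazy<T>(load: () => T): () => T {
  let loaded = false;
  let value: T | undefined;
  return () => {
    if (!loaded) {
      // If load() throws, loaded stays false and the next call tries again.
      value = load();
      loaded = true;
    }
    return value as T;
  };
}

// Usage sketch (module name is a placeholder):
// const getLargeSdk = lazy(() => import("some-large-sdk"));
// const sdk = await getLargeSdk(); // loaded on first call only
// If the import promise rejects, that rejected promise is cached as well.
```

Node.js already caches a module after its first load, so this only helps by moving the cost off the startup path for invocations that never need the module.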
Mitigation / Recommendation
- Enable Application Insights and check the Dependencies section, for example to see whether a call to a third-party REST API is taking a long time.
- Check the stack trace to identify which line is causing the issue, and focus on that.
- Have a retry mechanism in place.
- Implement granular logging (such as start and end times) to isolate which functionality is consuming the most time, as mentioned in the steps above.
- Try setting the WEBSITE_RUN_FROM_PACKAGE app setting, which can reduce cold-start times, particularly for JavaScript functions with large npm package trees.
- Always check for high CPU and high memory usage.
- Follow the Azure Functions best practices: https://learn.microsoft.com/en-us/azure/azure-functions/functions-best-practices
- Collect the relevant code that is causing the timeout.
- For testing purposes, try increasing the timeout value (the functionTimeout setting in host.json) if you are sure the execution is a time-consuming process.
- Move from the Consumption plan to a Premium or App Service plan if the 10-minute maximum timeout is not enough to complete the operation.
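For the retry suggestion, a minimal TypeScript sketch of retry with exponential backoff might look like this. `withRetry`, `maxAttempts`, and `baseDelayMs` are hypothetical names with illustrative defaults, not an Azure Functions API.

```typescript
// Resolve after the given number of milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: retry an async operation with exponential backoff.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // With the defaults: wait 200 ms, then 400 ms, doubling each time.
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

Keep in mind that retries add to the total execution time, so the combined attempts and delays still need to fit within the function timeout, and only retry operations that are safe to repeat.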
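To pin down which call is causing the timeout, one option (a general technique, not part of the steps listed) is to wrap each suspect await in a shorter per-call timeout, so the error names the slow call before the function's own timeout is reached. This is a minimal sketch; `withTimeout` is a hypothetical helper, and it only stops waiting: it does not cancel the underlying request.

```typescript
// Hypothetical helper: reject if `work` does not settle within `ms`,
// so the log names the slow call instead of the whole invocation timing out.
async function withTimeout<T>(label: string, work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    // Whichever settles first wins; the other result is ignored.
    return await Promise.race([work, timeout]);
  } finally {
    // Clear the timer so it does not keep the process alive or fire later.
    clearTimeout(timer);
  }
}
```

For example, `await withTimeout("cosmosQuery", runQuery(), 30000)` (where `runQuery` is a placeholder for your own call) would throw an error mentioning `cosmosQuery` if that call takes longer than 30 seconds.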
I hope this helps!
Please 'Accept as answer' and 'Upvote' if it helped so that it can help others in the community looking for help on similar topics.