The behavior you're experiencing with your Azure Function App could be attributed to several factors related to concurrency limits and execution timeouts. Here are some insights:
- Invocation Visibility: The discrepancy between the invocation count shown in the Azure portal and the count in your application logs could come from how the Azure Functions runtime handles concurrent executions. If the function is triggered by Event Grid, deliveries may also be throttled or retried, especially when each execution takes a long time to complete.
- Concurrency Limits: Azure Functions limits how much work can run at once, both through the number of instances the app can scale out to and through per-instance concurrency settings. On the Consumption plan, an app can scale out to at most 200 instances (100 on Linux); note that this is an instance limit, not a cap on concurrent executions. Per-instance concurrency depends on the trigger type and on settings in host.json. Since you mentioned using the EP1 plan, check the maximum burst instance count configured for that plan and any concurrency settings in your host.json.
- Execution Timeouts: Although you have set "functionTimeout": "-1", which removes the execution time limit on the Premium plan, other factors can still end executions early. For example, if the function is invoked over HTTP, the Azure Load Balancer's default idle timeout of 230 seconds caps how long a request can wait for a response, regardless of the functionTimeout setting. Additionally, if the function app is under heavy load, it may not be able to allocate resources for all concurrent invocations, which can cause some executions to fail or be dropped.
- Scaling Behavior: The Azure Functions platform may not scale out quickly enough to handle the burst of 40 concurrent invocations, especially if each invocation is resource-intensive. Consider implementing Durable Functions or an asynchronous pattern, where the trigger only records the work and a separate worker processes it, to manage long-running tasks more effectively.
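As a rough illustration of the asynchronous pattern mentioned above, here is a minimal sketch using only the Python standard library. It is not Azure-specific code: `handle_event`, `process`, and `run_burst` are hypothetical names, the in-memory queue stands in for a durable queue (for example a storage queue), and the short sleep stands in for your long-running job. The point is only that the handler acknowledges quickly while a bounded pool of workers does the slow part.

```python
import queue
import threading
import time

# In-memory stand-in for a durable queue (assumption: a real app would
# use a storage queue or similar so work survives restarts).
work_queue = queue.Queue()
results = {}


def handle_event(event_id, payload):
    """Hypothetical trigger handler: enqueue the work and return at once."""
    work_queue.put((event_id, payload))
    return {"status": 202, "id": event_id}


def process(payload):
    """Hypothetical long-running job, shortened here to a brief sleep."""
    time.sleep(0.01)
    return payload * 2


def worker():
    """Pull items until a None sentinel arrives."""
    while True:
        item = work_queue.get()
        if item is None:
            work_queue.task_done()
            break
        event_id, payload = item
        try:
            results[event_id] = process(payload)
        finally:
            work_queue.task_done()


def run_burst(n_events=40, n_workers=4):
    """Simulate a burst of events handled by a fixed number of workers."""
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    # Every event is acknowledged immediately, before any work finishes.
    acks = [handle_event(i, i) for i in range(n_events)]
    work_queue.join()  # wait until all queued work has been processed
    for _ in threads:
        work_queue.put(None)  # one sentinel per worker to shut it down
    for t in threads:
        t.join()
    return acks
```

Calling `run_burst(40, 4)` acknowledges all 40 events right away and processes them four at a time, so the caller never waits on the slow step. In a real function app the worker side would be a separate queue-triggered function or a Durable Functions activity.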
To summarize, the issues you're facing could be due to concurrency limits, execution timeouts, and scaling behaviors of your Azure Function App. It may be beneficial to monitor the performance metrics and logs to identify any specific errors or throttling events that could provide further insights into the problem.
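One way to narrow down where invocations go missing is to log an identifier at the start and end of each execution and then compare the two sets. The sketch below assumes a hypothetical log format (`START id=...` and `END id=... status=...`) that your function would have to emit itself; it is not the format the Functions runtime writes, so adapt the patterns to your actual logs.

```python
import re

# Hypothetical log line formats (assumption: your function writes these).
START = re.compile(r"START id=(\S+)")
END = re.compile(r"END id=(\S+) status=(\w+)")


def unfinished_invocations(log_lines):
    """Return (ids that started but never ended, ids that ended with a non-ok status)."""
    started = set()
    finished = {}
    for line in log_lines:
        end_match = END.search(line)
        if end_match:
            finished[end_match.group(1)] = end_match.group(2)
            continue
        start_match = START.search(line)
        if start_match:
            started.add(start_match.group(1))
    missing = sorted(started - set(finished))
    failed = sorted(i for i, status in finished.items() if status != "ok")
    return missing, failed
```

Invocations that appear in `missing` started but never logged an end, which points toward a timeout, a host restart, or a dropped execution rather than an error in your own code; those in `failed` completed but reported a problem.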