ServiceBusTrigger function throwing FunctionTimeoutException without a good reason (seems related to high load)

Mikolaj Kieres 0 Reputation points
2023-09-20T11:11:31.3833333+00:00

I have a simple Azure Function (github code) that processes messages from a Service Bus queue. Its task is to make a few HTTP calls (exactly 6) and save the retrieved data to MongoDB. The entire processing logic usually takes no more than 5 seconds. However, I'm seeing a huge number of function execution failures reported in Application Insights with the following stack trace:

Microsoft.Azure.WebJobs.Host.FunctionTimeoutException:
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<TryHandleTimeoutAsync>d__35.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.37.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:663)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<InvokeWithTimeoutAsync>d__33.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.37.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:571)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<ExecuteWithWatchersAsync>d__32.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.37.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:527)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<ExecuteWithLoggingAsync>d__26.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.37.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:306)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<ExecuteWithLoggingAsync>d__26.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.37.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:352)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<TryExecuteAsync>d__18.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.37.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:108)

This doesn't make much sense to me because, as mentioned above, the function's logic shouldn't take long to execute. I have also set a 5-second timeout on the HTTP calls to limit the function's running time if there's an issue with the APIs being queried.
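For reference, this is roughly how the function is shaped (a simplified sketch; the queue name, connection setting and helper methods are placeholders rather than the exact code from the repository):

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public class ProcessMessageFunction
{
    // Shared HttpClient with a 5-second timeout so slow APIs can't hold the function up
    private static readonly HttpClient Http = new HttpClient { Timeout = TimeSpan.FromSeconds(5) };

    [FunctionName("ProcessMessage")]
    public async Task Run(
        [ServiceBusTrigger("my-queue", Connection = "ServiceBusConnection")] string message,
        ILogger log)
    {
        // Six outbound HTTP calls, then a single MongoDB write
        var results = new List<string>();
        foreach (var url in GetApiUrls(message)) // placeholder: builds the 6 API URLs
        {
            results.Add(await Http.GetStringAsync(url));
        }

        await SaveToMongoAsync(results); // placeholder: MongoDB write, sub-50 ms per monitoring
        log.LogInformation("Processed message with {Count} API results", results.Count);
    }

    private static IEnumerable<string> GetApiUrls(string message)
        => throw new NotImplementedException(); // placeholder

    private static Task SaveToMongoAsync(List<string> results)
        => throw new NotImplementedException(); // placeholder
}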

I can't see any good reason why this would be happening. It also seems to be somehow connected to an increase in Service Bus load. I also checked the MongoDB monitoring (since the database calls don't show up in the Application Insights logs) and all of the reported operations complete in under 50 ms. When I tried to replicate the issue by pushing a few messages onto the queue, they were processed very quickly (under 5 seconds) without any issues.
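For that replication test I pushed the messages with a small standalone sender, roughly like this (connection string, queue name and payload are placeholders):

using Azure.Messaging.ServiceBus;

// Send a handful of test messages to the queue to try to reproduce the timeouts
await using var client = new ServiceBusClient("<service-bus-connection-string>");
ServiceBusSender sender = client.CreateSender("my-queue");

for (var i = 0; i < 10; i++)
{
    await sender.SendMessageAsync(new ServiceBusMessage($"{{ \"testId\": {i} }}"));
}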

Below is a screenshot from one of the reported timeouts.

[screenshot attached]

I should also mention that the function timeout was initially set to 5 minutes and the same thing was happening. I reduced it to 1 minute (see the config below) to avoid unnecessary costs from the function running longer than expected.

I'm not sure whether this will help with diagnosing the issue, but below is the content of my host.json file:

{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "clientRetryOptions": {
        "mode": "exponential",
        "tryTimeout": "00:01:00",
        "delay": "00:00:00.80",
        "maxDelay": "00:01:00",
        "maxRetries": 3
      },
      "prefetchCount": 0,
      "transportType": "amqpWebSockets",
      "autoCompleteMessages": true,
      "maxAutoLockRenewalDuration": "00:05:00",
      "maxConcurrentCalls": 16,
      "maxConcurrentSessions": 8,
      "maxMessageBatchSize": 1000,
      "minMessageBatchSize": 1,
      "maxBatchWaitTime": "00:00:30",
      "sessionIdleTimeout": "00:01:00",
      "enableCrossEntityTransactions": false
    }
  },
  "functionTimeout": "00:01:00",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  }
}


Any help would be appreciated. I'm happy to share more logs if necessary.

Kind Regards

Mikolaj


1 answer

  1. Mike Urnun 9,811 Reputation points Microsoft Employee
    2023-09-28T20:02:44.1566667+00:00

    Hello @Mikolaj Kieres - Thanks for reaching out, and I'm sorry for the late reply on this post. If these timeout errors are transient, it will require more logs and a thorough investigation to narrow down the root cause(s). Usually, it's best to open a support case and work with one of our support engineers (who have deeper insight into your unique environment) in a 1:1 setting for such investigations.

    Based on your description above, though, I strongly recommend switching to Durable Functions for its support for stateful workflows. Invoking 6 HTTP calls and writing to a DB is a stateful workflow that will be more resilient running either as a Durable Functions orchestration or as a Logic Apps workflow.
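    As a rough illustration only (the function, queue and activity names below are placeholders, not your actual code), the same work could be expressed as an orchestration that fans out the HTTP calls as activities and then persists the results in a separate activity:

using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class ProcessMessageOrchestration
{
    // The Service Bus trigger only starts the orchestration; the actual work runs
    // in activities, which are checkpointed and can be retried individually.
    [FunctionName("ProcessMessage_Start")]
    public static Task Start(
        [ServiceBusTrigger("my-queue", Connection = "ServiceBusConnection")] string message,
        [DurableClient] IDurableOrchestrationClient starter)
        => starter.StartNewAsync("ProcessMessage_Orchestrator", null, message);

    [FunctionName("ProcessMessage_Orchestrator")]
    public static async Task RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var message = context.GetInput<string>();

        // The 6 API calls, fanned out as parallel activities (the URLs would
        // normally be derived from the message)
        var urls = new[] { "url1", "url2", "url3", "url4", "url5", "url6" }; // placeholders
        var calls = urls.Select(url => context.CallActivityAsync<string>("CallApi", url));
        string[] results = await Task.WhenAll(calls);

        // Persist the combined results in a separate activity
        await context.CallActivityAsync("SaveToMongo", results);
    }

    [FunctionName("CallApi")]
    public static Task<string> CallApi([ActivityTrigger] string url)
        => throw new System.NotImplementedException(); // placeholder HTTP call

    [FunctionName("SaveToMongo")]
    public static Task SaveToMongo([ActivityTrigger] string[] results)
        => throw new System.NotImplementedException(); // placeholder MongoDB write
}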

