ServiceBusSessionProcessor stops picking up messages across all nodes in a Service Fabric cluster

Mark Middlemist 166 Reputation points
2022-05-10T10:39:13.057+00:00

Hi All

We have Service Fabric clusters that include an application with a number of ServiceBusProcessors and ServiceBusSessionProcessors (it covers a number of scenarios, some with session-based queues and some without).

Since moving to the newer Azure.Messaging.ServiceBus SDK NuGet packages, one of our fabric clusters, which is listening on a lower-traffic Service Bus namespace, intermittently stops picking up messages. By this I mean that all nodes (minimum 5) are still running, but the messages are backing up on the session-based queue (it doesn't appear to be affecting the non-session-based queues, which are handled by the ServiceBusProcessor, only the session-based one handled by the ServiceBusSessionProcessor).

We are in the process of reviewing our application code, but one vague suggestion I came across elsewhere implied that there may be an "idle timeout" that could cause processing to stop if no new messages came in on the queue. I do see the SessionIdleTimeout property, but if I'm reading correctly that would just automatically close a specific session, not affect the main processing.

It only seems to be happening on the one cluster/SB namespace pairing (touch wood), about once a week or so. There is no regular pattern; it has just happened 3 times over the last 3 weeks. When it happens, manually restarting the fabric nodes (which recreates the processors) kicks things back into life and the queue gets cleared, but obviously this is not ideal: unless we notice it has happened, processing stays suspended.

Are there any events we can tap into that we could use to restart the SessionProcessor? We do have logging in our OnProcessErrorAsync handler, and it has shown some errors in our application code, but nothing we can spot that should kill processing on all nodes (plus the documentation explicitly says not to try controlling processor run state from that function).

Any advice anyone could give on how to handle this would be very much appreciated,

Thanks in advance

Mark

Azure Service Fabric
An Azure service that is used to develop microservices and orchestrate containers on Windows and Linux.
Azure Service Bus
An Azure service that provides cloud messaging as a service and hybrid integration.

Accepted answer
  1. Esben Bach 236 Reputation points
    2022-05-10T12:31:15.907+00:00

    @Mark Middlemist - we have "a lot" of session processor instances running in SF without any issues.
    HOWEVER, we had a similar problem when we started out using the WindowsAzure.ServiceBus SDK years ago.
    Our issue was that if a session failed (due to an unhandled exception or some such), it would never be re-created properly, so we had to restart the process it was running on.

    Could it be that you are simply waiting way too long for the session to be closed?
    By default the "new" SDK has a rather long timeout for sessions so we were forced to do something like the following:

    var options = new ServiceBusSessionProcessorOptions()
    {
        AutoCompleteMessages = true,
        MaxAutoLockRenewalDuration = TimeSpan.FromMinutes(15),
        PrefetchCount = 100,
        MaxConcurrentCallsPerSession = 1,
        MaxConcurrentSessions = 1000,
    };

    Since then it has become possible to release the session early using the ProcessSessionMessageEventArgs input,

    i.e. "args.ReleaseSession()" - only use this if you know you are done with your session though :)
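As a rough sketch of the two ideas above (a shorter session timeout and releasing a session early), something along these lines should work with Azure.Messaging.ServiceBus. The 5 second `SessionIdleTimeout`, the `"IsLast"` application property, and the `SessionProcessorSketch` / `HandleAsync` names are assumptions made up for illustration, not anything from the setup described in this thread.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

public static class SessionProcessorSketch
{
    // Builds a session processor with handlers attached; the caller starts it
    // with StartProcessingAsync() and owns the lifetime of the client.
    public static ServiceBusSessionProcessor Create(ServiceBusClient client, string queueName)
    {
        var options = new ServiceBusSessionProcessorOptions
        {
            MaxConcurrentSessions = 1000,
            MaxConcurrentCallsPerSession = 1,
            // Assumption: illustrative value. How long the processor waits for
            // the next message on an idle session before moving to another one.
            SessionIdleTimeout = TimeSpan.FromSeconds(5),
        };

        ServiceBusSessionProcessor processor = client.CreateSessionProcessor(queueName, options);

        processor.ProcessMessageAsync += async args =>
        {
            await HandleAsync(args.Message);

            // Hypothetical convention: the sender flags the final message of a
            // session with an "IsLast" application property set to true.
            if (args.Message.ApplicationProperties.TryGetValue("IsLast", out var isLast)
                && isLast is true)
            {
                // Release the session once this handler returns, instead of
                // waiting for the idle timeout to expire.
                args.ReleaseSession();
            }
        };

        // An error handler must be registered before the processor is started.
        processor.ProcessErrorAsync += args =>
        {
            Console.Error.WriteLine($"{args.ErrorSource} on {args.EntityPath}: {args.Exception}");
            return Task.CompletedTask;
        };

        return processor;
    }

    // Hypothetical placeholder for the real message handling.
    private static Task HandleAsync(ServiceBusReceivedMessage message) => Task.CompletedTask;
}
```

Usage would be roughly `var processor = SessionProcessorSketch.Create(client, "my-session-queue"); await processor.StartProcessingAsync();`, where the queue name is again just a placeholder.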


1 additional answer

  1. Mark Middlemist 166 Reputation points
    2022-05-20T12:36:52.93+00:00

    Cheers for checking in @Esben Bach

    Things seem to have stabilized now. It looks like the main thing was fixing the error handling in our code so that exceptions aren't bubbled up beyond ProcessMessage.
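For anyone landing here later, a minimal sketch of what "not letting exceptions bubble up beyond ProcessMessage" can look like is below. This is not the actual code from our application; `SafeHandlers` and `HandleBusinessLogicAsync` are hypothetical names, and abandoning the message on failure is just one possible policy (dead-lettering may suit some cases better).

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

public static class SafeHandlers
{
    // Attach with: processor.ProcessMessageAsync += SafeHandlers.ProcessMessageAsync;
    public static async Task ProcessMessageAsync(ProcessSessionMessageEventArgs args)
    {
        try
        {
            await HandleBusinessLogicAsync(args.Message);
            // With AutoCompleteMessages = true the processor completes the
            // message after this handler returns without throwing.
        }
        catch (Exception ex)
        {
            // Log and settle the message here rather than letting the
            // exception escape the handler.
            Console.Error.WriteLine(
                $"Failed to process message {args.Message.MessageId} " +
                $"in session {args.SessionId}: {ex}");

            // Make the message available for redelivery. This call can itself
            // throw (for example if the session lock was lost), in which case
            // the error surfaces through the ProcessErrorAsync handler.
            await args.AbandonMessageAsync(args.Message);
        }
    }

    // Hypothetical placeholder for the real application logic.
    private static Task HandleBusinessLogicAsync(ServiceBusReceivedMessage message)
        => Task.CompletedTask;
}
```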

    With that everything looks stable :)

    Thanks for your help
