
MarkMiddlemist-1774 asked MarkMiddlemist-1774 answered

ServiceBusSessionProcessor stops picking up messages across all nodes in a Service Fabric cluster

Hi All

We have Service Fabric clusters that include an application with a number of ServiceBusProcessors and ServiceBusSessionProcessors (it covers a number of scenarios, some with session-based queues and some without).

Since moving to the newer Azure.Messaging.ServiceBus SDK NuGet packages, one of our fabric clusters, which listens on a lower-traffic Service Bus namespace, intermittently stops picking up messages. By this I mean that all nodes (minimum 5) are still running, but messages are backing up on the session-based queue. It doesn't appear to affect the non-session-based queues, which are handled by the ServiceBusProcessor; only the session-based one, handled by the ServiceBusSessionProcessor, is affected.

We are in the process of reviewing our application code, but one vague suggestion I came across elsewhere implied that there may be an "idle timeout" that could cause processing to stop if no new messages came in on the queue. I do see the SessionIdleTimeout property, but if I'm reading correctly that would just automatically close a specific session, not affect the main processing.

It only seems to be happening on the one cluster/SB namespace pairing (touch wood), about once a week or so. There is no regular pattern; it has just happened 3 times over the last 3 weeks. When it happens, manually restarting the fabric nodes (so that the processors are recreated) kicks things back into life and the queue gets cleared, but obviously this is not ideal: unless we notice it has happened, processing stays suspended.

Are there any events we can tap into that we could use to restart the SessionProcessor? We do have logging in OnProcessErrorAsync, and it has shown some errors in our application code, but nothing we can spot that should kill processing on all nodes (plus the documentation explicitly says not to try controlling the processor's run state from that function).

Any advice anyone could give on how to handle this would be very much appreciated,

Thanks in advance

Mark

azure-service-bus azure-service-fabric

esben answered esben commented

@MarkMiddlemist-1774 - we have "a lot" of session processor instances running in SF without any issues.
HOWEVER, we had a similar problem when we started out using the WindowsAzure.ServiceBus SDK years ago.
Our issue was that if the session failed (due to an unhandled exception or some such), it would never be re-created properly, so we had to restart the process it was running on.

Could it be that you are simply waiting way too long for the session to be closed?
By default the "new" SDK has a rather long timeout for sessions so we were forced to do something like the following:

var options = new ServiceBusSessionProcessorOptions()
{
    AutoCompleteMessages = true,
    MaxAutoLockRenewalDuration = TimeSpan.FromMinutes(15),
    PrefetchCount = 100,
    MaxConcurrentCallsPerSession = 1, // messages within one session are handled one at a time
    MaxConcurrentSessions = 1000,
};

Since then it has become possible to release the session early using the ProcessSessionMessageEventArgs passed to the message handler,

i.e. "args.ReleaseSession()". Only use this if you know you are done with your session, though :)
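
To make the two knobs mentioned above concrete, here is a minimal sketch (not esben's actual code) that shortens the per-session wait with SessionIdleTimeout and releases a session early with ReleaseSession(). The 30-second timeout, the concurrency value, the "IsLastInSession" application property and the HandleAsync helper are all assumptions made up for illustration. As far as I know, when SessionIdleTimeout is left unset the processor falls back to the retry options' TryTimeout.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

public static class SessionProcessorSketch
{
    // The client and queue name are supplied by the caller; nothing here is specific to one namespace.
    public static ServiceBusSessionProcessor Create(ServiceBusClient client, string queueName)
    {
        var options = new ServiceBusSessionProcessorOptions
        {
            // Illustrative value only: how long to wait for the next message on the
            // current session before closing it and trying to pick up another session.
            SessionIdleTimeout = TimeSpan.FromSeconds(30),
            MaxConcurrentSessions = 8,
            MaxConcurrentCallsPerSession = 1,
        };

        ServiceBusSessionProcessor processor = client.CreateSessionProcessor(queueName, options);

        processor.ProcessMessageAsync += async args =>
        {
            await HandleAsync(args.Message);

            // Hypothetical convention: the sender flags the final message of a session.
            if (args.Message.ApplicationProperties.TryGetValue("IsLastInSession", out var last)
                && last is true)
            {
                // Hand the session back without waiting for the idle timeout to elapse.
                args.ReleaseSession();
            }
        };

        // The processor requires an error handler to be registered before it is started.
        processor.ProcessErrorAsync += args =>
        {
            Console.Error.WriteLine($"{args.ErrorSource} on {args.EntityPath}: {args.Exception}");
            return Task.CompletedTask;
        };

        return processor;
    }

    // Placeholder for the real application logic.
    private static Task HandleAsync(ServiceBusReceivedMessage message) => Task.CompletedTask;
}
```

The caller would then run `await processor.StartProcessingAsync();`. A shorter idle timeout means idle sessions are given up sooner so other sessions can be picked up, at the cost of more session accept/close churn.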


That could fit with something we've spotted in our application logic error handling.

Will make some tweaks and see how it goes

Thanks @esben

esben, in reply to MarkMiddlemist-1774:

Did you get anywhere with this Mark?

MarkMiddlemist-1774 answered

Cheers for checking in @esben

Things seem to have stabilized now. It looks like the main thing was fixing the error handling in our code so that exceptions aren't bubbled up beyond ProcessMessage.

With that everything looks stable :)

Thanks for your help
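
Mark doesn't show the fixed handler, so the following is only a sketch of what "not letting exceptions bubble up beyond ProcessMessage" might look like. DoWorkAsync is a hypothetical stand-in for the application logic, and abandoning the message on failure is just one possible policy, not necessarily the one used in this thread.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

public static class SafeHandlerSketch
{
    public static void Attach(ServiceBusSessionProcessor processor)
    {
        processor.ProcessMessageAsync += async args =>
        {
            try
            {
                await DoWorkAsync(args.Message);
            }
            catch (Exception ex)
            {
                // Log with enough context to trace the failing session and message.
                Console.Error.WriteLine(
                    $"Session {args.SessionId}, message {args.Message.MessageId} failed: {ex}");

                // Assumed policy: abandon so the broker redelivers the message, and
                // dead-letters it once the queue's max delivery count is exceeded.
                await args.AbandonMessageAsync(args.Message);
            }
        };
    }

    // Hypothetical application logic; replace with the real processing.
    private static Task DoWorkAsync(ServiceBusReceivedMessage message) => Task.CompletedTask;
}
```

Settling the message explicitly inside the catch block keeps the failure handling in one place; a ProcessErrorAsync handler still has to be registered separately for errors raised by the SDK itself.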
