Poison Message Handling
Service broker provides automatic poison message detection, which disables the service queue upon five consecutive transaction rollbacks in trying to receive messages from the queue. However, an application should not rely on this feature for normal processing. Because automatic poison message detection stops the queue, this halts all processing for the application until the poison message is removed. Instead, an application should attempt to detect and remove poison messages as part of the application logic.
Typically, the message processing application can choose among the following four strategies:
A. Returns the poison message back to the service queue by rolling back the transaction in hope that the message can be successfully consumed next time when retried.
B. Preserves the message, the context and the encountered error to some permanent storage, such as a database table, commits the RECEIVE, and continues normally.
C. Ends the conversation with an error.
D. Uses SQL server event notification to ask for manual intervention when queue is disabled.
Option A above is good at handling failures caused by temporary conditions. For instance, a purchase order placed by an user can't be processed if the user profile has not existed yet because, say, user profile request processing that's happening in parallel takes more time than the purchase order; once that processing is complete, the purchase order processing is expected to be through without a problem. Another example could be that the processing transaction is chosen as the victim to rollback in the case of a deadlock; after such a deadlock is detected and removed, the message can then be successfully consumed.
However, there are situations where processing the same message will forever fail. One example may be that the message itself is not self-contained, say a purchase order can't be fulfilled because that credit card holder name and home address given don't match what's on file at the bank. A second example could be that a user profile can't be established because the account name chosen has already taken by an existing user. In these cases, approach B then sounds more suitable for the application to log the message content as well as the context in which it was happening so the situations can be further examined to see if the processing logic can be improved to handle these exception cases normally. Indeed, the two examples mentioned here have actually surfaced deficiencies in the application code because both scenarios can well be successfully processed if the message processing code is revised. For a more vivid example how inappropriate application error handling can lead to poison messages, check out Remus' blog here.
Care must be taken when continuing to the next message without the current one been processed. In a stateful session, skipping messages may cause the conversation into an illegal state and hence invalidates the rest of the dialog. If this is the case, the application then can't afford ignoring one message; thus ending the dialog with error as suggested by option C becomes the right thing to do. As an example, say a live meeting service that is unicasting an ongoing presentation, it had already received slide #2 and was waiting for the content of slide #3. But it then found it couldn't process the slide #3 message because of invalid media format. If it simply ignores them, the slide #3 worth of presentation simply becomes dumb to the receiver. So terminating the session with an error probably is the most appropriate.
Depending on what it is doing, an application can well use a combination of what is illustrated the above. For example, it can allow a failed receive to retry three seconds later. Further, if the retry is still unsuccessful, it then can choose to log the message, commit the RECEIVE transaction and continue to the next one.
As the last resort, application programmers may want not to do anything in their code but fully depend on the default mechanism built in service broker for poison message detection and handling. As formerly stated, the mechanism alone is not good enough. But combining it with SQL server event notification at least can request for administrator's manual intervention when a poison message is detected.
Once a BROKER_QUEUE_DISABLED event notification is defined, a notification message of the following format will be sent by service broker to the specified event notification queue when the service queue is disabled due to poison messages:
<EVENT_INSTANCE>
<EventType>BROKER_QUEUE_DISABLED</EventType>
<PostTime>2008-06-27T12:30:18.357</PostTime>
<SPID>53</SPID>
<DatabaseID>6</DatabaseID>
<TransactionID/>
<NTUserName/>
<NTDomainName/>
<HostName/>
<ClientProcessID/>
<ApplicationName/>
<StartTime>2008-06-27T12:30:18.168</StartTime>
<ObjectID>53575229</ObjectID>
<ServerName>junan02</ServerName>
<LoginSid/>
<EventSequence>7844</EventSequence>
<IsSystem>1</IsSystem>
</EVENT_INSTANCE>
To identify which service queue this notification is fired for, first use the database id to find out the database name:
SELECT name AS database_name
FROM sys.databases
WHERE database_id = 6
GO
(1 row(s) affected)
SsbTestDBTarget
Then switch to that database, and get the service queue name using the object id:
USE SSBTestDBTarget
GO
SELECT name AS service_queue_name
FROM sys.service_queues
WHERE object_id = 53575229
GO
(1 row(s) affected)
Service1Queue
The SQL script below shows in detail how to create an event notification service to receive notification messages when a user queue being watched gets disabled. A stored procedure is defined to process the posted notification by emailing the administrator about which queue is disabled and ask him/her to jump in to identify and eliminate the problem so normal message processing on the queue can be resumed.
-- create the event notification queue to receive queue_disabled events
CREATE QUEUE [QueueDisabledNotifQueue];
GO
-- create the event notification service for receiving queue_disabled events
CREATE SERVICE [QueueDisabledNotifService]
ON QUEUE [QueueDisabledNotifQueue]
([https://schemas.microsoft.com/SQL/Notifications/PostEventNotification]);
GO
-- create queue-disabled event notification to watch on 'Service1Queue'
CREATE EVENT NOTIFICATION [QueueDisabledNotif]
ON [Service1Queue]
FOR BROKER_QUEUE_DISABLED
TO SERVICE [QueueDisabledNotifService], 'current database'
GO
-- define the stored proc to process queue_disabled notifications
CREATE PROCEDURE [QueueDisabledNotifHandler] AS BEGIN
DECLARE @messageType SYSNAME
DECLARE @conversationHandle UNIQUEIDENTIFIER
DECLARE @messageBody XML
WHILE(1=1)BEGIN
BEGIN TRANSACTION
WAITFOR (
RECEIVE TOP(1)
@messageType=message_type_name,
@messageBody=message_body,
@conversationHandle=conversation_handle
FROM [QueueDisabledNotifQueue]), TIMEOUT 5000
IF( @@ROWCOUNT = 0 ) BEGIN
COMMIT TRANSACTION -- rollback is inappropriate here as it'll be counted by queue disabling logic for poison message detection
BREAK
END -- if( @@rowcount ... )
IF( @messageType = 'https://schemas.microsoft.com/SQL/Notifications/EventNotification' ) BEGIN
DECLARE @cmd NVARCHAR(MAX)
SET @cmd = 'dbo.sp_send_dbmail
@profile_name="Administrator",
@recipients="joesmith@my-company.com",
@body="CAST(@messageBody as NVARCHAR(MAX)",
@subject="Queue Disabled Detected";'
EXEC (@cmd)
END -- if( @messageType ... )
COMMIT TRANSACTION
END -- while(1=1)
END
GO
-- kick off queue disabled notification processing
ALTER QUEUE [QueueDiabledNotifQueue]
WITH ACTIVATION (
STATUS = ON,
PROCEDURE_NAME = QueueDisabledNotifHandler,
MAX_QUEUE_READERS = 1,
EXECUTE AS SELF )
Comments
Anonymous
June 16, 2009
PingBack from http://fixmycrediteasily.info/story.php?id=5194Anonymous
March 01, 2010
It seems like option B is pretty close to impossible when your error leaves you with XACT_STATE() = -1. What is the recommendation for solving this? In order to avoid the default poison message handing, is it recommended to simply rollback and immediately end the conversation with error?Anonymous
March 02, 2010
When the transaction becomes doomed the queue is probably going to become disabled due to poisoning, However, this is almost always due to something that the activated proc is doing, such as converting to XML in the RECEIVE statement. So the first thing to do is to fix that.Anonymous
May 31, 2010
The comment has been removedAnonymous
May 31, 2010
In addition to my previous comment: This must work always on Microsoft SQL Server 2008 R2 and above when queue created with option POISON_MESSAGE_HANDLING (STATUS = OFF)Anonymous
September 07, 2011
This example should have never worked, as the activation happens on an impersonated security context you can't access any resounces (like the one in msdb) outside your database. At least not without jumping through security hoops :)Anonymous
September 16, 2012
Option B is certainly feasible, even for actions that may cause a doomed transaction (exact_state() = -1). You need to follow a somewhat different strategy, using a timer message and a table to store the message for as long as it has not been processed yet. This method was presented by Remus Rusanu on his blog. Steps to follow are:
- start transaction
- waitfor( receive) If a message comes in that you need to process:
- insert the message received into a table using the conversation handle as the key value
- start a dialog timer for a few seconds into the future
- commit the transaction
- start a new transaction
- delete the row you just inserted
- attempt to process the message
- reply (or do whatever is needed for your message protocol)
- on success commit the transaction
- on failure rollback the transaction.
- continue to the next message If a DialogTimer message comes in:
- use the conversation handle to retrieve the row from the table, place an exclusive lock on the row using with (updlock, holdlock) or use an update statement to retrieve the values into local variables which places an update lock automatically,
- if no row was found (@@rowcount = 0), commit and continue to next message
- if row was found delete the row from the table
- process the message
- reply (or do whatever is needed for your message protocol)
- on success commit the transaction
- on failure rollback the transaction.
- continue to next message Other messages (f.e. EndDialog and Error) are processed as you would do normally. This way the poison messages will be removed from the queue, so the process can continue processing all correct messages. And all poison messages will be kept in the table for you to review them. The conversations for these poison messages will even still be valid, so you can decide to modify the messages and have them retried to your liking. You can also have the messages retried automatically multiple times by re-setting the timer upon receipt of each DialogTimer message too. The table can be used to retrieve the number of retries attempted already (or the number of retries left). You will have to split the DialogTimer code into 2 transactions, similar to what I've shown above: first update the retry counter on the row, start the timer and commit, then start a new transaction, delete, process and commit.