Building Reliable, Asynchronous Database Applications Using Service Broker
Roger Wolter
February 2005
Updated June 2005
Applies to:
Microsoft SQL Server 2005 Service Broker
Summary: Microsoft SQL Server 2005 Service Broker is a new platform for building distributed asynchronous database applications. Including an asynchronous, reliable messaging feature in the SQL Server database makes it possible to build a variety of database applications that were difficult, if not impossible, to build before. Service Broker and the other developer features in SQL Server 2005, such as common language runtime (CLR) integration and the XML data type, combine to make SQL Server 2005 a great platform for building highly scalable database applications.
Contents
Introduction
Why Write Asynchronous, Queued Applications?
Why Are Queued Applications Hard to Write?
Queuing at the Grocery Store
Queuing at the Airport
Service Broker Programming
Why Do Messaging in the Database?
Scenarios
Conclusion
Introduction
One of the major pushes in the development of Microsoft SQL Server 2005 was to enable reliable, scalable, and functionally rich database applications. CLR (common language runtime) integration makes it possible for developers to incorporate significant business logic into stored procedures, and new functionality in Transact-SQL and XML expands the range of data manipulation and storage functions available to the developer. The other significant feature in this area is SQL Server Service Broker, which adds reliable, scalable, distributed, asynchronous functionality to database applications.
Why Write Asynchronous, Queued Applications?
When we talked to people who have successfully built large, scalable database applications, we found that almost all of their applications had one or more operations that were executed in an asynchronous, queued manner. Stock trading systems queue the settlement activity so it can execute in the background while the front end goes on to handle other trades. Order entry systems put the shipping information in a queue to be read later by a shipping system that is running on another server—often in another location. Travel booking systems make the actual reservations after the customer has completed the itinerary and a confirmation e-mail message is sent after the reservations are made. In all these cases, the responsiveness of the system is improved by doing much of the work asynchronously so the interactive user doesn't have to wait for it to complete before receiving a response.
In most large systems, a careful analysis will uncover parts of the processing that can be done asynchronously. While the queued parts of the application don't need to be done immediately, the system has to ensure that these tasks are done reliably. Service Broker makes the execution of asynchronous, queued tasks reliable and simple to implement.
Another advantage of making parts of your application asynchronous is that the asynchronous processing can be done when resources are available. If the shipping tasks in an order entry system are executed from a queue, the shipping system doesn't have to be able to handle the peak order load. In many order entry systems, the peak order rate might be two or three times the average order rate. Since the order entry system must be able to handle the peak load, there is a significant amount of unused processing power available at slack times. If asynchronous tasks are queued during peak times and executed during slack times, system usage efficiency is significantly increased.
Why Are Queued Applications Hard to Write?
If asynchronous, queued applications have so many advantages, why aren't all applications written that way? Because writing them is hard! Many developers who set out to use a database table as a queue in their application find that it's a lot harder than it looks. A table that is used as a queue has multiple processes simultaneously inserting, reading, and deleting a small number of records. This leads to concurrency issues, performance challenges, and frequent deadlocks. While many developers have successfully met this challenge, it's a lot harder than it should be. Service Broker addresses these issues by making queues first-class database objects in the SQL Server 2005 database. Most of the gnarly issues associated with writing queues are already handled, so the developer can concentrate on writing asynchronous applications rather than writing the queuing infrastructure. The rest of this section goes into the hard problems involved in writing queued applications and explains how Service Broker helps resolve them.
Message Integrity
In many asynchronous, queued applications, the queued messages are actually valuable business objects. For example, in an order entry system, if you put shipments in a queue to be processed later, losing the queued data means that orders won't get shipped. Many reliable messaging systems persist messages to the file system, so the loss of a disk drive can mean the messages are lost. Service Broker messages are persisted in hidden database tables, so all the data integrity features that SQL Server offers for protecting your data work equally well for protecting your queued messages. If you are using database mirroring for disaster recovery, when your database fails over to the secondary site, all your messages are already there and the Service Broker applications continue to run with no data loss.
Multi-Reader Queues
Multi-reader queues are one of the most effective means of scaling out asynchronous applications. To demonstrate this, I would like to show how queuing works in a couple of scenarios we're all familiar with.
Queuing at the Grocery Store
Most grocery stores scale out with multiple queues. Each checkout counter has its own queue, so when you want to buy your groceries, you have to pick one. If you are like me, most of the time the queue you pick will be the one behind the cart that requires a price check on every other item and the customer who is paying with a post-dated third-party check. Customers who were still at home when you got in line will be done paying for their groceries before you get to the front of your line. This demonstrates one problem with scaling out by using multiple queues—a task queued behind a long-running task doesn't get serviced promptly.
The other major issue with multiple queues is that adding a queue requires the rebalancing of tasks between queues, which can lead to a lot of wasted movement between queues. Think of the grocery cart demolition derby that happens when a new checkout counter opens.
Queuing at the Airport
While I hesitate to use airlines as an example of how to do things efficiently, the way most ticket counters work is a better model for efficient queuing than the grocery store. This is because multiple ticket agents service the same queue. There is only one queue so you don't have to worry about getting in the wrong one. If a particular passenger takes a long time, the other ticket agents can continue to service the line (assuming there is more than one ticket agent).
The single queue with multiple readers also scales without difficulty. If the line is getting too long, other agents can be added without disrupting the line. Agents can also leave after servicing their current passenger without causing too much chaos in the queue.
At the risk of stretching this analogy too far, we can use the airport queue to illustrate another common issue with queue-based applications. Think about what happens when several people in the line are part of a group. For example, my family is checking in for a trip. The family is scattered throughout the line because we all arrived at different times. If we want to sit together, an agent will have to reserve a block of seats. If my wife and I end up at different ticket agents at the same time, I might reserve five seats in row 4 and my wife five seats in row 47. This is one of the key problems with multi-reader queues—if related messages are processed on different threads simultaneously, coordination can be difficult. Think of an order header and an order line being processed simultaneously. The reader processing the order line would think there was no header for the order because it wasn't in the database yet. To work correctly, the reader processing the order line would have to retry the check for the order header multiple times to determine whether the header was delayed or actually missing.
An easier way to make this work would be if the first person to get to a ticket agent called all of the related passengers to the front so they could all be processed by the same ticket agent. Service Broker does a similar thing by putting a lock on related messages when one of the messages is received. The reader holding the lock can receive any of the messages in the queue that belong to the same group, but no other reader can receive any of them. This lock is called a conversation group lock. A conversation group is a developer-defined grouping of related messages. For example, all of the dialogs necessary to process a particular order—order header, order lines, inventory, shipping, billing, etc.—might be put into the same conversation group. When a message from one of these conversations is read, a lock is placed on the group so that only the reader holding the lock can process any of the related messages in the queue. It's important to note that only the messages from a single group are locked. There may be hundreds of groups being processed simultaneously, but each group is processed on only one thread at a time. The lock lasts until the transaction in which it was taken commits or rolls back.
The last issue I would like to illustrate with this analogy is what happens when another message from a conversation group arrives after the transaction holding the conversation group commits. In the case of the ticket counter analogy, this would be one of my kids getting to the airport after all the rest of us have checked in. Since the initial transaction is over, the last passenger may get processed by any of the ticket agents. The only way the new agent will know where to seat this passenger is if the initial agent left a note indicating where the rest of the party was seated. In the same way, once a transaction that processes a related group of messages is complete, it must record the "state" of the conversation so that when the next message belonging to this group arrives, the queue reader that receives the message will know where the previous transaction left off. Since this is a database application, the natural place to store this state is in a database table. Service Broker provides a handy way to tie the state of the conversation to the messages in the conversation—the conversation group ID. This is a uniqueidentifier that appears with every message in the conversation group. If this uniqueidentifier is used as a key in the table where the state is stored, it's easy for the message processing logic to find the state that is associated with every message it receives. Also, because only one queue reader can handle messages from a particular conversation group at a time, the developer doesn't have to worry about a state row being updated by two transactions at the same time, thereby causing the loss of state information.
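To make this pattern concrete, here is a minimal Transact-SQL sketch of a state table keyed on the conversation group ID. The table and queue names (OrderState, OrderQueue) and the state columns are illustrative; only the conversation_group_id value and the GET CONVERSATION GROUP statement come from Service Broker itself.

-- Illustrative state table: one row per conversation group (one order in this example).
CREATE TABLE OrderState
(
    conversation_group_id uniqueidentifier NOT NULL PRIMARY KEY,
    order_status          nvarchar(50)     NOT NULL,
    last_updated          datetime         NOT NULL
);

DECLARE @cg uniqueidentifier;

BEGIN TRANSACTION;

-- Lock the next available conversation group on the queue; no other reader
-- can receive this group's messages until the transaction ends.
WAITFOR (GET CONVERSATION GROUP @cg FROM OrderQueue), TIMEOUT 1000;

-- Look up the state saved by the previous transaction for this group.
SELECT order_status
FROM   OrderState
WHERE  conversation_group_id = @cg;

-- ... RECEIVE and process the group's messages, then update OrderState ...

COMMIT TRANSACTION;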
As you can see from these examples, a multi-reader queue is a simple, effective way to scale out a large application. The conversation group locking mechanism provided by Service Broker makes applications that use multi-reader queues as easy to write as applications that use a single-reader queue.
Distribution
So far, we have been talking about queues as if they all exist in a single database. To build the kind of loosely coupled, distributed database applications required for many business scenarios, we have to expand this to include queues in many databases scattered across a network and communicating through reliable messaging. We need reliable messaging because it doesn't make sense to use a database to ensure the integrity of messages in a queue and then take a chance on losing them when they are transferred to another database.
Service Broker uses a reliable messaging protocol called the dialog protocol to ensure that messages sent to a remote queue arrive exactly once and in order. Because dialogs are bidirectional conversations, the dialog protocol supports message transfer in both directions simultaneously.
Dialog messages have a header that ensures that the message gets delivered securely to the proper destination in the correct order. It contains a sequence number, an identifier for the dialog it is in, the name of the service it is being sent to, security information, and some other information that is used to control message delivery. When a destination successfully receives a message, it acknowledges receipt of the message so that the source knows it was delivered successfully. Whenever possible, this acknowledgement is sent in the header of another message going back to the source so the number of messages is minimized. If the source doesn't receive an acknowledgement within a time limit, the message is resent until it is delivered successfully.
Message delivery systems often have problems delivering large messages. A gigabyte-sized message can take several minutes to send, which can effectively tie up a network connection for a significant amount of time. If a network error causes the message to be resent multiple times, network performance can be severely affected. The Service Broker dialog protocol deals with this issue by splitting very large messages up into several smaller fragments which are sent individually. If a network error causes a resend, only the message fragment that failed is resent. This is the reason Service Broker is able to support 2-GB message sizes while many reliable messaging systems can only send messages of 100 MB or less.
Transactional Messaging
"Exactly once" message processing requires transactional messages. To see why this is true, think about an application that crashes part way through processing a message. When the application restarts, how does it know whether to process the message it was processing when it crashed? The database may have already been updated with the results of the message processing already so that reprocessing the message may duplicate the data. As you can see, the only safe way to handle this is to make receiving the message part of the same transaction that updates the database. That way, if the system crashes, both the database update and the message receive are rolled back so the database and the message queue are in the same state as they were before the crash.
Because all Service Broker operations take place in the context of a database transaction, the transactional integrity of messaging operations is preserved. A typical Service Broker message-processing transaction will have the following steps (a Transact-SQL sketch follows the list):
- Begin transaction.
- Receive one or more messages from a conversation group.
- Retrieve the state of the conversation from the state tables.
- Process the messages and make one or more updates to application data based on the message contents.
- Send out some Service Broker messages—either responses to incoming messages or messages to other services required to process the incoming message.
- Read and process more messages for this conversation group if any are available.
- Update the conversation state tables with the new state of the conversation.
- Commit the transaction.
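The following sketch shows roughly what such a transaction looks like in Transact-SQL. The queue name (OrderQueue) and the response message type are placeholders; state handling and application updates are only indicated by comments.

DECLARE @handle  uniqueidentifier,
        @msgtype sysname,
        @body    varbinary(max);

BEGIN TRANSACTION;

-- Receive a message from the queue, waiting up to five seconds for one to arrive.
WAITFOR (
    RECEIVE TOP (1)
        @handle  = conversation_handle,
        @msgtype = message_type_name,
        @body    = message_body
    FROM OrderQueue
), TIMEOUT 5000;

IF @handle IS NOT NULL
BEGIN
    -- Retrieve the conversation state and update the application data here.

    -- Send a response on the same dialog (the message type name is illustrative).
    SEND ON CONVERSATION @handle
        MESSAGE TYPE [//Adventure-Works.com/OrderResponse]
        (N'<Status>Processed</Status>');

    -- Save the new conversation state here (for example, UPDATE OrderState ...).
END

-- Committing makes the receive, the data updates, and the send atomic.
COMMIT TRANSACTION;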
The powerful part of Service Broker transactional messaging is that if the system crashes or the application fails, the transaction rolls back and everything is back to the way it was when the transaction started—the state is unchanged, the application data is unchanged, no messages are sent, and the received messages are back on the queue. This makes error handling in this type of application very straightforward.
Queue-Reader Management
Message processing in a Service Broker application starts when the queue reader receives a message from a queue. Since messages are always pulled from a queue, the receiving application must be running when messages arrive in a queue. This is an issue with many asynchronous messaging applications—how do you ensure that the queue reader will be running when it is required? The two traditional approaches have been to make the queue reader a service that runs continuously or to use triggers that the messaging system fires when each message arrives. The Microsoft Windows service approach means that an application is running even when there are no messages to process. The trigger approach can have performance issues because the queue reader starts and stops frequently.
Service Broker takes a middle ground approach to queue-reader management called activation. To set up activation, the database administrator (DBA) associates a stored procedure with a Service Broker queue. When the first message arrives in the queue, the activation logic will start the specified stored procedure. The stored procedure is responsible for receiving and processing messages until the queue is empty. Once the queue is empty, the stored procedure can terminate to save resources.
If Service Broker determines that messages are being added to the queue faster than the stored procedure is able to process them, the activation logic will start additional copies of the stored procedure until either the stored procedures keep up with the incoming rate or the maximum number of stored procedures that are configured for the queue is reached. Since the number of queue readers servicing the queue expands and contracts as the incoming message rate changes, the right number of queue readers will be running at all times.
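For illustration, activation is configured when the queue is created or altered; the queue and procedure names below are hypothetical.

CREATE QUEUE OrderQueue
    WITH STATUS = ON,
    ACTIVATION (
        STATUS = ON,
        PROCEDURE_NAME = dbo.ProcessOrderQueue,  -- queue-reader stored procedure
        MAX_QUEUE_READERS = 10,                  -- upper limit on parallel readers
        EXECUTE AS SELF                          -- security context for the readers
    );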
Service Broker Programming
Service Broker programming is designed to be familiar to a database programmer. Configuring a Service Broker application is done with the familiar CREATE, ALTER, and DROP data definition language (DDL) statements that are used to configure other database objects. The commands to create Service Broker dialogs and to send and receive messages on them are data manipulation language (DML) extensions to the Transact-SQL language. The RECEIVE command syntax is similar to that of a SELECT command, and it returns a rowset containing messages, just as a SELECT command returns a rowset containing rows. Developers accustomed to Transact-SQL programming will find it very easy to learn Service Broker programming. The client APIs that are used to program Service Broker are the same as the APIs used for all database programming—OLE DB, ODBC (Open Database Connectivity), ADO (ActiveX Data Objects), ADO.NET, and so on.
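A minimal end-to-end sketch of this programming model follows. All of the object names (the message types, contract, queues, and services) are invented for the example; the statements themselves are the DDL and DML extensions described above.

-- Define the message types and the contract for the conversation.
CREATE MESSAGE TYPE [//Adventure-Works.com/OrderRequest]
    VALIDATION = WELL_FORMED_XML;
CREATE MESSAGE TYPE [//Adventure-Works.com/OrderResponse]
    VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT [//Adventure-Works.com/OrderContract]
(
    [//Adventure-Works.com/OrderRequest]  SENT BY INITIATOR,
    [//Adventure-Works.com/OrderResponse] SENT BY TARGET
);

-- Create the queues and the services that expose them.
CREATE QUEUE OrderEntryQueue;
CREATE QUEUE ShippingQueue;
CREATE SERVICE OrderEntryService ON QUEUE OrderEntryQueue;
CREATE SERVICE ShippingService ON QUEUE ShippingQueue
    ([//Adventure-Works.com/OrderContract]);

-- Begin a dialog and send a message; both are Transact-SQL statements.
DECLARE @handle uniqueidentifier;
BEGIN DIALOG CONVERSATION @handle
    FROM SERVICE OrderEntryService
    TO SERVICE 'ShippingService'
    ON CONTRACT [//Adventure-Works.com/OrderContract]
    WITH ENCRYPTION = OFF;

SEND ON CONVERSATION @handle
    MESSAGE TYPE [//Adventure-Works.com/OrderRequest]
    (N'<Order><Number>1001</Number></Order>');

-- On the target side, RECEIVE returns messages as a rowset, just as SELECT returns rows.
RECEIVE TOP (10) conversation_handle, message_type_name, message_body
FROM ShippingQueue;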
Why Do Messaging in the Database?
One of the questions frequently asked about Service Broker is, "Why build messaging into the database? Aren't there already enough reliable messaging systems out there?"
I hope that the information presented in this paper has helped to explain the decision to build Service Broker into the database engine, but here are a few more reasons why it makes sense to have messaging in the database:
- Single client connection for messages and data. In addition to the unified programming model mentioned in the previous section, this offers some other substantial advantages:
- An application can receive messages transactionally when running on any client that can connect to the database. Many messaging systems only allow transactional receives when the receiver is running on the same computer as the queue.
- Transactional messaging doesn't require a distributed transaction or two-phase commit, unlike message systems that don't store messages in the database.
- Integrated management, deployment, and operations between data and messages. All the tools and techniques you use to safeguard and manage your database data apply equally well to messages:
- Backing up and restoring the database also backs up and restores queued messages.
- If you use clustering or database mirroring to protect your database from failures, your messages enjoy the same protection.
- Since queues have relational views available, it's simple to find out what's happening in the queue. Want to know how many messages are in the queue? SELECT COUNT(*) FROM the queue. Want to know which messages haven't been delivered yet? SELECT * FROM sys.transmission_queue. If your messages contain XML data, you can use XQuery to search them. You can join messages in the queue with state information and even data tables to find out the complete status of a particular order in your order entry system. (A sketch of these queries follows this list.)
- There are also some significant performance advantages to messaging that is built into the database.
- As previously mentioned, two-phase commits are not required for transactional messages.
- The message updates, the state changes, and the data updates are all logged in the same transaction log so only a single log write is required to commit the transaction.
- Reliable message delivery usually transfers the message from a transmit queue to a receive queue. If Service Broker detects that the receive queue is in the same database instance as the transmit queue, the message is put directly onto the receive queue, thereby saving extra I/O and a transaction commit.
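Here is a sketch of the kinds of queries described above; the queue name (ShippingQueue) and the XML element names are hypothetical.

-- How many messages are waiting in a queue?
SELECT COUNT(*) FROM ShippingQueue;

-- Which outgoing messages haven't been delivered yet, and why?
SELECT to_service_name, enqueue_time, transmission_status
FROM sys.transmission_queue;

-- Use XQuery to find messages about a particular order.
SELECT conversation_handle,
       CAST(message_body AS xml).value('(/Order/Number)[1]', 'int') AS order_number
FROM ShippingQueue
WHERE message_type_name = '//Adventure-Works.com/OrderRequest';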
Scenarios
Now that we've seen how Service Broker works, let's look at a few of the many possible applications that you can build with Service Broker.
Order Entry
Order entry is a commonly used scenario because it's one process that just about everyone understands. Even developers who haven't worked on an order entry system have at least used one to order something.
The Service Broker-based order entry system uses queues to connect the subsystems of the application. This not only increases parallelism and improves throughput, but it also provides a great deal of flexibility in configuration and architecture.
In this scenario, Service Broker is used to connect the order entry service to four loosely coupled services that are used to process each order. As the order entry service inserts the order headers and order lines into the database, it queues messages to the billing, shipping, inventory, and credit limit services to finish processing the order. Service Broker allows all four services to run in parallel, improving the response time of the system.
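A sketch of this fan-out in Transact-SQL might look like the following; the service, contract, and message type names are invented, and error handling is omitted. The RELATED_CONVERSATION option puts all four dialogs into one conversation group so that their responses are processed together.

DECLARE @shipping  uniqueidentifier,
        @billing   uniqueidentifier,
        @inventory uniqueidentifier,
        @credit    uniqueidentifier,
        @order     xml;

SET @order = N'<Order><Number>1001</Number></Order>';

BEGIN TRANSACTION;

-- Insert the order header and order lines here, then queue the follow-on work.
BEGIN DIALOG @shipping FROM SERVICE OrderEntryService TO SERVICE 'ShippingService'
    ON CONTRACT [//Adventure-Works.com/OrderContract] WITH ENCRYPTION = OFF;
BEGIN DIALOG @billing FROM SERVICE OrderEntryService TO SERVICE 'BillingService'
    ON CONTRACT [//Adventure-Works.com/OrderContract]
    WITH RELATED_CONVERSATION = @shipping, ENCRYPTION = OFF;
BEGIN DIALOG @inventory FROM SERVICE OrderEntryService TO SERVICE 'InventoryService'
    ON CONTRACT [//Adventure-Works.com/OrderContract]
    WITH RELATED_CONVERSATION = @shipping, ENCRYPTION = OFF;
BEGIN DIALOG @credit FROM SERVICE OrderEntryService TO SERVICE 'CreditLimitService'
    ON CONTRACT [//Adventure-Works.com/OrderContract]
    WITH RELATED_CONVERSATION = @shipping, ENCRYPTION = OFF;

SEND ON CONVERSATION @shipping  MESSAGE TYPE [//Adventure-Works.com/OrderRequest] (@order);
SEND ON CONVERSATION @billing   MESSAGE TYPE [//Adventure-Works.com/OrderRequest] (@order);
SEND ON CONVERSATION @inventory MESSAGE TYPE [//Adventure-Works.com/OrderRequest] (@order);
SEND ON CONVERSATION @credit    MESSAGE TYPE [//Adventure-Works.com/OrderRequest] (@order);

-- The four messages are delivered only if this transaction commits.
COMMIT TRANSACTION;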
Depending on the business requirements of the system, the order entry service can either wait for responses from all four services before it returns to the user, or it can return as soon as the initial transaction is committed and let the rest of the services execute in the background. Implementing either behavior is a minor change to the order entry service. The other services are the same in either case.
Using Service Broker to link the services also offers several deployment options. All five services can run on the same server, or they can be split up across up to five servers (or more if services are load balanced) as required to provide the necessary throughput. The services can also be deployed as stored procedures or as external applications. This flexibility is especially attractive to Independent Software Vendors (ISVs), who can create a single code base and deploy it in a wide variety of configurations depending on the customer's performance, redundancy, and throughput requirements.
Figure 1 illustrates the relationships among the queues and services that constitute the loosely coupled order entry system.
Figure 1. Loosely coupled order entry system
Parallel Stored Procedure
When triggers were first invented, database companies used to talk about using a trigger on an inventory table to automatically place a purchase order when the inventory level dropped too low. While this was a great concept, very few systems use it because the extra overhead of executing a lot of code in a trigger makes the database update too slow. Service Broker makes this kind of application possible by allowing the trigger to queue the work to be done so that the trigger can complete and allow the original transaction to commit. The queued work is then done later in a separate transaction—maybe even in a different database. SQL Server 2005 uses this model for query notifications and event notifications.
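As a sketch of the trigger-based reorder idea, the trigger below only queues a message; the actual purchase order is created later by whatever service reads the queue. The table, column, service, contract, and message type names are all hypothetical.

CREATE TRIGGER trg_Inventory_Reorder
ON Inventory
AFTER UPDATE
AS
BEGIN
    DECLARE @handle  uniqueidentifier;
    DECLARE @request xml;

    -- Build a reorder request for every item that fell below its reorder point.
    SELECT @request =
        (SELECT i.ItemId, i.QuantityOnHand
         FROM inserted i
         WHERE i.QuantityOnHand < i.ReorderPoint
         FOR XML AUTO, ROOT('ReorderRequest'), TYPE);

    IF @request IS NOT NULL
    BEGIN
        -- Queue the work; it will be processed later, in a separate transaction.
        BEGIN DIALOG @handle
            FROM SERVICE InventoryService
            TO SERVICE 'PurchasingService'
            ON CONTRACT [//Adventure-Works.com/ReorderContract]
            WITH ENCRYPTION = OFF;

        SEND ON CONVERSATION @handle
            MESSAGE TYPE [//Adventure-Works.com/ReorderRequest] (@request);
    END
END;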
Service Broker also allows a stored procedure to start several other stored procedures in parallel. This can significantly improve response time. For example, consider a call center application that uses caller ID on incoming calls to look up all the information about the customer who is calling so that the customer service representative has all the relevant information available. In many cases, this information must be obtained from different databases on different systems. Using remote queries to obtain all this information is possible, but the response time may not be acceptable. Service Broker can queue requests to all the remote services simultaneously and wait for the results in its input queue. Since all the requests are processed in parallel, the overall response time is improved. Figure 2 illustrates this processing.
Figure 2. Parallel stored procedure
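A gather loop along the lines just described might look like this; it assumes that requests were already sent on three dialogs to the remote services, and the queue and message type names are invented.

DECLARE @remaining int;
DECLARE @handle    uniqueidentifier;
DECLARE @msgtype   sysname;
DECLARE @body      varbinary(max);

SET @remaining = 3;   -- one response expected from each of three remote services

WHILE @remaining > 0
BEGIN
    SET @handle = NULL;   -- reset so a timeout can be detected below

    WAITFOR (
        RECEIVE TOP (1)
            @handle  = conversation_handle,
            @msgtype = message_type_name,
            @body    = message_body
        FROM CallCenterQueue
    ), TIMEOUT 2000;

    IF @handle IS NULL
        BREAK;   -- timed out; return whatever information has arrived so far

    IF @msgtype = N'//Adventure-Works.com/CustomerInfoResponse'
    BEGIN
        -- Accumulate the customer information from @body here.
        SET @remaining = @remaining - 1;
    END
END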
Batch Processing
One of the significant use cases for Service Broker applications is a large batch-processing system. Most batch processes are made up of many small, semi-independent processes that must be scheduled and coordinated. Independent execution of subprocesses improves throughput by allowing each subsystem to execute at its own optimal pace.
In the example illustrated in Figure 3, the inputs to the batch planning process—orders, forecasts, returns, etc.—are accumulated in input queues throughout the day. When the planning engine runs, it reads inputs from the queues, analyzes them, and then queues requests to the subprocesses that process the plan outputs. The output queues allow the subprocesses to execute independently, in parallel on one server or on a number of servers. This allows the process to scale out to as many servers as are required to handle the processing load.
Figure 3. Batch processing
Travel Booking
In a previous life, I taught a distributed database class for a Master's degree program. One example I used for distributed transactions was a travel agent that booked plane reservations, hotel rooms, and rental cars as part of the same transaction because the customer wouldn't want to book the hotel if the airline trip wasn't available. Like many examples, this worked until a programmer from a real travel agency showed up in class one day. He told me that no real travel agency could work this way. Airlines won't let anyone hold locks on their reservation tables and seat maps while an agent looks for a hotel reservation. Reservations are made based on current availability and if the seat or room is gone when the real reservation is made, the customer is called back to make sure the new reservation is OK.
In the Travel Booking Web Site scenario, bookings are made based on data in the flight and hotel availability tables. These tables are populated with information from the airlines and hotels. The information is updated frequently but it is always a little out of date. The actual booking doesn't take place until after the customer has completed booking his trip and possibly even after he has logged off.
This kind of delayed activity is ideally suited to the Service Broker architecture. The transaction that records the customer's travel arrangements also queues messages to back-end services that do the actual booking. The booking services read messages from the queue and process each booking in a separate transaction. The booking service communicates through a variety of protocols to the systems that book the airline and hotel reservations. These communications might include XML Web services, SNA (Systems Network Architecture), HTTP, EDI (Electronic Data Interchange), faxes, Service Broker, etc. Because the input to the booking service is queued, the different latencies of these protocols aren't a problem. If the queue gets too long, Service Broker activation can start more queue readers to handle the load. If the volume reaches a point where the booking servers can't keep up, more servers can be added simply by adding rows to the routing tables.
When all the reservations for a particular itinerary have been booked successfully, a message is queued to the SQL Mail server to send a confirmation e-mail message to the customer. If one or more of the reservations fails, a customer service representative is notified to help the customer rebook the trip.
The Process Updates service runs in the background and periodically receives availability information from hotels and airlines. This information is massaged into a common format and then published to the servers in the Web farm through Notification Services to update their availability tables.
The loosely coupled Service Broker architecture allows Web servers to be added to the server farm by loading the software, configuring the database, and subscribing to the availability feeds. This can all be scripted so it happens with a minimal amount of manual intervention. To add more back-end servers to the booking services, the service must be installed and configured and then the new server address must be added to the routing tables in the Web servers.
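In Service Broker terms, adding a back-end server address to a Web server's routing tables is a CREATE ROUTE statement; the route name, service name, and address below are hypothetical.

-- Run in each Web server's database to point the booking service at an
-- additional back-end server; with more than one matching route, new
-- conversations can be spread across the back-end servers.
CREATE ROUTE BookingRoute2
    WITH SERVICE_NAME = 'BookingService',
         ADDRESS = 'TCP://booking2.corp.example.com:4022';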
Figure 4. Travel booking
Conclusion
SQL Server 2005 Service Broker is a new platform for building distributed asynchronous database applications. Including an asynchronous, reliable messaging feature in the SQL Server database makes it possible to build a variety of database applications that were difficult, if not impossible, to build before. Service Broker, combined with the other developer features in SQL Server 2005 such as CLR integration and the XML data type, makes SQL Server 2005 a great platform for building highly scalable database applications.