Optimizing Performance in a Microsoft Message Queue Server Environment

 

May 1998

Microsoft Corporation

Summary: Message queue-based communication using Microsoft® Message Queue Server (MSMQ) offers applications the promise of extremely fast communication. This guide helps the application developer and systems administrator to understand the aspects of application design that affect performance most and use MSMQ to optimize application performance.

Contents

Introduction
Application Design
Hardware Configuration
MSMQ Performance Measurements
For More Information

Introduction

Message queue-based communication using Microsoft® Message Queue Server (MSMQ) offers applications the promise of extremely fast communication. In fact, the performance of MSMQ can meet and exceed the performance of most other communication technologies on the Microsoft Windows NT® operating system. That said, optimal performance is not automatic. The application developer and systems administrator both have important roles to play. This article will help developers understand which aspects of application design affect performance the most. Administrators will find valuable information regarding hardware and network configuration. Finally, a series of tables lists the results from benchmarks run by Microsoft. These results will help developers and administrators understand the impact of design and configuration alternatives.

Application Design

Because MSMQ is designed to provide a number of automatic performance optimizations, most MSMQ applications will perform reasonably well with no special design attention. It is the rare MSMQ-based application, however, that will not benefit from one or more of the optimizations that developers can exploit. Also, there are MSMQ features that involve processing and network overhead; if they are not needed, simply knowing how to turn them off can provide significant performance gains. This section lists many of the optimizations that MSMQ provides and describes the overhead of options that may not need to be used all of the time.

Single Machine Optimizations

Any time sending and receiving applications run on the same machine as the queues they are using (that is, as local applications), performance will be better than when a network connection is required. The simple reason is that operations that involve moving or copying messages (that is, data) can occur faster at memory speeds than at network speeds. In addition, MSMQ provides a performance optimization to local applications that remove messages quickly from their queues. If a receiving application is running and waiting to receive a message, MSMQ copies the message from the sender directly into the receiver's incoming message buffer. As long as the receiver removes the message quickly, MSMQ can avoid the overhead of writing the message to a queue.

Recoverable Mode Considerations

MSMQ supports two modes of message delivery: express and recoverable. Express messages are stored in RAM during routing and delivery, providing extremely fast performance but no recoverability when machines fail. Recoverable messages are written to disk during routing and delivery, making them somewhat slower than express messages but ideal when failures cannot be tolerated and when machine shutdowns are expected to occur while messages remain in queues (for example, a mobile application running on a laptop). If a machine crashes immediately after MSMQ accepts a message for delivery, but before the message is delivered to the target queue, MSMQ will find the message on disk when the service restarts and will resume the sending process automatically. In a similar fashion, when an application reads a recoverable message from a queue, MSMQ makes a record of the read operation on disk. Even if the machine crashes immediately after the read operation occurs, MSMQ will not deliver the same copy of the message again when the machine restarts (although another copy of the message may be resent by the sending queue manager; transactional queues are required for once-only delivery). Understanding more about the way MSMQ implements recoverable messaging enables developers to write significantly faster applications.

MSMQ 1.0 stores all messages in queues based on a memory-mapped file structure. In the case of recoverable messages, MSMQ flushes changed memory segments to disk before confirming a successful outcome to the application (express messages use the same memory-mapped structure except MSMQ skips flush-to-disk operations). That said, MSMQ is designed to minimize write operations and is able to avoid writing recoverable messages to disk when the MSMQ queue manager is able to confirm message delivery to the receiving application within a short internal timeout period. This condition generally occurs when the receiving application is online and waiting for messages to arrive, network connections are available, and the target queue depth is close to zero.

When the sending queue manager and the target queue are on the same machine (the same queue manager is used), one write operation is avoided. When the sending queue manager and the target queue (and its associated queue manager) are on different machines, two write operations can be eliminated. Because remote reading from the target queue does not affect this behavior, the receiving application can be on a different machine than the target queue in either case. Using efficient dequeuing techniques, such as asynchronous notifications, will maximize the chance that queue depths stay close to zero, thereby enabling this optimization to occur.

The Multiple Sender Optimization

As mentioned in the preceding section, MSMQ periodically needs to write information to disk, and confirm that the write operation occurred, before reporting a successful operation to the calling application. When a single application, with a single thread, uses the MSMQ service on a given machine, the performance of the application effectively is limited by the time it takes to write information to the disk. When there are multiple threads (which can be part of the same or different applications) sending messages, however, MSMQ 1.0 is able to batch write operations and minimize the impact of disk delays.

For example, as shown in Table 3, a benchmark application is able to send 587 messages per second using one thread to send messages and 834 messages per second with three threads sending messages. The same improvement is seen when three copies of the single-threaded version of the application are started. Increasing the number of sending applications/threads will continue to improve message throughput up to about 50 applications/threads per machine. With 50 senders, for example, it is reasonable to expect up to six times the number of messages per second possible from a single application/thread.

MSMQ essentially combines the disk write operations required by multiple sending threads into a single operation. Because the time it takes to write extra data in an existing write operation is much less than the time required to begin, execute, and confirm a second (or third, and so on) stand-alone write operation, this optimization dramatically reduces the average time that it takes to send a message. This optimization does not affect the recoverability of send requests because no single application thread ever has more than one pending disk write request.
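The effect of combining writes can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not a description of MSMQ internals: it assumes each combined disk write pays one fixed overhead plus a small incremental cost per message, and the two cost constants are hypothetical values chosen for illustration.

```python
def throughput(senders: int, flush_overhead_ms: float, per_message_ms: float) -> float:
    """Messages per second when one combined write serves every sender."""
    if senders < 1:
        raise ValueError("senders must be at least 1")
    # Each sender has at most one pending message, so a batch carries
    # `senders` messages and pays the fixed write overhead only once.
    batch_time_ms = flush_overhead_ms + senders * per_message_ms
    return senders * 1000.0 / batch_time_ms


# Hypothetical costs, chosen only to show the shape of the curve.
FLUSH_OVERHEAD_MS = 10.0
PER_MESSAGE_MS = 0.5

for n in (1, 3, 10, 50):
    print(n, round(throughput(n, FLUSH_OVERHEAD_MS, PER_MESSAGE_MS)))
```

In this model throughput rises with the number of senders and levels off as the per-message cost begins to dominate the fixed overhead, which is the general shape described above.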

MSMQ 1.0 does not implement an equivalent optimization for read operations. Therefore, the throughput of applications that read from queues will remain largely unaffected by the number of applications/threads running on a single machine.

Transaction Coordination Alternatives

MSMQ further differentiates message types by segmenting recoverable message types into transactional and nontransactional subtypes. Transactional messages can only be manipulated (for example, sent, received, and so on) from within transactional units of work and roll back to their prior state when transactions abort. Including sending and receiving operations in a transaction enables developers to:

  • Ensure that multiple message queuing operations occur together or are rolled back to appear that none occurred. For example, a developer may not want to send a message to a checking account application unless a corresponding message to a savings account system can also be sent.
  • Ensure that message queuing operations succeed or fail along with other actions that occur in the transaction. For example, application logic may require that a database update should not occur if a message cannot be sent to an auditing application.

As a general note, the performance of applications that use transactions will be slower than those that don't because, due to the nature of the two-phase commit protocols used to implement transactions, messages actually get written to disk twice, and greater CPU resources are required. In order to optimize resource consumption, however, MSMQ provides two ways for developers to use transactions:

  • External transactions, coordinated by the Microsoft Distributed Transaction Coordinator (DTC)
  • Internal transactions, coordinated by MSMQ itself

Both options enable applications to include a set of MSMQ operations in a single transaction. However, external transactions are required if resources other than MSMQ need to be part of the transaction. The online documentation titled MSMQ SDK Help contains information and code examples for both types of transactions.

Internal transactions are provided by MSMQ to give optimized performance to applications that need transactions for message queuing operations but do not require other resources to be coordinated. For these applications, MSMQ internal transactions are significantly faster than external DTC-based transactions. For example, as shown in Table 6, a benchmark application using five threads to receive 1,000-byte messages is able to receive 101 messages per second using external transactions and 283 messages per second using internal transactions.

In the case of both internal transactions and external transactions, disk activities take place only when the application attempts to commit the transaction. Performance (in terms of messages sent per second) will therefore improve as the number of message operations per transaction rises. While application logic requirements usually dictate the boundaries of a transaction, developers should look for ways to reduce the total number of transactions required to send a group of messages to maximize performance.
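Because disk activity is tied to commits, the number of commits is a useful first estimate of transactional overhead. The following minimal sketch only counts commits for a group of messages; the function name and the sample message counts are illustrative assumptions, not part of MSMQ.

```python
def commits_needed(total_messages: int, messages_per_transaction: int) -> int:
    """Number of commits required to send a group of messages."""
    if total_messages < 0 or messages_per_transaction < 1:
        raise ValueError("invalid message or transaction size")
    # Integer ceiling division: a partly filled last transaction still commits.
    return (total_messages + messages_per_transaction - 1) // messages_per_transaction


for batch in (1, 3, 10):
    print(batch, commits_needed(10_000, batch))
```

Sending 10,000 messages one per transaction requires 10,000 commits, while grouping ten messages per transaction reduces that to 1,000.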

Also, because MSMQ transactions use recoverable messages, all of the guidelines included in this White Paper for optimizing recoverable mode operations apply to applications that use transactions as well. However, because transactions require more CPU resources than most MSMQ operations, disk performance has less impact on the performance of transactional operations than nontransactional operations. This effect can be seen in Table 5; with five sending threads, CPU use is close to 100 percent. In most recoverable mode benchmark results, CPU utilization is considerably lower.

Locating and Opening Queues

When applications locate queues in the MSMQ Directory Service (for example, using MQLocateBegin()) or open a public queue for reading or writing (for example, using MQOpenQueue()), MSMQ makes a remote procedure call (RPC) over the network to one of the site's directory servers. Because network operations are always expensive from a performance perspective, applications that minimize locate and open calls will perform better than those that don't. For example, applications can locate and open queues once, when they start, instead of opening and closing queues for each message.
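The open-once pattern can be sketched as a small handle cache. This is an illustration only: fake_open_queue is a hypothetical stand-in for an expensive open call (it is not the MSMQ API), and the queue name is invented.

```python
from typing import Callable, Dict


class QueueHandleCache:
    """Open each queue once and reuse the handle for later operations."""

    def __init__(self, open_queue: Callable[[str], object]) -> None:
        self._open_queue = open_queue
        self._handles: Dict[str, object] = {}

    def get(self, queue_name: str) -> object:
        handle = self._handles.get(queue_name)
        if handle is None:
            # Only the first request for a queue pays the expensive open call.
            handle = self._open_queue(queue_name)
            self._handles[queue_name] = handle
        return handle


open_calls = []


def fake_open_queue(queue_name: str) -> object:
    """Hypothetical stand-in for an expensive open; records each call."""
    open_calls.append(queue_name)
    return ("handle", queue_name)


cache = QueueHandleCache(fake_open_queue)
for _ in range(1000):
    cache.get(r"machine\orders")  # hypothetical queue name

print(len(open_calls))  # prints 1
```

A thousand sends trigger a single open call; an application that opens and closes the queue per message would pay the network round trip every time.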

Authentication and Encryption Considerations

By default, MSMQ does not encrypt or authenticate messages. When sending authenticated messages, MSMQ computes a hash value of each message using the sender's cryptographic certificate and attaches the value to the message. When the message reaches the destination machine, MSMQ recomputes the hash value and compares it to the value attached to the message. If the values match, MSMQ can verify that the message came from the sender listed on the message and that the message was not tampered with during delivery. MSMQ is also able to encrypt messages using certificates to ensure that message contents cannot be viewed or changed by unauthorized applications. Both authenticating and encrypting messages incur a performance cost. This cost is primarily against CPU resources and is difficult to predict in a general sense.
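The compute, attach, recompute, and compare sequence can be sketched as follows. Note the substitution: this sketch uses a keyed HMAC-SHA256 from the Python standard library as a stand-in, whereas MSMQ uses the sender's certificate, and the article does not state which algorithms MSMQ applies. The key and message contents are hypothetical.

```python
import hashlib
import hmac


def compute_digest(body: bytes, key: bytes) -> bytes:
    """Sender side: hash the message body so the value can be attached."""
    return hmac.new(key, body, hashlib.sha256).digest()


def verify(body: bytes, attached_digest: bytes, key: bytes) -> bool:
    """Receiver side: recompute the hash and compare it to the attached value."""
    return hmac.compare_digest(compute_digest(body, key), attached_digest)


key = b"hypothetical-shared-key"  # stand-in for certificate-based keys
body = b"debit account 42"
digest = compute_digest(body, key)

print(verify(body, digest, key))                 # prints True
print(verify(b"debit account 43", digest, key))  # prints False
```

Both sides hash every message, which is why the cost of authentication falls mainly on the CPU and grows with message size.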

Message Header Considerations

MSMQ messages are composed of a body and a set of properties contained in a header. Because longer messages take more time and resources to deliver than shorter ones, minimizing the size of messages will help to optimize performance. Minimizing the size of the message body will usually have the greatest effect, but in cases where the body is small, the size of the header can represent a significant percentage of a message's size. MSMQ helps to minimize header sizes by sending only the properties that are set explicitly by the application; other properties assume their default values.

One property that MSMQ does send by default is the sender's Security Identifier (SID). By default, MSMQ inserts the Windows NT SID of the sender's security context into the PROPID_M_SENDERID property of each message. Normally, a Windows NT SID is about 28 bytes in length, but may be as long as 68 bytes in cases where there are many subauthorities listed (up to 15). To specify that the sender's SID should not be sent, the sending application should set the PROPID_M_SENDERID_TYPE property on the message equal to MQMSG_SENDERID_TYPE_NONE (the default is MQMSG_SENDERID_TYPE_SID). The minimum size of the message header for MSMQ 1.0, with no SID or other properties specified, is approximately 136 bytes.

It is important to note that the sender's SID can be trusted by the receiver only when using authenticated MSMQ messages. The Windows NT SID of a user is not a secret; malicious applications could form an MSMQ message with the SID of another user. Only authenticated messages, where MSMQ signs the message with the sender's certificate, ensure that the SID contained within the message can be trusted. In general, if the receiving application doesn't use the PROPID_M_SENDERID property, or the message is not sent in authenticated form (where MSMQ requires the SID for processing, and can verify the SID), applications can improve performance by not sending a SID.
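The 28-byte and 68-byte SID sizes quoted above follow from the binary SID layout: an 8-byte fixed part (revision, subauthority count, and a 6-byte identifier authority) followed by 4 bytes per subauthority, with at most 15 subauthorities. A minimal sketch of that arithmetic, with illustrative function and constant names:

```python
SID_FIXED_BYTES = 8        # revision (1) + subauthority count (1) + identifier authority (6)
SUBAUTHORITY_BYTES = 4     # each subauthority is a 32-bit value
MAX_SUBAUTHORITIES = 15


def sid_size(subauthority_count: int) -> int:
    """Size in bytes of a binary SID with the given number of subauthorities."""
    if not 0 <= subauthority_count <= MAX_SUBAUTHORITIES:
        raise ValueError("a SID holds between 0 and 15 subauthorities")
    return SID_FIXED_BYTES + SUBAUTHORITY_BYTES * subauthority_count


print(sid_size(5))   # prints 28
print(sid_size(15))  # prints 68
```

A SID with five subauthorities accounts for the typical 28 bytes, and the 15-subauthority maximum gives the 68-byte upper bound.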

Acknowledgment Messages

Applications can direct MSMQ to generate acknowledgment messages that report the success or failure of an individual message's delivery. By default, MSMQ doesn't generate acknowledgments, but applications may request acknowledgments by setting the PROPID_M_ACKNOWLEDGE property on a given message. From a delivery perspective, MSMQ treats acknowledgment messages like any other message, so using acknowledgments consumes resources and lessens performance. Because applications can select acknowledgment messages on a message-by-message basis, developers should use them only when required.

Journaling Considerations

MSMQ makes it easy for applications to keep copies of messages in journal queues. Messages can be journalled on the sending machine on a message-by-message basis or on the receiving machine on a queue-by-queue basis. MSMQ always uses recoverable messages for journal operations and all performance guidelines for recoverable messages apply.

Hardware Configuration

Like other applications and services that run on Windows NT Server, the MSMQ basic performance characteristics are heavily influenced by the performance of the underlying Windows NT operating system. In that sense, it is important to ensure that machines have the resources required to run Windows NT efficiently before focusing on MSMQ performance. Once basic Windows NT requirements are met, administrators need to understand the minimum resource requirements of MSMQ. These requirements must be addressed in order for MSMQ to operate efficiently as a service. Here, the MSMQ Administrator's Guide is an excellent resource. Most importantly, applications affect the way that MSMQ consumes specific system resources, such as RAM, in predictable ways. Optimizing the configuration of these resources can have a dramatic effect on the performance of MSMQ-based applications. The following sections identify MSMQ performance characteristics that are affected by applications and suggest ways to improve performance by way of resource configuration.

System Memory Size

As mentioned earlier, MSMQ supports two types of messages: express and recoverable. Recoverable messages are always written to disk (so that MSMQ can recover them in the event of a machine failure) and express messages are kept entirely in RAM while awaiting routing and delivery. In both cases, however, working copies of messages are kept in RAM; MSMQ only accesses disk-based copies of recoverable messages in the event of a failure. When RAM is exhausted, Windows NT has to swap memory pages out to disk, which degrades performance.

To maximize performance, there should be enough memory on a given machine to hold all of the messages that are expected to accumulate in its queues under normal operation. Messages may accumulate on the sending machine (for example, if the target machine is unreachable), on the target machine (for example, if the receiving application is not running, or is unable to keep up with the arrival rate of messages), or on intermediate routing servers.

Calculating the amount of RAM required to hold all messages requires an understanding of message sizes. The size of a message is the sum of the size of the message body and the size of the data kept in the message header. Message headers typically contain approximately 150 bytes of data, although the actual size of a given header is dependent on the number and size of the properties used by the application. Therefore, when sending 20,000 messages of 1 KB each with typical headers, it would be best to have at least 23 MB (20,000 × 1 KB + 20,000 × 150 bytes) of available RAM beyond minimal system requirements. Note that this recommendation applies only to cases where messages actually accumulate on a machine. If messages are normally dequeued and processed as quickly as they are delivered, significantly less RAM will be required.
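The sizing rule above can be written as a short calculation. This sketch assumes decimal units (1 KB = 1,000 bytes and 1 MB = 1,000,000 bytes), which is what reproduces the 23 MB figure; the function name is illustrative.

```python
TYPICAL_HEADER_BYTES = 150  # approximate header size quoted in the text


def queue_ram_bytes(message_count: int, body_bytes: int,
                    header_bytes: int = TYPICAL_HEADER_BYTES) -> int:
    """RAM needed to hold messages that accumulate in queues on one machine."""
    return message_count * (body_bytes + header_bytes)


required = queue_ram_bytes(20_000, 1_000)
print(required / 1_000_000)  # prints 23.0 (MB, decimal units)
```

For small message bodies the header term dominates: 20,000 messages of 10 bytes each still need about 3.2 MB under the same assumptions.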

Number of Hard Disks

Because Windows NT and MSMQ are able to perform many disk I/O operations in parallel, configuring system, application and MSMQ components to use separate physical disks (as opposed to single disks with multiple partitions) will result in performance improvements for most MSMQ-based applications. Applications that use recoverable messages will see the most improvement because messages are constantly being written to, and read from, disk storage.

The MSMQ Administrator's Guide, under the section titled Improving Messaging Performance with Multiple Disks, describes in detail how to improve MSMQ performance by splitting Windows NT, MSMQ components, and the application's database across separate disks. In summary of what is described there, maximum performance will be obtained when five separate physical disks are used for:

  • MSMQ message files
  • MSMQ message log files
  • MSMQ transaction log files
  • Windows NT virtual memory paging files
  • application data files

In addition, when applications use Microsoft Distributed Transaction Coordinator (DTC) or a database such as Microsoft SQL Server™, configuring their log files to use separate disks can also yield performance gains. Locating log files on separate disks is faster because it enables the sequential write operations used by logging algorithms to occur extremely quickly.

In situations where it is not practical to have many disks installed in a machine, it is important to note that three disks will usually provide most of the achievable performance gain. For example, performance from a three-disk configuration will usually be two to three times better than from a single-disk configuration. A five-disk configuration will typically be 50 percent faster than a three-disk configuration.

Hard Disk Type

Not all hard disks deliver equivalent performance. In particular, high-performance disks are available that use hardware striping and battery-protected write-back disk controllers that defer write operations until they can be performed most efficiently. These disks will significantly improve the performance of MSMQ, especially when sending or receiving recoverable messages. While performance of disks will vary by manufacturer and other factors, Microsoft has measured 10 times better message throughput from a three-way hardware-striped disk with a 2-MB battery-protected controller than from a standard SCSI II disk.

Registry Key Settings

MSMQ enables developers and administrators to set many different parameters that affect MSMQ performance through the MSMQ Control Panel applet and the MSMQ Explorer administration tool. These parameters are described in detail in the MSMQ Administrator's Guide found in the online documentation. Beyond these parameters, several esoteric aspects of an MSMQ installation can be manipulated through changes to registry keys. Refer to the REGENTRY.HLP file, under MSMQ Parameters Subkey, in the MSMQ Resource Kit documentation for details. As always, modify registry keys with great caution.

MSMQ Performance Measurements

This section of the article presents a series of careful performance measurements taken by Microsoft Corporation in its MSMQ performance lab. It is important to note that there are many factors that influence MSMQ performance. In fact, virtually every aspect of an application environment, such as machine configuration and network load, will have some effect on performance. Therefore, any set of measurements (including the measurements below) can only serve as an indication of performance potential and is likely to differ somewhat from measurements taken in virtually any other application environment.

It is also important to note that the size of the disks and amount of RAM used for the measurements are larger than required by most applications to experience optimized performance. Microsoft selected these configurations to ensure that measurements were not constrained by hardware configurations and to be able to run tests with many messages in each iteration to observe reliable and consistent results.

Single Machine, High-Performance Disk Scenario

This scenario measures the performance of MSMQ when used between two applications running on the same machine. Because message queuing applications frequently perform queue operations within one machine (such as reading a message from one queue and writing it to another), these measurements are a good indication of MSMQ performance when no network activity is required.

Measurements in this section were performed with the following machine configuration:

  • Compaq Proliant 5000; single 200MHz Pentium Pro processor; 256 MB of system RAM
  • Windows NT 4.0 Server, Enterprise Edition; Service Pack 3.
  • Four physical hard disk drives:
    • Disk1: MSMQ message storage files; 4-GB capacity; eight-way hardware striped disk with Compaq SmartArray 2 controller
    • Disk2: Windows NT 4.0 system page file; 4-GB capacity; SCSI II controller
    • Disk3: MSMQ transactional log; 4-GB capacity; SCSI II controller
    • Disk4: DTC log; 4-GB capacity; SCSI II controller

Measurements

Table 1. Messages Sent, 1 Thread, Express Mode

Metric \ Message size   10 bytes   1,000 bytes   2,000 bytes   4,000 bytes
Messages per Second     7,564      6,630         5,547         4,510
CPU Utilization         100%       100%          100%          100%

Table 2. Messages Received, 1 Thread, Express Mode

Metric \ Message size   10 bytes   1,000 bytes   2,000 bytes   4,000 bytes
Messages per Second     13,986     12,512        10,026        10,006
CPU Utilization         100%       100%          100%          100%

Table 3. Messages Sent, Recoverable Mode, No Transactions

Threads         Metric \ Message size   10 bytes   1,000 bytes   10,000 bytes
One Thread      Messages per Second     594        587           302
                CPU Utilization         34%        30%           31%
Three Threads   Messages per Second     1,225      834           554
                CPU Utilization         55%        52%           30%

Table 4. Messages Received, Recoverable Mode, No Transactions

Threads         Metric \ Message size   10 bytes   1,000 bytes   10,000 bytes
One Thread      Messages per Second     1,400      1,251         1,251
                CPU Utilization         32%        33%           25%
Three Threads   Messages per Second     1,406      1,393         1,362
                CPU Utilization         33%        34%           12%

The measurements shown in Tables 5 and 6 were performed using a large number of transactions with one message per transaction. As detailed in the section above on transactional performance, sending multiple messages per transaction would have yielded higher message per second results by spreading the per-transaction overhead across several messages. The measurements shown below, however, provide a better perspective of the relative overhead of transactional operations.

Table 5. Messages Sent, Recoverable Mode, Transactions as Indicated

                                        10 bytes         1,000 bytes      10,000 bytes
Threads         Metric \ Message size   DTC   Internal   DTC   Internal   DTC   Internal
One Thread      Messages/Sec            4     175        4     181        4     159
                CPU Utilization         6%    78%        6%    80%        7%    72%
Three Threads   Messages/Sec            78    354        87    350        78    286
                CPU Utilization         83%   88%        86%   87%        88%   79%
Five Threads    Messages/Sec            100   439        100   406        101   340
                CPU Utilization         99%   93%        98%   87%        99%   86%

Table 6. Messages Received, Recoverable Mode, Transactions as Indicated

                                        10 bytes          1,000 bytes       10,000 bytes
Threads         Metric \ Message size   DTC    Internal   DTC    Internal   DTC    Internal
One Thread      Messages/Sec            4      221        4      221        4      212
                CPU Utilization         5%     86%        5%     86%        5%     87%
Three Threads   Messages/Sec            88     287        77     291        78     287
                CPU Utilization         86%    92%        69%    92%        89%    92%
Five Threads    Messages/Sec            107    294        101    283        106    271
                CPU Utilization         100%   92%        100%   93%        99%    92%

Single Machine, Conventional Disk Scenario

The measurements shown below in Table 7 were performed on an AST Bravo MS 6266; Pentium II/MMX 266MHz processor; 64 MB of SDRAM (EDO) system memory; 4-GB SCSI II disk(s). While the processor and memory configurations are different from those used in the measurements above (which makes it difficult to compare numbers in absolute terms), Table 7 is included to illustrate two important behaviors:

  • The high-performance disks used in Tables 1-6 have a dramatic effect on operations that are bounded by disk I/O activities.
  • Because of the way MSMQ overlaps disk writes for recoverable messages, MSMQ performance rises with the number of sending processes or threads per machine—even as the number of processes/threads rises significantly.

Measurements

Table 7. Messages Sent, 1,000 Byte Messages, Recoverable Mode, No Transactions

Metric \ Number of threads     1    2     5     10    20    50
Messages per Second, 1 disk    38   36    99    199   232   665
Messages per Second, 3 disks   66   110   166   249   332   999
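To make the scaling in Table 7 easier to read, the following sketch divides each measurement by the single-thread result for the same disk configuration. The data are copied from Table 7; the function name is illustrative.

```python
THREADS = (1, 2, 5, 10, 20, 50)
ONE_DISK = (38, 36, 99, 199, 232, 665)       # messages per second, 1 disk
THREE_DISKS = (66, 110, 166, 249, 332, 999)  # messages per second, 3 disks


def speedups(series):
    """Throughput of each run relative to the single-thread run."""
    base = series[0]
    return [round(value / base, 1) for value in series]


for label, series in (("1 disk", ONE_DISK), ("3 disks", THREE_DISKS)):
    print(label, dict(zip(THREADS, speedups(series))))
```

With 50 sending threads, throughput on this machine is roughly 17.5 times the single-thread figure with one disk and about 15.1 times with three disks.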

Networked Machine Scenario

This scenario measures the performance of MSMQ when used between two applications running on different machines linked by a 100 Mbit Ethernet network. The test was performed by starting the sending application while the sending machine is disconnected from the network, letting all messages accumulate on the sender's machine, and then connecting the sender back again to the network. Allowing messages to accumulate makes it easier to gather consistent, conservative performance information but prevents some MSMQ optimizations from occurring. For example, recoverable messages will always be written to disk because delivery cannot occur quickly enough to avoid the write operation. Therefore, in test environments where senders and receivers are both running simultaneously, results can be significantly better.

Measurements in this section were performed with the following machine configurations:

Sender:

  • Compaq Proliant 5000; single 200MHz Pentium Pro processor; 256 MB of system RAM
  • Windows NT 4.0 Server, Enterprise Edition; Service Pack 3.
  • 4-GB capacity; eight-way hardware striped disk with Compaq SmartArray 2 controller.
  • Compaq NetFlex 3, Fast Ethernet Network adapter

Receiver:

  • Compaq Proliant 7000; single 200MHz Pentium Pro processor; 256 MB of system RAM
  • Windows NT 4.0 Server, Enterprise Edition; Service Pack 3.
  • Three physical hard disk drives:
    • Disk 1: System drive; MSMQ Binaries; 4-GB capacity; SCSI II controller
    • Disk 2: MSMQ message files; six-way hardware striped disk with Compaq SmartArray 2 controller
    • Disk 3: MSMQ Log files; three-way hardware striped disk with Compaq SmartArray 2 controller
  • Compaq NetFlex 3, Fast Ethernet Network adapter

Measurements

Table 8. Messages Sent, 1 Thread, Express Mode

Metric \ Message size       200 bytes   1,000 bytes   2,000 bytes   4,000 bytes
Messages per Second         2,064       1,867         1,720         1,196
Sending CPU Utilization     100%        100%          92%           79%
Receiving CPU Utilization   57%         70%           87%           91%
Network Utilization         8%          21%           30%           41%
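As a rough cross-check of Table 8, the sketch below computes how much of the link the message bodies alone would occupy. It assumes that "100 Mbit" means 100,000,000 bits per second and counts only body bytes; message headers and network protocol overhead are deliberately left out, so the results are lower bounds rather than predictions.

```python
LINK_BITS_PER_SECOND = 100_000_000  # assumption: 100 Mbit = 100,000,000 bits/s

# body size in bytes -> (messages per second, measured network utilization %)
TABLE_8 = {200: (2064, 8), 1000: (1867, 21), 2000: (1720, 30), 4000: (1196, 41)}


def payload_utilization_percent(messages_per_second: int, body_bytes: int) -> float:
    """Share of the link consumed by message bodies alone, in percent."""
    bits_per_second = messages_per_second * body_bytes * 8
    return round(100.0 * bits_per_second / LINK_BITS_PER_SECOND, 1)


for size, (rate, measured) in TABLE_8.items():
    print(size, payload_utilization_percent(rate, size), measured)
```

In every column the body-only figure is below the measured utilization, and the gap is proportionally largest for the smallest messages, where per-message overhead would be expected to matter most.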

Table 9. Messages Sent, 1 Thread, Recoverable Mode, No Transactions

Metric \ Message size       200 bytes   1,000 bytes   2,000 bytes   4,000 bytes
Messages per Second         1,032       966           860           760
Sending CPU Utilization     49%         52%           57%           56%
Receiving CPU Utilization   35%         43%           51%           61%
Network Utilization         3%          10%           16%           27%

For More Information

For the latest information on Windows NT Server, check out our Web site at https://www.microsoft.com/ntserver/ or the Windows NT Server Forum on MSN™, The Microsoft Network (GO WORD: MSNTS).