Reliable Messaging in a Service Oriented Architecture


Shy Cohen
Alan Geller
Chris Kaler
David Langworthy
Rodney Limprecht


This document examines the issues around reliable messaging in a Service Oriented Architecture and presents a comprehensive solution based on the Web services specifications. The approach introduces a reliable message sequence for exchanging one or more messages in a transport-independent and reliable manner. Further, in conjunction with the WS-Security Framework, a reliable exchange may provide authentication, privacy, and integrity.

Together, the specifications identified in this paper provide a comprehensive and integrated set of protocols for secure, reliable messaging across service boundaries.

1. Introduction and Motivation

Web services are deployed broadly within and across enterprises and are growing into new domains such as devices. This success has increased demand for functionality, reliability, and security. Reliable communication is essential to mission-critical applications.

End-to-end reliable messaging dramatically reduces the error conditions that an application developer must contend with. For example, in the absence of reliable messaging, messages sent to another service may never reach their intended destination, without any notification of failure to the sender. In other cases, messages may be duplicated or received out of order. These Partial Failure conditions are extremely difficult to detect, manage, and recover from. A reliable messaging system manages this complexity for developers in a uniform way.

The Web services Reliable Messaging specification (WS-ReliableMessaging, or WS-RM) [WSRM] addresses the need for reliable message exchange in a Service Oriented Architecture (SOA). Building mission-critical applications on a reliable Service Oriented Architecture provides the additional benefits of interoperability, versioning, security, and transport independence.

WS-ReliableMessaging provides the protocol elements necessary for reliable message exchange. Based on the SOAP processing model, it is designed to compose seamlessly with application messages, creating efficient, reliable application protocols.

Minimizing binding dependencies is an important consideration when designing and deploying services, and WS-ReliableMessaging is specifically designed to provide reliability independent of transport bindings. End-to-end reliability is assured even when communicating over transports with lesser guarantees or in environments where multiple transport connections are required for end-to-end connectivity. WS-ReliableMessaging protocol elements are carried on application messages, in order to identify, track, and acknowledge their successful transfer.

1.1 Reliable Messaging Innovations

The first publication of WS-ReliableMessaging in March of 2003 met the goals outlined above. Seven vendors met and interoperated on this specification in October of 2003. Based on this experience and comments from a feedback workshop in July of 2003, a revised specification was published in March of 2004. The second publication builds upon the first and adds several innovations primarily dealing with performance in enterprise deployments.

Explicit Sequence Creation – The destination of a reliable exchange may allocate the identifier used for the exchange, which gives it more control over its resources. Without this optimization, the destination must maintain state for all delivered messages for the maximum possible transmission delay. For transports such as SMTP, this delay can be quite long.

Negative Acknowledgements – WS-RM uses an advanced form of sequenced, selective receipt acknowledgements that eliminate performance bottlenecks in high traffic situations. WS-RM now includes negative acknowledgements (Nack), which allow faster recovery from communication failure. An endpoint may Nack a message whenever it suspects a loss, without updating the exchange metadata.

Other improvements include providing fault bindings for SOAP 1.1 and 1.2 [SOAP11, SOAP12] and including the highest sequence number allocated on acknowledgement requests.

2. Mechanics of Reliable Delivery

Many errors may interrupt the message exchange process. Messages may be delayed, lost, duplicated, or reordered. Host systems may experience failures and lose volatile state. Connectivity may be intermittent and services may not always be available to participate in message exchange. This section describes how the protocol is used to ensure the reliable transfer of messages, and provides a high-level overview of some implementation details.

The WS-ReliableMessaging protocol facilitates the successful transmission of messages from a source to a destination, and ensures that error conditions are detectable. The protocol is transport-independent, allowing it to be implemented using different network technologies. Implementations of this protocol hide intermittent communication failures from the application, and may provide recoverability in case of system failures (recoverability is related to message persistence, which is discussed below).

2.1 The Mechanisms Used for Reliable Delivery

The main mechanisms used in the implementation are:

Sequences – The messages sent from the source to the destination are scoped using a sequence. Sequences are distinguished using a unique identifier (a URI) for each sequence. It is important to note that the term sequence does not imply any processing order.

Message Numbers – Every message sent in the context of a sequence has a unique identifier in the context of the sequence. This identifier is a monotonically increasing integer number, starting with 1 and increasing by exactly 1 for each message. In a typical implementation the numbering would be performed "under the covers" by the messaging infrastructure, and the applications sending the messages would not be aware of it. This numbering scheme makes it simple to detect missing or duplicate messages, and simplifies acknowledgement generation and processing.

Acknowledgements – An acknowledgement is an indication that a message was successfully transferred to the destination. Messages are acknowledged using acknowledgement ranges. For example, acknowledging 1 through 4 and 6 through 13 will indicate that messages number 1 through 13 were received, with the exception of message number 5. An acknowledgement does not necessarily indicate that the message was processed. Indication of successful processing, or the reporting of processing errors, requires a separate, higher-level protocol. This protocol will often be application-specific.
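The range arithmetic described above can be sketched in a few lines of Python. This is an illustration only: the function names ack_ranges and missing_numbers are hypothetical and are not defined by WS-RM, and ranges are modelled as simple (lower, upper) tuples rather than protocol elements.

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # inclusive (lower, upper) message numbers


def ack_ranges(received: Iterable[int]) -> List[Range]:
    """Collapse received message numbers into inclusive acknowledgement ranges."""
    ranges: List[Range] = []
    for n in sorted(set(received)):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], n)   # extend the current range
        else:
            ranges.append((n, n))             # start a new range
    return ranges


def missing_numbers(ranges: List[Range], highest_sent: int) -> List[int]:
    """List message numbers in 1..highest_sent that no range covers."""
    covered = set()
    for lower, upper in ranges:
        covered.update(range(lower, upper + 1))
    return [n for n in range(1, highest_sent + 1) if n not in covered]


received = [1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13]
ranges = ack_ranges(received)
print(ranges)                       # [(1, 4), (6, 13)]
print(missing_numbers(ranges, 13))  # [5]
```

Because message numbers start at 1 and increase by exactly 1, a gap between two ranges is enough to identify a missing message.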

Message persistence (durability) considerations do not affect the wire protocol and are not addressed by it. As mentioned above (in the discussion about acknowledgements), WS-RM ensures transfer, not processing. Persistence requirements have to do with the storing of the message on the destination until it is processed, and are thus the responsibility of the implementation.

Since persistence is a common aspect of reliable systems, an implementation of WS-RM would typically provide it (at least as an option). If provided, a typical implementation would only acknowledge transfer after the transferred message was persistently buffered.
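As a rough sketch of the "persist first, acknowledge second" ordering, the following Python class writes each message to disk and only then marks it as eligible for acknowledgement. The class name DurableInbox, the file naming scheme, and the sequence identifier are assumptions made for illustration; an actual implementation would choose its own storage.

```python
import os
import tempfile
from typing import Set
from urllib.parse import quote


class DurableInbox:
    """Hypothetical destination-side buffer that persists before acknowledging."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        self.acknowledgeable: Set[int] = set()

    def receive(self, sequence_id: str, number: int, body: bytes) -> None:
        # Percent-encode the sequence URI so it is safe to use in a file name.
        name = f"{quote(sequence_id, safe='')}-{number}.msg"
        path = os.path.join(self.directory, name)
        with open(path, "wb") as f:
            f.write(body)
            f.flush()
            os.fsync(f.fileno())          # ask the OS to flush to stable storage
        # Only now may the message be included in an acknowledgement.
        self.acknowledgeable.add(number)


with tempfile.TemporaryDirectory() as directory:
    inbox = DurableInbox(directory)
    inbox.receive("urn:example:seq-1", 1, b"<order/>")
    print(sorted(inbox.acknowledgeable))  # [1]
```

If the write fails, the exception propagates before the message number is recorded, so the message is never acknowledged and the source will retransmit it.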

It is interesting to note that because persistence is not related to the wire protocol, applications can be programmed with the same simplified communication error-handling model regardless of the persistence capabilities of the system.

2.2 Examining the Message Exchange

To illustrate the messages being exchanged in a reliable communication we use an example that shows a possible message exchange between two reliable messaging endpoints. The diagram below provides a graphical illustration of the exchange, and the following text describes it.

Figure 1. Sample message exchange

Phase 1 – Establishing preconditions

Before the reliable sequence is initiated, the protocol preconditions are established. These preconditions include activities such as policy exchange, endpoint resolution, and establishing trust.

Policy exchange leverages the WS-Policy family of specifications to enable a destination endpoint to describe and advertise its capabilities and/or requirements, and to enable the source endpoint to communicate to the destination the selected characteristics that apply for a given endpoint.

Phase 2 – Sequence creation

The protocol defines two methods for creating the sequence identifier: source creation and destination creation. Destination creation enables the destination to be more efficient with reclaiming resources related to a sequence.

If destination creation is required, the source requests the creation of a sequence identifier with a CreateSequence message. The destination responds either with a CreateSequenceResponse containing the new identifier or a fault.

In this example, the source requests the creation of a new sequence, and the destination replies with a globally unique identifier.
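Destination-side creation might look like the sketch below. The Destination class and its create_sequence method are hypothetical stand-ins for the handling of a CreateSequence request; the only property taken from the text is that the destination allocates a globally unique identifier (a URI) and keeps state for it.

```python
import uuid
from typing import Dict, Set


class Destination:
    """Hypothetical destination that allocates sequence identifiers itself."""

    def __init__(self) -> None:
        # Per-sequence state: identifier -> set of received message numbers.
        self.sequences: Dict[str, Set[int]] = {}

    def create_sequence(self) -> str:
        """Handle a CreateSequence request and return the new identifier."""
        identifier = uuid.uuid4().urn     # a globally unique URI, "urn:uuid:..."
        self.sequences[identifier] = set()
        return identifier


destination = Destination()
sequence_id = destination.create_sequence()
print(sequence_id.startswith("urn:uuid:"))   # True
```

Because the destination chose the identifier, it knows exactly which sequences it holds state for and can reject messages that name any other sequence.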

Phase 3 – Communication

The source starts sending messages in the sequence, beginning with MessageNumber 1. In this example, the source sends 3 messages, but the 2nd message is lost during transmission. Since the 3rd message is the last message sent by the client application in this exchange, the source includes a LastMessage element in its headers. The source maintains state showing that the 3 messages were transmitted and are awaiting acknowledgement.

The destination acknowledges the receipt of messages number 1 and 3 in response to the source's LastMessage element. It is interesting to note that in this example, the destination did not acknowledge the messages as they were received, but instead uses its acknowledgement strategy to optimize message traffic.

Upon receipt of the acknowledgement, the source updates its state to reflect that messages 1 and 3 were acknowledged, and retransmits the 2nd message.

When retransmitting the second message, the source includes an AckRequested element so that the destination will expedite an acknowledgement. The destination receives the second transmission of MessageNumber 2 and acknowledges receipt of message numbers 1 through 3. It is important to note that acknowledgements always include the full range of messages received by the destination.

When this acknowledgement is received, the sender knows that all the messages were received successfully by the destination.
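The exchange just described can be replayed with a small simulation. This is a sketch under simplifying assumptions: the function run_exchange is hypothetical, the transport is modelled as a loop that drops chosen messages on the first pass only, and each acknowledgement is represented as the sorted list of message numbers received so far.

```python
from typing import Dict, List, Set


def run_exchange(bodies: List[str], lost_on_first_try: Set[int]) -> List[List[int]]:
    """Simulate one send pass plus retransmissions; return each acknowledgement."""
    # Source state: message number -> body, for everything not yet acknowledged.
    pending: Dict[int, str] = {n: b for n, b in enumerate(bodies, start=1)}
    received: Dict[int, str] = {}          # destination state
    acknowledgements: List[List[int]] = []
    first_pass = True

    while pending:
        for number, body in sorted(pending.items()):
            if first_pass and number in lost_on_first_try:
                continue                    # the transport silently drops it
            received[number] = body
        first_pass = False

        # The acknowledgement always reports everything received so far.
        ack = sorted(received)
        acknowledgements.append(ack)
        for number in ack:
            pending.pop(number, None)       # source clears acknowledged messages

    return acknowledgements


print(run_exchange(["order", "invoice", "receipt"], lost_on_first_try={2}))
# [[1, 3], [1, 2, 3]]
```

The first acknowledgement reveals the gap at message 2, and the second confirms the full sequence after the retransmission.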

In this example, retransmission occurred because message 2 was lost. However, the same series of events would happen on the source side if the message was delayed for a sufficient amount of time. For example, the message might be held up in a forwarding router that experienced congestion, and the destination's acknowledgement strategy might generate an acknowledgement message indicating the receipt of messages 1 and 3.

Due to delay and retransmission, the destination might receive message 2 twice. The retransmitted message is a new message on the underlying transport, but since it has the same sequence identifier and message number, the destination can recognize it as equivalent to the earlier message and remove the duplicate.
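Duplicate elimination follows directly from the pair of sequence identifier and message number. A minimal sketch, in which the DuplicateFilter class and the identifier are hypothetical:

```python
from typing import Set, Tuple


class DuplicateFilter:
    """Remember which (sequence identifier, message number) pairs were seen."""

    def __init__(self) -> None:
        self.seen: Set[Tuple[str, int]] = set()

    def is_new(self, sequence_id: str, message_number: int) -> bool:
        key = (sequence_id, message_number)
        if key in self.seen:
            return False          # a retransmission of an earlier message
        self.seen.add(key)
        return True


f = DuplicateFilter()
print(f.is_new("urn:example:seq-1", 2))   # True
print(f.is_new("urn:example:seq-1", 2))   # False
```

A production implementation would more likely track received ranges per sequence than store every pair individually; the set is used here only to keep the example short.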

In this example the messages arrived out of order due to the original loss of message 2. There are other cases where messages might be received out of order. For example, different messages may be sent through different paths. Reordering of messages by the transports may also create situations where messages are perceived to be delayed. Reordering "on the wire" may require the destination's reliable messaging infrastructure to buffer messages until they can be delivered to the application, ignore (drop) messages that arrive out of order, or some combination of these.
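The buffering option can be sketched as follows for a destination that delivers messages to the application in message-number order. The InOrderBuffer class is hypothetical, and dropping out-of-order messages instead of holding them would be an equally valid strategy according to the text.

```python
from typing import Dict, List


class InOrderBuffer:
    """Hold out-of-order messages until every earlier one has arrived."""

    def __init__(self) -> None:
        self.next_expected = 1
        self.held: Dict[int, str] = {}

    def receive(self, number: int, body: str) -> List[str]:
        """Accept one message; return the bodies that can now be delivered."""
        if number < self.next_expected or number in self.held:
            return []                          # already delivered or already held
        self.held[number] = body
        deliverable: List[str] = []
        while self.next_expected in self.held:
            deliverable.append(self.held.pop(self.next_expected))
            self.next_expected += 1
        return deliverable


buffer = InOrderBuffer()
print(buffer.receive(1, "first"))    # ['first']
print(buffer.receive(3, "third"))    # []
print(buffer.receive(2, "second"))   # ['second', 'third']
```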

Phase 4 – Sequence termination

Explicit sequence termination is performed when the destination created the sequence; it indicates to the destination that the sequence is complete. After successfully completing the transmission of all messages (as indicated by the acknowledgements received), the source sends a TerminateSequence message to the destination. Upon receipt of this message, the destination has no further obligations to the source and may reclaim resources associated with the sequence.
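On the source side, the decision to send TerminateSequence reduces to checking that the acknowledged ranges cover every message from 1 through the last message number. A minimal sketch, assuming ranges are inclusive (lower, upper) tuples and using the hypothetical function name all_acknowledged:

```python
from typing import List, Tuple


def all_acknowledged(ranges: List[Tuple[int, int]], last_message_number: int) -> bool:
    """True when the ranges cover 1..last_message_number without a gap."""
    next_needed = 1
    for lower, upper in sorted(ranges):
        if lower > next_needed:
            return False                      # gap before this range
        next_needed = max(next_needed, upper + 1)
    return next_needed > last_message_number


print(all_acknowledged([(1, 1), (3, 3)], 3))   # False
print(all_acknowledged([(1, 3)], 3))           # True
```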

2.3 More on Message Exchange

Requesting Acknowledgement

In some cases, the source may request that the destination acknowledge message receipt quickly, instead of delaying to reduce acknowledgement message traffic. This is done using the AckRequested element. A destination that receives a message that contains an AckRequested element should expedite a SequenceAcknowledgement.
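One possible acknowledgement strategy is sketched below: the destination batches acknowledgements to reduce traffic but responds immediately when a message carries AckRequested or LastMessage. The BatchingAcker class and its batch_size parameter are assumptions for illustration; WS-RM leaves the strategy to the implementation.

```python
class BatchingAcker:
    """Hypothetical policy: acknowledge every batch_size messages, or on request."""

    def __init__(self, batch_size: int = 10) -> None:
        self.batch_size = batch_size
        self.unacknowledged = 0

    def on_message(self, ack_requested: bool = False,
                   last_message: bool = False) -> bool:
        """Record one received message; return True if an ack should be sent now."""
        self.unacknowledged += 1
        if ack_requested or last_message or self.unacknowledged >= self.batch_size:
            self.unacknowledged = 0
            return True
        return False


acker = BatchingAcker(batch_size=3)
print(acker.on_message())                    # False
print(acker.on_message(ack_requested=True))  # True
```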

Sequence Faults

Sequence faults are reported through the SOAP fault mechanism. The specification describes the binding to SOAP 1.1 (using the WS-RM SequenceFault element) and SOAP 1.2. All faults are unrecoverable, and both the sender and receiver of the fault should abnormally terminate the sequence for which the fault was generated, indicating the fault to the application, as appropriate.

In addition to the sequence-creation-time faults mentioned above, the WS-RM specification defines the following faults:

  • Sequence terminated – sent by either the source or the destination to indicate that the endpoint that generated the fault has either encountered an unrecoverable condition, or has detected a violation of the protocol, and as a consequence has chosen to terminate the sequence. It is important to note that this is distinguished from the TerminateSequence message, which is part of a successful, orderly shutdown. This fault is used by either party to indicate abnormal termination.
  • Unknown sequence – sent by either the source or the destination in response to a message containing an element with an unknown sequence identifier (such as a sequence or a sequence acknowledgement).
  • Invalid acknowledgement – sent by the source in response to an acknowledgement that contains invalid information. For example, an acknowledgement for messages that have not yet been sent.
  • Message number rollover – sent by the source to indicate that it has run out of message numbers for a sequence.
  • Last message number exceeded – sent by a destination to indicate that it has received a message that has a MessageNumber that exceeds the value of the MessageNumber element that accompanied a LastMessage element for that sequence.
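The conditions behind several of these faults can be expressed as simple checks. The sketch below is illustrative only: the Fault enumeration, the three function names, and the limit parameter are hypothetical, and the checks return a value rather than producing SOAP faults.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Fault(Enum):
    SEQUENCE_TERMINATED = "Sequence terminated"
    UNKNOWN_SEQUENCE = "Unknown sequence"
    INVALID_ACKNOWLEDGEMENT = "Invalid acknowledgement"
    MESSAGE_NUMBER_ROLLOVER = "Message number rollover"
    LAST_MESSAGE_NUMBER_EXCEEDED = "Last message number exceeded"


def check_message(last_numbers: Dict[str, Optional[int]],
                  sequence_id: str, number: int) -> Optional[Fault]:
    """Destination-side check of an incoming sequence message."""
    if sequence_id not in last_numbers:
        return Fault.UNKNOWN_SEQUENCE
    last = last_numbers[sequence_id]      # None until LastMessage has been seen
    if last is not None and number > last:
        return Fault.LAST_MESSAGE_NUMBER_EXCEEDED
    return None


def check_acknowledgement(ranges: List[Tuple[int, int]],
                          highest_sent: int) -> Optional[Fault]:
    """Source-side check: reject acknowledgements for messages never sent."""
    for lower, upper in ranges:
        if lower < 1 or upper < lower or upper > highest_sent:
            return Fault.INVALID_ACKNOWLEDGEMENT
    return None


def check_next_number(next_number: int, limit: int) -> Optional[Fault]:
    """Source-side check; limit is an assumed maximum message number."""
    return Fault.MESSAGE_NUMBER_ROLLOVER if next_number > limit else None


known = {"urn:example:seq-1": 3}          # LastMessage was sent with number 3
print(check_message(known, "urn:example:seq-1", 4))
# Fault.LAST_MESSAGE_NUMBER_EXCEEDED
print(check_acknowledgement([(1, 5)], highest_sent=3))
# Fault.INVALID_ACKNOWLEDGEMENT
```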

3. Secure Reliable Communication

Reliability and security are complementary requirements of a messaging system. It is important that the features that meet these requirements work together as well as separately.

3.1 Securing Reliability

Reliable messaging creates a new attack point: a foe can attempt to tamper with the reliable messaging mechanism itself, so that messages are received and processed out of sequence, or so that gaps or duplicates are not detected. In order to prevent such an attack, the source should sign the Sequence header in each message, and the destination should verify that signature. This prevents an attacker such as a rogue intermediary from modifying the data in the Sequence header without that modification being detected.
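The effect of signing the Sequence header can be illustrated with a keyed hash. This is a deliberately simplified stand-in: WS-Security signatures are XML signatures over the SOAP header, whereas the sketch below applies HMAC-SHA256 from the Python standard library to a made-up byte encoding of the sequence identifier and message number. The function names, the encoding, and the key are all assumptions.

```python
import hashlib
import hmac


def _canonical(sequence_id: str, message_number: int) -> bytes:
    # Length-prefix the identifier so the two fields cannot run together.
    return f"{len(sequence_id)}:{sequence_id}:{message_number}".encode("utf-8")


def sign_sequence_header(key: bytes, sequence_id: str, message_number: int) -> bytes:
    """Source side: compute a tag over the sequencing data."""
    return hmac.new(key, _canonical(sequence_id, message_number),
                    hashlib.sha256).digest()


def verify_sequence_header(key: bytes, sequence_id: str,
                           message_number: int, signature: bytes) -> bool:
    """Destination side: recompute the tag and compare in constant time."""
    expected = sign_sequence_header(key, sequence_id, message_number)
    return hmac.compare_digest(expected, signature)


key = b"shared-key-for-illustration-only"
tag = sign_sequence_header(key, "urn:example:seq-1", 2)
print(verify_sequence_header(key, "urn:example:seq-1", 2, tag))   # True
print(verify_sequence_header(key, "urn:example:seq-1", 3, tag))   # False
```

An intermediary that changes the message number from 2 to 3 cannot produce a matching tag without the key, so the destination detects the modification.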

Since the source and destination in a reliable messaging sequence participate in a long-lived relationship, it is possible to create a single security context, as described by the WS-SecureConversation specification [SecureConversation], and use it for the lifetime of that relationship, allowing security tokens to be exchanged once and then used for an extended period. This also provides a simple way for the source and destination to agree on the key that is used to sign the Sequence header, as described above.

3.2 Example

Two companies integrating their supply chains across the Internet might implement a secure, reliable interaction as follows:

  • Company A initiates a session with Company B by sending a RequestSecurityToken message containing its X.509 certificate and requesting a security context token, as per the WS-SecureConversation and WS-Trust specifications [Trust].
  • Company B generates a random AES-256 symmetric key and constructs a security context token containing the key, as per WS-SecureConversation.
  • Company B sends a RequestSecurityTokenResponse message back to Company A containing the security context token and Company B's X.509 certificate.
  • Company A now initiates a reliable (exactly once, in order) message sequence with Company B.
  • For each message, the sender derives a pair of keys from the key in the security context token, as described in WS-SecureConversation.
    • One derived key is used to sign the Sequence header, other key headers such as those described in WS-Addressing, and the message body. The body of the message is encrypted using the second derived key.
    • Each message contains a security header that holds the signature and the encryption description, as well as a reference to the security context token and the descriptions of the derived keys, as per the WS-Security [WSSecurity] and WS-SecureConversation specifications.
    • The encryption of the message body keeps the contents of the message confidential as it travels over the Internet, while the signature on the body prevents an attacker from tampering with the content of the message.
    • Similarly, the signature on the Sequence header prevents an attacker from tampering with the sequencing of messages as they pass through the Internet.
    • A single signature is used to cryptographically bind together the key immutable aspects of the message.
  • Acknowledgement and response messages are similarly protected by signing and encrypting using derived keys.

Using derived keys significantly reduces the risk of an attacker being able to break the cryptographic protection of the message.
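The idea of deriving a separate signing key and encryption key for each message from one long-lived secret can be sketched as follows. The derivation shown is a simple HMAC-SHA256 construction chosen for brevity; it is not the derivation algorithm defined by WS-SecureConversation, and the labels, nonce size, and function name derive_key are assumptions.

```python
import hashlib
import hmac
import os


def derive_key(context_key: bytes, label: bytes, nonce: bytes) -> bytes:
    """Derive a 32-byte key bound to a purpose label and a per-message nonce."""
    return hmac.new(context_key, label + b"\x00" + nonce, hashlib.sha256).digest()


context_key = os.urandom(32)      # stands in for the key in the security context token
nonce = os.urandom(16)            # fresh for each message, sent alongside it

signing_key = derive_key(context_key, b"signing", nonce)
encryption_key = derive_key(context_key, b"encryption", nonce)

print(len(signing_key), len(encryption_key))   # 32 32
```

Because the receiver holds the same context key and sees the nonce in the message, it can recompute both derived keys, while the long-lived key itself is never used directly to protect message content.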

4. Terminology

The diagram below illustrates the entities and events in a simple reliable message exchange. First, the Application Source Sends a message for reliable delivery. The Reliable Messaging (RM) Source accepts the message and Transmits it one or more times. After Receiving the message, the RM Destination Acknowledges it. Finally, the RM Destination Delivers the message to the Application Destination.

Endpoint: A referenceable entity, processor, or resource where Web service messages are originated or targeted.

Application Source: The endpoint that Sends a message.

Application Destination: The endpoint to which a message is Delivered.

Delivery Assurance: The guarantee that the messaging infrastructure provides on the delivery of a message.

RM Source: The endpoint that transmits the message.

RM Destination: The endpoint that receives the message.

Send: The act of submitting a message to the RM Source for reliable delivery. The reliability guarantee begins at this point.

Deliver: The act of transferring a message from the RM Destination to the Application Destination. The reliability guarantee is fulfilled at this point.

Transmit: The act of writing a message to a network connection.

Receive: The act of reading a message from a network connection.

Acknowledgement: The communication from the RM Destination to the RM Source indicating the successful receipt of a message.

5. Contributors

The authors wish to thank the following people for their contributions to this article:

Don Box, Omri Gazitt, John Shewchuk.

We also wish to thank the authors of the following documents:

Reliable Message Delivery in a Web Services World
Secure, Reliable, Transacted Web Services: Architecture and Composition

Finally, we thank the attendees of the feedback and interop workshops who have made important contributions to the WS-RM specification.

6. References