How to optimize Message Copy using CreateBufferedCopy?

Problem Statement

Some broker implementations need to create a copy of the message before forwarding it to the backend. The broker might also slightly modify the message, for example its addressing headers, so that it is routed correctly within the DMZ. The problem is that creating this copy shows a very high CPU cost, which in turn lowers throughput. Note: Streamed transfer mode is not in scope for this article.

Simulation

As with any performance issue, we need to measure and profile. To investigate, we first simulate the broker's pattern by simply copying the message and forwarding it to a dummy backend service. We then profile this simulation to understand how much the copy actually costs.
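A profiler gives the per-function breakdown shown in the tables that follow, but a rough throughput number can be obtained with a simple timing loop. The sketch below is only an illustration: the class name ThroughputProbe and the method MeasureOpsPerSecond are hypothetical and are not part of the broker or of WCF. The operation passed in stands for one copy-and-forward round.

```csharp
using System;
using System.Diagnostics;

public static class ThroughputProbe
{
    // Hypothetical helper (not from the original article): runs one unit of
    // work repeatedly and reports how many operations completed per second.
    public static double MeasureOpsPerSecond(Action operation, int iterations)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        // One warm-up call so that JIT compilation is not counted in the timing.
        operation();

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            operation();
        }
        stopwatch.Stop();

        return iterations / stopwatch.Elapsed.TotalSeconds;
    }
}
```

A loop like this only compares two variants of the same operation against each other; it does not replace a profile, which is what attributes the cost to individual functions.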

Simple Copy

Calls    % Incl   % Excl   Function
21,530   8.5      0.1      System.ServiceModel.Channels.BufferedMessageBuffer::CreateMessage • System.ServiceModel.Channels.Message()
21,530   0.29     0.03     System.ServiceModel.Channels.Message::CreateBufferedCopy • System.ServiceModel.Channels.MessageBuffer(int32)

An 8% cost for copying seems acceptable considering the value of having a copy and being able to do other things with it if required. But this is not what we observed in practice. In the profiles from the actual broker, creating a copy costs about 40%. Almost half the time is spent creating the message copy, so throughput drops to nearly half when the broker is configured to copy the message. This excludes costs like logging.

Evidently our simulation is not accurate, so we need to isolate this further by taking in more functionality from the broker until we hit the expensive path. One key observation was that the message is copied just before it is forwarded, which means a number of manipulations have already been done on the message by then. Our simulation didn't perform any manipulation, so to get closer we need to change something on the message. To keep it simple, we removed a header and added another header to the message, since most brokers modify headers before forwarding.

 

// Replace any existing instance of the header, as a broker would before forwarding.
int headerIndex = input.Headers.FindHeader(header.Name, header.Namespace);
if (headerIndex >= 0)
{
    input.Headers.RemoveAt(headerIndex);
}
input.Headers.Add(header);

Eureka! Throughput went down, in line with what we were seeing in our broker. The profile shows that CreateMessage and CreateBufferedCopy have both become considerably more expensive, and that CreateMessage now runs on DefaultMessageBuffer instead of BufferedMessageBuffer.

With Single Header update

Calls    % Incl   % Excl   Function
12,782   14.83    0.08     System.ServiceModel.Channels.DefaultMessageBuffer::CreateMessage • System.ServiceModel.Channels.Message()
12,782   27.69    0.02     System.ServiceModel.Channels.Message::CreateBufferedCopy • System.ServiceModel.Channels.MessageBuffer(int32)

This is the performance data we collected:

Scenario                          CPU Utilization   Throughput
Copy and Forward                  98.6 %            11317.6545777148
Copy forwarding with new header   98.7 %            6854.97017102

Modifying a single header before the copy cuts throughput by roughly 39% at essentially the same CPU utilization. Now that we have identified the root cause, we need a solution that lets the broker keep this functionality without using so much CPU.

Solution

The solution is actually quite simple: "Modify your message after you create the buffered copy". I wanted to give the solution before the analysis since most of you are probably not interested in the analysis, but if you are, the rest should be interesting.
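As a minimal sketch of that ordering, the header update from the simulation can be applied to the message created from the buffer rather than to the incoming message. The class name BrokerForwarding and the method CopyThenModify are hypothetical names used only for illustration. The WCF calls themselves (CreateBufferedCopy, MessageBuffer.CreateMessage, FindHeader, RemoveAt, Add) are the ones already used in this article.

```csharp
using System.ServiceModel.Channels;

public static class BrokerForwarding
{
    // Hypothetical helper (not from the original broker): copy first, then
    // apply the header changes to the copy instead of the incoming message.
    public static Message CopyThenModify(Message input, MessageHeader header, int maxBufferSize)
    {
        // Copy while the incoming headers are still untouched.
        // Note: the input message is consumed by this call and must not be
        // read, written, or copied again afterwards.
        MessageBuffer buffer = input.CreateBufferedCopy(maxBufferSize);
        Message copy = buffer.CreateMessage();

        // Same header update as in the simulation, now applied to the copy.
        int headerIndex = copy.Headers.FindHeader(header.Name, header.Namespace);
        if (headerIndex >= 0)
        {
            copy.Headers.RemoveAt(headerIndex);
        }
        copy.Headers.Add(header);

        return copy;
    }
}
```

The caller would then forward the returned message. When further copies are needed, it may make sense to keep the MessageBuffer around and call CreateMessage on it again rather than copying a message whose headers have already been changed.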

Analysis

The most common way to create a copy of your message is to use Message.CreateBufferedCopy(int).

  1. The default case of creating a buffered copy creates a BufferedMessage from an underlying BufferedMessageData.
  2. This is optimal in the following cases:
    1. Message headers are not modified (no update in buffered header values).
    2. BufferedMessage headers haven't been captured (this happens, e.g., when the user inserts a header in the first location).

If headers have been modified, CreateBufferedCopy takes an alternative path using the DefaultMessageBuffer. The reason is that a full copy of the message has to be created if any buffered header has been modified. An internal property called headers.ContainsOnlyBufferedMessageHeaders is used to decide whether the faster BufferedMessageBuffer can be used to create the buffered copy. If any headers have been modified, we need to ensure that the message is fully marshaled over, and the buffer itself cannot simply be copied (e.g. the user can add a reference type to the header). So we fall back to a path that fully reparses the message and creates a fully deserialized copy of the modified message.

The main point is that a copy should always be a deep copy, and no modification should result in a message with shallow-copied parts. When you create a message from a copy of the original, your message object gets its own copy of the headers that it can play around with without affecting the original incoming message. Copying a message is by itself a fast operation, as you can see from the first profile above, but copying a modified message can be very CPU intensive when using buffered transfer mode.
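One way to see which path a copy takes is to look at the runtime type of the returned MessageBuffer. The sketch below is an illustration under assumptions: BufferPathProbe, RoundTrip and BufferTypeFor are hypothetical names, the header name and namespaces are made up, and the round trip through the text encoder is only a stand-in for receiving a message in buffered transfer mode. The concrete buffer classes are internal, so the type names it returns may differ between framework versions.

```csharp
using System;
using System.ServiceModel.Channels;

public static class BufferPathProbe
{
    // Hypothetical helper: writes a message with the encoder and reads it back,
    // approximating a message that arrived in buffered transfer mode.
    public static Message RoundTrip(MessageEncoder encoder, BufferManager buffers, Message message)
    {
        // WriteMessage takes its buffer from the manager; ReadMessage takes over
        // that buffer and returns it to the manager when the message is closed.
        ArraySegment<byte> wire = encoder.WriteMessage(message, 64 * 1024, buffers);
        return encoder.ReadMessage(wire, buffers);
    }

    // Returns the runtime type name of the buffer produced by CreateBufferedCopy,
    // optionally adding a header to the received message before copying it.
    public static string BufferTypeFor(bool modifyHeadersFirst)
    {
        MessageEncoder encoder =
            new TextMessageEncodingBindingElement().CreateMessageEncoderFactory().Encoder;
        BufferManager buffers = BufferManager.CreateBufferManager(1024 * 1024, 64 * 1024);

        Message original = Message.CreateMessage(
            encoder.MessageVersion, "urn:example:action", "payload");
        Message received = RoundTrip(encoder, buffers, original);

        if (modifyHeadersFirst)
        {
            // Illustrative header; any header change before the copy would do.
            received.Headers.Add(
                MessageHeader.CreateHeader("Route", "urn:example:broker", "backend-1"));
        }

        MessageBuffer buffer = received.CreateBufferedCopy(int.MaxValue);
        try
        {
            return buffer.GetType().Name;
        }
        finally
        {
            buffer.Close();
        }
    }
}
```

Calling BufferTypeFor(false) and BufferTypeFor(true) and comparing the two names should show whether the header change moved the copy off the fast path described above.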