How to Boost Performance of the ESB Routing and Transform Services – Part 3
Introduction
In the first article of the series, we discussed the advantages and disadvantages of using Dynamic Send Ports in a BizTalk application and in particular within a solution that makes an extensive use of the ESB Toolkit. We also talked about how to create custom services to boost the performance of message Routing and Transformation functionality of an ESBT solution using, respectively, Static Send Ports in place of Dynamic Send Ports and the capabilities provided by the XslCompiledTransform class.
In the second part of the series, we described the architecture and code of the use cases utilized to test the extensions introduced in the first article and to compare their performance with the scalability and latency provided by the Routing and Transformation service components provided out of the box by the ESB Toolkit.
In the last article of the series, we'll briefly introduce the code of the Test project used to generate load against the use cases and we'll analyze results with tools like Visual Studio Profiler and SQL Server Profiler. The primary objective of the present article is to provide evidence of the performance enhancements introduced by the new custom components, but also to demonstrate how exploiting the aforementioned tools to discover and remove potential bottlenecks in your solution that can seriously undermine performance.
Performance Tests
The next step has been to create Unit Tests and Load Tests to generate traffic against each of the 8 use cases described in the second part of the article. Once completing this task, I ran performance tests on my laptop (a Lenovo W500, dual-core CPU, 4 GB RAM): not really an adequate rig for this kind of activity, but even running tests on my laptop, I was able to collect significant results that I reported in this section.
Test Project
To accomplish performance testing, I followed the approach that I explained in detail here in one of my previous posts, and I started creating a Visual Studio Test Project (indeed, I just reused my previous project). Then I created a UnitTest class and I defined 2 methods called InvokeOneWayWCFReceiveLocation and InvokeTwoWayWCFReceiveLocation that can be used to synchronously invoke respectively One-Way and Two-Way WCF Receive Locations. The next step has been creating 8 Test Methods, one for each Test Case and finally 8 Load Tests, one for each Unit Test (see the figure below).
Each Load Test had been configured to send 5000 messages to the downstream WCF Receive Location using a Constant Load Pattern and 20 concurrent Worker Threads or Constant User Count. For each Load Test, I created a custom Counter Set called BizTalk (see the figure below) to collect the value of the most relevant performance counters during performance testing. In particular, the Counter Set includes the following performance counters:
- Inbound Latency (sec): measures the average latency in milliseconds from when the Messaging Engine receives a document from the adapter until the time it is published to the Message Box.
- Outbound Adapter Latency (sec): measures the average latency in milliseconds from when the adapter gets a document from the Messaging Engine until the time it is sent by the adapter.
- Outbound Latency (sec): the average latency in milliseconds from when the Messaging Engine receives a document from the Message Box until the time document is sent by the adapter.
- Request-Response Latency (sec): measures the average latency in milliseconds from when the Messaging Engine receives a request document from the adapter until the time a response document is given back to the adapter.
For your convenience, I copied below the code of the UnitTest class and the corresponding configuration file that contains the WCF client endpoints used by the Unit and Load Tests to invoke the WCF Receive Locations exposed by the ESB.Itinerary.Services application.
UnitTest Class
|
App.Config File
|
Scenario 1
The screenshots below were taken at the end of the most significant test runs and grouped by use case.
The following 3 tables report the results highlighted in red in the screens above:
Overall Results
Test Case |
Duration (sec) |
Throughput: Test/Sec |
Latency: Average Test Time (sec) |
% Processor Time |
OOBRoutingOOBTransform |
1069 |
4.68 |
4.25 |
77.9 |
OOBRoutingCustomTransform |
1049 |
4.76 |
4.16 |
68.7 |
CustomRoutingOOBTransform |
542 |
9.22 |
2.13 |
52.6 |
CustomRoutingCustomTransform |
409 |
12.2 |
1.60 |
40.5 |
Receive Location Performance Counters (BizTalkServerReceiveHost64)
Test Case |
Document processed/Sec |
Request-Response Latency (sec) |
OOBRoutingOOBTransform |
4.68 |
2.10 |
OOBRoutingCustomTransform |
4.78 |
2.06 |
CustomRoutingOOBTransform |
9.26 |
1.03 |
CustomRoutingCustomTransform |
12.3 |
0.76 |
Send Port Performance Counters (BizTalkServerSendHost64)
Test Case |
Document processed/Sec |
Outbound Latency (sec) |
Outbound Adapter Latency (sec) |
Inbound Latency (sec) |
OOBRoutingOOBTransform |
4.68 |
1.48 |
1.48 |
0.077 |
OOBRoutingCustomTransform |
4.78 |
1,56 |
1.55 |
0.027 |
CustomRoutingOOBTransform |
9.26 |
0.14 |
0.12 |
0.17 |
CustomRoutingCustomTransform |
12.3 |
0.060 |
0.038 |
0.059 |
The following findings emerge from the analysis of the figures above:
-
When using the default Routing Service provided out-of-the-box by the ESB Toolkit along with a Dynamic Send Port, the overall CPU utilization and the time spent to invoke the CustomersByCountry stored procedure is significantly higher than when using the custom Routing Service with a Static Send Port. Moreover, the time spent by the ESB pipelines is negligible compared to the time spent by the WCF Adapter to invoke the stored procedure and retrieve data. This statement becomes apparent when examining the value of the Request-Response Latency (sec) performance counter exposed by the BizTalkServerReceiveHost64 process which hosts the WCF Receive Location and the value of the Outbound Latency (sec) and Outbound Adapter Latency (sec) performance counters exposed by the BizTalkServerSendHost64 process that hosts the WCF Send Port.
-
When using a WCF-Custom Adapter + SqlBinding Static Send Port to invoke the stored procedure, the overall throughput increases by 200-300% compared to when using a Dynamic Send Port. This statement becomes apparent when examining the value of the Documents processed/Sec performance counters exposed by both the BizTalkServerReceiveHost64 and BizTalkServerSendHost64 processes.
-
The cost in terms of latency and throughput for transforming the response message decreases significantly when using the custom TransformService class in place of the original Transform service provided by the ESB Toolkit. This statement becomes evident when analyzing the value of the Inbound Latency (sec) performance counter exposed by the BizTalkServerSendHost64 process that hosts Send Ports.
-
The CPU utilization has been almost halved passing from the OOBRoutingOOBTransform to the CustomRoutingCustomTransform test case.
SQL Server Profiler
When I saw the results of performance tests, I decided to discover the reason of such a huge difference in performance between the worst (OOBRoutingOOBTransform) and the best case (CustomRoutingCustomTransform) . In particular, I wanted to verify the statements and findings contained in the "Performance Tips for the WCF SQL Adapter for BizTalk Server" article by Thomas F. Abraham's Blog. Therefore, I decided to use the SQL Server Profiler to trace the calls made by the WCF-SQL Adapter against the Northwind database when processing a request message. I created a new trace starting from the Tuning template and I specified the DatabaseID of the Northwind in the Filter to narrow down the scope.
Running the OOBRoutingOOBTransform and CustomRoutingCustomTransform load tests, I discovered the following:
- When running within a Static Send Port, the WCF-Custom Adapter + SqlBinding executes just the stored procedure call, as expected:
|
- Conversely, when running within a Dynamic Send Port, the same Adapter executes the following 3 commands for each incoming message:
|
The next step was using the standalone version of the Visual Studio Profiler to attach the BizTalkServerSendHost64 process and profile the run-time behavior of the WCF-Custom Adapter when running within a Dynamic or a Static Send Port. There are many excellent posts that do a great job explaining how to profile a managed application or service using the Visual Studio Profiler, so I will simply report and comment results here. In order to collect runtime data, I attached the BizTalkServerSendHost64 process while running a Load Test against the OOBRoutingOOBTransform and CustomRoutingCustomTransform Test Cases then I analyzed the report produced by Visual Studio Profiler at the end of each test session.
Visual Studio Profiler
The following figure shows the Hot Paths within the Call Tree View (indeed the picture is quite small therefore not very readable, sorry about that) that is from the report generated by Visual Studio Profiler.
As you can easily note examining the image above, the process spent 74.81% of time within the InitializeValues method exposed by the WcfClient class contained in the Microsoft.BizTalk.Adapter.Wcf.Runtime assembly. Now, as the name suggests, this class is used by the Send Handler of the WCF Adapter runtime to communicate with the target WCF service. For your convenience, I used .NET Reflector to disassemble the code of the InitializeValues method:
InitializeValues Method
|
A code analysis shows that the InitializeValues method accomplishes the following actions:
- Creates the endpoint address.
- Creates the binding that provides the recipe to build the channel stack. This operation is particularly heavy as the calls to the CreateUserBinding method exposed by the BindingFactory class spend over 70% of overall time.
- Uses the BtsActionMapping XML snippet (if any) defined within the WCF Send Port to retrieve the Action based on the actual operation.
- Initializes transactional and SSO behavior.
- Creates the channel factory.
As a term of comparison, I profiled the BizTalkServerSendHost64 process while running the OOBRoutingCustomTransform Load Test which uses a WCF Static Send Port instead of a Dynamic Send Port. Once again, the following figure shows the Hot Paths within the Call Tree View that from the report generated by Visual Studio Profiler.
Call Tree View
As you can note, in this case most of the time was spent by the BizTalk Message Agent to access the BizTalkMsgBoxDb, while 21% of time was spent by the Messaging Runtime to execute the send and receive pipelines used by the Two-Way WCF Static Send Port. In other words, this time the WCF Adapter initialized the Binding and ChannelFactory just when it processed the first message, and then it cached and reused these objects when handling the subsequent messages.
Scenario 2
The screenshots below were taken at the end of the most significant test runs and grouped by use case.
The following 3 tables report the results highlighted in red in the screens above:
Overall Results
Test Case |
Duration (sec) |
Throughput: Test/Sec |
Latency: Average Test Time (sec) |
% Processor Time |
OOBRoutingNoTransform |
525 |
9.51 |
2.07 |
70.7 |
CustomRoutingNoTransform |
402 |
12.4 |
1.56 |
44.2 |
Receive Location Performance Counters (BizTalkServerReceiveHost64)
Test Case |
Document processed/Sec |
Request-Response Latency (sec) |
OOBRoutingNoTransform |
9.53 |
0.99 |
CustomRoutingNoTransform |
12.5 |
0.75 |
Send Port Performance Counters (BizTalkServerSendHost64)
Test Case |
Document processed/Sec |
Outbound Latency (sec) |
Outbound Adapter Latency (sec) |
Inbound Latency (sec) |
OOBRoutingNoTransform |
9.53 |
0.36 |
0.34 |
0.044 |
CustomRoutingNoTransform |
12.5 |
0.070 |
0.047 |
0.048 |
The following findings emerge from the analysis of the figures above:
-
When using the default **Routing Service** provided out-of-the-box by the ESB Toolkit along with a Dynamic Send Port, the overall CPU utilization and the time spent to invoke thedownstream WCF **CalculatorService** via a **NetTcpBinding** endpoint is 30% higher than when using the custom **Routing Service** with a Static Send Port. This statement becomes apparent when examining the value of the **Request-Response Latency (sec)** performance counter exposed by the **BizTalkServerReceiveHost64** process that hosts the WCF Receive Location and the value of the **Outbound Latency (sec)** and **Outbound Adapter Latency (sec)** performance counters exposed by the **BizTalkServerSendHost64** process that hosts the WCF Send Port.
When using a WCF-Custom Adapter + NetTcpBinding Static Send Port to invoke the WCF CalculatorService, the overall throughput increases by 30% compared to when using a Dynamic Send Port. This statement becomes apparent when examining the value of the Documents processed/Sec performance counters exposed by both the BizTalkServerReceiveHost64 and BizTalkServerSendHost64 processes.
The CPU utilization has been almost halved passing from the OOBRoutingOOBTransform to the CustomRoutingCustomTransform test case.
Regardless if transactions are enabled and used when invoking an underlying system (SQL database or WCF service) using a WCF-Custom Dynamic Send Port and regardless the binding being used (SqlBinding, NetTcpBinding, etc.), it's extremely important that you explicitly set the EnableTransaction property to false or true, otherwise the WCF-Custom Adapter will try to determine (checking whether the actual binding includes the TransactionFlowElement binding element) at runtime whether a System.Transactions TransactionScope was required when sending a message over a channel. In our case, the EnableTransaction was not explicitly configured because, by default, when using a Static Resolver along with the WCF-Custom Adapter within an ESB Itinerary, this property is not set by default by the Itinerary Designer. For more information on the impact of the EnableTransaction property on the performance of WCF Dynamic Send Ports, you can read Mustansir’s post. In any case, I run a test to measure the impact of defining or not defining the EnableTransaction property.
OOBRoutingNoTransform without EnableTransaction specified: 9.53 messages/sec as reported above.
OOBRoutingNoTransform with EnableTransaction explicitly set to false: 12.00 messages/sec
Visual Studio Profiler
Running Visual Studio Profiler against the BizTalkServerSendHost64 process while performing the Load Test against the OOBRoutingNoTransform Test Case, I verified that when running within a Dynamic Send Port, the WCF-Custom Adapter spends most of the time building within the InitializeValues method exposed by the BindingFactory class. This assumption is fully confirmed by the by the following Call Tree View generated by Visual Studio Profiler.
Call Tree View
Scenario 3
The screenshots below were taken at the end of the most significant test runs and grouped by use case.
SimpleRoutingSlipCustomTransform
The following 3 tables report the results highlighted in red in the screens above:
Overall Results
Test Case |
Duration (sec) |
Throughput: Test/Sec |
Latency: Average Test Time (sec) |
% Processor Time |
SimpleRoutingSlipOOBTransform |
228 |
21.9 |
0.87 |
98.5 |
SimpleRoutingSlipCustomTransform |
103 |
48.4 |
0.37 |
92 |
Receive Location Performance Counters (BizTalkServerReceiveHost64)
Test Case |
Document processed/Sec |
Inbound Latency (sec) |
SimpleRoutingSlipOOBTransform |
22.1 |
0.34 |
SimpleRoutingSlipCustomTransform |
49.7 |
0.15 |
Send Port Performance Counters (BizTalkServerSendHost64)
Test Case |
Document processed/Sec |
Outbound Latency (sec) |
Outbound Adapter Latency (sec) |
Inbound Latency (sec) |
SimpleRoutingSlipOOBTransform |
22.0 |
0.12 |
0.12 |
- |
SimpleRoutingSlipCustomTransform |
49.5 |
00.079 |
0.074 |
- |
The following findings emerge from the analysis of the figures above:
-
When using the default Transform Service provided out-of-the-box by the ESB Toolkit, the time spent to invoke thetransformation map is more than double compared to the time necessary to run the same map using the custom Transform Service that internally makes use of the XslCompiledTransform class. This statement becomes apparent when examining the value of the Inbound Latency (sec) performance counter exposed by the BizTalkServerReceiveHost64 process.
-
When using the default Transform Service provided out-of-the-box by the ESB Toolkit, the overall throughput is less than half spent compared to throughput achieved using the custom Transform Service. This statement becomes evident when looking at the value of the Document processed/Sec performance counter exposed by the BizTalkServerReceiveHost64 process.
Lesson Learned The custom TransformService can significantly reduce the latency due to message mapping, reduce the CPU utilization and increase the overall throughput. |
Visual Studio Profiler
Once again, to individuate the component and even better, the culprit line of code that consumes most of the time when running the default TransformService, I used Visual Studio Profiler to attach and trace down the runtime behavior of the BizTalkServerReceiveHost64 process while running the a Load Test against the SimpleRoutingSlipOOBTransform Test Case.
As you can easily see 84% of time is spent by the ESB Dispatcher component for invoking the standard Transform Service and in particular 82% of time is spent by the XslTransform object used to transform the inbound message.
Instead, when using the custom Transform Service which compiles, caches and applies the map exploiting the capabilities offered by the XslCompiledTransform, the percentage of time spent by the Messaging Runtime to run the ESB Dispatcher is 32.43%, as highlighted in the following graph.
Conclusions
If you got this far, then you are truly a hero and certainly deserve an award. Therefore, here you can download the code of the custom Routing and Transform services and the code of Test Cases, Unit Tests and Load Tests. As always, I encourage you to try out my components within your applications and send me your feedbacks. Thanks!
[TW3]Shouldn’t this be SimpleRoutingSlepCustomTransform
Comments
Anonymous
October 13, 2010
Hi Paolo, great series of articles - very informative. Does the transform service handle multipart messages at all - I see there is code for it in the XslCompiledTransformHelper class but it doesnt seem to be wired up at all from the calling service?Anonymous
October 13, 2010
Good observation Grant. The underlying XslCompiledTransformHelper directly supports multi-source maps via the CompositeStream, as I described in another post (blogs.msdn.com/.../how-to-boost-message-transformations-using-the-xslcompiledtransform-class-extended.aspx). So in theory, the parts you supply to the XslCompiledTransformHelper as Stream[] parameter, could be parts of the same message. Actually, the Transform Service doesn't implement this pattern, but the code can be easily customized to capture this scenario. It's sufficient to create and expose a new boolean property called MultiPart. You could use this property to specify on an itinerary that want to apply a multi-source map to the parts of the incoming message and change the ExecuteTransform method exposed by the XslCompiledTransformHelper to handle to invoke the right version of the XslCompiledTransformHelper.Transform method. Let me know if I answered your question.Anonymous
October 16, 2010
Thanks for getting back to me Paolo, makes perfect sense. regards GrantAnonymous
August 17, 2015
Hi Paolo, do you think these results would have changed between BizTalk 2010 + ESB 2.1 and BizTalk 2013 R2 + ESB 2.2 ? I assume transform performance has improved, and dynamic port performance also. May have to download your test cases and rerun to discover =)Anonymous
August 19, 2015
Ah ah ah... Sorry Nikolai, I think you should rerun tests by yourself,I didn't touch the code since 2010! :) Ciao Paolo