Short paper on SQL Server 2005 transactional replication throughput

The attached is a short paper by Amy Keenan of Microsoft. Amy did some experiments to study the potential bottleneck of transactional replication. This paper is based on the results of such experiments.

SQL Server Transactional Replication Throughput Study

Amy Keenan

Microsoft Corporation

amykeen[at]microsoft.com

May 2006

Background

 

Over the years, we have heard about transactional replication performance issues from customers. Although most performance issues are caused by non-optimal designs of the application, it is also interesting to observe the performance behavior of transactional replication when stress on the system is high. In response to industry demands, SQL Server 2005 provides improvements in the performance and scalability of transactional replication, including: reducing blocking and contention between the Distribution Agent and cleanup jobs; and the option for the Distribution Agent to use multiple parallel streams when applying transactions to the Subscriber (the -SubscriptionStreams parameter of the Distribution Agent).

 

The goal of this study is to identify possible bottlenecks in the throughput of data in SQL Server transactional replication. We conducted tests to measure throughput under different configurations. We found that with an adequate disk subsystem, transactional replication can sustain high throughput with low latency (latency is the time between when a change is made at a Publisher and when that change is applied at a Subscriber).

 

 

Schema and Tests

 

We performed a benchmark test to measure the throughput of transactional replication using a single push subscription.

 

The initial tests on a slower disk subsystem revealed that we would reach a bottleneck on the hardware before reaching a bottleneck in the replication agents. Moving to a storage area network (SAN) allowed for the elimination of the hardware bottleneck, and follow-up tests using the new multi-streaming Distribution Agent resulted in performance improvement measured up to 20% for the given schema and number of streams, resulting in a reduction in latency to less than 5 seconds by the end of the test.

 

The test system had the following characteristics:

 

· The schema used for this test was based on a real-world customer scenario and includes 59 tables with row sizes up to 2 KB, all published in a single publication.

· The SQL Server instances used as the Publisher, Distributor, and Subscriber were the original released version of SQL Server 2005, build 9.00.1399.06.

· The Publisher, Distributor and Subscriber were on separate servers with similar, though not identical hardware. Table 1 lists the hardware configuration of the computers used in the study.

 

Table 1: Hardware configuration of the computers used in the study

CPU Speed

RAM

Physical Processors

Publisher

2.0 GHz

8 GB

4

Subscriber

2.0 GHz

8 GB

4

Distributor

2.8 GHz

32 GB

4

 

There were 40 clients continuously updating the data in each of the published tables, with each transaction varying in size from 10 to 100 commands, and a workload consisting of 70% inserts, 20% updates, and 10% deletes.

 

The Log Reader Agent and Distribution Agent ran continuously while the transactions were applied in the publication database. We used Windows System Monitor to gather statistics and Replication Monitor to aggregate the results.

 

Test Results

The primary performance indicators measured for this test were:

 

  • Throughput of the Log Reader Agent (as measured by the number of commands per second processed)
  • Throughput of the Distribution Agent (as measured by the number of commands per second processed)
  • Log Reader Agent latency (the number of seconds between the time that a command is applied at the Publisher, and the corresponding record is inserted into the distribution database.)
  • Distribution Agent latency (the number of seconds between the time that a record is inserted into the distribution database, and the corresponding command is applied at the Subscriber.)

 

Note that commands per second here refers to the number of SQL statements processed per second.

 

We ran the test in three iterations, each lasting 24 hours:

 

1) The servers were using local storage (SCSI disks).

2) The servers were attached to a SAN, with a single stream for the Distribution Agent.

3) The servers were still on a SAN, with multiple streams for the Distribution Agent.

 

The SAN is a Xiotech Magnitude 3D SAN with 84 fiber-connected drives, with 100 GB Data LUNs on 10 RAID 1+0 drives, and 40 GB LUNs on 6 RAID 1+0 drives. Each controller had 2 GB cache, 1 GB used and 1 GB mirrored to the other controller. Write caching was on for all database LUNs.

 

In the first iteration, we bottlenecked on the disk before encountering any bottleneck in the replication components, as indicated by the disk queue lengths (Table 3).

 

Table 2: Maximum and average throughput (in commands per second) of replication agents with servers using local storage

Max

Avg

Log Reader

14,612

5,961

Distribution

14,173

5,488

Table 3: Maximum and average disk queue length on systems using local storage

Max

Avg

Publisher

15

1

Distributor

301

61

Subscriber

282

66

 

 

In the second iteration, the SAN eliminated the disk bottleneck. However, within the first three hours of the test, we began to see increased latency in the Distribution Agent. The maximum latency for the Distribution Agent was 722 seconds, with an average of 591 seconds. The Log Reader Agent, with a maximum latency of 120 seconds and an average latency of 1 second, was able to keep up with the transaction load.

Table 4: Maximum and average throughput (in commands per second) of replication agents with servers using a SAN

Max

Avg

Log Reader

24,443

17,781

Distribution

18,508

15,113

Table 5: Maximum and average disk queue length on systems using a SAN

Max

Avg

Distributor

1

1

Publisher

2

1

Subscriber

6

1

From this second iteration, we observed that the Distribution Agent throughput now became the bottleneck.

In the third iteration, we took advantage of the multi-streaming Distribution Agent introduced in SQL Server 2005. Going from one to three streams produced around 20% improvement in the Distribution Agent throughput. During the 24 hours for the third iteration, the average latency for both of the agents was under 1 second, with the maximum latency being 2 seconds for the Log Reader Agent and 4 seconds for the Distribution Agent. Both the Log Reader Agent and the Distribution Agent were therefore able to keep up with the workload without incurring latency.

Table 6: Maximum and average throughput (in commands per second) of replication agents with servers using a SAN, and the Distribution Agent using three parallel streams

Max

Avg

Log Reader

25,033

18,171

Distribution

24,921

18,136

Table 7: Maximum and average disk queue length on systems using SAN, and the Distribution Agent using three parallel streams

Max

Avg

Distributor

1

1

Publisher

3

1

Subscriber

4

1

 

Summary

In summary, the introduction of the multi-streaming Distribution Agent architecture shows great promise in improving the overall throughput of transactional replication. The improvement in overall throughput and a latency decrease combine to address scalability issues previously faced by customers wanting to replicate in high volume scenarios.