SQL Server 2005 - RDTSC Truths and Myths Discussed
I posted some updates to my blog entry about the RDTSC and drift warnings in May. See the following link https://blogs.msdn.com/psssql/archive/2006/11/27/sql-server-2005-sp2-will-introduce-new-messages-to-the-error-log-related-to-timing-activities.aspx Because the CPU drift issue has continued to generate questions let me try to answer these directly.
Myth: Performance Problems
The statement I have seen frequently is that the drift warning equates to performance problems. This is not the case and extensive testing by the SQLOS team has shown that the even under forced RDTSC variations the Microsoft SQL Server database scheduler is not affected. The performance characteristics of the SQL Server are not affected by RDTSC drift, only the performance timing outputs.
Truth: RDTSCTest.exe
RDTSC test is a utility developed by the SQL Server Escalation Team to help us determine computers that may have inaccurate performance data results. The utility runs with the standard SQLDiag/PSSDiag collection that Microsoft SQL Server support collects when troubleshooting performance cases.
Microsoft SQL Server support has encountered a small subset of issues were the RDTSC drift lead to incorrect performance tuning. Support used at the inaccurate query durations and tuned the query with the longest duration. Only later to find that the durational calculation was incorrect and tuning the query did not fully help the customer. To avoid this we examine the RDTSCTest output as a guideline for how accurate the data should be.
The RDTSCTest can easily show drift. How much drift; is the common question so let me try to answer the question as best as I can.
There is no exposed way to snapshot the RDTSC values for all CPUs on the computer at the same time. To do this the utility has to snapshot the current RDTSC. Then affinitize the thread to another CPU and snapshot the RDTSC. It continues this movement across all CPUs. Then it sorts the returned values and prints out the results. The problem with this is that every time that you move to a new CPU you generally have to take a ride through the NT scheduler. On many computers this can cause an 8 to 12ms context switch of the thread. So if you have a single socket, dual core system drift of 12 to 24ms is easy to encounter. If you take the calculation to a 64 core system you are at 768ms and that only accounts for context switching.
Generally the Microsoft SQL Server support team considers drift less than several seconds, noise. Again, it is only a guideline for administrators and support to be cautious when looking at the durational values from various performance outputs. (These areas are outlined in KB931279 - https://support.microsoft.com/kb/931279/en-us)
We also look for sustained drift patterns. For example, if the utility records a drift delta of 3 seconds for CPU #5 on the computer will it continue to show this when it is executed on successive runs and at the same delta rates. If it does not it is likely a noise situation.
There are also conditions on NUMA systems that the CPUs within a node may remain in sync but doing cross node, RDTSC comparisons is not valid. Make sure that you carefully examine the NUMA configuration information when you analyze the output also.
Myth: /usepmtimer
I have seen several e-mail messages and posts that the administrator should add the /usepmtimer to the boot switches. Use of the /usepmtimer does not affect the RDTSC instruction. Instead it modifies the source the Windows QueryPerformanceCounter data. This has NO APPLICATION for the SQL Server RDTSC issues. Microsoft SQL Server makes the direct read call to retrieve the RDTSC timer.
Truth: Manufacture Updates
KB931279 currently mentions an AMD update to synchronize the RDTSC timers of all CPUs. AMD has indicated that dual-core optimizer is not targeted at server applications.
The sentiment of the statement still stands. If you are receiving drift warnings it is a good practice to make sure that you have the latest firmware, micro-code and BIOS updates.
Truth: Power Now, Speed Step, …
When doing performance tuning of Microsoft SQL Server queries it is a good practice to disable the CPU stepping features. This may require BIOS or similar settings and a restart to accommodate the configuration. This is best way to stabilize RDTSC timings. This enables you to retrieve accurate and meaningful performance data from the Microsoft SQL Server.
Truth and Myth: Affinity Mask
The Microsoft SQL Server affinity mask can reduce the probability of encountering inaccurate performance data but does not prevent it. By setting the affinity mask the SQL Server worker threads stay on the same CPU so that the start and stop RDTSC values are from the same CPU and do not become polluted by thread CPU switching activities.
Consider the case in which stepping is still enabled. The CPU could change frequencies, one or more times between the start and end of the query. The duration is based on the common frequency of all CPUs. When dividing the elapsed duration by a constant frequency it can result in incorrect duration.
Truth: I/O Stall Warnings
SQL Server 2000 SP4 added the I/O stall detection. I have outlined this in detail in my whitepapers and KB articles. See https://support.microsoft.com/kb/897284/en-us for more information. Microsoft SQL Server 2005 updated Stalled I/O tracking to use the RDTSC counters also. When an I/O is posted the starting RDTSC is captured. When the Lazy Writer processes executes it checks the status of the I/O and if still pending and more than 15 seconds has elapsed the warning is recorded. Because the RDTSC counter is in play and the Lazy Writer can execute on a different CPU the RDTSC values may diverge. In an extreme case of drift 15+ seconds the warning could be falsely produced.
Trace Flag 8033
Trace flag –T8033 can be used to suppress the drift warnings. Do not enable this trace flag on Microsoft SQL Server 2005 unless you fully understand the ramifications of ignoring the drift warnings.
Bob Dorr - Microsoft Senior SQL Server Escalation Engineer
Comments
Anonymous
November 12, 2007
PingBack from http://delta.wpbloggers.com/delta/?p=144Anonymous
February 01, 2008
Author: Robert Dorr, Keith Elmore, Lindsey Allen Introduction Have you ever asked the question: “IfAnonymous
March 24, 2008
We have a situation where we are being told by our storage team that the IO stall warnings are due to CPU drift. However, we are not seeing any warnings in the SQL log related to CPU drift itself. Is it feasible that CPU drift could cause stall warnings to be falsely reported but the CPU drift itself would not be reported? Thanks in advance for any insight and for this great reference article. Regards, Robert Davis. [RDORR] Robert, the CPU drift can cause the stalled warnings but I have never seen the warning without the SQL Server error log warning. To cause the I/O warning the drift would have to be over 15 seconds and I simply have yet to see a case like this. If you are getting stall warnings start looking at PerfMon avg disk sec/transfer during these windows.Anonymous
June 19, 2008
What does the “I/O request” error below represent? 2008-04-21 13:26:42.480 spid364 Microsoft SQL ServerAnonymous
December 16, 2008
Many of you have encountered the RDTSC timing variances that I outlined in an earlier blog post:  Anonymous
December 17, 2008
I recently posted about the availability of SQL Server 2005 SP3. I have a bit more information aboutAnonymous
May 29, 2009
A series of questions related to start time, duration, end time, T-SQL waitfor delay command and othersAnonymous
May 17, 2017
I tend not to create a lot of responses, but after browsing a great deal of comments here SQL Server 2005 – RDTSC Truths and Myths Discussed – CSS SQL Server Engineers. I do have 2 questions for you if you don't mind. Could it be simply me or does it look like like some of these comments come across like they are left by brain dead visitors?:-P And, if you are posting at other places, I'd like to keep up with anything new you have to post.Would you post a list of all of all your social community pages like your linkedin profile, Facebook page or twitter feed?