SQL Tools I use in the UK ADC Scalability Lab

As part of the ADC offering, we have a full scalability lab, which allows us to work with our customers by lifting their application into Thames Valley Park and stressing it – either for the purposes of performance tuning, or just to see where it breaks. Often it may simply be PoC work for an architecture or a new piece of technology. As part of that, we invariably end up running some form of SQL diagnostics to establish what more can be done, should SQL Server turn out to be the bottleneck. It may surprise you that the tools we use (even with SQL 2008) are very basic, but they continue to let us home in very quickly on the next actions. For your convenience (and mine), I’ve documented my two favourites here – sp_blocker_pss80 and RML Utilities.

Blocking and WaitStats

My preferred approach is to use sp_blocker_pss80, as detailed here: https://support.microsoft.com/kb/271509. The reasoning is very simple – to date this is the only tool that tells you who is at the top of the blocking tree, and what their waitstats are. Because the utility records anything with a non-zero waittype, you also pick up anything that is running slowly due to other hardware or configuration issues. Interestingly, not many people use this, but in the lab it very quickly picks up common blocking code. Once you have the waittype, simply search on the waittype name and you’ll invariably end up at an article that tells you what the wait type is and what to do about it. You could use the blocked process report, but this doesn’t record the waittypes for the SPID.

I’d suggest changing the WAITFOR DELAY to 5 seconds, rather than the 15 seconds quoted in that article – this also increases the chance of catching something that is deadlocking before the deadlock monitor kills it. Because the output can be very lengthy, I wrote a very quick and dirty file splitter utility which you’re welcome to download from here – this saves opening up large text files in Notepad, and means you can quickly compare the set of waiting SPIDs at the end of a run to the set at the beginning.
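
As a minimal collection sketch, assuming the procedure has already been created in master from the KB article (the file names, server name and sqlcmd switches below are my own example values, not part of the KB script):

-- blocker_loop.sql: run sp_blocker_pss80 in a loop until the session is stopped
WHILE 1 = 1
BEGIN
    EXEC master.dbo.sp_blocker_pss80   -- created beforehand from KB 271509
    WAITFOR DELAY '00:00:05'           -- 5 seconds rather than the 15 quoted in the article
END

and capture the output to a file with something like:

sqlcmd -E -S MYSERVER -i blocker_loop.sql -o blocker_output.txt -w 8000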

RML Utilities

Irrespective of whether I’m looking for bottlenecks or baselining a server, I’ll typically collect a profiler trace and put it through RML Utilities – this gives me statistics for statements and procedures based on the ‘shape’ of the statement rather than the actual statement text itself. It allows me to establish the most common calls into SQL Server, and then build the histogram (typical batch time, number of batches called at a specific time, and individual and total batch CPU and IO statistics). It also allows you to perform comparisons between any two trace sessions. To understand the impact of running a server-side trace, I also set User counter 1 to the ID of the trace, and back to 0 again at the end – this means that in Perfmon we can see when the trace is running, and whether it has potentially skewed the performance of the system. I’ve provided a sample script here, which has the minimum events to collect, including the deadlock graph. Note you can run this with SQL 2005 – it will just throw a non-fatal error when adding unsupported events.
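
As a rough sketch of that marker trick (the trace ID value here is only a placeholder – in practice it comes back from the trace-creation script), the user-settable counter is driven like this:

-- Surface the trace ID in Perfmon (SQLServer:User Settable -> User counter 1)
DECLARE @TraceID int
SET @TraceID = 2                 -- placeholder; use the ID returned when the trace was created
EXEC sp_user_counter1 @TraceID   -- counter shows the trace ID while the trace is running
-- ... trace runs ...
EXEC sp_user_counter1 0          -- reset to 0 once the trace is stopped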

You can download the x64 version of RML Utilities from here: https://www.microsoft.com/downloads/details.aspx?FamilyId=B60CDFA3-732E-4347-9C06-2D1F1F84C342&displaylang=en. There’s a link to the x86 equivalent on the same page. You only need to install the tool on a desktop, or whichever machine you’re going to do the analysis on. To view its reports you’ll also have to install the Report Viewer control along with the tool (a link is also on that page).

The tool works by reading a profiler trace into a SQL Server database, then performing aggregations on top of that to tell you where the time is going (CPU, IO and duration), broken down by procedures and individual batches. This means all your service provider has to do is collect the profiler traces for you – you can then do the analysis off the production box.
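
If you want to go beyond the built-in reports, you can also query the analysis database directly. The object names below (ReadTrace.tblBatches, ReadTrace.tblUniqueBatches, HashID, NormText) are my recollection of the RML schema rather than anything documented in this post, so treat this as a sketch and verify the names against your own analysis database first:

-- Assumed ReadTrace schema: top 20 batch 'shapes' by total CPU (verify object names first)
SELECT TOP 20
       ub.NormText,        -- normalised batch text (parameters stripped)
       agg.Executions, agg.TotalCPU, agg.TotalReads, agg.TotalWrites, agg.TotalDuration
FROM (
    SELECT HashID,
           COUNT(*)                        AS Executions,
           SUM(CAST(CPU      AS bigint))   AS TotalCPU,
           SUM(CAST(Reads    AS bigint))   AS TotalReads,
           SUM(CAST(Writes   AS bigint))   AS TotalWrites,
           SUM(CAST(Duration AS bigint))   AS TotalDuration
    FROM ReadTrace.tblBatches
    GROUP BY HashID
) agg
JOIN ReadTrace.tblUniqueBatches ub ON ub.HashID = agg.HashID
ORDER BY agg.TotalCPU DESC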

Collecting the trace data

If you’ve not used server-side tracing before – it is identical to using the UI, but has a much lower impact, and records to a file which can be whipped off the server to prevent a disk from filling up. The attached trace script was written for SQL 2008 but will work on 2005 (it will throw an error for events it can’t capture, but will still work).

Before running the script, set the file name to something appropriate. You don’t need to include the .trc extension. The trace is configured to roll over – that is, to record the trace data in chunks of 255 MB, so it’s easier to get off the server. If you’re worried about filling up the drive, you can specify a stop time in the trace definition. Currently it will collect to the C: drive, so make sure you change this. There are extra parameters in there of interest – one will stop the trace at a set time, the other will stop the trace once a configured number of trace files have been collected.
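
The attached script is the one to use, but purely to illustrate the mechanics of a server-side trace with rollover, here is a cut-down sketch. The path and the event/column choices are my own simplification – the real script collects more events, including the deadlock graph:

-- Minimal server-side trace sketch: rollover to 255 MB files, SQL:BatchCompleted and RPC:Completed only
DECLARE @TraceID int, @maxfilesize bigint, @stoptime datetime, @on bit
SET @maxfilesize = 255                      -- MB per rollover file
SET @stoptime    = NULL                     -- or a datetime at which the trace stops itself
SET @on          = 1

EXEC sp_trace_create @TraceID OUTPUT,
     2,                                     -- TRACE_FILE_ROLLOVER
     N'C:\Traces\MyTrace',                  -- change this! .trc is appended automatically
     @maxfilesize, @stoptime

-- Event 12 = SQL:BatchCompleted, 10 = RPC:Completed
-- Columns: 1 TextData, 12 SPID, 13 Duration, 16 Reads, 17 Writes, 18 CPU
EXEC sp_trace_setevent @TraceID, 12, 1,  @on
EXEC sp_trace_setevent @TraceID, 12, 12, @on
EXEC sp_trace_setevent @TraceID, 12, 13, @on
EXEC sp_trace_setevent @TraceID, 12, 16, @on
EXEC sp_trace_setevent @TraceID, 12, 17, @on
EXEC sp_trace_setevent @TraceID, 12, 18, @on
EXEC sp_trace_setevent @TraceID, 10, 1,  @on
EXEC sp_trace_setevent @TraceID, 10, 12, @on
EXEC sp_trace_setevent @TraceID, 10, 13, @on
EXEC sp_trace_setevent @TraceID, 10, 16, @on
EXEC sp_trace_setevent @TraceID, 10, 17, @on
EXEC sp_trace_setevent @TraceID, 10, 18, @on

EXEC sp_trace_setstatus @TraceID, 1         -- start the trace
SELECT @TraceID AS TraceID                  -- keep this for sp_user_counter1 and for stopping the trace later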

If you’re already using User counter 1 for something else, make sure you set the @SetUserCounter1 parameter to 0 to prevent the script from setting this value.

Performing the analysis

Before you run the analysis I strongly recommend that you configure an analysis database – one will be created for you if you don’t, but it will be too small, and this will significantly slow the indexing job at the end of the analysis. Make sure it’s set to simple recovery. In terms of sizing, I’d suggest that for 5 GB of trace data you go for a data file that is 5 GB in size, and a log that has an initial size of 1 MB but is then pre-grown to 20 GB. That’s not a typo – 15 million events drive about 20 GB of log data, even in simple recovery. After the analysis is complete you can shrink the log right back down and keep the database for future reference.
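
As a sketch of that pre-sizing, assuming a database name of PerfAnalysis (match it to whatever you pass to readtrace’s -d switch) and a made-up file path:

-- Pre-size the analysis database so the load and index phases don't stall on autogrow
CREATE DATABASE PerfAnalysis
ON PRIMARY ( NAME = PerfAnalysis_data, FILENAME = N'D:\SQLData\PerfAnalysis.mdf', SIZE = 5GB )
LOG ON     ( NAME = PerfAnalysis_log,  FILENAME = N'D:\SQLData\PerfAnalysis_log.ldf', SIZE = 1MB )
GO
ALTER DATABASE PerfAnalysis SET RECOVERY SIMPLE
GO
-- Pre-grow the log to 20 GB as suggested above; shrink it back down once the analysis is done
ALTER DATABASE PerfAnalysis MODIFY FILE ( NAME = PerfAnalysis_log, SIZE = 20GB )
GO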

The analysis itself is done from the command line. This command will work for the supplied trace files:

readtrace -I"{traceFiles}" -S{server on which to do the analysis} -d{database in which to process and store results} -o{output directory} -T28 -T29 -T32 -T33 -T16
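
For example, with made-up server, database and path names substituted for the placeholders, an invocation looks like this:

readtrace -I"C:\Traces\MyTrace.trc" -SMYSERVER\INST01 -dPerfAnalysis -o"C:\Traces\Output" -T28 -T29 -T32 -T33 -T16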

If you get an error regarding Multiple Active Result Sets (MARS), use these switches as well: -f -T35

There’s much more to RML Utilities than I’ve listed here, including the ability to stress SQL Server using OStress – I’ve just covered what works for me in the lab. Enjoy!

Originally posted by Ryan Simpson on 4 January 2010 here.