

This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.

Trace-scrubbing Tools

Andrew Zanevsky

Andrew Zanevsky shares his trace-scrubbing procedures that make it easy for you to handle large trace files and aggregate transactions by type, even when the captured T-SQL code has variations.

SQL Server Profiler is a veritable treasure trove when it comes to helping DBAs optimize their T-SQL code. But the surfeit of riches (I'm reminded of the Arabian Nights tale of Aladdin) can be overwhelming. I recently had one of those "sinking" feelings when I first tried to make sense of the enormous amount of data collected by traces on a client's servers. At this particular client, the online transaction processing system executes more than 4 million database transactions per hour. That means that even a 30-minute trace that captures "SQL Batch Completed" events results in a table with 2 million rows. Of course, it's simply impractical to process so many records without some automation, and even selecting the longest or most expensive transactions doesn't necessarily help in identifying bottlenecks. After all, short transactions can be the culprits of poor performance when executed thousands of times per minute. [Subscribers may want to re-read Linchi Shea's June 2002 article, "Distill SQL Server Log Files," for his Perl code. Linchi's PDF and Perl code are both included in this article's Download.–Ed.]

Trace-scrubbing rules

My intuition told me to use aggregation. But it didn't take me long to discover that grouping and aggregating trace files by transaction type didn't work, thanks to slight variations in the executed queries. Each stored procedure call had different parameters, and queries were executed with different search arguments and slightly varying WHERE clause conditions. Some transactions included comments or had a different number of spaces, tabs, or line breaks within the otherwise identical code. Some queries used synonyms for keywords (EXEC versus EXECUTE, for instance).

	To solve the problem of large trace analysis, I developed a trace-scrubbing process that strips the less significant details from all trace records and allows grouping by transaction type. For example, if my trace file contains 50,000 records for executions of the same stored procedure (each with different parameters), then my process groups them into one line with duration, CPU, reads, and writes, all totaled and averaged. This lets me quickly identify the transaction types that are responsible for a large percentage of overall system utilization.

	Implementation of trace scrubbing requires 1) parsing of all T-SQL statements recorded by the trace, and 2) eliminating variable parameters, as in the following example. Imagine that the three transactions listed here are captured by a trace.

1. Proc_A
       123,
       'alpha',
       0x0a9fff8b
2. EXECUTE [Proc_A] -345, 'beta', null /*comment*/
3. EXEC    dbo.Proc_A  9876543, default    -- test

	Although all three transactions execute stored procedure Proc_A, variations in execution syntax, parameters, line breaks, tabs, spaces, and comments make them look very different. My scrubbing algorithm aggregates them into one record with transaction type stored as "Proc_A%".

	In order to recognize that several transactions executed the same stored procedure or, for example, performed INSERTs into the same table with different values, the scrubbing algorithm performs the following operations (a T-SQL sketch of a few of these rules appears after the list):

  • Eliminates EXEC and EXECUTE keywords.
  • Replaces all line breaks and tabs with spaces.
  • Treats multiple spaces as one.
  • Eliminates any numeric constants, hexadecimal values, literal strings, and the NULL keyword.
  • Removes comments of both types ("double-dash" and /*...*/).
  • Removes commas between eliminated parameters.
  • Deletes the "dbo" owner name, which may appear in front of many object names and is in most cases the default.
  • Reduces the code of each analyzed transaction to the first 800 characters. This allows me to treat long batches as the same transaction type if their first parts are identical (after reduction of less significant elements as described by other rules).
  • Implements special scrubbing for transactions that involve system procedures such as sp_cursoropen, which pass the actual executed query as a literal parameter to SQL Server. For these cases, I extract the actual query from the parameters and discard the sp_cursoropen "wrapper."
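
	To make these rules concrete, here's a minimal T-SQL sketch of a few of them (line breaks and tabs, double-dash comments, EXEC/EXECUTE, the "dbo" owner name, multiple spaces, and the 800-character cut). It's an illustration only, not the actual code from the Download, which handles many more cases:

```sql
-- Illustration only: a simplified version of a few scrubbing rules.
DECLARE @sql varchar(8000)
SET @sql = 'EXEC    dbo.Proc_A  9876543, default    -- test'

-- Replace all line breaks and tabs with spaces
SET @sql = REPLACE(REPLACE(REPLACE(@sql,
           CHAR(13), ' '), CHAR(10), ' '), CHAR(9), ' ')

-- Remove a "double-dash" comment (simplified: assumes it runs to the end)
IF CHARINDEX('--', @sql) > 0
    SET @sql = LEFT(@sql, CHARINDEX('--', @sql) - 1)

-- Eliminate EXEC/EXECUTE keywords and the default "dbo" owner name
-- (REPLACE matches any letter case under a case-insensitive collation)
SET @sql = REPLACE(@sql, 'EXECUTE ', ' ')
SET @sql = REPLACE(@sql, 'EXEC ', ' ')
SET @sql = REPLACE(@sql, 'dbo.', '')

-- Treat multiple spaces as one
WHILE CHARINDEX('  ', @sql) > 0
    SET @sql = REPLACE(@sql, '  ', ' ')

-- Reduce the transaction to its first 800 characters
SET @sql = LTRIM(RTRIM(LEFT(@sql, 800)))

SELECT @sql  -- 'Proc_A 9876543, default' (constant elimination not shown)
```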

	This set of rules substantially reduces the number of transactions that you have to consider and makes it easier to manually process the resulting list. Note that these rules are empirical and may have to be modified for different databases. For example, while working with another client, I had to add logic to my scrubbing procedure to ignore database names, because their server hosts more than 400 databases with identical schemas. Removing database names allowed me to aggregate trace results by stored procedure name, regardless of database context, and to identify the slowest and most expensive procedures. Scrubbing in your particular environment may require further customization.

Stored procedures

The trace.sql code in the accompanying Download contains three stored procedures for trace scrubbing and analysis. Stored procedure Process_Trace, whose parameters are listed in Table 1, performs scrubbing on a given trace table. It finds transaction types after parsing all code captured in a trace table and creates two additional tables that allow linking of each transaction type to the original trace records with full SQL code. It assumes that the trace table contains, at a minimum, the following standard columns: RowNumber, with unique row identifiers; TextData, which stores full SQL code of each transaction; and EventClass, which contains numeric trace event types, as documented in BOL. By default, the procedure only analyzes events of types 10 (RPC_Completed) and 12 (TSQL_Batch_Completed), but you can specify any list of event types as a parameter. A sample trace definition template file that I use is available in the Download as performance.tdf.
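
	For reference, here's a minimal sketch of a trace table satisfying those assumptions, shown as a CREATE TABLE for illustration (in practice the table usually comes from loading a trace file, as shown later). It includes the cost columns that Calculate_Most_Expensive_Transactions, described below, also needs; the datatypes are my guesses at typical ones, and a real trace table carries many more columns:

```sql
-- Minimal sketch of the columns the scrubbing procedures rely on;
-- datatypes are assumptions, not taken from the article's Download.
CREATE TABLE PerfTrace (
    RowNumber  int IDENTITY(1, 1) PRIMARY KEY,  -- unique row identifiers
    EventClass int,     -- numeric trace event type, as documented in BOL
    TextData   ntext,   -- full SQL code of each transaction
    Duration   bigint,  -- needed by Calculate_Most_Expensive_Transactions
    CPU        int,     -- ditto
    Reads      bigint,  -- ditto
    Writes     bigint   -- ditto
)
```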

Table 1. Parameters of stored procedure Process_Trace.

| Parameter | Datatype | Description |
| --- | --- | --- |
| @table | varchar(128) | Trace table name. |
| @debug | tinyint | Reserved for system maintenance. |
| @drop | bit | 0 (default): normal processing. 1: drop the summary tables and index created by the previous execution of this procedure on the same trace table, then quit. |
| @Event_Class_Filter | varchar(200) | Comma-separated list of trace event IDs to include in the scrubbing. Default: '10,12' (RPC_Completed and TSQL_Batch_Completed). A full list can be found in BOL. |

	Before you execute Process_Trace, you need to run a server-side trace, saving data to a file, and then load the trace file into a database table using system function fn_trace_gettable. (An alternative approach is to run Profiler, capturing data directly into a table, but I don't recommend it for heavy-duty production tracing, because your client-side workstation performance may be insufficient to capture all trace events and some of them may escape recording.) Once you have a trace table, pass it as a parameter to Process_Trace. You can think of this procedure as a preliminary step, compiling data for the next one that generates reports.
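
	Here's a minimal sketch of that workflow. The file and table names are hypothetical, and the IDENTITY() function supplies the RowNumber column described earlier, since fn_trace_gettable itself doesn't return one:

```sql
-- Load a server-side trace file into a table shaped like the earlier
-- sketch; the file name is a hypothetical example.
SELECT IDENTITY(int, 1, 1) AS RowNumber, *
INTO   PerfTrace
FROM   ::fn_trace_gettable('D:\Traces\perf.trc', default)

-- Scrub the loaded trace; parameters are listed in Table 1
EXEC Process_Trace @table = 'PerfTrace', @Event_Class_Filter = '10,12'
```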

	Scrubbing of a million trace records takes several minutes and results in a list of transaction types that may still be several thousand rows long, only some of which are likely to be important for performance analysis. In order to identify the bottlenecks in this list, I developed another stored procedure, Calculate_Most_Expensive_Transactions, whose parameters are listed in Table 2. Using the original trace table and additional tables created by Process_Trace, this procedure generates a report about transaction cost by "scrubbed T-SQL type." It assumes that the original trace table contains columns Duration, Reads, Writes, and CPU, in addition to the three columns listed earlier that are required by Process_Trace.

Table 2. Parameters of stored procedure Calculate_Most_Expensive_Transactions.

| Parameter | Datatype | Description |
| --- | --- | --- |
| @trace_table | varchar(128) | Trace table name. |
| @output_table | varchar(128) | Report output table name, or NULL (default) for standard output. |
| @topN | int | Include scrubbed types in the report only if they fall within the top N items in at least one ranking category. Default: 50. All other types are shown aggregated in one 'All Other' line. |
| @average_ms | int | Individually list in the report (not as 'All Other') all scrubbed transaction types with average duration over @average_ms milliseconds. Default: 1000 (that is, 1 second). |
| @debug | tinyint | Reserved for system maintenance. |

	Depending on your environment, SQL Server resources, and application requirements, you may be interested in total, average, or maximum values of the captured measures, as well as execution counts for each transaction type. The procedure calculates all of these values, computes totals for the whole trace table and each transaction type's percentage of the total cost of the recorded transaction mix, and ranks transaction types by execution count, total duration, total reads/writes, and total CPU usage. For good measure, I've thrown in the number of times each transaction type executed in under 1 second, and in over 1, 2, 5, and 10 seconds.
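
	The summary tables that Process_Trace creates are defined in the Download, but the spirit of the calculation is an aggregation along these lines; the link-table and column names below are hypothetical illustrations, not the actual ones:

```sql
-- Hypothetical names throughout: the real summary tables are those
-- created by Process_Trace in the accompanying Download.
SELECT   map.SQL_Type,
         COUNT(*)        AS Exec_Count,
         SUM(t.CPU)      AS Total_CPU,
         AVG(t.CPU)      AS Average_CPU,
         MAX(t.Duration) AS Max_Duration,
         100.0 * SUM(t.CPU)
               / (SELECT SUM(CPU) FROM PerfTrace) AS Percent_CPU
FROM     PerfTrace AS t
JOIN     PerfTrace_SQL_Types AS map   -- hypothetical link table mapping
         ON map.RowNumber = t.RowNumber  -- trace rows to scrubbed types
GROUP BY map.SQL_Type
ORDER BY Total_CPU DESC
```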

	Once you've executed Process_Trace, you can run Calculate_Most_Expensive_Transactions and give it the same trace table name. (The output will appear in the result window of Query Analyzer.) The optional @output_table parameter lets you store the report in a table for retention and historical trend analysis, but you don't need this option for testing.
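
	For example, using the parameters from Table 2 and the hypothetical trace table from the earlier sketches:

```sql
EXEC Calculate_Most_Expensive_Transactions
     @trace_table  = 'PerfTrace',
     @topN         = 50,    -- top N in at least one ranking category
     @average_ms   = 1000,  -- also list types averaging over 1 second
     @output_table = NULL   -- NULL (default): report to standard output
```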

	The result set allows you to identify the most expensive transactions by any parameter. You can find out what stored procedure or query was the most expensive I/O user, quickly find the top five heavy hitters in CPU utilization, identify the transactions that take the longest to execute on average, and so on. I paste the report into Microsoft Excel and sort it in different ways to analyze different performance aspects. A sample report is too wide for this page, but one is included in the Download as file SampleReport.xls. Fragments of this report are shown in Table 3 and Table 4.

Table 3. Sample trace report: top 10 transactions by total CPU resource used.

| SQL_Type | Exec_Count | Total_CPU | Average_CPU | Percent_CPU | CPU_Rank |
| --- | --- | --- | --- | --- | --- |
| getListView% | 1705 | 346890 | 203 | 11.709 | 1 |
| getNext% | 6199 | 107948 | 17 | 3.644 | 2 |
| SELECT Auc_Number, sum(Sold_Count) FROM vwAuc GROUP BY Auc_Number | 64 | 107107 | 1673 | 3.615 | 3 |
| reportView% | 66 | 71701 | 1086 | 2.42 | 4 |
| getBuyerChargeSummary% | 299 | 64048 | 214 | 2.162 | 5 |
| AssignableWorkers% | 162 | 63393 | 391 | 2.14 | 6 |
| getDocument% | 67 | 61230 | 913 | 2.067 | 7 |
| reassignItem% | 54782 | 57942 | 1 | 1.956 | 8 |
| searchLocations% | 27 | 54264 | 2009 | 1.832 | 9 |
| SELECT * from GetProcessNoteList_vw WHERE Process_ID=% | 640 | 54219 | 84 | 1.83 | 10 |
| All Other | | | | 66.625 | |

Table 4. Sample trace report: top 10 transactions by average duration.

| SQL_Type | Exec_Count | Total_Duration (ms) | Average_Duration (ms) | Max_Duration (ms) |
| --- | --- | --- | --- | --- |
| SELECT Item_ID, Flags FROM v_DetailedPresale WHERE% | 1 | 27,546 | 27,546 | 27,546 |
| GetClosing_Approval_Listing;% | 1 | 24,046 | 24,046 | 24,046 |
| ProcessClosingqualifier% | 1 | 16,656 | 16,656 | 16,656 |
| SELECT * FROM v_Caller_Process_Provider_Association WHERE Phone_Number_%LIKE% | 2 | 26,343 | 13,171 | 25,123 |
| getIncompleteProcessDocForms% | 2 | 25,029 | 12,514 | 16,546 |
| NT_CSAUndelivered% | 2 | 24,093 | 12,046 | 17,313 |
| p_MoveBuyerPaymentDocVPRState% | 2 | 22,829 | 11,414 | 20,783 |
| AssociateSPEmployee% | 1 | 11,296 | 11,296 | 11,296 |
| SELECT retVal FROM v_Caller_Process_Provider_Association WHERE phone_number_%_Raw like% | 3 | 32,389 | 10,796 | 16,530 |
| delayedProcessClosing% | 3 | 30,562 | 10,187 | 30,530 |
| All Other | 1,361,157 | 12,816,138 | 9 | 30,643 |

	Excel also helps me to create pie charts illustrating "hit parades" of the most expensive transactions, such as the ones shown in Figure 1 and Figure 2. They let me instantly focus on the most critical transactions, whose high execution cost hinders performance. In most cases, I find that just a handful of transaction types, out of the thousands executed, are responsible for the bulk of SQL Server utilization. Narrowing down this list is crucial to any performance analysis and optimization effort. Different servers have different bottlenecks: some are short on I/O resources, some on CPU power; for other servers, the capacity is sufficient, but average transaction duration is unacceptable. Depending on the specific challenges of your environment, you may derive different reports and charts from the base report to isolate problems in the most important areas.

	To give you a taste of the data that the report offers for analysis, the SampleReport.xls file provides the following information about stored procedure getListView%. It was executed 1705 times during the trace. Of those executions, 1498 took less than 1 second, 120 took up to 2 seconds, 67 took up to 5 seconds, 16 took up to 10 seconds, and 4 took more than 10 seconds. Total duration of all executions was 961 seconds. All executions combined cost the server 3.6 million page reads (cache page hits), 733 writes, and 347,000 CPU units. An average execution lasted a little over 0.5 second and took 2125 reads (and less than one write operation) and 203 CPU units. Maximum duration was almost 12 seconds, and the maxima for a single execution were 29,000+ reads, 12 writes, and 2235 CPU units. Of all transactions, this type was responsible for one-eighth of one percent of the total execution count, over 7 percent of the total duration, 4 percent of all reads, 0.8 percent of writes, and 11.7 percent of total CPU usage. In the proverbial hit parade, this transaction type was #61 in execution count, #2 by total duration, #4 in I/O (Reads+Writes), but #1 in CPU utilization.

	Based on this information, I knew that if my SQL Server was hurting on CPU capacity, then optimization of this stored procedure could potentially cut CPU usage by up to 11.7 percent. It also had the potential to recover up to 7 percent of total duration and 4 percent of I/O. If my users complained about poor response time, I could see that this procedure made them wait 1-12 seconds on 207 occasions during the trace period. Optimization could potentially decrease the number of such user wait incidents, thus improving their experience. Some of this analysis would have been very difficult without scrubbing and aggregation.

	Note that reports generated by Calculate_Most_Expensive_Transactions contain transaction type IDs. These values can be used to retrieve all original SQL code for a particular transaction type using stored procedure Report_All_Transactions_By_SQL_ID, whose parameters are listed in Table 5. This procedure helps to analyze details of a scrubbed transaction type and find examples of actual unscrubbed T-SQL code.

Table 5. Parameters of stored procedure Report_All_Transactions_By_SQL_ID.

| Parameter | Datatype | Description |
| --- | --- | --- |
| @Table | varchar(128) | Trace table name. |
| @SQL_ID | int | Scrubbed SQL type ID from a report produced by Calculate_Most_Expensive_Transactions. |
| @Sort | varchar(20) | Column name to sort by. Possible values: StartTime, Reads, CPU (default), Duration, or spid. |
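
	For instance, to list every original statement behind one scrubbed type, sorted by duration (the @SQL_ID value is a hypothetical example copied from a report):

```sql
-- @SQL_ID is a hypothetical value taken from a
-- Calculate_Most_Expensive_Transactions report
EXEC Report_All_Transactions_By_SQL_ID
     @Table  = 'PerfTrace',
     @SQL_ID = 42,
     @Sort   = 'Duration'
```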

	Implementation of the scrubbing algorithm has helped me to improve performance for several companies in a short amount of time, with little or no prior knowledge of the application and database schema. Once I've narrowed performance bottlenecks down to just a few stored procedures or queries, I can focus on analyzing them and, for example, finding a missing index, a cursor that can be replaced with set processing, or a more effective algorithm. My trace "treasures" are now neatly organized in their "coffers" and no longer threaten to overflow.

	In a future article, I'll share with you a fully automated system that my colleagues and I built around the stored procedures presented in this article. It executes daily traces on a production server and automatically generates and publishes expensive transaction reports, using trace scrubbing and aggregation.

Download 410ANDREW.ZIP

To find out more about Microsoft SQL Server Professional and Pinnacle Publishing, visit their Web site at http://www.pinpub.com/

Note: This is not a Microsoft Corporation Web site. Microsoft is not responsible for its content.

This article is reproduced from the October 2004 issue of Microsoft SQL Server Professional. Copyright 2004, by Pinnacle Publishing, Inc., unless otherwise noted. All rights are reserved. Microsoft SQL Server Professional is an independently produced publication of Pinnacle Publishing, Inc. No part of this article may be used or reproduced in any fashion (except in brief quotations used in critical articles and reviews) without prior consent of Pinnacle Publishing, Inc. To contact Pinnacle Publishing, Inc., please call 1-800-788-1900.