Troubleshoot slow-running queries in SQL Server
Original product version: SQL Server
Original KB number: 243589
Introduction
This article describes how to handle a performance issue that database applications may experience when using SQL Server: slow performance of a specific query or group of queries. The following methodology will help you narrow down the cause of the slow queries issue and direct you towards resolution.
Find slow queries
To establish that you have query performance issues on your SQL Server instance, start by examining queries by their execution time (elapsed time). Check if the time exceeds a threshold you have set (in milliseconds) based on an established performance baseline. For example, in a stress testing environment, you may have established a threshold for your workload to be no longer than 300 ms, and you can use this threshold. Then, you can identify all queries that exceed that threshold, focusing on each individual query and its pre-established performance baseline duration. Ultimately, business users care about the overall duration of database queries; therefore, the main focus is on execution duration. Other metrics like CPU time and logical reads are gathered to help with narrowing down the investigation.
For currently executing statements, check total_elapsed_time and cpu_time columns in sys.dm_exec_requests. Run the following query to get the data:
SELECT req.session_id , req.total_elapsed_time AS duration_ms , req.cpu_time AS cpu_time_ms , req.total_elapsed_time - req.cpu_time AS wait_time , req.logical_reads , SUBSTRING (REPLACE (REPLACE (SUBSTRING (ST.text, (req.statement_start_offset/2) + 1, ((CASE statement_end_offset WHEN -1 THEN DATALENGTH(ST.text) ELSE req.statement_end_offset END - req.statement_start_offset)/2) + 1) , CHAR(10), ' '), CHAR(13), ' '), 1, 512) AS statement_text FROM sys.dm_exec_requests AS req CROSS APPLY sys.dm_exec_sql_text(req.sql_handle) AS ST ORDER BY total_elapsed_time DESC;
For past executions of the query, check last_elapsed_time and last_worker_time columns in sys.dm_exec_query_stats. Run the following query to get the data:
SELECT t.text, (qs.total_elapsed_time/1000) / qs.execution_count AS avg_elapsed_time, (qs.total_worker_time/1000) / qs.execution_count AS avg_cpu_time, ((qs.total_elapsed_time/1000) / qs.execution_count ) - ((qs.total_worker_time/1000) / qs.execution_count) AS avg_wait_time, qs.total_logical_reads / qs.execution_count AS avg_logical_reads, qs.total_logical_writes / qs.execution_count AS avg_writes, (qs.total_elapsed_time/1000) AS cumulative_elapsed_time_all_executions FROM sys.dm_exec_query_stats qs CROSS apply sys.Dm_exec_sql_text (sql_handle) t WHERE t.text like '<Your Query>%' -- Replace <Your Query> with your query or the beginning part of your query. The special chars like '[','_','%','^' in the query should be escaped. ORDER BY (qs.total_elapsed_time / qs.execution_count) DESC
Note
If
avg_wait_time
shows a negative value, it's a parallel query.If you can execute the query on demand in SQL Server Management Studio (SSMS) or Azure Data Studio, run it with SET STATISTICS TIME
ON
and SET STATISTICS IOON
.SET STATISTICS TIME ON SET STATISTICS IO ON <YourQuery> SET STATISTICS IO OFF SET STATISTICS TIME OFF
Then from Messages, you'll see the CPU time, elapsed time, and logical reads like this:
Table 'tblTest'. Scan count 1, logical reads 3, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0. SQL Server Execution Times: CPU time = 460 ms, elapsed time = 470 ms.
If you can collect a query plan, check the data from the Execution plan properties.
Run the query with Include Actual Execution Plan on.
Select the left-most operator from Execution plan.
From Properties, expand QueryTimeStats property.
Check ElapsedTime and CpuTime.
Running vs. Waiting: why are queries slow?
If you find queries that exceed your predefined threshold, examine why they could be slow. The cause of performance problems can be grouped into two categories, running or waiting:
WAITING: Queries can be slow because they're waiting on a bottleneck for a long time. See a detailed list of bottlenecks in types of Waits.
RUNNING: Queries can be slow because they're running (executing) for a long time. In other words, these queries are actively using CPU resources.
A query can be running for some time and waiting for some time in its lifetime (duration). However, your focus is to determine which is the dominant category that contributes to its long elapsed time. Therefore, the first task is to establish in which category the queries fall. It's simple: if a query isn't running, it's waiting. Ideally, a query spends most of its elapsed time in a running state and very little time waiting for resources. Also, in the best-case scenario, a query runs within or below a predetermined baseline. Compare the elapsed time and CPU time of the query to determine the issue type.
Type 1: CPU-bound (runner)
If the CPU time is close, equal to, or higher than the elapsed time, you can treat it as a CPU-bound query. For example, if the elapsed time is 3000 milliseconds (ms) and the CPU time is 2900 ms, that means most of the elapsed time is spent on the CPU. Then we can say it's a CPU-bound query.
Examples of running (CPU-bound) queries:
Elapsed Time (ms) | CPU Time (ms) | Reads (logical) |
---|---|---|
3200 | 3000 | 300000 |
1080 | 1000 | 20 |
Logical reads - reading data/index pages in cache - are most frequently the drivers of CPU utilization in SQL Server. There could be scenarios where CPU use comes from other sources: a while loop (in T-SQL or other code like XProcs or SQL CRL objects). The second example in the table illustrates such a scenario, where the majority of the CPU is not from reads.
Note
If the CPU time is greater than the duration, this indicates a parallel query is executed; multiple threads are using the CPU at the same time. For more information, see Parallel queries - runner or waiter.
Type 2: Waiting on a bottleneck (waiter)
A query is waiting on a bottleneck if the elapsed time is significantly greater than the CPU time. The elapsed time includes the time executing the query on the CPU (CPU time) and the time waiting for a resource to be released (wait time). For example, if the elapsed time is 2000 ms and the CPU time is 300 ms, the wait time is 1700 ms (2000 - 300 = 1700). For more information, see Types of Waits.
Examples of waiting queries:
Elapsed Time (ms) | CPU Time (ms) | Reads (logical) |
---|---|---|
2000 | 300 | 28000 |
10080 | 700 | 80000 |
Parallel queries - runner or waiter
Parallel queries may use more CPU time than the overall duration. The goal of parallelism is to allow multiple threads to run parts of a query simultaneously. In one second of clock time, a query may use eight seconds of CPU time by executing eight parallel threads. Therefore, it becomes challenging to determine a CPU-bound or a waiting query based on the elapsed time and CPU time difference. However, as a general rule, follow the principles listed in the above two sections. The summary is:
- If the elapsed time is much greater than the CPU time, consider it a waiter.
- If the CPU time is much greater than the elapsed time, consider it a runner.
Examples of Parallel queries:
Elapsed Time (ms) | CPU Time (ms) | Reads (logical) |
---|---|---|
1200 | 8100 | 850000 |
3080 | 12300 | 1500000 |
High-level visual representation of the methodology
Diagnose and resolve waiting queries
If you established that your queries of interest are waiters, your next step is to focus on resolving bottleneck issues. Otherwise, go to step 4: Diagnose and resolve running queries.
To optimize a query that's waiting on bottlenecks, identify how long the wait is and where the bottleneck is (the wait type). Once the wait type is confirmed, reduce the wait time or eliminate the wait completely.
To calculate the approximate wait time, subtract the CPU time (worker time) from the elapsed time of a query. Typically, the CPU time is the actual execution time, and the remaining part of the lifetime of the query is waiting.
Examples of how to calculate approximate wait duration:
Elapsed Time (ms) | CPU Time (ms) | Wait time (ms) |
---|---|---|
3200 | 3000 | 200 |
7080 | 1000 | 6080 |
Identify the bottleneck or wait
To identify historical long-waiting queries (for example, >20% of the overall elapsed time is wait time), run the following query. This query uses performance statistics for cached query plans since the start of SQL Server.
SELECT t.text, qs.total_elapsed_time / qs.execution_count AS avg_elapsed_time, qs.total_worker_time / qs.execution_count AS avg_cpu_time, (qs.total_elapsed_time - qs.total_worker_time) / qs.execution_count AS avg_wait_time, qs.total_logical_reads / qs.execution_count AS avg_logical_reads, qs.total_logical_writes / qs.execution_count AS avg_writes, qs.total_elapsed_time AS cumulative_elapsed_time FROM sys.dm_exec_query_stats qs CROSS apply sys.Dm_exec_sql_text (sql_handle) t WHERE (qs.total_elapsed_time - qs.total_worker_time) / qs.total_elapsed_time > 0.2 ORDER BY qs.total_elapsed_time / qs.execution_count DESC
To identify currently executing queries with waits longer than 500 ms, run the following query:
SELECT r.session_id, r.wait_type, r.wait_time AS wait_time_ms FROM sys.dm_exec_requests r JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id WHERE wait_time > 500 AND is_user_process = 1
If you can collect a query plan, check the WaitStats from the execution plan properties in SSMS:
- Run the query with Include Actual Execution Plan on.
- Right-click the left-most operator in the Execution plan tab
- Select Properties and then WaitStats property.
- Check the WaitTimeMs and WaitType.
If you're familiar with PSSDiag/SQLdiag or SQL LogScout LightPerf/GeneralPerf scenarios, consider using either of them to collect performance statistics and identify waiting queries on your SQL Server instance. You can import the collected data files and analyze the performance data with SQL Nexus.
References to help eliminate or reduce waits
The causes and resolutions for each wait type vary. There's no one general method to resolve all wait types. Here are articles to troubleshoot and resolve common wait type issues:
- Understand and resolve blocking issues (LCK_M_*)
- Understand and resolve Azure SQL Database blocking problems
- Troubleshoot slow SQL Server performance caused by I/O issues (PAGEIOLATCH_*, WRITELOG, IO_COMPLETION, BACKUPIO)
- Resolve last-page insert PAGELATCH_EX contention in SQL Server
- Memory grants explanations and solutions (RESOURCE_SEMAPHORE)
- Troubleshoot slow queries that result from ASYNC_NETWORK_IO wait type
- Troubleshooting High HADR_SYNC_COMMIT wait type with Always On Availability Groups
- How It Works: CMEMTHREAD and Debugging Them
- Making parallelism waits actionable (CXPACKET and CXCONSUMER)
- THREADPOOL wait
For descriptions of many Wait types and what they indicate, see the table in Types of Waits.
Diagnose and resolve running queries
If CPU (worker) time is very close to the overall elapsed duration, the query spends most of its lifetime executing. Typically, when the SQL Server engine drives high CPU usage, the high CPU usage is coming from queries that drive a large number of logical reads (the most common reason).
To identify the queries that are responsible for high-CPU activity currently, run the following statement:
SELECT TOP 10 s.session_id,
r.status,
r.cpu_time,
r.logical_reads,
r.reads,
r.writes,
r.total_elapsed_time / (1000 * 60) 'Elaps M',
SUBSTRING(st.TEXT, (r.statement_start_offset / 2) + 1,
((CASE r.statement_end_offset
WHEN -1 THEN DATALENGTH(st.TEXT)
ELSE r.statement_end_offset
END - r.statement_start_offset) / 2) + 1) AS statement_text,
COALESCE(QUOTENAME(DB_NAME(st.dbid)) + N'.' + QUOTENAME(OBJECT_SCHEMA_NAME(st.objectid, st.dbid))
+ N'.' + QUOTENAME(OBJECT_NAME(st.objectid, st.dbid)), '') AS command_text,
r.command,
s.login_name,
s.host_name,
s.program_name,
s.last_request_end_time,
s.login_time,
r.open_transaction_count
FROM sys.dm_exec_sessions AS s
JOIN sys.dm_exec_requests AS r ON r.session_id = s.session_id CROSS APPLY sys.Dm_exec_sql_text(r.sql_handle) AS st
WHERE r.session_id != @@SPID
ORDER BY r.cpu_time DESC
If queries aren't driving the CPU at this moment, you can run the following statement to look for historical CPU-bound queries:
SELECT TOP 10 qs.last_execution_time, st.text AS batch_text,
SUBSTRING(st.TEXT, (qs.statement_start_offset / 2) + 1, ((CASE qs.statement_end_offset WHEN - 1 THEN DATALENGTH(st.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset) / 2) + 1) AS statement_text,
(qs.total_worker_time / 1000) / qs.execution_count AS avg_cpu_time_ms,
(qs.total_elapsed_time / 1000) / qs.execution_count AS avg_elapsed_time_ms,
qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
(qs.total_worker_time / 1000) AS cumulative_cpu_time_all_executions_ms,
(qs.total_elapsed_time / 1000) AS cumulative_elapsed_time_all_executions_ms
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(sql_handle) st
ORDER BY(qs.total_worker_time / qs.execution_count) DESC
Common methods to resolve long-running, CPU-bound queries
- Examine the query plan of the query
- Update Statistics
- Identify and apply Missing Indexes. For more steps on how to identify missing indexes, see Tune nonclustered indexes with missing index suggestions
- Redesign or rewrite the queries
- Identify and resolve parameter-sensitive plans
- Identify and resolve SARG-ability issues
- Identify and resolve Row goal issues where long-running nested loops can be caused by TOP, EXISTS, IN, FAST, SET ROWCOUNT, OPTION (FAST N). For more information, see Row Goals Gone Rogue and Showplan enhancements - Row Goal EstimateRowsWithoutRowGoal
- Assess and resolve cardinality estimation issues. For more information, see Decreased query performance after upgrade from SQL Server 2012 or earlier to 2014 or later
- Identify and resolve quries that seem no never complete, see Troubleshoot queries that seem to never end in SQL Server
- Identify and resolve slow queries affected by optimizer timeout
- Identify high CPU performance issues. For more information, see Troubleshoot high-CPU-usage issues in SQL Server
- Troubleshoot a query that shows a significant performance difference between two servers
- Increase computing resources on the system (CPUs)
- Troubleshoot UPDATE performance issues with narrow and wide plans
Recommended resources
- Detectable types of query performance bottlenecks in SQL Server and Azure SQL Managed Instance
- Performance Monitoring and Tuning Tools
- Auto-Tuning Options in SQL Server
- Index architecture and Design Guidelines
- Troubleshoot query time-out errors
- Troubleshoot high-CPU-usage issues in SQL Server
- Decreased query performance after upgrade from SQL Server 2012 or earlier to 2014 or later