Upravit

Sdílet prostřednictvím


Troubleshoot slow-running queries in SQL Server

Original product version:   SQL Server
Original KB number:   243589

Introduction

This article describes how to handle a performance issue that database applications may experience when using SQL Server: slow performance of a specific query or group of queries. The following methodology will help you narrow down the cause of the slow queries issue and direct you towards resolution.

Find slow queries

To establish that you have query performance issues on your SQL Server instance, start by examining queries by their execution time (elapsed time). Check if the time exceeds a threshold you have set (in milliseconds) based on an established performance baseline. For example, in a stress testing environment, you may have established a threshold for your workload to be no longer than 300 ms, and you can use this threshold. Then, you can identify all queries that exceed that threshold, focusing on each individual query and its pre-established performance baseline duration. Ultimately, business users care about the overall duration of database queries; therefore, the main focus is on execution duration. Other metrics like CPU time and logical reads are gathered to help with narrowing down the investigation.

  • For currently executing statements, check total_elapsed_time and cpu_time columns in sys.dm_exec_requests. Run the following query to get the data:

    SELECT 
        req.session_id
        , req.total_elapsed_time AS duration_ms
        , req.cpu_time AS cpu_time_ms
        , req.total_elapsed_time - req.cpu_time AS wait_time
        , req.logical_reads
        , SUBSTRING (REPLACE (REPLACE (SUBSTRING (ST.text, (req.statement_start_offset/2) + 1, 
           ((CASE statement_end_offset
               WHEN -1
               THEN DATALENGTH(ST.text)  
               ELSE req.statement_end_offset
             END - req.statement_start_offset)/2) + 1) , CHAR(10), ' '), CHAR(13), ' '), 
          1, 512)  AS statement_text  
    FROM sys.dm_exec_requests AS req
        CROSS APPLY sys.dm_exec_sql_text(req.sql_handle) AS ST
    ORDER BY total_elapsed_time DESC;
    
  • For past executions of the query, check last_elapsed_time and last_worker_time columns in sys.dm_exec_query_stats. Run the following query to get the data:

    SELECT t.text,
         (qs.total_elapsed_time/1000) / qs.execution_count AS avg_elapsed_time,
         (qs.total_worker_time/1000) / qs.execution_count AS avg_cpu_time,
         ((qs.total_elapsed_time/1000) / qs.execution_count ) - ((qs.total_worker_time/1000) / qs.execution_count) AS avg_wait_time,
         qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
         qs.total_logical_writes / qs.execution_count AS avg_writes,
         (qs.total_elapsed_time/1000) AS cumulative_elapsed_time_all_executions
    FROM sys.dm_exec_query_stats qs
         CROSS apply sys.Dm_exec_sql_text (sql_handle) t
    WHERE t.text like '<Your Query>%'
    -- Replace <Your Query> with your query or the beginning part of your query. The special chars like '[','_','%','^' in the query should be escaped.
    ORDER BY (qs.total_elapsed_time / qs.execution_count) DESC
    

    Note

    If avg_wait_time shows a negative value, it's a parallel query.

  • If you can execute the query on demand in SQL Server Management Studio (SSMS) or Azure Data Studio, run it with SET STATISTICS TIME ON and SET STATISTICS IO ON.

    SET STATISTICS TIME ON
    SET STATISTICS IO ON
    <YourQuery>
    SET STATISTICS IO OFF
    SET STATISTICS TIME OFF
    

    Then from Messages, you'll see the CPU time, elapsed time, and logical reads like this:

      Table 'tblTest'. Scan count 1, logical reads 3, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    
      SQL Server Execution Times:
        CPU time = 460 ms,  elapsed time = 470 ms.
    
  • If you can collect a query plan, check the data from the Execution plan properties.

    1. Run the query with Include Actual Execution Plan on.

    2. Select the left-most operator from Execution plan.

    3. From Properties, expand QueryTimeStats property.

    4. Check ElapsedTime and CpuTime.

      Screenshot of the SQL Server execution plan properties window with the property QueryTimeStats expanded.

Running vs. Waiting: why are queries slow?

If you find queries that exceed your predefined threshold, examine why they could be slow. The cause of performance problems can be grouped into two categories, running or waiting:

  • WAITING: Queries can be slow because they're waiting on a bottleneck for a long time. See a detailed list of bottlenecks in types of Waits.

  • RUNNING: Queries can be slow because they're running (executing) for a long time. In other words, these queries are actively using CPU resources.

A query can be running for some time and waiting for some time in its lifetime (duration). However, your focus is to determine which is the dominant category that contributes to its long elapsed time. Therefore, the first task is to establish in which category the queries fall. It's simple: if a query isn't running, it's waiting. Ideally, a query spends most of its elapsed time in a running state and very little time waiting for resources. Also, in the best-case scenario, a query runs within or below a predetermined baseline. Compare the elapsed time and CPU time of the query to determine the issue type.

Type 1: CPU-bound (runner)

If the CPU time is close, equal to, or higher than the elapsed time, you can treat it as a CPU-bound query. For example, if the elapsed time is 3000 milliseconds (ms) and the CPU time is 2900 ms, that means most of the elapsed time is spent on the CPU. Then we can say it's a CPU-bound query.

Examples of running (CPU-bound) queries:

Elapsed Time (ms) CPU Time (ms) Reads (logical)
3200 3000 300000
1080 1000 20

Logical reads - reading data/index pages in cache - are most frequently the drivers of CPU utilization in SQL Server. There could be scenarios where CPU use comes from other sources: a while loop (in T-SQL or other code like XProcs or SQL CRL objects). The second example in the table illustrates such a scenario, where the majority of the CPU is not from reads.

Note

If the CPU time is greater than the duration, this indicates a parallel query is executed; multiple threads are using the CPU at the same time. For more information, see Parallel queries - runner or waiter.

Type 2: Waiting on a bottleneck (waiter)

A query is waiting on a bottleneck if the elapsed time is significantly greater than the CPU time. The elapsed time includes the time executing the query on the CPU (CPU time) and the time waiting for a resource to be released (wait time). For example, if the elapsed time is 2000 ms and the CPU time is 300 ms, the wait time is 1700 ms (2000 - 300 = 1700). For more information, see Types of Waits.

Examples of waiting queries:

Elapsed Time (ms) CPU Time (ms) Reads (logical)
2000 300 28000
10080 700 80000

Parallel queries - runner or waiter

Parallel queries may use more CPU time than the overall duration. The goal of parallelism is to allow multiple threads to run parts of a query simultaneously. In one second of clock time, a query may use eight seconds of CPU time by executing eight parallel threads. Therefore, it becomes challenging to determine a CPU-bound or a waiting query based on the elapsed time and CPU time difference. However, as a general rule, follow the principles listed in the above two sections. The summary is:

  • If the elapsed time is much greater than the CPU time, consider it a waiter.
  • If the CPU time is much greater than the elapsed time, consider it a runner.

Examples of Parallel queries:

Elapsed Time (ms) CPU Time (ms) Reads (logical)
1200 8100 850000
3080 12300 1500000

High-level visual representation of the methodology

The screenshot shows a high-level visual representation of the methodology for troubleshooting slow queries.

Diagnose and resolve waiting queries

If you established that your queries of interest are waiters, your next step is to focus on resolving bottleneck issues. Otherwise, go to step 4: Diagnose and resolve running queries.

To optimize a query that's waiting on bottlenecks, identify how long the wait is and where the bottleneck is (the wait type). Once the wait type is confirmed, reduce the wait time or eliminate the wait completely.

To calculate the approximate wait time, subtract the CPU time (worker time) from the elapsed time of a query. Typically, the CPU time is the actual execution time, and the remaining part of the lifetime of the query is waiting.

Examples of how to calculate approximate wait duration:

Elapsed Time (ms) CPU Time (ms) Wait time (ms)
3200 3000 200
7080 1000 6080

Identify the bottleneck or wait

  • To identify historical long-waiting queries (for example, >20% of the overall elapsed time is wait time), run the following query. This query uses performance statistics for cached query plans since the start of SQL Server.

    SELECT t.text,
             qs.total_elapsed_time / qs.execution_count
             AS avg_elapsed_time,
             qs.total_worker_time / qs.execution_count
             AS avg_cpu_time,
             (qs.total_elapsed_time - qs.total_worker_time) / qs.execution_count
             AS avg_wait_time,
             qs.total_logical_reads / qs.execution_count
             AS avg_logical_reads,
             qs.total_logical_writes / qs.execution_count
             AS avg_writes,
             qs.total_elapsed_time
             AS cumulative_elapsed_time
    FROM sys.dm_exec_query_stats qs
             CROSS apply sys.Dm_exec_sql_text (sql_handle) t
    WHERE (qs.total_elapsed_time - qs.total_worker_time) / qs.total_elapsed_time
             > 0.2
    ORDER BY qs.total_elapsed_time / qs.execution_count DESC
    
  • To identify currently executing queries with waits longer than 500 ms, run the following query:

    SELECT r.session_id, r.wait_type, r.wait_time AS wait_time_ms
    FROM sys.dm_exec_requests r 
       JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id 
    WHERE wait_time > 500
    AND is_user_process = 1
    
  • If you can collect a query plan, check the WaitStats from the execution plan properties in SSMS:

    1. Run the query with Include Actual Execution Plan on.
    2. Right-click the left-most operator in the Execution plan tab
    3. Select Properties and then WaitStats property.
    4. Check the WaitTimeMs and WaitType.
  • If you're familiar with PSSDiag/SQLdiag or SQL LogScout LightPerf/GeneralPerf scenarios, consider using either of them to collect performance statistics and identify waiting queries on your SQL Server instance. You can import the collected data files and analyze the performance data with SQL Nexus.

References to help eliminate or reduce waits

The causes and resolutions for each wait type vary. There's no one general method to resolve all wait types. Here are articles to troubleshoot and resolve common wait type issues:

For descriptions of many Wait types and what they indicate, see the table in Types of Waits.

Diagnose and resolve running queries

If CPU (worker) time is very close to the overall elapsed duration, the query spends most of its lifetime executing. Typically, when the SQL Server engine drives high CPU usage, the high CPU usage is coming from queries that drive a large number of logical reads (the most common reason).

To identify the queries that are responsible for high-CPU activity currently, run the following statement:

SELECT TOP 10 s.session_id,
           r.status,
           r.cpu_time,
           r.logical_reads,
           r.reads,
           r.writes,
           r.total_elapsed_time / (1000 * 60) 'Elaps M',
           SUBSTRING(st.TEXT, (r.statement_start_offset / 2) + 1,
           ((CASE r.statement_end_offset
                WHEN -1 THEN DATALENGTH(st.TEXT)
                ELSE r.statement_end_offset
            END - r.statement_start_offset) / 2) + 1) AS statement_text,
           COALESCE(QUOTENAME(DB_NAME(st.dbid)) + N'.' + QUOTENAME(OBJECT_SCHEMA_NAME(st.objectid, st.dbid)) 
           + N'.' + QUOTENAME(OBJECT_NAME(st.objectid, st.dbid)), '') AS command_text,
           r.command,
           s.login_name,
           s.host_name,
           s.program_name,
           s.last_request_end_time,
           s.login_time,
           r.open_transaction_count
FROM sys.dm_exec_sessions AS s
JOIN sys.dm_exec_requests AS r ON r.session_id = s.session_id CROSS APPLY sys.Dm_exec_sql_text(r.sql_handle) AS st
WHERE r.session_id != @@SPID
ORDER BY r.cpu_time DESC

If queries aren't driving the CPU at this moment, you can run the following statement to look for historical CPU-bound queries:

SELECT TOP 10  qs.last_execution_time, st.text AS batch_text,
    SUBSTRING(st.TEXT, (qs.statement_start_offset / 2) + 1, ((CASE qs.statement_end_offset WHEN - 1 THEN DATALENGTH(st.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset) / 2) + 1) AS statement_text,
    (qs.total_worker_time / 1000) / qs.execution_count AS avg_cpu_time_ms,
    (qs.total_elapsed_time / 1000) / qs.execution_count AS avg_elapsed_time_ms,
    qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
    (qs.total_worker_time / 1000) AS cumulative_cpu_time_all_executions_ms,
    (qs.total_elapsed_time / 1000) AS cumulative_elapsed_time_all_executions_ms
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(sql_handle) st
ORDER BY(qs.total_worker_time / qs.execution_count) DESC

Common methods to resolve long-running, CPU-bound queries