Added per-operator level performance stats for Query Processing

Cross post with https://aka.ms/sqlserverteam

When you are troubleshooting query performance, metrics are never too much. With that in mind, based on customer feedback, but also our own CSS support, we have added runtime information to Showplan, and exposed a new Extended Event in SQL Server 2016 RC0. We are adding the new Showplan info and Extended Event in an upcoming SQL Server 2014 Service Pack, and as usual deliver added value to in-market versions.

Regarding Showplan, take a test query on which we are doing a Clustered index Scan, and look the information available until SQL Server 2014:

 <RunTimeInformation>
   <RunTimeCountersPerThread Thread="0" ActualRows="8001" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>

We can see the scan returned 8001 rows, but that doesn’t give you much insight regarding performance of the scan.

But below we can observe what kind of detail we can get from runtime stats in SQL Server 2016 RC0. These include the actual rows read, I/O Reads, CPU time, and elapsed time, all exposed per-thread:

 <RunTimeInformation>
   <RunTimeCountersPerThread Thread="0" ActualRows="8001" ActualRowsRead="10000000" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="965" ActualCPUms="965" ActualScans="1" ActualLogicalReads="26073" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
</RunTimeInformation>

And looking at the same Index Scan running in parallel:

 <RunTimeInformation>
   <RunTimeCountersPerThread Thread="6" ActualRows="0" ActualRowsRead="932085" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="160" ActualCPUms="158" ActualScans="1" ActualLogicalReads="2466" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="8" ActualRows="0" ActualRowsRead="886279" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="138" ActualCPUms="137" ActualScans="1" ActualLogicalReads="2341" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="10" ActualRows="0" ActualRowsRead="828520" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="121" ActualCPUms="120" ActualScans="1" ActualLogicalReads="2192" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="12" ActualRows="0" ActualRowsRead="932085" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="155" ActualCPUms="155" ActualScans="1" ActualLogicalReads="2466" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="11" ActualRows="0" ActualRowsRead="828520" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="133" ActualCPUms="132" ActualScans="1" ActualLogicalReads="2192" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="9" ActualRows="0" ActualRowsRead="724955" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="124" ActualCPUms="124" ActualScans="1" ActualLogicalReads="1918" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="1" ActualRows="0" ActualRowsRead="724955" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="121" ActualCPUms="120" ActualScans="1" ActualLogicalReads="1918" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="7" ActualRows="0" ActualRowsRead="414260" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="111" ActualCPUms="109" ActualScans="1" ActualLogicalReads="1096" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="5" ActualRows="0" ActualRowsRead="1035650" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="168" ActualCPUms="165" ActualScans="1" ActualLogicalReads="2740" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="4" ActualRows="8001" ActualRowsRead="932086" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="160" ActualCPUms="158" ActualScans="1" ActualLogicalReads="2466" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="2" ActualRows="0" ActualRowsRead="828520" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="156" ActualCPUms="130" ActualScans="1" ActualLogicalReads="2192" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="3" ActualRows="0" ActualRowsRead="932085" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="159" ActualCPUms="158" ActualScans="1" ActualLogicalReads="2466" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="0" ActualExecutions="0" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" ActualScans="1" ActualLogicalReads="2" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
</RunTimeInformation>

NOTE: ActualCPUms and ActualElapsedms will show cumulative values for the node and its children when executing in Row Mode, and will show just the single node values if executing in Batch Mode.

We have also introduced a new Extended Event (query_thread_profile), allowing more insight on the performance of each node and thread:

xEvent_Query

Note:  Showplan time scale is shown in milliseconds, while the xEvent shows microseconds for CPU and total time.

Pedro Lopes (@sqlpto) – Senior Program Manager