Top SQL Server 2005 Performance Issues for OLTP Applications
OLTP work loads are characterized by high volumes of similar small transactions.
It is important to keep these characteristics in mind as we examine the significance of database design, resource utilization and system performance. The top performance bottlenecks or gotchas for OLTP applications are outlined below.
Database Design issue if….
Too many table joins for frequent queries. Overuse of joins in an OLTP application results in longer running queries & wasted system resources. Generally, frequent operations requiring 5 or more table joins should be avoided by redesigning the database.
Too many indexes on frequently updated (inclusive of inserts, updates and deletes) tables incur extra index maintenance overhead. Generally, OLTP database designs should keep the number of indexes to a functional minimum, again due to the high volumes of similar transactions combined with the cost of index maintenance.
Big IOs such as table and range scans due to missing indexes. By definition, OLTP transactions should not require big IOs and should be examined.
Unused indexes incur the cost of index maintenance for inserts, updates, and deletes without benefiting any users. Unused indexes should be eliminated. Any index that has been used (by select, update or delete operations) will appear in sys.dm_db_index_usage_stats. Thus, any defined index not included in this DMV has not been used since the last re-start of SQL Server.
CPU bottleneck if…
Signal waits > 25% of total waits. See sys.dm_os_wait_stats for Signal waits and Total waits. Signal waits measure the time spent in the runnable queue waiting for CPU. High signal waits indicate a CPU bottleneck.
Plan re-use < 90% . A query plan is used to execute a query. Plan re-use is desirable for OLTP workloads because re-creating the same plan (for similar or identical transactions) is a waste of CPU resources. Compare SQL Server SQL Statistics: batch requests/sec to SQL compilations/sec. Compute plan re-use as follows: Plan re-use = (Batch requests - SQL compilations) / Batch requests. Special exception to the plan re-use rule: Zero cost plans will not be cached (not re-used) in SQL 2005 SP2. Applications that use zero cost plans will have a lower plan re-use but this is not a performance issue.
Parallel wait type cxpacket > 10% of total waits. Parallelism sacrifices CPU resources for speed of execution. Given the high volumes of OLTP, parallel queries usually reduce OLTP throughput and should be avoided. See sys.dm_os_wait_stats for wait statistics.
Memory bottleneck if…
Consistently low average page life expectancy. See Average Page Life Expectancy Counter which is in the Perfmon object SQL Server Buffer Manager (this represents is the average number of seconds a page stays in cache). For OLTP, an average page life expectancy of 300 is 5 minutes. Anything less could indicate memory pressure, missing indexes, or a cache flush.
Sudden big drop in page life expectancy. OLTP applications (e.g. small transactions) should have a steady (or slowly increasing) page life expectancy. See Perfmon object SQL Server Buffer Manager.
Pending memory grants. See counter Memory Grants Pending, in the Perfmon object SQL Server Memory Manager. Small OLTP transactions should not require a large memory grant.
Sudden drops or consistenty low SQL Cache hit ratio. OLTP applications (e.g. small transactions) should have a high cache hit ratio. Since OLTP transactions are small, there should not be (1) big drops in SQL Cache hit rates or (2) consistently low cache hit rates < 90%. Drops or low cache hit may indicate memory pressure or missing indexes.
IO bottleneck if…
High average disk seconds per read. When the IO subsystem is queued, disk seconds per read increases. See Perfmon Logical or Physical disk (disk seconds/read counter). Normally it takes 4-8ms to complete a read when there is no IO pressure. When the IO subsystem is under pressure due to high IO requests, the average time to complete a read increases, showing the effect of disk queues. Periodic higher values for disk seconds/read may be acceptable for many applications. For high performance OLTP applications, sophisticated SAN subsystems provide greater IO scalability and resiliency in handling spikes of IO activity. Sustained high values for disk seconds/read (>15ms) does indicate a disk bottleneck.
High average disk seconds per write. See Perfmon Logical or Physical disk. The throughput for high volume OLTP applications is dependent on fast sequential transaction log writes. A transaction log write can be as fast as 1ms (or less) for high performance SAN environments. For many applications, a periodic spike in average disk seconds per write is acceptable considering the high cost of sophisticated SAN subsystems. However, sustained high values for average disk seconds/write is a reliable indicator of a disk bottleneck.
Big IOs such as table and range scans due to missing indexes.
Top wait statistics in sys.dm_os_wait_stats are related to IO such as ASYNCH_IO_COMPLETION, IO_COMPLETION, LOGMGR, WRITELOG, or PAGEIOLATCH_x.
Blocking bottleneck if…
Index contention. Look for lock and latch waits in sys.dm_db_index_operational_stats. Compare with lock and latch requests.
High average row lock or latch waits. The average row lock or latch waits are computed by dividing lock and latch wait milliseconds (ms) by lock and latch waits. The average lock wait ms computed from sys.dm_db_index_operational_stats represents the average time for each block.
Block process report shows long blocks. See sp_configure “blocked process threshold” and Profiler “Blocked process Report” under the Errors and Warnings event.
Top wait statistics are LCK_x. See sys.dm_os_wait_stats.
High number of deadlocks. See Profiler “Graphical Deadlock” under Locks event to identify the statements involved in the deadlock.
Network bottleneck if…
High network latency coupled with an application that incurs many round trips to the database.
Network bandwidth is used up. See counters packets/sec and current bandwidth counters in the network interface object of Performance Monitor. For TCP/IP frames actual bandwidth is computed as packets/sec * 1500 * 8 /1000000 Mbps.