Improving Replication Performance By Removing Unused Indexes
Replication performance can be improved by addressing different aspects of SQL server, such as disk, memory, CPU, DB design and implementation. A great whitepaper covers all these can found at https://technet.microsoft.com/en-us/library/cc966539.aspx. However, on a system with a less optimal index strategy or on a production environment with legacy indexes, some "unused" indexes can tax on the performance of SQL server as well as increasing the latency of replication.
When you have indexes that are never used for any SELECT queries, the obvious issue is the wasted disk space to store them. The less obvious is the penalty you pay to maintain the indexes if their underlying data is subject to inserts, updates or deletes. Whenever the data changes, SQL may need to rearrange and maintain the index trees.
Consider in a replication environment where the updates of tables from the publisher get trickled down to the subscriber. In the profiler trace against the subscriber, you only see replication calls to insert, update or delete the tables on the subscriber but implicitly all the related indexes of the modified tables are also changing. By identifying what indexes are not used and removing them, both the publisher and subscriber can benefit from less indexes to maintain.
To find out the index usage, use DMV sys.dm_db_index_usage_stats along with system views sys.indexes , sys.objects and sys.schemas like the script below:
select sysschema.name 'schema_name'
,sysobj.name 'table_name'
,ISNULL(sysidx.name,'HEAP') 'index_name'
,sysidx.index_id
,'is_unused_index' = CASE ISNULL(idxstats.user_seeks,0) + ISNULL(idxstats.user_scans,0) + ISNULL(idxstats.user_lookups,0)
WHEN 0 THEN 1 ELSE 0 END
,sysidx.is_unique
,sysidx.is_unique_constraint
,ISNULL(idxstats.user_updates,0) 'user_updates'
,ISNULL(idxstats.user_seeks,0) 'user_seeks'
,ISNULL(idxstats.user_scans,0) 'user_scans'
,ISNULL(idxstats.user_lookups,0) 'user_lookups'
, DropScript = CASE ISNULL(idxstats.user_seeks,0) + ISNULL(idxstats.user_scans,0) + ISNULL(idxstats.user_lookups,0)
WHEN 0 THEN '' ELSE null END
+ 'IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N''[' + sysschema.name + '].['
+ sysobj.name + ']'') AND name = N''' + sysidx.name + ''') '
+ case sysidx.is_unique_constraint -- 1: drop constraint 0: drop index
when 1 then ' ALTER TABLE [' + sysschema.name + '].[' + sysobj.name + '] DROP CONSTRAINT [' + sysidx.name + ']'
else ' DROP INDEX [' + sysidx.name + '] ON [' + sysschema.name + '].[' + sysobj.name + ']'
end
from sys.indexes sysidx
join sys.objects sysobj on sysidx.object_id = sysobj.object_id
join sys.schemas sysschema on sysobj.schema_id = sysschema.schema_id
left join sys.dm_db_index_usage_stats idxstats
on sysidx.object_id=idxstats.object_id
and sysidx.index_id=idxstats.index_id
and idxstats.database_id=db_id()
where sysobj.type_desc = 'USER_TABLE'
and sysidx.index_id > 1
and sysobj.is_ms_shipped = 0 -- exclude object created by an internal SQL Server component such as sysarticleupdates, syspublications, syssubscriptions
order by 1,2
Executing the script, you get a list of unused indexes and also the drop index statements that you can copy and paste as a deployment script.
As I mentioned before, indexes can be used find data (read-only) but they are also subject to updates (write-only) if the underlying data is modified. By permutating read and write activities on the indexes, we can conclude four types of index usage patterns as the table below:
Row # from output above |
Read-only access pattern (scans, seeks and lookups) |
Write-only access pattern (updates) |
Indexes not used and should be dropped |
Will improve performance after dropping indexes |
26 |
Yes |
Yes |
No |
N/A |
23, 28 |
No |
No |
Yes |
No |
24, 25 |
Yes |
No |
No |
N/A |
27 |
No |
Yes |
Yes |
Yes |
Let me go over all four of them.
1. BOTH read and write access patterns: this is indexes that help answering queries and also get updated when underlying data changes. Let's keep this kind.
2. No read or write access pattern: although it's not used but it's not hindering system performance. Removing this kind of indexs saves you disk space at most without influencing performance.
3. Only read access pattern but not write access pattern: like the first one, this sort helps your joins and the underlying data is never changed so no writes. Let's also keep it.
4. Last is the unnecessary indexes that are never used to answer queries but are needed to be updated to maintain index tree structure. Grab the DropScript column for your convenience where the drop statements are generated. Getting rid of these indexes should improve your replication performance.
The same exercise can be performed on a non-replication system. You should get the same performance gain when those unused indexes are dropped.
The gain should be in the form of shorter replication backup on the subscriber when the publisher makes a big push. Another form of gain, regardless of a replication system or not, should be noticed with lower CPU usage. The main reason is the reduction of index searches associated with maintaining the index trees.
A few notes of caution:
· Be aware that the DMV's used in the script accumulates index usages since SQL last restarts. You should make sure SQL has been running for a while, long enough to capture all patterns of your loads.
· Some unused indexes can be unique constraints which enforce your business rules. Carefully examine them to decide if they can be drop without breaking any business rules.
· Lastly, thoroughly try this out on you test or PPE environment before proceeding to production.
Leave a comment if you see any performance gain and by how much after removing these indexes.
Happy coding!
Mike Chang (Chien-Shen)
SQL SDE, MSDN