Tracking Down Missing Indexes in SQL Azure
One of the challenges of SQL Azure is that not all of the TSQL that you are used to using is supported yet. Since the underlying engine is plain ole’ SQL Server, the engine can understand the TSQL, but we just block its use because we haven’t yet made it work in the multi-tenant, multi-server environment that is SQL Azure.
One of the classic missing index scripts can be seen in Bart Duncan’s classic post. For simplicity, I have reposted the TSQL below:
1: SELECT
2: migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure,
3: 'CREATE INDEX [missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle)
4: + '_' + LEFT (PARSENAME(mid.statement, 1), 32) + ']'
5: + ' ON ' + mid.statement
6: + ' (' + ISNULL (mid.equality_columns,'')
7: + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
8: + ISNULL (mid.inequality_columns, '')
9: + ')'
10: + ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
11: migs.*, mid.database_id, mid.[object_id]
12: FROM sys.dm_db_missing_index_groups mig
13: INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
14: INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
15: WHERE migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) > 10
16: ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC
Unfortunately, if you try to use this TSQL, you immediately run into the problem that none of the DMVs are supported in SQL Azure. So much for the easy way…
Since the DMVs are just ongoing collections of information that you can collect manually from dm_exec_query_stats, I decided to try to build this up manually. This led me to generate the following query:
1: SELECT top (50) cast(replace(cast(qp.query_plan as nvarchar(max)),'xmlns="https://schemas.microsoft.com/sqlserver/2004/07/showplan"','') as xml),
2: qp.query_plan.value('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/showplan"; (/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/MissingIndexes/MissingIndexGroup/@Impact)[1]' , 'decimal(18,4)') * execution_count AS TotalImpact
3: FROM sys.dm_exec_query_stats qs cross apply sys.dm_exec_sql_text(sql_handle) st cross apply sys.dm_exec_query_plan(plan_handle) qp
4: WHERE qp.query_plan.exist('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/showplan";/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/MissingIndexes/MissingIndexGroup/MissingIndex[@Database!="m"]') = 1
5: ORDER BY TotalImpact DESC
This generates a list of ShowPlanXMLs ordered by “execution count * missing index group impact”. Now that we have a list of ShowPlans ordered by overall impact, we need to parse the ShowPlanXML to pull out the missing indexes. For those unfamiliar with the missing index information in ShowPlanXML data, here is an example:
<MissingIndexes><MissingIndexGroup Impact="98.6314"><MissingIndex Database="[BugCheck]" Schema="[dbo]" Table="[Watchlists]"><ColumnGroup Usage="EQUALITY"><Column Name="[ID]" ColumnId="1" /></ColumnGroup></MissingIndex></MissingIndexGroup></MissingIndexes>
As you can see, it contains all the information necessary to define the indexes the engine thinks are missing.
Now, for each ShowPlanXML row, we need to use XQuery to shred the MissingIndexes information into its key information. In a classic case of copying good work already done instead of spending time doing it myself, I found that the Performance Dashboard Reports already do this shredding in one of their reports, so I could copy it:
1: SELECT cast(index_node.query('concat(string((./@Database)[1]),".",string((./@Schema)[1]),".",string((./@Table)[1]))') as nvarchar(100)) as target_object_name
2: ,replace(convert(nvarchar(max), index_node.query('for $colgroup in ./ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "EQUALITY" return string($col/@Name)')), '] [', '],[') as equality_columns
3: ,replace(convert(nvarchar(max), index_node.query('for $colgroup in ./ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "INEQUALITY" return string($col/@Name)')), '] [', '],[') as inequality_columns
4: ,replace(convert(nvarchar(max), index_node.query('for $colgroup in .//ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "INCLUDE" return string($col/@Name)')), '] [', '],[') as included_columns
5: from (select convert(xml, @query_plan) as xml_showplan) as t outer apply xml_showplan.nodes ('//MissingIndexes/MissingIndexGroup/MissingIndex') as missing_indexes(index_node)
By combining the above two queries with a cursor, I can stick each shredded missing index into a temporary table. Then, I can use the equality, inequality, and included columns from the temporary table to generate CREATE INDEX statements as follows:
1: select distinct 'Create NonClustered Index IX_' + substring(replace(replace(target_object_name,'[',''),']',''), 0, charindex('.',replace(replace(target_object_name,'[',''),']',''))) +' On ' + target_object_name +
2: ' (' + IsNull(equality_columns,'') +
3: CASE WHEN equality_columns IS Null And inequality_columns IS Null THEN ',' ELSE '' END + IsNull(inequality_columns, '') + ')' +
4: CASE WHEN included_columns='' THEN
5: ';'
6: ELSE
7: ' Include (' + included_columns + ');'
8: END
9: from #results
DISCLAIMER: As with all automated INDEX suggestion scripts, you need take a look at the CREATE INDEX statements suggested and decide if they make sense for you before you run out and apply them to your production instance!!
One important thing to point out is that even though I was designing this script for SQL Azure, it works just fine against an on-premise instance of SQL Server. Since SQL Azure supports a subset of the overall SQL Server functionality, you will almost always find that a solution for SQL Azure works just fine against SQL Server. Lastly, this functionality has been added to the CSS SQL Azure Diagnostics Tool (CSAD) so that you don’t have to worry about running this manually if you don’t want to.
For completeness, here is the TSQL statement in its entirety:
create table #results (target_object_name nvarchar(100), equality_columns nvarchar(100), inequality_columns nvarchar(100), included_columns nvarchar(100))
declare @query_plan as xml
declare @totalimpact as float
declare querycursor CURSOR FAST_FORWARD FOR
SELECT top (50) cast(replace(cast(qp.query_plan as nvarchar(max)),'xmlns="https://schemas.microsoft.com/sqlserver/2004/07/showplan"','') as xml),
qp.query_plan.value('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/showplan"; (/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/MissingIndexes/MissingIndexGroup/@Impact)[1]' , 'decimal(18,4)') * execution_count AS TotalImpact
FROM sys.dm_exec_query_stats qs cross apply sys.dm_exec_sql_text(sql_handle) st cross apply sys.dm_exec_query_plan(plan_handle) qp
WHERE qp.query_plan.exist('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/showplan";/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/MissingIndexes/MissingIndexGroup/MissingIndex[@Database!="m"]') = 1
ORDER BY TotalImpact DESC
OPEN querycursor
FETCH NEXT FROM querycursor
INTO @query_plan, @totalimpact --need to remove the namespace
WHILE @@FETCH_STATUS=0
BEGIN
insert into #results (target_object_name, equality_columns, inequality_columns, included_columns)
SELECT cast(index_node.query('concat(string((./@Database)[1]),".",string((./@Schema)[1]),".",string((./@Table)[1]))') as nvarchar(100)) as target_object_name
,replace(convert(nvarchar(max), index_node.query('for $colgroup in ./ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "EQUALITY" return string($col/@Name)')), '] [', '],[') as equality_columns
,replace(convert(nvarchar(max), index_node.query('for $colgroup in ./ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "INEQUALITY" return string($col/@Name)')), '] [', '],[') as inequality_columns
,replace(convert(nvarchar(max), index_node.query('for $colgroup in .//ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "INCLUDE" return string($col/@Name)')), '] [', '],[') as included_columns
from (select convert(xml, @query_plan) as xml_showplan) as t outer apply xml_showplan.nodes ('//MissingIndexes/MissingIndexGroup/MissingIndex') as missing_indexes(index_node)
FETCH NEXT FROM querycursor
INTO @query_plan, @totalimpact
END
CLOSE querycursor
DEALLOCATE querycursor
select distinct 'Create NonClustered Index IX_' + substring(replace(replace(target_object_name,'[',''),']',''), 0, charindex('.',replace(replace(target_object_name,'[',''),']',''))) +' On ' + target_object_name +
' (' + IsNull(equality_columns,'') +
CASE WHEN equality_columns IS Null And inequality_columns IS Null THEN ',' ELSE '' END + IsNull(inequality_columns, '') + ')' +
CASE WHEN included_columns='' THEN
';'
ELSE
' Include (' + included_columns + ');'
END
from #results
drop table #results
Comments
Anonymous
June 20, 2011
I was always wondering the low quality of stored procs MSSQL ships with. Now I know it's the copy/paste nature of building these up that is to blame. Why isdeclare
lower-case butWHILE
upper-case? How comefrom
is lower and upper-case in different parts of the script? Anyway...Anonymous
June 21, 2011
wqweto - I am not sure I follow your comment. The lack of consistency in casing above is totally driven by the fact that I pulled this from multiple sources, plus handwrote parts of it. Since the casing doesn't matter for anything by style point, I wasn't too focused on it. EvanAnonymous
June 28, 2011
Evan, I have been looking for something like this for awhile! I am getting a number of "String or binary data would be truncated. The statement has been terminated." messages when I run the query though. Some results are still returned though. Thanks so much for writing this! Brian P. HamachekAnonymous
June 28, 2011
The comment has been removedAnonymous
June 30, 2011
Thanks for the feedback, Brian. I've incorporated your suggestion and the increased column sizes will be in the next release.Anonymous
July 07, 2011
I am not sure I follow your comment -- 'as you dont understand plain english'. The lack of consistency in casing above is totally driven by the fact that I pulled this from multiple sources -- ' Yes thats COPY' plus handwrote parts of it -- 'And thats PASTE' Since the casing doesn't matter for anything by style point -- 'because you are so useless' I wasn't too focused on it -- 'Yes we know you are Dyslexic' thanks -BrianAnonymous
January 20, 2013
Thank you very much for this! Good stuff! wqweto: Please stop with the negative stuff.