Condividi tramite


Tracking Down Missing Indexes in SQL Azure

One of the challenges of SQL Azure is that not all of the TSQL that you are used to using is supported yet. Since the underlying engine is plain ole’ SQL Server, the engine can understand the TSQL, but we just block its use because we haven’t yet made it work in the multi-tenant, multi-server environment that is SQL Azure. 

One of the classic missing index scripts can be seen in Bart Duncan’s classic post. For simplicity, I have reposted the TSQL below:

    1:  SELECT 
    2:    migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure, 
    3:    'CREATE INDEX [missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle) 
    4:    + '_' + LEFT (PARSENAME(mid.statement, 1), 32) + ']'
    5:    + ' ON ' + mid.statement 
    6:    + ' (' + ISNULL (mid.equality_columns,'') 
    7:      + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END 
    8:      + ISNULL (mid.inequality_columns, '')
    9:    + ')' 
   10:    + ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement, 
   11:    migs.*, mid.database_id, mid.[object_id]
   12:  FROM sys.dm_db_missing_index_groups mig
   13:  INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
   14:  INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
   15:  WHERE migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) > 10
   16:  ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC

Unfortunately, if you try to use this TSQL, you immediately run into the problem that none of the DMVs are supported in SQL Azure. So much for the easy way…

Since the DMVs are just ongoing collections of information that you can collect manually from dm_exec_query_stats, I decided to try to build this up manually. This led me to generate the following query:

 

    1:  SELECT top (50) cast(replace(cast(qp.query_plan as nvarchar(max)),'xmlns="https://schemas.microsoft.com/sqlserver/2004/07/showplan"','') as xml),
    2:  qp.query_plan.value('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/showplan"; (/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/MissingIndexes/MissingIndexGroup/@Impact)[1]' , 'decimal(18,4)') * execution_count AS TotalImpact
    3:  FROM sys.dm_exec_query_stats qs cross apply sys.dm_exec_sql_text(sql_handle) st cross apply sys.dm_exec_query_plan(plan_handle) qp 
    4:  WHERE qp.query_plan.exist('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/showplan";/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/MissingIndexes/MissingIndexGroup/MissingIndex[@Database!="m"]') = 1
    5:  ORDER BY TotalImpact DESC

This generates a list of ShowPlanXMLs ordered by “execution count * missing index group impact”. Now that we have a list of ShowPlans ordered by overall impact, we need to parse the ShowPlanXML to pull out the missing indexes. For those unfamiliar with the missing index information in ShowPlanXML data, here is an example:

<MissingIndexes><MissingIndexGroup Impact="98.6314"><MissingIndex Database="[BugCheck]" Schema="[dbo]" Table="[Watchlists]"><ColumnGroup Usage="EQUALITY"><Column Name="[ID]" ColumnId="1" /></ColumnGroup></MissingIndex></MissingIndexGroup></MissingIndexes>

As you can see, it contains all the information necessary to define the indexes the engine thinks are missing.

Now, for each ShowPlanXML row, we need to use XQuery to shred the MissingIndexes information into its key information. In a classic case of copying good work already done instead of spending time doing it myself, I found that the Performance Dashboard Reports already do this shredding in one of their reports, so I could copy it:

    1:      SELECT cast(index_node.query('concat(string((./@Database)[1]),".",string((./@Schema)[1]),".",string((./@Table)[1]))') as nvarchar(100)) as target_object_name
    2:      ,replace(convert(nvarchar(max), index_node.query('for $colgroup in ./ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "EQUALITY" return string($col/@Name)')), '] [', '],[') as equality_columns
    3:      ,replace(convert(nvarchar(max), index_node.query('for $colgroup in ./ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "INEQUALITY" return string($col/@Name)')), '] [', '],[') as inequality_columns
    4:      ,replace(convert(nvarchar(max), index_node.query('for $colgroup in .//ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "INCLUDE"    return string($col/@Name)')), '] [', '],[') as included_columns 
    5:      from (select convert(xml, @query_plan) as xml_showplan) as t outer apply xml_showplan.nodes ('//MissingIndexes/MissingIndexGroup/MissingIndex') as missing_indexes(index_node)

By combining the above two queries with a cursor, I can stick each shredded missing index into a temporary table. Then, I can use the equality, inequality, and included columns from the temporary table to generate CREATE INDEX statements as follows:

    1:  select distinct 'Create NonClustered Index IX_' + substring(replace(replace(target_object_name,'[',''),']',''), 0, charindex('.',replace(replace(target_object_name,'[',''),']',''))) +' On ' + target_object_name + 
    2:  ' (' + IsNull(equality_columns,'') + 
    3:  CASE WHEN equality_columns IS Null And inequality_columns IS Null THEN ',' ELSE '' END + IsNull(inequality_columns, '') + ')' + 
    4:  CASE WHEN included_columns='' THEN
    5:  ';'
    6:  ELSE
    7:  ' Include (' + included_columns + ');'
    8:  END
    9:  from #results

DISCLAIMER: As with all automated INDEX suggestion scripts, you need take a look at the CREATE INDEX statements suggested and decide if they make sense for you before you run out and apply them to your production instance!!

One important thing to point out is that even though I was designing this script for SQL Azure, it works just fine against an on-premise instance of SQL Server. Since SQL Azure supports a subset of the overall SQL Server functionality, you will almost always find that a solution for SQL Azure works just fine against SQL Server. Lastly, this functionality has been added to the CSS SQL Azure Diagnostics Tool (CSAD) so that you don’t have to worry about running this manually if you don’t want to.

For completeness, here is the TSQL statement in its entirety:

 create table #results (target_object_name nvarchar(100), equality_columns nvarchar(100), inequality_columns nvarchar(100), included_columns nvarchar(100))
  
 declare @query_plan as xml
 declare @totalimpact as float
  
 declare querycursor CURSOR FAST_FORWARD FOR
 SELECT top (50) cast(replace(cast(qp.query_plan as nvarchar(max)),'xmlns="https://schemas.microsoft.com/sqlserver/2004/07/showplan"','') as xml),
 qp.query_plan.value('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/showplan"; (/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/MissingIndexes/MissingIndexGroup/@Impact)[1]' , 'decimal(18,4)') * execution_count AS TotalImpact
 FROM sys.dm_exec_query_stats qs cross apply sys.dm_exec_sql_text(sql_handle) st cross apply sys.dm_exec_query_plan(plan_handle) qp 
 WHERE qp.query_plan.exist('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/showplan";/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/MissingIndexes/MissingIndexGroup/MissingIndex[@Database!="m"]') = 1
 ORDER BY TotalImpact DESC
  
 OPEN querycursor
 FETCH NEXT FROM querycursor
 INTO @query_plan, @totalimpact  --need to remove the namespace
  
 WHILE @@FETCH_STATUS=0
 BEGIN
  
     insert into #results (target_object_name, equality_columns, inequality_columns, included_columns)
     SELECT cast(index_node.query('concat(string((./@Database)[1]),".",string((./@Schema)[1]),".",string((./@Table)[1]))') as nvarchar(100)) as target_object_name
     ,replace(convert(nvarchar(max), index_node.query('for $colgroup in ./ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "EQUALITY" return string($col/@Name)')), '] [', '],[') as equality_columns
     ,replace(convert(nvarchar(max), index_node.query('for $colgroup in ./ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "INEQUALITY" return string($col/@Name)')), '] [', '],[') as inequality_columns
     ,replace(convert(nvarchar(max), index_node.query('for $colgroup in .//ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "INCLUDE"    return string($col/@Name)')), '] [', '],[') as included_columns 
     from (select convert(xml, @query_plan) as xml_showplan) as t outer apply xml_showplan.nodes ('//MissingIndexes/MissingIndexGroup/MissingIndex') as missing_indexes(index_node)
     
     FETCH NEXT FROM querycursor
     INTO @query_plan, @totalimpact
     
 END
  
 CLOSE querycursor
 DEALLOCATE querycursor
  
 select distinct 'Create NonClustered Index IX_' + substring(replace(replace(target_object_name,'[',''),']',''), 0, charindex('.',replace(replace(target_object_name,'[',''),']',''))) +' On ' + target_object_name + 
 ' (' + IsNull(equality_columns,'') + 
 CASE WHEN equality_columns IS Null And inequality_columns IS Null THEN ',' ELSE '' END + IsNull(inequality_columns, '') + ')' + 
 CASE WHEN included_columns='' THEN
 ';'
 ELSE
 ' Include (' + included_columns + ');'
 END
 from #results
  
 drop table #results

Comments

  • Anonymous
    June 20, 2011
    I was always wondering the low quality of stored procs MSSQL ships with. Now I know it's the copy/paste nature of building these up that is to blame. Why is declare lower-case but WHILE upper-case? How come from is lower and upper-case in different parts of the script? Anyway...

  • Anonymous
    June 21, 2011
    wqweto - I am not sure I follow your comment.  The lack of consistency in casing above is totally driven by the fact that I pulled this from multiple sources, plus handwrote parts of it.  Since the casing doesn't matter for anything by style point, I wasn't too focused on it. Evan

  • Anonymous
    June 28, 2011
    Evan, I have been looking for something like this for awhile!  I am getting a number of "String or binary data would be truncated.  The statement has been terminated." messages when I run the query though.  Some results are still returned though. Thanks so much for writing this! Brian P. Hamachek

  • Anonymous
    June 28, 2011
    The comment has been removed

  • Anonymous
    June 30, 2011
    Thanks for the feedback, Brian.  I've incorporated your suggestion and the increased column sizes will be in the next release.

  • Anonymous
    July 07, 2011
    I am not sure I follow your comment -- 'as you dont understand plain english'.   The lack of consistency in casing above is totally driven by the fact that I pulled this from multiple sources -- ' Yes thats COPY' plus handwrote parts of it -- 'And thats PASTE' Since the casing doesn't matter for anything by style point -- 'because you are so useless' I wasn't too focused on it -- 'Yes we know you are Dyslexic' thanks -Brian

  • Anonymous
    January 20, 2013
    Thank you very much for this! Good stuff! wqweto: Please stop with the negative stuff.