When should I INCLUDE certain columns on an INDEX CREATION

Bobby P 231 Reputation points
2023-02-10T19:50:11.5+00:00

I always get confused when trying to determine if I should INCLUDE columns or not when creating a SQL Server Index.

Is there any easy explanation without getting into Index leaf page logic and the like.

Any easy rule to remember as to when I should or should not use INCLUE columns on my INDEX creation

Thanks in advance for your review and am hopeful for an answer.

SQL Server Other
{count} votes

3 answers

Sort by: Most helpful
  1. Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
    2023-02-10T22:44:04.42+00:00

    I'm afraid there is not any simple answer beyond "It depends". Things like indexing and performance is a lot about tradeoff, and in the end a good gut feeling is needed.

    That said, I will go on with some examples. Consider this query:

    SELECT a, b, SUM(c)
    FROM  tbl
    WHERE d = @value
    AND e = 1          -- e Can only be 0 or 1 with a 50/50 distribution
    GROUP BY a, b
    

    We can construct many indexes for this query. The "ideal" index would be

    CREATE INDEX ix ON tbl(d, e, a, b) INCLUDE(c)
    

    d should be the leading column, because it is the most selective. a and b should be at the end, because with d and e on top, SQL Server can locate all matching rows, and then scan this in order to perform the GROUP BY.

    However, there may be reasons I don't want to create the index this way. As I indicated, e is not very selective. Maybe I have a twin query which is the same, except that the condition on e is not there, and I don't want to create two indexes that are similar to each other. After all, there is a cost to maintain indexes.

    If I take out e entirely from the index, this means that the query above has to go to the data page to find the value of e. But in that case, there is no much point in having c included.

    So this leaves is with any of:

    CREATE INDEX ix1 ON tbl(d, a, b) INCLUDE(c, e)
    CREATE INDEX ix2 ON tbl(c, a, d)
    CREATE INDEX ix2 ON tbl(c, a, d) INCLUDE(e)
    

    Now you may ask where did the last one come from? So this is my personal take on included columns. Many advocate adding columns in the SELECT list as included columns to make the index covering and the query to perform better. However, my experience is that the index will only be covering for a little while. Say that we decide to use ix1. But then two months later, there is a new requirement that we should also return AVG(f) from the query. ix1 will no longer be covering, unless we also add f as an included column. But by then, we have already forgotten about our index-tuning operation, and we don't add f. And, who knows, maybe we don't notice that the query is slower.

    So this is why I am less fond of putting columns from the SELECT list into the index only to make it covering. On the other hand, I'm more keen on adding a column in the WHERE-JOIN logic as an included column. There are two reasons for this. One is that it will reduced the number of key lookups more than if you add columns in the SELECT list, since without the included column you need to make key lookups also for non-matching rows (exactly to find that they don't match). The other reason is that these conditions tend to me more stable, so the odds are better that this part of the query remains covered even if the query changes.

    I realise that this was not the type of answer you were looking for. But I still hope that this answer has given you some sort of insight.

    1 person found this answer helpful.
    0 comments No comments

  2. Jingyang Li 5,896 Reputation points Volunteer Moderator
    2023-02-10T21:30:10.7533333+00:00

    Search covering index and read through, you may understand a little more. Create and optimize indexes is a kill that (indice) will need you to invest time and experience to work with. It is an on-going work for applications with many changing factors. It is not the thing done once and you are good forever. Good to think about how to work with it. You can improve the skill with your research and reads.

    0 comments No comments

  3. LiHongMSFT-4306 31,566 Reputation points
    2023-02-13T02:48:47.2633333+00:00

    Hi @Bobby P

    Whether we would see a performance difference when a query used the index with all columns in the key, versus the index with most of the columns included in the leaf level.

    As answered above: It ultimately depends on the query.

    What you need to understand is that there are some great benefits to including columns in an index, but they will not always perform the same as indexes that have all columns in the key. It may be tempting to move columns that are not part of predicates and joins out of the key, and just include them, to reduce the overall size of the index. However, in some cases this requires more resources for query execution and may degrade performance. The degradation may be insignificant or it may not be…you will not know until you test. Therefore, when designing an index, it’s important to think about the columns after the leading one – and understand whether they need to be part of the key (e.g. because keeping the data ordered will provide benefit) or if they can serve their purpose as included columns.

    As is typical with indexing in SQL Server, you have to test your queries with your indexes to determine the best strategy.

    Refer to this article for more details: Should new index columns be in the key, or included?

    Also, you might find something useful in this similar thread: Why use the INCLUDE clause when creating an index?

    Best regards,

    Cosmog Hong


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".

    Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.