When should I INCLUDE certain columns on an INDEX CREATION

Question

When should I INCLUDE certain columns on an INDEX CREATION

Bobby P 231

I always get confused when trying to determine if I should INCLUDE columns or not when creating a SQL Server Index.

Is there any easy explanation without getting into Index leaf page logic and the like.

Any easy rule to remember as to when I should or should not use INCLUE columns on my INDEX creation

Thanks in advance for your review and am hopeful for an answer.

Deleted

This comment has been deleted due to a violation of our Code of Conduct. The comment was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.
Deleted

This comment has been deleted due to a violation of our Code of Conduct. The comment was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.
LiHongMSFT-4306 31,566 Reputation points

2023-03-10T08:38:34.9233333+00:00

Hi @Bobby P

We haven't heard any response from you yet.

Do you have further question on this issue, could we offer more support?

If any answer helps on your issue, you could mark it as answer so other user with similar problem could see this easier. :)

3 answers

Your answer

Deleted

This comment has been deleted due to a violation of our Code of Conduct. The comment was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.
LiHongMSFT-4306 31,566 Reputation points

2023-03-10T08:38:34.9233333+00:00

Hi @Bobby P

We haven't heard any response from you yet.

Do you have further question on this issue, could we offer more support?

If any answer helps on your issue, you could mark it as answer so other user with similar problem could see this easier. :)

Answer 1

I'm afraid there is not any simple answer beyond "It depends". Things like indexing and performance is a lot about tradeoff, and in the end a good gut feeling is needed.

That said, I will go on with some examples. Consider this query:

SELECT a, b, SUM(c)
FROM  tbl
WHERE d = @value
AND e = 1          -- e Can only be 0 or 1 with a 50/50 distribution
GROUP BY a, b

We can construct many indexes for this query. The "ideal" index would be

CREATE INDEX ix ON tbl(d, e, a, b) INCLUDE(c)

d should be the leading column, because it is the most selective. a and b should be at the end, because with d and e on top, SQL Server can locate all matching rows, and then scan this in order to perform the GROUP BY.

However, there may be reasons I don't want to create the index this way. As I indicated, e is not very selective. Maybe I have a twin query which is the same, except that the condition on e is not there, and I don't want to create two indexes that are similar to each other. After all, there is a cost to maintain indexes.

If I take out e entirely from the index, this means that the query above has to go to the data page to find the value of e. But in that case, there is no much point in having c included.

So this leaves is with any of:

CREATE INDEX ix1 ON tbl(d, a, b) INCLUDE(c, e)
CREATE INDEX ix2 ON tbl(c, a, d)
CREATE INDEX ix2 ON tbl(c, a, d) INCLUDE(e)

Now you may ask where did the last one come from? So this is my personal take on included columns. Many advocate adding columns in the SELECT list as included columns to make the index covering and the query to perform better. However, my experience is that the index will only be covering for a little while. Say that we decide to use ix1. But then two months later, there is a new requirement that we should also return AVG(f) from the query. ix1 will no longer be covering, unless we also add f as an included column. But by then, we have already forgotten about our index-tuning operation, and we don't add f. And, who knows, maybe we don't notice that the query is slower.

So this is why I am less fond of putting columns from the SELECT list into the index only to make it covering. On the other hand, I'm more keen on adding a column in the WHERE-JOIN logic as an included column. There are two reasons for this. One is that it will reduced the number of key lookups more than if you add columns in the SELECT list, since without the included column you need to make key lookups also for non-matching rows (exactly to find that they don't match). The other reason is that these conditions tend to me more stable, so the odds are better that this part of the query remains covered even if the query changes.

I realise that this was not the type of answer you were looking for. But I still hope that this answer has given you some sort of insight.

Answer 2

Jingyang Li 5,896 Volunteer Moderator

Search covering index and read through, you may understand a little more. Create and optimize indexes is a kill that (indice) will need you to invest time and experience to work with. It is an on-going work for applications with many changing factors. It is not the thing done once and you are good forever. Good to think about how to work with it. You can improve the skill with your research and reads.

Answer 3

Hi @Bobby P

Whether we would see a performance difference when a query used the index with all columns in the key, versus the index with most of the columns included in the leaf level.

As answered above: It ultimately depends on the query.

What you need to understand is that there are some great benefits to including columns in an index, but they will not always perform the same as indexes that have all columns in the key. It may be tempting to move columns that are not part of predicates and joins out of the key, and just include them, to reduce the overall size of the index. However, in some cases this requires more resources for query execution and may degrade performance. The degradation may be insignificant or it may not be…you will not know until you test. Therefore, when designing an index, it’s important to think about the columns after the leading one – and understand whether they need to be part of the key (e.g. because keeping the data ordered will provide benefit) or if they can serve their purpose as included columns.

As is typical with indexing in SQL Server, you have to test your queries with your indexes to determine the best strategy.

Refer to this article for more details: Should new index columns be in the key, or included?

Also, you might find something useful in this similar thread: Why use the INCLUDE clause when creating an index?

Best regards,

Cosmog Hong

If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".

Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

Share via

When should I INCLUDE certain columns on an INDEX CREATION

3 answers

Your answer