Indexing when is it bad?

Senn 41 Reputation points
2021-06-27T08:55:04.773+00:00

Hi everyone,

Why is this bad? The answer is also given but I really don't get it well. Can you simply explain with an example what is meant? I thought yes because you have to care about all query which are greater than 20. So the first 1-20 are not relevant for the user. But it seems another reasoning. Is it because its an heap = unclustered that its not worth to have an index because its unordered. So it doesn't make sense to have a range for? If I have heap then its better to have equal search then its way faster?

Consider a relation stored as a randomly ordered file and a single unclustered index on the field sal of the relation. If you want to retrieve all records with sal > 20, is using the index always the best alternative? Explain.

No. In this case, the index is unclustered, each qualifying data entry could contain an rid that points to a distinct data page, leading to as many data page I/Os as the number of data entries that match the range query. In this situation, using index is actually worse than file scan.

Azure SQL Database
SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
13,361 questions
{count} votes

Accepted answer
  1. Erland Sommarskog 107.2K Reputation points
    2021-06-27T10:03:16.97+00:00

    It seems that you are taking a course or a class on relational databases in general. The forum you have landed in specifically targeted for SQL Server, and what is right for SQL Server may be not be correct for another product.

    For the given question, the answer is indeed "no", but then answer seems to forget why. "No", is correct because the question has the word "always". Without that "always", the answer would be "it depends". If, say, 1% of the rows in the relation has sal > 20, using the index is the best alternative. If 50% of the rows fulfil this condition, this is a lousy alternative, exactly for the reasons given in the answer. You will have to visit the same page many times, whereas when doing a scan, you would only visit it once. And, by the way, heap or clustered index, is not very important here. The important thing is that the index on sal is nonclustered.

    What is also important is whether we actually need to visit the data pages. Say that we are only looking for a count. In that case, there is no need to go to the data page, and the index will work fine.

    Now, keep in mind that these are not decisions that you as an SQL programmer have to make. These are decisions the optimizer makes when determining the most efficient way to compute the query. And as long as the optimizer does its job, you don't have to bother. This does certainly not always happen, and you may have to help it, by adding indexes or other means, and in this case it is good to has some understanding of when an index may help or not. But I am not sure that should be taught in a class where you also learn to write queries. Performance tuning is something you very much learn through practical experience.


0 additional answers

Sort by: Most helpful