Index on query with multiple joins

Question

Index on query with multiple joins

Yassir 201

Hello,

I know that it is good to create index for column on where and join conditions. Secondarily on any group by. Last is Order by.

I'm trying the know what's the best way to create an index for a table that used multiple time with different column of join

For example on the query as bellow the table1 use those columns on joins : t1_Reference ,Date_Key ,t1_Key,PP_key and each column with different table

SELECT
	 t1.t1_Key
	,t3.t3_Type
	,t1.Client_key
	,t2.ClientType
	,t4.AG_Key
	,FORMAT(t1.Date, 'yyyy-MM') AS 'Mois'
	,abs(DATEDIFF(DAY, t1.dts_Date,  t1.dtt_Date)) as TimeDay
	,count(1) as nb
INTO #temp_table
FROM #table1 t1
INNER JOIN #table2 t2 ON t2.t2_Reference = t1.t1_Reference AND t2.Date_Key = t1.Date_Key
INNER JOIN #table3 t3 ON t1.t1_Key = t3.t1_Key 
INNER JOIN table4  t4 ON t1.PP_key = t4.PP_key
GROUP BY
	 t1.t1_Key
	,t3.t3_Type
	,t1.Client_key
	,t2.ClientType
	, t4.AG_Key
	,FORMAT(t1.Date, 'yyyy-MM')
	,abs(DATEDIFF(DAY, t1.dts_Date,  t1.dtt_Date))

The size of the tables are :

#table1 (temp table) : 64 450 048 Rows

#table2 (temp table) : 2 227 414 Rows

#table3 (temp table) : 6 Rows

table4 (physical table) : 579 628 194 Rows

FYI : The datatype of all columns in join are Int except t2_Reference and t1_Reference are varchar

I would like to know what is the best way to create the index and Why ?

Thanks for help !

LiHongMSFT-4306 31,566 Reputation points

2023-03-10T08:27:26.7966667+00:00

Hi @Yassir

Is this issue solved now? If you have any questions, please feel free to share with us.

And don't forget to accept the answers if they helped. Your action would be helpful to other users who encounter the same issue and read this thread.

Thanks for your understanding!

2 answers

Your answer

LiHongMSFT-4306 31,566 Reputation points

2023-03-10T08:27:26.7966667+00:00

Hi @Yassir

Is this issue solved now? If you have any questions, please feel free to share with us.

And don't forget to accept the answers if they helped. Your action would be helpful to other users who encounter the same issue and read this thread.

Thanks for your understanding!

Answer 1

Erland Sommarskog 121.4K MVP Volunteer Moderator

The question cannot be answered with the information given.

Particularly, let's study this join condition:

INNER JOIN #table3 t3 ON t1.t1_Key = t3.t1_Key

You say that there are only six values in #table3. But how many rows in #table1 do they address? In one extreme they all match at most one row each, and thus it is highly selectable. In this case, having an index on t1_key is definitely a winner.

In the other end, there are only six distinct values #table1.t1_key, and they all match #table3 and the join is not selective at all. If this is the case, there is no point at all in indexing #table1.t1_key.

The other two join conditions do not seem that they would be selctive anyway. I would recommend that you identify the primary keys for the tables and create the PK, because that tends to help the optimizer. But if there is what it seems, that is, my assumption is that #table3 is not a lookup, the optimizer will have much of a choice but to scan all tables and settle for a number of hash joins.

Yassir 201 Reputation points

2023-02-26T00:38:50.4833333+00:00

Hello @Erland Sommarskog ,
Thanks for your reply.
To answer your questions : the #table1 is a fact table that contains more than 64 450 048 Rows and #table3 is just a dimension table . Every row in #table1 have one value in #table3.
Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator

2023-02-26T07:49:13.93+00:00

Thanks for the clarification. I don't have anything to add to what I said in my original answer.

Answer 2

Hi @Yassir

Recommend a useful tool: Database Engine Tuning Advisor (DTA). Select Tools –> Database Engine Tuning Advisor in SSMS to open the tool.

First find out which statements you want to optimize, save them as a .sql file as input, and also choose the default database (otherwise DTA will not find the table) and the database you want to optimize.

In "Tuning Options", there are also some options that can be selected according to your actual needs.

Note:

(1) It is best not to run DTA directly on a production database.

During the running process, DTA will create some temporary objects on the database to test whether its suggestions are valid, so it will not only burden system resources, but also modify the database. Near the end of the analysis, DTA will delete these temporary objects, but if the DTA does not end normally, or the user forcibly terminates the operation of DTA, these temporary objects will be left in the database, which may affect the normal operation of the database in the future.

(2) The recommendations given by DTA must be confirmed before they can be implemented on the database.

Like the Missing Index, DTA's advice is one-sided. Before implementation, it is necessary to do an overall assessment and cannot be hastily implemented. Generally speaking, the quality of DTA advice is still relatively high. If a statement DTA cannot give suggestions, the probability of finding a good index through manual tuning is not very large. In this case, the best way out is to understand the logic of the statement and rewrite it into a simpler and more reasonable form, that is, to optimize performance by modifying the statement design.

Best regards,

Cosmog Hong

If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".

Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

Share via

Index on query with multiple joins

2 answers

Your answer