Select query taking longer time & runs for hours to retrive 99 rows of data from an specific date but the same query completes in 2 min on an development

Question

I have an situation where my app team is informing that one of the Select query that contains left join & traverses across 2 databases and pulls data is taking longer time & runs for hours with no output & appears the same completed in less than 3 mins up until last week.

App team says the same query is still running on development and completes in 2 mins & i validated it by running my self & it does , the data on my PROD and DEV are of same size and so is the structure of the table /index . It retrieves 99 rows of data in dev & is expected to do same on PROD.

Index and update stats are all up to date on PROD & we use Ola index maint with stats , DEV & PROD system are the same resource level and edition & version is SQL2016 SP2 CU2 ENT with win 2012 R2 STD ( we are still working to get the CU updated to N-1 to CU14 )

there is an tradeoff where the PROD is being accessed more than DEV however it still doesnt give me an concrete measure to putforth since query in PROD runs for hours with no data

MAXDOP at 8 & cost threshold at 50 on both DEV & PROD is maintained

when i left the query run on PROD i had the whoisactive monitored & the session was always suspended with CXCONSUMER ( i got an conformation from my server admin & they told no host level changes on VM or server were done) . Please advise as the problem appears to happen only from last week & the stats & index are up to date until today.

Attached 2 image & during the run time there are an lot of other sessions also running parallel for ETL loads .

any assistance is welcome

Accepted Answer

To be honest, I don't believe the executions plans to be the same -. and if they are, the data is largely different.

Anyway, I would suggest that you look at the plans with Live Query Plan so you can see how the query progresses on the slow server. Be sure that you use SSMS 18, which display things like "1 of 100" below the operators, which means 1 actual out of 100 estimated. When the percentages starts to exceed 100 % it starts to be interesting.

Answer

The first ting I would check is if any of the table do not have a clustered index. SQL Server is bad at re-using space when you delete data for heap tables (tables without clustered indexes), so they can grow to ridiculous size if you delete data from the table.

If above isn't the case, then it is time to compare execution plans between the two tand take it from there. CXCONSUMER only tell us that we have parallelism, which in itself often is a good thing. But it also can be the optimizers way to "throw hardware at the problem".

Answer

When the SELECT query was running on the production server, did you use sp_who2 to check if it was blocked by the other processes? Maybe that would help.

Answer

Hi

Create Database Diagrams
Create Views and select tables and fields.
run View .

Best Regards.
Please remember to mark the replies as answers if they help.

Share via

Select query taking longer time & runs for hours to retrive 99 rows of data from an specific date but the same query completes in 2 min on an development

3 additional answers