Select query taking longer time & runs for hours to retrive 99 rows of data from an specific date but the same query completes in 2 min on an development

Eben Earnest 41 Reputation points
2021-01-14T16:09:09.207+00:00

I have an situation where my app team is informing that one of the Select query that contains left join & traverses across 2 databases and pulls data is taking longer time & runs for hours with no output & appears the same completed in less than 3 mins up until last week.

App team says the same query is still running on development and completes in 2 mins & i validated it by running my self & it does , the data on my PROD and DEV are of same size and so is the structure of the table /index . It retrieves 99 rows of data in dev & is expected to do same on PROD.

Index and update stats are all up to date on PROD & we use Ola index maint with stats , DEV & PROD system are the same resource level and edition & version is SQL2016 SP2 CU2 ENT with win 2012 R2 STD ( we are still working to get the CU updated to N-1 to CU14 )

there is an tradeoff where the PROD is being accessed more than DEV however it still doesnt give me an concrete measure to putforth since query in PROD runs for hours with no data

MAXDOP at 8 & cost threshold at 50 on both DEV & PROD is maintained

when i left the query run on PROD i had the whoisactive monitored & the session was always suspended with CXCONSUMER ( i got an conformation from my server admin & they told no host level changes on VM or server were done) . Please advise as the problem appears to happen only from last week & the stats & index are up to date until today.

Attached 2 image & during the run time there are an lot of other sessions also running parallel for ETL loads .

any assistance is welcome
56741-image.png
56702-image.png

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,878 questions
{count} votes

Accepted answer
  1. Erland Sommarskog 102.2K Reputation points
    2021-01-14T22:49:54.757+00:00

    To be honest, I don't believe the executions plans to be the same -. and if they are, the data is largely different.

    Anyway, I would suggest that you look at the plans with Live Query Plan so you can see how the query progresses on the slow server. Be sure that you use SSMS 18, which display things like "1 of 100" below the operators, which means 1 actual out of 100 estimated. When the percentages starts to exceed 100 % it starts to be interesting.

    1 person found this answer helpful.

3 additional answers

Sort by: Most helpful
  1. tibor_karaszi@hotmail.com 4,301 Reputation points
    2021-01-14T17:54:20.26+00:00

    The first ting I would check is if any of the table do not have a clustered index. SQL Server is bad at re-using space when you delete data for heap tables (tables without clustered indexes), so they can grow to ridiculous size if you delete data from the table.

    If above isn't the case, then it is time to compare execution plans between the two tand take it from there. CXCONSUMER only tell us that we have parallelism, which in itself often is a good thing. But it also can be the optimizers way to "throw hardware at the problem".


  2. Guoxiong 8,126 Reputation points
    2021-01-14T19:26:03.5+00:00

    When the SELECT query was running on the production server, did you use sp_who2 to check if it was blocked by the other processes? Maybe that would help.

    0 comments No comments

  3. Abdulhakim M. Elrhumi 351 Reputation points
    2021-01-14T21:35:14.79+00:00

    Hi

    Create Database Diagrams
    Create Views and select tables and fields.
    run View .

    Best Regards.
    Please remember to mark the replies as answers if they help.

    0 comments No comments