I have an situation where my app team is informing that one of the Select query that contains left join & traverses across 2 databases and pulls data is taking longer time & runs for hours with no output & appears the same completed in less than 3 mins up until last week.
App team says the same query is still running on development and completes in 2 mins & i validated it by running my self & it does , the data on my PROD and DEV are of same size and so is the structure of the table /index . It retrieves 99 rows of data in dev & is expected to do same on PROD.
Index and update stats are all up to date on PROD & we use Ola index maint with stats , DEV & PROD system are the same resource level and edition & version is SQL2016 SP2 CU2 ENT with win 2012 R2 STD ( we are still working to get the CU updated to N-1 to CU14 )
there is an tradeoff where the PROD is being accessed more than DEV however it still doesnt give me an concrete measure to putforth since query in PROD runs for hours with no data
MAXDOP at 8 & cost threshold at 50 on both DEV & PROD is maintained
when i left the query run on PROD i had the whoisactive monitored & the session was always suspended with CXCONSUMER ( i got an conformation from my server admin & they told no host level changes on VM or server were done) . Please advise as the problem appears to happen only from last week & the stats & index are up to date until today.
Attached 2 image & during the run time there are an lot of other sessions also running parallel for ETL loads .
any assistance is welcome