tuning sql server recursive query

Wayne Bartkowski 656 Reputation points
2022-11-02T02:18:30.5+00:00

I have a query that looks for the created and completed dates for outages that have overlapping times so it returns the minutes from the start to the last completed time in the group. Below image should help explain.
256236-exampleoverlap.png

So I have been able to research a query that works and returns that correct result as tested so far.

;with timePeriods as ( -- sort time frames according to [created_date] per [Id]  
 Select     dt.TheDate  
            ,jbs.asset_id  
            ,min(jbs.created_date) as created_date  
            ,Isnull(Max(jbs.completed_datetime),getdate()) as completed_datetime  
            ,count(*) as JobCount ---need to display this but keep getting errors  
            ,ROW_NUMBER() over (partition by jbs.asset_id order by min(jbs.created_date), Isnull(Max(jbs.completed_datetime),getdate())) as rn  
            from [dbo].[vwJobs] jbs  
            Join dbo.DateDimension  dt  on   
                Datediff(day, dt.TheDate , jbs.created_date) <=0     
                and Datediff(day, dt.TheDate ,  isnull(jbs.completed_datetime, getdate())) >=0  
             Join [dbo].[vwAssets] ast  on ast.asset_id = jbs.asset_id  
                Where asset_code = 'TCS' --TCS: Total Control System'  
                And jbs.[availability] ='Y'   
                and isnull(ast.deadflag,'') <> 'Y'  
            Group by dt.TheDate, jbs.asset_id  
                ), cte as   
( -- SQL recursive CTE expression  
     select -- anchor query  
      [TheDate], asset_id, [created_date], [completed_datetime], rn, 1 as GroupId  
     from timePeriods  
     where rn = 1  

     union all  

     select -- recursive sql query  
       p1.TheDate,  
       p1.[asset_id],  
      case  
      when (p1.[created_date] between p2.[created_date] and p2.[completed_datetime]) then p2.[created_date]  
      when (p2.[created_date] between p1.[created_date] and p1.[completed_datetime]) then p1.[created_date]  
      when (p1.[created_date] < p2.[created_date] and p1.[completed_datetime] > p2.[completed_datetime]) then p1.[created_date]  
      when (p1.[created_date] > p2.[created_date] and p1.[completed_datetime] < p2.[completed_datetime]) then p2.[created_date]  
      else p2.[created_date]  
      end as [created_date],  

      case  
      when (p1.[completed_datetime] between p2.[created_date] and p2.[completed_datetime]) then p2.[completed_datetime]  
      when (p2.[completed_datetime] between p1.[created_date] and p1.[completed_datetime]) then p1.[completed_datetime]  
      when (p1.[created_date] < p2.[created_date] and p1.[completed_datetime] > p2.[completed_datetime]) then p1.[completed_datetime]  
      when (p1.[created_date] > p2.[created_date] and p1.[completed_datetime] < p2.[completed_datetime]) then p2.[completed_datetime]  
      else p2.[completed_datetime]  
      end as [completed_datetime],  


      p2.rn,  
      case when  
      (p1.[created_date] between p2.[created_date] and p2.[completed_datetime]) or  
      (p1.[completed_datetime] between p2.[created_date] and p2.[completed_datetime]) or  
      (p1.[created_date] < p2.[created_date] and p1.[completed_datetime] > p2.[completed_datetime]) or  
      (p1.[created_date] > p2.[created_date] and p1.[completed_datetime] < p2.[completed_datetime])  
      then  
      p1.GroupId  
      else  
      (p1.GroupId+1)  
      end as GroupId  
     from cte p1 -- referencing CTE itself  
     inner join timePeriods p2  
      on p1.[asset_id] = p2.[asset_id] and  
      (p1.rn+1) = p2.rn  
)  


Select   
    "KPI No"=  10  
    ,"KPI Type" = 'Availability'  
    ,"KPI Category" = 'Total Control System'  
    ,"KPI Name" = 'Availability Of Controls'+ ast.[conops_rank]  
    ,dt.Thedate  --as ReportMonthYear  
    ,ast.Asset_id  
    ,ast.Asset_code  
    ,ast.Asset_Type  
    ,Conops_Rank  
    ,60*24 as DailyTotalTime  
    ,created_date DownTimeStart  
    ,completed_datetime DownTimeEnd  
    ,Isnull(DATEDIFF(MINUTE, created_date, completed_datetime),0) as AssetDailyDowntimeMinutes   
from [dbo].[vwAssets] ast  
Join dbo.DateDimension  dt  on Datediff(day, dt.TheDate , ast.feature_start_date) <=0     
and Datediff(day, dt.TheDate ,  isnull(nullif(ast.feature_end_date,'3000-01-01 00:00:00.000'), getdate())) >=0  
Left Join cte jbs on jbs.thedate = dt.thedate and jbs.asset_id = ast.asset_id  
Where asset_code =  'TCS'  
order by asset_id, thedate  

Unfortunately, this query takes ruffly 15minutes to return data, so it is not the best for a report view.
I was hopeful that it could be written more efficiently but I can't see how I would get around the loop.
Failing a more efficient query I could use this to create a table for reporting purposes.
I have attached a txt file with an explain plan that I am not overly familiar with analysing.

256228-explainplan.txt

Thanks in advance

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,708 questions
Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,552 questions
0 comments No comments
{count} votes

3 answers

Sort by: Most helpful
  1. CosmogHong-MSFT 22,941 Reputation points Microsoft Vendor
    2022-11-02T03:04:11.8+00:00

    Hi @Wayne Bartkowski
    Here are some tips:

    1. Replace a complex statement with multiple simple statements.
      For example, create a temporary table and insert the result set of the first CTE into it.
    2. Make sure all mentioned columns are indexed, especially filter columns.
    3. Avoid too many CASE WHEN expressions in the recursive CTE. You could move the first two CASE WHEN expressions outside of the recursive CTE, like this: ;with cte as
      ( -- SQL recursive CTE expression
      select -- anchor query
      [TheDate],
      asset_id,
      [created_date1],
      '' as [created_date2],
      [completed_datetime1],
      '' as [completed_datetime2],
      rn,
      1 as GroupId
      from timePeriods
      where rn = 1 union all select -- recursive sql query
      p1.TheDate,
      p1.[asset_id],
      p1.[created_date],
      p2.[created_date],
      p1.[completed_datetime],
      p2.[completed_datetime],
      p2.rn,
      case when
      (p1.[created_date] between p2.[created_date] and p2.[completed_datetime]) or
      (p1.[completed_datetime] between p2.[created_date] and p2.[completed_datetime]) or
      (p1.[created_date] < p2.[created_date] and p1.[completed_datetime] > p2.[completed_datetime]) or
      (p1.[created_date] > p2.[created_date] and p1.[completed_datetime] < p2.[completed_datetime])
      then p1.GroupId else (p1.GroupId+1) end as GroupId
      from cte p1 inner join timePeriods p2 on p1.[asset_id] = p2.[asset_id] and (p1.rn+1) = p2.rn
      )
      select TheDate,[asset_id],
      case
      when ([created_date1] between [created_date2] and [completed_datetime2]) then [created_date2]
      when ([created_date2] between [created_date1] and [completed_datetime1]) then [created_date1]
      when ([created_date1] < [created_date2] and [completed_datetime1] > [completed_datetime2]) then [created_date1]
      when ([created_date1] > [created_date2] and [completed_datetime1] < [completed_datetime2]) then [created_date2]
      else [created_date2] end as [created_date],
      ,etc
      from cte

    Best regards,
    LiHong


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
    Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    0 comments No comments

  2. Erland Sommarskog 101K Reputation points MVP
    2022-11-02T22:43:44.053+00:00

    I am not sure that this should be a recursive query. I immediately need to add the disclaimer that the query is complex and difficult to read in a hurry.

    But it seems to me that for each iteration in the recursive CTE you are only adding one more set of data. That is, you are running a loop in disguise. In a "normal" recursive CTE one traverses a recursive structure like a tree of nodes from bottom and up or vice versa.

    Maybe this is better written as a true loop, using a temp table for intermediate results. This can make it easier to check intermediate results, and it can also be easier to look at execution plans for the different steps, as the queries can be simpler.

    As for the execution plans, I noticed that you posted in text format. I will have to admit that it was many years ago since I looked plans in that format. The best is to get the XML format and upload it to http://www.pastetheplan.com.

    0 comments No comments

  3. Wayne Bartkowski 656 Reputation points
    2022-11-09T22:14:29.783+00:00

    Thanks for looking into this.
    I found some useful tips on SQL Server Central as well.
    With the final result like the code below which runs in an acceptable timeframe.
    ;WITH C1 AS (
    SELECT TheDate
    ,asset_id
    ,ts=created_date
    ,Type=1
    ,e=NULL
    ,s=ROW_NUMBER() OVER (PARTITION BY thedate, asset_id ORDER BY created_date)
    FROM [dbo].Jobs
    Join dbo.DateDimension on
    Datediff(day, TheDate , created_date) <=0
    and Datediff(day, TheDate , isnull(completed_datetime, getdate())) >=0
    Join [dbo].Assets on asset_id = asset_id
    where asset_code = 'DFR'
    UNION ALL
    SELECT TheDate
    ,asset_id
    ,ts=isnull(completed_datetime, getdate())
    ,Type=-1
    ,e=ROW_NUMBER() OVER (PARTITION BY thedate, asset_id ORDER BY completed_datetime)
    ,s=NULL
    FROM [dbo].Jobs
    Join dbo.DateDimension dt on
    Datediff(day, TheDate , created_date) <=0
    and Datediff(day, TheDate , isnull(completed_datetime, getdate())) >=0
    Join [dbo].Assets ast on asset_id = asset_id
    where asset_code = 'DFR' ),
    C2 AS (
    SELECT C1.*
    ,se=ROW_NUMBER() OVER (PARTITION BY thedate,asset_id ORDER BY ts, Type DESC)
    FROM C1),
    C3 AS (
    SELECT thedate, asset_Id, ts
    ,grpnm=FLOOR((ROW_NUMBER() OVER (PARTITION BY thedate,asset_id ORDER BY ts)-1) / 2 + 1)
    FROM C2
    WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0),
    C4 as (SELECT thedate
    , asset_Id
    , StartDate=MIN(ts)
    , EndDate=MAX(ts)
    , DATEDIFF(minute, MIN(ts),MAX(ts)) as MinDown
    FROM C3
    GROUP BY thedate, asset_Id, grpnm)

    Select  
    	Thedate  --as ReportMonthYear  
    	,Asset_id  
    	,Asset_code  
    	,Asset_Type  
    	,60*24 as DailyTotalTime  
    	,StartDate   
    	,EndDate   
    	,MinDown   
    	,Isnull(a.JobCount,0) as JobCount  
    from [dbo].[Assets]   
    Join dbo.DateDimension  on Datediff(day, TheDate , feature_start_date) <=0     
    and Datediff(day, TheDate ,  isnull(nullif(feature_end_date,'3000-01-01 00:00:00.000'), getdate())) >=0  
    Join C4 Jobs on thedate = thedate and asset_id = asset_id  
    Join (Select  	TheDate  
    			,asset_id  
    			,min(created_date) as created_date  
    			,Isnull(Max(completed_datetime),getdate()) as completed_datetime  
    			,count(*) as JobCount  
    			from [dbo].[Jobs]   
    			Join dbo.DateDimension  dt  on   
    				Datediff(day, TheDate , created_date) <=0     
    				and Datediff(day, TheDate ,  isnull(completed_datetime, getdate())) >=0  
    			 Join [dbo].[Assets] ast  on asset_id = asset_id  
    			Where  asset_code = 'DFR'  
    			Group by TheDate, asset_id) a on a.TheDate = TheDate and a.asset_id = asset_id  
    Where asset_code =	'DFR'  
    order by thedate, asset_id  
      
      
      
    
    0 comments No comments