recursive CTE seems appropriate, but I'm having trouble

EB 1 Reputation point
2020-10-15T16:09:33.987+00:00

Hi All,

I need to return query results of a table with one of the column values updated to that of another row's column value some of the time. What I can't seem to do with LAG or LEAD is use that updated value to determine if a third row's column value should be updated, which is why I think a recursive CTE sounds appropriate.

We have a list of patients hospital stays, with some patients having multiple rows. If a patient's second hospital stay started less than 30 days after their first hospital stay, we want the discharge date of the second hospital stay updated to use the discharge date of the first hospital stay. We want to continue this compare-and-update process for every row (other than the first row) for all of the remaining records for that patient. For example, here's my table:

CREATE TABLE #PatientStays ( PatientID INT , AdmitDate DATE , DischargeDate DATE);
INSERT INTO #PatientStays VALUES
(1, '2020-01-01', '2020-01-05') ,
(1, '2020-01-28', '2020-02-02') ,
(1, '2020-03-15', '2020-03-16') ,
(1, '2020-04-05', '2020-04-11') ,
(2, '2020-01-18', '2020-01-25') ,
(2, '2020-03-01', '2020-03-09') ,
(2, '2020-04-04', '2020-04-06');

And here's how we want the results returned:
SELECT 1 AS PatientID, '2020-01-01' AS AdmitDate, '2020-01-05' AS DischargeDate
UNION ALL SELECT 1, '2020-01-28', '2020-01-05' UNION ALL
SELECT 1, '2020-03-15', '2020-03-16' UNION ALL
SELECT 1, '2020-04-05', '2020-03-16' UNION ALL
SELECT 2, '2020-01-18', '2020-01-25' UNION ALL
SELECT 2, '2020-03-01', '2020-03-09' UNION ALL
SELECT 2, '2020-04-04', '2020-03-09'

Thanks for any help you can provide. I've done some reading on recursive CTE's, but everything I find seems to be more complicated than what I'm trying to do and I get tangled up.

Thanks again,
Eric

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,665 questions
{count} votes

7 answers

Sort by: Most helpful
  1. Tom Cooper 8,471 Reputation points
    2020-10-15T19:57:18.263+00:00
    ;With cteRN As
    (Select PatientID, AdmitDate, DischargeDate,
       Row_Number() Over (Partition By PatientID Order By AdmitDate) As RN
    From #PatientStays),
    
    cteDates As
    (Select PatientID, AdmitDate, DischargeDate, RN, 
      DischargeDate As NewDischargeDate
    From cteRN
    Where RN = 1
    Union All
    Select d.PatientID, r.AdmitDate, r.DischargeDate, r.RN, 
      Case When DateDiff(day, d.DischargeDate, r.AdmitDate) < 30 Then d.DischargeDate Else r.DischargeDate End As NewDischargeDate
    From cteDates d
    Inner Join cteRN r On d.PatientID = r.PatientID And d.RN + 1 = r.RN)
    
    Update s 
    Set s.DischargeDate = d.NewDischargeDate
    From #PatientStays s
    Inner Join cteDates d On s.PatientID = d.PatientID And s.AdmitDate = d.AdmitDate;
    
    -- Check Result
    Select PatientID, AdmitDate, DischargeDate 
    From #PatientStays
    Order By PatientID, AdmitDate;
    

    Tom

    1 person found this answer helpful.
    0 comments No comments

  2. Guoxiong 8,206 Reputation points
    2020-10-15T20:20:47.553+00:00

    I think your last row of the output is not correct since difference between '2020-03-01' and '2020-04-04' is bigger than 30 days. So it should not be updated. Try this:

    WITH CTE AS (
        SELECT *, 
            LAG(AdmitDate, 1) OVER (PARTITION BY PatientID ORDER BY AdmitDate) AS PreviousAdmitDate,
            LAG(DischargeDate, 1) OVER (PARTITION BY PatientID ORDER BY DischargeDate) AS PreviousDischargeDate
        FROM #PatientStays
    )
    
    UPDATE t
    SET t.DischargeDate = CASE WHEN DATEDIFF(DAY, c.PreviousAdmitDate, c.AdmitDate) < 30 THEN c.PreviousDischargeDate ELSE t.DischargeDate END
    FROM #PatientStays AS t
    INNER JOIN CTE AS c ON c.PatientID = t.PatientID AND c.AdmitDate = t.AdmitDate AND c.DischargeDate = t.DischargeDate
    
    SELECT * FROM #PatientStays
    
    1 person found this answer helpful.
    0 comments No comments

  3. Tom Cooper 8,471 Reputation points
    2020-10-17T05:53:51.007+00:00

    In order to handle chains of DischargeDate's (like when you added the new row with an AdmitDate of 2020-04-12, use the following (the only change from my first answer is to use "Then d.NewDischargeDate" instead of "Then d.DischargeDate" in the CASE statement.

    ;With cteRN As
    (Select PatientID, AdmitDate, DischargeDate,
       Row_Number() Over (Partition By PatientID Order By AdmitDate) As RN
    From #PatientStays),
    
    cteDates As
    (Select PatientID, AdmitDate, DischargeDate, RN, 
      DischargeDate As NewDischargeDate
    From cteRN
    Where RN = 1
    Union All
    Select d.PatientID, r.AdmitDate, r.DischargeDate, r.RN, 
      Case When DateDiff(day, d.DischargeDate, r.AdmitDate) < 30 Then d.NewDischargeDate Else r.DischargeDate End As NewDischargeDate
    From cteDates d
    Inner Join cteRN r On d.PatientID = r.PatientID And d.RN + 1 = r.RN)
    
    Update s
    Set s.DischargeDate = d.NewDischargeDate
    From #PatientStays s
    Inner Join cteDates d On s.PatientID = d.PatientID And s.AdmitDate = d.AdmitDate;
    
    -- Check Result
    Select PatientID, AdmitDate, DischargeDate 
    From #PatientStays
    Order By PatientID, AdmitDate;
    

    Tom

    1 person found this answer helpful.
    0 comments No comments

  4. Viorel 118.5K Reputation points
    2020-10-17T07:59:40.307+00:00

    Check an example that does not use recursive CTE:

    drop table if exists #PatientStays
    
    CREATE TABLE #PatientStays ( PatientID INT , AdmitDate DATE , DischargeDate DATE);
    INSERT INTO #PatientStays VALUES
        (1, '2020-01-01', '2020-01-05'),
        (1, '2020-01-28', '2020-02-02'),
        (1, '2020-03-15', '2020-03-16'),
        (1, '2020-04-05', '2020-04-11'),
        (1, '2020-04-12', '2020-04-13'),
        (2, '2020-01-18', '2020-01-25'),
        (2, '2020-03-01', '2020-03-09'),
        (2, '2020-04-04', '2020-04-06')
    
    select * 
    from #PatientStays
    order by PatientID, AdmitDate
    
    ---
    
    ;
    with Q1 as 
    (
        select *, lag(DischargeDate) over (partition by PatientId order by AdmitDate) as pdd 
        from #PatientStays
    ),
    Q2 as
    (
        select *
        from Q1
        where pdd is null or DATEDIFF(d, pdd, AdmitDate) > 30
    ),
    Q3 as
    (
        select ps.*, Q2.DischargeDate as pdd 
        from #PatientStays ps
        inner join Q2 on Q2.PatientID = ps.PatientID and DATEDIFF(d, Q2.DischargeDate, ps.AdmitDate) between 1 and 30
    )
    update Q3
    set DischargeDate = pdd
    
    select * from #PatientStays
    order by PatientID, AdmitDate
    
    /*
    PatientID   AdmitDate  DischargeDate
    ----------- ---------- -------------
    1           2020-01-01 2020-01-05
    1           2020-01-28 2020-01-05
    1           2020-03-15 2020-03-16
    1           2020-04-05 2020-03-16
    1           2020-04-12 2020-03-16
    2           2020-01-18 2020-01-25
    2           2020-03-01 2020-03-09
    2           2020-04-04 2020-03-09
    */
    
    1 person found this answer helpful.

  5. EchoLiu-MSFT 14,591 Reputation points
    2020-10-16T03:56:08.353+00:00

    Hi @EricBragas-0496 ,

    Recursive cte does not seem to solve your problem.

    I have provided another method, the first cte uses the row-number function to mark the row number, and the second cte uses the join and row number to realize the self-joining of the table, so that the data to be compared is placed in the same row. Then you can directly compare the second hospital stay with the first hospital stay, whether the interval exceeds 30 days to update the corresponding data.

    Please check:

     ;with cte  
     as (select *, row_number() over (partition by PatientID order by AdmitDate) rn   
         from #PatientStays)  
     ,cte2   
     as (select c1.PatientID,c1.AdmitDate,c1.DischargeDate,  
                c2.PatientID PatientID2,c2.AdmitDate AdmitDate2,c2.DischargeDate DischargeDate2  
         from cte c1  
         left join cte c2 on c1.PatientID = c2.PatientID AND c1.rn-1 = c2.rn )  
           
     update #PatientStays  
     set DischargeDate = case when datediff(day,c.AdmitDate2,c.AdmitDate)<30    
                              then c.DischargeDate2 else #PatientStays.DischargeDate end   
     from cte2 c  
     where  #PatientStays.PatientID=c.PatientID AND #PatientStays.AdmitDate = c.AdmitDate   
            AND #PatientStays.DischargeDate = c.DischargeDate  
      
     select * from #PatientStays    
    

    32835-image.png

    If you have any question, please feel free to let me know.
    If the response is helpful, please click "Accept Answer" and upvote it.

    Regards
    Echo


    If the answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.