Compute total job duration of calendar events with possible overlaps

Vladimir Moldovanenko 26 Reputation points
2021-05-26T20:10:31.727+00:00

Hi.

Thanks for looking at my challenge.

The challenge is to compute total duration of service jobs for a work center given individual job start/finish times. Jobs may overlap.

DECLARE @CalendarEvents TABLE  
(  
    wrcID int NOT NULL -- work center  
    ,jobID int NOT NULL -- job to perform  
    -- calendar  
    ,ceStartDateTime datetimeoffset(0) NOT NULL  
    ,ceFinishDateTime datetimeoffset(0) NOT NULL  
    ,PRIMARY KEY(wrcID, jobID)  
)  
INSERT INTO @CalendarEvents  
VALUES  
(1, 1, '2021-05-26 08:00:00.0000000 -05:00', '2021-05-26 08:30:00.0000000 -05:00')  
,(1, 2, '2021-05-26 08:30:00.0000000 -05:00', '2021-05-26 09:00:00.0000000 -05:00')  
,(1, 3, '2021-05-26 09:10:00.0000000 -05:00', '2021-05-26 09:40:00.0000000 -05:00')  
,(1, 4, '2021-05-26 09:10:00.0000000 -05:00', '2021-05-26 09:40:00.0000000 -05:00')  
,(1, 5, '2021-05-26 09:10:00.0000000 -05:00', '2021-05-26 09:30:00.0000000 -05:00')  
,(1, 6, '2021-05-26 09:20:00.0000000 -05:00', '2021-05-26 09:30:00.0000000 -05:00')  
,(1, 7, '2021-05-26 09:50:00.0000000 -05:00', '2021-05-26 10:00:00.0000000 -05:00')  
  
,(2, 10, '2021-05-26 08:00:00.0000000 -05:00', '2021-05-26 08:30:00.0000000 -05:00')  
,(2, 11, '2021-05-26 08:20:00.0000000 -05:00', '2021-05-26 08:50:00.0000000 -05:00')  
  
  
SELECT ce.*  
    ,DATEDIFF(minute, ce.ceStartDateTime, ce.ceFinishDateTime) SegmentDuration_Mins  
FROM @CalendarEvents ce  
  
-- Compute the total job duration by work center, considering any overlaps, expected result  
SELECT *  
FROM(VALUES (1, 100), (2, 50)) v(wrcID, TotalDuration_Mins)  
  
-- wrong result  
SELECT  
    ce.wrcID  
    ,SUM(DATEDIFF(minute, ce.ceStartDateTime, ce.ceFinishDateTime)) TotalDuration_Mins  
FROM @CalendarEvents ce  
GROUP BY  
    ce.wrcID  

This can be represented as follows

100004-2021-05-26-14-53-01-calendar.jpg

the right result is: 120 - 20 = 100 minutes in duration for all scheduled jobs. It's apparent that straight aggregation will not account for overlaps.

If you have any idea(s), even if this is in an approach, please let me know.

Thanks for your time and expertise. I appreciate it.

Vladimir

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,601 questions
0 comments No comments
{count} vote

Accepted answer
  1. Viorel 114.7K Reputation points
    2021-05-27T09:00:30.847+00:00

    Check an alternative approach too:

    ;
    with G as
    (
        select *,
            concat('POLYGON ((', s, ' 0,', s, ' 1, ', e, ' 1, ', e, ' 0, ', s, ' 0))') as p 
        from @CalendarEvents
        cross apply (select min(ceStartDateTime) from @CalendarEvents) M(m)
        cross apply (values (datediff(minute, m, ceStartDateTime), datediff(minute, m, ceFinishDateTime))) D(s, e)
    )
    select wrcID, geometry::UnionAggregate(p).STArea() as TotalDuration_Mins
    from G
    group by wrcID
    
    1 person found this answer helpful.
    0 comments No comments

6 additional answers

Sort by: Most helpful
  1. Vladimir Moldovanenko 251 Reputation points
    2021-05-28T16:21:04.2+00:00

    @Erland Sommarskog for my specific case this would work too, as 1 minute is the minimum resolution I need. However, performance wise it will not do well compared to the below.

    the only negative for the method shared by @Viorel that I see is that whoever maintains it after me, if ever, needs to know a little about t-sql geometry, which is rare skill. I happen to have worked with it before, so I am good.

    Just to add a minor refinement to definition and processing that I had to account for.
    My input is restricted to a single calendar day and scheduled entries can span multiple day.

    Therefore. for any given day only that day's load needs to be accounted for and it needs to start with the beginning of that day

    -- add job that spans more than a day, cause load to be a full day  
    ,(1, 0, '2021-05-26 08:00:00.0000000 -05:00', '2021-05-28 00:30:00.0000000 -05:00')  
      
    DECLARE @CurrentTimeZoneName nvarchar(128) = 'Central Standard Time'--dbo.fnCLR_getCurrentTimeZoneName()  
    DECLARE @ocDate date = '2021-05-27' -- input day to evaluate schedule for  
      
    SELECT  
        --geometry::Parse(pol.P) P  
        geometry::UnionAggregate(pol.P).STArea()  
    FROM(VALUES   
        (CAST(@ocDate as datetime2(0)) AT TIME ZONE @CurrentTimeZoneName  
        ,CAST(DATEADD(day, 1, @ocDate) as datetime2(0)) AT TIME ZONE @CurrentTimeZoneName)  
    ) i(ocDateTimeStart, ocDateTimeFinish)  
    CROSS JOIN @CalendarEvents ce  
    CROSS APPLY(SELECT -- if dates overlap day, trim to a day  
        ceStartDateTime = CASE WHEN ce.ceStartDateTime < i.ocDateTimeStart THEN i.ocDateTimeStart ELSE ce.ceStartDateTime END  
        ,ceFinishDateTime = CASE WHEN ce.ceFinishDateTime > i.ocDateTimeFinish THEN i.ocDateTimeFinish ELSE ce.ceFinishDateTime END  
    ) d  
    CROSS APPLY(VALUES (DATEDIFF(minute, i.ocDateTimeStart, d.ceStartDateTime), DATEDIFF(minute, i.ocDateTimeStart, d.ceFinishDateTime))) p(s, e) -- durations in minutes for start and end points since the begining of the day  
    CROSS APPLY(VALUES (CONCAT('POLYGON ((', p.s, ' 0,', p.s, ' 1, ', p.e, ' 1, ', p.e, ' 0, ', p.s, ' 0))'))) pol(P)  
    -- range overlap. Two ranges overlap if (StartA <= EndB) and (EndA >= StartB)  
    WHERE ce.ceStartDateTime <= i.ocDateTimeFinish  
    AND ce.ceFinishDateTime >= i.ocDateTimeStart  
    

    This does produce the correct results. The plan complexity is trivial.

    Someone may benefit from this.

    Thanks to all that contributed. This was an interesting and exciting challenge for me.


  2. Vladimir Moldovanenko 251 Reputation points
    2021-05-28T23:15:57.15+00:00

    Here is how it works so you can appreciate the genius of this solution.

    Use table and data definition from the above

    DECLARE @CurrentTimeZoneName nvarchar(128) = 'Central Standard Time'--dbo.fnCLR_getCurrentTimeZoneName()  
    DECLARE @ocDate date = '2021-05-26' -- input day to evaluate schedule for  
          
    SELECT  
        CONCAT('Job ', ce.jobID) as Label  
        ,geometry::Parse(pol.P) P  
        --geometry::UnionAggregate(pol.P).STArea()  
    FROM(VALUES   
        (CAST(@ocDate as datetime2(0)) AT TIME ZONE @CurrentTimeZoneName  
        ,CAST(DATEADD(day, 1, @ocDate) as datetime2(0)) AT TIME ZONE @CurrentTimeZoneName)  
    ) i(ocDateTimeStart, ocDateTimeFinish)  
    CROSS JOIN @CalendarEvents ce  
    CROSS APPLY(SELECT -- if dates overlap day, trim to a day  
        ceStartDateTime = CASE WHEN ce.ceStartDateTime < i.ocDateTimeStart THEN i.ocDateTimeStart ELSE ce.ceStartDateTime END  
        ,ceFinishDateTime = CASE WHEN ce.ceFinishDateTime > i.ocDateTimeFinish THEN i.ocDateTimeFinish ELSE ce.ceFinishDateTime END  
    ) d  
    CROSS APPLY(VALUES (DATEDIFF(minute, i.ocDateTimeStart, d.ceStartDateTime), DATEDIFF(minute, i.ocDateTimeStart, d.ceFinishDateTime))) p(s, e) -- durations in minutes for start and end points since the beginning of the day  
    -- this polygon, a bar, is closed geometry, built with 5 points, specified as 'X Y', with comma separator, clockwise, equivalent to a bar of time interval of 1 unit high, laid flat on X = 0  
    -- https://learn.microsoft.com/en-us/sql/relational-databases/spatial/polygon?view=sql-server-ver15  
    CROSS APPLY(VALUES (CONCAT('POLYGON (('  
        -- X        Y  Separator  
        ------    ---- ---------  
        ,p.s, ' ' , 0, ','      -- point 1, X Y, distance from the 00:00 hrs marker, in my case the beginning of a day. it can be the beginning of a year or other peg  
        ,p.s, ' ' , 1, ','      -- point 2, X Y, bar height is only 1 unit, second if you will. it does not matter what this is, just it is of 1 unit.  
        ,p.e, ' ' , 1, ','      -- point 3, X Y, top right end point  
        ,p.e, ' ' , 0, ','      -- point 4, X Y, bottom right point  
        ,p.s, ' ' , 0,          -- point 5, X Y, the same as point 1, closes geometry  
    '))'))) pol(P)  
    -- so, each job gets a bar of a length equal to its duration  
    -- because they all are layered on X = 0, UnionAggregate combines these, into new polygons. Overlapping polygons become one.  
    -- https://learn.microsoft.com/en-us/sql/t-sql/spatial-geography/unionaggregate-geography-data-type?view=sql-server-ver15  
    -- then area method derives area of these polygons. because Y is 1 unit high, it's length that remains, which is total minutes.  
    -- https://learn.microsoft.com/en-us/sql/t-sql/spatial-geography/starea-geography-data-type?view=sql-server-ver15  
    WHERE ce.ceStartDateTime <= i.ocDateTimeFinish  
    AND ce.ceFinishDateTime >= i.ocDateTimeStart  
    AND ce.wrcID = 1  
      
      
    -- run the above and select 'Spatial Results' tab  
    -- grid should show you these bars. hover over bars to see job ID representing bars.  
    

    @Viorel simply took Excel screenshot, created bars, laid them on X axis, combined them and got area value.
    @Viorel please chime in if you can explain more or correct me

    100596-geometry-2021-05-28-18-07-51.jpg

    This is simply unique solution to this problem, worth understanding how it works. It can be used to solve similar issues.

    Thanks

    Vladimir