Count and running total of records with consecutive dates (window?)

PeterSh 176 Reputation points

I have a record set of around 10 million rows or so with some timestamp for a range of items. A row is added when a value is detected on a given day, meaning days with no values are missing.

For simplicity, let's use this data as an example.


We have two meters, 5555 and 6666, recording values over four days. Meter 5555 had no data on the third day, so has no corresponding row.

What I'm trying to get to is something like the following.


When I load the most recent row for a meter, I need to be able to see how many consecutive days I have values for and what the cumulative value is across those days.

I think I should be able to use window action to do this, but I'm not having any luck getting my head around it. Anything I come up with seems to need to reference the row being generated by my window function in previous rows - but I don't think it works that way.

Perhaps I need two window functions? I'm not sure.

Any help on this would be great, thanks.

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,464 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. PeterSh 176 Reputation points

    I managed to get something working, though I don't know how 'best practice' it is.

    Essentially, I'm passing the rows through three consecutive window transformations.


    So, based on a test dataset with the data from the example:


    Window1: Over meterId, Sort date, Range by Unbounded, Window columns as below (generates consecutiveFlag).


    To break that down, if the date from the previous row plus one day matches the date of this row, return 0, else return 1. The trick here is to give us something to sum that wont change the result if the date is consecutive.

    Window1 preview:


    Window2: Over meterId, Sort date, Range by Unbounded, Window consecutiveGroup = sum(consecutiveFlag)

    Window2 preview:


    Because the flag only changes consecutiveGroup when the row is not consecutive, this gives us something else to group on.

    Window3: Over meterId & consecutiveGroup, Sort date, Range by Unbounded, Window as below.


    Simple stuff, but only possible because we generated another criteria to group.

    Window3 preview:


    And that's it, but just to match the sample output, I sort on date & meterId, then use select to drop the interim rows to get the final output.


    I hope someone finds this helpful. I was struggling to find a solution without taking it out of data factory.

    1 person found this answer helpful.