How to Full Outer Join/Merge two streams on ID and closest timestamp

Matty Stubbs 190 Reputation points
2023-11-02T01:58:42.1333333+00:00

Given two streams that have been full outer joined, I am unsure how to fill in the null the data. I would like any nulls to be replaced with the latest data from the other table relative to the timestamp.

I am joining on a shared ID and the timestamp. Here is an example of what I would like to happen:
Req_Table:

 Req_Prop_ID | Req_Status | Req_Timestamp          |
-------------+------------+------------------------+
         243 |         15 | 2018-01-23 03:38:40.00 |
         243 |         42 | 2018-01-23 05:39:20.00 |
         392 |         -3 | 2018-03-29 00:00:20.00 | 
		 488 |          3 | 2018-04-02 00:00:20.00 | 

Prop_Table:

Prop_ID | Prop_State | Prop_Status | Prop_Timestamp         |
--------+------------+-------------+------------------------+
    243 |          7 |           4 | 2018-01-23 04:16:20.00 |	
    243 |          7 |           6 | 2018-01-23 09:16:20.00 |	
    392 |          1 |           1 | 2018-03-29 06:38:30.00 |
    488 |          7 |           1 | 2018-04-02 00:00:20.00 |

JOINED TABLE

 Req_Prop_ID | Req_Status | Req_Timestamp          | Prop_State | Prop_Status |
-------------+------------+------------------------+------------+-------------+
         243 |         15 | 2018-01-23 03:38:40.00 |       NULL |        NULL |
         243 |         15 | 2018-01-23 04:16:20.00 |          7 |           4 |
         243 |         42 | 2018-01-23 05:39:20.00 |          7 |           4 |
         243 |         42 | 2018-01-23 09:16:20.00 |          7 |           6 |
         392 |         -3 | 2018-03-29 00:00:20.00 |       NULL |        NULL |
         392 |         -3 | 2018-03-29 06:38:30.00 |          1 |           1 |
         488 |          3 | 2018-04-02 00:00:20.00 |          7 |           1 |

The duplicated Prop_ID column has been removed in the above Joined Table, but could be present if need be.

How does one achieve this in Synapse? It's easy enough to do a Full outer join, however how do I grab the most recent data at a specific timestamp from the other table?
User's image

I'm assuming I'm going to have to do something with an Alter Row transformation but I am unsure on exactly what needs to be done? Is there an easy way of getting this outcome?

Thanks

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,924 questions
0 comments No comments
{count} votes

Accepted answer
  1. Matty Stubbs 190 Reputation points
    2023-11-03T05:32:39.83+00:00

    I figured it out, had to use the window function on each stream to create another timestamp column. Then could do a non-equality inner join based on the two colums.
    User's image

    Then create a window column called coalesce(lead(Timestamp_Req), toTimestamp('9999-12-31 23:59:59', 'yyyy-MM-dd HH:mm:ss'))
    User's image

    Which will give output that looks like this:
    User's image

    Then I created a derived column that took one second off the TimestampEnd column, TimestampEnd_P - seconds(1)
    Next, do the same process for the other stream.
    Lastly, do the inner join on the two streams. Keep in mind you will have to broadcast one of these - try to broadcast pick the stream you think will be smaller.
    User's image

    Then create a derived column called time_stamp or whatever that combines the two time series:
    iif(Timestamp_Req > Timestamp_P, Timestamp_Req, Timestamp_P)

    Lastly do a select to remove all the extra timestamp columns only keeping the derived col that was jsut created.

    0 comments No comments

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.