How to implement lead /lag, rank() over partition using Data Factory

Question

How to implement lead /lag, rank() over partition using Data Factory

sam nick 346

Hi,

I have the below table that I wish to achieve.testfile.txt

The raw data is only JobID, Status, Substatus and CreateDate.

I need to create the other 3 columns. I am able to achieve this via a sql code, using the lag function for previous status and previous substatus.

lag(status) over (partition by JobiD order by createDate)

I use the combination of a Merge into (like an update command) and the rank() over partition by for the Status in Month(this is to mark the final row in that month.

		MERGE INTO FinalJobtable T

   USING ( select jobID, Rank () over (partition by jobid, year(createdate) , month(createdate) order by createdate desc,) FlagNum
from FinalJobtable 

	) S 

		ON T.jobID= S.jobID

		AND S.FlagNum= 1

WHEN MATCHED THEN Update SET t.StatusInMonth  = '1';
```![User's image](/api/attachments/2fa1db72-730f-4526-9e4c-2f76ed1a653e?platform=QnA)


Please can you advise on how to achieve this via a dataflow. Currently I have a stored proc that gets called via a pipeline to perform the above functions, but it takes time and i want to utilize ADF's capability to achieve this. I cant seem to find those sql functions in ADF. 

Thank you.

NIKHILA NETHIKUNTA 4,600 Reputation points Microsoft External Staff

2024-09-16T10:50:51.6033333+00:00

@sam nick
Just checking in to see if the below answer helped. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

1 answer

Your answer

NIKHILA NETHIKUNTA 4,600 Reputation points Microsoft External Staff

2024-09-16T10:50:51.6033333+00:00

@sam nick
Just checking in to see if the below answer helped. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

Answer 1

LAG and LEAD (Window Functions in ADF Data Flow):

In ADF Data Flow, you can use the Window Transformation to replicate LAG() and LEAD() functionality.

Set up the window transformation to partition by JobID and order by CreateDate.

Steps:

Add a Window transformation in the data flow.
In the partition, specify JobID.
In the sorting condition, specify CreateDate.
Create two new columns using window functions:
- For LAG (Previous status): Use first() with an offset of 1 (this replicates LAG()).
- For LEAD (if needed): Use last() with an offset of 1 to replicate LEAD().


   PreviousStatus = first(Status, 1)

   PreviousSubStatus = first(SubStatus, 1)

RANK() Over Partition in ADF:

The Window Transformation can also handle ranking by partitioning by JobID and ordering by CreateDate.

Steps:

Add another Window transformation.
1. Partition by JobID.
2. Order by CreateDate.
3. Use the rank() function in ADF Data Flow to rank based on this partition.
```
   Rank = rank()
```

Update for Status in Month:

To replicate the final step in your SQL (MERGE INTO FinalJobTable), you can use the Join transformation.
- You can apply the condition to filter only the rows with the rank 1 (i.e., the final row in the month).
- After applying the filter, use an Alter Row transformation to mark the row for an update.
Steps:

Add a Filter transformation after ranking.
1. Filter where Rank = 1.
2. Use an Alter Row transformation to update the StatusInMonth for these rows.

NIKHILA NETHIKUNTA 4,600 Reputation points Microsoft External Staff

2024-09-17T15:43:13.3733333+00:00

@Sam Nickam
Just checking in to see if the above answer helped. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

Share via

How to implement lead /lag, rank() over partition using Data Factory

1 answer

Your answer