How to implement lead /lag, rank() over partition using Data Factory

sam nick 346 Reputation points
2024-09-13T14:22:10.3+00:00

Hi,

I have the below table that I wish to achieve.testfile.txt

The raw data is only JobID, Status, Substatus and CreateDate.

I need to create the other 3 columns. I am able to achieve this via a sql code, using the lag function for previous status and previous substatus.

lag(status) over (partition by JobiD order by createDate)

I use the combination of a Merge into (like an update command) and the rank() over partition by for the Status in Month(this is to mark the final row in that month.

		MERGE INTO FinalJobtable T

   USING ( select jobID, Rank () over (partition by jobid, year(createdate) , month(createdate) order by createdate desc,) FlagNum
from FinalJobtable 

	) S 

		ON T.jobID= S.jobID

		AND S.FlagNum= 1

WHEN MATCHED THEN Update SET t.StatusInMonth  = '1';
```![User's image](/api/attachments/2fa1db72-730f-4526-9e4c-2f76ed1a653e?platform=QnA)


Please can you advise on how to achieve this via a dataflow. Currently I have a stored proc that gets called via a pipeline to perform the above functions, but it takes time and i want to utilize ADF's capability to achieve this. I cant seem to find those sql functions in ADF. 

Thank you. 

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,639 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Amira Bedhiafi 33,631 Reputation points Volunteer Moderator
    2024-09-14T10:55:39.8166667+00:00

    LAG and LEAD (Window Functions in ADF Data Flow):

    • In ADF Data Flow, you can use the Window Transformation to replicate LAG() and LEAD() functionality.

    Set up the window transformation to partition by JobID and order by CreateDate.

    Steps:

    1. Add a Window transformation in the data flow.
    2. In the partition, specify JobID.
    3. In the sorting condition, specify CreateDate.
    4. Create two new columns using window functions:
      • For LAG (Previous status): Use first() with an offset of 1 (this replicates LAG()).
      • For LEAD (if needed): Use last() with an offset of 1 to replicate LEAD().
    
       PreviousStatus = first(Status, 1)
    
       PreviousSubStatus = first(SubStatus, 1)
    
    

    RANK() Over Partition in ADF:

    • The Window Transformation can also handle ranking by partitioning by JobID and ordering by CreateDate.

    Steps:

    1. Add another Window transformation.
      1. Partition by JobID.
      2. Order by CreateDate.
      3. Use the rank() function in ADF Data Flow to rank based on this partition.
      
         Rank = rank()
      
      

    Update for Status in Month:

    • To replicate the final step in your SQL (MERGE INTO FinalJobTable), you can use the Join transformation.
      • You can apply the condition to filter only the rows with the rank 1 (i.e., the final row in the month).
      • After applying the filter, use an Alter Row transformation to mark the row for an update.
      Steps:
    1. Add a Filter transformation after ranking.
      1. Filter where Rank = 1.
      2. Use an Alter Row transformation to update the StatusInMonth for these rows.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.