Edit

Share via


Window functions in mapping data flows

APPLIES TO: Azure Data Factory Azure Synapse Analytics

Tip

Try out Data Factory in Microsoft Fabric, an all-in-one analytics solution for enterprises. Microsoft Fabric covers everything from data movement to data science, real-time analytics, business intelligence, and reporting. Learn how to start a new trial for free!

Data flows are available in both Azure Data Factory pipelines and Azure Synapse Analytics pipelines. This article applies to mapping data flows. If you're new to transformations, refer to the introductory article Transform data using mapping data flows.

This article provides details about window functions supported by Azure Data Factory and Azure Synapse Analytics in mapping data flows.

Window function list

The following functions are available only in window transformations.

Window function Task
cumeDist Computes the position of a value relative to all values in the partition. The result is the number of rows preceding or equal to the current row in the ordering of the partition divided by the total number of rows in the window partition. Any tie values in the ordering evaluate to the same position.
denseRank Computes the rank of a value in a group of values specified in a window's order by clause. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values don't produce gaps in the sequence. The denseRank function works even when data isn't sorted and looks for change in values.
lag Gets the value of the first parameter evaluated n rows before the current row. The second parameter is the number of rows to look back, and the default value is 1. If there aren't as many rows, a value of null is returned unless a default value is specified.
lead Gets the value of the first parameter evaluated n rows after the current row. The second parameter is the number of rows to look forward, and the default value is 1. If there aren't as many rows, a value of null is returned unless a default value is specified.
nTile Divides the rows for each window partition into n buckets ranging from 1 to at most n. Bucket values differ by at most 1. If the number of rows in the partition doesn't divide evenly into the number of buckets, the remainder values are distributed one per bucket, starting with the first bucket. The NTile function is useful for the calculation of tertiles, quartiles, deciles, and other common summary statistics.

The function calculates two variables during initialization. The size of a regular bucket has one extra row added to it. Both variables are based on the size of the current partition. During the calculation process, the function keeps track of the current row number, the current bucket number, and the row number at which the bucket changes (bucketThreshold). When the current row number reaches bucket threshold, the bucket value increases by one. The threshold increases by the bucket size (plus one extra if the current bucket is padded).
rank Computes the rank of a value in a group of values specified in a window's order by clause. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values produce gaps in the sequence. The rank function works even when data isn't sorted and looks for change in values.
rowNumber Assigns a sequential row numbering for rows in a window starting with 1.