Window transformation in mapping data flow

APPLIES TO: Azure Data Factory Azure Synapse Analytics

Tip

Try out Data Factory in Microsoft Fabric, an all-in-one analytics solution for enterprises. Microsoft Fabric covers everything from data movement to data science, real-time analytics, business intelligence, and reporting. Learn how to start a new trial for free!

Data flows are available both in Azure Data Factory and Azure Synapse Pipelines. This article applies to mapping data flows. If you are new to transformations, please refer to the introductory article Transform data using a mapping data flow.

The Window transformation is where you define window-based aggregations of columns in your data streams. In the Expression Builder, you can define different types of aggregations that are based on data or time windows (SQL OVER clause) such as LEAD, LAG, NTILE, CUMEDIST, and RANK. A new field is generated in your output that includes these aggregations. You can also include optional group-by fields.

Screenshot shows Windowing selected from the menu.

Over

Set the partitioning of column data for your window transformation. The SQL equivalent is the Partition By in the Over clause in SQL. If you wish to create a calculation or create an expression to use for the partitioning, you can do that by hovering over the column name and selecting Computed column.

Screenshot shows Windowing Settings with the Over tab selected.

Sort

Another part of the Over clause is setting the Order By. This clause sets the data sort ordering. You can also create an expression for a calculate value in this column field for sorting.

Screenshot shows Windowing Settings with the Sort tab selected.

Range By

Next, set the window frame as Unbounded or Bounded. To set an unbounded window frame, set the slider to Unbounded on both ends. If you choose a setting between Unbounded and Current Row, then you must set the Offset start and end values. Both values are positive integers. You can use either relative numbers or values from your data.

The window slider has two values to set: the values before the current row and the values after the current row. The offset between start and end matches the two selectors on the slider.

Screenshot shows Windowing Settings with the Range by tab selected.

Window columns

Lastly, use the Expression Builder to define the aggregations you wish to use with the data windows such as RANK, COUNT, MIN, MAX, DENSE RANK, LEAD, LAG, etc.

The full list of aggregation and analytical functions available for you to use in the Data Flow Expression Language via the Expression Builder are listed in Data transformation expressions in mapping data flow.

If you're looking for a simple group-by aggregation, use the Aggregate transformation