Aggregate functions in mapping data flow

APPLIES TO: Azure Data Factory Azure Synapse Analytics

Tip

Try out Data Factory in Microsoft Fabric, an all-in-one analytics solution for enterprises. Microsoft Fabric covers everything from data movement to data science, real-time analytics, business intelligence, and reporting. Learn how to start a new trial for free!

Data flows are available both in Azure Data Factory and Azure Synapse Pipelines. This article applies to mapping data flows. If you are new to transformations, please refer to the introductory article Transform data using a mapping data flow.

The following articles provide details about aggregate functions supported by Azure Data Factory and Azure Synapse Analytics in mapping data flows.

Aggregate function list

The following functions are only available in aggregate, pivot, unpivot, and window transformations.

Aggregate function Task
approxDistinctCount Gets the approximate aggregate count of distinct values for a column. The optional second parameter is to control the estimation error.
avg Gets the average of values of a column.
avgIf Based on a criteria gets the average of values of a column.
collect Collects all values of the expression in the aggregated group into an array. Structures can be collected and transformed to alternate structures during this process. The number of items will be equal to the number of rows in that group and can contain null values. The number of collected items should be small.
collectUnique Collects all values of the expression in the aggregated group into a unique array. Structures can be collected and transformed to alternate structures during this process.The number of items will be smaller or equal to the number of rows in that group and can contain null values. The number of collected items should be small.
count Gets the aggregate count of values. If the optional column(s) is specified, it ignores NULL values in the count.
countAll Gets the aggregate count of values including NULLs.
countDistinct Gets the aggregate count of distinct values of a set of columns.
countAllDistinct Gets the aggregate count of distinct values of a set of columns including NULLs.
countIf Based on a criteria gets the aggregate count of values. If the optional column is specified, it ignores NULL values in the count.
covariancePopulation Gets the population covariance between two columns.
covariancePopulationIf Based on a criteria, gets the population covariance of two columns.
covarianceSample Gets the sample covariance of two columns.
covarianceSampleIf Based on a criteria, gets the sample covariance of two columns.
first Gets the first value of a column group. If the second parameter ignoreNulls is omitted, it is assumed false.
isDistinct Finds if a column or set of columns is distinct. It does not count null as a distinct value
kurtosis Gets the kurtosis of a column.
kurtosisIf Based on a criteria, gets the kurtosis of a column.
last Gets the last value of a column group. If the second parameter ignoreNulls is omitted, it is assumed false.
max Gets the maximum value of a column.
maxIf Based on a criteria, gets the maximum value of a column.
mean Gets the mean of values of a column. Same as AVG.
meanIf Based on a criteria gets the mean of values of a column. Same as avgIf.
min Gets the minimum value of a column.
minIf Based on a criteria, gets the minimum value of a column.
skewness Gets the skewness of a column.
skewnessIf Based on a criteria, gets the skewness of a column.
stddev Gets the standard deviation of a column.
stddevIf Based on a criteria, gets the standard deviation of a column.
stddevPopulation Gets the population standard deviation of a column.
stddevPopulationIf Based on a criteria, gets the population standard deviation of a column.
stddevSample Gets the sample standard deviation of a column.
stddevSampleIf Based on a criteria, gets the sample standard deviation of a column.
sum Gets the aggregate sum of a numeric column.
sumDistinct Gets the aggregate sum of distinct values of a numeric column.
sumDistinctIf Based on criteria gets the aggregate sum of a numeric column. The condition can be based on any column.
sumIf Based on criteria gets the aggregate sum of a numeric column. The condition can be based on any column.
topN Gets the top N values for this column.
variance Gets the variance of a column.
varianceIf Based on a criteria, gets the variance of a column.
variancePopulation Gets the population variance of a column.
variancePopulationIf Based on a criteria, gets the population variance of a column.
varianceSample Gets the unbiased variance of a column.
varianceSampleIf Based on a criteria, gets the unbiased variance of a column.