How to build a generic Aggregation data flow in ADF?

Poel van der, RE (Ron) 451 Reputation points
2020-11-04T14:36:59.26+00:00

Hi

I wanted to try to create a generic aggregation pipeline which can be used for multiple scenarios.
Always more than 1 column to aggregate on, always 1 or more measures to be aggregated.
See attachment

37418-knipsel.jpg

Is there a way to get this working?

regards
Ron

PS Maybe @DanielP-MSFT or @HarithaMaddi-MSFT have a suggestion :)

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,623 questions
{count} votes

Accepted answer
  1. HarithaMaddi-MSFT 10,146 Reputation points
    2020-11-05T13:01:13.677+00:00

    Hi @Poel van der, RE (Ron) ,

    Welcome to Microsoft Q&A Platform. Thanks for posting the query.

    Below is one approach I used to implement it. Please use below GIF and attached JSONs for the same.

    37761-dynamicaggregationadf.gif
    37719-dataflowjson.txt
    37720-pipelinejson.txt

    Dynamic Approach:

    Also, as the above approach needs knowing the number of columns ahead that are used for aggregations, I added few more transformations(map drifted and select) in addition to suggestions from @DanielP-MSFT and below is the solution for dynamic number of columns. Below is the GIF and attached the code.

    38151-dynamicaggregationcolumnsadf.gif
    38152-dynamicdataflowcodejson.txt

    Hope this helps! Please suggest if the requirement is different or for further queries and we will be glad to assist.

    --------------------------

    • Please accept an answer if correct. Original posters help the community find answers faster by identifying the correct answer. Here is how.
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification.

1 additional answer

Sort by: Most helpful
  1. DanielP-MSFT 171 Reputation points
    2020-11-05T16:28:18.223+00:00

    Hey @Poel van der, RE (Ron) ,

    What it looks like you are missing is wrapping the byNames() function with array() as byNames() returns an any type.

    In the aggregation groupBy, you will select groupByColumns as your grouping condition (this will create a single row for each unique array value).

    In the aggregation columns, you will need two column patterns.

    1. Match by in($columns, name) and output first($$) with column name $$
    2. Match by in($measures, name) and output count() with column name $$

    Can you try this out and see if it works?

    Thanks,
    Daniel

    Side note: you can also create a computed column to use as your aggregation group by condition and avoid the derive column.

    1 person found this answer helpful.

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.