Combine multiple rows into single row in dataflow in Azure Data Factory

Alekya Seemakurty, Sri 86 Reputation points
2022-07-14T22:38:00.037+00:00

Hello All,

I have a source XML where the data is flatten based on lowest level. Post flatenning, my data is looking like this

Key Row1 Row2 Row3 Row4 Row5
101 A
101 NULL B
101 NULL NULL C
101 NULL NULL NULL D
101 NULL NULL NULL NULL E
102 A NULL NULL NULL NULL
102 NULL B NULL NULL NULL
102 NULL NULL NULL D NULL
102 NULL NULL NULL NULL E
103 A NULL NULL NULL NULL
103 NULL B NULL NULL NULL
103 NULL NULL C NULL NULL
103 NULL NULL NULL D NULL
104 NULL B NULL NULL NULL
104 NULL NULL C NULL NULL
104 NULL NULL NULL D NULL

Now, I want the output to be in a single row for each key field.

Key Row1 Row2 Row3 Row4 Row5
101 A B C D E
102 A B D E
103 A B C D
104 B C D

Is there any other method to achieve this with any transformation apart from a regular JOIN transformation?

Azure Monitor
Azure Monitor
An Azure service that is used to collect, analyze, and act on telemetry data from Azure and on-premises environments.
Azure Storage Explorer
Azure Storage Explorer
An Azure tool that is used to manage cloud storage resources on Windows, macOS, and Linux.
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
{count} votes

Answer accepted by question author
  1. AnnuKumari-MSFT 34,566 Reputation points Microsoft Employee Moderator
    2022-07-15T06:50:24.057+00:00

    Hi @Alekya Seemakurty, Sri ,

    Thankyou for using Microsoft Q&A platform and thanks for posting your query.

    As I understand your query, you want to combine multiple columns into single column based on aggregation on Key column here. Please let me know if my understanding is incorrect.

    You can try the following approach to achieve the above requirement:

    1. Create a dataset pointing to the source file . I have stored the above provided source data in .csv file . Make sure to select Edit option in Column delimiter and provide Space ' ' in the textbox as your column is not separated by comma delimiter. Also, check first row as header.

    220960-image.png

    Data looks like this :
    220980-image.png

    2. Use Aggregate transformation and group by using 'Keys' column and use collect(Row1) in the expression of aggregate tab.

    220984-image.png

    3. Use Derived column transformation to create Columns named - 'Row1','Row2','Row3','Row4','Row5'

    221052-image.png

    221003-image.png

    4. Use Select Transformation to select only necessary columns

    220949-image.png

    221053-image.png

    5. Use Sink transformation to load the data in csv. Check the first row as header option.
    221004-image.png

    Hope this will help. Please let us know if any further queries.

    ------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you.
      Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
    2 people found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.