Add column to CSV File from another CSV File (Azure Data Factory)

Mvdit0 1 Reputation point
2022-04-05T01:08:49.293+00:00

For example:

Persons.csv

name, last_name
-----------------------
jack, jack_lastName
luc, luc_lastname

FileExample.csv

id
243
123

Result:

name, last_name, exampleId
-------------------------------
jack, jack_lastName, 243
luc, luc_lastname, 123

I want to add any number of columns from another data source, row by row, and then write the final result to a file or a database table.

I have tried many approaches, but none of them worked.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

3 answers

  1. MarkKromer-MSFT 5,231 Reputation points Microsoft Employee Moderator
    2022-04-05T05:00:33.497+00:00

    What is your join condition?


  2. MarkKromer-MSFT 5,231 Reputation points Microsoft Employee Moderator
    2022-04-05T05:20:36.8+00:00

    Here is one way to solve it:

    1. Create a new data flow
    2. Add 2 sources: 1 for Persons.csv and 1 for FileExample.csv
    3. Add a Surrogate Key transformation after each source, and name the keys sk1 and sk2 respectively
    4. Add a Join transformation and join on sk1 == sk2
    5. After the Join, add a Select transformation and remove the sk1 and sk2 columns
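The logic of the steps above can be sketched outside of Data Factory. The following is a minimal Python illustration of the same idea, not the data flow itself: each input gets a row number (standing in for the surrogate keys sk1 and sk2), the rows are joined where the numbers match, and the keys are dropped afterwards. The helper names `add_row_numbers` and `join_by_position` are hypothetical, the sample data is inlined without the spaces after the commas, and the rename of `id` to `exampleId` is assumed from the Result shown in the question.

```python
import csv
import io


def add_row_numbers(rows, key_name):
    """Stand-in for a surrogate key: number the rows 1, 2, 3, ... in read order."""
    return [{key_name: i, **row} for i, row in enumerate(rows, start=1)]


def join_by_position(left_rows, right_rows):
    """Inner join on sk1 == sk2, then drop both key columns."""
    left = add_row_numbers(left_rows, "sk1")
    right_by_key = {r["sk2"]: r for r in add_row_numbers(right_rows, "sk2")}
    joined = []
    for left_row in left:
        right_row = right_by_key.get(left_row["sk1"])
        if right_row is None:
            continue  # no matching row number on the right side
        merged = {**left_row, **right_row}
        del merged["sk1"], merged["sk2"]  # equivalent of the final Select step
        joined.append(merged)
    return joined


# Sample data taken from the question (inlined here instead of reading files).
persons_csv = "name,last_name\njack,jack_lastName\nluc,luc_lastname\n"
ids_csv = "id\n243\n123\n"

persons = list(csv.DictReader(io.StringIO(persons_csv)))
# Rename "id" to "exampleId" to match the desired result (an assumption).
ids = [{"exampleId": row["id"]} for row in csv.DictReader(io.StringIO(ids_csv))]

result = join_by_position(persons, ids)
for row in result:
    print(row)
```

Because the match is purely positional, this approach is only meaningful when both files list their rows in the same order and have the same number of rows; extra rows on either side are dropped by the inner join.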

  3. MarkKromer-MSFT 5,231 Reputation points Microsoft Employee Moderator
    2022-04-06T06:47:35.28+00:00

    Your pattern will look something like this:

    Two Delimited Text sources are joined on the surrogate keys and then written to the SQLSink. A third source is the same SQL table that the sink writes to. Notice that I've set the sink ordering so that the data is written first (SQLSink) and the auto-incremented IDs are read back only after the table write has been committed. The query I'm using in ReadFromSQL just reads the data from that table so that I can write the IDs to my OuputIDs CSV file.

    190462-image.png
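The write-then-read-back ordering described above can be illustrated with a small Python sketch. It is not the Data Factory pattern itself: an in-memory SQLite database stands in for the SQL sink, and the table name `persons`, its columns, and the output layout are assumptions made for the example. The point it shows is the ordering: the insert is committed first, and only then are the generated IDs queried and written out as CSV.

```python
import csv
import io
import sqlite3

# In-memory SQLite stands in for the SQL table behind the sink (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persons ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, last_name TEXT)"
)

# Step 1: write the joined rows and commit, so the IDs exist before reading.
rows = [("jack", "jack_lastName"), ("luc", "luc_lastname")]
conn.executemany("INSERT INTO persons (name, last_name) VALUES (?, ?)", rows)
conn.commit()

# Step 2: read the auto-incremented IDs back from the same table.
generated_ids = [r[0] for r in conn.execute("SELECT id FROM persons ORDER BY id")]

# Step 3: write the IDs to a CSV output (an in-memory buffer here).
output_csv = io.StringIO()
writer = csv.writer(output_csv)
writer.writerow(["id"])
writer.writerows([i] for i in generated_ids)

print(output_csv.getvalue())
conn.close()
```

If the read ran before the commit on a separate connection, it would not see the new rows, which is why the sink ordering matters in the data flow.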

