How do I split incoming data into separate files per row, based on column?

Scott Klein 161 Reputation points
2023-12-04T18:39:52.0466667+00:00

First, I want to thank everyone who has answered my ADF questions so far. I have been diving into ADF deeper than I expected to go, so thank you to all who have provided some guidance.

Second, I need to back the bus up and provide a bit more detail of the problem I am trying to solve. At a high level, I am pulling in data from SQL Server to Azure blob storage. For the sake of this example, we'll say that the SQL table has 5 columns. Column 1 is an ID, and columns 4 and 5 are varchar(max) columns in which XML is stored.

User's image

For example, let's say we are pulling in 10 rows (ID 1 through 10). What we want to do is split the incoming data, by column, into separate files, such that columns 1 through 3 go into 1 file. Then, for each row, column 4 will go into a separate file, and column 5 will go into a separate file. Thus, when all is pulled in, there will be 1 file with columns 1-3, then 10 different files for column 4, and 10 different files for column 5.

We'll dynamically add a column or two to file 1 which points to the appropriate column 4 and column 5 files, something like this:

User's image

Thus, when the ADF pipeline and data flow complete, I would have a total of 21 files: 1 main file and 20 XML files. Hopefully this makes sense.
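To make the desired output concrete, here is a minimal Python sketch of the same split done outside ADF. The table shape, filenames (`main.csv`, `col4_<id>.xml`, `col5_<id>.xml`), and pointer-column names are assumptions for illustration only, not anything ADF produces by default:

```python
import csv
import os

def split_rows(rows, out_dir):
    """Split rows of (id, col2, col3, xml4, xml5) into one main CSV
    plus one XML file per row for each of columns 4 and 5."""
    os.makedirs(out_dir, exist_ok=True)
    main_path = os.path.join(out_dir, "main.csv")
    with open(main_path, "w", newline="") as f:
        writer = csv.writer(f)
        # The last two columns point at the per-row XML files.
        writer.writerow(["id", "col2", "col3", "col4_file", "col5_file"])
        for row_id, col2, col3, xml4, xml5 in rows:
            # Derive per-row filenames from the ID column.
            f4 = f"col4_{row_id}.xml"
            f5 = f"col5_{row_id}.xml"
            with open(os.path.join(out_dir, f4), "w") as x:
                x.write(xml4)
            with open(os.path.join(out_dir, f5), "w") as x:
                x.write(xml5)
            writer.writerow([row_id, col2, col3, f4, f5])
    return main_path
```

For 10 input rows this yields 21 files: `main.csv` plus `col4_1.xml` through `col5_10.xml`, with `main.csv` carrying the pointers back to each row's XML files.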

I have started down this path and have it working somewhat, using a data flow and branching. I get 1 main file, 1 file for all the XML in column 4 (all 10 rows), and 1 file for all the XML in column 5 (all 10 rows). Using a derived column, I can dynamically name the 2nd and 3rd files.

So, ultimately, I have two questions. First, what I do NOT know how to do is what I mentioned above: how do I get a file per row for columns 4 and 5? Second, I'd like to pass the value of the ID column through so that I can create the filenames pointing back to the specific row, as shown in the image above. Currently my data flow looks like this, so hopefully I'm fairly close.

User's image

Again, thank you for the insight and help!

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator
    2023-12-05T07:29:27.4833333+00:00

    Hi Scott Klein ,

    Thank you for using the Microsoft Q&A platform and for posting your query here.

    As per my understanding, you are trying to split the data present in the source into multiple files based on rows as well as columns. So, the first 3 columns per row should go into file 1 to file 10, the fourth column should be split per row into files 11 to 20, and the fifth column should be split into files 21 to 30. Please correct me if my understanding has any gap.

    • For splitting column-wise, you can use the 'Select' transformation, as you have already done. For splitting row-wise, you can use the 'Conditional split' transformation, where you specify a condition per row (e.g., substring of Id equals 1, substring of Id equals 2, and so on).
    • In order to dynamically provide the sink file name from column data, you can go to the sink settings and set 'File name option' to 'Name file as column data'. User's image
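    The sink behavior the second bullet describes can be sketched in plain Python to show what 'Name file as column data' does conceptually: each row lands in its own file, named from one column and containing another. The function name and the dictionary-based row shape here are illustrative assumptions, not the ADF implementation:

```python
import os

def name_file_as_column_data(rows, name_col, data_col, out_dir):
    """Conceptual mimic of the data flow sink option 'Name file as
    column data': one output file per row, named from name_col and
    containing the value of data_col."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for row in rows:
        path = os.path.join(out_dir, row[name_col])
        with open(path, "w") as f:
            f.write(row[data_col])
        written.append(path)
    return written
```

    In the data flow, a derived column that builds the filename from the ID (e.g., concatenating the ID with `.xml`) plays the role of `name_col` here, which also answers the second half of the question: the same ID-derived filename can be written into the main file as the pointer column.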

    Hope it helps. Kindly accept the answer by clicking on the Accept answer button. Thank you.


  2. MarkKromer-MSFT 5,226 Reputation points Microsoft Employee Moderator
    2023-12-06T16:35:20.2733333+00:00

    Hi Scott! By using "Name file as column data" you should be able to make a new file for every value for every row. But your question seems to indicate that you are not getting a file for every row. Is that correct?

