How do I split incoming data into separate files per row, based on column?

Scott Klein 161 Reputation points
2023-12-04T18:39:52.0466667+00:00

First, I want to thank everyone who has answered my ADF questions so far. I have been diving into ADF deeper than I expected to go, so thank you to all who have provided some guidance.

Second, I need to back the bus up and provide a bit more detail of the problem I am trying to solve. At a high level, I am pulling in data from SQL Server to Azure blob storage. For the sake of this example, we'll say that the SQL table has 5 columns. Column 1 is an ID, and columns 4 and 5 are varchar(max) columns in which XML is stored.

User's image

For example, let's say we are pulling in 10 rows (ID 1 through 10). What we want to do is split the incoming data, by column, into separate files, such that columns 1 through 3 go into 1 file. Then, for each row, column 4 will go into a separate file, and column 5 will go into a separate file. Thus, when all is pulled in, there will be 1 file with columns 1-3, then 10 different files for column 4, and 10 different files for column 5.

We'll dynamically add a column or two to file 1 which points to the appropriate column 4 and column 5 files, something like this:

User's image

Thus, when the ADF pipeline and data flow complete, I would have a total of 21 files: 1 main file and 20 XML files. Hopefully this makes sense.
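To make the desired output concrete, here is a minimal Python sketch of the same split done outside ADF. The table shape, filenames (`main.csv`, `col4_<id>.xml`, `col5_<id>.xml`), and pointer-column names are assumptions for illustration only, not anything ADF produces by default:

```python
import csv
import os

def split_rows(rows, out_dir):
    """Split rows of (id, col2, col3, xml4, xml5) into one main CSV
    plus one XML file per row for each of columns 4 and 5."""
    os.makedirs(out_dir, exist_ok=True)
    main_path = os.path.join(out_dir, "main.csv")
    with open(main_path, "w", newline="") as f:
        writer = csv.writer(f)
        # The last two columns point at the per-row XML files.
        writer.writerow(["id", "col2", "col3", "col4_file", "col5_file"])
        for row_id, col2, col3, xml4, xml5 in rows:
            # Derive per-row filenames from the ID column.
            f4 = f"col4_{row_id}.xml"
            f5 = f"col5_{row_id}.xml"
            with open(os.path.join(out_dir, f4), "w") as x:
                x.write(xml4)
            with open(os.path.join(out_dir, f5), "w") as x:
                x.write(xml5)
            writer.writerow([row_id, col2, col3, f4, f5])
    return main_path
```

For 10 input rows this yields 21 files: `main.csv` plus `col4_1.xml` through `col5_10.xml`, with `main.csv` carrying the pointers back to each row's XML files.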

I have started down this path and have it working somewhat, using a data flow and branching. I get 1 main file, 1 file for all the XML in column 4 (all 10 rows), and 1 file for all the XML in column 5 (all 10 rows). Using a derived column, I can dynamically name the 2nd and 3rd files.

So, ultimately, I have two questions. First, what I do NOT know how to do is what I mentioned above: how do I get a file per row for columns 4 and 5? Second, I'd like to pass the value of the ID column through so that I can create the filenames pointing back to the specific row, as shown in the image above. Currently my data flow looks like this, so hopefully I'm fairly close.

User's image

Again, thank you for the insight and help!

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator
    2023-12-05T07:29:27.4833333+00:00

    Hi Scott Klein ,

    Thank you for using the Microsoft Q&A platform and for posting your query here.

    As per my understanding, you are trying to split the data present in the source into multiple files based on rows as well as columns. So, the first 3 columns per row should go into file 1 to file 10, the fourth column should be split per row into files 11 to 20, and the fifth column should be split into files 21 to 30. Please correct me if my understanding has any gap.

    • For splitting column-wise, you can use the 'Select' transformation, as you have already done. For splitting row-wise, you can use the 'Conditional split' transformation, where you specify a condition per row (e.g., substring of Id equals 1, substring of Id equals 2, and so on).
    • In order to dynamically provide the sink file name from column data, you can go to the sink settings and set 'File name option' to 'Name file as column data'. User's image
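    The sink behavior the second bullet describes can be sketched in plain Python to show what 'Name file as column data' does conceptually: each row lands in its own file, named from one column and containing another. The function name and the dictionary-based row shape here are illustrative assumptions, not the ADF implementation:

```python
import os

def name_file_as_column_data(rows, name_col, data_col, out_dir):
    """Conceptual mimic of the data flow sink option 'Name file as
    column data': one output file per row, named from name_col and
    containing the value of data_col."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for row in rows:
        path = os.path.join(out_dir, row[name_col])
        with open(path, "w") as f:
            f.write(row[data_col])
        written.append(path)
    return written
```

    In the data flow, a derived column that builds the filename from the ID (e.g., concatenating the ID with `.xml`) plays the role of `name_col` here, which also answers the second half of the question: the same ID-derived filename can be written into the main file as the pointer column.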

    Hope it helps. Kindly accept the answer by clicking on the Accept answer button. Thank you.


  2. MarkKromer-MSFT 5,226 Reputation points Microsoft Employee Moderator
    2023-12-06T16:35:20.2733333+00:00

    Hi Scott! By using "Name file as column data" you should be able to make a new file for every value for every row. But your question seems to indicate that you are not getting a file for every row. Is that correct?

