Azure Data Flow splitting dataset into multiple tables

Question

HansJKiamzon 56

Hi team. I'm using Azure Data Flow to read an Excel file with multiple tabs of data. One of the tabs contains three sets of data with the same schema, which need to be broken out into separate datasets. The data is formatted similar to this:

We can't define a range for each set of data because the number of rows changes. Because each set shares the same column headers, I'm thinking a test is needed to identify the header row and then write the succeeding rows to a dataset until the next header, or until the next row is null (after the last dataset).

Any suggestions for implementing this logic would be appreciated.

Ty.

Hans Kiamzon

Accepted answer

Answer 1

Kiran-MSFT 696 Microsoft Employee

You can do this with a Surrogate Key transformation (keyGenerate), followed by a branch with a Filter and a Join, and then a Conditional Split.

Branch the stream and filter it down to the header rows, keeping their row positions. Then use an Aggregate transformation with an empty groupBy and the collect function to gather all the header row indices into a single row.

Join that single row of header row indices with the full data as a cross join.

Then use the information in the joined data to split it into 3 streams.
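The steps above can be sketched outside of Data Flow to make the logic concrete. The following is a minimal Python illustration, not Data Flow script: the sample rows, the `HEADER` tuple, and the function name `split_by_header` are all hypothetical, and it assumes that blank rows and the header rows themselves should be dropped from the output.

```python
# Hypothetical sample: one sheet holding three tables that share a header row.
HEADER = ("id", "name")
rows = [
    ("id", "name"),
    ("1", "a"),
    ("2", "b"),
    ("id", "name"),
    ("3", "c"),
    ("id", "name"),
    ("4", "d"),
    ("5", "e"),
    (None, None),  # trailing blank row after the last table
]


def split_by_header(rows, header):
    # Step 1 (surrogate key): attach a 1-based row position to every row.
    keyed = [(pos, row) for pos, row in enumerate(rows, start=1)]

    # Step 2 (branch + filter + aggregate with collect): gather the
    # positions of all header rows into a single array.
    header_idx = [pos for pos, row in keyed if tuple(row) == tuple(header)]

    # Step 3 (cross join): pair every row with that same single array.
    joined = [(pos, row, header_idx) for pos, row in keyed]

    # Step 4 (split): route each data row to the stream of the most
    # recent header row that precedes it.
    streams = [[] for _ in header_idx]
    for pos, row, idx in joined:
        if pos in idx or all(value is None for value in row):
            continue  # assumption: skip header rows and blank rows
        headers_before = sum(1 for h in idx if h < pos)
        if headers_before == 0:
            continue  # assumption: ignore anything above the first header
        streams[headers_before - 1].append(row)
    return streams


tables = split_by_header(rows, HEADER)
for number, table in enumerate(tables, start=1):
    print(number, table)
```

Counting the header positions that precede a row is one way to pick the stream; comparing the row position against each array element in turn would work equally well when the number of tables is fixed at three.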

HansJKiamzon 56 Reputation points

2020-12-03T06:43:28.643+00:00

Hi Kiran. Thanks for the quick response. I will implement this logic tomorrow and will respond if there are additional questions. If not, I will accept the answer and up-vote.

thank you.

Hans
HansJKiamzon 56 Reputation points

2020-12-08T05:47:39.093+00:00

Hi again @Kiran-MSFT . I've started to implement your solution and have a clarifying question. When I aggregate the header rows and gather the indices using the collect function, I end up with an array. Do you suggest breaking up the array into separate columns using a derived column transformation, and then cross-joining this back with the main dataset?

Ty.

Hans
Kiran-MSFT 696 Reputation points Microsoft Employee

2020-12-08T05:49:44.863+00:00

You can split the array into individual parts by addressing its elements as arr[1], arr[2], and so on before the join, or you can join first and do it afterwards. The end result is the same.
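To illustrate why the order does not matter, here is a small Python sketch under assumed values: the array `header_idx`, the row positions, and the helpers `at` and `stream_for` are hypothetical. The helper `at` mimics the 1-based addressing used in `arr[1]`, since Python lists are 0-based.

```python
# Hypothetical collected array of header row positions (1-based surrogate keys).
header_idx = [1, 4, 6]
positions = list(range(1, 9))  # surrogate keys of the full data, rows 1..8


def at(arr, n):
    # Mimic 1-based addressing (arr[1] is the first element).
    return arr[n - 1]


def stream_for(pos, h1, h2, h3):
    # Split condition: header rows themselves map to no stream (0).
    if h1 < pos < h2:
        return 1
    if h2 < pos < h3:
        return 2
    if pos > h3:
        return 3
    return 0


# Option A: unpack the array into three columns first, then cross join.
h1, h2, h3 = at(header_idx, 1), at(header_idx, 2), at(header_idx, 3)
option_a = [(pos, stream_for(pos, h1, h2, h3)) for pos in positions]

# Option B: cross join the array as-is, then index it on each row.
joined = [(pos, header_idx) for pos in positions]
option_b = [
    (pos, stream_for(pos, at(arr, 1), at(arr, 2), at(arr, 3)))
    for pos, arr in joined
]

print(option_a == option_b)
```

Both options assign the same stream number to every row position; the only difference is whether the array is carried through the join or flattened into columns beforehand.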
HansJKiamzon 56 Reputation points

2020-12-08T05:52:57.757+00:00

Understood. Thanks @Kiran-MSFT . Will continue implementing and let you know if there are additional questions.

Hans
HansJKiamzon 56 Reputation points

2020-12-14T07:25:53.123+00:00

Hi @Kiran-MSFT . I've successfully implemented your solution. Thank you very much for your assistance. I've up-voted.

Have a nice day.

Hans
