Azure Data Flow splitting dataset into multiple tables

HansJKiamzon 56 Reputation points
2020-12-03T05:22:41.17+00:00

Hi team. I'm using Azure Data Flow to read an Excel file with multiple tabs of data. One of the tabs contains three sets of data with the same schema, which need to be broken out into separate datasets. The data is formatted similar to this:

[Image: 44645-multipledatasets.png]

We can't define fixed ranges for each set of data because the number of rows changes. Because each set shares the same column headers, I think a test is needed to identify the header row and then write the succeeding rows to a dataset until the next header, or until the next row is null (after the last dataset).

Any suggestions for implementing this logic would be appreciated.

Ty.

Hans Kiamzon

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. Kiran-MSFT 696 Reputation points Microsoft Employee
    2020-12-03T06:40:01.563+00:00

    You can do this with a keyGenerate (Surrogate Key) transformation, followed by a branch with a filter and a join, and finally a Split.

    First, branch the stream and filter it down to the header rows, keeping their row positions. Then use an Aggregate transformation with an empty groupBy and the collect function to gather all the header row indices into a single row.

    Join that single row of header indices back to the full data with a cross join, so that every data row carries the complete list of header positions.

    Then compare each row's position with the header indices in the joined data to split the rows into three streams.
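To make the logic of the steps above easier to follow, here is a minimal sketch in plain Python rather than in Data Flow itself. It only mirrors the idea (number the rows, collect the header positions, then assign each row to the nearest preceding header). The function name `split_on_headers`, the `Name`/`Qty` columns, and the sample rows are hypothetical and are not taken from the original workbook.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[Optional[str], ...]


def split_on_headers(rows: Sequence[Row], header: Row) -> List[List[Row]]:
    """Split rows into one table per repeated header row (illustrative only)."""
    # Step 1: attach a 1-based row position, like a surrogate key.
    keyed = list(enumerate(rows, start=1))

    # Step 2: keep only header rows and collect their positions into one list,
    # like an aggregate with an empty groupBy that collects the indices.
    header_positions = [key for key, row in keyed if row == header]

    # Steps 3 and 4: every row "sees" the full list of header positions
    # (the cross join) and is routed to the last header that precedes it.
    tables: List[List[Row]] = [[] for _ in header_positions]
    for key, row in keyed:
        is_blank = all(cell is None or cell == "" for cell in row)
        if row == header or is_blank:
            continue  # headers and blank separator rows are not data
        preceding = [i for i, pos in enumerate(header_positions) if pos < key]
        if preceding:  # rows before the first header are ignored
            tables[preceding[-1]].append(row)
    return tables


# Hypothetical sample: three stacked datasets sharing one header.
HEADER: Row = ("Name", "Qty")
rows: List[Row] = [
    HEADER, ("a", "1"), ("b", "2"), (None, None),
    HEADER, ("c", "3"), (None, None),
    HEADER, ("d", "4"), ("e", "5"),
]
tables = split_on_headers(rows, HEADER)
for number, table in enumerate(tables, start=1):
    print(number, table)
```

In a data flow, the final routing would be expressed as split conditions on the row position and the collected header indices instead of a Python loop; the sketch only shows which rows should end up in which stream.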

