Excel file with multiple row of column headers and blank columns

Jzd 1 Reputation point
2021-07-15T06:49:00.887+00:00

Hi,

I have an Excel file as source in Data Factory which has multiple rows for column headers and blank columns as separators. May I know what would be the best approach to transform this dataset into a single dataset?

114889-image.png

My idea is to make it like:

Header 1 Subhead 1
Header 1 Subhead 2
Header 2 Subhead 3
Header 2 Subhead 4

But I don't know how to realize it :(

Also I have blank columns which I delete manually in Select function. Is there any way I can select columns in the Dataset options?

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,196 questions
{count} votes

1 answer

Sort by: Most helpful
  1. HimanshuSinha-msft 19,386 Reputation points Microsoft Employee
    2021-07-16T19:47:41.093+00:00

    Hello @,
    Thanks for the ask and using the Microsoft Q&A platform .

    I did tried to work with the data which you shared . The blank column which basically acts like a boundry between do different dataset , shows up as a blank column with no records but a header . See the snapshot below . I was thinking if we can use some functions to split this , but I dont think we have any . The only option which i see is to use a SELECT transformation and select the first 4 columns .

    115474-image.png

    In excel the columns like header1 is merged but when In dataflow it appears to be like

    115522-image.png

    This is aklso not helping us much here .

    As is I dont think we can achieve the goal here , unless you make some modification on the Excel side .

    LThanks
    HImanshu

    Please do let me know how it goes .
    Thanks
    Himanshu