Excel file with multiple row of column headers, want to make it one column header using azure data factory

Question

Tulsi K 20 Reputation points

Hi, I have an Excel file that I use as a data source in Data Factory, and it has multiple rows of column headers (nested headers). I want to consolidate the two header rows into a single row to simplify the column headers. To make this clearer, consider the following example (image not shown). After merging these rows, the headers should look like this:

project1 test1, project1 test2, project1 test3, project2 test4, project2 test5, project2 test6.

Guidance on how to efficiently accomplish this task within Data Factory would be helpful.

QuantumCache 20,366 Reputation points Moderator

2024-02-28T21:01:59.9233333+00:00

Khimlal, Tulsi Please update your question. The expected format is difficult to understand. Please add the expected output.
Tulsi K 20 Reputation points

2024-02-29T05:07:25.08+00:00

Hi @SatishBoddu-MSFT, the expected output is as below.

Please note that the names in the second header row (test1, test2, etc.) are repeated in the actual data.
QuantumCache 20,366 Reputation points Moderator

2024-02-29T05:11:34.4266667+00:00

@Khimlal, Tulsi Thanks for sharing the information. Are the names of the Source Columns fixed or Dynamic?
Tulsi K 20 Reputation points

2024-03-18T11:48:43.4833333+00:00

Hi @SaBo-MFST, apologies for the late response.

The column names should be fixed, but they might change once a year because the file contains fiscal year data.
One more thing: the names in the second header row are repeated.

Now the requirement is to load the data into Azure SQL Database using ADF. Can we achieve this with a T-SQL query?

Answer 1

Amira Bedhiafi 33,071 Volunteer Moderator

Since your Excel file has nested headers, you need to flatten them. This requires a custom transformation, because two header rows have to be merged into one. Begin by loading the Excel data into a staging location, such as a SQL database or Azure Blob Storage, keeping both header rows as they are. Then use a custom script (an Azure Function, an Azure Databricks notebook, or a stored procedure in SQL Database) to merge the header rows programmatically. The script reads the first two rows, combines each pair of names (e.g., "Project1 test1", "Project1 test2", etc.), and applies the results as the column headers for the dataset. Once the headers are in this form, the data has a single header row, and you can use a Data Flow in ADF to transform, filter, or aggregate it as needed.
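The header-merging step described above could look roughly like the following in a Python script (for example in a Databricks notebook or an Azure Function). This is only a sketch under some assumptions: the sheet has been read without treating any row as a header, so the two header rows arrive as ordinary rows, and merged cells in the top row come through as empty values after the first cell. The function names `merge_header_rows` and `apply_headers` and the sample values are hypothetical, not part of ADF or any library.

```python
from typing import Any, List, Optional, Sequence, Tuple


def merge_header_rows(
    top: Sequence[Optional[str]],
    sub: Sequence[Optional[str]],
    sep: str = " ",
) -> List[str]:
    """Combine two header rows into one list of column names."""
    merged = []
    current_group = ""
    for group, name in zip(top, sub):
        # Assumption: a merged cell in the top row is empty/None after its
        # first column, so carry the last non-empty group name forward.
        if group is not None and str(group).strip():
            current_group = str(group).strip()
        name = "" if name is None else str(name).strip()
        # Skip empty parts so a missing group or name leaves no stray separator.
        merged.append(sep.join(part for part in (current_group, name) if part))
    return merged


def apply_headers(rows: Sequence[Sequence[Any]]) -> Tuple[List[str], List[List[Any]]]:
    """Treat the first two rows as headers; return (headers, data_rows)."""
    headers = merge_header_rows(rows[0], rows[1])
    return headers, [list(r) for r in rows[2:]]


# Sample rows shaped like the sheet in the question (data values are made up).
rows = [
    ["project1", None, None, "project2", None, None],
    ["test1", "test2", "test3", "test4", "test5", "test6"],
    [1, 2, 3, 4, 5, 6],
]
headers, data = apply_headers(rows)
print(headers)
```

Because each second-row name is prefixed with its parent name, names that repeat in the second row (as mentioned in the comments) still produce distinct column names, as long as they sit under different parents.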

Tulsi K 20 Reputation points

2024-02-21T11:37:22.48+00:00

Hi @Amira Bedhiafi, thanks for your response. Is there any way to achieve this using only data flows in ADF?
Amira Bedhiafi 33,071 Reputation points Volunteer Moderator

2024-02-21T12:12:27.4233333+00:00

Yes, but maybe only with JSON data: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-flatten
