Is it possible to merge files with headers and without headers?

Shunya Kakehi 1 Reputation point
2023-01-06T13:05:32.307+00:00

Hello.
I want to merge files with headers and files without headers in an Azure Synapse pipeline, but I have run into a problem.
What I would like to achieve is as follows:
276830-image.png
276891-image.png
276848-image.png
276892-image.png

When merging in a pipeline copy activity, the merge does not work because the source dataset can only be set to one mode for all files: either with headers or without headers.

・If the source dataset is set to treat the first row as a header, the first line of every file is read as a header, so the first data row of each file without headers is lost.
・If the source dataset is set to have no header, the header rows of the files that do have headers end up in the data.

My workaround was to remove the headers from the files that have them and enter the schema manually in the copy activity mapping, but setting the schema by hand is impractical because I want the pipeline to run automatically every week.
276893-image.png
276839-image.png

Is there a way to merge only the data rows of each file while taking the column names from the files that have headers?
Any answers would be appreciated.
Thank you.

Azure Synapse Analytics
Azure Data Factory

1 answer

  1. MartinJaffer-MSFT 26,236 Reputation points
    2023-01-11T16:41:18.1666667+00:00

    In short, no, you can't merge files with headers and files without headers. They must all be one or the other.

    So, the next step is to alter them so they are the same!

    A lookup activity can return just the first row of a file. If you know what the headers look like, use an if-condition activity to check whether the returned row looks like a header. When it does, a copy activity can remove the header by using two similar datasets: one with "first row as header" enabled (source) and one with it disabled (sink). This rewrites the file in place without its header.

    Once all the files are processed, use a copy activity to merge them.
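
    The logic of these steps (check the first row, drop it if it is a header, then merge) can be sketched outside the pipeline as follows. This is only a minimal local illustration in Python, not a Synapse pipeline definition. The function names `strip_header` and `merge_files` and the `expected_header` parameter are hypothetical, and the sketch assumes the header column names are known in advance, as the lookup and if-condition approach does.

```python
import csv
import io


def strip_header(text, expected_header):
    """Return the data rows of one CSV file (hypothetical helper).

    The first row is dropped only if it matches expected_header,
    which mirrors the lookup + if-condition check described above.
    """
    rows = list(csv.reader(io.StringIO(text)))
    expected = [name.strip().lower() for name in expected_header]
    if rows and [cell.strip().lower() for cell in rows[0]] == expected:
        return rows[1:]  # file had a header: keep only the data rows
    return rows          # file had no header: every row is data


def merge_files(texts, expected_header):
    """Merge several CSV texts into one CSV text with a single header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(expected_header)  # write the header exactly once
    for text in texts:
        writer.writerows(strip_header(text, expected_header))
    return out.getvalue()


# Example input: one file with a header, one without (made-up sample data).
with_header = "id,name\n1,apple\n2,banana\n"
without_header = "3,cherry\n4,date\n"
merged = merge_files([with_header, without_header], ["id", "name"])
print(merged)
```

    A caveat of this check: a headerless file whose first data row happens to equal the header names would be misread as having a header, so the comparison is only as reliable as the header is distinctive.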

    @Shunya Kakehi
