Copy data not recognizing existing headers

Question

AE90 20 Reputation points

I'm very new to DF, so I need help figuring out the most efficient way to set up a pipeline. I have to copy .zip files from an SFTP, store them in Blob storage, and extract .csv files with certain names from the .zip file to process and move into a database. .zip files are added to the SFTP and need to be picked up daily. Each .zip file contains files with different information and different columns (e.g., users_todaysdate.csv, demographics_todaysdate.csv, service_rates_todaysdate.csv).

I've been able to copy the files from SFTP to Blob and set the trigger to copy them daily. Then, using a wildcard file path, I copy each file with a specific prefix (users, demographics, service_rates) to a new folder, so they're no longer compressed. However, the copy isn't recognizing my headers and keeps adding "PROP_#" as headers to the new file copied into the final folder. Or it tries to force me to use the "merge files" copy behavior, which still doesn't recognize the headers.

How do I fix this? Is there a better, more efficient way for this pipeline?

Venkat Reddy Navari 2,975 Reputation points Microsoft External Staff Moderator

2025-06-20T05:39:49.4833333+00:00

@AE90 Following up to see if the below answer was helpful. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

Accepted answer

Answer 1

Venkat Reddy Navari 2,975 Microsoft External Staff Moderator

Hi @AE90, you're on the right path, and thanks for sharing the detailed steps. The issue you're facing (where columns show up as Prop_0, Prop_1, etc.) usually comes down to how ADF interprets the CSV schema during the copy. By default, the Copy Data activity doesn’t treat the first row as headers unless it's explicitly told to, which causes it to auto-generate column names.

Here’s a step-by-step suggestion to make your pipeline both efficient and schema-aware:

Copying ZIP files from SFTP to Blob: You’ve already done this, which is great. Using wildcards and scheduling the trigger daily is exactly the right approach.

Unzipping the files: Once your ZIP files are in Blob, ADF needs help to extract them, since there's no dedicated unzip activity. You have a couple of solid options here:

Azure Function or Logic App: Write a small function or flow that extracts the ZIP contents to a staging folder in your Blob container.
ZipDeflate in Binary Dataset: If each ZIP contains only one file and the structure is predictable, ADF supports ZIP decompression when using the Binary dataset + ZipDeflate compression setting.

Copying the extracted CSVs and fixing the header issue: This is where you’ll correct the schema recognition problem:

Use a DelimitedText dataset instead of Binary.
In the dataset settings, make sure "First row as header" is set to True.
Avoid checking "Import schema" if the CSV structures vary — this lets ADF treat them dynamically.
Skip "Merge Files" unless all your CSVs have the same schema (which it sounds like they don’t).

With this setup, ADF should properly read the actual column headers from the file instead of generating PROP_# columns.

For more detailed information, you can refer to https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping
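
For reference, the dataset settings above correspond to a dataset definition along the lines of the sketch below, built here as a Python dictionary and printed as JSON. The dataset name, linked service name, container, and folder path are placeholders (assumptions, not values from this thread), and no schema is included, which mirrors leaving "Import schema" unchecked.

```python
import json


def delimited_text_dataset(name, linked_service, container, folder_path):
    """Build a DelimitedText dataset definition with 'First row as header' enabled."""
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                    "folderPath": folder_path,
                },
                "columnDelimiter": ",",
                "encodingName": "UTF-8",
                # Equivalent of ticking "First row as header" in the dataset UI.
                "firstRowAsHeader": True,
            },
            # No "schema" entry: nothing was imported, so columns are read at runtime.
        },
    }


if __name__ == "__main__":
    # All four arguments are placeholder names for illustration only.
    dataset = delimited_text_dataset("UsersCsv", "BlobStorageLS", "staging", "unzipped/users")
    print(json.dumps(dataset, indent=2))
```

Comparing this with the JSON view of the actual dataset (the code button in the dataset editor) is a quick way to confirm that the setting was saved on the dataset the Copy activity really uses.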

Hope this helps. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

Venkat Reddy Navari 2,975 Reputation points Microsoft External Staff Moderator

2025-06-18T10:02:19.1533333+00:00

@AE90 Following up to see if the above answer was helpful. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

AE90 20 Reputation points

2025-06-18T18:07:55.87+00:00

@Venkat Reddy Navari Thank you for your response and helpful information! I've already done pretty much everything you suggested, and it's still not reading my headers.

All of my datasets are set to DelimitedText (.csv), no Binary.

For unzipping, I used Compression type "ZipDeflate" on the dataset being copied from the SFTP. From what I can tell, all the files copied as unzipped .csv's into my Blob storage folder just fine. Each zipped file may have different files (as mentioned in my original post), but the columns for each file don't change. (For example, users_todaysdate.csv has the same 5 columns, demographics_todaysdate.csv has the same 8 columns, and service_rates_todaysdate.csv has the same 3 columns for each deposit.)

From the Blob storage folder that holds all the unzipped folders, I use a wildcard search to search each folder for the specific file name prefix I need to move each daily file into its designated dataset, so "Users" has its own dataset, "Demographics" has its own dataset, and "Service_Rates" has its own dataset.

Import schema is not checked.

First row as header is checked

I've tried the "Merge files" copy behavior with no success. Using any other copy behavior works, but doesn't recognize my header.

Please let me know if there is any additional information I can provide to figure out why I still can't get it to recognize my headers.

Venkat Reddy Navari 2,975 Reputation points Microsoft External Staff Moderator

2025-06-19T10:00:14.17+00:00

@AE90 thanks for confirming the setup, sounds like you've got most things right. Since the header issue persists despite using “First row as header” and DelimitedText, here are a few options to try:

Check header behavior using Data Flows: Use columnNames() in a Mapping Data Flow to inspect how ADF sees your columns at runtime. This helps detect hidden characters or misread headers.

Dynamic schema mapping: Instead of using separate datasets, create a parameterized dataset and use dynamic mapping in your Copy activity. This is useful if your file prefixes (users, demographics, etc.) are consistent, but schemas differ.

Azure Function for header cleanup: If headers are inconsistent or malformed, an Azure Function can read the first row, clean up header names, and return the proper schema for ADF to use.

Logic App pre-processor (optional): For production-grade pipelines, a Logic App can unzip, validate headers, and attach metadata before ADF ingestion.

Also double-check that the encoding (UTF-8) and delimiters in the dataset match the actual file content. A mismatch there is a common hidden culprit.
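
One way to run that check outside ADF is to download a sample file and look at its first line directly. The sketch below is a minimal example using only the Python standard library; the function name, the list of candidate delimiters, and the sample bytes are assumptions for illustration.

```python
import codecs
import csv

# Candidate delimiters to test; extend this if the files might use something else.
CANDIDATE_DELIMITERS = (",", ";", "\t", "|")


def inspect_header(raw: bytes) -> dict:
    """Report BOM presence, the most likely delimiter, and odd-looking header names."""
    has_bom = raw.startswith(codecs.BOM_UTF8)
    text = raw.decode("utf-8-sig")  # Decodes UTF-8 and drops a leading BOM if present.
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    # Pick the candidate that occurs most often in the header line.
    delimiter = max(CANDIDATE_DELIMITERS, key=first_line.count)
    columns = next(csv.reader([first_line], delimiter=delimiter), [])
    # Flag names with leading/trailing whitespace or non-printable characters.
    suspicious = [c for c in columns if c != c.strip() or not c.isprintable()]
    return {
        "has_bom": has_bom,
        "delimiter": delimiter,
        "columns": columns,
        "suspicious": suspicious,
    }


if __name__ == "__main__":
    # Hypothetical sample: UTF-8 BOM, semicolon-delimited, one header with a trailing space.
    sample = b"\xef\xbb\xbfid;name ;rate\n1;Ann;2.5\n"
    print(inspect_header(sample))
```

If the reported delimiter differs from the column delimiter configured on the dataset, or the file fails to decode as UTF-8, the dataset settings are worth revisiting before changing anything else in the pipeline.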

Hope this helps. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

AE90 20 Reputation points

2025-06-24T15:15:30.43+00:00

@Venkat Reddy Navari I was able to manually add the correct headers in the mapping, but I still need to get rid of the first row containing the old headers. I used the following data flow which assigns row numbers, then deletes row<=1.

The preview shows correctly and shows row 1 being deleted through the alterRow1 step, but the row still exists in the sink step. How do I get the row to actually delete?

Venkat Reddy Navari 2,975 Reputation points Microsoft External Staff Moderator

2025-06-24T16:54:15.29+00:00

@AE90 You're absolutely right. While the Alter Row step correctly tags the first row for deletion, the delete action doesn't get applied when writing to Blob (CSV) sinks. That's because delete logic in Data Flows only works with specific sinks like Azure SQL or Delta Lake, not when exporting to flat files.

To actually remove the first row (the old header), use a Filter transformation instead of Alter Row:

Add a Filter transformation right after your rownum step.

Use a condition like Row > 1 to remove the first row.

This ensures only actual data rows are passed into the sink.

Alternatively, if you're using Derived Column, you can create a flag (isDataRow = Row > 1) and filter based on that.
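
The row-number-then-filter logic can be illustrated outside of Data Flows with a few lines of Python. This is only a sketch of the same idea, not Data Flow expression syntax; the function name, the sample data, and the new header names are hypothetical.

```python
import csv
import io


def replace_header(csv_text: str, new_headers: list) -> str:
    """Number rows from 1, keep only rows where the number is > 1, and write new headers on top."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Same condition as the Filter step: Row > 1 drops the old header row.
    data_rows = [row for row_number, row in enumerate(rows, start=1) if row_number > 1]
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow(new_headers)
    writer.writerows(data_rows)
    return output.getvalue()


if __name__ == "__main__":
    # Hypothetical file whose first row holds the old header names.
    original = "old_id,old_name\n1,Ann\n2,Bob\n"
    print(replace_header(original, ["user_id", "user_name"]))
```

The key point is that the unwanted row never reaches the output at all, which is why a filter works for flat-file sinks where a row marked for deletion does not.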

Hope this helps. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

Venkat Reddy Navari 2,975 Reputation points Microsoft External Staff Moderator

2025-06-25T09:53:38.47+00:00

@AE90 Following up to see if the above answer was helpful. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

AE90 20 Reputation points

2025-06-25T19:11:44.9133333+00:00

Yes, that worked! Thank you. I was able to get the dataset to load the correct headers.
