Copy data not recognizing existing headers

AE90 20 Reputation points
2025-06-17T13:11:56.3966667+00:00

I'm very new to ADF, so I need help figuring out the most efficient way to set up a pipeline. I have to copy .zip files from an SFTP server, store them in Blob Storage, and extract .csv files with certain names from each .zip file to process and move into a database. The .zip files are added to the SFTP server and need to be picked up daily. Each .zip file contains files with different information and different columns (e.g., users_todaysdate.csv, demographics_todaysdate.csv, service_rates_todaysdate.csv).

I've been able to copy the files from SFTP to Blob and set a trigger to copy them daily. Then, using a wildcard file path, I copy each file with a specific prefix (users, demographics, service_rates) to a new folder so they're no longer compressed. However, ADF isn't recognizing my headers and keeps adding "PROP_#" columns as headers in the file copied into the final folder. Or it tries to force me to use the "Merge files" copy behavior, which still doesn't recognize the headers.

How do I fix this? Is there a better, more efficient way for this pipeline?


Accepted answer
  Venkat Reddy Navari · 2,975 Reputation points · Microsoft External Staff Moderator
    2025-06-17T14:59:30.3233333+00:00

    Hi @AE90, you're on the right path, and thanks for sharing the detailed steps. The issue you're facing (where columns show up as Prop_0, Prop_1, etc.) usually comes down to how ADF interprets the CSV schema during copy. By default, the Copy Data activity doesn't treat the first row as headers unless it's explicitly told to, which causes it to auto-generate column names.

    Here’s a step-by-step suggestion to make your pipeline both efficient and schema-aware:

    Copying ZIP files from SFTP to Blob: You've already done this, which is great. Using wildcards and scheduling the trigger daily is exactly the right approach.
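
    For reference, here's a minimal daily schedule trigger sketch; the trigger name, pipeline name, and start time are placeholders you'd swap for your own:

```json
{
  "name": "DailySftpPickupTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2025-06-18T02:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "CopyZipFromSftpToBlob",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```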

    Unzipping the files: Once your ZIP files are in Blob, ADF needs help to extract them, since there's no built-in unzip activity. You have a couple of solid options here:

    • Azure Function or Logic App: Write a small function or flow that extracts the ZIP contents to a staging folder in your Blob container.
    • ZipDeflate in Binary Dataset: ADF supports ZIP decompression when using a Binary dataset with the ZipDeflate compression setting. This works even when a ZIP contains multiple files; by default, each ZIP's contents are extracted into a folder named after the ZIP (see the dataset sketch after this list).
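
    As a minimal sketch of that second option, a Binary dataset over the SFTP source with ZipDeflate decompression could look like this (the linked service name and folder path are placeholders):

```json
{
  "name": "SftpZipBinary",
  "properties": {
    "type": "Binary",
    "linkedServiceName": {
      "referenceName": "SftpLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "location": {
        "type": "SftpLocation",
        "folderPath": "incoming"
      },
      "compression": {
        "type": "ZipDeflate"
      }
    }
  }
}
```

    Pair it with a plain Binary sink dataset (no compression setting), and the copy activity will write the extracted contents to the sink folder.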

    Copying the extracted CSVs and fixing the header issue: This is where you’ll correct the schema recognition problem:

    • Use a DelimitedText dataset instead of Binary.
    • In the dataset settings, make sure "First row as header" is set to True (a sample dataset definition follows this list).
    • Avoid checking "Import schema" if the CSV structures vary — this lets ADF treat them dynamically.
    • Skip "Merge Files" unless all your CSVs have the same schema (which it sounds like they don’t).
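
    Putting those settings together, a DelimitedText dataset sketch with headers enabled might look like this (container, folder, and linked service names are placeholders; leaving "schema" empty keeps the mapping dynamic):

```json
{
  "name": "ExtractedCsvDelimited",
  "properties": {
    "type": "DelimitedText",
    "linkedServiceName": {
      "referenceName": "BlobLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "staging",
        "folderPath": "extracted"
      },
      "columnDelimiter": ",",
      "quoteChar": "\"",
      "firstRowAsHeader": true
    },
    "schema": []
  }
}
```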

    With this setup, ADF should properly read the actual column headers from the file instead of generating PROP_# columns.
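
    If it helps, here's a hedged sketch of a copy activity that picks up one file family by wildcard prefix, assuming the DelimitedText dataset above plus a similar sink dataset (all names are placeholders):

```json
{
  "name": "CopyUsersCsv",
  "type": "Copy",
  "inputs": [ { "referenceName": "ExtractedCsvDelimited", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "FinalCsvDelimited", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": {
      "type": "DelimitedTextSource",
      "storeSettings": {
        "type": "AzureBlobStorageReadSettings",
        "wildcardFolderPath": "extracted",
        "wildcardFileName": "users_*.csv"
      }
    },
    "sink": {
      "type": "DelimitedTextSink",
      "storeSettings": {
        "type": "AzureBlobStorageWriteSettings"
      }
    }
  }
}
```

    Because "First row as header" lives on the dataset, both the read and the write treat the first row as column names instead of generating PROP_# placeholders.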
    For more detailed information, you can refer to https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping


    Hope this helps. If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful?". And if you have any further questions, do let us know.

