Hi @AE90, you're on the right path, and thanks for sharing the detailed steps. The issue you're facing (where columns show up as Prop_0, Prop_1, etc.) usually comes down to how ADF interprets the CSV schema during the copy. By default, the Copy Data activity doesn’t treat the first row as headers unless it is explicitly told to, which causes it to auto-generate column names.
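To make the cause concrete, here is a minimal Python sketch of the same behavior outside ADF. It is only an illustration, not ADF's actual implementation: the `read_rows` helper and the sample data are made up, and the placeholder naming simply imitates the Prop_0, Prop_1 pattern you are seeing.

```python
import csv
import io


def read_rows(text, first_row_as_header):
    """Parse CSV text and return a list of dicts keyed by column name."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    if first_row_as_header:
        # Row one supplies the column names; the rest is data.
        names, data = rows[0], rows[1:]
    else:
        # No header declared: invent placeholder names and treat every row as data.
        names, data = [f"Prop_{i}" for i in range(len(rows[0]))], rows
    return [dict(zip(names, row)) for row in data]


# Hypothetical sample file contents.
sample = "id,name\n1,Alice\n2,Bob\n"

without_header = read_rows(sample, first_row_as_header=False)
with_header = read_rows(sample, first_row_as_header=True)

print(list(without_header[0].keys()))  # ['Prop_0', 'Prop_1']
print(list(with_header[0].keys()))     # ['id', 'name']
```

Note that in the first case the real header row ends up as a data row, which is usually what you see in the sink as well.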
Here’s a step-by-step suggestion to make your pipeline both efficient and schema-aware:
Copying ZIP files from SFTP to Blob: You’ve already done this, which is great. Using wildcards and scheduling the trigger daily is exactly the right approach.
Unzipping the files: Once your ZIP files are in Blob, ADF needs help to extract them, since there's no built-in unzip activity. You have a couple of solid options here:
- Azure Function or Logic App: Write a small function or flow that extracts the ZIP contents to a staging folder in your Blob container.
- ZipDeflate in Binary Dataset: If each ZIP contains only one file and the structure is predictable, ADF supports ZIP decompression when using the Binary dataset + ZipDeflate compression setting.
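If you go the Azure Function route, the core extraction logic could look roughly like the sketch below. It uses only the Python standard library and works on bytes in memory; downloading the ZIP from Blob and uploading the extracted files are left out, and the `extract_zip` helper, the `staging/daily` prefix, and the file names are hypothetical.

```python
import io
import zipfile


def extract_zip(zip_bytes, staging_prefix):
    """Return {staging path: file bytes} for every file in the archive."""
    extracted = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, keep only files
            target = f"{staging_prefix.rstrip('/')}/{info.filename}"
            extracted[target] = archive.read(info)
    return extracted


# Build a small ZIP in memory to stand in for a blob downloaded from the container.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("sales.csv", "id,amount\n1,10\n")
    archive.writestr("customers.csv", "id,name\n1,Alice\n")

files = extract_zip(buffer.getvalue(), "staging/daily")
for path in sorted(files):
    print(path, len(files[path]), "bytes")
```

In a real function you would write each entry of `files` back to the staging folder with your storage client of choice, then let the next Copy activity pick the CSVs up from there.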
Copying the extracted CSVs and fixing the header issue: This is where you’ll correct the schema recognition problem:
- Use a DelimitedText dataset instead of Binary.
- In the dataset settings, make sure "First row as header" is set to True.
- Avoid using "Import schema" if the CSV structures vary. Leaving the dataset schema empty lets ADF resolve the columns of each file at run time.
- Skip "Merge Files" unless all your CSVs have the same schema (which it sounds like they don’t).
With this setup, ADF should read the actual column headers from each file instead of generating Prop_# columns.
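For reference, the dataset settings above correspond roughly to the definition sketched below, written as a Python dict that serializes to the dataset JSON. Treat it as an assumption to verify rather than a definitive template: the property names follow the DelimitedText dataset format as I understand it from the connector documentation, and the dataset name, linked service name, container, and folder are placeholders. Compare it with the JSON shown in the code view of your own dataset.

```python
import json


def delimited_text_dataset(name, linked_service, container, folder_path):
    """Build a DelimitedText dataset definition that reads headers from row one."""
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            # No "schema" entry: the column structure is left to be resolved at run time.
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                    "folderPath": folder_path,
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": True,
            },
        },
    }


# All four arguments are placeholder names, not values from your factory.
dataset = delimited_text_dataset("ExtractedCsv", "BlobStorageLS", "mycontainer", "staging")
print(json.dumps(dataset, indent=2))
```

The key line is `firstRowAsHeader`; everything else only tells ADF where the extracted files live.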
For more detailed information, you can refer to https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping
Hope this helps. If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful". If you have any further questions, do let us know.