Why would a data file created via ADF copy activity have the header record in it more than once

Laurent Delaquis 40 Reputation points
2026-02-11T02:54:47.98+00:00

We have an ADF Copy data activity that copies a data file from an SFTP location to an ADLS folder. The file is later copied from the ADLS folder into an Azure SQL database. This pipeline has been working for many years, but recently the second copy activity has been failing.

Some data file testing showed that when the file from the SFTP location is 16,383 KB or smaller, it loads successfully, but a 16,386 KB file does not. The larger file fails because it contains the header record a second time, mixed in with the data, which causes a column-count error. This does not happen with the smaller file.

Does anyone know why the header record would appear a second time during the SFTP-to-ADLS copy? I suspect it must be related to the file size. Or is it possible that the copy activity is interpreting a new/additional data file as being appended to the data that has already been written?

Any and all suggestions/ideas are welcome. Thank you.

Azure Data Factory

An Azure service for ingesting, preparing, and transforming data at scale.


Answer accepted by question author
  1. SAI JAGADEESH KUDIPUDI 485 Reputation points Microsoft External Staff Moderator
    2026-02-11T06:01:16.6633333+00:00

    Hi Laurent Delaquis,
    This happens because of how the Azure Data Factory (ADF) Copy activity reads larger text files, especially from SFTP sources.

    When ADF copies a delimited text file (such as CSV) from SFTP, it does not always read the file as a single stream. For smaller files, the file is read in one pass and the header row appears only once. However, when the file size exceeds a certain threshold, ADF may automatically switch to chunked (block‑based) reading for performance reasons.

    In chunked processing, the file is read in multiple segments. For text-based files, the header row can be re-read at the start of a new chunk. ADF does not automatically suppress or remove duplicate headers during this process, so the header row gets written again into the output file, often appearing in the middle of the data.
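    The failure mode described above can be illustrated with a toy sketch. This is not ADF's actual implementation; it is a minimal Python model of a chunked copier that (incorrectly) re-emits the header at the start of every chunk, which produces exactly the symptom reported: the header row repeated in the middle of the data.

    ```python
    # Toy illustration only (NOT ADF's real code): a naive chunked copier
    # that prepends the header to every chunk it writes.

    def naive_chunked_copy(source_lines, chunk_size):
        """Copy rows chunk by chunk; buggy variant writes the header per chunk."""
        header, *rows = source_lines
        out = []
        for i in range(0, len(rows), chunk_size):
            out.append(header)              # bug: header emitted once per chunk
            out.extend(rows[i:i + chunk_size])
        return out

    src = ["id,name", "1,a", "2,b", "3,c", "4,d"]
    copied = naive_chunked_copy(src, chunk_size=2)
    # The header now appears before rows 1-2 and again before rows 3-4,
    # so a downstream loader that expects one header row sees a bad record.
    ```

    A small file that fits in one chunk never triggers the duplication, which matches the size-dependent behavior observed in the question.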

    As a result, the file created by the Copy Activity can contain the header record more than once, even though the source file has only a single header. This is a known and documented behavior of Copy Activity with certain connectors (such as SFTP) and is not caused by file appending or multiple source files being merged.

    Microsoft has documented similar scenarios where chunked reads during Copy Activity can lead to duplicate rows or headers when processing large files from SFTP sources.

    Mitigation: To avoid this behavior, disable chunking on the Copy activity source (the SFTP connector exposes a disableChunking setting in its read settings), or copy the file as-is with a binary copy and handle header processing only during downstream ingestion.
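    In pipeline JSON terms, the setting lives under the source's store settings. A minimal sketch of the Copy activity source is below; this assumes a delimited-text dataset over the SFTP linked service, and your dataset and property values will differ:

    ```json
    {
      "source": {
        "type": "DelimitedTextSource",
        "storeSettings": {
          "type": "SftpReadSettings",
          "recursive": false,
          "disableChunking": true
        },
        "formatSettings": {
          "type": "DelimitedTextReadSettings"
        }
      }
    }
    ```

    With disableChunking set to true, the service reads the file sequentially in a single stream rather than fetching it in parallel parts, at some cost to copy throughput on large files.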

    Reference links:
    Copy Activity internal behavior and processing model: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview
    SFTP connector behavior in ADF: https://learn.microsoft.com/en-us/azure/data-factory/connector-sftp?tabs=data-factory

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.

    Please do not forget to "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.


1 additional answer

  1. Q&A Assist
    2026-02-11T02:55:00.6633333+00:00
    AI generated content. This answer was created with AI from Q&A Assist.

    The issue you're encountering with the header record appearing multiple times in the data file during the copy activity from SFTP to ADLS could be related to how the copy activity processes the file. Here are a few potential reasons:

    1. File Size Limitations: It's possible that the copy activity has certain limitations or behaviors that change based on the file size. When the file exceeds a specific size, the activity might be handling the data differently, potentially leading to the header being duplicated.
    2. File Splitting: If the copy activity is configured to handle large files by splitting them into smaller chunks, it might be incorrectly interpreting the header row in each chunk, resulting in multiple headers being included in the final output.
    3. Data Consistency Verification: If data consistency verification is enabled, the copy activity might be encountering issues with how it verifies the data integrity, leading to unexpected behavior such as duplicating the header.
    4. File Format Issues: Ensure that the file format and delimiters are correctly defined in the copy activity settings. If the file format is not properly recognized, it may lead to misinterpretation of the header and data rows.
    5. Source Data Issues: Check the source data file itself for any anomalies. Sometimes, the source file might have unintended extra header rows that are not visible in smaller files but become apparent in larger ones.

    To troubleshoot this issue, you may want to:

    • Review the copy activity settings and ensure that the file format and delimiters are correctly set.
    • Test with different file sizes and inspect the contents of the files to see if there are any discrepancies.
    • Consider enabling session logging to capture more details about the copy activity's behavior during execution.
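    For the second troubleshooting step, a short script can scan the landed file for repeated header rows before the SQL load. This is a hedged sketch: "sample.csv" is a placeholder path, and the block writes a small demo file so it runs end to end; in practice you would point it at a local copy of the file from ADLS.

    ```python
    # Sketch: find lines where the header row of a delimited file reappears.
    # "sample.csv" is a placeholder for a local copy of the landed ADLS file.

    def find_duplicate_headers(path):
        """Return the 1-based line numbers where the header row reappears."""
        with open(path, encoding="utf-8") as f:
            header = f.readline().rstrip("\r\n")
            return [n for n, line in enumerate(f, start=2)
                    if line.rstrip("\r\n") == header]

    # Build a small demo file so the sketch runs end to end.
    with open("sample.csv", "w", encoding="utf-8") as f:
        f.write("id,name\n1,a\n2,b\nid,name\n3,c\n")

    dupes = find_duplicate_headers("sample.csv")
    print(dupes)  # the demo header reappears at line 4
    ```

    Running this against both a small (working) and a large (failing) file would confirm whether the duplicate header is the only difference between them.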

    If the problem persists, you may need to reach out to Microsoft support for further assistance specific to Azure Data Factory's handling of large files.


