Copy activity successfully loads more rows than Write Batch Size in Azure pipeline

Surya Raja 20 Reputation points
2023-10-07T13:58:19.4+00:00

In an Azure pipeline copy activity, I set the Write Batch Size to 1000, but it successfully loaded 1,200 rows from a CSV file in Azure Blob Storage into an Azure SQL database. I expected it to fail since the batch size was smaller than the number of rows. What happened here? Is there an explanation for this behavior?

Azure SQL Database
Azure Data Factory

Accepted answer
  1. Vinodh247 34,661 Reputation points MVP Volunteer Moderator
    2023-10-07T14:16:49.1933333+00:00

    Hi Surya Raja:

    Thanks for reaching out to Microsoft Q&A.

    For reference, I am copying an answer from 'BhargavGunnam' to a similar question that was asked earlier. Your copy activity didn't fail because it simply started a second batch when the writeBatchSize value was lower than the number of rows in your source.

    I hope this answers your question. Let me know if this helped.

    ADF determines the batch size based on the writeBatchSize property and the number of rows in the source data. If the number of rows in the source is less than writeBatchSize, the entire data set is processed in a single batch. If the number of rows is greater than writeBatchSize, the data is processed in multiple batches, each containing up to writeBatchSize rows.

    If you leave this value blank, Azure Data Factory determines the Write Batch Size dynamically based on the row size, as described in the documentation. This is the default behavior of the Copy activity when the Write Batch Size property is not specified. To control the batch size, you can explicitly set the Write Batch Size property to a specific value. However, setting it too high may cause performance issues or timeouts, so it's recommended to test and optimize the batch size for your specific scenario.

    If you still want to control the batch identifier parameter in the insert and update queries, you can try dynamic SQL to generate the table name with the GUID suffix and use it in your stored procedure. Please note: it is not recommended to modify this parameter, as it is an internal parameter used by ADF to manage the interim table.
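
    To make the batching behavior concrete, here is a minimal sketch of what a copy activity definition might look like with the sink batch size set explicitly. The activity name, dataset references, and timeout value below are illustrative placeholders, not taken from your pipeline; writeBatchSize is the property that corresponds to the Write Batch Size setting in the sink.

    ```json
    {
        "name": "CopyCsvToAzureSql",
        "type": "Copy",
        "inputs": [ { "referenceName": "BlobCsvDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "AzureSqlTableDataset", "type": "DatasetReference" } ],
        "typeProperties": {
            "source": { "type": "DelimitedTextSource" },
            "sink": {
                "type": "AzureSqlSink",
                "writeBatchSize": 1000,
                "writeBatchTimeout": "00:30:00"
            }
        }
    }
    ```

    With this setting, a 1,200-row source is simply written in two batches (1,000 rows, then 200 rows), which is why the copy succeeds rather than failing.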


    Please 'Upvote' (thumbs-up) and 'Accept' the answer if the reply was helpful. This will benefit other community members who face the same issue.

    1 person found this answer helpful.
