How do I split a large file into multiple files with specific number of rows in each file?

Bala 176 Reputation points
2021-03-22T17:45:22.733+00:00

I have a large source file that I want to split, with each output file having 10K rows. Data flow lets me split into a set number of partitions, but there is a problem: I don't want to split the source file if it has 10K or fewer rows. Anything above 10K should be split into multiple 10K chunks.

As an example:

- 12K rows => produces 2 files: one with 10K, one with 2K
- 20K rows => produces 2 files: each with 10K
- 9K rows => produces 1 file
- 20.1K rows => produces 3 files: two 10K files and one with the remaining rows, and so on
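The rule in the examples above amounts to ceiling division: the number of output files is the row count divided by 10K, rounded up. A minimal Python sketch of the expected file counts (the function name and `chunk_size` parameter are illustrative, not part of ADF):

```python
import math

def expected_file_count(total_rows: int, chunk_size: int = 10_000) -> int:
    """Number of output files when splitting total_rows into chunk_size pieces."""
    # Ceiling division: every full 10K chunk gets its own file,
    # and any remainder (1..9,999 rows) gets one extra file.
    return math.ceil(total_rows / chunk_size)

# The examples from the question:
print(expected_file_count(12_000))   # 2 files (10K + 2K)
print(expected_file_count(20_000))   # 2 files (10K + 10K)
print(expected_file_count(9_000))    # 1 file (no split)
print(expected_file_count(20_100))   # 3 files (10K + 10K + 100)
```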

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. MarkKromer-MSFT 5,136 Reputation points Microsoft Employee
    2021-03-22T20:47:11.843+00:00

    Use the techniques in the blog post below to build a formula that dynamically sizes your partitions:

    https://kromerbigdata.com/2021/03/04/dynamic-data-flow-partitions-in-adf-synapse/

    In my example, I used a hardcoded value for the target partition size. But you can use case() or iif() in the size expression to apply your rule as described above.
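    As a sketch of that idea, a partition-count expression using ceiling division might look like the following. This is an illustration, not a tested expression: `$rowCount` is a hypothetical data flow parameter you would populate with the source row count (for example, via a Lookup activity), and you should verify `iif`, `ceil`, and `toInteger` against the mapping data flow expression function reference.

    ```
    iif($rowCount <= 10000, 1, toInteger(ceil($rowCount / 10000)))
    ```

    For any positive row count of 10K or fewer this yields 1 partition (no split); above that, each full or partial 10K chunk gets its own partition.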
