How do I split a large file into multiple files with a specific number of rows in each file?

Bala 176 Reputation points

I have a large source file that I want to split so that each output file has 10K rows. Data flow lets me split into a set number of partitions, but there is a problem: I don't want to split the source file if it has 10K rows or fewer. Anything above 10K should be split into multiple 10K chunks.

For example:

12K rows => 2 files: one with 10K rows, another with 2K
20K rows => 2 files, each with 10K rows
9K rows => 1 file
20.1K rows => 3 files: two with 10K rows and one with the remaining rows, and so on

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. MarkKromer-MSFT 5,136 Reputation points Microsoft Employee

    Use the techniques in the blog post below to build a formula that sizes the partitions dynamically:

    In my example, I used a hardcoded value for the target file size, but you can use case() or iif() in the size expression to apply the rule you described above.
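
The rule from the question reduces to a ceiling division with a floor of one partition. The following is a minimal Python sketch of that arithmetic only, not a data flow expression. The names `partition_count`, `file_sizes`, and `ROWS_PER_FILE` are hypothetical and used for illustration. How rows are actually distributed across partitions depends on the partitioning scheme chosen in the data flow, so the per-file sizes shown are the question's target rather than something this sketch guarantees.

```python
ROWS_PER_FILE = 10_000  # target rows per file, taken from the question


def partition_count(row_count: int, rows_per_file: int = ROWS_PER_FILE) -> int:
    """Number of output files: 1 if the source fits in one file, else ceil(rows / target)."""
    # Integer ceiling division avoids floating-point rounding.
    needed = (row_count + rows_per_file - 1) // rows_per_file
    # Assumption: a source at or below the target (even an empty one) still yields one file.
    return max(1, needed)


def file_sizes(row_count: int, rows_per_file: int = ROWS_PER_FILE) -> list[int]:
    """Desired row count per file: full chunks first, then the remainder (if any)."""
    full_chunks, remainder = divmod(row_count, rows_per_file)
    return [rows_per_file] * full_chunks + ([remainder] if remainder else [])


if __name__ == "__main__":
    for rows in (12_000, 20_000, 9_000, 20_100):
        print(rows, partition_count(rows), file_sizes(rows))
```

The same condition (one partition at or below the target, otherwise the rounded-up quotient) is what a case() or iif() in the size expression would need to encode.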
