Hello Azharudheen r, Mohamed,
Yes, you are correct. You can set partitioning at the data flow level, but the cluster configuration can only be defined in the pipeline settings.
As a next step, can you please check whether you can break the 4 GB file down into smaller files?
Also, to take advantage of partitioning, the JSON needs to be structured as one document per line.
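If it helps, here is a minimal Python sketch of one way to produce smaller files with one JSON document per line. It is only an illustration, not a Data Factory feature: the function name `write_json_lines`, the `docs_per_file` and `prefix` parameters, and the `part-00000.json` naming are all assumptions made up for this example. It also assumes you can already iterate over the individual documents; for a 4 GB source file that is a single JSON array you would likely need a streaming parser rather than loading everything into memory at once.

```python
import json
from pathlib import Path


def write_json_lines(documents, out_dir, docs_per_file=100_000, prefix="part"):
    """Write documents as one-JSON-document-per-line files.

    A new output file is started every `docs_per_file` documents.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    handle = None
    try:
        for index, doc in enumerate(documents):
            # Roll over to a new output file at each chunk boundary.
            if index % docs_per_file == 0:
                if handle is not None:
                    handle.close()
                path = out_dir / f"{prefix}-{len(paths):05d}.json"
                handle = path.open("w", encoding="utf-8")
                paths.append(path)
            # json.dumps without indent emits a single line; newlines
            # inside string values are escaped, so each document stays
            # on exactly one line.
            handle.write(json.dumps(doc) + "\n")
    finally:
        if handle is not None:
            handle.close()
    return paths
```

For example, `write_json_lines(docs, "out", docs_per_file=50_000)` would write `out/part-00000.json`, `out/part-00001.json`, and so on, each of which can then be read with the "document per line" setting and partitioned.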
By the way, Spark has a 2 GB limit on block size, so a single row cannot be larger than 2 GB.