Size of Data Read in ADF copy activity is much bigger than the original size of source data

Guilherme Matte 0 Reputation points
2024-01-12T05:00:18.89+00:00

Hello guys. Similar to the colleague in the following post: https://learn.microsoft.com/en-us/answers/questions/1411983/why-the-size-of-data-read-in-adf-copy-activity-is I'm having an issue where the Data Read in an ADF Copy Activity exceeds 30 GB, while the Data Written is only around 650 MB. [screenshot of the copy activity monitoring details]

The source file is an uncompressed Parquet file of about 0.5 GB, which makes this very unexpected. The copy also takes much longer than it should: looking at the duration breakdown, more than 80% of the time is spent reading from the source and only a minimal amount writing to the sink. Below is a screenshot of the file size. [screenshot of the source file size]

The copy goes from an S3 bucket into a SQL Database. This issue was not happening when the same file was a .csv; it only started after changing the format to .parquet. The whole pipeline copies about 70 of these files, so the time and cost are now a real concern. Note: if I use a Data Flow activity instead, the issue does not happen. Any ideas? Cheers
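For context, the copy activity definition is roughly like the sketch below. The dataset names (S3ParquetDataset, AzureSqlTableDataset) and the wildcard are placeholders, not the real ones from my pipeline; everything else is left at the defaults, and the only change from the .csv version was switching the source dataset format to Parquet.

```json
{
    "name": "CopyS3ParquetToSql",
    "type": "Copy",
    "inputs": [
        { "referenceName": "S3ParquetDataset", "type": "DatasetReference" }
    ],
    "outputs": [
        { "referenceName": "AzureSqlTableDataset", "type": "DatasetReference" }
    ],
    "typeProperties": {
        "source": {
            "type": "ParquetSource",
            "storeSettings": {
                "type": "AmazonS3ReadSettings",
                "recursive": false,
                "wildcardFileName": "*.parquet"
            }
        },
        "sink": {
            "type": "AzureSqlSink",
            "writeBehavior": "insert"
        },
        "enableStaging": false
    }
}
```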

Azure Data Factory
