Size of Data Read in ADF copy activity is much bigger than the original size of source data
Hello guys
Similar to the colleague in the following post: https://learn.microsoft.com/en-us/answers/questions/1411983/why-the-size-of-data-read-in-adf-copy-activity-is I'm having an issue where the Data Read in an ADF Copy Activity exceeds 30 GB, while the Data Written is around 650 MB.
The source file is an uncompressed Parquet file of about 0.5 GB, which makes this very unexpected. The copy also takes much longer than it should: looking at the duration breakdown, more than 80% of the time is spent reading from the source and only a minimal amount writing to the sink. Below is a screenshot of the file size.
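In case it helps with diagnosis, here is a minimal sketch (assuming Python with pyarrow installed, and a local copy of the file; the path sample.parquet is just a placeholder) that reads the Parquet footer and compares the compressed column-chunk sizes against their uncompressed sizes, which is roughly the volume a reader has to materialize:

```python
import pyarrow.parquet as pq

# Placeholder path; replace with a local copy of the S3 object.
PATH = "sample.parquet"

meta = pq.ParquetFile(PATH).metadata
compressed = 0
uncompressed = 0

# Sum the per-column-chunk sizes recorded in the footer metadata.
for rg in range(meta.num_row_groups):
    row_group = meta.row_group(rg)
    for col in range(row_group.num_columns):
        chunk = row_group.column(col)
        compressed += chunk.total_compressed_size
        uncompressed += chunk.total_uncompressed_size

print(f"rows: {meta.num_rows}, row groups: {meta.num_row_groups}")
print(f"compressed column data:   {compressed / 1024**2:.1f} MiB")
print(f"uncompressed column data: {uncompressed / 1024**2:.1f} MiB")
```

These totals come straight from the Parquet footer, so no data pages need to be scanned; in my case neither number comes anywhere close to the 30 GB that the copy activity reports as Data Read.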
The copy goes from an S3 bucket into a SQL Database. The issue did not happen when this same file was a .csv; it only started after changing the format to .parquet. The whole pipeline processes about 70 of these files, so the extra time and cost are now a real problem. Note: if I use a Data Flow activity instead, the issue does not happen. Any ideas? Cheers