Why is the "Data read" size in an ADF copy activity so much larger than the original size of the source data?
I ran a copy activity in Azure Data Factory to transfer data from Google Cloud Storage to Azure Data Lake Storage: approximately 3,000 files in snappy.parquet format, with a combined size of roughly 30 GB.
When the copy completed, Azure Data Factory reported the "Data read" metric as 5.8 TB. This unexpected figure raises concerns about the associated cost on the Google Cloud Platform side.
To understand this substantial increase, I have searched resources such as Microsoft Learn and the Azure documentation, but I could not find an explanation of how or why the reported read volume reached 5.8 TB. I have attached an image below with more details about the copy activity run.
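For context, here is a minimal sketch of how the actual on-GCS size could be cross-checked against the "Data read" value that ADF reports (this assumes Python with the google-cloud-storage client; the bucket name and prefix below are placeholders, not my actual paths):

```python
# Minimal sketch: sum the stored size of the source parquet files in GCS
# so it can be compared with the "Data read" metric in the copy activity output.
# Assumes application-default credentials; bucket/prefix are placeholders.
from google.cloud import storage

client = storage.Client()
bucket_name = "my-source-bucket"   # placeholder
prefix = "exports/parquet/"        # placeholder

total_bytes = 0
file_count = 0
for blob in client.list_blobs(bucket_name, prefix=prefix):
    if blob.name.endswith(".snappy.parquet"):
        total_bytes += blob.size   # size as stored (compressed) in GCS
        file_count += 1

print(f"{file_count} files, {total_bytes / (1024 ** 3):.2f} GiB stored in GCS")
```

Running a tally like this confirms the source is on the order of 30 GB as stored, which is what makes the 5.8 TB figure so puzzling.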
Resolving this question is important to us, as it directly affects our decision on adopting Azure as our preferred multicloud solution.
Thank you in advance for your answer.
Cheers
Azure Data Lake Storage
Azure Data Factory