Thank you for the details - this helps clarify the situation.
Summary of what we’re seeing:
- BigQuery table size (logical): 9.72 GB
- Rows read: 14.56 million
- ADF Copy Activity - Data Read: 71.56 GB
- Data Written to Blob: 36.56 GB
Why is the ADF “Data Read” so much larger than the table size?
This behavior is expected due to the way ADF pulls data from BigQuery:
Data is materialized during transfer
BigQuery stores data in a compressed, columnar format. When ADF reads the table, it:
- Decompresses the data and converts it into row-based records via the BigQuery Storage API
- This leads to a significant expansion in size, especially with:
  - Long text fields
  - Repeated or nested fields
  - Wide schemas
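If you want to see how much of the gap is down to BigQuery's columnar compression, you can compare the table's logical (uncompressed) size with its physical (compressed) bytes. A minimal sketch, assuming the google-cloud-bigquery Python client with default credentials; the project, region, and table names are placeholders:

```python
# Sketch: compare logical (uncompressed) vs. physical (compressed, columnar)
# storage for a table. The project, region, and table name are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # assumes default credentials

sql = """
SELECT
  table_name,
  total_logical_bytes  / POW(1024, 3) AS logical_gb,   -- uncompressed size (what the BigQuery UI reports)
  total_physical_bytes / POW(1024, 3) AS physical_gb   -- compressed bytes actually stored
FROM `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
WHERE table_name = 'my_table'
"""

for row in client.query(sql).result():
    print(f"{row.table_name}: logical = {row.logical_gb:.2f} GB, physical = {row.physical_gb:.2f} GB")
```

The 9.72 GB you see in BigQuery is the logical size; the 71.56 GB counted by ADF is the size of the materialized, serialized rows on the wire.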
Serialization overhead
When the data is transmitted, it also carries:
- Field names, delimiters, and formatting overhead (especially in text formats such as JSON)
- UTF-8 encoding, which can further inflate strings that contain non-ASCII characters
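As a rough illustration of that per-row cost (the schema and row count below are invented, not your data), serializing the same values as JSON rows repeats the field names and punctuation on every row:

```python
# Sketch: per-row overhead of a text serialization. Every JSON row repeats the
# field names and adds quotes, braces, and commas on top of the actual values.
# The schema here is made up purely for illustration.
import json

rows = [
    {"customer_id": i, "country": "DE", "comment": "free-form text " * 10}
    for i in range(100_000)
]

values_only = sum(len(str(v)) for r in rows for v in r.values())
json_rows = sum(len(json.dumps(r)) for r in rows)

print(f"values only : {values_only / 1e6:6.1f} MB")
print(f"JSON rows   : {json_rows / 1e6:6.1f} MB  (+ field names, quotes, delimiters)")
```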
Data written is smaller because of compression or a more compact sink format
In your case, the written size (36.56 GB) is roughly half of what was read, which suggests either:
- Compression was applied on write to Azure Blob
- Or the sink format is more storage-efficient than the serialized transfer format
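On the write side, a quick sketch of how much a generic compressor shrinks repetitive serialized text; the actual ratio depends entirely on your data and on the format/compression configured on the Blob sink dataset:

```python
# Sketch: gzip-compressing text-serialized rows. Real ratios depend on your data
# and on the sink format/compression chosen in the ADF dataset (e.g. Parquet,
# or delimited text with gzip). The sample rows are invented.
import gzip
import json

lines = "\n".join(
    json.dumps({"id": i, "status": "ACTIVE", "note": "lorem ipsum dolor sit amet"})
    for i in range(50_000)
).encode("utf-8")

packed = gzip.compress(lines)
print(f"serialized: {len(lines) / 1e6:.1f} MB, gzipped: {len(packed) / 1e6:.1f} MB")
```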
Optimization Tips
To minimize inflated read size from BigQuery:
- Avoid SELECT *: Specify only the required columns in your ADF source query or dataset.
- Flatten Nested Fields: If your schema has nested structures or arrays, flattening them can reduce transfer overhead.
- Push the Query to the Source: Use a custom SQL query in the source dataset to filter rows and prune columns early, before the data is serialized (see the sample query after this list).
- Use Avro or Parquet in the Sink: These formats are more compact than delimited text or JSON and reduce Blob storage size and write times.
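For the first three tips, a source-side query along these lines (all table and column names here are hypothetical) lets BigQuery prune columns, flatten the repeated field, and filter rows before anything is serialized and counted as “Data Read”:

```python
# Sketch of a source-side query for the ADF copy source. The table and column
# names are hypothetical; adapt them to your schema. Selecting only the needed
# columns, flattening the repeated field with UNNEST, and filtering early all
# happen inside BigQuery, before the rows are serialized and sent to ADF.
source_query = """
SELECT
  o.order_id,
  o.order_date,
  item.sku,
  item.quantity
FROM `my-project.my_dataset.orders` AS o,
     UNNEST(o.line_items) AS item
WHERE o.order_date >= '2024-01-01'
"""
```

You would paste that SQL into the query option of the ADF Google BigQuery source (instead of pointing the source at the whole table) and pair it with a Parquet sink dataset for the last tip.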
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.