Hello @Jona, so are you expecting the CSV file size on your disk to equal the ingress data size in Event Hubs?
Yes. However, I ran an interesting experiment. These are the parameters:
- 3,000 CSV files, with a total size of 126 MB on disk (120 MB as the plain file size rather than the size on disk; I'm not sure how else to express this)
- These files are uploaded to ADLS, transformed to JSON, and pushed to Event Hubs.
- Converted to JSON on my local computer, the CSV files total 324 MB.
These are the metrics:
- 131 MB enters ADLS. Compared to the 126 MB on disk, the difference is negligible. However, I would like to know where those extra 5 MB come from.
- After the Function transforms the data to JSON, 524 MB enters Event Hubs. That is a difference of almost 200 MB from my local calculation of 324 MB.
I can attribute that difference to the transformation from CSV to JSON, since I add some fields to the resulting JSON.
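A rough way to test that hypothesis is to serialize the same rows as CSV and as JSON Lines, with and without the added fields, and compare byte counts. This is a minimal sketch: the sample rows and the `source_file` / `ingested_at` fields are hypothetical stand-ins for the real data and the fields the Function adds.

```python
import csv
import io
import json

# Hypothetical sample rows; the real files will have other columns.
ROWS = [
    {"id": "1", "temperature": "21.5", "humidity": "40"},
    {"id": "2", "temperature": "22.0", "humidity": "41"},
    {"id": "3", "temperature": "21.8", "humidity": "39"},
]


def csv_bytes(rows):
    """UTF-8 size of the rows written as CSV with one header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return len(buf.getvalue().encode("utf-8"))


def jsonl_bytes(rows, extra=None):
    """UTF-8 size of the rows as compact JSON Lines, optionally adding extra fields to every record."""
    lines = []
    for row in rows:
        record = dict(row, **(extra or {}))
        lines.append(json.dumps(record, separators=(",", ":")))
    return len("\n".join(lines).encode("utf-8"))


if __name__ == "__main__":
    # "source_file" and "ingested_at" are hypothetical added fields.
    extra = {"source_file": "file_0001.csv", "ingested_at": "2024-01-01T00:00:00Z"}
    print("CSV:", csv_bytes(ROWS))
    print("JSON:", jsonl_bytes(ROWS))
    print("JSON + extra fields:", jsonl_bytes(ROWS, extra))
```

Because JSON repeats every key name in every record while CSV states the column names once, the gap grows with the number of rows, and each added field adds its key and value to every record.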
I'll be conducting some other tests.
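On the 5 MB gap between 126 MB and 131 MB, two things are worth ruling out before looking at ADLS itself. Windows Explorer reports sizes in binary units (1 "MB" = 1,048,576 bytes), and "Size on disk" rounds each file up to a whole cluster. The sketch below assumes a 4 KiB cluster (the usual NTFS default) and assumes, without having verified it, that the Azure metric uses decimal megabytes; the file sizes in the example are hypothetical.

```python
import math

# Assumption: 4 KiB allocation unit, the usual NTFS default cluster size.
CLUSTER_BYTES = 4096
BYTES_PER_MIB = 1024 * 1024  # binary "MB", as shown by Windows Explorer
BYTES_PER_MB = 1000 * 1000   # decimal MB


def size_on_disk(size_bytes: int, cluster: int = CLUSTER_BYTES) -> int:
    """Round a file's logical size up to a whole number of clusters."""
    return math.ceil(size_bytes / cluster) * cluster


def mib_to_mb(mib: float) -> float:
    """Convert binary megabytes (MiB) to decimal megabytes (MB)."""
    return mib * BYTES_PER_MIB / BYTES_PER_MB


if __name__ == "__main__":
    # Hypothetical file sizes, for illustration only.
    sizes = [40_000, 41_500, 39_800]
    logical = sum(sizes)
    allocated = sum(size_on_disk(s) for s in sizes)
    print(f"logical={logical} bytes, on disk={allocated} bytes")
    print(f"126 MiB = {mib_to_mb(126):.1f} MB")
```

For example, `mib_to_mb(126)` is about 132.1, so a few MB can appear or disappear purely from which unit each tool uses.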
Did you check the size of the data after it has been transformed to JSON to see if it matches the ingress bytes metric reported by Azure Event Hubs? JSON is a more verbose data format than CSV, which is a compact tabular format, right? What use case are you working on with this benchmarking? Is it a performance test or a load test, or is it related to cost estimation? If the transformation logic is generating unnecessary fields, removing them can reduce the size!
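One way to do that check is to sum the UTF-8 byte length of each serialized event body before sending. This is a sketch with a hypothetical record; the Event Hubs metric may also count bytes beyond the body, so the totals may not match exactly. It also shows that serializer settings alone change the size, which is one possible (unverified) reason the Function's output could differ from a local conversion.

```python
import json

# Hypothetical record standing in for one event produced by the Function.
record = {"id": 1, "temperature": 21.5, "humidity": 40, "source_file": "file_0001.csv"}


def payload_bytes(obj, **dumps_kwargs):
    """Size in bytes of the UTF-8 encoded JSON text that would be sent as an event body."""
    return len(json.dumps(obj, **dumps_kwargs).encode("utf-8"))


def total_payload_bytes(events, **dumps_kwargs):
    """Sum of event body sizes, to compare against the Event Hubs ingress metric."""
    return sum(payload_bytes(e, **dumps_kwargs) for e in events)


if __name__ == "__main__":
    compact = payload_bytes(record, separators=(",", ":"))
    default = payload_bytes(record)              # json.dumps uses ", " and ": " by default
    indented = payload_bytes(record, indent=2)   # pretty-printing adds newlines and spaces
    print(compact, default, indented)
    print(total_payload_bytes([record] * 1000, separators=(",", ":")))
```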
It's a load test that should give some conclusions for budget estimation. So:
- Pushing CSV files to Event Hubs seems to be more budget-friendly.
- I'll run other tests to confirm whether the fields I add account for those 200 MB.
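For the budget side, payload size matters because Event Hubs bills ingress per event in fixed-size units and limits throughput per throughput unit (TU). The sketch below assumes a 64 KB billing unit and roughly 1 MB/s of ingress per TU (both should be checked against the current Event Hubs pricing and quotas pages), and assumes each file becomes one event. The example sizes are simply 126 MB and 524 MB divided by 3,000 files.

```python
import math

# Assumptions to verify against the current Azure Event Hubs pricing page:
#  - ingress events are billed in 64 KB units (a larger event counts as several),
#  - one throughput unit (TU) allows about 1 MB/s of ingress.
BILLING_UNIT_BYTES = 64 * 1024
TU_INGRESS_BYTES_PER_SEC = 1024 * 1024


def billable_ingress_events(event_sizes):
    """Number of billable ingress events for a list of event body sizes in bytes."""
    return sum(max(1, math.ceil(size / BILLING_UNIT_BYTES)) for size in event_sizes)


def throughput_units_needed(total_bytes, duration_seconds):
    """Minimum TUs to ingest total_bytes within duration_seconds at the assumed per-TU rate."""
    rate = total_bytes / duration_seconds
    return max(1, math.ceil(rate / TU_INGRESS_BYTES_PER_SEC))


if __name__ == "__main__":
    # Hypothetical: each of the 3,000 files sent as a single event.
    csv_sizes = [42_000] * 3000   # about 126 MB / 3,000
    json_sizes = [175_000] * 3000  # about 524 MB / 3,000
    print("CSV billable events:", billable_ingress_events(csv_sizes))
    print("JSON billable events:", billable_ingress_events(json_sizes))
    print("TUs for JSON in 10 minutes:", throughput_units_needed(sum(json_sizes), 600))
```

Under these assumptions, a larger JSON body can cost more than proportionally, because crossing a billing-unit boundary counts as an extra event.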
I'd welcome some opinions on transmitting CSV vs. JSON files to Event Hubs. Are my experiments sound? In the end, I have to report on all this to my manager.
By the way, the metrics display looks great. How did you build it? Is there a public GitHub link we can take a look at?
It's just a normal Azure Workbook built on top of metrics. I don't know how to version it.
Regards