About Event Hub Ingress Bytes metric

Jona 395 Reputation points
2024-02-23T16:36:12.03+00:00

Hi everyone, I just ran an experiment using ADLS, Event Hubs, and Functions. A shell script uploads CSV files to ADLS, then a Function runs, transforms each file to JSON, and finally pushes it to Event Hubs. Everything works so far. I built a workbook to follow the execution, and these are the results: User's image

In the first test, I uploaded 7,500 CSV files containing approximately 25 rows each. According to the workbook, this translated to 1.31 GB of ingress on Event Hubs. In the second experiment, I uploaded 4,000 CSV files of approximately 110 rows each, which amounted to roughly 695 MB of ingress. This is the workbook for the first case: User's image

This is the second case: User's image

I've checked the file properties for the second case, and the files don't even add up to 500 MB (even with 10,000 files): User's image

Can anybody help me understand this metric and why it doesn't reflect the file sizes on disk? I don't think the difference is due to the transformation from CSV to JSON.

Regards
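One way to check how much the CSV-to-JSON step alone changes the payload size is to serialize a small sample both ways and compare byte counts. The following is only a minimal sketch: the Function's actual code is not shown in this thread, so the helper `csv_to_json_events` and the column names are hypothetical, and it assumes one JSON object per CSV row.

```python
import csv
import io
import json

def csv_to_json_events(csv_text):
    """Turn each CSV row into one JSON string keyed by the header names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(row) for row in reader]

# Hypothetical sample; the real files and column names are not shown in the post.
sample_csv = (
    "device_id,timestamp,temperature\n"
    "dev-001,2024-02-23T16:00:00Z,21.5\n"
    "dev-002,2024-02-23T16:00:01Z,22.1\n"
)

events = csv_to_json_events(sample_csv)
csv_bytes = len(sample_csv.encode("utf-8"))
json_bytes = sum(len(e.encode("utf-8")) for e in events)

print(f"CSV:  {csv_bytes} bytes")
print(f"JSON: {json_bytes} bytes ({json_bytes / csv_bytes:.2f}x)")
```

In CSV the column names appear once in the header, while in this kind of JSON output they are repeated in every row, so the JSON total is larger even before any extra fields are added.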

Azure Event Hubs

1 answer

  1. Jona 395 Reputation points
    2024-02-26T03:18:59.0966667+00:00

    Hi

    Hello @Jona. So, are you expecting the CSV file size on your disk to equal the ingress data size in Event Hubs?

    Yes. However, I ran an interesting experiment. These are the parameters:

    • 3,000 CSV files, with a total size of 126 MB on disk (120 MB "not on disk"; I don't know how to express this) User's image
    • Those files are uploaded to ADLS, transformed to JSON, and pushed to Event Hubs.
    • The same CSV files converted to JSON on my local computer total 324 MB User's image

    These are the metrics:

    User's image

    Surprisingly,

    • 131 MB entered ADLS. Compared to those 126 MB, the difference is negligible. However, I would like to know where those extra 5 MB came from.
    • After the Function transformed the files to JSON, 524 MB entered Event Hubs. That is a difference of almost 200 MB between my local calculation and the Event Hubs metric.

    I can attribute that difference to the transformation from CSV to JSON, since I add some fields to the resulting JSON.
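To put the figures from this experiment side by side, the ratios can be worked out with a few lines of arithmetic. This sketch uses only the numbers reported above (in MB); the variable names are just labels chosen for illustration.

```python
# Figures reported in this post, in MB.
csv_on_disk = 126
adls_ingress = 131
json_local = 324
event_hub_ingress = 524

def ratio(a, b):
    """Return a / b rounded to two decimals."""
    return round(a / b, 2)

print("Local JSON vs CSV on disk:", ratio(json_local, csv_on_disk))
print("Event Hubs ingress vs local JSON:", ratio(event_hub_ingress, json_local))
print("Event Hubs ingress vs CSV on disk:", ratio(event_hub_ingress, csv_on_disk))
print("ADLS ingress minus CSV on disk (MB):", adls_ingress - csv_on_disk)
print("Event Hubs ingress minus local JSON (MB):", event_hub_ingress - json_local)
```

So the local JSON is about 2.57 times the CSV size, and the Event Hubs metric is about 1.62 times the local JSON size, which is the 200 MB gap still to be explained.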

    User's image

    I'll be conducting some others tests

    Did you check the size of the data after it has been transformed to JSON to see if it matches the ingress bytes metric reported by Azure Event Hubs? JSON is a more verbose, complex data structure compared to CSV, which is a compact tabular format, right? What use case are you working on with this benchmarking? Is it a performance test or load test, or is it related to cost estimation? If the transformation logic is generating unnecessary fields, removing them can reduce the file size.

    It is a load test that must yield some conclusions for budget estimations. So:

    • Pushing CSV files to Event Hubs seems to be more budget friendly
    • I'll run other tests to confirm whether the fields I add account for those 200 MB
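A test of that second point could compare the serialized size of the same records with and without the added fields. This is a rough sketch only: the rows and the extra fields (`source_file`, `processed_at`) are hypothetical stand-ins, since the real fields added by the Function are not listed here.

```python
import json

def serialized_size(records):
    """Total UTF-8 size in bytes of the records, each serialized as JSON."""
    return sum(len(json.dumps(r).encode("utf-8")) for r in records)

# Hypothetical rows and extra fields; the real ones are not shown in the post.
rows = [
    {"device_id": "dev-001", "temperature": "21.5"},
    {"device_id": "dev-002", "temperature": "22.1"},
]
extra_fields = {
    "source_file": "batch-0001.csv",
    "processed_at": "2024-02-26T03:00:00Z",
}

plain = serialized_size(rows)
enriched = serialized_size([{**r, **extra_fields} for r in rows])
added_share = (enriched - plain) / enriched

print(f"without extra fields: {plain} bytes")
print(f"with extra fields:    {enriched} bytes")
print(f"share due to extra fields: {added_share:.0%}")
```

Running the same comparison on a real batch would show whether the added fields are large enough to account for the gap, or whether something else contributes to the ingress metric.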

    I would welcome opinions on transmitting CSV or JSON files to Event Hubs. Are my experiments sound? In the end, I have to report on all this to my manager.

    By the way, the metrics display looks great. How did you build it? Is there a public GitHub link we can take a look at?

    It's just a normal Azure workbook built on top of metrics. I don't know how to version it.

    Regards
