Azure Kafka brokers performance bottlenecks

Sharma,Rajneesh 0 Reputation points
2025-06-06T16:07:20.0066667+00:00

Hi

My client has two Kafka environments. One is on-premises (Kafka brokers, ZooKeeper nodes, Schema Registry, Connect servers, etc.) and the other is on Azure (all components). Both environments use virtual machines and virtual disks (VMware virtualization on-premises, Hyper-V on Azure). We are noticing that even though the workload on the Azure servers is much lower than the workload on the on-premises servers, the Azure servers still take more time. We have Dynatrace dashboards with side-by-side charts comparing the following metrics:

Brokers disk write operations

Brokers disk read operations

Brokers throughput read

Brokers throughput write

Brokers write time

Brokers read time

The brokers in both environments have exactly the same disk queue size. I have also compared the features of the virtual disks in both environments, and they are almost the same.

Any idea why the Azure disks respond more slowly, and how I can diagnose the reasons for the slowness?


1 answer

  1. Divyesh Govaerdhanan 5,770 Reputation points
    2025-06-08T00:27:27.32+00:00

    Hello,

    Welcome to Microsoft Q&A,

    This kind of slowdown is typically tied to the nuances of Azure's disk I/O architecture and to how Kafka workloads stress disks differently in virtualized environments.

    If your Kafka brokers are using Standard SSDs or Premium SSDs below P30, you're likely I/O bound, not CPU/memory bound. Each Azure VM size has a maximum aggregate throughput/IOPS limit across all attached disks, so even if a disk supports more IOPS, the VM size can throttle total disk performance. Azure managed disks are also network-attached, which generally makes them slower than local storage; the exception is a VM family such as the Lsv3 series, which provides local (ephemeral) NVMe disks.
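    The interaction between per-disk limits and the VM-level cap can be sketched in a few lines of Python. This is only an illustration: the `Limits` type, the function names, and every number below are hypothetical placeholders, so substitute the documented limits for your actual VM size and disk tiers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Limits:
    iops: int
    mbps: int


def effective_limits(vm_cap: Limits, disks: List[Limits]) -> Limits:
    """Sum the per-disk limits, then clamp the totals to the VM-level cap."""
    total_iops = sum(d.iops for d in disks)
    total_mbps = sum(d.mbps for d in disks)
    return Limits(min(vm_cap.iops, total_iops), min(vm_cap.mbps, total_mbps))


def near_limit(observed: float, limit: float, threshold: float = 0.9) -> bool:
    """Flag a metric that sits at or above a chosen fraction of its limit."""
    return observed >= threshold * limit


# Hypothetical numbers, for illustration only.
vm_cap = Limits(iops=6400, mbps=96)
disks = [Limits(iops=5000, mbps=200), Limits(iops=5000, mbps=200)]

eff = effective_limits(vm_cap, disks)
print(eff)                        # the VM cap wins on both dimensions here
print(near_limit(90, eff.mbps))   # observed 90 MB/s against a 96 MB/s ceiling
```

    If the peak write throughput on your dashboards sits close to the effective limit while the on-premises brokers have more headroom, throttling at the VM level is a plausible explanation worth checking first.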

    You could try the following recommendations to improve disk performance:

    1. Upgrade to higher-tier Premium SSDs (P30+) or Ultra Disks
    2. Move to Lsv3 series VMs for local NVMe disks (ephemeral, extremely fast)
    3. Split Kafka log directories across multiple disks for better parallelism
    4. Ensure caching settings are optimal (usually None for Kafka logs)
    5. Use Azure Disk bursting cautiously – burst credits may deplete and throttle performance
    6. Scale horizontally – more brokers with smaller load each

    Please Upvote and accept the answer if it helps!!
