What is the maximum throughput of the Kafka REST proxy enabled on an HDInsight Kafka cluster?

Sai Birada 21 Reputation points

I would like to set up a Kafka cluster that needs an ingestion (producer) throughput of around 150 MB/s. To achieve that in my local setup, I need 4 REST proxy servers with 8 CPUs each. However, when I try to create a Kafka cluster in HDInsight, the only option I get is to enable the default REST proxy. Is there any internal scaling built into the default REST proxy that can handle my 150 MB/s load, or do I need to set up additional REST proxies manually to reach my desired throughput?

Azure HDInsight
An Azure managed cluster service for open-source analytics.

Accepted answer
  1. KranthiPakala-MSFT 45,807 Reputation points Microsoft Employee

    Hi @Sai Birada,

    Sorry for the delay in responding. We have heard back from the internal team about your query.

    Kafka REST proxy producer throughput depends on the configuration of the brokers and the Kafka management nodes. Shoebox metrics are available in the Azure portal to help you determine whether the Kafka management nodes are the bottleneck for your current load and configuration.

    The Kafka REST proxy performs very well when producing messages.

    Throughput: up to 250 MB/sec
    Latency (p95):
    • Kafka broker: 10 ms ~ 12 ms
    • Kafka REST proxy: 13 ms ~ 15 ms
    • Client E2E: 22 ms ~ 25 ms

    These results come from a stress test of the Kafka REST proxy run with the following configuration:
    • REST proxy requests - 15 batched messages of 1 KB each per request
    • Producer config - 16 KB batch size, gzip compression
    • Topic config - 4 partitions with 3 replicas
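
To relate the test configuration above to the 150 MB/s target in the question, here is a rough back-of-the-envelope sketch in Python. It only counts message payload (15 messages of 1 KB per request) and ignores HTTP and JSON overhead; the function name and the choice of 1 MB = 1024 KB are assumptions made for illustration, not something stated by the HDInsight team.

```python
# Back-of-the-envelope sizing. All inputs are assumptions; replace with your own numbers.
TARGET_MB_PER_SEC = 150     # ingestion target from the question
MESSAGES_PER_REQUEST = 15   # batching used in the stress test
MESSAGE_SIZE_KB = 1         # message size used in the stress test


def requests_per_second(target_mb_per_sec, messages_per_request, message_size_kb):
    """Return the REST request rate needed to carry the target payload volume."""
    payload_kb_per_request = messages_per_request * message_size_kb
    target_kb_per_sec = target_mb_per_sec * 1024  # 1 MB taken as 1024 KB
    return target_kb_per_sec / payload_kb_per_request


rate = requests_per_second(TARGET_MB_PER_SEC, MESSAGES_PER_REQUEST, MESSAGE_SIZE_KB)
print(f"{rate:.0f} requests/s")  # prints "10240 requests/s"
```

With these inputs the REST proxy tier would have to sustain roughly ten thousand requests per second, which gives a concrete number to compare against what your own load tests show.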

    For producing messages, the bottleneck was the Kafka REST proxy servers.
    As a result, the replication factor and the number of partitions had little effect on overall performance.
    Increasing the number of Kafka REST proxy servers improved performance.

    FYI, for consuming messages, the bottleneck was the Kafka brokers.

    Please note that performance depends on your business use case (traffic load), so your results may differ from those above.

    We encourage you to run performance tests with your anticipated workloads.
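
As a starting point for such a test, here is a minimal single-threaded measurement sketch in Python. It is not an HDInsight tool: `measure` and `fake_send` are hypothetical names, and `fake_send` is only a stand-in that sleeps for about 1 ms. To test a real cluster you would replace it with your own HTTP call to the REST proxy and most likely run several clients in parallel.

```python
import math
import time


def measure(send_batch, payload, n_requests):
    """Call send_batch(payload) n_requests times; return (MB/s, p95 latency in ms)."""
    latencies = []
    start = time.perf_counter()
    for _ in range(n_requests):
        t0 = time.perf_counter()
        send_batch(payload)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    # Nearest-rank 95th percentile of the per-request latencies.
    latencies.sort()
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)

    mb_per_sec = len(payload) * n_requests / elapsed / (1024 * 1024)
    return mb_per_sec, latencies[p95_index] * 1000


def fake_send(payload):
    """Hypothetical stand-in for a real HTTP POST to the REST proxy."""
    time.sleep(0.001)


# 15 messages of 1 KB each, mirroring the batching in the stress test above.
payload = b"x" * (15 * 1024)
throughput, p95_ms = measure(fake_send, payload, 200)
print(f"{throughput:.1f} MB/s, p95 {p95_ms:.2f} ms")
```

Keeping the send function injectable lets the same loop be pointed at a local proxy and at the HDInsight one, so the two results are measured the same way.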

    Hope this info helps.


    Thank you
    Please consider clicking "Accept Answer" and "Upvote" on the post that helps you, as it can benefit other community members.
