How to measure network latency in AKS or App Service?

Question

Tanul 1,291

Team,

I trust everyone is doing well.

For hosting a REST API we can use Azure Kubernetes Service (AKS) or App Service, but how do we measure network latency, especially after the API has been running in production for months?

At times we host single-file binaries built with .NET Core, Go, or Rust on these platforms, but after a few months we notice latency in them.

Can anyone suggest software, or a design pattern in the programming language, that can be used to measure network latency in the API, especially when it is hosted in Azure Kubernetes Service?

Please help, because network latency is becoming a major concern, especially with public clouds.

Thank you

Accepted answer

Answer 1

Hi @Tanul

- There is an ongoing issue with Q&A where the activity for some accounts is not showing up. The dev team is investigating and working on a fix.

- Regarding your question:
AKS Engineering has identified an issue that causes customers to see service, workload, and networking instability when running under load or with large numbers of ephemeral, periodic events (jobs). These failures are the result of disk I/O saturation and throttling at the file operation (IOPS) level.

The OS disks of worker node VMs running customer workloads are regularly I/O throttled or saturated because of the underlying quota of the storage device. This can lead to cluster and workload failure.

This issue should be investigated (as documented in the link below) if you are seeing worker node/workload or API server unavailability. This issue can lead to NodeNotReady and loss of cluster availability in extreme cases.

Issue identification using the Prometheus operator (recommended)
The Prometheus operator project provides a best-practice set of monitoring and metrics for Kubernetes that covers all of the metrics above and more.

We recommend the operator because it provides a simple Helm-based installation as well as all of the Prometheus monitoring, Grafana charts, configuration, and default metrics critical to understanding performance, latency, and stability issues such as this one.

Additionally, the Prometheus operator deployment is specifically designed to be highly available. This helps significantly in scenarios where metrics could otherwise be lost due to container or cluster outages.

Customers are encouraged to examine these dashboards and reproduce them in their own metrics/monitoring pipeline by copying the USE (Utilization, Saturation, and Errors) metrics/dashboard, as well as the pod-level, namespace-level, and node-level utilization reports from the operator. The node reports clearly show OS disk saturation, which leads to high system latency and degraded application/cluster performance.
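If you want a quick check before the dashboards are in place, a rough way to see whether disk I/O is contributing to the latency is to time a small synchronous write from inside a pod. This is only a minimal sketch, not something from the linked issue: the function names `fsync_latency_ms` and `sample_disk` are made up for illustration, and the 4 KiB payload and sample count are arbitrary assumptions.

```python
import os
import tempfile
import time


def fsync_latency_ms(directory, size_bytes=4096):
    """Time one small write followed by fsync, in milliseconds.

    Hypothetical helper: the payload size is an arbitrary assumption.
    """
    payload = os.urandom(size_bytes)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        os.write(fd, payload)
        os.fsync(fd)  # force the data to the device, not just the page cache
        return (time.perf_counter() - start) * 1000.0
    finally:
        os.close(fd)
        os.remove(path)  # leave no probe files behind


def sample_disk(directory, count=10):
    """Collect several samples and report min, median and max."""
    samples = sorted(fsync_latency_ms(directory) for _ in range(count))
    return {
        "min": samples[0],
        "median": samples[len(samples) // 2],
        "max": samples[-1],
    }
```

Calling `sample_disk("/tmp")` at intervals and logging the result gives a trend you can compare against your API response times. If the maximum grows sharply while the API slows down, the disk is a likely suspect rather than the network.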

Please find a very detailed description of the issue as well as recommendations here: https://github.com/Azure/AKS/issues/1373
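For the network side of your question, one simple pattern is an active probe that repeatedly times a TCP handshake to a dependency and reports percentiles, so you can track the trend over months. The sketch below uses only the Python standard library; the names `tcp_connect_latency_ms`, `summarize` and `probe` are hypothetical, and the sample count and interval are assumptions you would tune. The same idea carries over to .NET Core, Go, or Rust.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, timeout=2.0):
    """Time one TCP connect in milliseconds.

    If host is a name rather than an IP address, the measurement
    also includes DNS resolution time.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it immediately
    return (time.perf_counter() - start) * 1000.0


def summarize(samples_ms):
    """Summarize latency samples; needs at least two samples."""
    ordered = sorted(samples_ms)
    # 99 cut points: index 49 is the 50th percentile, 94 the 95th, 98 the 99th
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "min": ordered[0],
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": ordered[-1],
    }


def probe(host, port, count=20, interval_s=0.1):
    """Take `count` connect-time samples and return their summary."""
    samples = []
    for _ in range(count):
        samples.append(tcp_connect_latency_ms(host, port))
        time.sleep(interval_s)
    return summarize(samples)
```

You could run `probe("my-service.default.svc.cluster.local", 80)` (a placeholder address) from a pod on a schedule and export the result to whatever monitoring pipeline you use. Connect time covers only the handshake round trip, so it helps separate network latency from time spent inside the application or waiting on disk.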
