Hello @Justin Vieth
Thank you for your question. As you mentioned, this is an intermittent issue over a peered connection: the connection itself works fine, but you see latency once the workload runs.
Since the connectivity between the AKS nodes and the SQL Server is internal, the network path itself is unlikely to be the cause. It could be a different issue:
1- It could be OS disk throttling: I/O latency caused by IOPS or throughput limits, which may affect the cluster nodes or the SQL Server node.
AKS has identified this issue as contributing significantly to the following common error / failure reports:
· AKS Cluster nodes going NotReady (intermittent/under load, periodic tasks)
· Performance and stability issues when using Istio or complex operator configurations.
· Networking errors (intermittent/under load) and latency (intermittent) (pod, container, networking, latency inbound or outbound), including high latency when reaching other Azure services from AKS worker nodes.
· API server timeouts, disconnects, tunnelfront and kube-proxy failures under load.
· Connection timed out accessing the Kubernetes API server
· Slow pod, Docker container, and job execution
· Slow DNS queries / CoreDNS latency spikes
· "GenericPLEG" / Docker PLEG errors on worker nodes
· RPC Context deadline exceeded in kubelet/docker logs
· Slow PVC attach/detach time at container start or under load / failover
Please check this link.
2- Check for CPU and memory spikes on your AKS cluster and on your SQL Server while the workload runs.
You can troubleshoot your AKS cluster with the "Diagnose and solve problems" feature in the Azure portal.
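If you prefer the command line, the spikes can also be spotted with `kubectl top`. The following is only a sketch: it assumes kubectl is already pointed at your AKS cluster and that metrics-server is running there (it normally is on AKS). The function name `show_usage_hotspots` is a made-up helper for illustration.

```shell
# Hypothetical helper: print current node usage and per-pod memory use.
# Assumes kubectl targets the AKS cluster and metrics-server is available.
show_usage_hotspots() {
  # CPU and memory currently used by each node
  kubectl top nodes
  # Pods in all namespaces, sorted by memory use
  kubectl top pods --all-namespaces --sort-by=memory
}
```

Run `show_usage_hotspots` while the workload is active and compare the numbers with a quiet period.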
Node resources are utilized by AKS to make the node function as part of your cluster. This can create a discrepancy between your node's total resources and the resources allocatable when used in AKS. This is important to note when setting requests and limits for user-deployed pods.
To find a node's allocatable resources, run:
kubectl describe nodes
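`kubectl describe nodes` prints a lot of detail per node. If you only want capacity next to allocatable, a custom-columns query along these lines may be easier to read. This is a sketch, and `show_node_resources` is just an illustrative name.

```shell
# Hypothetical helper: list total capacity vs. allocatable CPU and memory per node.
# Assumes kubectl is already pointed at the AKS cluster.
show_node_resources() {
  kubectl get nodes -o custom-columns='NAME:.metadata.name,CPU_CAPACITY:.status.capacity.cpu,CPU_ALLOCATABLE:.status.allocatable.cpu,MEM_CAPACITY:.status.capacity.memory,MEM_ALLOCATABLE:.status.allocatable.memory'
}
```

The gap between the capacity and allocatable columns is what AKS reserves on each node.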
If the results above show that the utilization is expected, you can try increasing the number of nodes in the cluster and distributing the workload across them. This helps reduce memory utilization on individual nodes and eases disk IOPS throttling.
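Adding nodes can be done from the portal or with the Azure CLI. A minimal sketch with `az aks scale` follows; the helper name and its three arguments (resource group, cluster name, node count) are placeholders for your own values. If the cluster has more than one node pool, you will most likely also need to pass `--nodepool-name`.

```shell
# Hypothetical helper: manually scale an AKS cluster to a given node count.
# Arguments: resource group, cluster name, desired node count (all placeholders).
scale_aks_cluster() {
  az aks scale --resource-group "$1" --name "$2" --node-count "$3"
}
```

For example, `scale_aks_cluster myResourceGroup myAKSCluster 5` would request five nodes (both names here are made up).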
As a best practice, you should set requests and limits in the resources section of each pod.
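As an illustration of requests and limits, the snippet below writes a small Pod manifest to a file. Everything in it (the file name, the pod name `sample-app`, the `nginx` image, and the CPU/memory values) is a placeholder, not a recommendation for your workload.

```shell
# Write a hypothetical Pod manifest with resource requests and limits.
# The pod name, image, and all values are placeholders; tune them to your workload.
cat > pod-with-limits.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: sample-app
spec:
  containers:
    - name: sample-app
      image: nginx
      resources:
        requests:        # what the scheduler reserves for the container
          cpu: 250m
          memory: 256Mi
        limits:          # the maximum the container is allowed to use
          cpu: 500m
          memory: 512Mi
EOF
```

You could then apply it with `kubectl apply -f pod-with-limits.yaml`. Keep the requests within the node's allocatable resources, otherwise the pod cannot be scheduled.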
Thank you!
If this has been helpful, please take a moment to accept answers as this helps increase visibility of this question for other members of the Microsoft Q&A community. Thank you for helping to improve Microsoft Q&A!