Persistent TCP connections to AKS are dropping

Hans van den Elsen
2024-07-05T06:35:20.4966667+00:00

I have a device-gateway application (ASP.NET) that accepts incoming TCP connections and keeps them alive. The number of connections is growing: we are now close to 3,000 and expect around 10,000 by the end of the year.

My database sometimes gets overloaded by connection requests, often at night. For some reason, connections drop and the clients reconnect, which causes a load spike on the database.

I am not sure why the connections are dropping. What could be a good way to investigate this issue?

The logs and Application Insights show a clear, massive drop in connections.

Could it be related to network limits, such as bandwidth? If so, how could I control that?

I have this deployment in my AKS cluster (Standard tier):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: device-gateway
  labels:
    app: device-gateway
spec:
  replicas: 2
  revisionHistoryLimit: 0
  selector:
    matchLabels:
      app: device-gateway
  template:
    metadata:
      labels:
        app: device-gateway
    spec:
      containers:
      - name: device-gateway
        image: registry.gitlab.com/my-images/image:latest
        imagePullPolicy: Always
      imagePullSecrets:
      - name: myregistrykey
---
apiVersion: v1
kind: Service
metadata:
  name: device-gateway
spec:
  type: ClusterIP
  ports:
  - name: my-tcp-port
    port: 8485
---

I am running an ingress-nginx controller that is installed like this:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm uninstall nginx-ingress
helm install nginx-ingress ingress-nginx/ingress-nginx \
    --set controller.replicaCount=2 \
    --set controller.service.externalTrafficPolicy=Local \
    --set controller.nodeSelector."kubernetes\.io/os"=linux \
    --set defaultBackend.nodeSelector."kubernetes\.io/os"=linux \
    --set controller.admissionWebhooks.patch.nodeSelector."kubernetes\.io/os"=linux \
    --set controller.service.loadBalancerIP="[MY-PUBLIC-IP]" \
    --set controller.service.annotations."service\.beta\.kubernetes\.io/azure-dns-label-name"="[MY-DNS-LABEL]" \
    --set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-resource-group"="[MY-RESOURCE-GROUP]" \
    --set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-tcp-idle-timeout"="30" \
    --set tcp.8485="default/device-gateway:8485" \
    --set podLabels.app='nginx'
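
One check that might help tell an idle timeout somewhere on this path (load balancer, ingress, or the application itself) apart from other causes is to hold a single connection open without sending anything and time how long it survives. Below is a minimal Python sketch of such a probe. The function name `measure_idle_lifetime` and the host in the usage comment are hypothetical, and the sketch assumes the gateway accepts a connection that never sends data. It only notices a close or reset that actually reaches the client, so a flow that is dropped silently in the middle would not show up.

```python
import socket
import time


def measure_idle_lifetime(host, port, max_wait=3600.0, poll=1.0):
    """Open a TCP connection, send nothing, and return the number of seconds
    until the connection is closed or reset from the other side.

    Returns None if the connection is still open after max_wait seconds.
    (Hypothetical helper for illustration, not part of any library.)
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.settimeout(poll)  # wake up regularly to check max_wait
        while time.monotonic() - start < max_wait:
            try:
                data = sock.recv(1)
            except socket.timeout:
                continue  # nothing received: still idle, still connected
            except OSError:
                return time.monotonic() - start  # connection was reset
            if data == b"":
                return time.monotonic() - start  # orderly close by the peer
            # Any bytes the server sends on its own are ignored.
    return None


# Example (hypothetical host; 8485 is the TCP port exposed above):
# print(measure_idle_lifetime("gateway.example.com", 8485, max_wait=2400))
```

If the measured lifetime comes out the same on every run, it can be compared with the timeouts configured on the path, for example the 30-minute value set through the `azure-load-balancer-tcp-idle-timeout` annotation above. Running the probe once from outside the cluster and once directly against the `device-gateway` service would indicate on which hop the close originates.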

I hope someone can give me some insight.

Azure Kubernetes Service (AKS)