Persistent TCP connections to AKS are dropping
I have a device-gateway application (ASP.NET) that accepts incoming TCP connections and keeps them alive. The number of connections keeps growing and is now close to 3,000; we expect around 10,000 by the end of the year.
I see that my database sometimes gets overloaded by connection requests, often at night. For some reason the connections drop and all clients reconnect at roughly the same time, which causes a load spike on the database.
I am not sure why the connections are dropping. What could be a good way to investigate this issue?
The logging/Application Insights clearly show a massive, sudden drop in connections.
Could it be related to network limitations, e.g. bandwidth or a connection limit somewhere? And if so, how could I verify or control that?
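So far the only checks I could come up with are looking at pod restarts and controller logs around the time of a drop, roughly like this (the label selectors come from my setup below):

```shell
# Did the gateway pods restart around the time of the drop?
kubectl get pods -l app=device-gateway \
  -o custom-columns=NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount

# Any evictions, rescheduling or other cluster events at that time?
kubectl get events --sort-by=.metadata.creationTimestamp

# Do the ingress controller logs show TCP stream errors or timeouts?
kubectl logs -l app=nginx --tail=200
```

None of this has shown me anything conclusive so far.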
I have this deployment in my AKS cluster (Standard tier):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: device-gateway
  labels:
    app: device-gateway
spec:
  replicas: 2
  revisionHistoryLimit: 0
  selector:
    matchLabels:
      app: device-gateway
  template:
    metadata:
      labels:
        app: device-gateway
    spec:
      containers:
        - name: device-gateway
          image: registry.gitlab.com/my-images/image:latest
          imagePullPolicy: Always
      imagePullSecrets:
        - name: myregistrykey
---
apiVersion: v1
kind: Service
metadata:
  name: device-gateway
spec:
  type: ClusterIP
  ports:
    - name: my-tcp-port
      port: 8485
I am running an nginx-ingress controller that was installed like this:
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm uninstall nginx-ingress
helm install nginx-ingress ingress-nginx/ingress-nginx \
--set controller.replicaCount=2 \
--set controller.service.externalTrafficPolicy=Local \
--set controller.nodeSelector."kubernetes\.io/os"=linux \
--set defaultBackend.nodeSelector."kubernetes\.io/os"=linux \
--set controller.admissionWebhooks.patch.nodeSelector."kubernetes\.io/os"=linux \
--set controller.service.loadBalancerIP="[MY-PUBLIC-IP]" \
--set controller.service.annotations."service\.beta\.kubernetes\.io/azure-dns-label-name"="[MY-DNS-LABEL]" \
--set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-resource-group"="[MY-RESOURCE-GROUP]" \
--set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-tcp-idle-timeout"="30" \
--set tcp.8485="default/device-gateway:8485" \
--set podLabels.app='nginx'
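One thing I noticed while reading up: there seem to be two idle timeouts in play. The `azure-load-balancer-tcp-idle-timeout` annotation above (which, as far as I can tell, is in minutes), and the ingress-nginx stream proxy timeout, which the ingress-nginx ConfigMap docs suggest defaults to 600s, so fully idle TCP connections through the TCP services proxy would be closed after 10 minutes. Would raising it like this be a sensible experiment (the config key name is taken from the ingress-nginx docs, please correct me if it is wrong)?

```shell
# Raise the nginx stream proxy timeout for TCP services.
# proxy-stream-timeout reportedly defaults to 600s, after which
# idle TCP connections are closed by the controller.
helm upgrade nginx-ingress ingress-nginx/ingress-nginx \
  --reuse-values \
  --set controller.config.proxy-stream-timeout="3600s"
```

I have not tried this yet because I first want to understand whether the drops really are idle timeouts.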
I hope someone can give me some insight.