TCP ClientIP Affinity/Session Stickiness for Pods Over Multiple Ports

M van Staden 20 Reputation points
2023-06-22T13:23:25.1+00:00

We have an application whose clients need to connect to the same pod based on their client IP. The application uses three different ports. We have an Application Gateway that exposes the public IP, with a load balancer behind it.

*IPs are for illustrative purposes only

We've tried running the nginx ingress controller behind a LoadBalancer-type service in our cluster, with the following stream configuration (the IPs are the backend pod IPs):

stream { 
	upstream app1 {     
		zone app1 256k;      
		hash $remote_addr consistent;      
		server 10.200.1.50:56000;     
		server 10.200.2.10:56000;     
		server 10.200.3.15:56000;     
		server 10.200.4.20:56000; 
	}  
	server {      
		listen 56000;     
		proxy_pass app1; 
	}  

	upstream app2 {     
		zone app2 256k;      
		hash $remote_addr consistent;      
		server 10.200.1.50:56002;     
		server 10.200.2.10:56002;     
		server 10.200.3.15:56002;     
		server 10.200.4.20:56002; 
	}  
	server {      
		listen 56002;     
		proxy_pass app2; 
	}
  
	upstream app3 {     
		zone app3 256k;      
		hash $remote_addr consistent;      
		server 10.200.1.50:56001;     
		server 10.200.2.10:56001;     
		server 10.200.3.15:56001;     
		server 10.200.4.20:56001; 
	}  
	server {      
		listen 56001;     
		proxy_pass app3; 
	}   
}

Unfortunately, the consistent-hash ring differs per upstream (most likely because each ring is built over the server IP:port combinations, which differ between groups), so traffic from the same client IP on different ports does not land on the same pod. We need separate upstream groups because each connection must be forwarded to the backend on the same port it arrived on.
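
One workaround we are considering (a sketch only; the bucket split and hardcoded pod IPs are assumptions, and pod IPs are ephemeral, so this is fragile) is to map each client IP to a backend IP once, with no port in the key, and reuse that choice on every listener via $server_port:

stream {
	# Bucket each client IP to one backend pod IP; the key contains
	# no port, so the choice is identical on every listener.
	split_clients "${remote_addr}" $backend_ip {
		25%	10.200.1.50;
		25%	10.200.2.10;
		25%	10.200.3.15;
		*	10.200.4.20;
	}

	server {
		listen 56000;
		listen 56001;
		listen 56002;
		# $server_port is the port the client connected on, so the
		# connection reaches the chosen pod on the matching port.
		proxy_pass $backend_ip:$server_port;
	}
}

Note that split_clients buckets by percentage rather than by a consistent hash, so the mapping shifts if the pod set changes.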

We then tried routing the traffic from the nginx controller to the cluster IP of the backend service, with ClientIP affinity set on the receiving service:

stream {      
	server { 
		listen 56001; 
		proxy_pass 10.0.5.44:56001; 
	} 
	server { 
		listen 56000; 
		proxy_pass 10.0.5.44:56000; 
	} 
	server { 
		listen 56002; 
		proxy_pass 10.0.5.44:56002; 
	} 
}

At present the correct client IP is visible on the nginx controller, but once the connection is proxied to the cluster IP the source is replaced with the nginx pod IP. That is a separate problem; it means that, for now, the service sees the same source IP regardless of the remote client. Even so, ClientIP affinity does not appear to be working: traffic is still being spread across the pods, and I am unsure what in our configuration causes this.
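
If preserving the original source IP all the way to the service turns out to be necessary, nginx's stream module also supports IP transparency (a sketch only; it requires the nginx worker to run with root/CAP_NET_ADMIN and node routing that sends return traffic back through the proxy, which is non-trivial inside a cluster):

stream {
	server {
		listen 56000;
		# Bind upstream connections to the client's own address so the
		# backend sees the real source IP; replies must route back
		# through this proxy.
		proxy_bind $remote_addr transparent;
		proxy_pass 10.0.5.44:56000;
	}
}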

The backend service configuration:

kind: Service
apiVersion: v1
metadata:
  name: camserver-svc
  namespace: camera
  uid: fb973598-908b-4b15-95f5-02582f965757
  resourceVersion: "24990840"
  creationTimestamp: "2023-04-27T08:17:41Z"
  labels:
    app.kubernetes.io/instance: appserver
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: appserver
    app.kubernetes.io/version: 1.16.0
    helm.sh/chart: camserver-0.1.0
  annotations:
    meta.helm.sh/release-name: appserver
    meta.helm.sh/release-namespace: app
spec:
  ports:
    - name: app1
      protocol: TCP
      port: 56000
      targetPort: 56000
    - name: app2
      protocol: TCP
      port: 56002
      targetPort: 56002
    - name: app3
      protocol: TCP
      port: 56001
      targetPort: 56001
  selector:
    app: camserver
  clusterIP: 10.0.0.4
  clusterIPs:
    - 10.0.0.4
  type: ClusterIP
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  internalTrafficPolicy: Cluster
status:
  loadBalancer: {}

The health of the nodes seems fine.

Any input would be greatly appreciated. If there is a way to debug how/why a pod is selected for a given connection, please let me know, or if there is an alternative approach, please indicate how to accomplish it.
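
(For context, the endpoint set and the affinity settings that actually applied can be inspected with kubectl, using the service and namespace names above:)

# Pod IPs backing each service port
kubectl get endpoints camserver-svc -n camera -o yaml

# Confirm the affinity settings on the service
kubectl get svc camserver-svc -n camera \
	-o jsonpath='{.spec.sessionAffinity} {.spec.sessionAffinityConfig}{"\n"}'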


Accepted answer
  1. Ben Gimblett 4,560 Reputation points Microsoft Employee
    2023-06-26T08:37:05.47+00:00

    @M van Staden - I had another look at your question. I don't believe that hosting different endpoints (on different ports) per pod is related to the issue. Your fundamental ask is: "...an application ...needs to connect to the same pod based on the client IP" - so client IP affinity.

    This isn't going to work with a typical out-of-the-box ingress controller (nginx, AGIC, Ambassador, etc.). Ingress controllers are reverse proxies, so between the proxy and the backend the original client IP survives only in a forwarded-for header (where the proxy supports adding one). The source IP of packets leaving the reverse proxy is the proxy's own address, so that the return path goes back through it.
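
    For raw TCP streams like these, the closest analogue to a forwarded-for header is the PROXY protocol, which the nginx stream proxy can prepend (a sketch; the backend application must be able to parse the PROXY protocol header, which is an assumption here):

        stream {
            server {
                listen 56000;
                # Prepend a PROXY protocol header carrying the original
                # client IP/port on each upstream connection.
                proxy_protocol on;
                proxy_pass 10.0.5.44:56000;
            }
        }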

    Without an ingress controller, you can use client IP affinity on the Service object, as sketched below.
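
    A minimal sketch of that approach (the service name is hypothetical; the selector and ports are taken from the question, and externalTrafficPolicy: Local makes the cloud load balancer preserve the client source IP):

        apiVersion: v1
        kind: Service
        metadata:
          name: camserver-lb               # hypothetical name
        spec:
          type: LoadBalancer
          externalTrafficPolicy: Local     # preserve the client source IP
          sessionAffinity: ClientIP
          sessionAffinityConfig:
            clientIP:
              timeoutSeconds: 10800
          selector:
            app: camserver
          ports:
            - name: app1
              protocol: TCP
              port: 56000
              targetPort: 56000
            - name: app2
              protocol: TCP
              port: 56002
              targetPort: 56002
            - name: app3
              protocol: TCP
              port: 56001
              targetPort: 56001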

    With an ingress controller, you can use cookie-based affinity (for controllers that support it), which implies the traffic is HTTP and the client making the connection into the cluster can handle cookies; see the sketch below.
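
    With ingress-nginx, for example, cookie affinity is switched on per Ingress via annotations (a sketch; the host, backend service port, and resource name are placeholders, and this applies to HTTP(S) traffic only, not to the raw TCP ports above):

        apiVersion: networking.k8s.io/v1
        kind: Ingress
        metadata:
          name: camserver-ing              # hypothetical name
          annotations:
            nginx.ingress.kubernetes.io/affinity: "cookie"
            nginx.ingress.kubernetes.io/session-cookie-name: "route"
        spec:
          ingressClassName: nginx
          rules:
            - host: app.example.com        # placeholder host
              http:
                paths:
                  - path: /
                    pathType: Prefix
                    backend:
                      service:
                        name: camserver-svc
                        port:
                          number: 80       # placeholder HTTP port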

    1 person found this answer helpful.

1 additional answer

  1. msrini-MSFT 9,291 Reputation points Microsoft Employee
    2023-06-23T13:23:38.2066667+00:00

    Application Gateway is a reverse proxy, which means that when you send traffic to Application Gateway it creates a new session to its backend, and the client IP is not preserved in the IP header. That is the reason your load balancer is not able to route to the same pod. To solve this, remove Application Gateway and expose your AKS workload with a public load balancer in front, with client-IP-based affinity enabled; a sketch of one way to do that follows.
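
    One way to try that with the existing service (a sketch; it assumes the camserver-svc service from the question, which already has sessionAffinity: ClientIP, and it will allocate a public IP):

        kubectl patch svc camserver-svc -n camera -p \
          '{"spec":{"type":"LoadBalancer","externalTrafficPolicy":"Local"}}'

    externalTrafficPolicy: Local keeps the original client source IP, which the service's ClientIP sessionAffinity then keys on.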

    1 person found this answer helpful.
