Secure traffic between pods by using network policies in AKS
When you run modern, microservices-based applications in Kubernetes, you often want to control which components can communicate with each other. The principle of least privilege should be applied to how traffic can flow between pods in an Azure Kubernetes Service (AKS) cluster. Let's say you want to block traffic directly to back-end applications. The network policy feature in Kubernetes lets you define rules for ingress and egress traffic between pods in a cluster.
This article shows you how to install the network policy engine and create Kubernetes network policies to control the flow of traffic between pods in AKS. Network policies can be used for Linux-based or Windows-based nodes and pods in AKS.
Overview of network policy
All pods in an AKS cluster can send and receive traffic without limitations, by default. To improve security, you can define rules that control the flow of traffic. Back-end applications are often only exposed to required front-end services, for example. Or, database components are only accessible to the application tiers that connect to them.
Network policy is a Kubernetes specification that defines access policies for communication between pods. When you use network policies, you define an ordered set of rules to send and receive traffic. You apply the rules to a collection of pods that match one or more label selectors.
The network policy rules are defined as YAML manifests. Network policies can be included as part of a wider manifest that also creates a deployment or service.
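As a minimal sketch of a "wider manifest", the following snippet writes one file that defines both a service and a network policy for the same pods. The file name, the `example` namespace, and the `app: backend` and `app: frontend` labels are illustrative assumptions, not values used elsewhere in this article.

```shell
# Write a multi-document manifest: a Service plus a NetworkPolicy that
# only admits traffic from pods labeled app=frontend (hypothetical labels).
cat > backend.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: backend
  namespace: example
spec:
  selector:
    app: backend
  ports:
  - port: 80
    protocol: TCP
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
  namespace: example
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - port: 80
      protocol: TCP
EOF

# Apply both resources in one step once the namespace exists:
# kubectl apply -f backend.yaml
```

Keeping the policy next to the service it protects makes it harder to deploy one without the other.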
Network policy options in AKS
Azure provides three Network Policy engines for enforcing network policies:
- Cilium for AKS clusters that use Azure CNI Powered by Cilium.
- Azure Network Policy Manager.
- Calico, an open-source network and network security solution founded by Tigera.
Cilium is the recommended Network Policy engine. Cilium enforces network policy on traffic by using Linux Berkeley Packet Filter (BPF), which is generally more efficient than IPTables. For more information, see the Azure CNI Powered by Cilium documentation.
To enforce the specified policies, Azure Network Policy Manager for Linux uses Linux IPTables. Azure Network Policy Manager for Windows uses Host Network Service (HNS) ACLPolicies. Policies are translated into sets of allowed and disallowed IP pairs. These pairs are then programmed as IPTable or HNS ACLPolicy filter rules.
Differences between Network Policy engines: Cilium, Azure NPM, and Calico
Capability | Azure Network Policy Manager | Calico | Cilium |
---|---|---|---|
Supported platforms | Linux, Windows Server 2022 (Preview). | Linux, Windows Server 2019 and 2022. | Linux. |
Supported networking options | Azure Container Networking Interface (CNI). | Azure CNI (Linux, Windows Server 2019 and 2022) and kubenet (Linux). | Azure CNI. |
Compliance with Kubernetes specification | All policy types are supported. | All policy types are supported. | All policy types are supported. |
Other features | None. | Extended policy model consisting of Global Network Policy, Global Network Set, and Host Endpoint. For more information on using the calicoctl CLI to manage these extended features, see calicoctl user reference. | None. |
Support | Supported by Azure Support and Engineering team. | Supported by Azure Support and Engineering team. | Supported by Azure Support and Engineering team. |
Limitations of Azure Network Policy Manager
Note
With Azure NPM for Linux, we don't allow scaling beyond 250 nodes and 20,000 pods. If you attempt to scale beyond these limits, you might experience Out of Memory (OOM) errors. For better scalability and IPv6 support, and if the following limitations are of concern, we recommend using or upgrading to Azure CNI Powered by Cilium to use Cilium as the network policy engine.
Azure NPM doesn't support IPv6. Otherwise, it fully supports the network policy specifications in Linux.
In Windows, Azure NPM doesn't support the following features of the network policy specifications:
- Named ports.
- Stream Control Transmission Protocol (SCTP).
- Negative match label or namespace selectors. For example, all labels except debug=true.
- except classless interdomain routing (CIDR) blocks (CIDR with exceptions).
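To help recognize these constructs in an existing manifest, the following sketch writes a policy that uses all four of them. The policy name, namespace, labels, port numbers, and CIDR ranges are hypothetical; the point is only to show where each unsupported feature appears in the specification.

```shell
# Example policy that uses every feature listed above as unsupported by
# Azure NPM on Windows. All names and addresses are illustrative.
cat > windows-unsupported-example.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: windows-unsupported-example
  namespace: example
spec:
  podSelector:
    matchLabels:
      app: server
  ingress:
  - from:
    # Negative match selector: pods whose "debug" label isn't "true"
    - podSelector:
        matchExpressions:
        - key: debug
          operator: NotIn
          values: ["true"]
    # CIDR block with exceptions
    - ipBlock:
        cidr: 10.0.0.0/8
        except:
        - 10.1.0.0/16
    ports:
    # Named port (refers to a containerPort name in the pod spec)
    - port: http
      protocol: TCP
    # SCTP protocol
    - port: 5000
      protocol: SCTP
EOF
```

A policy like this one is expected to work with Azure NPM on Linux, but each marked section would need to be rewritten for Windows node pools.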
Note
Azure Network Policy Manager pod logs record an error if an unsupported network policy is created.
Editing/deleting network policies
In some rare cases, there's a chance of hitting a race condition that might result in temporary, unexpected connectivity for new connections to/from pods on any impacted nodes when either editing or deleting a "large enough" network policy. Hitting this race condition never impacts active connections.
If this race condition occurs for a node, the Azure NPM pod on that node enters a state where it can't update security rules, which might lead to unexpected connectivity for new connections to/from pods on the impacted node. To mitigate the issue, the Azure NPM pod automatically restarts ~15 seconds after entering this state. While Azure NPM is rebooting on the impacted node, it deletes all security rules, then reapplies security rules for all network policies. While all the security rules are being reapplied, there's a chance of temporary, unexpected connectivity for new connections to/from pods on the impacted node.
To limit the chance of hitting this race condition, you can reduce the size of the network policy. This issue is most likely to happen for a network policy with several ipBlock sections. A network policy with four or fewer ipBlock sections is less likely to hit the issue.
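One rough way to spot policies at risk is to count the ipBlock sections in a manifest before you apply it. The helper below is a hypothetical sketch: it counts matching lines with grep, so it also counts commented-out lines and sums across every policy in a multi-document file.

```shell
# Hypothetical helper: report how many ipBlock sections a manifest contains
# and flag files that exceed the four-section guideline from this article.
check_ip_blocks() {
  manifest=$1
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
  # keeps the count while ignoring that exit status.
  count=$(grep -c 'ipBlock:' "$manifest" || true)
  if [ "$count" -gt 4 ]; then
    echo "$manifest: $count ipBlock sections; consider reducing the policy size"
  else
    echo "$manifest: $count ipBlock sections"
  fi
}

# Usage:
# check_ip_blocks my-policy.yaml
```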
Before you begin
You need the Azure CLI version 2.0.61 or later installed and configured. Run az --version to find the version. If you need to install or upgrade, see Install Azure CLI.
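If you want to check the version requirement in a script, a comparison like the following sketch can help. The `version_ge` helper is hypothetical and relies on version-aware sorting (`sort -V`, available in GNU coreutils). The commented usage assumes that `az version` is available in your installed CLI and reports the version under the `azure-cli` key.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
version_ge() {
  # After a version-aware sort, the smaller version comes first. If that is
  # the required version, the installed version is at least as new.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage (assumes `az version` exists in your CLI):
# installed=$(az version --query '"azure-cli"' --output tsv)
# version_ge "$installed" 2.0.61 || echo "Upgrade the Azure CLI"
```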
Create an AKS cluster and enable network policy
To see network policies in action, you create an AKS cluster that supports network policy and then work on adding policies.
To use Azure Network Policy Manager, you must use the Azure CNI plug-in. Calico can be used with either Azure CNI plug-in or with the Kubenet CNI plug-in.
The following example script creates an AKS cluster with system-assigned identity and enables network policy by using Azure Network Policy Manager.
Note
Calico can be used with either the --network-plugin azure or --network-plugin kubenet parameters.
Instead of using a system-assigned identity, you can also use a user-assigned identity. For more information, see Use managed identities.
Create an AKS cluster with Azure Network Policy Manager enabled - Linux only
In this section, you create a cluster with Linux node pools and Azure Network Policy Manager enabled.
To begin, replace the values for the $RESOURCE_GROUP_NAME and $CLUSTER_NAME variables.
RESOURCE_GROUP_NAME=myResourceGroup-NP
CLUSTER_NAME=myAKSCluster
LOCATION=canadaeast
Create the AKS cluster and specify azure for the network-plugin and network-policy.
To create a cluster, use the following command:
az aks create \
--resource-group $RESOURCE_GROUP_NAME \
--name $CLUSTER_NAME \
--node-count 1 \
--network-plugin azure \
--network-policy azure \
--generate-ssh-keys
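The az aks create command expects the resource group to exist already, and the $LOCATION variable isn't used by the command itself. The following sketch shows one way to create the resource group first. The `ensure_resource_group` helper name is hypothetical; it wraps the `az group exists` and `az group create` commands.

```shell
# Hypothetical helper: create the resource group only if it doesn't exist yet.
ensure_resource_group() {
  name=$1
  location=$2
  # `az group exists` prints "true" or "false".
  if [ "$(az group exists --name "$name")" = "true" ]; then
    echo "Resource group $name already exists"
  else
    az group create --name "$name" --location "$location" --output none
    echo "Resource group $name created in $location"
  fi
}

# Usage, before running az aks create:
# ensure_resource_group "$RESOURCE_GROUP_NAME" "$LOCATION"
```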
Create an AKS cluster with Azure Network Policy Manager enabled - Windows Server 2022 (preview)
In this section, you create a cluster with Windows node pools and Azure Network Policy Manager enabled.
Note
Azure Network Policy Manager with Windows nodes is available on Windows Server 2022 only.
Install the aks-preview Azure CLI extension
Important
AKS preview features are available on a self-service, opt-in basis. Previews are provided "as is" and "as available," and they're excluded from the service-level agreements and limited warranty. AKS previews are partially covered by customer support on a best-effort basis. As such, these features aren't meant for production use. For more information, see the following support articles:
To install the aks-preview extension, run the following command:
az extension add --name aks-preview
To update to the latest version of the extension released, run the following command:
az extension update --name aks-preview
Register the WindowsNetworkPolicyPreview feature flag
Register the WindowsNetworkPolicyPreview feature flag by using the az feature register command, as shown in the following example:
az feature register --namespace "Microsoft.ContainerService" --name "WindowsNetworkPolicyPreview"
It takes a few minutes for the status to show Registered. Verify the registration status by using the az feature show command:
az feature show --namespace "Microsoft.ContainerService" --name "WindowsNetworkPolicyPreview"
When the status reflects Registered, refresh the registration of the Microsoft.ContainerService resource provider by using the az provider register command:
az provider register --namespace Microsoft.ContainerService
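Because registration takes a few minutes, you might prefer to poll for the Registered state rather than rerun az feature show by hand. The following is a sketch of a generic polling helper. The `wait_for_state` name and the `ATTEMPTS` and `DELAY` variables are hypothetical, and the commented usage assumes the feature state is reported under `properties.state`.

```shell
# Hypothetical helper: rerun a command until it prints the expected value.
# ATTEMPTS (default 30) and DELAY in seconds (default 10) control the polling.
wait_for_state() {
  expected=$1
  shift
  attempts=${ATTEMPTS:-30}
  delay=${DELAY:-10}
  i=1
  while :; do
    # Treat a failing command as "no state yet" and keep polling.
    state=$("$@" 2>/dev/null) || state=""
    [ "$state" = "$expected" ] && return 0
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage:
# wait_for_state Registered az feature show \
#   --namespace "Microsoft.ContainerService" \
#   --name "WindowsNetworkPolicyPreview" \
#   --query properties.state --output tsv \
#   && az provider register --namespace Microsoft.ContainerService
```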
Create the AKS cluster
Now, replace the values for the $RESOURCE_GROUP_NAME, $CLUSTER_NAME, and $WINDOWS_USERNAME variables.
RESOURCE_GROUP_NAME=myResourceGroup-NP
CLUSTER_NAME=myAKSCluster
WINDOWS_USERNAME=myWindowsUserName
LOCATION=canadaeast
Create a username to use as administrator credentials for your Windows Server containers on your cluster. The following command prompts you for a username. Set it to $WINDOWS_USERNAME. Remember that the commands in this article are entered into a Bash shell.
echo "Please enter the username to use as administrator credentials for Windows Server containers on your cluster: " && read WINDOWS_USERNAME
To create a cluster, use the following command:
az aks create \
--resource-group $RESOURCE_GROUP_NAME \
--name $CLUSTER_NAME \
--node-count 1 \
--windows-admin-username $WINDOWS_USERNAME \
--network-plugin azure \
--network-policy azure \
--generate-ssh-keys
It takes a few minutes to create the cluster. By default, your cluster is created with only a Linux node pool. If you want to use Windows node pools, you can add one. Here's an example:
az aks nodepool add \
--resource-group $RESOURCE_GROUP_NAME \
--cluster-name $CLUSTER_NAME \
--os-type Windows \
--name npwin \
--node-count 1
Create an AKS cluster with Calico enabled
Create the AKS cluster and specify --network-plugin azure and --network-policy calico. Specifying --network-policy calico enables Calico on both Linux and Windows node pools.
If you plan on adding Windows node pools to your cluster, include the windows-admin-username and windows-admin-password parameters, with values that meet the Windows Server password requirements.
Important
At this time, using Calico network policies with Windows nodes is available on new clusters by using Kubernetes version 1.20 or later with Calico 3.17.2 and requires that you use Azure CNI networking. Windows nodes on AKS clusters with Calico enabled also have Floating IP enabled by default.
For clusters with only Linux node pools running Kubernetes 1.20 with earlier versions of Calico, the Calico version automatically upgrades to 3.17.2.
Create a username to use as administrator credentials for your Windows Server containers on your cluster. The following command prompts you for a username. Set it to $WINDOWS_USERNAME. Remember that the commands in this article are entered into a Bash shell.
echo "Please enter the username to use as administrator credentials for Windows Server containers on your cluster: " && read WINDOWS_USERNAME
az aks create \
--resource-group $RESOURCE_GROUP_NAME \
--name $CLUSTER_NAME \
--node-count 1 \
--windows-admin-username $WINDOWS_USERNAME \
--network-plugin azure \
--network-policy calico \
--generate-ssh-keys
It takes a few minutes to create the cluster. By default, your cluster is created with only a Linux node pool. If you want to use Windows node pools, you can add one. For example:
az aks nodepool add \
--resource-group $RESOURCE_GROUP_NAME \
--cluster-name $CLUSTER_NAME \
--os-type Windows \
--name npwin \
--node-count 1
Install Azure Network Policy Manager or Calico in an existing cluster
Installing Azure Network Policy Manager or Calico on existing AKS clusters is also supported.
Warning
The upgrade process triggers each node pool to be re-imaged simultaneously. Upgrading each node pool separately isn't supported. Any disruptions to cluster networking are similar to a node image upgrade or Kubernetes version upgrade where each node in a node pool is re-imaged.
Example command to install Azure Network Policy Manager:
az aks update \
--resource-group $RESOURCE_GROUP_NAME \
--name $CLUSTER_NAME \
--network-policy azure
Example command to install Calico:
Warning
This warning applies to upgrading Kubenet clusters with Calico enabled to Azure CNI Overlay with Calico enabled.
- In Kubenet clusters with Calico enabled, Calico is used as both a CNI and network policy engine.
- In Azure CNI clusters, Calico is used only for network policy enforcement, not as a CNI. This can cause a short delay between when the pod starts and when Calico allows outbound traffic from the pod.
We recommend using Cilium instead of Calico to avoid this issue. For more information, see Azure CNI Powered by Cilium.
az aks update \
--resource-group $RESOURCE_GROUP_NAME \
--name $CLUSTER_NAME \
--network-policy calico
Upgrade an existing cluster that has Azure NPM or Calico installed to Azure CNI Powered by Cilium
To upgrade an existing cluster that has a Network Policy engine installed to Azure CNI Powered by Cilium, see Upgrade an existing cluster to Azure CNI Powered by Cilium.
Verify network policy setup
When the cluster is ready, configure kubectl to connect to your Kubernetes cluster by using the az aks get-credentials command. This command downloads credentials and configures the Kubernetes CLI to use them:
az aks get-credentials --resource-group $RESOURCE_GROUP_NAME --name $CLUSTER_NAME
To begin verification of network policy, you create a sample application and set traffic rules.
First, create a namespace called demo to run the example pods:
kubectl create namespace demo
Now create two pods in the cluster named client and server.
Note
If you want to schedule the client or server on a particular node, add the following argument before the --command argument in the kubectl run command, and set the kubernetes.io/os value to either linux or windows:
--overrides='{"spec": { "nodeSelector": {"kubernetes.io/os": "linux|windows"}}}'
Create a server pod. This pod serves on TCP port 80:
kubectl run server -n demo --image=k8s.gcr.io/e2e-test-images/agnhost:2.33 --labels="app=server" --port=80 --command -- /agnhost serve-hostname --tcp --http=false --port "80"
Create a client pod. The following command runs Bash on the client pod:
kubectl run -it client -n demo --image=k8s.gcr.io/e2e-test-images/agnhost:2.33 --command -- bash
Now, in a separate window, run the following command to get the server IP:
kubectl get pod --output=wide -n demo
The output should look like:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
server 1/1 Running 0 30s 10.224.0.72 akswin22000001 <none> <none>
Test connectivity without network policy
In the client's shell, run the following command to verify connectivity with the server. Replace <server-ip> with the IP address from the output of the previous command. If the connection is successful, there's no output.
/agnhost connect <server-ip>:80 --timeout=3s --protocol=tcp
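As an alternative to copying the IP address by hand, you can run the same check from the separate window, outside the client pod. The following sketch looks up the server pod IP and runs the connect command in the client pod through kubectl exec. The `check_connectivity` helper name is hypothetical, and the sketch assumes that /agnhost connect exits with a non-zero status when the connection fails.

```shell
# Hypothetical helper: test client-to-server connectivity in the demo namespace.
check_connectivity() {
  # Read the server pod IP from the pod status.
  ip=$(kubectl get pod server --namespace demo --output jsonpath='{.status.podIP}')
  # Run the connect test inside the client pod; kubectl exec passes the
  # exit status of the remote command back to the caller.
  if kubectl exec client --namespace demo -- \
      /agnhost connect "$ip:80" --timeout=3s --protocol=tcp; then
    echo "connected"
  else
    echo "blocked"
  fi
}

# Usage:
# check_connectivity
```

The same helper can be rerun after each policy or label change later in this article to compare results.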
Test connectivity with network policy
To add network policies, create a file named demo-policy.yaml and paste the following YAML manifest:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: demo-policy
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: server
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: client
    ports:
    - port: 80
      protocol: TCP
Specify the name of your YAML manifest and apply it by using kubectl apply:
kubectl apply -f demo-policy.yaml
Now, in the client's shell, verify connectivity with the server by running the following /agnhost command:
/agnhost connect <server-ip>:80 --timeout=3s --protocol=tcp
Traffic is blocked because the policy selects the server pod, which is labeled app=server, and only allows ingress from pods labeled app=client. The client isn't labeled yet, so the preceding connect command yields this output:
TIMEOUT
Run the following command to label the client pod. Then rerun the /agnhost connect command in the client's shell to verify connectivity with the server. The connect command should return no output.
kubectl label pod client -n demo app=client
Uninstall Azure Network Policy Manager or Calico
Requirements:
- Azure CLI version 2.63 or later
Note
- The uninstall process does not remove Custom Resource Definitions (CRDs) and Custom Resources (CRs) used by Calico. These CRDs and CRs all have names ending with either "projectcalico.org" or "tigera.io". These CRDs and associated CRs can be manually deleted after Calico is successfully uninstalled (deleting the CRDs before removing Calico breaks the cluster).
- The uninstall process doesn't remove any NetworkPolicy resources in the cluster, but after the uninstall these policies are no longer enforced.
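To find the leftover Calico CRDs once the uninstall finishes, you can filter the CRD list by the two name suffixes mentioned in the note. The `filter_calico_crds` helper below is a hypothetical sketch; it only lists matching names and deletes nothing.

```shell
# Hypothetical helper: keep only resource names that end with the
# Calico-related suffixes "projectcalico.org" or "tigera.io".
filter_calico_crds() {
  grep -E '(projectcalico\.org|tigera\.io)$'
}

# Usage, after Calico is successfully uninstalled:
# kubectl get crds --output name | filter_calico_crds
```

Review the list before you delete anything, and only delete these CRDs after Calico is removed from the cluster.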
Warning
The upgrade process triggers each node pool to be re-imaged simultaneously. Upgrading each node pool separately isn't supported. Any disruptions to cluster networking are similar to a node image upgrade or Kubernetes version upgrade where each node in a node pool is re-imaged.
To remove Azure Network Policy Manager or Calico from a cluster, run the following command:
az aks update \
--resource-group $RESOURCE_GROUP_NAME \
--name $CLUSTER_NAME \
--network-policy none
Clean up resources
In this article, you created a namespace and two pods and applied a network policy. To clean up these resources, use the kubectl delete command and specify the resource name:
kubectl delete namespace demo
Next steps
For more information about network resources, see Network concepts for applications in Azure Kubernetes Service (AKS).
To learn more about policies, see Kubernetes network policies.