Azure Kubernetes Network Policies
Overview
Network Policies provides micro-segmentation for pods just like Network Security Groups (NSGs) provide micro-segmentation for VMs. The Azure Network Policy Manager (also known as Azure NPM) implementation supports the standard Kubernetes Network Policy specification. You can use labels to select a group of pods and define a list of ingress and egress rules to filter traffic to and from these pods. Learn more about the Kubernetes network policies in the Kubernetes documentation.
Azure NPM implementation works with the Azure CNI that provides VNet integration for containers. NPM is supported on Linux and Windows Server 2022. The implementation enforces traffic filtering by configuring allow and deny IP rules based on the defined policies in Linux IPTables or Host Network Service(HNS) ACLPolicies for Windows Server 2022.
Planning security for your Kubernetes cluster
When implementing security for your cluster, use network security groups (NSGs) to filter traffic entering and leaving your cluster subnet (North-South traffic). Use Azure NPM for traffic between pods in your cluster (East-West traffic).
Using Azure NPM
Azure NPM can be used in the following ways to provide micro-segmentation for pods.
Azure Kubernetes Service (AKS)
NPM is available natively in AKS and can be enabled at the time of cluster creation. Learn more about it in Secure traffic between pods using network policies in Azure Kubernetes Service (AKS).
Do it yourself (DIY) Kubernetes clusters in Azure
For DIY clusters, first install the CNI plug-in and enable it on every virtual machine in a cluster. For detailed instructions, see Deploy the plug-in for a Kubernetes cluster that you deploy yourself.
Once the cluster is deployed run the following kubectl
command to download and apply the Azure NPM daemon set to the cluster.
For Linux:
kubectl apply -f https://github.com/Azure/azure-container-networking/blob/master/npm/azure-npm.yaml
For Windows:
kubectl apply -f https://github.com/Azure/azure-container-networking/blob/master/npm/examples/windows/azure-npm.yaml
The solution is also open source and the code is available on the Azure Container Networking repository.
Monitor and Visualize Network Configurations with Azure NPM
Azure NPM includes informative Prometheus metrics that allow you to monitor and better understand your configurations. It provides built-in visualizations in either the Azure portal or Grafana Labs. You can start collecting these metrics using either Azure Monitor or a Prometheus Server.
Benefits of Azure NPM Metrics
Users previously were only able to learn about their Network Configuration with iptables
and ipset
commands run inside a cluster node, which yields a verbose and difficult to understand output.
Overall, the metrics provide:
- counts of policies, ACL rules, ipsets, ipset entries, and entries in any given ipset
- execution times for individual OS calls and for handling kubernetes resource events (median, 90th percentile, and 99th percentile)
- failure info for handling kubernetes resource events (these will fail when an OS call fails)
Example Metrics Use Cases
Alerts via a Prometheus AlertManager
See a configuration for these alerts below.
- Alert when NPM has a failure with an OS call or when translating a Network Policy.
- Alert when the median time to apply changes for a create event was more than 100 milliseconds.
Visualizations and Debugging via our Grafana Dashboard or Azure Monitor Workbook
- See how many IPTables rules your policies create (having a massive amount of IPTables rules may increase latency slightly).
- Correlate cluster counts (for example, ACLs) to execution times.
- Get the human-friendly name of an ipset in a given IPTables rule (for example, "azure-npm-487392" represents "podlabel-role:database").
All supported metrics
The following is the list of supported metrics. Any quantile
label has possible values 0.5
, 0.9
, and 0.99
. Any had_error
label has possible values false
and true
, representing whether the operation succeeded or failed.
Metric Name | Description | Prometheus Metric Type | Labels |
---|---|---|---|
npm_num_policies |
number of network policies | Gauge | - |
npm_num_iptables_rules |
number of IPTables rules | Gauge | - |
npm_num_ipsets |
number of IPSets | Gauge | - |
npm_num_ipset_entries |
number of IP address entries in all IPSets | Gauge | - |
npm_add_iptables_rule_exec_time |
runtime for adding an IPTables rule | Summary | quantile |
npm_add_ipset_exec_time |
runtime for adding an IPSet | Summary | quantile |
npm_ipset_counts (advanced) |
number of entries within each individual IPSet | GaugeVec | set_name & set_hash |
npm_add_policy_exec_time |
runtime for adding a network policy | Summary | quantile & had_error |
npm_controller_policy_exec_time |
runtime for updating/deleting a network policy | Summary | quantile & had_error & operation (with values update or delete ) |
npm_controller_namespace_exec_time |
runtime for creating/updating/deleting a namespace | Summary | quantile & had_error & operation (with values create , update , or delete ) |
npm_controller_pod_exec_time |
runtime for creating/updating/deleting a pod | Summary | quantile & had_error & operation (with values create , update , or delete ) |
There are also "exec_time_count" and "exec_time_sum" metrics for each "exec_time" Summary metric.
The metrics can be scraped through Azure Monitor for containers or through Prometheus.
Set up for Azure Monitor
The first step is to enable Azure Monitor for containers for your Kubernetes cluster. Steps can be found in Azure Monitor for containers Overview. Once you have Azure Monitor for containers enabled, configure the Azure Monitor for containers ConfigMap to enable NPM integration and collection of Prometheus NPM metrics. Azure Monitor for containers ConfigMap has an integrations
section with settings to collect NPM metrics. These settings are disabled by default in the ConfigMap. Enabling the basic setting collect_basic_metrics = true
, will collect basic NPM metrics. Enabling advanced setting collect_advanced_metrics = true
will collect advanced metrics in addition to basic metrics.
After editing the ConfigMap, save it locally and apply the ConfigMap to your cluster as follows.
kubectl apply -f container-azm-ms-agentconfig.yaml
Below is a snippet from the Azure Monitor for containers ConfigMap, which shows the NPM integration enabled with advanced metrics collection.
integrations: |-
[integrations.azure_network_policy_manager]
collect_basic_metrics = false
collect_advanced_metrics = true
Advanced metrics are optional, and turning them on will automatically turn on basic metrics collection. Advanced metrics currently include only npm_ipset_counts
Learn more about Azure Monitor for containers collection settings in config map
Visualization Options for Azure Monitor
Once NPM metrics collection is enabled, you can view the metrics in the Azure portal using Container Insights or in Grafana.
Viewing in Azure portal under Insights for the cluster
Open Azure portal. Once in your cluster's Insights, navigate to "Workbooks" and open "Network Policy Manager (NPM) Configuration".
Besides viewing the workbook (pictures below), you can also directly query the Prometheus metrics in "Logs" under the Insights section. For example, this query will return all the metrics being collected. | where TimeGenerated > ago(5h) | where Name contains "npm_"
You can also query Log Analytics directly for the metrics. Learn more about it with Getting Started with Log Analytics Queries
Viewing in Grafana Dashboard
Set up your Grafana Server and configure a Log Analytics Data Source as described here. Then, import Grafana Dashboard with a Log Analytics backend into your Grafana Labs.
The dashboard has visuals similar to the Azure Workbook. You can add panels to chart & visualize NPM metrics from InsightsMetrics table.
Set up for Prometheus Server
Some users may choose to collect metrics with a Prometheus Server instead of Azure Monitor for containers. You merely need to add two jobs to your scrape config to collect NPM metrics.
To install a Prometheus Server, add this helm repo on your cluster
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm repo update
then add a server
helm install prometheus stable/prometheus -n monitoring \
--set pushgateway.enabled=false,alertmanager.enabled=false, \
--set-file extraScrapeConfigs=prometheus-server-scrape-config.yaml
where prometheus-server-scrape-config.yaml
consists of
- job_name: "azure-npm-node-metrics"
metrics_path: /node-metrics
kubernetes_sd_configs:
- role: node
relabel_configs:
- source_labels: [__address__]
action: replace
regex: ([^:]+)(?::\d+)?
replacement: "$1:10091"
target_label: __address__
- job_name: "azure-npm-cluster-metrics"
metrics_path: /cluster-metrics
kubernetes_sd_configs:
- role: service
relabel_configs:
- source_labels: [__meta_kubernetes_namespace]
regex: kube-system
action: keep
- source_labels: [__meta_kubernetes_service_name]
regex: npm-metrics-cluster-service
action: keep
# Comment from here to the end to collect advanced metrics: number of entries for each IPSet
metric_relabel_configs:
- source_labels: [__name__]
regex: npm_ipset_counts
action: drop
You can also replace the azure-npm-node-metrics
job with the content below or incorporate it into a pre-existing job for Kubernetes pods:
- job_name: "azure-npm-node-metrics-from-pod-config"
metrics_path: /node-metrics
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_namespace]
regex: kube-system
action: keep
- source_labels: [__meta_kubernetes_pod_annotationpresent_azure_npm_scrapeable]
action: keep
- source_labels: [__address__]
action: replace
regex: ([^:]+)(?::\d+)?
replacement: "$1:10091"
target_label: __address__
Set up Alerts for AlertManager
If you use a Prometheus Server, you can set up an AlertManager like so. Here's an example config for the two alerting rules described above:
groups:
- name: npm.rules
rules:
# fire when NPM has a new failure with an OS call or when translating a Network Policy (suppose there's a scraping interval of 5m)
- alert: AzureNPMFailureCreatePolicy
# this expression says to grab the current count minus the count 5 minutes ago, or grab the current count if there was no data 5 minutes ago
expr: (npm_add_policy_exec_time_count{had_error='true'} - (npm_add_policy_exec_time_count{had_error='true'} offset 5m)) or npm_add_policy_exec_time_count{had_error='true'}
labels:
severity: warning
addon: azure-npm
annotations:
summary: "Azure NPM failed to handle a policy create event"
description: "Current failure count since NPM started: {{ $value }}"
# fire when the median time to apply changes for a pod create event is more than 100 milliseconds.
- alert: AzureNPMHighControllerPodCreateTimeMedian
expr: topk(1, npm_controller_pod_exec_time{operation="create",quantile="0.5",had_error="false"}) > 100.0
labels:
severity: warning
addon: azure-npm
annotations:
summary: "Azure NPM controller pod create time median > 100.0 ms"
# could have a simpler description like the one for the alert above,
# but this description includes the number of pod creates that were handled in the past 10 minutes,
# which is the retention period for observations when calculating quantiles for a Prometheus Summary metric
description: "value: [{{ $value }}] and observation count: [{{ printf `(npm_controller_pod_exec_time_count{operation='create',pod='%s',had_error='false'} - (npm_controller_pod_exec_time_count{operation='create',pod='%s',had_error='false'} offset 10m)) or npm_controller_pod_exec_time_count{operation='create',pod='%s',had_error='false'}` $labels.pod $labels.pod $labels.pod | query | first | value }}] for pod: [{{ $labels.pod }}]"
Visualization Options for Prometheus
When using a Prometheus Server only Grafana Dashboard is supported.
If you haven't already, set up your Grafana Server and configure a Prometheus Data Source. Then, import our Grafana Dashboard with a Prometheus backend into your Grafana Labs.
The visuals for this dashboard are identical to the dashboard with a Container Insights/Log Analytics backend.
Sample Dashboards
Following are some sample dashboard for NPM metrics in Container Insights (CI) and Grafana
CI Summary Counts
CI Counts over Time
CI IPSet Entries
CI Runtime Quantiles
Grafana Dashboard Summary Counts
Grafana Dashboard Counts over Time
Grafana Dashboard IPSet Entries
Grafana Dashboard Runtime Quantiles
Next steps
- Learn about Azure Kubernetes Service.
- Learn about container networking.
- Deploy the plug-in for Kubernetes clusters or Docker containers.
Feedback
Submit and view feedback for