Secure your operations

Operational security is important for maintaining control over your Kubernetes clusters and responding to emerging threats. This article covers best practices for managing access, enforcing policies, monitoring activity, and responding to incidents. By implementing strong operational controls, you can help ensure that only authorized users and processes can make changes to your clusters and workloads.

Control who can use the Azure control plane to manage your cluster

It’s important to control access to the Azure control plane, including the subscription and resources that enable Azure Arc cloud management of your edge clusters. Review and implement the Azure access control best practices, including multifactor authentication and conditional access. Use Azure RBAC to control who can perform which operations on which clusters, in the same way you might already control access to your other Azure cloud resources.

Further, Microsoft-generated certificates are used to help secure the connection between your edge clusters and the Azure control plane. These certificates are stored as Kubernetes secrets, so it’s important that the Kubernetes secret store is itself encrypted.

Control who can deploy to your cluster with Role Based Access Control (RBAC)

It’s also important to control access to the Kubernetes control plane (API server) itself, which is the means by which you can deploy and monitor your Kubernetes workloads.

For nonhuman access to the API server from workloads, use Kubernetes’ built-in RBAC to authorize only the specific service accounts that require it. See also the advice on issuing and protecting these service accounts.
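As an illustration of authorizing one specific service account, the following minimal sketch builds a namespaced Role and a matching RoleBinding as plain Python dictionaries and prints them as JSON. The namespace `telemetry`, the service account `config-reader`, the role name, and the helper `least_privilege_rbac` are all hypothetical names chosen for this example; adjust the resources and verbs to whatever your workload actually needs.

```python
import json


def least_privilege_rbac(namespace, service_account, role_name, resources, verbs):
    """Build a namespaced Role and a RoleBinding for a single service account."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        # "" is the core API group, which holds resources such as configmaps.
        "rules": [{"apiGroups": [""], "resources": list(resources), "verbs": list(verbs)}],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
    }
    # Wrap both objects in a List so they can be applied together.
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


# Hypothetical example: a workload that only needs to read ConfigMaps.
manifest = least_privilege_rbac(
    namespace="telemetry",
    service_account="config-reader",
    role_name="configmap-reader",
    resources=["configmaps"],
    verbs=["get", "list"],
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`. Using a Role rather than a ClusterRole keeps the grant scoped to a single namespace.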

For human access to the API server, Kubernetes doesn’t have built-in user accounts, so we recommend integrating it with an external user account service such as Microsoft Entra ID. You can then create authorization policies that use these identities to control who can do what in which namespaces. You can author these policies either using Kubernetes’ built-in RBAC or using Azure RBAC. Azure RBAC is recommended if you want to consistently manage and audit all your user authorization policies together in one central place, covering both your cloud and edge resources. It's easy to use Azure RBAC if you’re running AKS enabled by Azure Arc on Azure Local or if you connect your own cluster. Your users can then use their Entra ID account to access the cluster (its API server) either directly or via an Azure proxy using the "cluster connect" capability. We recommend taking a "least privilege" approach of assigning each user or workload a role that has the minimum permissions required.
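One way to review roles against the "least privilege" goal is to flag rules that grant wildcards. The sketch below is a simple illustrative check over a Kubernetes Role expressed as a dictionary; the function `find_wildcard_rules` and the example role are hypothetical and aren't part of any Azure or Kubernetes tooling.

```python
def find_wildcard_rules(role):
    """Return the indexes of rules that grant '*' in verbs, resources, or apiGroups."""
    flagged = []
    for index, rule in enumerate(role.get("rules", [])):
        granted = (
            rule.get("verbs", [])
            + rule.get("resources", [])
            + rule.get("apiGroups", [])
        )
        if "*" in granted:
            flagged.append(index)
    return flagged


# Hypothetical role with one narrowly scoped rule and two overly broad ones.
example_role = {
    "kind": "Role",
    "metadata": {"name": "dev-team", "namespace": "dev"},
    "rules": [
        {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]},
        {"apiGroups": ["apps"], "resources": ["*"], "verbs": ["get"]},
        {"apiGroups": [""], "resources": ["secrets"], "verbs": ["*"]},
    ],
}

flagged = find_wildcard_rules(example_role)
print(flagged)  # prints [1, 2]
```

A wildcard isn't always wrong, but each flagged rule is worth a deliberate decision about whether the role could list explicit resources and verbs instead.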

More generally, follow standard best practice in separating development, test, and production clusters. Also consider whether production deployments to your clusters would be more reliably and securely managed by using a GitOps approach. If you follow this approach, then it’s also important to implement similarly strong role-based access control for changes (pull requests) on the underlying source Git repository and branch used to define your deployments.

Finally, if you’re running AKS enabled by Azure Arc on Azure Local then you can also download an admin client certificate for full admin access. It isn't typically necessary to use this certificate, so only download it when required: for example, to diagnose issues that can’t be investigated any other way. You should also use this approach with care because it bypasses Entra ID accounts and the per-user policies that you set up. Further, you must store the client certificate carefully and then delete it when it's no longer required.

Follow a secure container lifecycle as you deploy and run containers with Azure Policy for Kubernetes

Continue to follow the Microsoft Containers Secure Supply Chain framework through the deploy phase. (See also the guidance for the acquire, catalog, and build phases.) This framework helps you deploy only from your own trusted registries, such as Azure Container Registry. Use the registry’s access control mechanisms to ensure that only trusted clusters pull containers that may contain sensitive information. Azure Container Registry supports both Role Based Access Control (RBAC) and Attribute Based Access Control (ABAC) to further scope assignments to specific repositories.
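To make the "deploy only from your own trusted registries" rule concrete, here is a minimal sketch of the kind of check involved: it lists any container images in a pod spec that don't come from an allowed registry. The registry name `contoso.azurecr.io`, the image names, and the helper `untrusted_images` are hypothetical, and this isn't how Azure Policy implements its own checks.

```python
def untrusted_images(pod_spec, trusted_registries):
    """Return images in the pod spec that aren't hosted on a trusted registry."""
    # Require a trailing "/" so "contoso.azurecr.io.example.com/..." can't match.
    prefixes = tuple(f"{registry.rstrip('/')}/" for registry in trusted_registries)
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    return [c["image"] for c in containers if not c["image"].startswith(prefixes)]


# Hypothetical pod spec mixing a trusted image with one from a public registry.
pod_spec = {
    "initContainers": [{"name": "init", "image": "docker.io/library/busybox:1.36"}],
    "containers": [{"name": "web", "image": "contoso.azurecr.io/web:1.4.2"}],
}

violations = untrusted_images(pod_spec, ["contoso.azurecr.io"])
print(violations)  # prints ['docker.io/library/busybox:1.36']
```

Checking init containers as well as regular containers matters, because both run code inside the pod.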

Additionally, enforce best practice standards for container security hygiene through Azure Policy. For example, you can validate that all pods meet the Pod Security Standards in a low-code approach by using Azure Policy's built-in definitions. You can also deploy the Azure Policy extension, which extends Gatekeeper, to your edge Kubernetes cluster to apply pod-based security enforcement at scale. We recommend that you first apply policy assignments in "audit" mode. This mode provides an aggregated list of noncompliant results at a per-Kubernetes resource, per-policy granularity, allowing you to spot and remediate any existing issues with your running deployments first. Once you fix the violations in your environment, you can then update the policy assignment to "deny" mode. Azure Policy’s rich safe-deployment mechanisms then roll out this policy enforcement gradually across resources. By applying policies in enforcement mode, you actively prevent any further deviations.
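The audit-first workflow can be sketched as a small gate: summarize noncompliant results per policy and per resource, and only consider a policy ready for "deny" mode once nothing is noncompliant. The record shape below (`policy`, `resource`, `state` keys), the policy names, and both helper functions are assumptions made for illustration; they don't reflect the actual format of Azure Policy compliance data.

```python
from collections import Counter


def summarize_noncompliance(results):
    """Count noncompliant results per (policy, resource) pair."""
    return Counter(
        (r["policy"], r["resource"]) for r in results if r["state"] == "NonCompliant"
    )


def ready_for_deny(results, policy):
    """A policy is ready for deny mode when none of its results are noncompliant."""
    return not any(
        r["policy"] == policy and r["state"] == "NonCompliant" for r in results
    )


# Hypothetical audit-mode results exported for review.
audit_results = [
    {"policy": "no-privileged-containers", "resource": "dev/pod/legacy-agent", "state": "NonCompliant"},
    {"policy": "no-privileged-containers", "resource": "dev/pod/web", "state": "Compliant"},
    {"policy": "allowed-registries", "resource": "dev/pod/web", "state": "Compliant"},
]

summary = summarize_noncompliance(audit_results)
for (policy, resource), count in summary.items():
    print(f"{policy}: {resource} ({count} noncompliant)")

print(ready_for_deny(audit_results, "allowed-registries"))        # prints True
print(ready_for_deny(audit_results, "no-privileged-containers"))  # prints False
```

Note that this gate also returns `True` for a policy with no results at all, so in practice you would first confirm that the audit assignment has actually been evaluated.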

Detect emerging threats including monitoring control plane changes

Make sure you have a way to detect threats as they arise in your clusters.

You can deploy the Defender for Containers extension to your Kubernetes cluster at the edge. This extension includes a sensor that collects logs and sends them to Defender for Cloud. Once there, they can be analyzed for anomalous behaviors that might indicate an attack or used for forensics after a possible incident. See the support matrix for which features are supported, as a preview or GA release, on which cluster types. In turn, Defender for Cloud can send events for analysis as part of the Microsoft Defender XDR incident detection and response solution.

If you’re running AKS enabled by Azure Arc on Azure Local, you can also configure it to send Kubernetes audit logs to Azure Monitor (Log Analytics Workspace). Also follow the related advice for monitoring your workloads themselves, and look at the best practices for monitoring clusters, which cover reliability, cost optimization, performance, and security.

In addition to monitoring, look to build an incident response plan and practice using it. The details of such a plan depend greatly on your overall deployment environment, and the security operations tools you use: see this guidance for more. But at minimum, think about how you’d preserve your cluster’s state (retain audit logs, snapshot suspicious states) and how you’d recover it to a known-good state.

Use deployment strategies to achieve zero-downtime updates

Critical security updates shouldn't compromise the reliability and availability of your workloads, even when rolled out urgently. Choose a Kubernetes deployment strategy that best helps maintain high availability in your environment. Consider also implementing readiness and liveness probes, which allow Kubernetes to better learn about the state of your workloads as it maintains your deployment. Combined with gradual rollouts and traffic management policies at your ingress load balancer, these probes let you use Kubernetes to drive updates without interrupting the availability of your applications.
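As one possible shape for such a deployment, the sketch below builds a Deployment manifest that combines a rolling update strategy with readiness and liveness probes, then prints it as JSON. The application name, the image `contoso.azurecr.io/web:1.4.3`, the `/healthz` endpoint, and the probe timings are all assumptions for illustration; your workload needs its own health endpoint and tuned values.

```python
import json


def rolling_deployment(name, image, replicas=3, port=8080):
    """Build a Deployment that replaces pods gradually and gates traffic on probes."""
    labels = {"app": name}
    # Assumes the application serves a health check at /healthz on its main port.
    probe = {"httpGet": {"path": "/healthz", "port": port}}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                # Never drop below the desired replica count; add one new pod at a time.
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                            # Readiness: only route traffic to pods that respond.
                            "readinessProbe": {**probe, "periodSeconds": 5},
                            # Liveness: restart containers that stop responding.
                            "livenessProbe": {
                                **probe,
                                "initialDelaySeconds": 10,
                                "periodSeconds": 10,
                            },
                        }
                    ]
                },
            },
        },
    }


deployment = rolling_deployment("web", "contoso.azurecr.io/web:1.4.3")
print(json.dumps(deployment, indent=2))
```

With `maxUnavailable` set to 0, Kubernetes waits for each new pod to pass its readiness probe before removing an old one, which is what keeps capacity steady during an urgent security update.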
