Platform automation and DevOps for AKS

As a cloud-native construct, Kubernetes requires a cloud-native approach to deployment and operations. Azure and Kubernetes are open, extensible platforms with rich, well-architected APIs, so you can automate nearly every aspect of deployment and operations. Plan for a highly automated DevOps approach that follows general DevOps best practices.

Design considerations

Here are some design considerations for AKS platform automation and DevOps:

  • Consider Azure service limitations and the limits of your continuous integration and continuous delivery (CI/CD) environment when determining your engineering and automation approach. For example, see the GitHub usage limitations.

  • When securing and protecting access to development, test, QA, and production environments, consider security options from a CI/CD perspective. Deployments happen automatically, so map access control accordingly.

  • Consider using prefixes and suffixes with well-defined conventions to uniquely identify every deployed resource. These naming conventions avoid conflicts when you deploy solutions side by side, and improve overall team agility and throughput.

  • Inventory the workflows you need to support for engineering, updating, and deploying your solution, under both normal operations and disaster recovery plan (DRP) conditions. Consider mapping pipelines to those workflows to maximize familiarity and productivity.

    Some example scenarios and pipelines to consider are:

    • Deploying, patching, and upgrading clusters
    • Deploying and upgrading applications
    • Deploying and maintaining add-ons
    • Failing over for disaster recovery
    • Blue-green deployments
    • Maintaining canary environments
  • Consider using a service mesh to add more security, encryption, and logging capabilities to your workloads.

  • Consider deploying supporting resources and metadata, such as subscriptions, tags, and labels, to support your DevOps experience by tracking and tracing deployments and related artifacts.

  • Consider the impact of the cattle versus pets paradigm shift. Expect pods and other aspects of Kubernetes to be ephemeral, and align your automation and pipeline infrastructure accordingly. Don't rely on IP addresses or other resources to be fixed or permanent.
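
The naming-convention consideration above can be sketched as a small helper that a pipeline calls before it deploys anything. This is a minimal illustration, not a prescribed standard: the field order (prefix, workload, environment, region, instance), the allowed environment names, the `ResourceName` class, and the 63-character default limit are all assumptions made for this example. Check the actual naming rules for each resource type you deploy.

```python
import re
from dataclasses import dataclass

# Hypothetical convention: <prefix>-<workload>-<environment>-<region>-<instance>
# These values are illustrative assumptions, not taken from the guidance above.
ALLOWED_ENVIRONMENTS = {"dev", "test", "qa", "prod"}
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


@dataclass(frozen=True)
class ResourceName:
    prefix: str        # resource-type abbreviation, for example "aks" or "rg"
    workload: str      # application or solution name
    environment: str   # must be one of ALLOWED_ENVIRONMENTS
    region: str        # short region code chosen by your team
    instance: int = 1  # suffix that separates side-by-side deployments

    def render(self, max_length: int = 63) -> str:
        """Build the name and reject anything that breaks the convention."""
        if self.environment not in ALLOWED_ENVIRONMENTS:
            raise ValueError(f"unknown environment: {self.environment!r}")
        name = "-".join(
            [self.prefix, self.workload, self.environment, self.region,
             f"{self.instance:03d}"]
        )
        # Lowercase letters, digits, and single hyphens only.
        if not NAME_PATTERN.match(name):
            raise ValueError(f"name violates convention: {name!r}")
        # Assumed limit; real limits differ per resource type.
        if len(name) > max_length:
            raise ValueError(f"name longer than {max_length} characters: {name!r}")
        return name


if __name__ == "__main__":
    print(ResourceName("aks", "shop", "prod", "weu").render())
    print(ResourceName("aks", "shop", "prod", "weu", instance=2).render())
```

Because the instance suffix is part of the name, two copies of the same solution can be deployed next to each other without colliding, and a name that breaks the convention fails in the pipeline rather than at deployment time.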

Design recommendations

Here are some design recommendations for AKS platform automation and DevOps:

  • Rely on pipelines or actions to:

    • Apply practices consistently across the team.
    • Remove much of the burden of reinventing the wheel.
    • Provide predictability and insight into overall quality and agility.
  • Deploy early and often by using trigger-based and scheduled pipelines. Trigger-based pipelines ensure changes go through proper validation, while scheduled pipelines manage behavior in changing environments.

  • Separate infrastructure deployment from application deployment. Core infrastructure changes less often than applications. Treat each type of deployment as a separate flow and pipeline.

  • Deploy using cloud-native options. Use infrastructure as code to deploy infrastructure, including the control plane, and use Helm and the Operator pattern in Kubernetes to deploy and maintain Kubernetes-native components.

  • Use GitOps to deploy and maintain applications. GitOps uses the Git repository as a single source of truth, avoiding configuration drift and increasing productivity and reliability during rollbacks and related procedures.

  • Use pod-managed identities (or their successor, Microsoft Entra Workload ID) and the Azure Key Vault provider for Secrets Store CSI Driver to protect secrets, certificates, and connection strings.

  • Strive for maximized deployment concurrency by avoiding hardcoded configuration items and settings.

  • Rely on well-known conventions across infrastructure and application-related deployments. Use admission controllers combined with the Azure Policy add-on for Kubernetes to validate and enforce these conventions alongside your other defined policies.

  • Embrace shift left consistently with:

    • Security, by adding vulnerability scanning tools like container scanning early in the pipeline.
    • Policy, by using policy as code and enforcing policies in a cloud-native manner through admission controllers.
  • Treat every failure, error, or outage as an opportunity to automate and improve overall solution quality. Integrate this approach in your shift left and site reliability engineering (SRE) framework.
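
The shift-left policy recommendation above can be illustrated with a pre-deployment check that runs early in a pipeline. The sketch below is an assumption-laden example, not the Azure Policy add-on or an admission controller: the `REQUIRED_LABELS` set and the `check_manifest` function are hypothetical, and the manifest is assumed to be already parsed into a Python dictionary. It flags two convention violations mentioned in this article: missing labels and hardcoded IP addresses.

```python
import ipaddress
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical team convention; replace with the labels your policies require.
REQUIRED_LABELS = ("app.kubernetes.io/name", "app.kubernetes.io/version", "team")


def _walk_strings(node: Any, path: str) -> Iterator[Tuple[str, str]]:
    """Yield (path, value) for every string nested inside dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from _walk_strings(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from _walk_strings(value, f"{path}[{index}]")
    elif isinstance(node, str):
        yield path, node


def _is_ip_address(value: str) -> bool:
    """Return True only if the whole string is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def check_manifest(manifest: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable violations; an empty list means pass."""
    violations: List[str] = []
    labels = (manifest.get("metadata") or {}).get("labels") or {}
    for label in REQUIRED_LABELS:
        if label not in labels:
            violations.append(f"missing label: {label}")
    # Pods are ephemeral, so a fixed IP address in the spec is treated as a violation.
    for path, value in _walk_strings(manifest.get("spec") or {}, "spec"):
        if _is_ip_address(value):
            violations.append(f"hardcoded IP address at {path}: {value}")
    return violations


if __name__ == "__main__":
    sample = {
        "kind": "Pod",
        "metadata": {"name": "web", "labels": {"app.kubernetes.io/name": "web"}},
        "spec": {"containers": [{"name": "web", "image": "nginx:1.25",
                                 "env": [{"name": "DB_HOST", "value": "10.0.0.4"}]}]},
    }
    for violation in check_manifest(sample):
        print(violation)
```

A check like this only catches problems earlier; it complements, rather than replaces, enforcement inside the cluster through admission controllers and the Azure Policy add-on for Kubernetes.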