Explore operational excellence

Completed

DevOps practices not only cover software build, testing, and delivery, but they also extend to operational aspects of the organizational life. In particular, DevOps can help in reaching operational excellence by following its principles in day-to-day operations. The organization described in the sample scenario would benefit from adopting this approach in order to address its current challenges. In this unit, learn about the core aspects of operational excellence in the context of DevOps.

What is operational excellence?

Diagram showing a monitoring graph representation.

Operational excellence is a set of practices that promote efficiency, resiliency, and continuous improvement in day-to-day operations. The key aspects of operational excellence overlap to a large extent with DevOps practices such as automation, collaboration, continuous improvement, scalability, and flexibility. However, there are a few that is covered here due to their operational significance. These aspects include:

  • Continuous operations: targets creating and maintaining an environment where the need for downtime is minimized or even eliminated.
  • Continuous monitoring, observability: emphasizes the importance of monitoring applications and the underlying infrastructure in real-time. The ultimate objective is to proactively (rather than reactively) detect any impending issues.
  • Health modeling: involves creating models that represent the expected behavior and performance of a target system under different conditions. This serves as the baseline for detecting any anomalies, which might indicate potential issues.
  • Reliability engineering: uses chaos engineering and fault injection practices to apply proactive measures that lead to increased resiliency.
  • Incident management: focuses on efficient incident response and resolution, including well-defined incident management, reliable communication channels, automated remediation, and continuous learning to minimize the possibility of recurring issues.
  • Security integration: incorporates security practices into the operations lifecycle.
  • Shift-right testing: uses practices such as dark launching and feature flags in the production environment.