Episode

Armchair Architects: Using Chaos Engineering to plan for specific failure conditions

with David Blank-Edelman

Chaos Engineering: What is it? Who should do it? What can you learn? How do you get started? Our esteemed #ArmchairArchitects, Uli and Eric, join David for a lively discussion of Chaos Engineering from an architect’s point-of-view on the #AzureEnablementShow.

Chapters

  • 00:00 - Introduction
  • 01:20 - How do architects define and think about Chaos Engineering?
  • 02:35 - The cloud lifecycle and application lifecycle are independent of each other and can clash and cause failure
  • 03:48 - Should you do Chaos Engineering in production or development?
  • 04:20 - What does it mean to introduce failure?
  • 05:54 - Applying the scientific method to Chaos Engineering and some examples of what to test
  • 07:00 - Uli expands the list of “fun” tests to run, including what happens when services reappear after an outage
  • 08:10 - Using the Netflix example to show how a simple fallback can increase robustness
  • 09:09 - Create a database of past incidents to use when creating a Chaos Testing repertoire
  • 09:55 - Start with primitive tests, then evolve your repertoire to include real-world events
  • 11:32 - Chaos Testing is about when you test, not just what you test
  • 14:00 - How do you know if your application and environment are mature enough to test in a production environment?
