Episode

Armchair Architects: Using Chaos Engineering to plan for specific failure conditions

with David Blank-Edelman

Chaos Engineering: What is it? Who should do it? What can you learn? How do you get started? Our esteemed #ArmchairArchitects, Uli and Eric, join David for a lively discussion of Chaos Engineering from an architect’s point of view on the #AzureEnablementShow.

Chapters

00:00 - Introduction
01:20 - How do architects define and think about Chaos Engineering?
02:35 - The cloud lifecycle and application lifecycle are independent of each other and can clash and cause failure
03:48 - Should you do Chaos Engineering in production or development?
04:20 - What does it mean to introduce failure?
05:54 - Applying the scientific method to Chaos Engineering and some examples of what to test
07:00 - Uli expands the list of “fun” tests to run, including what happens when services reappear after an outage
08:10 - Using the Netflix example to show how a simple fallback can increase robustness
09:09 - Create a database of past incidents to use when creating a Chaos Testing repertoire
09:55 - Start with primitive tests, then evolve your repertoire to consider real-world events
11:32 - Chaos Testing is about when you test, not just what you test
14:00 - How do you know if your application and environment are mature enough to test in a production environment?
