Episode
Armchair Architects: Using Chaos Engineering to plan for specific failure conditions
with David Blank-Edelman
Chaos Engineering: What is it? Who should do it? What can you learn? How do you get started? Our esteemed #ArmchairArchitects, Uli and Eric, join David for a lively discussion of Chaos Engineering from an architect’s point of view on the #AzureEnablementShow.
Related Episodes:
Chapters
- 00:00 - Introduction
- 01:20 - How do architects define and think about Chaos Engineering?
- 02:35 - The cloud lifecycle and application lifecycle are independent of each other and can clash and cause failure
- 03:48 - Should you do Chaos Engineering in production or development?
- 04:20 - What does it mean to introduce failure?
- 05:54 - Applying the scientific method to Chaos Engineering and some examples of what to test
- 07:00 - Uli expands the list of “fun” tests to run, including what happens when services reappear after an outage
- 08:10 - Using the Netflix example to show how a simple fallback can increase robustness
- 09:09 - Create a database of past incidents to draw on when building a Chaos Testing repertoire
- 09:55 - Start with primitive tests, then evolve your repertoire to consider real-world events
- 11:32 - Chaos Testing is about when you test, not just what you test
- 14:00 - How do you know if your application and environment are mature enough to test in a production environment?
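The chapters on applying the scientific method (05:54) and on how a simple fallback increases robustness (08:10) can be illustrated with a small experiment: measure a steady-state metric, state a hypothesis that it survives a failure, inject the failure, observe, and always roll the fault back. The Python sketch below is hypothetical and runs entirely in-process. `RecommendationService`, `handle_request`, `handle_request_no_fallback`, and the "non-empty response" success metric are illustrative assumptions, not Azure Chaos Studio, Netflix tooling, or anything shown in the episode.

```python
from dataclasses import dataclass
from typing import Callable, List


class RecommendationService:
    """Hypothetical dependency whose outage we simulate with a flag."""

    def __init__(self) -> None:
        self.fault_injected = False  # toggled by the experiment to simulate an outage

    def get_recommendations(self, user_id: str) -> List[str]:
        if self.fault_injected:
            raise ConnectionError("recommendation service unavailable")
        return [f"personal-pick-{user_id}-{i}" for i in range(3)]


# Generic fallback list returned when personalization is unavailable (assumed).
DEFAULT_RECOMMENDATIONS = ["popular-1", "popular-2", "popular-3"]

Handler = Callable[[RecommendationService, str], List[str]]


def handle_request(service: RecommendationService, user_id: str) -> List[str]:
    """Calls the dependency and degrades to the fallback on failure."""
    try:
        return service.get_recommendations(user_id)
    except ConnectionError:
        return DEFAULT_RECOMMENDATIONS


def handle_request_no_fallback(service: RecommendationService, user_id: str) -> List[str]:
    """Same call with no fallback, so failures propagate to the caller."""
    return service.get_recommendations(user_id)


@dataclass
class ExperimentResult:
    baseline_success_rate: float
    fault_success_rate: float

    def hypothesis_holds(self, tolerance: float = 0.0) -> bool:
        # Hypothesis: the steady-state metric does not drop under the fault.
        return self.fault_success_rate >= self.baseline_success_rate - tolerance


def success_rate(service: RecommendationService, handler: Handler, users: List[str]) -> float:
    """Steady-state metric: fraction of requests that return a non-empty list."""
    ok = 0
    for user in users:
        try:
            if handler(service, user):
                ok += 1
        except Exception:
            pass  # an unhandled failure counts against the metric
    return ok / len(users)


def run_experiment(service: RecommendationService, handler: Handler, users: List[str]) -> ExperimentResult:
    baseline = success_rate(service, handler, users)  # 1. measure steady state
    service.fault_injected = True                     # 2. inject the failure
    try:
        under_fault = success_rate(service, handler, users)  # 3. observe
    finally:
        service.fault_injected = False                # 4. always roll back the fault
    return ExperimentResult(baseline, under_fault)
```

Usage: with the fallback in place the hypothesis holds; without it, the same injected outage breaks every request.

```python
users = [f"user-{n}" for n in range(100)]
print(run_experiment(RecommendationService(), handle_request, users).hypothesis_holds())              # True
print(run_experiment(RecommendationService(), handle_request_no_fallback, users).hypothesis_holds())  # False
```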
Recommended resources
- Azure Architecture Center
- Microsoft Azure Well-Architected Framework
- Chaos engineering
- Reliability documentation
- Azure Chaos Studio documentation