Testing for reliability
Test as part of each major change and, where possible, on a regular schedule to validate existing thresholds, targets, and assumptions. Testing should also confirm the validity of the health model, capacity model, and operational procedures.
Checklist
Have you tested your applications with reliability in mind?
- Test regularly to validate existing thresholds, targets and assumptions.
- Automate testing as much as possible.
- Perform testing in both key test environments and the production environment.
- Perform chaos testing by injecting faults.
- Create and test a disaster recovery plan on a regular basis using key failure scenarios.
- Design a disaster recovery strategy to run most applications with reduced functionality.
- Design a backup strategy that is tailored to business requirements and circumstances of the application.
- Successfully test and validate the failover and failback approach at least once.
- Configure request timeouts to manage inter-component calls.
- Implement retry logic to handle transient application failures and transient failures with internal or external dependencies.
- Configure and test health probes for your load balancers and traffic managers.
- Apply chaos principles continuously.
- Create and organize a central chaos engineering team.
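The checklist items on request timeouts, retry logic, and fault injection can be illustrated together. The following is a minimal sketch rather than a prescribed implementation: `TransientError`, `call_with_retry`, and `FaultInjector` are hypothetical names introduced here for illustration, and the retry limits and delays are assumed values that a real workload would tune against its own targets. The sketch treats Python's built-in `TimeoutError` as a retryable failure, on the assumption that the underlying client call is configured with a request timeout.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retry(operation, max_attempts=4, base_delay=0.1, max_delay=2.0,
                    sleep=time.sleep, rng=random.random):
    """Call operation(), retrying transient failures with capped exponential backoff.

    sleep and rng are injectable so that tests can run without real delays.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (TransientError, TimeoutError):
            if attempt == max_attempts:
                raise  # retry budget exhausted, surface the failure to the caller
            # Delay doubles per attempt, is capped at max_delay, and is scaled
            # by a random factor (jitter) so that clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * rng())


class FaultInjector:
    """Wraps a callable and fails its first `failures` calls (a simple injected fault)."""

    def __init__(self, operation, failures):
        self.operation = operation
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise TransientError(f"injected fault on call {self.calls}")
        return self.operation()
```

In a test, wrapping a dependency call in `FaultInjector(dependency, failures=2)` and passing it to `call_with_retry` checks that the caller recovers from two injected failures. Setting `failures` above `max_attempts` checks that the error is surfaced instead of being retried indefinitely.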
Azure services
Reference architecture
- Failure Mode Analysis for Azure applications
- High availability and disaster recovery scenarios for IaaS apps
- Back up files and applications on Azure Stack Hub
Related links
- For information on performance testing, see Performance testing.
- For information on chaos engineering, see Chaos engineering.
- For information on failure and disaster recovery, see Failure and disaster recovery for Azure applications.
- For information on testing applications, see Testing your application and Azure environment.