Monitoring for reliability
Monitoring and diagnostics are crucial for resiliency. If something fails, you need to know that it failed, when it failed, and why.
Checklist
How do you monitor and measure application health?
- The application is instrumented with semantic logs and metrics.
- Application logs are correlated across components.
- All components are monitored and correlated with application telemetry.
- Key metrics, thresholds, and indicators are defined and captured.
- A health model has been defined based on performance, availability, and recovery targets.
- Azure Service Health events are used to alert on applicable service level events.
- Azure Resource Health events are used to alert on resource health events.
- Long-running workflows are monitored for failures.
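Several checklist items (semantic logs, correlation across components, defined thresholds, and a health model) can be illustrated together. The following is a minimal sketch, not an Azure SDK example: the metric names, threshold values, and the helpers `format_event`, `log_event`, and `evaluate_health` are all hypothetical and chosen only for illustration. It assumes Python 3.9 or later and uses only the standard library.

```python
import json
import logging
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Boundaries at which a metric counts as degraded or unhealthy."""
    degraded: float
    unhealthy: float


# Hypothetical thresholds; real values come from your own performance,
# availability, and recovery targets.
THRESHOLDS = {
    "p95_latency_ms": Threshold(degraded=500, unhealthy=2000),
    "error_rate": Threshold(degraded=0.01, unhealthy=0.05),
}


def evaluate_health(metrics: dict[str, float]) -> str:
    """Map raw metrics to 'healthy', 'degraded', or 'unhealthy'.

    The worst state across all known metrics wins. Metrics that are
    not reported are skipped in this sketch.
    """
    state = "healthy"
    for name, threshold in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            continue
        if value >= threshold.unhealthy:
            return "unhealthy"
        if value >= threshold.degraded:
            state = "degraded"
    return state


def format_event(event: str, correlation_id: str, **fields) -> str:
    """Build a semantic (structured) log record as a JSON string.

    Carrying the same correlation_id through every component lets the
    records for one request be joined later.
    """
    record = {"event": event, "correlation_id": correlation_id, **fields}
    return json.dumps(record, sort_keys=True)


def log_event(logger: logging.Logger, event: str, correlation_id: str, **fields) -> None:
    """Emit the structured record through the standard logging module."""
    logger.info(format_event(event, correlation_id, **fields))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    logger = logging.getLogger("orders")

    # One correlation ID is generated at the edge and reused downstream.
    correlation_id = str(uuid.uuid4())
    log_event(logger, "order.received", correlation_id, order_id=42)

    metrics = {"p95_latency_ms": 640.0, "error_rate": 0.002}
    log_event(logger, "health.evaluated", correlation_id,
              state=evaluate_health(metrics), **metrics)
```

Emitting records as key-value data rather than free text is what makes them queryable, and a shared correlation ID is what lets a monitoring backend stitch one request's records together across components. In an Azure deployment, such records would typically be forwarded to a telemetry service such as Application Insights; that wiring is outside the scope of this sketch.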
Azure services for monitoring
- Azure Monitor
- Application Insights
- Azure Service Health
- Azure Resource Health
- Azure Resource Manager
- Azure Policy