Monitoring for reliability
Monitoring and diagnostics are crucial for reliability. If something fails, you need to know that it failed, when it failed, and why.
Checklist
How do you monitor and measure application health?
- The application is instrumented with semantic logs and metrics.
- Application logs are correlated across components.
- All components are monitored and correlated with application telemetry.
- Key metrics, thresholds, and indicators are defined and captured.
- A health model has been defined based on performance, availability, and recovery targets.
- Azure Service Health events are used to alert on applicable service level events.
- Azure Resource Health events are used to alert on resource health events.
- Long-running workflows are monitored for failures.
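Several checklist items (semantic logs, correlation across components, and a health model built on defined thresholds) can be illustrated with a small sketch. The example below uses only the Python standard library and is not tied to any Azure SDK. The component names, the `fields` convention for structured data, the `evaluate_health` function, and all threshold values are assumptions made for illustration; real thresholds should come from your own performance, availability, and recovery targets.

```python
import io
import json
import logging
import uuid
from contextvars import ContextVar

# Hypothetical correlation ID shared by every log record of one request.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object (a 'semantic' log line)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "component": record.name,
            "event": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        # Structured fields are passed via extra={"fields": {...}} (assumed convention).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(component: str, stream) -> logging.Logger:
    """Return a logger for one component that writes JSON lines to `stream`."""
    logger = logging.getLogger(component)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def evaluate_health(error_rate: float, p95_latency_ms: float) -> str:
    """Map two metrics to a health state using illustrative thresholds."""
    if error_rate >= 0.05 or p95_latency_ms >= 2000:
        return "unhealthy"
    if error_rate >= 0.01 or p95_latency_ms >= 500:
        return "degraded"
    return "healthy"


def handle_request(stream) -> str:
    """Simulate one request that passes through two components."""
    request_id = str(uuid.uuid4())
    token = correlation_id.set(request_id)
    try:
        make_logger("frontend", stream).info(
            "order_received", extra={"fields": {"order_items": 3}}
        )
        make_logger("backend", stream).info(
            "order_stored", extra={"fields": {"duration_ms": 42}}
        )
    finally:
        correlation_id.reset(token)
    return request_id


if __name__ == "__main__":
    buffer = io.StringIO()
    handle_request(buffer)
    print(buffer.getvalue(), end="")
    print(evaluate_health(error_rate=0.02, p95_latency_ms=120))
```

Because both components emit the same `correlation_id`, a query on that field reconstructs the path of a single request across the system. The health function is deliberately simple: it shows thresholds being captured in one place so that alerts and dashboards agree on what "degraded" means.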
Azure services for monitoring
- Azure Monitor
- Application Insights
- Azure Service Health
- Azure Resource Health
- Azure Resource Manager
- Azure Policy