False unavailability alerts on Virtual Machines

2025-06-06T08:03:32.2+00:00

Hi,

Our monitoring is raising unavailability alerts for some virtual machines. Each alert resolves within a minute, and when we check the VM, no errors appear in Event Viewer and all services on the VM are still running as expected.

Why are we receiving false unavailability alerts for these virtual machines?

Best regards

Azure Monitor
An Azure service that is used to collect, analyze, and act on telemetry data from Azure and on-premises environments.

1 answer

  1. Deepanshu katara 16,790 Reputation points MVP Moderator
    2025-06-06T08:22:55.1433333+00:00

    Hello, and welcome to Microsoft Q&A.

    It’s not uncommon in Azure and other cloud environments to receive unavailability alerts for virtual machines that resolve quickly and show no signs of issues within the VM itself. These alerts are often the result of transient conditions or platform-level events. Below are some typical causes:


    Common Causes

    1. Transient Platform Events: Azure periodically performs underlying host maintenance, network updates, or live migrations. These events can briefly interrupt connectivity or heartbeats and trigger alerts, even though the VM's operating system and services remain unaffected.

    2. Network Glitches: Short-lived network disruptions between the VM and Azure's monitoring infrastructure can cause missed heartbeats or probe failures. These issues often self-correct and leave no trace in the VM’s event logs.

    3. Monitoring Probe Sensitivity: Health probes from Azure Monitor, Load Balancers, or Application Gateways can be sensitive to even minor delays or packet loss. If the probe interval is too short or aggressive, a brief blip in response time can result in a false alert.

    4. Resource Throttling or “Noisy Neighbor” Effects: Temporary resource contention on the host, caused by other co-located workloads, can lead to brief periods of unresponsiveness without any fault in the VM itself.

    5. False Positives in Monitoring Pipeline: Occasionally, false alerts can arise from transient issues within the monitoring agents or Azure's health-check mechanisms, rather than actual unavailability of the VM.
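
    To illustrate the probe-sensitivity point, here is a minimal Python sketch of how a consecutive-failure threshold changes whether a brief blip raises an alert. It is not how Azure evaluates probes internally; the function name `alert_fires`, the probe histories, and the threshold values are all hypothetical and chosen only for illustration.

```python
from typing import Iterable


def alert_fires(probe_results: Iterable[bool], failure_threshold: int) -> bool:
    """Return True if at least `failure_threshold` probes failed in a row.

    Hypothetical helper: `True` in probe_results means the probe succeeded.
    """
    consecutive_failures = 0
    for ok in probe_results:
        if ok:
            # Any successful probe resets the failure streak.
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                return True
    return False


# Hypothetical probe histories (one entry per probe interval).
blip = [True, True, False, False, True, True, True]            # short transient gap
outage = [True, False, False, False, False, False, False]      # sustained failure

print(alert_fires(blip, failure_threshold=1))    # sensitive rule: fires on the blip
print(alert_fires(blip, failure_threshold=4))    # tolerant rule: ignores the blip
print(alert_fires(outage, failure_threshold=4))  # tolerant rule: still catches the outage
```

    With a threshold of one missed probe, the short gap is reported as unavailability; with a higher threshold, only the sustained failure is. The right threshold depends on how quickly you need to hear about a real outage.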


    Recommended Best Practices

    1. Adjust Alert Thresholds: Tune alert conditions to trigger only after sustained unavailability (e.g., multiple missed heartbeats), reducing noise from brief or false events.

    2. Use Azure Service Health: Regularly check Azure Service Health for platform maintenance, outages, or planned updates that may explain the alerts.

    3. Inspect the Activity Log: Look for host-level maintenance, redeployments, or other events in the Azure Activity Log around the time of the alert.

    4. Correlate with Performance Metrics: Review VM-level metrics (CPU, disk, memory, network) to determine if there were any actual resource anomalies during the alert window.

    5. Ensure Agent Health: Keep monitoring agents such as Azure Monitor Agent (AMA) or Log Analytics agents up to date to avoid bugs or telemetry gaps.
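
    The Activity Log check can be done by eye, but if you export alert times and platform events, a small script can do the matching. The following is a rough Python sketch under assumptions: the `correlate` function, the five-minute window, and all timestamps and event names are made up for illustration and do not come from any Azure API.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def correlate(
    alerts: List[datetime],
    events: List[Tuple[datetime, str]],
    window: timedelta = timedelta(minutes=5),
) -> Dict[datetime, List[str]]:
    """Map each alert time to the names of events within `window` of it."""
    matches: Dict[datetime, List[str]] = {}
    for alert_time in alerts:
        matches[alert_time] = [
            name for event_time, name in events
            if abs(event_time - alert_time) <= window
        ]
    return matches


# Hypothetical exported data: alert fire times and platform events.
alerts = [datetime(2025, 1, 1, 8, 3), datetime(2025, 1, 1, 14, 30)]
events = [
    (datetime(2025, 1, 1, 8, 1), "hypothetical host maintenance"),
    (datetime(2025, 1, 1, 11, 0), "hypothetical redeploy"),
]

result = correlate(alerts, events)
for alert_time, nearby in result.items():
    label = ", ".join(nearby) if nearby else "no platform event nearby"
    print(f"{alert_time.isoformat()}: {label}")
```

    Alerts that line up with a platform event are likely transient; alerts with no nearby event are the ones worth checking against VM metrics and agent health.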


    By applying these best practices, you can better differentiate between true availability issues and benign, transient conditions, leading to more reliable and actionable alerting.

    Please check and let us know if you need any more help.

    Thanks

    Deepanshu

