To distinguish whether the outage was caused by VM host/platform issues or by networking, use a combination of platform diagnostics and in-guest/network checks.
- Check for platform/host issues
- In the Azure portal, open the VM and use Diagnose and Solve problems (or the Boot diagnostics and Activity log) to see:
- Any unplanned maintenance, storage-related forced shutdowns, or auto-recovery events around the time connectivity was lost.
- Any VM reboot or redeploy events that were not initiated manually.
- Review the guidance on VM reboots: Azure may reboot VMs for urgent node or network switch maintenance, storage connectivity issues, or hardware problems. A storage-related forced shutdown occurs if connectivity between the VM and its disks is affected for more than 120 seconds; the VM is powered back on once storage connectivity is restored. These events indicate a host/platform-side issue rather than a guest or network misconfiguration.
- Also check the Azure Service Health dashboard for the East US region to confirm whether there was a regional incident impacting compute or storage at the time of the problem.
If platform logs show forced shutdowns, unplanned maintenance, or storage connectivity issues, this points to a VM host/platform issue.
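The platform check above comes down to finding events close to the outage that nobody on your side initiated. The following is a minimal Python sketch of that filter, assuming the Activity log entries have already been exported into plain dictionaries. The field names (`time`, `operation`, `initiated_by_user`) and the 30-minute window are illustrative assumptions, not the Azure log schema.

```python
from datetime import datetime, timedelta


def suspicious_platform_events(events, outage_time, window_minutes=30):
    """Return events near the outage that were not started by a user.

    `events` is a list of dicts with hypothetical keys:
    'time' (datetime), 'operation' (str), 'initiated_by_user' (bool).
    """
    window = timedelta(minutes=window_minutes)
    return [
        e for e in events
        if abs(e["time"] - outage_time) <= window and not e["initiated_by_user"]
    ]


# Illustrative data only; these are not real log entries.
outage = datetime(2024, 1, 1, 12, 0)
sample = [
    {"time": datetime(2024, 1, 1, 11, 50), "operation": "restart", "initiated_by_user": False},
    {"time": datetime(2024, 1, 1, 12, 5), "operation": "restart", "initiated_by_user": True},
    {"time": datetime(2024, 1, 1, 3, 0), "operation": "redeploy", "initiated_by_user": False},
]
hits = suspicious_platform_events(sample, outage)
print([e["operation"] for e in hits])
```

Anything this filter returns is a candidate for a host/platform cause and is worth comparing against Service Health for the same time window.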
- Check for network path issues between VMs
Because this VM is used as a tunnel and other VMs could not ping through it, validate the network path when the issue occurs again:
- Verify basic connectivity between VMs
- From a source VM, use a TCP-based probe (for example, `tcping`) to the tunnel VM on the relevant port: `tcping64.exe -t <tunnel VM address> <port>`
- If this fails, it indicates a network path or guest firewall issue.
- Validate NSG and routing
- On the tunnel VM’s NIC and subnet, confirm that the NSG rules:
- Still apply the default Allow VNet Inbound and Allow Load Balancer Inbound rules.
- Contain no higher-priority Deny rules blocking the required ports or protocols.
- Use Network Watcher → Connection troubleshoot (connectivity check) from a source VM to the tunnel VM. In the Hops section of the result, review any Issue type entries:
- `NetworkSecurityRule`: indicates an NSG rule is blocking traffic.
- `UserDefinedRoute`: indicates a route table or NVA is misrouting or dropping traffic.
- `Guest Firewall`: indicates the OS firewall on the tunnel VM is blocking traffic.
- `DNS Resolution`: indicates name resolution issues.
- Fix issues according to the connectivity check output (for example, adjust NSG rules, UDRs, or guest firewall).
If connectivity checks show NSG/UDR/guest firewall/DNS issues but no platform incidents, the problem is likely a network configuration or in-guest issue.
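If `tcping` is not available on the source VM, the same TCP-based probe can be approximated with Python's standard `socket` module. This is a minimal sketch rather than a replacement for the Network Watcher check. The function name `tcp_probe` and the 3-second default timeout are illustrative choices.

```python
import socket


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures, and unreachable hosts.
        return False
```

Call it from a source VM as `tcp_probe("<tunnel VM address>", <port>)`. A `False` result only says that the TCP handshake did not complete; it does not say whether an NSG, a route, or the guest firewall is responsible, which is what the connectivity check's Issue type narrows down.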
- Check the VM itself (guest OS and performance)
- When connectivity is lost but the VM is still running, use tools like Azure Serial Console (for Linux/Windows) or existing management channels to check:
- CPU and memory usage (to rule out resource exhaustion).
- Whether the tunnel service or application is still listening on the expected ports.
- If the connectivity check flags `CPU` or `Memory` issues, follow the generic performance troubleshooting guidance for Azure VMs.
- Use restart/redeploy only as recovery, not diagnosis
- Restarting the VM (as done) can temporarily clear both host and guest issues, so it does not by itself distinguish root cause.
- If problems recur and platform diagnostics show no host issues, consider redeploying the VM to a new node. Redeploy moves the VM to different hardware and can clear underlying host/network problems, but note that ephemeral disk data is lost and dynamic IPs may change.
- Practical workflow for the next occurrence
If the issue happens again:
- Check Azure Service Health and the VM’s Activity log/Diagnose and Solve for host or storage incidents.
- Run Network Watcher connectivity check from a source VM to the tunnel VM and review `Issue type` in the result.
- Inspect NSGs, UDRs, and guest firewall based on the connectivity check.
- If no network misconfiguration is found and platform logs show maintenance or storage-related shutdowns, treat it as a host/platform issue; if not, focus on the tunnel VM’s OS, services, and performance.
This approach will let you classify future outages as either network path/configuration issues or VM host/platform issues with evidence from Azure diagnostics.
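The decision logic of this workflow can be written down as a small Python sketch. It assumes you record two pieces of evidence by hand: whether platform logs showed maintenance or storage-related events, and which Issue type values the connectivity check reported. The function name and the returned labels are hypothetical, and the issue-type strings are copied from the list in this answer rather than from an API. When both kinds of evidence are present, the sketch reports the network findings first, which is a simplification.

```python
# Issue types treated as network/configuration evidence (as named in this answer).
NETWORK_ISSUE_TYPES = {
    "NetworkSecurityRule",
    "UserDefinedRoute",
    "Guest Firewall",
    "DNS Resolution",
}


def classify_outage(platform_events_found, issue_types):
    """Map the collected evidence to a likely cause category."""
    network_hits = sorted(set(issue_types) & NETWORK_ISSUE_TYPES)
    if network_hits:
        return "network/configuration: " + ", ".join(network_hits)
    if platform_events_found:
        return "host/platform"
    # No network finding and no platform event: look inside the tunnel VM.
    return "guest OS, services, or performance"
```

For example, `classify_outage(True, [])` corresponds to the case where no network misconfiguration is found but platform logs show maintenance or storage-related shutdowns.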
References:
- Troubleshoot Azure VM connectivity problems
- Troubleshoot connectivity problems between Azure virtual machines
- Troubleshoot SSH connections to an Azure Linux VM that fails, errors out, or is refused
- Understand a system reboot for Azure VM
- Troubleshoot application connectivity issues on virtual machines in Azure