ACA-Azure-Velo VM

Andrew Valle 60 Reputation points
2026-03-20T18:01:10.32+00:00

I was having issues earlier with the Linux VM. I was not able to communicate with it, and since it's used as a tunnel to other sites, the other VMs were not pinging.

I know the East US datacenter was having issues, but I'm not sure whether that affected this VM.

The only way to get the connection back was to reboot the VM.

I would love some help determining whether it's a network issue or a VM host issue.

Azure Virtual Machines

An Azure service that is used to provision Windows and Linux virtual machines.

Answer accepted by question author
  1. Q&A Assist
    2026-03-20T18:01:30.2733333+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    To distinguish whether the outage was caused by VM host/platform issues or by networking, use a combination of platform diagnostics and in-guest/network checks.

    1. Check for platform/host issues
    • In the Azure portal, open the VM and use Diagnose and Solve problems (or the Boot diagnostics and Activity log) to see:
      • Any unplanned maintenance, storage-related forced shutdowns, or auto-recovery events around the time connectivity was lost.
      • Any VM reboot or redeploy events that were not initiated manually.
    • Review the guidance on VM reboots: Azure may reboot VMs for urgent node or network switch maintenance, storage connectivity issues, or hardware problems. Storage-related forced shutdowns occur if connectivity between the VM and its disks is affected for more than 120 seconds; the VM is then powered back on when storage is healthy again. These are indicators of a host/platform-side issue rather than a guest/network misconfiguration.
    • Also check the Azure Service Health dashboard for the East US region to confirm whether there was a regional incident impacting compute or storage at the time of the problem.

    If platform logs show forced shutdowns, unplanned maintenance, or storage connectivity issues, this points to a VM host/platform issue.
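
To make the Activity log review above more concrete, here is a minimal Python sketch that narrows an exported list of Activity log entries to those near the outage and then drops the ones started by operators you recognise. It assumes the entries were exported to JSON (for example with `az monitor activity-log list`) and that each entry carries `eventTimestamp` and `caller` fields with UTC timestamps; treat those field names, the sample entries, and the helper names as assumptions to verify against your own export.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(ts):
    """Parse an ISO-8601 timestamp to whole seconds, assuming it is already UTC."""
    return datetime.strptime(ts[:19], "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def events_near(events, outage_time, window_minutes=30):
    """Return events within +/- window_minutes of the outage, oldest first."""
    center = parse_utc(outage_time)
    window = timedelta(minutes=window_minutes)
    nearby = [e for e in events
              if abs(parse_utc(e["eventTimestamp"]) - center) <= window]
    return sorted(nearby, key=lambda e: parse_utc(e["eventTimestamp"]))


def not_initiated_by(events, known_callers):
    """Keep only events whose caller is not one of the operators you recognise."""
    known = {c.lower() for c in known_callers}
    return [e for e in events if (e.get("caller") or "").lower() not in known]


# Hypothetical sample entries, for illustration only.
sample_events = [
    {"eventTimestamp": "2026-03-20T17:40:00.1234567Z",
     "caller": "admin@example.com", "label": "manual restart"},
    {"eventTimestamp": "2026-03-20T17:50:00Z",
     "caller": None, "label": "event with no known caller"},
    {"eventTimestamp": "2026-03-20T09:00:00Z",
     "caller": "admin@example.com", "label": "unrelated change"},
]

nearby = events_near(sample_events, "2026-03-20T18:00:00Z", window_minutes=30)
for event in not_initiated_by(nearby, ["admin@example.com"]):
    print(event["eventTimestamp"], event["label"])
```

Anything that survives both filters is a candidate for a reboot or redeploy that was not initiated manually and is worth comparing against Service Health for the same window.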

    2. Check for network path issues between VMs
    Since this VM is used as a tunnel and other VMs could not ping through it, validate the network path when the issue occurs again:
    1. Verify basic connectivity between VMs
    • From a source VM, use a TCP-based probe (for example, tcping) to the tunnel VM on the relevant port:
        tcping64.exe -t <tunnel VM address> <port>
      
    • If this fails, it indicates a network path or guest firewall issue.
    2. Validate NSG and routing
    • On the tunnel VM’s NIC and subnet, confirm that the NSG rules:
      • Leave the default AllowVnetInBound and AllowAzureLoadBalancerInBound rules in effect.
      • Contain no higher-priority Deny rules blocking the required ports or protocols.
    • Use Network Watcher → Connection troubleshoot (connectivity check) from a source VM to the tunnel VM. In the Hops section of the result, review any Issue type entries:
      • NetworkSecurityRule: indicates an NSG rule is blocking traffic.
      • UserDefinedRoute: indicates a route table or NVA is misrouting or dropping traffic.
      • Guest Firewall: indicates the OS firewall on the tunnel VM is blocking traffic.
      • DNS Resolution: indicates name resolution issues.
    • Fix issues according to the connectivity check output (for example, adjust NSG rules, UDRs, or guest firewall).

    If connectivity checks show NSG/UDR/guest firewall/DNS issues but no platform incidents, the problem is likely a network configuration or in-guest issue.
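
The TCP-based probe described above can also be run from a Linux or Windows source VM without installing extra tools. The following is a minimal Python sketch of such a probe using only the standard library; the function names are hypothetical, and the host and port in the usage comment are placeholders, not values from this thread.

```python
import socket
import time


def tcp_probe(host, port, timeout=3.0):
    """Attempt one TCP connection; return (reachable, seconds_elapsed)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False, time.monotonic() - start


def probe_repeatedly(host, port, attempts=5, interval=1.0, timeout=3.0):
    """Probe several times and return the list of per-attempt results."""
    results = []
    for attempt in range(attempts):
        reachable, elapsed = tcp_probe(host, port, timeout)
        state = "open" if reachable else "unreachable"
        print(f"{host}:{port} {state} after {elapsed:.3f}s")
        results.append(reachable)
        if attempt < attempts - 1:
            time.sleep(interval)
    return results


# Example (placeholders): probe_repeatedly("<tunnel VM address>", 22)
```

A run where every attempt fails suggests a blocked or broken path, while intermittent failures are worth correlating with the timestamps of any platform events.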

    3. Check the VM itself (guest OS and performance)
    • When connectivity is lost but the VM is still running, use tools like Azure Serial Console (for Linux/Windows) or existing management channels to check:
      • CPU and memory usage (to rule out resource exhaustion).
      • Whether the tunnel service or application is still listening on the expected ports.
    • If the connectivity check flags CPU or Memory issues, follow the generic performance troubleshooting guidance for Azure VMs.
    4. Use restart/redeploy only as recovery, not diagnosis
    • Restarting the VM (as done) can temporarily clear both host and guest issues, so it does not by itself distinguish root cause.
    • If problems recur and platform diagnostics show no host issues, consider redeploying the VM to a new node. Redeploy moves the VM to different hardware and can clear underlying host/network problems, but note that data on the temporary disk is lost and dynamic IP addresses may change.
    5. Practical workflow for the next occurrence
    When/if the issue happens again:
    1. Check Azure Service Health and the VM’s Activity log/Diagnose and Solve for host or storage incidents.
    2. Run Network Watcher connectivity check from a source VM to the tunnel VM and review Issue type in the result.
    3. Inspect NSGs, UDRs, and guest firewall based on the connectivity check.
    4. If no network misconfiguration is found and platform logs show maintenance or storage-related shutdowns, treat it as a host/platform issue; if not, focus on the tunnel VM’s OS, services, and performance.

    This approach will let you classify future outages as either network path/configuration issues or VM host/platform issues with evidence from Azure diagnostics.
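
As a rough illustration of the classification logic in the workflow above, the following Python sketch combines the two pieces of evidence (whether a platform event was found, and which Issue type values the connectivity check reported) into a single label. The function name and labels are hypothetical, and the "mixed" outcome is an assumption added here for the case where both kinds of evidence appear, which the workflow does not address explicitly.

```python
# Issue types named in the answer, normalised to lowercase without spaces.
NETWORK_ISSUE_TYPES = {
    "networksecurityrule",
    "userdefinedroute",
    "guestfirewall",
    "dnsresolution",
}


def classify_outage(platform_event_found, issue_types):
    """Return a coarse label for the outage based on the collected evidence."""
    normalised = {t.replace(" ", "").lower() for t in issue_types}
    network_issue_found = bool(normalised & NETWORK_ISSUE_TYPES)

    if platform_event_found and network_issue_found:
        return "mixed"      # assumption: investigate both sides
    if platform_event_found:
        return "platform"   # maintenance or storage-related shutdown
    if network_issue_found:
        return "network"    # NSG, UDR, guest firewall, or DNS
    return "guest"          # focus on OS, services, and performance


print(classify_outage(False, ["NetworkSecurityRule"]))
print(classify_outage(True, []))
print(classify_outage(False, ["CPU"]))
```

Recording the label together with timestamps for each occurrence makes it easier to see whether repeated outages share a cause.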


1 additional answer

  1. Jilakara Hemalatha 12,100 Reputation points Microsoft External Staff Moderator
    2026-03-20T19:16:06.05+00:00

    Thank you for reaching out on Q&A and clearly explaining what happened. I understand how disruptive this would have been, especially since your Velo VM is acting as a tunnel and impacting connectivity for the dependent VMs.

    Based on the behavior you described, there are two likely possibilities. One is a transient platform-level issue, particularly since you mentioned there were concerns in the East US region at that time. In such cases, the VM may remain in a running state but lose network connectivity at the host layer, and a reboot typically restores connectivity — which aligns with what you observed.

    The second possibility is within the VM itself. It’s possible that the network interface or a background service (such as the tunneling process) became unresponsive. A reboot would reset these components and bring the system back to a healthy state.

    As a starting point, I would recommend reviewing the Boot Diagnostics screenshot and logs from the Azure portal to confirm that the VM completed the boot process successfully and that there are no operating system-level errors during startup. This can help validate whether the OS and its services are functioning as expected.

    Reference: How to use boot diagnostics to troubleshoot virtual machines in Azure https://learn.microsoft.com/en-us/azure/virtual-machines/boot-diagnostics

    Please also check the VM Resource Health status in the Azure portal to verify whether any platform-related issues were reported during that timeframe.

    Reference: https://learn.microsoft.com/en-us/azure/service-health/resource-health-overview

    To further isolate the issue, you may use Azure Network Watcher to validate the network path. Tools such as Connection Troubleshoot or IP Flow Verify can help determine whether traffic is being allowed or blocked by network rules.

    Reference: https://learn.microsoft.com/en-us/azure/network-watcher/connection-troubleshoot-overview

    Additionally, it would be helpful to review VM metrics (CPU, memory, and network) around the time of the issue to rule out any resource constraints or unusual spikes.

    Could you please check your private messages and provide the requested details, along with the following:

    1. What was the approximate time of the issue (UTC), and how long did the VM remain unreachable?
    2. Was the VM completely unreachable, or were you able to access it via Serial Console / Run Command from the Azure portal?
    3. Did this impact only external connectivity, or was internal VNet communication also affected?
    4. Have there been any recent changes on the VM (updates, patches, configuration changes, or tunnel software modifications)?
    5. Is this a one-time occurrence, or have you noticed similar connectivity drops in the past?
    6. Are there any custom routing configurations (UDRs) or IP forwarding enabled on this VM since it is acting as a tunnel?
    7. Do the system logs (syslog/dmesg) show any network, kernel, or service-related errors around the time of the issue?
