How to fix Azure VMs that intermittently fail to connect to On-Prem Active Directory on VM start?

Ahmed Madhun 21 Reputation points
2024-04-24T19:19:58.1+00:00

I have an On-Prem Active Directory connected to my Azure subscription via VPN Gateway.

All the Virtual Machines in Azure have successfully joined the Active Directory and work fine in terms of AD policy, rules, users, etc. The VMs in Azure are stopped automatically every day after work hours.

My issue happens when the VMs are started the next day by a pipeline in Azure DevOps that runs Az PowerShell against the tenant and starts up the VMs. Some VMs, at random (often, but not always), show the following screen on RDP. (Keep reading the post: the error message below is misleading. It is not a policy issue but a connection issue.)
[Screenshot: error screen shown on RDP]

The VMs showing this error message seem to be unable to connect to the on-prem Active Directory. This is strange, because other VMs already managed to connect to Active Directory correctly. The workaround is to restart the VM from the Azure portal, after which it goes back to normal and works fine. In some cases we had to restart the same VM twice, because the first restart did not solve the issue. Basically, we restart the VM from the portal until it regains its connection to Active Directory.

Also, in some cases, if a VNet has 10 VMs and all of them are started in one PowerShell process from Azure DevOps, the problem affects maybe 2 or 3 VMs while the rest work fine.

The issue happens often, but not every time and not on every VM in Azure at the same time. It appears random, which makes it hard to spot and debug, and it is becoming annoying because we need to restart the affected VMs manually.

Any suggestions on how to investigate further and fix this issue?

Active Directory

1 answer

  1. Khaled El-Sayed Mohamed 1,150 Reputation points
    2024-04-28T10:13:56.05+00:00

    Hi AM

    The issue you're facing with intermittent connectivity problems to your on-prem Active Directory from Azure VMs can be quite challenging to troubleshoot, especially when it seems random and not consistent across all VMs. Here are some steps and suggestions to help you diagnose and potentially resolve the issue:

    1. Check Network Connectivity: Ensure that there are no network issues between the Azure VMs and your on-premises network. Verify that the VPN connection between Azure and your on-premises network is stable and reliable. You can use network monitoring tools or diagnostic logs to identify any connectivity issues.
    2. Review Azure DevOps Pipeline: Double-check your Azure DevOps pipeline to ensure that it's correctly starting up the VMs and not causing any issues during the process. Make sure that the PowerShell script used to start the VMs is error-free and properly handles any exceptions or errors that may occur.
    3. Investigate VM Startup Process: Look into the startup process of the affected VMs to see if there are any errors or warnings logged during boot-up. Check the system logs, event viewer, and Azure diagnostic logs for any clues about why the VMs are failing to connect to Active Directory.
    4. Examine VM Configuration: Verify that the affected VMs have the correct DNS settings configured to point to your on-premises Active Directory domain controllers. Ensure that the VMs are configured to use the internal DNS servers for name resolution.
    5. Review Active Directory Configuration: Check the health and configuration of your on-premises Active Directory environment. Look for any issues with domain controllers, DNS servers, replication, or trust relationships that could be impacting connectivity from Azure VMs.
    6. Monitor Resource Utilization: Monitor resource utilization on the affected VMs during startup to see if there are any spikes or anomalies that could be causing performance issues. High CPU, memory, or disk usage could indicate underlying issues that need to be addressed.
    7. Consider VM Restart Script: Since restarting the VM from the Azure portal seems to temporarily resolve the issue, you could consider implementing a script or automation process to automatically restart the affected VMs when connectivity problems are detected. This could help mitigate the impact of the issue until a permanent solution is found.
    8. Engage Azure Support: If you're unable to identify the root cause of the issue, consider reaching out to Microsoft Azure support for assistance. They can help you troubleshoot and diagnose the problem further, especially if it involves deeper investigation or configuration changes at the Azure infrastructure level.
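
As a rough illustration of steps 1, 4, and 7, the sketch below probes a domain controller from a VM and retries for a while before declaring the VM unhealthy. It is only a minimal example under stated assumptions, not a tested fix for this environment: the function names (`port_open`, `unreachable_ports`, `wait_for_dc`), the retry count, and the delay are all made up for illustration, and the port list covers only the commonly used TCP ports for DNS, Kerberos, LDAP, and SMB. Python is used here for neutrality; the same checks could be written in PowerShell inside the pipeline.

```python
import socket
import time
from typing import Callable, Dict, List

# TCP ports a domain-joined Windows machine commonly needs to reach on a
# domain controller. This is an illustrative subset, not a complete list,
# and it ignores the UDP side of DNS and Kerberos.
AD_PORTS: Dict[str, int] = {"dns": 53, "kerberos": 88, "ldap": 389, "smb": 445}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable_ports(
    host: str,
    ports: Dict[str, int] = AD_PORTS,
    probe: Callable[[str, int], bool] = port_open,
) -> List[str]:
    """Return the names of the services on host that could not be reached."""
    return [name for name, port in ports.items() if not probe(host, port)]


def wait_for_dc(
    host: str,
    attempts: int = 5,
    delay: float = 30.0,
    probe: Callable[[str, int], bool] = port_open,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe the domain controller up to `attempts` times, pausing `delay`
    seconds between tries. Return True as soon as every port is reachable,
    or False if the last attempt still fails."""
    for attempt in range(1, attempts + 1):
        if not unreachable_ports(host, probe=probe):
            return True
        if attempt < attempts:
            sleep(delay)
    return False
```

On an affected VM, something like `wait_for_dc("dc01.corp.example.com")` (a hypothetical hostname) would return `False` if the domain controller stays unreachable after all retries, and the pipeline could use that result to decide whether to restart the VM. If the check succeeds on a later attempt without any restart, that would suggest the VM simply boots before the VPN path or DNS is ready, which narrows down where to look.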