Windows Server 2016 VM stops responding to RDP; console hangs at the user profile service, and the VM eventually loses its IP/NIC

Kenneth Kirchner 96 Reputation points
2021-04-16T03:48:52.53+00:00

Hello all,

We have an issue on a few of our Windows Server 2016 servers that started sometime in the last few months and seems to be getting progressively worse. Resetting a system seems to right it for a few days, but inevitably it slides back into the same unusable state and requires another hard reset. So far the systems have bounced back, sometimes needing nothing more than a chkdsk, but on our SQL servers the recovery can take a few minutes.

These systems run well for a few days, but then we notice that we can no longer connect via RDP. If we try to log in on the VM console, it usually hangs at "Waiting for user profile service", which never resolves, and the console stays stuck on that login until the VM is reset. The SQL or web service on the VM continues to run as if there is no problem for several hours, but eventually the IP address that vCenter shows for the server disappears and the box is completely isolated. We have to hard reset it to restore service.

I have run SFC on all of these servers, and it reports no corruption. I ran the DISM tools, and they do report that the component store is repairable, but the DISM and CBS logs contain no errors, only Info and Warning entries. We don't seem to have any problem installing Windows updates; we are patched up to the March rollup. These servers can't reach the Microsoft Update servers, so I'm not sure how to clear these DISM issues. I have injected updates from a KB CAB before, but if the logs don't identify a KB, then what?

This behavior, where a server works OK for a few days and then services start to die off, sounds to me like a memory leak in some component, but I'm sure other causes are possible. We recently installed Elastic Metricbeat to see if we can spot the process that might be running amok.
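If the memory-leak theory is right, the per-process memory samples collected by Metricbeat should show one process whose memory climbs steadily between resets. Below is a minimal sketch of that check, assuming the samples have already been exported as (hours since reset, memory in MB) pairs per process. The export step, the function names, the process name `leaky.exe`, and the 10 MB/hour threshold are all hypothetical and are not part of Metricbeat. The sketch fits a least-squares slope to each process's samples and flags those growing faster than the threshold.

```python
"""Flag processes whose memory use grows steadily over time (possible leaks)."""


def growth_rate(points):
    """Least-squares slope of memory (MB) against time (hours) for one process."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_m = sum(m for _, m in points) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    # All samples taken at the same time: no trend can be estimated.
    return num / den if den else 0.0


def leak_suspects(samples, min_growth_mb_per_hour=10.0, min_points=3):
    """Return (process, MB/hour) pairs above the threshold, fastest growth first.

    samples maps a process name to a list of (hours_since_reset, memory_mb)
    tuples. The threshold and minimum sample count are illustrative defaults.
    """
    suspects = []
    for name, points in samples.items():
        if len(points) < min_points:
            continue  # too few samples to call it a trend
        rate = growth_rate(points)
        if rate >= min_growth_mb_per_hour:
            suspects.append((name, rate))
    suspects.sort(key=lambda item: item[1], reverse=True)
    return suspects


if __name__ == "__main__":
    # Hypothetical samples taken at reset, +24 h and +48 h.
    samples = {
        "leaky.exe": [(0, 100), (24, 400), (48, 700)],
        "sqlservr.exe": [(0, 8000), (24, 8010), (48, 7990)],
    }
    print(leak_suspects(samples))  # [('leaky.exe', 12.5)]
```

A steady slope across several days says more than a single high reading. In the sample data, the large but flat `sqlservr.exe` is not flagged, while the smaller but steadily growing process is.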

So I am looking for tips on what to watch that might cause RDP or the User Profile Service to die, or a NIC to suddenly stop working. I assume that the VMware Tools installed on these servers are getting killed or choked out by this supposed runaway process.

Or if anyone is a DISM/CBS guru and wants to tell me how to fix my component store, that would also be appreciated.


Accepted answer
  1. Kenneth Kirchner 96 Reputation points
    2021-06-14T18:11:39.22+00:00

    We got nowhere with this. It just stopped happening, so that was $500 wasted on MS Technical Services. I am going to assume this was some kind of conflict between our antivirus suite and Microsoft Trusted Installer, which seemed to be a common theme in the log files whenever the crash occurred. I guess a Windows update or a McAfee update resolved the issue at some unknown point. I just hope it doesn't come back.

6 additional answers

  1. Carl Fan 6,881 Reputation points
    2021-04-19T10:12:34.363+00:00

    Hi,

    Some ideas that may help you:

    1. In one case, it turned out that a firewall somewhere along the line was dropping ICMP requests. Because the client got no response, slow network connection detection had to time out before the User Profile Service continued loading the profile. Check whether network traffic between the RDP components is being dropped by a firewall. Also, if you have multiple network cards, make sure your "production" card is at the top of the list in Network Connections.

    2. Perform a clean boot:
       - Start > Run > msconfig > "Services" tab: check the "Hide all Microsoft services" box and click "Disable all" (if it is not grayed out).
       - Click the "Startup" tab, click "Disable all", and click "OK".
       - Make sure NLA (Network Level Authentication) is not disabled.
       Then restart the computer. In the clean boot environment, third-party services and applications are disabled; check whether the issue persists. NOTE: you can return to a normal boot by running msconfig again and selecting "Normal startup" on the General tab.

    3. Check the user profile size, and test whether RDP works with a new user profile.

    4. To repair system components, you could try using a Windows Server 2016 installation image as the repair source: https://bornsql.ca/blog/repair-windows-server-2016-installation/

    Hope this helps, and please accept this as the answer if the response is useful.

    Best Regards,
    Carl


  2. Vitor Marques 1 Reputation point
    2021-04-21T08:50:09.863+00:00

    I have the same problem. I think it started in late February or the beginning of March.
    At first I would reset the VMs and they would last a few weeks; lately it's a few days, with luck.
    They all stop responding, and if I try to log on at the console, it hangs on the profile.
    The only "error" I can see in the logs is this:
    svchost (1068) SoftwareUsageMetrics-Svc: A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log"
    I don't fully know whether this is the actual problem or a byproduct of the hang...
    I moved the VM to another server and the problem is the same.
    I have Malwarebytes Anti-Ransomware on the servers; I think I'm going to disable it to see if that solves it.


  3. Kenneth Kirchner 96 Reputation points
    2021-05-04T15:03:22.507+00:00

    It seems to be the same for us. We found out that the IP is not actually disappearing; it's just that the VMware Tools service gets taken down, which makes the IP disappear in vCenter. The VM still pings; it's just that all the services have stopped.
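Because the VM keeps answering ping after RDP stops working, ping-based monitoring and the vCenter IP display both miss the early stage of the failure. Below is a minimal sketch of an external check, assuming a monitoring host can reach the server's RDP port (3389 by default). It only tests whether a TCP connection to that port succeeds. The function name is hypothetical. A successful connection proves only that the RDP listener accepts connections, not that a logon would get past the user profile service.

```python
import socket


def tcp_port_open(host, port=3389, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and raises OSError on refusal,
        # timeout, or a name-resolution failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run it on a schedule from another machine, for example `tcp_port_open("sql01.example.local")` with a hypothetical host name. Raise an alert when it returns False while ping still succeeds, which is the state described above.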

  4. Vitor Marques 1 Reputation point
    2021-05-05T15:06:42.993+00:00

    I upgraded one of the VMs with RDP to Server 2019, and it's the same thing...
    I removed the entire antivirus/anti-ransomware suite to test.
    I am out of ideas :(
