I forgot to mention that all servers have a fresh install of Windows Server 2022 (no in-place updates) and have the latest Windows patches applied.
Problem with RDP service on Windows Server 2022
Hello
We have a problem with RDP service on Windows VMs running on our VxRail cluster.
For some reason, RDP stops working properly and when you try to connect to the VM via RDP, you receive the following error message:
=================================
Remote Desktop Connection
Remote Desktop can't connect to the remote computer for one of these reasons:
- Remote access to the server is not enabled
- The remote computer is turned off
- The remote computer is not available on the network
Make sure the remote computer is turned on and connected to the network, and that remote access is enabled.
=================================
When the problem is present, the service is running in the OS, but the "netstat" command shows that the server is not listening on port 3389.
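The symptom above (service running, nothing listening on 3389) can be checked in one step. This is only a minimal sketch, assuming the default RDP port 3389 and the default service name TermService; neither was changed in the setup described here as far as I can tell.

```powershell
# Minimal sketch: is the RDP service running, and is anything listening on TCP 3389?
# Assumptions: default RDP port (3389) and default service name (TermService).
$service  = Get-Service -Name TermService
$listener = Get-NetTCPConnection -LocalPort 3389 -State Listen -ErrorAction SilentlyContinue

"TermService status : {0}" -f $service.Status
if ($listener) {
    # One entry per bound address (typically 0.0.0.0 and ::)
    "Port 3389          : listening on {0}" -f (($listener.LocalAddress | Sort-Object -Unique) -join ', ')
} else {
    "Port 3389          : NOT listening"
}
```

Run it from an elevated PowerShell prompt on the affected VM (for example via the vCenter console). A "Running" service together with "NOT listening" matches the failure described above.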
To recover, we have two options:
- Reboot affected VM
- Log in to the VM via vCenter Server console, and restart RDP service.
This takes approx. 20 minutes to complete, and we get the following error when the service is stopping (screenshot attached).
This problem happens a few times per week for each VM, which means we get several warnings per day.
We can't afford to restart production servers so often, and manually restarting RDP takes time...
VxRail Version: 7.0.452-28152920 (ESXi version: 7.0.3, 21930508)
VMware Tools version 12.3.0.22234872 (but this problem was also observed when running on older versions)
Affected OS version: Windows Server 2022 (We have a few Win 2019 machines and they don't seem to have this problem. We also have several Win 10 VDIs and they don't have this problem either).
Affected VMs: All VMs with Win 2022 seem to be affected.
VMware support checked this issue and asked to engage Microsoft as they do not see any problems with VMware infrastructure.
Any suggestions would be greatly appreciated.
5 answers
-
MotoX80 32,736 Reputation points
2023-10-17T16:46:19.73+00:00
"Any suggestions would be greatly appreciated."
You might have to open a case with MS product support for this one.
Do you have any 3rd party antivirus, firewall or intrusion detection products installed on your servers? That would be my #1 suspect. Check its logs and maybe temporarily disable it to see if your problem goes away.
Is anything crashing? Check Control Panel\System and Security\Security and Maintenance\Reliability Monitor. I hope WS2022 still has that, if not check C:\ProgramData\Microsoft\Windows\WER.
Have you reviewed the eventlogs? Application and System and the Remote Desktop logs in Application and Services Logs/Microsoft/Windows.
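As a rough illustration of that event log review, the sketch below pulls recent entries from the System and Application logs plus three Remote Desktop operational channels. The 30-minute window and the exact selection of channels are my assumptions; adjust both to the time your monitoring reported the failure.

```powershell
# Minimal sketch: collect recent events from the logs most likely to mention RDP.
# Assumptions: a 30-minute look-back window and this particular set of channels.
$start = (Get-Date).AddMinutes(-30)
$logs  = @(
    'System',
    'Application',
    'Microsoft-Windows-TerminalServices-LocalSessionManager/Operational',
    'Microsoft-Windows-TerminalServices-RemoteConnectionManager/Operational',
    'Microsoft-Windows-RemoteDesktopServices-RdpCoreTS/Operational'
)

$events = foreach ($log in $logs) {
    # SilentlyContinue hides the "no events found" error for quiet logs
    Get-WinEvent -FilterHashtable @{ LogName = $log; StartTime = $start } -ErrorAction SilentlyContinue
}

$events |
    Sort-Object TimeCreated |
    Select-Object TimeCreated, Id, LogName, LevelDisplayName, Message |
    Format-Table -AutoSize -Wrap
```

Informational and warning entries are included on purpose, since a listener that disappears without a crash may not log anything at Error level.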
If you can somehow pinpoint a time when the problem starts, you can use my RecentEvents.ps1 script to gather all events from all eventlogs to try to see what happened around that time. That is the proverbial "needle in the haystack" though. There will be a lot of events.
https://docs.microsoft.com/en-us/answers/questions/102481/eventlog-madness.html
"Log in to the VM via vCenter Server console, and restart RDP service."
Use PowerShell remoting to restart the services. If you can't connect, run "winrm quickconfig" on the target server. You will need to experiment with the script sequence to get it working correctly. Here is some code to start with.
$TargetServer = "YourServerName"
# I'm not sure about the RasMan service. It is not running on my WS2016
#Status   Name               DisplayName
#------   ----               -----------
#Running  SessionEnv         Remote Desktop Configuration
#Running  TermService        Remote Desktop Services
#Running  UmRdpService       Remote Desktop Services UserMode Po...
$sb = {
    # I don't know the correct sequence to recycle RDP, try this
    get-service UmRdpService | stop-service -force
    get-service TermService | stop-service -force
    get-service SessionEnv | stop-service -force
    get-service TermService | start-service
}
# If the account you are logged on with has admin access on the target server, you can remove the references to credentials.
$User = ".\testuser"
$PWord = ConvertTo-SecureString -String "testuser" -AsPlainText -Force
$Credential = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $User, $PWord
Invoke-Command -ComputerName $TargetServer -ScriptBlock $sb -Credential $Credential
-
BD-6573 25 Reputation points
2023-10-18T14:06:46.5166667+00:00
Hi @MotoX80
Thank you so much for this reply! Unfortunately, there are no errors in the logs! :( That's why investigating this problem is so awkward...
I checked Windows Logs (Application/Security/Setup/System) and there are no errors or suspicious events at the time when we got the alert from our monitoring system (PRTG) that RDP service is malfunctioning (by malfunctioning I mean it's up and running in OS, but users can no longer connect via RDP to the affected system).
I also checked Applications and Services Logs – Microsoft – Windows-TerminalServices-* and Applications and Services Logs – Microsoft – Windows – RemoteDesktopServices-* - the same, no errors, no clues whatsoever :(
Nothing in Reliability Monitor either - no crashes, no problems reported.
I ran sfc /scannow as well - no integrity violations.
I just found one more symptom: when the problem is present, session ID 65536 shows as Down.
I forgot to mention that restarting the RDP service has one negative side effect as well - it disconnects all existing users.
That's why a permanent and complete fix for this problem is so much needed.
Thanks!
-
BD-6573 25 Reputation points
2024-04-19T08:18:34.95+00:00
@Lee Fimmel I have not observed this issue on our servers for some time. Not sure why - maybe MS finally released a patch for this problem?
I also introduced one change - maybe it helped as well?
After getting a notification from our monitoring system that the RDP service was not working as expected, I pushed the registry change via our endpoint management tool to the affected server:
SYSTEM\CurrentControlSet\Control\Terminal Server
REG_DWORD
fDenyTSConnections
0
This brought RDP back to life in 95% of cases.
I had to run this task a few times for each server but over time I noticed fewer and fewer reports about this problem...
The last time I had to run this was a month ago, so things definitely look better than when this issue started.
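For anyone who wants to try the same registry change by hand, it could be scripted roughly as follows. This is a sketch, not the exact task pushed by the endpoint management tool: the HKLM hive is an assumption (the post lists only the key path), and printing the previous value is an addition of mine so you can see whether something had set it to 1.

```powershell
# Minimal sketch: set fDenyTSConnections to 0 (RDP connections allowed).
# Assumption: the key lives under HKLM; the post above gives only the path below the hive.
$path = 'HKLM:\SYSTEM\CurrentControlSet\Control\Terminal Server'

# Record the current value first, to see whether something had flipped it to 1
$before = (Get-ItemProperty -Path $path -Name fDenyTSConnections).fDenyTSConnections
"fDenyTSConnections before: $before"

Set-ItemProperty -Path $path -Name fDenyTSConnections -Value 0 -Type DWord

$after = (Get-ItemProperty -Path $path -Name fDenyTSConnections).fDenyTSConnections
"fDenyTSConnections after : $after"
```

Run it from an elevated prompt on the affected server. If the value was already 0 before the write, the recovery is probably a side effect of rewriting the key rather than of the value itself.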
-
MotoX80 32,736 Reputation points
2024-04-19T13:09:45.23+00:00
Is fDenyTSConnections being changed back to 1?
I would think that if there is some inherent bug in the Windows RDP code, then a lot more users would be complaining about it. My gut feel is that there has to be something else that is causing this. Something on your networks, some security software, something that causes the listener to go away.
I had updated my script but didn't post it. Was waiting to see if anyone replied to these questions. Here it is.
Please understand that what it's trying to do is to just gather any relevant information so that we might know where to look next. You may still need to open a case with MS product support to get an actual fix if the root cause is something within the RDP codebase.
Run this on a test server that has the problem and see what it captures. Log on to the console of the server and run it from an admin PS prompt.
The script will produce 2 txt files and an eventlog file. The one txt file is the log that the script produces. The other txt file is a summary of all events that were written to all event logs in the 10 minutes prior to the crash.
Find the time of day when the listener goes away. Look to see if some IP tried to connect at that time. Look to see what other events occurred around that time.
I tested this on Win10 Pro and manually stopped the TS services to simulate an error.
# RDPWatcher.ps1 version 2.0
# Author Motox80 on the Microsoft Learn forums
$logfile    = "c:\temp\RdpWatcher.txt"     # adjust as needed
$etlfile    = "c:\temp\RDP-Trace.etl"
$evtxfile   = "c:\temp\RDP-Trace.evtx"
$eventsfile = "c:\temp\RDP-Events.txt"

function LogIt($msg){
    $msg                                                          # echo to the console
    "{0} - {1}" -f (get-date), $msg | Out-File $logfile -Append   # and append to the log file
}

# Cleanup our trace files
get-process mmc | Stop-Process -Force -ErrorAction SilentlyContinue
Remove-Item $logfile -ErrorAction SilentlyContinue
Remove-Item $etlfile -ErrorAction SilentlyContinue
Remove-Item $evtxfile -ErrorAction SilentlyContinue
Remove-Item $eventsfile -ErrorAction SilentlyContinue

# See https://learn.microsoft.com/en-us/troubleshoot/windows-server/system-management-components/event-tracing-for-windows-simplified
# Mode 2 is circular
# logman create trace RDP-Trace -ow -o c:\temp\RDP-Trace.etl -p "Microsoft-Windows-TerminalServices-RemoteConnectionManager" 0xffffffffffffffff 0xff -nb 16 16 -bs 1024 -mode 0x2 -max 2048 --v
#
# Also try these providers
#   Microsoft-Windows-TerminalServices-RemoteConnectionManager
#   Microsoft-Windows-RemoteDesktopServices-RdpCoreTS
#   Microsoft-Windows-RemoteDesktopServices-SessionServices
# Use "logman query providers" to look for others
logman create trace RDP-Trace -ow -o $etlfile -p "Microsoft-Windows-RemoteDesktopServices-RdpCoreTS" 0xffffffffffffffff 0xff -nb 16 16 -bs 1024 -mode 0x2 -max 2048 --v
logman start RDP-Trace

while ($true) {
    # Monitor the IP addresses connected to port 3389
    $conn = netstat -aon | Select-String ':3389' | Select-String 'ESTABLISHED|CLOSE_WAIT'
    $connstat = ""
    foreach ($c in $conn) {
        # Field 3 of the netstat line is the foreign (client) address
        $connstat += ($c -replace '\s+', ' ').split(' ')[3] + " "
    }
    if ($laststat -ne $connstat) {
        Logit "Connection change: $connstat"
        $laststat = $connstat
        Logit ((qwinsta.exe) -join "`n")
    }

    # Session 65536 is the one that was reported as Down when the problem occurs
    $status = query.exe session 65536 2>&1
    if ((-join $status).Contains('Down') -or ((-join $status).Contains('No session exists')) ) {
        $curr = qwinsta.exe
        LogIt "Listener is down"
        LogIt "------------Prior qwinsta output---------"
        LogIt (($last) -join "`n")
        LogIt "----------Current qwinsta output---------"
        LogIt (($curr) -join "`n")
        break                      # listener went away, break out of our loop
    }
    $last = qwinsta.exe
    start-sleep -seconds 5         # how long to delay between checks
}

# Our listener went away. Generate log files.
logman stop RDP-Trace
logman delete RDP-Trace
tracerpt $etlfile -o $evtxfile -of EVTX -y
eventvwr /l:$evtxfile
notepad $logfile

# Now read all events from all event logs for the last 10 minutes
$tf = (10 * 60 * 1000)             # 10 minutes in milliseconds
$elna = (Get-WinEvent -ListLog * -EA silentlycontinue | where-object { $_.recordcount -gt 1})   # get all event log names that have records in them.
$AllEvents = @()                   # prepare array so we can append to it
foreach ($el in $elna)             # look at each event log
{
    # "<" must be escaped as &lt; so that the query is well-formed XML
    $xml = "<QueryList><Query Id=""0"" Path=""$($el.logname)"">
    <Select Path=""$($el.logname)"">*[System[TimeCreated[timediff(@SystemTime) &lt;= $tf ]]]</Select>
    </Query></QueryList>"
    $AllEvents += Get-WinEvent -FilterXml $XML -ErrorAction SilentlyContinue   # append the events (if any)
}
$eldata = $AllEvents | sort-object -Descending -Property TimeCreated | Select-Object -property TimeCreated, ID, Logname, LevelDisplayName, Message
Remove-Variable AllEvents          # do a little memory cleanup
$eldata | Out-GridView -Title "Recent Events (last 10 minutes)"
$eldata | Out-File $eventsfile
Remove-Variable eldata
notepad $eventsfile