Problem with RDP service on Windows Server 2022

BD-6573 25 Reputation points
2023-10-17T08:09:10.2066667+00:00

Hello

We have a problem with RDP service on Windows VMs running on our VxRail cluster.

For some reason, RDP stops working properly and when you try to connect to the VM via RDP, you receive the following error message:

=================================

Remote Desktop Connection

Remote Desktop can't connect to the remote computer for one of these reasons:

  1. Remote access to the server is not enabled
  2. The remote computer is turned off
  3. The remote computer is not available on the network

Make sure the remote computer is turned on and connected to the network, and that remote access is enabled.

=================================

When the problem is present, the service is running in the OS, but the "netstat" command shows that the server is not listening on port 3389.
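The netstat check described above can be scripted so that it is easy to repeat or to call from a monitoring job. This is a minimal sketch, not something from the original post: `Test-PortListening` is a hypothetical helper name, and it assumes English netstat output (the state column reads `LISTENING`).

```powershell
# Hypothetical helper (not from the thread): returns $true when any line of
# `netstat -ano` output shows a TCP socket in the LISTENING state on the port.
function Test-PortListening {
    param(
        [string[]]$NetstatLines,
        [int]$Port = 3389
    )
    foreach ($line in $NetstatLines) {
        # Columns: Proto, Local Address, Foreign Address, State, PID
        $cols = $line.Trim() -split '\s+'
        if ($cols.Count -ge 4 -and $cols[0] -eq 'TCP' -and
            $cols[1] -match ":$Port$" -and $cols[3] -eq 'LISTENING') {
            return $true
        }
    }
    return $false
}

# Usage on the affected server:
# Test-PortListening -NetstatLines (netstat -ano)
```

On Windows, `Get-NetTCPConnection -LocalPort 3389 -State Listen` should give the same answer without parsing text.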

To recover, we have two options:

  1. Reboot affected VM
  2. Log in to the VM via vCenter Server console, and restart RDP service.

This takes approx. 20 minutes to complete, and we get an error when the service is stopping (screenshot attached).

This problem happens a few times per week for each VM, which means we get several warnings per day.

We can't afford to restart production servers so often, and manually restarting RDP takes time...

VxRail Version: 7.0.452-28152920 (ESXi version: 7.0.3, 21930508)

VMware Tools version 12.3.0.22234872 (but this problem was also observed when running on older versions)

Affected OS version: Windows Server 2022 (We have a few Win 2019 machines and they don't seem to have this problem. We also have several Win 10 VDIs and they don't have this problem either).

Affected VMs: All VMs with Win 2022 seem to be affected.

VMware support checked this issue and asked us to engage Microsoft, as they do not see any problems with the VMware infrastructure.

Any suggestions would be greatly appreciated.


5 answers

  1. BD-6573 25 Reputation points
    2023-10-17T08:47:50.7066667+00:00

    I forgot to mention that all servers have a fresh install of Windows Server 2022 (no in-place updates) and have the latest Windows patches applied.

  2. MotoX80 32,736 Reputation points
    2023-10-17T16:46:19.73+00:00

    "Any suggestions would be greatly appreciated."

    You might have to open a case with MS product support for this one.

    Do you have any 3rd party antivirus, firewall or intrusion detection products installed on your servers? That would be my #1 suspect. Check its logs and maybe temporarily disable it to see if your problem goes away.

    Is anything crashing? Check Control Panel\System and Security\Security and Maintenance\Reliability Monitor. I hope WS2022 still has that, if not check C:\ProgramData\Microsoft\Windows\WER.

    Have you reviewed the event logs? Check Application and System, and the Remote Desktop logs in Application and Services Logs/Microsoft/Windows.

    If you can somehow pinpoint a time when the problem starts, you can use my RecentEvents.ps1 script to gather all events from all eventlogs to try to see what happened around that time. That is the proverbial "needle in the haystack" though. There will be a lot of events.

    https://docs.microsoft.com/en-us/answers/questions/102481/eventlog-madness.html
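    If the alert time from monitoring is known, a narrower query against a few likely logs can be a quicker first pass than collecting everything. The sketch below is only an illustration: `New-EventWindowFilter` is a hypothetical helper name, the window sizes are arbitrary defaults, and the alert time in the usage comment is a placeholder.

    ```powershell
    # Hypothetical helper: builds a Get-WinEvent filter covering a window around
    # the time the monitoring alert fired.
    function New-EventWindowFilter {
        param(
            [Parameter(Mandatory)][string]$LogName,
            [Parameter(Mandatory)][datetime]$AlertTime,
            [int]$MinutesBefore = 10,
            [int]$MinutesAfter = 2
        )
        @{
            LogName   = $LogName
            StartTime = $AlertTime.AddMinutes(-$MinutesBefore)
            EndTime   = $AlertTime.AddMinutes($MinutesAfter)
        }
    }

    # Usage (Windows only); the alert time below is a placeholder:
    # $logs = 'System',
    #         'Microsoft-Windows-TerminalServices-LocalSessionManager/Operational',
    #         'Microsoft-Windows-TerminalServices-RemoteConnectionManager/Operational'
    # foreach ($log in $logs) {
    #     $filter = New-EventWindowFilter -LogName $log -AlertTime '2023-10-18 13:55'
    #     Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue |
    #         Select-Object TimeCreated, Id, LogName, LevelDisplayName, Message
    # }
    ```

    Informational events are included on purpose, since a missing listener may not be logged as an error.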

    "Log in to the VM via vCenter Server console, and restart RDP service."

    Use PowerShell remoting to restart the services. If you can't connect, run "winrm quickconfig" on the target server. You will need to experiment with the script sequence to get it working correctly. Here is some code to start with.

    $TargetServer = "YourServerName"
    # I'm not sure about the RasMan service. It is not running on my WS2016
    #Status   Name               DisplayName
    #------   ----               -----------
    #Running  SessionEnv         Remote Desktop Configuration
    #Running  TermService        Remote Desktop Services
    #Running  UmRdpService       Remote Desktop Services UserMode Po...
    
    $sb = {
        # I don't know the correct sequence to recycle RDP, try this
        get-service UmRdpService  | stop-service -force 
        get-service TermService  | stop-service -force 
        get-service SessionEnv   | stop-service -force 
        get-service TermService  | start-service 
    }
    
    # If the account you are logged on with has admin access on the target server, you can remove the references to credentials. 
    $User = ".\testuser"
    $PWord = ConvertTo-SecureString -String "testuser" -AsPlainText -Force
    $Credential = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $User, $PWord
    Invoke-Command -ComputerName $TargetServer -ScriptBlock $sb -Credential $Credential 
     
    

  3. BD-6573 25 Reputation points
    2023-10-18T14:06:46.5166667+00:00

    Hi @MotoX80
    Thank you so much for this reply!

    Unfortunately, there are no errors in the logs! :( That's why investigating this problem is so awkward...

    I checked Windows Logs (Application/Security/Setup/System) and there are no errors or suspicious events at the time when we got the alert from our monitoring system (PRTG) that RDP service is malfunctioning (by malfunctioning I mean it's up and running in OS, but users can no longer connect via RDP to the affected system).

    I also checked Applications and Services Logs – Microsoft – Windows-TerminalServices-* and Applications and Services Logs – Microsoft – Windows – RemoteDesktopServices-* - the same, no errors, no clues whatsoever :(

    Nothing in Reliability Monitor either - no crashes, no problems reported.

    I ran sfc /scannow as well - no integrity violations.

    I just found one more symptom: when the problem is present, session ID 65536 shows as Down (screenshot attached).

    I forgot to mention that restarting RDP service has one negative side effect as well - it's disconnecting all existing users.

    That's why a permanent and complete fix for this problem is so much needed.

    Thanks!


  4. BD-6573 25 Reputation points
    2024-04-19T08:18:34.95+00:00

    @Lee Fimmel I have not observed this issue on our servers for some time. Not sure why - maybe MS finally released a patch for this problem?

    I also introduced one change - maybe it helped as well?

    After getting a notification from our monitoring system that the RDP service was not working as expected, I pushed the registry change via our endpoint management tool to the affected server:

    Key: SYSTEM\CurrentControlSet\Control\Terminal Server
    Name: fDenyTSConnections
    Type: REG_DWORD
    Value: 0

    This brought RDP back to life in 95% of cases.
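    For reference, the same registry change could be applied from an elevated PowerShell prompt roughly as follows. This is a sketch, not the actual task pushed by the endpoint management tool. It assumes the key sits under HKEY_LOCAL_MACHINE (the post gives only the path below the hive), and it prints the previous value, which helps show whether something had set it to 1.

    ```powershell
    # Assumption: the key lives under HKEY_LOCAL_MACHINE.
    $key  = 'HKLM:\SYSTEM\CurrentControlSet\Control\Terminal Server'
    $name = 'fDenyTSConnections'

    # Read the value before changing it so the prior state is on record.
    $before = (Get-ItemProperty -Path $key -Name $name).$name

    # 0 = allow Remote Desktop connections, 1 = deny them.
    Set-ItemProperty -Path $key -Name $name -Value 0 -Type DWord

    "{0} - {1} was {2}, now set to 0" -f (Get-Date), $name, $before
    ```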

    I had to run this task a few times for each server but over time I noticed fewer and fewer reports about this problem...

    The last time I had to run this was a month ago, so things definitely look better than when this issue started.


  5. MotoX80 32,736 Reputation points
    2024-04-19T13:09:45.23+00:00

    Is fDenyTSConnections being changed back to 1?

    I would think that if there is some inherent bug in the Windows RDP code, then a lot more users would be complaining about it. My gut feel is that there has to be something else that is causing this. Something on your networks, some security software, something that causes the listener to go away.

    I had updated my script but didn't post it. Was waiting to see if anyone replied to these questions. Here it is.

    Please understand that what it's trying to do is to just gather any relevant information so that we might know where to look next. You may still need to open a case with MS product support to get an actual fix if the root cause is something within the RDP codebase.

    Run this on a test server that has the problem and see what it captures. Log on to the console of the server and run it from an admin PS prompt.

    The script will produce 2 txt files and an eventlog file. One txt file is the log that the script itself writes. The other txt file is a summary of all events that were written to all event logs in the 10 minutes before the listener went down.

    Find the time of day when the listener goes away. Look to see if some IP tried to connect at that time. Look to see what other events occurred around that time.
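    Once the script's log file exists, the relevant lines can be pulled out with a small helper instead of scrolling through the qwinsta dumps. This is a rough sketch: `Get-ListenerDownContext` is a hypothetical name, and it assumes the "timestamp - message" line format and the "Connection change:" and "Listener is down" messages that the script writes.

    ```powershell
    # Hypothetical helper: returns the "Listener is down" log line together with
    # the last "Connection change" line that was logged before it.
    function Get-ListenerDownContext {
        param([string[]]$LogLines)
        $lastChange = $null
        foreach ($line in $LogLines) {
            if ($line -match ' - Connection change:') { $lastChange = $line }
            if ($line -match ' - Listener is down') {
                return [pscustomobject]@{
                    ListenerDown         = $line
                    LastConnectionChange = $lastChange
                }
            }
        }
        return $null   # the listener never went down in this log
    }

    # Usage:
    # Get-ListenerDownContext -LogLines (Get-Content 'c:\temp\RdpWatcher.txt')
    ```

    The timestamps on the two lines give the window to search in the events file.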

    I tested this on Win10 Pro and manually stopped the TS services to simulate an error.

    # RDPWatcher.ps1 version 2.0
    # Author Motox80 on the Microsoft Learn forums
    $logfile = "c:\temp\RdpWatcher.txt"            # adjust as needed 
    $etlfile = "c:\temp\RDP-Trace.etl"
    $evtxfile = "c:\temp\RDP-Trace.evtx"
    $eventsfile = "c:\temp\RDP-Events.txt"
    function LogIt($msg){
        $msg    
        "{0} - {1}" -f (get-date), $msg | Out-File $logfile -Append 
    }
    # Cleanup our trace files
    # Note: this closes ALL mmc consoles, including an Event Viewer left open by a previous run
    Get-Process mmc -ErrorAction SilentlyContinue | Stop-Process -Force
    Remove-Item $logfile -ErrorAction SilentlyContinue
    Remove-Item $etlfile -ErrorAction SilentlyContinue
    Remove-Item $evtxfile -ErrorAction SilentlyContinue
    Remove-Item $eventsfile -ErrorAction SilentlyContinue
    # See https://learn.microsoft.com/en-us/troubleshoot/windows-server/system-management-components/event-tracing-for-windows-simplified
    # Mode 2 is circular
    # logman create trace RDP-Trace -ow -o c:\temp\RDP-Trace.etl -p "Microsoft-Windows-TerminalServices-RemoteConnectionManager" 0xffffffffffffffff 0xff -nb 16 16 -bs 1024 -mode 0x2 -max 2048 --v
    #
    # Also try these providers
    # Microsoft-Windows-TerminalServices-RemoteConnectionManager
    # Microsoft-Windows-RemoteDesktopServices-RdpCoreTS
    # Microsoft-Windows-RemoteDesktopServices-SessionServices
    # Use "logman query providers" to look for others 
    logman create trace RDP-Trace -ow -o $etlfile -p "Microsoft-Windows-RemoteDesktopServices-RdpCoreTS" 0xffffffffffffffff 0xff -nb 16 16 -bs 1024 -mode 0x2 -max 2048 --v
    logman start RDP-Trace
    while ($true) {
        # Monitor the IP addresses connected to port 3389      
        $conn = netstat -aon | Select-String ':3389' | Select-String 'ESTABLISHED|CLOSE_WAIT'
        $connstat = ""
        foreach ($c in $conn) {
            $connstat += ($c -replace '\s+', ' ').split(' ')[3] + " "
        }
        if ($laststat -ne $connstat) {
            Logit "Connection change:  $connstat"
            $laststat = $connstat
            Logit ((qwinsta.exe) -join "`n")
        }
        $status = query.exe session 65536 2>&1
        if ((-join $status).Contains('Down') -or ((-join $status).Contains('No session exists')) ) {
            $curr = qwinsta.exe
            LogIt "Listener is down"  
            LogIt "------------Prior qwinsta output---------"
            LogIt (($last) -join "`n")
            LogIt "----------Current qwinsta output---------"
            LogIt (($curr) -join "`n")
         
            break                                # listener went away, break out of our loop
        }
        $last = qwinsta.exe
        start-sleep -seconds 5                   # how long to delay between checks 
    }
    # Our listener went away. Generate log files. 
    logman stop RDP-Trace
    logman delete RDP-Trace
    tracerpt $etlfile -o $evtxfile  -of EVTX -y
    eventvwr /l:$evtxfile
    notepad $logfile
    # Now read all events from all event logs for the last 10 minutes
    $tf = (10 * 60 * 1000) 
    $elna = (Get-WinEvent -ListLog * -EA silentlycontinue | where-object { $_.recordcount -gt 0})     # get all event log names that have records in them. 
    $AllEvents = @()              # prepare array so we can append to it
    foreach ($el in $elna)        # look at each event log
    {
        $xml = "<QueryList><Query Id=""0"" Path=""$($el.logname)"">
                <Select Path=""$($el.logname)"">*[System[TimeCreated[timediff(@SystemTime) &lt;= $tf ]]]</Select>
                </Query></QueryList>"		
        $AllEvents += Get-WinEvent -FilterXml $XML -ErrorAction SilentlyContinue  # append the events (if any)
    }
    $eldata = $AllEvents | sort-object -Descending -Property TimeCreated  | 
        Select-Object -property TimeCreated, ID, Logname,  LevelDisplayName, Message 
    Remove-Variable AllEvents                                                           # do a little memory cleanup 
    $eldata | Out-GridView -Title "Recent Events (last 10 minutes)" 
    $eldata | Out-File $eventsfile
    Remove-Variable eldata
    notepad $eventsfile