RDS 2019: users unable to log on or log off

Nagesh N 5 Reputation points
2023-06-19T06:37:33.99+00:00

Hello Everyone.

We have a load-balanced RDS 2019 farm with three session host servers serving around 250 users. Every so often one of the servers stops responding with the error "The task you are trying to do can't be completed because Remote Desktop Services is currently busy". Users whose sessions land on the affected server cannot log on, and users who are already connected cannot log off. This also affects admin console logons.

None of the services respond to stop or start requests, in particular Remote Desktop Services, Remote Desktop Configuration, and Windows Search. The only fix is to force a reboot of the server.

Common event IDs seen during these incidents include, but are not limited to:

Event ID 6005 - The winlogon notification subscriber <SessionEnv> is taking long time to handle the notification event (Logon).

Event ID 4005 - The Windows logon process has unexpectedly terminated.

Event ID: 20498 - Remote Desktop Services has taken too long to complete the client connection.

Event ID: 7011 - A timeout (30000 milliseconds) was reached while waiting for a transaction response from the UmRdpService service.

"DeleteUserAppContainersOnLogoff" registry fix is already applied. Could you please share any workaround or solution that you are aware of and recommend any tools that could be used to find the root cause of the issue?
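
One rough way to start on the root cause is to check which of these events fires first in each incident. The sketch below is only an illustration: it assumes the events have already been exported into (timestamp, event ID) pairs, and the function names, the 5-minute gap, and the sample timestamps are all made up for the example.

```python
from datetime import datetime, timedelta

# Event IDs quoted in the question above.
WATCHED_IDS = {6005, 4005, 20498, 7011}


def group_incidents(events, max_gap=timedelta(minutes=5)):
    """Group watched events into incidents separated by more than max_gap.

    events: iterable of (timestamp, event_id) tuples in any order.
    Returns a list of incidents, each a time-ordered list of tuples.
    """
    watched = sorted(e for e in events if e[1] in WATCHED_IDS)
    incidents = []
    for event in watched:
        # Extend the current incident if this event is close to the last one.
        if incidents and event[0] - incidents[-1][-1][0] <= max_gap:
            incidents[-1].append(event)
        else:
            incidents.append([event])
    return incidents


def first_event_ids(incidents):
    """Return the event ID that opened each incident."""
    return [incident[0][1] for incident in incidents]


# Hypothetical sample data, not taken from a real server.
sample = [
    (datetime(2023, 6, 19, 9, 0, 0), 6005),
    (datetime(2023, 6, 19, 9, 1, 30), 7011),
    (datetime(2023, 6, 19, 9, 3, 0), 4005),
    (datetime(2023, 6, 19, 9, 2, 0), 1000),  # unrelated ID, ignored
    (datetime(2023, 6, 19, 14, 0, 0), 20498),
]
incidents = group_incidents(sample)
print(len(incidents), first_event_ids(incidents))
```

If the same event ID opens most incidents, the component that logs it is a reasonable first place to look.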

Windows Server 2019

5 answers

  1. Limitless Technology 43,951 Reputation points
    2023-06-20T16:29:40.82+00:00

    Hello Nagesh,

    Thank you for reaching out with your question today.

    The issue you're experiencing with your Remote Desktop Services (RDS) Farm can be challenging to diagnose and resolve. Here are some recommendations and tools that can help you troubleshoot the problem and find the root cause:

    1. Performance Monitoring: Monitor the performance of your RDS servers using built-in tools like Performance Monitor (perfmon) or third-party monitoring solutions. Look for any spikes or abnormalities in CPU, memory, disk usage, or network activity that could be causing the unresponsiveness.
    2. Event Log Analysis: Review the event logs on the affected RDS servers for any error messages or warnings related to Remote Desktop Services, Windows logon process, or other relevant services. Look for recurring patterns or specific errors that coincide with the server becoming unresponsive.
    3. Resource Utilization: Check the resource utilization on the RDS servers during the periods of unresponsiveness. Ensure that the servers have sufficient CPU, memory, and disk resources to handle the load. Monitor resource usage over time to identify any bottlenecks or resource exhaustion.
    4. Network Analysis: Examine the network infrastructure and configuration. Ensure that there are no network connectivity issues, packet drops, or high latency that could affect the performance and stability of the RDS Farm. Network analysis tools like Wireshark can help identify any network-related issues.
    5. RDS Diagnostics Tool: Microsoft provides the Remote Desktop Services Diagnostic Tool (RDSdiag) that can assist in troubleshooting RDS-related issues. It collects diagnostic information and generates a report that can help identify potential problems. You can download RDSdiag from the Microsoft website.
    6. Performance and Reliability Monitor: Utilize the Performance Monitor (perfmon) and Reliability Monitor to gather data on system performance and reliability. These tools can help identify any specific processes or services that may be causing the system to become unresponsive.
    7. Microsoft Sysinternals Tools: Microsoft Sysinternals suite includes various diagnostic and troubleshooting tools that can help identify system issues. Tools like Process Monitor, Process Explorer, and TCPView can provide insights into running processes, resource usage, and network connections.
    8. Load Testing: Consider performing load testing on your RDS Farm to simulate the user load and identify potential bottlenecks. Load testing tools like Apache JMeter or Microsoft's Load Testing Tool can help simulate multiple user sessions and assess the performance and stability of your RDS environment.
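
    For the performance monitoring points above, a minimal sketch of how sustained spikes could be picked out of exported counter samples. It assumes the samples are already loaded as a plain list of numbers at a fixed interval; the function name, the threshold of 90, and the sample values are hypothetical.

```python
def sustained_runs(samples, threshold, min_length):
    """Return (start_index, end_index) pairs for runs of at least
    min_length consecutive samples at or above threshold."""
    runs = []
    start = None
    for i, value in enumerate(samples):
        if value >= threshold:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    # Close a run that lasts until the final sample.
    if start is not None and len(samples) - start >= min_length:
        runs.append((start, len(samples) - 1))
    return runs


# Hypothetical CPU percentages, one sample per interval.
cpu = [35, 40, 95, 38, 42, 97, 98, 99, 96, 41, 92, 93, 94]
print(sustained_runs(cpu, threshold=90, min_length=3))
# prints [(5, 8), (10, 12)]
```

    Requiring several consecutive samples filters out one-off peaks, such as the single 95 in the sample, so only prolonged saturation is reported.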

    Remember to thoroughly review the documentation and use caution when making changes to your production environment. It's always recommended to test changes in a non-production environment before applying them to your live RDS Farm.

    I used AI provided by ChatGPT to formulate part of this response. I have verified that the information is accurate before sharing it with you.

    If the reply was helpful, please don’t forget to upvote or accept as answer.

    Best regards.


  2. Nagesh N 5 Reputation points
    2023-06-21T04:32:21.0166667+00:00

    Thank you for responding.

    We have verified the resource utilization across the RDS farm servers and found that usage is below 60%, so the servers have sufficient resources allocated.

    Below are the events that are logged during this event.

    Event ID 6005 - The winlogon notification subscriber <SessionEnv> is taking long time to handle the notification event (Logon).

    Event ID 4005 - The Windows logon process has unexpectedly terminated.

    Event ID: 20498 - Remote Desktop Services has taken too long to complete the client connection.

    Event ID: 7011 - A timeout (30000 milliseconds) was reached while waiting for a transaction response from the UmRdpService service.

    I wonder whether some limit within the OS, such as registry space or a services buffer, could be the cause of the issue. We noticed that per-user services are created for each user logon. Once the number of user sessions on a server exceeded 120, services.msc would fail to open with "Error 1783: The stub received bad data", and I couldn't open the RDMS properties. To limit the number of sessions per server, we added a third session host to the load balancing, which resolved that issue.

    How can we get the information below from Reliability Monitor?
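
    To see how the per-user services grow with the session count, something like the sketch below could be run over a list of service names exported from a host. It relies on per-user service instances being named `<template>_<LUID>` with a hexadecimal suffix; the minimum suffix length of 4, the function name, and the sample names are assumptions for illustration only.

```python
import re
from collections import Counter

# Heuristic: per-user service instances are named <template>_<LUID>,
# where the LUID suffix is a hexadecimal string. The minimum length
# of 4 is an assumption to skip names that merely end in "_64" or similar.
PER_USER = re.compile(r"^(?P<template>.+)_(?P<luid>[0-9a-fA-F]{4,})$")


def per_user_instances(service_names):
    """Count per-user service instances per LUID suffix."""
    counts = Counter()
    for name in service_names:
        match = PER_USER.match(name)
        if match:
            counts[match.group("luid").lower()] += 1
    return counts


# Hypothetical service names; the LUID suffixes are invented.
names = [
    "TermService", "SessionEnv", "UmRdpService",
    "CDPUserSvc_1a2b3c", "WpnUserService_1a2b3c",
    "CDPUserSvc_4d5e6f", "WpnUserService_4d5e6f",
    "clr_optimization_v4.0.30319_64",
]
counts = per_user_instances(names)
print(len(counts), sum(counts.values()))
# prints 2 4
```

    The first number approximates the sessions that have per-user services and the second the total instances, so the total grows roughly as sessions multiplied by the number of per-user service templates.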

    1. Performance and Reliability Monitor: Utilize the Performance Monitor (perfmon) and Reliability Monitor to gather data on system performance and reliability. These tools can help identify any specific processes or services that may be causing the system to become unresponsive.

  3. T Y 0 Reputation points
    2023-08-07T14:17:48.49+00:00

    Hi. We are experiencing this issue with AVD Windows 11 Enterprise multi-session hosts.

    The environment is 8 Azure Virtual Desktop session hosts running Windows 11 Enterprise multi-session, with an updated image and FSLogix.

    Issue: Suddenly and at random, a server starts peaking at 100% CPU and the host stops handling new sessions. Users who are already connected experience very bad performance. The server responds to ping and RDP but won't log users in or out. When trying to log in, it gets stuck on "Local session manager" and then disconnects.

    After a while the host shows as "unavailable" in the Azure portal, but users still seem to be sent to that server.

    The only way to fix it is to restart the host. After the restart we can access the host, but there is nothing specific in the event log other than what is described in this post.

    If anyone has any idea what to do, it is greatly appreciated!

    Thanks!


  4. T Y 0 Reputation points
    2023-08-07T14:18:00.5366667+00:00

    Double post


  5. Billie Gadhof 0 Reputation points
    2024-01-02T14:31:10.06+00:00

    We're experiencing the same issue with Windows Server 2019 on a Hyper-V cluster. After the "The Windows logon process has unexpectedly terminated." error, users are unable to log in until the server is restarted. We have six terminal servers with 60 users. How did you further analyze or resolve this problem?
