We are having an issue where several RDS hosts, across multiple collections, lose all ability to connect to anything network related. The only correlation we can find is that all of the affected hosts are in collections that have multiple session hosts and use FSLogix profile disks. I believe all of these used RDS profile disks at one time and were later converted to FSLogix. We have a mix of RDS servers, some cloned and some built individually.
When the issue presents itself, even users without FSLogix profiles (our support users and admins) fail to log in, with a "cannot connect to the domain" error. When we log in with the local administrator account, we find that we can ping all of the domain controllers and the domain itself, and we can ping google.com and Google DNS. But if we try to browse to \\domain in Explorer it fails, and if we open Edge or Chrome we get a connection timeout. Even loading a site by IP address in a browser results in a connection timeout.
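To put the symptom another way: ICMP (ping) succeeds while anything that needs a TCP connection times out. A minimal sketch of a check that separates the two from the affected host is below. It assumes Python is available on the server, and the host names and ports shown in the usage note are placeholders, not our real servers.

```python
import socket


def tcp_check(host, port, timeout=3.0):
    """Attempt one TCP connection and return (ok, detail)."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        # No answer within the timeout: matches the browser timeouts we see.
        return False, "timed out"
    except OSError as exc:
        # Covers refused connections, name resolution failures, and
        # local errors such as being unable to allocate a socket.
        return False, f"{type(exc).__name__}: {exc}"


def run_checks(targets, timeout=3.0):
    """Run tcp_check against each (host, port) pair and collect the results."""
    return {(host, port): tcp_check(host, port, timeout) for host, port in targets}
```

For example, `run_checks([("dc01.example.local", 445), ("8.8.8.8", 53)])` would try SMB to a (hypothetical) domain controller and DNS over TCP to Google DNS. If ping to the same addresses works while every entry comes back as failed, that confirms the problem is at the TCP level on the session host rather than in routing or name resolution. The `detail` string also shows whether the failure is a timeout or a local socket error, which is worth capturing before the VM is powered off.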
At this point, the only corrective action that gets the servers back to a state where users can log in is to turn the VM off and back on. We cannot gracefully reboot the server, because the users who are still logged on will not log off while their sessions cannot talk to their profile disks.
Our environment is a single deployment of RDS Gateway and Connection Broker with multiple collections. Everything runs as VMs on Hyper-V, on both 2016 and 2019 servers. All RDS hosts are on a 2019 server in one of two Hyper-V clusters. All settings on each Hyper-V host are the same, configured through VMM plus some manual PowerShell scripts for the things VMM won't set, so that all hosts are configured identically. The only difference at the host level is the hardware running each cluster, but we've had the issue on the same VMs no matter which cluster they are on.
We are looking for any ideas on a root cause so that we can stop this from occurring. I'm considering rebuilding each collection from a proper clone of the VM, with as many VMs as we need to keep the user load per host where we want it.