Hello all,
We have been having an extremely curious problem on one of our servers for months now (since purchasing and installing the system).
I hope someone can help with this, has tips for the problem, or can point me to the right thread for it.
About the issue: we have a Lenovo server running as a standalone Hyper-V host. The virtual machines on this host include a file server, a web server and an SCCM distribution point (DP) with a network share. The host itself runs Windows Server 2022 21H2, build 20348.1787.
The host has 2x 1G ports and 2x 10G ports. We have switched all VMs to the 10G NIC. This works fine until we install a client on the network via PXE from the SCCM DP mentioned above. Then the SCCM DP suddenly fails: it is not reachable via RDP, and we cannot connect to it directly through the hypervisor either. The failure happens as soon as a client reaches the "install updates" phase.
If you wait too long, all other VMs connected via the 10G interface gradually become unreachable as well, and the host then freezes completely and has to be restarted. We have been watching it closely since then: if you switch the DP's Ethernet interface to the 1G port in time, the error is "fixed", no other VMs on the host are affected, and clients can be installed normally.
The curious part is that the error does not occur on every client installation via PXE, only sporadically. So far, however, we have only ever observed it during this action and never during any other.
We have already had a long conversation with the server's manufacturer, and the motherboard was also replaced (the 10G card is onboard, not a separate card, just like the 1G ports). The latest firmware was installed and logs were sent back and forth. Lenovo's conclusion is that there is no hardware issue and that we should contact Microsoft. However, I can neither open a ticket nor get any help via my country's MS support number.
As part of the troubleshooting we also set up the SCCM DP as a completely new machine: no cloning and no reuse of an old VHDX, really from scratch. The problem still occurs. The only workaround so far is to run the DP over the 1G interface. The only network share on the DP holds the software, as an alternate SMB share so that we in IT can also load the software from there manually. So we don't expect horrendously high network loads. Likewise, we don't install 100 clients at the same time in the DP's network, but rather two to five devices at once, which is the most load it could see.
Following Lenovo's feedback, we also activated the PowerOptions in the UEFI and switched from Balance to the I/O setting. Unfortunately, this did not have any effect either.
For now, everything points to a bug in Windows, but we can't explain it.
Thank you in advance.
Kind regards