Hello,
For three of our 256 Linux machines we regularly receive the following alert:
"Cannot resolve hostname"
Error Description:
WSManFault
WS-Management cannot process the request. The operation failed because of an HTTP error. The HTTP error (12152) is: The server returned an invalid or unrecognized response.
Those errors clear after five minutes (the default monitoring interval). I've done all of the basic checks, and while the alert is active the hostnames still resolve fine from the management servers.
There is no real pattern in how often those alerts trigger. As a test, I added manual entries to the hosts file on the management servers, which made no difference.
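To rule out intermittent resolution problems, a check along these lines could be left running on a management server while waiting for the next alert. This is only a rough sketch: the function names are made up for illustration, and the host name is a placeholder, not one of our machines.

```python
import socket
import time


def time_resolution(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname once and return (succeeded, elapsed milliseconds)."""
    start = time.monotonic()
    try:
        resolver(hostname, None)
        ok = True
    except socket.gaierror:
        # Name resolution failed (the condition the alert text claims)
        ok = False
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return ok, elapsed_ms


def summarize(samples):
    """Condense a list of (ok, elapsed_ms) samples into a small report."""
    failures = sum(1 for ok, _ in samples if not ok)
    slowest = max((ms for _, ms in samples), default=0.0)
    return {"attempts": len(samples), "failures": failures, "slowest_ms": slowest}


if __name__ == "__main__":
    host = "linuxhost01.example.com"  # placeholder, replace with an affected machine
    samples = []
    for _ in range(10):
        samples.append(time_resolution(host))
        time.sleep(30)  # arbitrary spacing between lookups
    print(summarize(samples))
```

If this never reports a failure or a slow lookup during an alert, the "Cannot resolve hostname" text is probably misleading and the real problem sits in the HTTPS/WS-Man exchange.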
Here is some additional background information about our environment:
- SCOM 2022 with UR1 and KB5024286 patch (OpenSSL 3.0)
- 8 SCOM management servers, 2 dedicated to Linux, located in main datacenter
- globally distributed sites
- majority of Linux machines at the main datacenter, but also at various remote sites
So, there is one thing in common among the three affected machines:
they run either Ubuntu 22 or RHEL 9
Interestingly enough, we have one Ubuntu 22 in the main datacenter which does not behave like this.
So to me it looks like there is a timing issue that only affects machines using the most recent version of the OpenSSL library.
Ping response time to the affected machines is <300ms (60ms for one of them).
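To test the timing theory, the TCP connect and the TLS handshake to an agent could be timed separately from a management server. The sketch below is an assumption-laden illustration, not a supported diagnostic: it assumes the agent listens on port 1270 (the usual default for the SCOM Linux agent, adjust if yours differs), it turns off certificate verification because only timing matters here, and the function names and host name are hypothetical.

```python
import socket
import ssl
import time


def time_tls_handshake(host, port=1270, timeout=10.0):
    """Time TCP connect and TLS handshake separately; return the measurements."""
    context = ssl.create_default_context()
    # Verification is disabled for timing purposes only; do not reuse this
    # context for anything that carries credentials.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    t0 = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        t1 = time.monotonic()
        with context.wrap_socket(raw, server_hostname=host) as tls:
            t2 = time.monotonic()
            return {
                "connect_ms": (t1 - t0) * 1000.0,
                "handshake_ms": (t2 - t1) * 1000.0,
                "protocol": tls.version(),
                "cipher": tls.cipher()[0],
            }


def flag_slow(samples, threshold_ms):
    """Return only the samples whose handshake took longer than threshold_ms."""
    return [s for s in samples if s["handshake_ms"] > threshold_ms]


if __name__ == "__main__":
    host = "linuxhost01.example.com"  # placeholder, replace with an affected machine
    results = [time_tls_handshake(host) for _ in range(5)]
    for sample in flag_slow(results, threshold_ms=1000.0):  # threshold is arbitrary
        print(sample)
```

Comparing the numbers for an affected machine against the Ubuntu 22 machine in the main datacenter that behaves normally should show whether the handshake is noticeably slower over the longer links.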
Thanks
Thorsten