!!! UPDATE !!!
I have been digging into this a little deeper, and this is what I have come up with so far. I hope someone can suggest a way forward.
From a fleet of approximately 300 servers, only 12 servers belonging to a specific production application stack are not multihoming. If other servers in the same network are multihoming, then this is not a firewall or port issue. I started comparing the configuration of the working servers against the faulting servers.
What I found tells me it is a DNS issue. The working servers are configured with the environment's standard Preferred DNS and Alternate DNS IPs,
whereas the faulting servers are configured with only one Preferred IP, and this IP is not one of the standard Preferred or Alternate IPs in this environment.
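For anyone wanting to repeat the comparison across a larger fleet, here is a rough Python sketch of the check described above. It assumes the DNS settings of each server have already been collected into a dictionary by some other means; the server names, the IPs, and the `find_nonstandard` helper are all hypothetical placeholders, not values from this environment.

```python
# Placeholder for the environment's standard Preferred/Alternate DNS IPs.
# These addresses are illustrative only.
STANDARD_DNS = {"10.0.0.10", "10.0.0.11"}


def find_nonstandard(inventory, standard=STANDARD_DNS):
    """Return {server: [unexpected DNS IPs]} for servers using non-standard DNS.

    `inventory` maps a server name to the list of DNS server IPs
    configured on its network adapter.
    """
    result = {}
    for server, dns_servers in inventory.items():
        unexpected = set(dns_servers) - set(standard)
        if unexpected:
            result[server] = sorted(unexpected)
    return result


if __name__ == "__main__":
    # Hypothetical inventory: one standard server, one using a single odd IP.
    inventory = {
        "APP01": ["10.0.0.10", "10.0.0.11"],
        "APP02": ["10.9.9.9"],
    }
    for server, ips in find_nonstandard(inventory).items():
        print(f"{server}: non-standard DNS {ips}")
```

Any server that shows up in the result is pointing at a DNS server outside the standard pair, which is the pattern the 12 faulting servers share.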
There was no DNS entry for this IP, but it was pingable and resolved to a DNS name... more confusing.
I had to use this single IP to RDP into the server.
I found out that this server is not connected to the domain and is configured as a standalone DNS server for this environment.
I asked the team, and only one person responded. He had also come across this server while troubleshooting another issue in the environment, and says it was set up by a contractor who was brought in to do some security and redundancy work for this application stack.
But again, I haven't seen any documentation.
Anyway, the DNS records don't seem to have been updated for more than 4 years. Do I update them or not? I asked the Application Support SME, and he wasn't sure about this setup either, as he came in after the fact. I asked if I could instead update the faulting servers to our standard Preferred and Alternate DNS IPs. Again, no one wants to make that decision, as they don't know what the impact will be or why this setup was put in place.
I checked the Conditional Forwarder for our domain (which has the primary SCOM Management Server) and that was out of date as well. So I have updated the Conditional Forwarder with the latest DNS records.
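One way to confirm whether the standalone DNS server can now resolve the Gateway name is to send the query straight to it, bypassing whatever the local resolver does. Below is a minimal sketch using only the Python standard library. It builds a plain A-record query in the RFC 1035 wire format and reads the response code from the reply header. The function names and the IP in the usage comment are my own placeholders, and the sketch does not parse the answer records themselves.

```python
import socket
import struct


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (RFC 1035 layout)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def parse_reply(reply):
    """Return (rcode, answer_count) from the 12-byte header of a DNS reply."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    return flags & 0x000F, ancount


def ask_server(hostname, dns_server_ip, timeout=3.0):
    """Send one A-record query over UDP to a specific DNS server."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (dns_server_ip, 53))
        reply, _ = sock.recvfrom(512)
    return parse_reply(reply)


# Example usage (the DNS server IP is a placeholder):
# rcode, answers = ask_server("GatewayServer.contoso.com", "192.0.2.53")
# rcode 0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN, 5 = REFUSED
```

If the standalone server returns NXDOMAIN or SERVFAIL for the Gateway name while the standard DNS servers return an answer, that would point at the standalone server's zone data or forwarding rather than at the agents.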
I restarted the agents on the faulting servers and waited, hoping that they might come up in the Pending Management view in SCOM.
They haven't.
The OpsMgr event log on the agent (faulting server) still shows:
Event ID:21006 The OpsMgr Connector could not connect to GatewayServer.contoso.com:5723. The error code is 11001L(No such host is known.). Please verify there is network connectivity, the server is running and has registered it's listening port, and there are no firewalls blocking traffic to the destination.
Again - this doesn't make sense to me as both the Gateway and the faulting servers are in the same network/domain.
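Error 11001 is the Winsock "host not found" code, so the agent is failing at name resolution before it ever attempts the TCP connection to port 5723. To separate the two stages when testing from a faulting server, something like the following sketch could be used. It is only an illustration: `check_endpoint` is a hypothetical helper, and it relies on the operating system's resolver, so it sees the same DNS settings the agent does.

```python
import socket


def check_endpoint(host, port, resolve=socket.getaddrinfo,
                   connect=socket.create_connection, timeout=5.0):
    """Classify a connection attempt as a DNS failure, a TCP failure, or OK.

    `resolve` and `connect` default to the real socket functions and are
    parameters only so the logic can be exercised without a network.
    """
    try:
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # On Windows this is where error 11001 (host not found) surfaces.
        return f"DNS failure: {exc}"
    address = infos[0][4][0]  # first resolved IP address
    try:
        with connect((address, port), timeout=timeout):
            return f"OK: {host} -> {address}:{port}"
    except OSError as exc:
        return f"TCP failure to {address}:{port}: {exc}"


# Example usage from a faulting server:
# print(check_endpoint("GatewayServer.contoso.com", 5723))
```

A "DNS failure" result matches what Event ID 21006 reports and keeps the focus on the DNS server the faulting servers point to. A "TCP failure" result would instead suggest the name resolves but the port is unreachable.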
Does anyone have any ideas on what I can check in this scenario?