Azure UNIX server SCOM agent setup errors with OEL v7.x
Ran into some customers with UNIX agent problems, including Azure Oracle Enterprise Linux servers with SCOM agents.
Basically, this error means one of the following:
- Fully-qualified domain name cannot be determined from the UNIX or Linux host itself
- The FQDN known to the UNIX/Linux host does not match the FQDN used by the management server to reach the host
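A quick way to test both conditions on the agent is to compare what the host reports as its FQDN with the name the management server connects to. The following is a minimal sketch, not part of the SCOM tooling: the function names are made up for illustration, and it assumes `hostname -f` is available on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with the SCOM agent).

# Normalize a DNS name: lowercase it and drop any trailing dot.
normalize_name() {
    local name
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf '%s\n' "${name%.}"
}

# Succeeds (exit 0) when the two names are equal after normalization.
fqdn_matches() {
    [ "$(normalize_name "$1")" = "$(normalize_name "$2")" ]
}

# Compare the FQDN the host reports with the name the management server uses.
# Returns 0 on a match, 1 on a mismatch, 2 if the host cannot determine its FQDN.
check_fqdn() {
    local expected=$1 actual
    actual=$(hostname -f 2>/dev/null) || actual=""
    if [ -z "$actual" ]; then
        echo "FQDN cannot be determined on this host"
        return 2
    fi
    if fqdn_matches "$actual" "$expected"; then
        echo "OK: $actual"
    else
        echo "MISMATCH: host reports '$actual', management server uses '$expected'"
        return 1
    fi
}
```

Usage on the agent, with the name taken from the error message: check_fqdn agentname.contoso.net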
Full error message text
Agent verification failed. Error detail: The server certificate on the destination computer (agentname.contoso.net:1270) has the following errors:
The SSL certificate could not be checked for revocation. The server used to check for revocation might be unreachable.
The SSL certificate is signed by an unknown certificate authority.
It is possible that:
- The destination certificate is signed by another certificate authority not trusted by the management server.
- The destination has an invalid certificate, e.g., its common name (CN) does not match the fully qualified domain name (FQDN) used for the connection. The FQDN used for the connection is: agentname.contoso.net.
- The servers in the resource pool have not been configured to trust certificates signed by other servers in the pool.
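To see which names the agent certificate actually carries, the certificate can be inspected with openssl on the UNIX/Linux host. This is a sketch under assumptions: the certificate path /etc/opt/microsoft/scx/ssl/scx.pem is assumed to be where the agent keeps its certificate (adjust if yours differs), and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed location of the agent certificate; override SCX_CERT if it lives elsewhere.
SCX_CERT=${SCX_CERT:-/etc/opt/microsoft/scx/ssl/scx.pem}

# Print the subject, issuer and validity dates of the agent certificate.
show_agent_cert() {
    openssl x509 -noout -subject -issuer -dates -in "$SCX_CERT"
}

# Read an openssl "subject=" line on stdin and print each CN value on its own line.
# Handles both the "/CN=name" and the "CN = name" output styles.
subject_cns() {
    grep -oE 'CN ?= ?[^,/]+' | sed -E 's/^CN ?= ?//'
}

# Succeed when one of the certificate CNs equals the given FQDN (case-insensitive).
cert_has_cn() {
    local fqdn=$1
    openssl x509 -noout -subject -in "$SCX_CERT" | subject_cns | grep -qixF -- "$fqdn"
}
```

For example, cert_has_cn agentname.contoso.net && echo "CN matches" confirms whether the CN lines up with the FQDN quoted in the error.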
Troubleshooting links
Old TechNet article for SCOM 2007R2
Docs site link for SCOM 1801 - the steps haven't changed, and IMHO the docs site is better documented
Here are some commands to help troubleshoot the UNIX agent:
ScxAdmin
Check UNIX Agent status
scxadmin -status
Example Output
$ scxadmin -status
scxcimserver: is running
scxcimprovagt: 2 instances running
Set Unix agent to START verbose logging
scxadmin -log-set all verbose
Restart the agent & tail the scx log
scxadmin -restart
cd /var/opt/microsoft/scx/log
tail -f scx.log
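Verbose logging is noisy, so it helps to filter the log and to turn logging back down afterwards. A small sketch, with assumptions flagged: the helper names are hypothetical, the filter simply assumes that problem lines contain the word "warning" or "error", and the reset relies on the -log-reset and -log-list options listed in the scxadmin usage text.

```shell
#!/usr/bin/env bash
# Paths from the commands above; override the variables if your install differs.
SCX_LOG=${SCX_LOG:-/var/opt/microsoft/scx/log/scx.log}
SCXADMIN=${SCXADMIN:-/opt/microsoft/scx/bin/tools/scxadmin}

# Print the last N (default 20) log lines that mention a warning or an error.
# Assumption: severity shows up as plain text in each log line.
recent_problems() {
    local count=${1:-20}
    grep -iE 'warning|error' "$SCX_LOG" | tail -n "$count"
}

# Return logging to its default level once troubleshooting is finished,
# then list the resulting log configuration.
reset_logging() {
    "$SCXADMIN" -log-reset all
    "$SCXADMIN" -log-list
}
```

Run recent_problems 50 after reproducing the failure, then reset_logging so the agent does not stay in verbose mode.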
To correct a SCOM agent getting an SSL certificate error:
From the Docs site, the SCXsslConfig "tool is useful in correcting issues in which the fully-qualified domain name cannot be determined from the UNIX or Linux host itself, or the FQDN known to the UNIX/Linux host does not match the FQDN used by the management server to reach the host."
As root:
1. Get the exact hostname of the server with the hostname command
2. Stop the SCOM agent - /opt/microsoft/scx/bin/tools/scxadmin -stop
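3. Rebuild the cert - /opt/microsoft/scx/bin/tools/scxsslconfig -v -f -h <hostname> -d <domain.name> (-h is the short host name, -d is the DNS domain; together they must form the FQDN the management server uses)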
4. Start the SCOM agent - /opt/microsoft/scx/bin/tools/scxadmin -start
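The steps above can be wrapped in a small script. This is only a sketch: the function names and the DRY_RUN switch are made up for illustration, and it assumes scxsslconfig takes the short host name with -h and the DNS domain with -d, so the FQDN is split at the first dot. Run it as root.

```shell
#!/usr/bin/env bash
# Tool directory from the steps above; override SCX_TOOLS if your install differs.
SCX_TOOLS=${SCX_TOOLS:-/opt/microsoft/scx/bin/tools}

# Short host name: everything before the first dot of the FQDN.
fqdn_host() { printf '%s\n' "${1%%.*}"; }

# DNS domain: everything after the first dot of the FQDN.
fqdn_domain() { printf '%s\n' "${1#*.}"; }

# Run a command, or only print it when DRY_RUN=1 (hypothetical safety switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop the agent, regenerate the certificate for the given FQDN, start the agent.
rebuild_agent_cert() {
    local fqdn=$1
    case $fqdn in
        *.*) ;;
        *) echo "expected a fully qualified name, got '$fqdn'" >&2; return 1 ;;
    esac
    run "$SCX_TOOLS/scxadmin" -stop &&
    run "$SCX_TOOLS/scxsslconfig" -v -f -h "$(fqdn_host "$fqdn")" -d "$(fqdn_domain "$fqdn")" &&
    run "$SCX_TOOLS/scxadmin" -start
}
```

Try DRY_RUN=1 first to see the exact commands, then run rebuild_agent_cert agentname.contoso.net for real.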
Additional Configuration topics from the docs site
Configuring SSL Ciphers link
Specifying an alternate Temporary Path for scripts link
Universal Linux - Operating System Name/Version link
Other document links
Holman SCOM 2012R2 Deploying Unix agents
Holman SCOM 2016 Monitor Unix/Linux
Adding agents via PowerShell