Troubleshooting Cross-Platform Discovery and Agent Installation (Part 1)
Troubleshooting discovery and agent installation for the cross-platform extensions can be challenging, mainly because the UI isn't always intuitive or directly helpful in identifying the root cause of an issue. You often have to use logs or debugging tools to follow what's happening behind the UI, and at the same time know what is supposed to happen, so that you can figure out where the process is breaking. It's even more challenging when you're working with a custom Management Pack (like the one I created for CentOS), because there are more "untested" variables. To show how some of these issues occur, and why some of them are more common with a custom MP, I'm going to use my CentOS 5.5 computer in the discovery process.
I originally planned to write this as one article divided into sections, but once the word count passed 2,500, I decided it would be better to break it into smaller pieces. Here are links to all the parts:
(this article) Scenario #1 - "There were no computers that met the specified discovery criteria"
(part 2) Scenario #2 - "SSH discovery failed." with "unspecified problem"
(part 3) Scenario #3 - "Did not find a matching supported agent"
(part 4) Scenario #4 - "New computer shows as Platform: Unknown and Version: Unknown"
Tools and Logs Mentioned
I refer to a number of tools and log files throughout this series, so for convenience, here's a summary:
- DebugView is one of my favorite tools for troubleshooting installation of UNIX and Linux agents, because it shows information that doesn't appear in the module logging, and it's quicker than enabling and viewing the full OpsMgr trace logging. I can start DebugView, begin a discovery, and watch the debugging output appear in the DebugView window in real time. In this series, I discuss a few scenarios and how DebugView can help you resolve them.
- Module Debug Logging is a set of log files that are generated when you enable them using the process described here.
- OpsMgr Trace Logging is another set of log files generated when you enable trace logging using the process described here.
- SCX Agent logs and CIM logs are the log files on the UNIX or Linux computer; you control them using the process described here.
- WinSCP and PuTTY are Windows tools for working with UNIX and Linux computers remotely: WinSCP transfers files, and PuTTY provides remote login.
Discovery Process Flow
Before looking at the scenarios, it helps to understand the agent discovery and installation process. Here's a diagram of it:
Scenario #1 - "There were no computers that met the specified discovery criteria"
When you run a discovery and see the following message in the discovery results, the cause can be hard to pin down.
In this example, I entered an IP address (not a DNS name) in the discovery criteria, and I knew the root account credentials were correct, so I wondered what the problem could be. Maybe DNS reverse lookup? I checked my name resolution settings (my HOSTS file), and my Linux computer was listed there, so that shouldn't be the problem. Looking at the DebugView output, I see this:
[11160] initializeWorkers
[11160] Initializing _workerSupportedUnixAgents
[11160] Initializing _workerAvailableUnixAgents
[11160] Begin Supported Agent Info Query
[11160] Begin Available Unix Agents Query
[11160] Microsoft.MOM.UI.Console.exe Information: 0 :
[11160] Supported agents result: {### lots of XML here ###}
[11160] Microsoft.MOM.UI.Console.exe Information: 0 :
[11160] Available agents result:
[11160] Queries completed
[11160] Microsoft.MOM.UI.Console.exe Error: 0 :
[11160] One of the queries returned a null value
Comparing this output to the flowchart of discovery steps above, the query for the list of supported agents worked, but the query for the available agent files returned nothing. That query is supposed to return an XML-formatted list of the files in the agent directory, so an empty result means something is going wrong when the script that builds that list runs.
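If you save the DebugView output to a text file, a short script can make the same comparison for you. The sketch below is my own illustration, not part of OpsMgr or DebugView: it assumes only the line format shown above (a bracketed process ID, then a "Supported agents result:" or "Available agents result:" marker) and that a non-empty result starts on the same line as its marker, as it does in these captures. It flags any query whose result line is empty.

```python
import re
from typing import Dict, List

# DebugView prefixes each captured line with the process ID, e.g. "[11160] ".
PID_PREFIX = re.compile(r"^\[\d+\]\s?")

# Result markers exactly as they appear in the DebugView output in this article.
QUERIES = {
    "supported": "Supported agents result:",
    "available": "Available agents result:",
}


def query_results(debugview_text: str) -> Dict[str, str]:
    """Map each query name to the text on its result line ('' if empty)."""
    results: Dict[str, str] = {}
    for raw in debugview_text.splitlines():
        line = PID_PREFIX.sub("", raw).strip()
        for name, marker in QUERIES.items():
            if line.startswith(marker):
                # Only the marker line is captured; XML continuing on later
                # lines is ignored, which is enough to spot an empty result.
                results[name] = line[len(marker):].strip()
    return results


def empty_queries(debugview_text: str) -> List[str]:
    """Return the queries whose result was empty or never logged."""
    results = query_results(debugview_text)
    return [name for name in QUERIES if not results.get(name)]


if __name__ == "__main__":
    capture = """[11160] Begin Supported Agent Info Query
[11160] Begin Available Unix Agents Query
[11160] Supported agents result: {### lots of XML here ###}
[11160] Available agents result:
[11160] Queries completed
[11160] One of the queries returned a null value"""
    print(empty_queries(capture))  # ['available']
```

Run against the first capture, it points at the available-agents query, which matches the conclusion above.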
I know that the agent files are located in the following directory:
C:\Program Files\System Center Operations Manager 2007\AgentManagement\UnixAgents
When I browse to that directory, I see a number of agent files. So if I can browse the directory and see the files, but OpsMgr can't get a list of them, what's going on? OpsMgr runs under a configured set of user credentials, not just those of the logged-on user, so maybe the problem is there. Then I remembered: I had changed the administrator password the other day, but I'd forgotten to update it in OpsMgr.
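To check what a particular account can actually see in that directory, you could run a small listing script as that account (for example, with the Windows runas command). The sketch below is hypothetical and is not the script OpsMgr uses: it lists the files and wraps them in XML, loosely mirroring the kind of XML file list the available-agents query should return, but the element names (AgentFiles, File) are my own invention, not OpsMgr's format.

```python
import os
import xml.etree.ElementTree as ET


def list_agent_files_xml(directory: str) -> str:
    """Return an XML listing of the files (not subfolders) in `directory`.

    The element names are hypothetical, NOT the format OpsMgr uses.
    Raises OSError (e.g. PermissionError) if the running account can't list it.
    """
    root = ET.Element("AgentFiles", Directory=directory)
    for name in sorted(os.listdir(directory)):
        if os.path.isfile(os.path.join(directory, name)):
            ET.SubElement(root, "File", Name=name)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    agent_dir = (r"C:\Program Files\System Center Operations Manager 2007"
                 r"\AgentManagement\UnixAgents")
    try:
        print(list_agent_files_xml(agent_dir))
    except OSError as err:
        # If this fails for one account but works for you, suspect that
        # account's credentials or permissions, not missing agent files.
        print(f"Could not list {agent_dir}: {err}")
```

An empty or failed listing under one account, next to a full listing under yours, is the same symptom seen in the DebugView output above.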
I go to the Run As Configuration settings, update the password for the administrator account, and rerun the discovery. It still doesn't fully succeed, but it gets past the point where it failed before. The DebugView output now shows this:
[6556] initializeWorkers
[6556] Initializing _workerSupportedUnixAgents
[6556] Initializing _workerAvailableUnixAgents
[6556] Begin Supported Agent Info Query
[6556] Begin Available Unix Agents Query
[6556] Microsoft.MOM.UI.Console.exe Information: 0 :
[6556] Supported agents result: <SupportedAgents>
[6556] aris.10.Computer" KitOSVersion="10" TaskVersion="10" DisplayName="Solaris 10 (x86)" />
[6556] Microsoft.MOM.UI.Console.exe Information: 0 :
[6556] Available agents result: <SupportedAgents>
[6556] Queries completed
[6556] Call Submit Discovery Task
... (more after that)
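Compared with the first capture, the new line is "Call Submit Discovery Task": the process now gets past the queries and moves on to submitting the discovery task. If you're comparing several captures, the hypothetical sketch below reports the furthest progress marker each one reached. The marker list is taken only from the two DebugView captures in this article, so treat it as incomplete; a full discovery logs more steps after these.

```python
import re
from typing import Optional

# DebugView prefixes each captured line with the process ID, e.g. "[6556] ".
PID_PREFIX = re.compile(r"^\[\d+\]\s?")

# Progress markers in the order they appear in the captures in this article.
# This list is an assumption drawn from those captures, not a complete list.
STAGES = [
    "initializeWorkers",
    "Begin Supported Agent Info Query",
    "Begin Available Unix Agents Query",
    "Queries completed",
    "Call Submit Discovery Task",
]


def last_stage_reached(debugview_text: str) -> Optional[str]:
    """Return the latest STAGES marker present in the capture, or None."""
    seen = set()
    for raw in debugview_text.splitlines():
        line = PID_PREFIX.sub("", raw).strip()
        if line in STAGES:
            seen.add(line)
    reached = [stage for stage in STAGES if stage in seen]
    return reached[-1] if reached else None


if __name__ == "__main__":
    first_run = "[11160] initializeWorkers\n[11160] Queries completed\n"
    second_run = ("[6556] initializeWorkers\n[6556] Queries completed\n"
                  "[6556] Call Submit Discovery Task\n")
    print(last_stage_reached(first_run))   # Queries completed
    print(last_stage_reached(second_run))  # Call Submit Discovery Task
```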
With that fixed, the discovery details in the UI show this:
More on this in part 2!