Undiscovering with Multiple Discoveries
I am going to include this issue in a video in the Discovery series I’ve been putting out (Script Discoveries is in the works right now), and we’ll probably get some more detail on this in the Authoring Guide. I wanted to write up a quick blog entry to get the information out there quickly, though, and also give some deep information on the inner workings of discovery. This is actually an issue that’s been nagging at me for a bit, and I just got final confirmation on exactly how it works from our developers. I’m going to give quite a bit of detail here, so if you just want the high-level summary answer, go ahead and scroll to the bottom.
The issue is when you have multiple discoveries for a particular class and want to undiscover an object. Undiscover simply means that the object no longer exists and should be removed. SCOM assumes that an object should be undiscovered when the discovery that first discovered it returns data that doesn’t include the object. For example, you might have a script discovery that discovers databases. When that discovery runs, it finds 10 databases, and SCOM creates an object for each. You then delete one of the databases. The next time that the discovery runs, SCOM receives the discovery data with 9 databases in it, figures out which one is missing, and deletes the object for that database (or “undiscovers” it).
If you just have a single discovery for that class, then that’s exactly how the undiscovery works. What if you had two separate discoveries for the class though? The basic description that we tend to give is that each discovery must run and undiscover that object before it’s actually removed. To understand why, you need to know that SCOM creates a RefCount on an object when it’s discovered. If another discovery discovers the same object, it will increment the RefCount by one. When a discovery undiscovers an object, it decrements the RefCount by one. Only when the RefCount hits zero is the object actually removed.
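To make the counting concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not SCOM code: the ToyInventory class, its methods, and the discovery names are all made up for illustration, and it assumes the RefCount is simply the number of distinct discoveries currently reporting the object.

```python
class ToyInventory:
    """Toy model of reference-counted discovery (hypothetical, not SCOM code)."""

    def __init__(self):
        # object name -> set of discovery names currently reporting it
        self._sources = {}

    def submit(self, discovery, found):
        """Apply one discovery run; `found` is everything that run reported."""
        found = set(found)
        # Each discovery that reports an object contributes one reference.
        for obj in found:
            self._sources.setdefault(obj, set()).add(discovery)
        # Anything this discovery no longer reports loses that reference.
        for obj in list(self._sources):
            if obj not in found:
                self._sources[obj].discard(discovery)
                if not self._sources[obj]:
                    del self._sources[obj]  # RefCount hit zero: undiscovered

    def ref_count(self, obj):
        return len(self._sources.get(obj, ()))

    def exists(self, obj):
        return obj in self._sources


inv = ToyInventory()
inv.submit("registry discovery", {"db1"})
inv.submit("script discovery", {"db1"})
print(inv.ref_count("db1"))   # two discoveries report db1
inv.submit("registry discovery", set())
print(inv.exists("db1"))      # still referenced by the script discovery
inv.submit("script discovery", set())
print(inv.exists("db1"))      # both have undiscovered it, so it is gone
```

In this model the object survives until every discovery that reported it has delivered data without it, which is the "each discovery must run and undiscover" behavior.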
Let’s consider a couple of cases where you might have multiple discoveries for a single class:
- You have two different discoveries each discovering unique properties of the class. Maybe you can get some properties through the registry but need a script for other properties.
- You have a script discovery to create containment relationships from the original object. For example, you might discover a top level object from the registry, and then you target that class with a script discovery that discovers some application components that are contained by the top level class. To create a containment relationship in a discovery script, you have to create instances of the source and destination class. Technically, you’re rediscovering the class that was already discovered.
The question for both of these scenarios is whether we need both of the discoveries to run and to undiscover a particular object for it to go away.
Consider #1 supposing that each discovery used the same target. The first discovery runs, creates the object, and sets the RefCount to 1. The second discovery runs, and increments the RefCount to 2. If the application is removed though, then the next time each discovery runs, they each decrement the RefCount by 1. It hits 0 after each runs, and the object is removed.
But now consider #2. The RefCount for the discovered object would be 2 since both discoveries would increment it. Suppose the application is then uninstalled. The registry discovery runs and decrements the RefCount down to 1. The script discovery goes ahead and runs (because its target object still exists), but it’s not checking whether the application is installed or not; it’s just happily creating containment relationships. So the RefCount stays at 1, and the object essentially lives forever.
Fortunately, SCOM knows how to identify this kind of circular reference. If the class being discovered is the target of the discovery, then the RefCount is not incremented. This means that when the first discovery runs, it decrements the RefCount to 0, and the object goes away. The second discovery never runs because there is no object for it to run against.
To summarize this rule: If one of the discoveries uses the class being discovered as its target, then that discovery will not have to run for the object to be undiscovered. If both discoveries use any other class than the one being discovered, then they both need to undiscover the object.
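The rule can be sketched as a small Python model. Again, this is a hedged illustration and not how SCOM is implemented: Discovery, ToyInventory, and the class names "Computer" and "App" are hypothetical, and the only logic it encodes is "a discovery whose target is the class being discovered does not add a reference."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discovery:
    name: str
    target_class: str  # the class this discovery runs against


class ToyInventory:
    """Toy model of the self-target rule (hypothetical, not SCOM code)."""

    def __init__(self):
        # (class name, object name) -> set of discovery names holding a reference
        self._sources = {}

    def submit(self, discovery, class_name, found):
        """Apply one run of `discovery` reporting `found` objects of `class_name`."""
        found = set(found)
        # Assumed rule: a discovery targeting the discovered class adds no reference.
        if discovery.target_class != class_name:
            for name in found:
                key = (class_name, name)
                self._sources.setdefault(key, set()).add(discovery.name)
        # Objects of this class missing from the data lose this discovery's reference.
        for key in list(self._sources):
            cls, name = key
            if cls == class_name and name not in found:
                self._sources[key].discard(discovery.name)
                if not self._sources[key]:
                    del self._sources[key]

    def ref_count(self, class_name, name):
        return len(self._sources.get((class_name, name), ()))

    def exists(self, class_name, name):
        return (class_name, name) in self._sources


# Scenario #2: the containment script targets the class it rediscovers.
registry = Discovery("registry discovery", target_class="Computer")
containment = Discovery("containment script", target_class="App")

inv = ToyInventory()
inv.submit(registry, "App", {"app1"})
inv.submit(containment, "App", {"app1"})
print(inv.ref_count("App", "app1"))  # only the registry discovery counts
inv.submit(registry, "App", set())   # application uninstalled
print(inv.exists("App", "app1"))     # removed without the script running
```

If both discoveries in this model target "Computer" instead, each one adds a reference and the object only disappears after both have submitted data without it, which matches scenario #1.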
This is good, because #2 is actually a common scenario. Without that added bit of logic baked into SCOM, it would be quite the headache as the discovery for the containment relationships would have to recreate some of the initial discovery’s logic in order to determine if the object should still exist. Thanks to that extra logic though, we don’t have to worry about it.
Comments
Anonymous
January 01, 2003
There is a sample management pack in the Management Pack Library on www.opsmanjam.com called MPAuthor.Discvery that has a script discovery of a containment relationship - which is scenario #2 above. I need to get an update to that MP with some slight changes from the Discovery Series videos that I've been putting out, but it should illustrate that scenario pretty well. I'm going to be putting out a video on discovering relationships where I'll cover that concept in more detail - that will come after the video on discovery scripts that's coming next. I don't have an example currently of #1. I'll think on a specific example and see if I can get that into the updated MP as well.
Anonymous
December 11, 2010
Hi Brian, Can you supplement this guidance with an MP that includes each of these scenarios? I think I understand what you're explaining here, but would like a working example to solidify these concepts. Thanks!
Anonymous
February 19, 2013
Hi Brian, I'm facing scenario #1. Have you implemented an example for that case? I can't find a proper way of resolving it. Thanks!
Anonymous
October 22, 2013
Hey Brian, This was extremely informational! Thanks.
Anonymous
July 09, 2015
Hi Brian!
Thanks for the informational videos on MP Authoring on MS Virtual Academy.
However - I do think I have painted myself into a corner here (Scenario #2), as my group-populating script was targeted at the root management server, read information from a database, and created relationship instances. I read your statement "If the class being discovered is the target of the discovery, then the RefCount is not incremented" as a hint of this.
Now, what to do? Is it possible to make a script report the reference count, so that we can know which computer objects have an extraneous reference? Is it possible to edit the reference count in some way?
I realise we may be way out on an unsupported limb here, but maybe needs must?
Setting the object to "deleted" in the database, and reinstalling the agent on the server could be an option, perhaps? (although equally unsupported I would think)
I would be grateful for any advice.
Ørnulf
Anonymous
July 09, 2015
Maybe I should add that the reason we discovered this was because computer objects linger in the database even after the server has been decommissioned. Listing computer objects and agents in SCOM results in a list of servers that should have been removed.
This was tested in a stage environment, but sadly the consequences of this behaviour weren't detected until it was installed in Production.
Anonymous
July 09, 2015
If I understand this correctly, then the server to be added to the group is listed in the database. If the server is decommissioned, and you remove it from the database, then the next time the discovery script runs it should deliver discovery data without that server in it. So the reference count should be appropriately decremented. There isn't a way to view the reference count.
Anonymous
July 10, 2015
Thank you for your response Brian! Very much appreciated.
I get that the discovery script that created the discovery data should decrease the refcount. I'm now thinking this may have something to do with a later upgrade of the same MP (without first removing the MP from SCOM).
It is likely anyway that "something" has hindered the MP from decreasing the refcount, but SCOM thinks that it has.
Related to this is strange behaviour from SCOM when uninstalling agents. The computer object is apparently not deleted, and a random management server starts running the rules targeted at the Windows Computer class on behalf of the uninstalled computer, resulting in errors.
I guess this could be normal behaviour if the refcount is too large: removal of the agent does not decrease the refcount to 0, and the management server steps in as the new toplevelbasemanagementid. (We had a look at the basemanagedentity table while doing an agent removal, and what happened is that the single entry that was there before the removal was tagged deleted, but another with an MS GUID was added, and this one had isdeleted set to 0.)
I have now tried removing 1 agent through isdeleted=1 instead of uninstall/delete via the console, and this succeeds in removing the account with apparently no ill effects (on a staging server).
I'm thinking this is because isdeleted=1 means the refcount goes directly to 0, while all other delete methods decrease it.
Anonymous
July 13, 2015
Brian, can I just ask - if a management pack that contains a discovery script that populates computers into groups (and thus increases the refcount) is REMOVED, does the MP removal trigger a decrease of the refcounts on the objects?
The same MP contains the groups that were populated, and these will be removed also.