Add-In Lifetime Management [Pete Sheill, Jim Miller]
A good question to ask when evaluating a design is ‘how is the lifetime of this object or component managed?’ That is,
· How do you know it isn’t needed anymore?
· Is it possible to release the resources while the object is still in use?
· Is it possible to “leak” the resources by not releasing them at all?
· How soon after the object isn’t needed are the resources released?
How does this apply to the managed add-in model? If you are content to keep the add-in loaded until the process exits (and the system cleans up the entire process) you can stop reading right away, as this topic isn’t of particular interest to you.
One of the greatest benefits of managed code, of course, is automatic lifetime management (garbage collection). Within an application domain (“appdomain”), objects that are no longer referenced by “live” objects are automatically reclaimed by the garbage collector, and there’s a standard mechanism (finalization and IDisposable) to release any unmanaged resources those objects are using.
The resources to be reclaimed include the memory for the instance members, any operating system handles, database connections, etc. Another resource, often overlooked in managed code, is the memory taken by the code itself, including the JIT-compiled code. Releasing this last resource is not possible if the code is loaded in the same appdomain as the host. We simply don’t have a way to unload an assembly without unloading the appdomain that hosts it.
In general, there are three ways to manage the lifetime of objects:
1. Garbage collection. This works within a single appdomain, as described above. Unfortunately, there’s no good general-purpose solution for distributed systems (although it’s been an active research area for years).
2. Reference counting. This is probably familiar to you from COM and other systems. When you create an object, you set its count to 1; whenever you add a new reference to it, you increment the count (“addref”); when you release a reference, you decrement the count (“release”). When the count goes to 0 there are no more references, and you can release the object and any data it holds. This works as long as there are no cycles between objects, but a cycle leaks memory: objects that refer to each other keep each other’s counts above zero even after nothing else refers to them.
3. Sponsorship. This is the mechanism used for both DCOM and .NET Remoting. The idea is that when you create a remote object, you declare something as a “sponsor” for the object and periodically ask the sponsor whether the object is still needed. If the sponsor says it isn’t needed, or if you can’t reach the sponsor after some number of tries, you can release the object. One common way of implementing sponsorship is the “lease” model, in which the sponsor provides a lease stating how long the object should be considered alive, and the sponsor isn’t contacted again until the lease expires.
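To make the cycle problem in item 2 concrete, here is a minimal Python sketch of hand-rolled reference counting. The class `RefCounted` and its `add_ref`, `release`, and `hold` methods are hypothetical names for illustration (this is not COM’s `IUnknown` API); the only point is that two objects referring to each other never see their counts reach zero.

```python
class RefCounted:
    """Hand-rolled reference counting (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.count = 1          # the creator holds the first reference
        self.released = False
        self.held = []          # objects this one holds references to

    def add_ref(self):
        self.count += 1

    def release(self):
        self.count -= 1
        if self.count == 0:
            self.released = True
            # Freeing an object drops every reference it was holding.
            for other in self.held:
                other.release()
            self.held.clear()

    def hold(self, other):
        """Store a reference to `other`, incrementing its count."""
        other.add_ref()
        self.held.append(other)


# No cycle: a -> b. Dropping the creators' references frees both objects.
a, b = RefCounted("a"), RefCounted("b")
a.hold(b)
b.release()   # b's creator is done; a still holds b (count 1)
a.release()   # a reaches 0 and releases b in turn
print(a.released, b.released)   # True True

# Cycle: c -> d -> c. Each object keeps the other's count at 1.
c, d = RefCounted("c"), RefCounted("d")
c.hold(d)
d.hold(c)
c.release()
d.release()
print(c.released, d.released)   # False False: both leak
```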
The managed add-in framework uses a combination of all three of these mechanisms. Within a single appdomain it uses garbage collection. But it gets a lot more interesting when the add-in runs in a separate appdomain in the same host process, or in a separate process. Even though the HostSideAdapter has a reference to the AddInSideAdapter, that reference points to a “remote” object, so it does not by itself keep the AddInSideAdapter alive: the AddInSideAdapter is still subject to reclaiming under normal garbage collection on its side of the boundary.
In designing the lifetime management system for managed add-ins, our goal was to make simple cases simple but still allow more complicated systems to be built. The framework itself takes responsibility for the sponsorship model, but exposes a reference counting model to control the behavior of the sponsor. In simple cases the reference count is maintained automatically by the framework, too, using the garbage collector and finalization. The rest of this article talks about the details of this implementation.
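The arrangement described above, in which the framework owns the sponsorship and a reference count drives the sponsor’s answer, can be sketched in Python as follows. This is a simplified model under stated assumptions, not the framework’s or .NET Remoting’s code: `Lease`, `RefCountSponsor`, `tick`, `renew`, and `max_tries` are hypothetical names, time is a plain integer clock, and a renewal of 0 stands for “no longer needed”. (In .NET Remoting itself, these roles belong to the `ILease` and `ISponsor` interfaces.)

```python
class RefCountSponsor:
    """Sponsor that renews the lease only while references are outstanding."""

    def __init__(self, renewal):
        self.count = 0           # driven by acquire/revoke calls elsewhere
        self.renewal = renewal

    def renew(self):
        # 0 means "the object is no longer needed".
        return self.renewal if self.count > 0 else 0


class Lease:
    """Lease on a remote object; the sponsor is consulted only on expiry."""

    def __init__(self, sponsor, initial, max_tries=3):
        self.sponsor = sponsor
        self.expires_at = initial
        self.max_tries = max_tries
        self.released = False
        self.sponsor_calls = 0

    def tick(self, now):
        """Called periodically with the current time by a lease manager."""
        if self.released or now < self.expires_at:
            return                          # lease still valid: no sponsor call
        for _ in range(self.max_tries):
            self.sponsor_calls += 1
            try:
                renewal = self.sponsor.renew()
            except ConnectionError:
                continue                    # sponsor unreachable: try again
            if renewal > 0:
                self.expires_at = now + renewal
                return                      # lease renewed
            break                           # sponsor says "not needed"
        self.released = True                # declined, or never reached


sponsor = RefCountSponsor(renewal=10)
sponsor.count = 1                 # one outstanding reference
lease = Lease(sponsor, initial=10)
for now in range(0, 25):
    lease.tick(now)
print(lease.released, lease.sponsor_calls)   # False 2 (renewed at t=10, t=20)

sponsor.count = 0                 # last reference released
for now in range(25, 35):
    lease.tick(now)
print(lease.released, lease.sponsor_calls)   # True 3


class UnreachableSponsor:
    def renew(self):
        raise ConnectionError("sponsor unreachable")


orphan = Lease(UnreachableSponsor(), initial=0)
orphan.tick(0)
print(orphan.released, orphan.sponsor_calls)  # True 3
```

The host-facing code only changes `count`; the lease machinery and its retry policy stay hidden, which mirrors the split the paragraph describes.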
In the case where the add-in is isolated from the host, either in a different appdomain or a different process, the garbage collector still reclaims objects when they are no longer referenced. If you release the reference to the HostSideAdapter, the garbage collector will release its memory. But what about the corresponding AddInSideAdapter and the add-in itself? And what about the case where you allowed the add-in framework to create an appdomain for running the add-in: when is that appdomain released?
The standard solution to the issue of keeping the AddInSideAdapter alive in .NET Remoting is to use leases and sponsorship. While we do use sponsorship underneath, we decided not to expose it to add-in or pipeline developers. Instead, we use a variant of the “ref counting” technique to keep remote objects alive, through the methods AcquireLifetimeToken() and RevokeLifetimeToken(). When the HostSideAdapter is constructed, it immediately calls AcquireLifetimeToken() on the AddInSideAdapter to tell the AddInSideAdapter that it holds a reference to it. When the HostSideAdapter is disposed of or reclaimed, it calls RevokeLifetimeToken() on the AddInSideAdapter to let it know that it no longer refers to it. Usually, this leads to the AddInSideAdapter and the add-in being reclaimed.
Let’s consider a simple scenario first: an add-in in its own appdomain in the host process. Say you have activated an add-in using the overload of Activate() that takes an AddInSecurityLevel. This method creates a new appdomain to host the add-in. As described in the post on activation, the HostSideAdapter calls AcquireLifetimeToken() on the AddInSideAdapter when it is created. This increments the “ref count” on the AddInSideAdapter to one. As a convenience, this is usually done by constructing and holding a ContractHandle, passing the AddInSideAdapter to its constructor; it is the ContractHandle constructor that actually calls AcquireLifetimeToken(). We’ll assume that this is the case for this example. As long as the ref count is above zero, the AddInSideAdapter stays alive in memory, and since it holds a reference to the add-in, the add-in stays alive too.
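A minimal Python sketch of the token handshake described above, assuming simplified stand-ins: `AddInSideAdapter`, `ContractHandle`, and `HostSideAdapter` here are hypothetical classes that only mirror the roles in the text, and `acquire_lifetime_token`/`revoke_lifetime_token` mirror AcquireLifetimeToken() and RevokeLifetimeToken(). The handle acquires a token in its constructor and revokes it exactly once, either from an explicit `dispose()` or from its finalizer (`__del__`).

```python
import itertools


class AddInSideAdapter:
    """Hypothetical stand-in for an adapter with token-based lifetime."""

    def __init__(self):
        self._tokens = set()
        self._next_token = itertools.count(1)
        self.alive = True

    def acquire_lifetime_token(self):
        token = next(self._next_token)
        self._tokens.add(token)
        return token

    def revoke_lifetime_token(self, token):
        self._tokens.remove(token)
        if not self._tokens:
            self.alive = False   # last token gone: adapter and add-in can go

    @property
    def ref_count(self):
        return len(self._tokens)


class ContractHandle:
    """Acquires a token when constructed and revokes it exactly once."""

    def __init__(self, contract):
        self.contract = contract
        self._token = contract.acquire_lifetime_token()

    def dispose(self):
        if getattr(self, "_token", None) is not None:
            self.contract.revoke_lifetime_token(self._token)
            self._token = None

    def __del__(self):
        # Finalizer path: the handle was dropped without being disposed.
        self.dispose()


class HostSideAdapter:
    """Holds the handle, so host code never deals with tokens directly."""

    def __init__(self, contract):
        self._handle = ContractHandle(contract)


adapter = AddInSideAdapter()
view = HostSideAdapter(adapter)
print(adapter.ref_count)   # 1
view = None                # host drops its only reference
print(adapter.alive)       # False (CPython finalizes immediately)

other = AddInSideAdapter()
handle = ContractHandle(other)
handle.dispose()
handle.dispose()           # second call is a no-op
print(other.ref_count)     # 0
```

In CPython, reference counting makes `__del__` run as soon as `view` is set to None; in .NET, as the next paragraph describes, the ContractHandle’s finalizer runs only at some later point after a garbage collection.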
The host uses the add-in through the HostViewOfAddIn (previously called HostAddInView) and then decides that it no longer needs it, so the variable that referenced it is set to null. At some later point, when the garbage collector runs, it finds that the HostViewOfAddIn is no longer reachable, and therefore neither is the ContractHandle. It runs the ContractHandle’s finalizer, which calls RevokeLifetimeToken() on the AddInSideAdapter. The AddInSideAdapter decrements the ref count to zero, which triggers cleanup: the add-in is no longer needed. If the AddInSideAdapter has been implemented by deriving from the framework class ContractBase (highly recommended), then the following happens:
1) A flag is set to end the sponsorship (in remoting) that is keeping the object alive.
2) OnFinalRevoke() is called, allowing the derived class to clean up any native resources it holds. We chose to name the method this way instead of calling it Dispose() because Dispose() is a public method typically called by an external object, while this method is a protected method called by the object itself.
3) We unload the appdomain with a call to AppDomain.Unload(). Before unloading completes, the finalizers of all the objects in the appdomain will run.
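The three steps above can be modeled in Python as follows. `ContractBaseSketch`, `FakeAppDomain`, and `MyAdapter` are hypothetical stand-ins, not the real ContractBase or AppDomain; the sketch assumes, as in this example, that the appdomain was created for this add-in, and it models “finalizers run before unloading completes” as a list of callbacks that `unload()` invokes.

```python
import itertools


class FakeAppDomain:
    """Hypothetical appdomain stand-in: unload() runs finalizers first."""

    def __init__(self):
        self.finalizers = []
        self.unloaded = False

    def unload(self):
        for finalize in self.finalizers:
            finalize()            # finalizers run before unloading completes
        self.unloaded = True


class ContractBaseSketch:
    """Models the cleanup on the final revoke; not the real ContractBase."""

    def __init__(self, domain):
        self.domain = domain
        self.sponsorship_active = True
        self.log = []
        self._tokens = set()
        self._next_token = itertools.count(1)

    def acquire_lifetime_token(self):
        token = next(self._next_token)
        self._tokens.add(token)
        return token

    def revoke_lifetime_token(self, token):
        self._tokens.remove(token)
        if not self._tokens:
            self.sponsorship_active = False   # 1) end the remoting sponsorship
            self.log.append("sponsorship ended")
            self.on_final_revoke()            # 2) derived-class cleanup hook
            self.domain.unload()              # 3) unload the add-in's appdomain

    def on_final_revoke(self):
        """Protected-style hook; override to release native resources."""


class MyAdapter(ContractBaseSketch):
    def on_final_revoke(self):
        self.log.append("native resources released")


domain = FakeAppDomain()
adapter = MyAdapter(domain)
domain.finalizers.append(lambda: adapter.log.append("finalizers ran"))
token = adapter.acquire_lifetime_token()
adapter.revoke_lifetime_token(token)
print(adapter.log)
# ['sponsorship ended', 'native resources released', 'finalizers ran']
```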
So in this example, the cleanup happened without any explicit instruction to do so by the host or the add-in, similar to normal single-appdomain garbage collection.
The nice part about this approach is that the host code is isolated from the lifetime management of the add-in. It uses the HostViewOfAddIn as it would any other managed object, without doing any reference counting itself, because the reference counting happens only in the adapter. When there are no more live references to the HostViewOfAddIn, all the memory and resources of the adapters and the add-in are reclaimed automatically.
There are times when steps #1 and #2 will happen, but not #3. That is, the add-in is ready for cleanup, but the appdomain is not. If the appdomain was not created during activation of this add-in, then it will not be unloaded here. The appdomain may be the default host appdomain, it may have been explicitly created by the host, or it may have been created for another add-in. In any case, there are other objects in the appdomain that are likely still in use, so we can’t unload it. Stated another way, if we didn’t create the appdomain, then we won’t unload it.
Assuming that the appdomain was created specifically for the add-in, there is another situation that will postpone the cleanup. Reference counting relies on an object “owning” its own memory and other resources. That is, the object itself is responsible for deallocating those resources at the right time, once it is sure that no other object needs them. The appdomain created for the add-in will typically have the same lifetime as the add-in. While we could have kept a separate ref count for the appdomain, letting the appdomain own itself, we decided to simplify things and have the primary add-in own the appdomain. Therefore the two always have exactly the same lifetime. Only when the ref count of the add-in goes to zero does the appdomain get unloaded, causing any objects in that appdomain to be finalized and become unavailable.
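The ownership rule can be sketched as below: the add-in whose activation created the appdomain is recorded as its owner, and only that add-in unloads the domain when its count reaches zero. `Domain`, `AddIn`, and the `created_domain` flag are hypothetical names, and cleanup steps 1 and 2 are collapsed into a single `cleaned_up` flag for brevity.

```python
class Domain:
    def __init__(self, name):
        self.name = name
        self.owner = None       # add-in whose activation created this domain
        self.unloaded = False


class AddIn:
    """Hypothetical ref-counted add-in; only the owner unloads its domain."""

    def __init__(self, domain, created_domain):
        self.domain = domain
        self.count = 0
        self.cleaned_up = False
        if created_domain:
            domain.owner = self   # the primary add-in owns the domain

    def acquire(self):
        self.count += 1

    def revoke(self):
        self.count -= 1
        if self.count == 0:
            self.cleaned_up = True            # steps 1 and 2 always happen
            if self.domain.owner is self:
                self.domain.unloaded = True   # step 3 only if we created it


# Add-in activated into an existing domain (e.g. the host's default one).
host_domain = Domain("default")
guest = AddIn(host_domain, created_domain=False)
guest.acquire()
guest.revoke()
print(guest.cleaned_up, host_domain.unloaded)   # True False

# Add-in whose activation created a dedicated domain.
own_domain = Domain("addin-1")
primary = AddIn(own_domain, created_domain=True)
primary.acquire()
primary.revoke()
print(primary.cleaned_up, own_domain.unloaded)  # True True
```

Because the domain has no count of its own, its lifetime is exactly its owner’s, which is the simplification the paragraph describes.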
One question that might arise is what happens if the add-in creates an object and passes a reference to it to the host, and then the add-in itself goes out of scope while the host still holds the object. A separate pipeline is created for the object as it is passed to the host, similar to the pipeline between the host and the add-in. If the add-in’s reference count went to zero, the add-in and its appdomain would be reclaimed. Then, if the host still held a reference to the object and tried to call one of its methods, the call would fail with an AppDomainUnloadedException.
To address this, we don’t allow the add-in’s reference count to go to zero in this situation until the host no longer references the object. When the object is passed to the host and a pipeline is created for it, we look up the owner of the appdomain and increment the owner’s ref count. Then, when the object is no longer needed and its pipeline is finalized, we decrement that ref count. Only at this point can the ref count of the primary add-in go to zero, leading to the cleanup of the appdomain.
If the host calls Activate to create a new appdomain for a new add-in, then uses that same appdomain in subsequent calls to activate secondary add-ins, a similar process happens. The secondary add-ins increment the ref count of the primary add-in, so that the appdomain won't be cleaned up while they are still in use. The secondary add-ins decrement the primary add-in's ref count when they are no longer referenced, permitting appdomain cleanup to proceed.
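The owner-counting scheme in the last two paragraphs can be sketched as follows. A hypothetical `Pipeline` object stands for either the pipeline built for an object the add-in passes to the host or a secondary add-in activated into the same appdomain; while it is open, it holds one count on the appdomain’s owner. `Domain`, `PrimaryAddIn`, `Pipeline`, and `close()` are illustrative names, and `close()` stands in for the pipeline being disposed or finalized.

```python
class Domain:
    def __init__(self):
        self.owner = None
        self.unloaded = False


class PrimaryAddIn:
    """Owns the domain it was activated into; unloads it at count zero."""

    def __init__(self, domain):
        self.domain = domain
        self.count = 0
        domain.owner = self

    def acquire(self):
        self.count += 1

    def revoke(self):
        self.count -= 1
        if self.count == 0:
            self.domain.unloaded = True


class Pipeline:
    """Hypothetical pipeline for a passed object or a secondary add-in.

    While open, it holds one count on the owner of its appdomain."""

    def __init__(self, domain):
        self.owner = domain.owner   # look up the appdomain's owner
        self.owner.acquire()
        self.closed = False

    def close(self):
        # Stands in for the pipeline being disposed or finalized.
        if not self.closed:
            self.closed = True
            self.owner.revoke()


domain = Domain()
addin = PrimaryAddIn(domain)
addin.acquire()                  # the host's reference to the add-in
passed = Pipeline(domain)        # add-in passes an object to the host
secondary = Pipeline(domain)     # secondary add-in in the same domain

addin.revoke()                   # host drops the add-in itself
unloaded_early = domain.unloaded
print(unloaded_early)            # False: two pipelines still hold counts

passed.close()
secondary.close()
print(domain.unloaded)           # True
```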
Note that this version of reference counting improves on some other versions in that you can’t accidentally remove a reference that a different object created. That’s because AcquireLifetimeToken() returns a unique “token” number that should be stored and passed back to RevokeLifetimeToken(). Revoking a token that you didn’t previously acquire has no effect on the reference count; it simply causes an InvalidOperationException to be thrown.
There’s more to say about lifetime management, but we’ll leave that for later posts. It’s a multifaceted issue, but we hope that most hosts and add-ins can be written as if all resource cleanup were governed by garbage collection, even though there is a bit more to it than that underneath.
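The advantage of tokens over a bare counter shows up when a holder releases twice by mistake. In this Python sketch, `TokenCounter` and `InvalidOperationError` are hypothetical stand-ins (the latter for .NET’s InvalidOperationException), and tokens come from a simple increasing counter, which is an assumption, since the text only says they are unique.

```python
import itertools


class InvalidOperationError(Exception):
    """Stand-in for .NET's InvalidOperationException."""


class TokenCounter:
    """Reference count kept as a set of outstanding tokens (hypothetical)."""

    def __init__(self):
        self._tokens = set()
        self._next_token = itertools.count(1)

    def acquire_lifetime_token(self):
        token = next(self._next_token)   # never reused, so always unique
        self._tokens.add(token)
        return token

    def revoke_lifetime_token(self, token):
        if token not in self._tokens:
            # Unknown or already-revoked token: the count is left untouched.
            raise InvalidOperationError(f"token {token} is not outstanding")
        self._tokens.remove(token)

    @property
    def count(self):
        return len(self._tokens)


adapter = TokenCounter()
a = adapter.acquire_lifetime_token()   # held by one client
b = adapter.acquire_lifetime_token()   # held by another client
adapter.revoke_lifetime_token(a)

rejected = False
try:
    adapter.revoke_lifetime_token(a)   # buggy second release of the same token
except InvalidOperationError as err:
    rejected = True
    print("rejected:", err)

print(adapter.count)   # 1: the other client's reference is intact
```

With a plain counter, the second release would have dropped the count to zero and freed the object while `b`’s holder was still using it.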
Comments
- Anonymous
July 19, 2007
The comment has been removed
- Anonymous
August 08, 2007
David, I assume you are enabling unmanaged code in your add-ins. We can’t unwind threads that are not in managed code; we must wait for the call to unmanaged code to return. The library author could use the DomainUnload event. Or you could possibly run the add-in out of process. - JackG