The target parameter identifies the object that the WeakReference object should track. The trackResurrection parameter indicates whether the WeakReference object should track the object after it has had its Finalize method called. Usually, false is passed for the trackResurrection parameter and the first constructor creates a WeakReference that does not track resurrection. (For an explanation of resurrection, see part 1 of this article at https://msdn.microsoft.com/msdnmag/issues/1100/GCI/GCI.asp.)
For convenience, a weak reference that does not track resurrection is called a short weak reference, while a weak reference that does track resurrection is called a long weak reference. If an object's type doesn't offer a Finalize method, then short and long weak references behave identically. It is strongly recommended that you avoid using long weak references. A long weak reference allows you to resurrect an object after it has been finalized, at which point the state of the object is unpredictable.
Once you've created a weak reference to an object, you usually set the strong reference to the object to null. If any strong reference remains, the garbage collector will be unable to collect the object.
To use the object again, you must turn the weak reference into a strong reference. You accomplish this simply by reading the WeakReference object's Target property and assigning the result to one of your application's roots. If the Target property returns null, then the object was collected. If the property does not return null, then the root is a strong reference to the object and the code may manipulate the object. As long as the strong reference exists, the object cannot be collected.
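The pattern just described (create the weak reference, drop the strong one, then try to get a strong reference back) can be sketched in a few lines. This sketch uses Python's weakref module as a stand-in, not the .NET WeakReference type. Calling the weak reference here plays the role of reading Target, and the Cache class and use function are hypothetical names invented for the illustration.

```python
import gc
import weakref


class Cache:
    """Stand-in for a large object that is expensive to rebuild."""

    def __init__(self, name):
        self.name = name


def use(weak):
    """Turn the weak reference into a strong one, or report that it is gone."""
    strong = weak()              # analogous to reading WeakReference.Target
    if strong is None:
        return "collected"       # the object no longer exists
    return "alive: " + strong.name   # 'strong' keeps the object alive here


cache = Cache("lookup-table")
weak = weakref.ref(cache)        # create the weak reference
before = use(weak)               # a strong reference ('cache') still exists

cache = None                     # drop the only strong reference
gc.collect()                     # ask the collector to run
after = use(weak)                # the weak reference now yields None
```

The important detail is the null check: the result is tested every time, because the object can disappear at any collection once no strong reference remains.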
Weak Reference Internals
From the previous discussion, it should be obvious that WeakReference objects do not behave like other object types. Normally, if your application has a root that refers to an object and that object refers to another object, then both objects are reachable and the garbage collector cannot reclaim the memory in use by either object. However, if your application has a root that refers to a WeakReference object, then the object referred to by the WeakReference object is not considered reachable and may be collected.
To fully understand how weak references work, let's look inside the managed heap again. The managed heap contains two internal data structures whose sole purpose is to manage weak references: the short weak reference table and the long weak reference table. These two tables simply contain pointers to objects allocated within the managed heap.
Initially, both tables are empty. When you create a WeakReference object, an object is not allocated from the managed heap. Instead, an empty slot in one of the weak reference tables is located; short weak references use the short weak reference table and long weak references use the long weak reference table.
Once an empty slot is found, the value in the slot is set to the address of the object you wish to track (the object's pointer is passed to the WeakReference's constructor). The value returned from the new operator is the address of the slot in the WeakReference table. Obviously, the two weak reference tables are not considered part of an application's roots or the garbage collector would not be able to reclaim the objects pointed to by the tables.
Now, here's what happens when a garbage collection (GC) runs:
- The garbage collector builds a graph of all the reachable objects. Part 1 of this article discussed how this works.
- The garbage collector scans the short weak reference table. If a pointer in the table refers to an object that is not part of the graph, then the pointer identifies an unreachable object and the slot in the short weak reference table is set to null.
- The garbage collector scans the finalization queue. If a pointer in the queue refers to an object that is not part of the graph, then the pointer identifies an unreachable object and the pointer is moved from the finalization queue to the freachable queue. At this point, the object is added to the graph since the object is now considered reachable.
- The garbage collector scans the long weak reference table. If a pointer in the table refers to an object that is not part of the graph (which now contains the objects pointed to by entries in the freachable queue), then the pointer identifies an unreachable object and the slot is set to null.
- The garbage collector compacts the memory, squeezing out the holes left by the unreachable objects.
Once you understand the logic of the garbage collection process, it's easy to understand how weak references work. Accessing the WeakReference's Target property causes the system to return the value in the appropriate weak reference table's slot. If null is in the slot, the object was collected.
A short weak reference doesn't track resurrection. This means that the garbage collector sets the pointer to null in the short weak reference table as soon as it has determined that the object is unreachable. If the object has a Finalize method, the method has not been called yet so the object still exists. If the application accesses the WeakReference object's Target property, then null will be returned even though the object actually still exists.
A long weak reference tracks resurrection. This means that the garbage collector sets the pointer to null in the long weak reference table when the object's storage is reclaimable. If the object has a Finalize method, the Finalize method has been called and the object was not resurrected.
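To make the difference between the two tables concrete, here is a toy model of the collection steps listed earlier. It is only a sketch under simplifying assumptions: the tables and queues are plain Python lists, compaction is omitted, and the names Obj, mark, null_dead_slots, and collect are invented for this illustration rather than taken from the runtime.

```python
class Obj:
    """Toy heap object; refs holds its outgoing strong references."""

    def __init__(self, name):
        self.name = name
        self.refs = []


def mark(starts, graph):
    """Add every object reachable from starts to the set graph."""
    stack = list(starts)
    while stack:
        obj = stack.pop()
        if obj not in graph:
            graph.add(obj)
            stack.extend(obj.refs)


def null_dead_slots(table, graph):
    """Set a weak reference slot to None if its object is not in the graph."""
    for slot, obj in enumerate(table):
        if obj is not None and obj not in graph:
            table[slot] = None


def collect(roots, short_table, finalization_queue, freachable, long_table):
    """Run the steps in the order listed earlier (compaction is omitted)."""
    graph = set()
    mark(roots, graph)                       # build the reachable graph
    null_dead_slots(short_table, graph)      # scan the short weak table
    dead = [o for o in finalization_queue if o not in graph]
    for obj in dead:                         # scan the finalization queue
        finalization_queue.remove(obj)
        freachable.append(obj)
    mark(dead, graph)                        # freachable objects are reachable
    null_dead_slots(long_table, graph)       # scan the long weak table
    return graph


plain = Obj("plain")
finalizable = Obj("finalizable")             # pretend its type has Finalize

short_table = [plain, finalizable]
long_table = [plain, finalizable]
finalization_queue = [finalizable]
freachable = []

# First collection: there are no roots, so both objects are unreachable.
collect([], short_table, finalization_queue, freachable, long_table)
short_after_first = list(short_table)        # both slots are None
long_after_first = list(long_table)          # the finalizable slot survives

# Pretend Finalize has now run and the object was not resurrected.
freachable.clear()
collect([], short_table, finalization_queue, freachable, long_table)
long_after_second = list(long_table)         # both slots are None
```

After the first pass the short table has already lost the finalizable object, while the long table still points to it. Only the second pass, once the object has been finalized and not resurrected, clears the long slot.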
Generations
When I first started working in a garbage-collected environment, I had many concerns about performance. After all, I've been a C/C++ programmer for more than 15 years and I understand the overhead of allocating and freeing memory blocks from a heap. Sure, each version of Windows® and each version of the C runtime has tweaked the internals of the heap algorithms in order to improve performance.
Well, like the developers of Windows and the C runtime, the GC developers are tweaking the garbage collector to improve its performance. One feature of the garbage collector that exists purely to improve performance is called generations. A generational garbage collector (also known as an ephemeral garbage collector) makes the following assumptions:
- The newer an object is, the shorter its lifetime will be.
- The older an object is, the longer its lifetime will be.
- Newer objects tend to have strong relationships to each other and are frequently accessed around the same time.
- Compacting a portion of the heap is faster than compacting the whole heap.
Of course, many studies have demonstrated that these assumptions are valid for a very large set of existing applications. So, let's discuss how these assumptions have influenced the implementation of the garbage collector.
When initialized, the managed heap contains no objects. Objects added to the heap are said to be in generation 0, as you can see in Figure 2. Stated simply, objects in generation 0 are young objects that have never been examined by the garbage collector.
Figure 2 Generation 0
Now, if more objects are added to the heap, the heap fills and a garbage collection must occur. When the garbage collector analyzes the heap, it builds the graph of garbage (shown here in purple) and non-garbage objects. Any objects that survive the collection are compacted into the left-most portion of the heap. These objects have survived a collection, are older, and are now considered to be in generation 1 (see Figure 3).
Figure 3 Generations 0 and 1
As even more objects are added to the heap, these new, young objects are placed in generation 0. If generation 0 fills again, a GC is performed. This time, all objects in generation 1 that survive are compacted and considered to be in generation 2 (see Figure 4). All survivors in generation 0 are now compacted and considered to be in generation 1. Generation 0 currently contains no objects, but all new objects will go into generation 0.
Figure 4 Generations 0, 1, and 2
Currently, generation 2 is the highest generation supported by the runtime's garbage collector. When future collections occur, any surviving objects currently in generation 2 simply stay in generation 2.
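The promotion rule walked through in the figures can be modeled in a few lines. This is a sketch, not the runtime's code: the heap is a list of (name, generation) pairs in address order, the set of live names is supplied by hand instead of being computed from roots, and every collection here examines all generations.

```python
MAX_GENERATION = 2   # the highest generation, as stated in the text


def collect_all(heap, live):
    """Drop garbage and promote each survivor by one generation, capped at 2."""
    survivors = []
    for name, generation in heap:
        if name in live:
            survivors.append((name, min(generation + 1, MAX_GENERATION)))
    return survivors     # list order models the compacted heap


heap = [("A", 0), ("B", 0), ("C", 0)]            # everything starts in gen 0
heap = collect_all(heap, live={"A", "C"})        # A and C move to gen 1
after_first = list(heap)

heap += [("D", 0), ("E", 0)]                     # new objects enter gen 0
heap = collect_all(heap, live={"A", "D"})        # A moves to gen 2, D to gen 1
after_second = list(heap)

heap += [("F", 0)]
heap = collect_all(heap, live={"A", "D", "F"})   # A stays in gen 2
after_third = list(heap)
```

Notice that A reaches generation 2 after its second collection and simply stays there afterward, which is the behavior described above for survivors in the highest generation.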
As I stated earlier, generational garbage collecting improves performance. When the heap fills and a collection occurs, the garbage collector can choose to examine only the objects in generation 0 and ignore the objects in any greater generations. After all, the newer an object is, the shorter its lifetime is expected to be. So, collecting and compacting generation 0 objects is likely to reclaim a significant amount of space from the heap and be faster than if the collector had examined the objects in all generations.
This is the simplest optimization that can be obtained from generational GC. A generational collector can offer more optimizations by not traversing every object in the managed heap. If a root or object refers to an object in an old generation, the garbage collector can ignore any of the older objects' inner references, decreasing the time required to build the graph of reachable objects. Of course, it is possible that an old object refers to a new object. So that these objects are examined, the collector can take advantage of the system's write-watch support (provided by the Win32® GetWriteWatch function in Kernel32.dll). This support lets the collector know which old objects (if any) have been written to since the last collection. These specific old objects can have their references checked to see if they refer to any new objects.
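Here is a small model of that optimization. It is a sketch under loud assumptions: the dirty set stands in for the information the write-watch support provides (the real mechanism tracks memory, not Python objects), and Obj, write_ref, and mark_gen0 are hypothetical names made up for the illustration.

```python
class Obj:
    """Toy heap object that knows which generation it is in."""

    def __init__(self, name, generation=0):
        self.name = name
        self.generation = generation
        self.refs = []


def write_ref(owner, target, dirty):
    """Store a reference and record the write if the owner is an old object."""
    owner.refs.append(target)
    if owner.generation > 0:
        dirty.add(owner)             # stand-in for the write-watch record


def mark_gen0(roots, dirty):
    """Return the reachable generation 0 objects without walking old objects."""
    live = set()
    # Old objects written to since the last collection get their refs checked.
    stack = list(roots) + [ref for old in dirty for ref in old.refs]
    while stack:
        obj = stack.pop()
        if obj.generation > 0 or obj in live:
            continue                 # ignore an old object's inner references
        live.add(obj)
        stack.extend(obj.refs)
    return live


old = Obj("old", generation=2)
kept = Obj("kept")
via_old = Obj("via_old")
garbage = Obj("garbage")

dirty = set()
write_ref(old, via_old, dirty)       # an old object now refers to a new one
roots = [old, kept]

with_tracking = sorted(o.name for o in mark_gen0(roots, dirty))
without_tracking = sorted(o.name for o in mark_gen0(roots, set()))
```

Without the record of writes, the object reachable only through the old object would be missed and wrongly treated as garbage. With it, the collector still avoids walking every old object.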
If collecting generation 0 doesn't provide the necessary amount of storage, then the collector can attempt to collect the objects from generations 1 and 0. If all else fails, then the collector can collect the objects from all generations: 2, 1, and 0. The exact algorithm used by the collector to determine which generations to collect is one of those areas that Microsoft will be tweaking forever.
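The escalation can be pictured with a deliberately naive sketch. It is not the collector's actual policy, which the text says is not fixed. It also assumes the amount of garbage in each generation is known up front, whereas a real collector only learns this by collecting. The function name and the byte counts are made up for the example.

```python
def generations_to_collect(needed_bytes, garbage_by_generation):
    """Return (highest generation collected, bytes freed) for a request.

    garbage_by_generation[g] is the garbage, in bytes, sitting in generation g.
    Collecting generation g always includes the generations below it.
    """
    freed = 0
    for generation, garbage in enumerate(garbage_by_generation):
        freed += garbage
        if freed >= needed_bytes:
            return generation, freed
    # Even a collection of every generation did not free enough.
    return len(garbage_by_generation) - 1, freed


garbage = [150, 400, 900]            # hypothetical garbage in gen 0, 1, 2

small = generations_to_collect(100, garbage)    # gen 0 alone is enough
medium = generations_to_collect(300, garbage)   # needs gens 1 and 0
large = generations_to_collect(2000, garbage)   # all generations, still short
```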
Most heaps (like the C runtime heap) allocate objects wherever they find free space. Therefore, if I create several objects consecutively, it is quite possible that these objects will be separated by megabytes of address space. However, in the managed heap, allocating several objects consecutively ensures that the objects are contiguous in memory.
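The reason consecutive allocations end up side by side is that the managed heap hands out memory by advancing a single pointer. A minimal sketch of that idea follows. The ManagedHeap class, its next_free field, and the sizes are assumptions chosen for illustration, and addresses are plain integers.

```python
class ManagedHeap:
    """Toy heap that allocates by bumping a single next-free pointer."""

    def __init__(self, size):
        self.size = size
        self.next_free = 0           # everything below this address is in use

    def allocate(self, nbytes):
        """Return the address of a new block placed right after the last one."""
        if self.next_free + nbytes > self.size:
            raise MemoryError("heap full: a collection would run here")
        address = self.next_free
        self.next_free += nbytes     # the next object starts where this ends
        return address


heap = ManagedHeap(1024)
addresses = [heap.allocate(n) for n in (16, 32, 24)]
```

Each address is exactly the previous address plus the previous size, so the three objects occupy one unbroken run of memory.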
One of the assumptions stated earlier was that newer objects tend to have strong relationships to each other and are frequently accessed around the same time. Since new objects are allocated contiguously in memory, you gain performance from locality of reference. More specifically, it is highly likely that all the objects can reside in the CPU's cache. Your application will access these objects with phenomenal speed since the CPU will be able to perform most of its manipulations without incurring cache misses, which force RAM access.
Microsoft's performance tests show that managed heap allocations are faster than standard allocations performed by the Win32 HeapAlloc function. These tests also show that it takes less than 1 millisecond on a 200 MHz Pentium to perform a full GC of generation 0. It is Microsoft's goal to make GCs take no more time than an ordinary page fault.
Direct Control with System.GC
The System.GC type allows your application some direct control over the garbage collector. For starters, you can query the maximum generation supported by the managed heap by reading the GC.MaxGeneration property. Currently, the GC.MaxGeneration property always returns 2.
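It is also possible to force the garbage collector to perform a collection by calling one of the two overloads of the GC.Collect method: GC.Collect(Int32 generation), which collects generation 0 through the specified generation, and GC.Collect(), which collects all generations.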