Guest Post: Dylan Meeus on Garbage Collection in C#
A small introduction
I’m Dylan Meeus, an IT student at the University College in Leuven, focusing mainly on software engineering. I’m in my first year at the moment, but I was programming long before I came to university: I started when I was 15 years old. Most of the time I program in C#, although at university we learn Java instead. I’m also a proud member of U-Crew Tech, the Microsoft Student Partners.
Garbage Collection in C#
Garbage collection frees up memory by getting rid of useless objects that live on the heap. But how do we decide which objects are “useless” and which ones aren’t?
This part is simple: if there is a “live reference” to an object on the heap, it must be kept alive. As long as some chain of references still leads to the object, the garbage collector can’t get rid of it. Removing it anyway would very likely break your application.
Imagine that you rely on an object named “Bank”. If a faulty garbage collector removed the Bank object from the heap, all the values it held would disappear with it, and any code still using it would run into bugs.
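A minimal sketch of the “live reference” rule, assuming a .NET console app with top-level statements. The plain `object` below is a hypothetical stand-in for the Bank object, and a `WeakReference` is used only to observe it without keeping it alive. Because `bank` is still used after the forced collection, it counts as a live reference and the object survives.

```csharp
using System;

// Hypothetical stand-in for the "Bank" object from the text.
var bank = new object();

// A WeakReference lets us observe the object without keeping it alive itself.
var observer = new WeakReference(bank);

// Force a full, blocking collection.
GC.Collect();
GC.WaitForPendingFinalizers();

// 'bank' is still used further down, so the object is reachable and survives.
Console.WriteLine($"Bank object survived the collection: {observer.IsAlive}");

// Keep 'bank' reachable up to this point.
GC.KeepAlive(bank);
```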
Deciding whether an object should be collected happens in several steps, because checking every object on the heap is a huge task that costs a lot of time and memory. (Though this might sound like a bad thing, GC is of great importance.) The first step is finding out which objects are still referenced. Starting from the roots, such as local variables and static fields, the GC follows every reference and marks each object it reaches, remembering where it has already been so that a cycle of references doesn’t trap it in an infinite loop. Any object it never reaches has no connection to the program anymore, and is treated as “waste”.
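The marking step can be sketched as a small simulation. This is not how the .NET GC is implemented internally; the heap, the object names and the roots are all hypothetical, chosen so that one reachable cycle (Bank and Account) and one unreachable cycle (Temp1 and Temp2) show why remembering visited objects matters.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical heap: each object name maps to the objects it references.
var heap = new Dictionary<string, string[]>
{
    ["Bank"] = new[] { "Account" },
    ["Account"] = new[] { "Bank" },   // cycle back to Bank
    ["Logger"] = Array.Empty<string>(),
    ["Temp1"] = new[] { "Temp2" },
    ["Temp2"] = new[] { "Temp1" },    // cycle that nothing else points to
};

// Hypothetical roots: references held by local variables, static fields, etc.
var roots = new[] { "Bank", "Logger" };

// Mark phase: follow references from the roots, visiting each object once.
HashSet<string> Mark(IEnumerable<string> start)
{
    var visited = new HashSet<string>();
    var pending = new Stack<string>(start);
    while (pending.Count > 0)
    {
        var name = pending.Pop();
        if (!visited.Add(name)) continue; // already marked: this is what stops cycles
        foreach (var child in heap[name])
            pending.Push(child);
    }
    return visited;
}

var live = Mark(roots);

// Everything that was not marked is garbage.
foreach (var name in heap.Keys)
    Console.WriteLine($"{name}: {(live.Contains(name) ? "live" : "garbage")}");
```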
Once the GC has gone over all the objects and the second step (the actual collection) kicks in, it either promotes each surviving object or removes it. There are 3 generations an object can live in (actually, there is a 4th area that is kind of special, which I’ll cover later). We call them Gen0, Gen1 and Gen2, respectively.
When Gen0 fills up, the GC kicks in, and every object that survives this first pass is promoted to Gen1. Because most objects on the heap are only used for a short time, most of them are already removed in this first generation, so Gen1 fills up much more slowly than Gen0 does. In this way the GC has divided the heap into a Gen0 section, where it has to remove objects quite frequently, and a Gen1 section that fills up less often and therefore needs less collecting. This saves the GC quite a bit of work, which is a good thing because, as mentioned before, garbage collection is an expensive task for your computer.
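You can watch promotion happen with `GC.GetGeneration`, which reports the generation an object currently lives in. A minimal sketch, assuming a .NET console app; forcing collections with `GC.Collect` like this is only for demonstration, and the exact numbers can depend on the runtime and GC configuration.

```csharp
using System;

var survivor = new object();

// Newly allocated small objects start out in Gen0.
int afterAlloc = GC.GetGeneration(survivor);

// Collect only Gen0; a reachable object that survives is promoted.
GC.Collect(0);
int afterFirst = GC.GetGeneration(survivor);

// Collect Gen0 and Gen1; the object survives again and can move up once more.
GC.Collect(1);
int afterSecond = GC.GetGeneration(survivor);

Console.WriteLine($"Highest generation: {GC.MaxGeneration}");
Console.WriteLine($"Generation history: {afterAlloc} -> {afterFirst} -> {afterSecond}");

// Keep 'survivor' reachable through all the collections above.
GC.KeepAlive(survivor);
```

On a typical desktop runtime the history reads 0, then 1, then 2, which mirrors the Gen0, Gen1, Gen2 progression described above.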
The same goes for Gen1: when it fills up, the GC collects that section as well, and the surviving objects are promoted to Gen2. But what happens if Gen2 is full and the garbage collector can’t free up any more space? Your program will throw an “OutOfMemoryException”, which ends your program if nothing catches it.
As mentioned before, there is indeed a 4th section, called the “Large Object Heap” (LOH). It stores objects of 85,000 bytes or larger. These are considered too expensive to copy from one generation to the next all the time, so they are placed here directly and are only collected together with Gen2.
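A quick way to see the Large Object Heap is to compare a small and a large array. This sketch assumes .NET, where `GC.GetGeneration` reports objects on the LOH as generation 2, since the LOH is collected together with Gen2. The 85,000-byte threshold applies to the whole object, including the array header, so `new byte[85_000]` ends up on the LOH.

```csharp
using System;

// Well below the 85,000-byte threshold: allocated on the small object heap (Gen0).
var small = new byte[1_000];
int smallGen = GC.GetGeneration(small);

// 85,000 bytes of data plus the array header: allocated on the Large Object Heap.
var large = new byte[85_000];
int largeGen = GC.GetGeneration(large);

Console.WriteLine($"small array generation: {smallGen}");
Console.WriteLine($"large array generation: {largeGen}");
```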
When an object is moved from one section to another, its location in memory changes. Any reference to that object therefore has to be updated to point to the new address. This is another task that complicates the GC’s work: it has to make sure that every reference points to the right place on the heap. Otherwise, the GC would break your application.
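The bookkeeping described above can be illustrated with a toy compaction step. Everything here is hypothetical: addresses and sizes are made-up integers, not real pointers. Live objects are slid down, a forwarding table records each object’s old and new address, and then every reference is patched through that table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical heap layout: address, name, size in bytes, and whether it is live.
var objects = new List<(int Address, string Name, int Size, bool Live)>
{
    (0, "A", 4, true),
    (4, "B", 6, false),   // garbage
    (10, "C", 4, true),
    (14, "D", 2, true),
};

// Hypothetical references held by the program, pointing at the old addresses.
var references = new Dictionary<string, int> { ["myC"] = 10, ["myD"] = 14 };

// Step 1: slide live objects down and record old -> new address (forwarding table).
var forwarding = new Dictionary<int, int>();
int next = 0;
foreach (var obj in objects)
{
    if (!obj.Live) continue;
    forwarding[obj.Address] = next;
    next += obj.Size;
}

// Step 2: patch every reference so it points at the object's new address.
foreach (var key in new List<string>(references.Keys))
    references[key] = forwarding[references[key]];

Console.WriteLine($"myC now points at {references["myC"]}, myD at {references["myD"]}");
```

If step 2 were skipped, `myC` would still point at address 10, which after compaction holds something else entirely; that is exactly the kind of breakage the paragraph above warns about.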
Another advantage of automatic GC is that the free space it leaves behind ends up as one contiguous block, because the surviving objects are moved next to each other. To see why this is valuable, look at the following image of native memory management, where this does not happen.
The grey area is the memory that has been freed, 12 bytes in total. However, there are live objects between “A” and “D E”, so the free space is split into separate pieces. This means that if I wanted to allocate a new object of 10 bytes, I couldn’t, because it wouldn’t fit in any single piece. While this may still be the case in some languages, it is not in C#: the garbage collector compacts the heap by moving the live objects together, so the free space becomes one big block. Let’s look at the same image in C#.
If I want to allocate a 10-byte object now, it fits perfectly in the 12-byte block.
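The difference between the two images comes down to a simple comparison. Since the original images are not reproduced here, this sketch assumes a hypothetical split of the 12 free bytes into gaps of 5 and 7 bytes.

```csharp
using System;
using System.Linq;

// Hypothetical free gaps left between live objects: 5 + 7 = 12 bytes in total.
int[] fragmentedGaps = { 5, 7 };
int request = 10;

// Without compaction, the new object has to fit inside a single gap.
bool fitsFragmented = fragmentedGaps.Any(gap => gap >= request);

// After compaction, the live objects sit side by side and the gaps merge into one block.
int compactedBlock = fragmentedGaps.Sum();
bool fitsCompacted = compactedBlock >= request;

Console.WriteLine($"Total free: {compactedBlock} bytes");
Console.WriteLine($"10-byte object fits before compaction: {fitsFragmented}");
Console.WriteLine($"10-byte object fits after compaction: {fitsCompacted}");
```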
To conclude: while the garbage collector is a complex thing and does require quite a bit of computing power, it is very valuable. It saves us programmers a lot of work figuring out how to manage memory ourselves.