Heap fragmentation - when it all goes to pieces

Heap fragmentation is not a good thing and it happens constantly in most applications. The heap manager keeps fixing it up and the application keeps messing it up again.

It is easy to concentrate on the implementation of your algorithm. Well, that is the idea really. Deadlines are tight and you can optimise later if you need to. That would presumably be the same ‘later’ in which the documentation gets properly rewritten and the last of the not-that-serious bugs get fixed. As a result, it is very common to see code like:

Dim sSum As String = ""
Dim sDelta As String = ""
While Not sDelta.EndsWith("*")
    sDelta = GetNextDelta()
    sSum = sSum & sDelta
End While

So, this code will keep concatenating strings onto a longer string until the latest delta ends with a *. It looks reasonable to a human, but if your heap could make a sound, it would be swearing at you. Let’s dismantle the memory requirements.

Dim sSum As String = ""
Dim sDelta As String = ""

Two variables on the stack – but the text will go on the heap; just the references are on the stack. Still, that is fine.

While Not sDelta.EndsWith("*")

I don’t know the implementation of EndsWith offhand but I wouldn’t be surprised to hear that it copies at least some of the string. That will mean heap allocations but that is to be expected.
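If the culture-aware default turns out to matter, one option (purely a sketch, not something the original code asks for) is the ordinal overload of EndsWith, which requests a plain character-by-character comparison:

' Sketch only: the same test as in the loop, but with an ordinal comparison,
' so no culture-specific rules come into play.
Dim endsInStar As Boolean = sDelta.EndsWith("*", StringComparison.Ordinal)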

sDelta = GetNextDelta()

OK, this is a needless allocation. We could have coded it to append the return value without an intermediary string, since the end of sSum will be the same as the end of sDelta, as sketched below. So that is an allocate and copy that we didn’t need, and each allocation is probably a different size, which doesn’t help.
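As an aside, a minimal sketch of that version; it keeps the original GetNextDelta and tests sSum itself:

' Sketch: skip the intermediary string and append the return value directly.
Dim sSum As String = ""
Do
    sSum = sSum & GetNextDelta()
Loop Until sSum.EndsWith("*")

That only removes the middleman, though; sSum itself is still rebuilt on every pass, which brings us to the next line.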

sSum = sSum & sDelta

This one is the nasty one. Let us assume that sSum is 40 characters long and sDelta is 8 characters long. Of course, they are Unicode, so their memory footprint will be twice their length in characters plus the overhead for an object. We will ignore the overhead for now. sSum takes 80 bytes before the addition of sDelta. Afterwards it is 48 characters, which is 96 bytes, and because strings are immutable that means allocating a brand new 96-byte string, probably straight after the old one, leaving an 80-byte free block behind. Then we do the same thing with the next string. Memory fills up with free blocks which get larger and larger but are never large enough to host the new value of sSum. In a single-user application this will work fine but won’t be as fast as we might like. The heap will get defragmented (compacted by the GC), which costs some CPU. However, imagine that this was an ASP.NET application and you had 50 threads doing this. The heap will be mostly free blocks waiting to be coalesced.
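To put some hypothetical numbers on it (assuming 8-character deltas, 1000 passes and no object overhead, none of which come from the original code), a quick tally of what the loop churns through:

' Back-of-the-envelope sketch with assumed numbers: every pass throws away
' the old sSum and allocates a new one that is 16 bytes bigger.
Dim totalBytesAllocated As Long = 0
Dim sumLengthInChars As Long = 0
For pass As Integer = 1 To 1000
    sumLengthInChars += 8                       ' sSum grows by one 8-character delta
    totalBytesAllocated += sumLengthInChars * 2 ' a brand new copy, 2 bytes per character
Next
Console.WriteLine(totalBytesAllocated) ' roughly 8 MB allocated in total

The finished string is only about 16 KB; the garbage created along the way grows with the square of the number of passes.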

Add in some pinned (unmovable) objects, which make heap management less efficient, and you are in serious trouble. Your code runs slower than molasses in December, the process is a CPU hog and the memory use goes up and up. Eventually, you hit a fatal out-of-memory error and you crash or recycle, depending on what kind of process you are.
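For reference, a minimal sketch of how an object ends up pinned explicitly; in real applications it often happens implicitly during interop calls:

Imports System.Runtime.InteropServices ' at the top of the file

' Sketch: pinning a buffer so native code could safely read it.
Dim buffer(1023) As Byte
Dim pin As GCHandle = GCHandle.Alloc(buffer, GCHandleType.Pinned)
Try
    ' While pin is alive the GC cannot move buffer, so it cannot compact
    ' the region of the heap that buffer sits in.
Finally
    pin.Free()
End Try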

Code that is kind to the heap scales so much better.
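For what it is worth, here is a minimal sketch of a kinder version of the loop above, using System.Text.StringBuilder, which grows an internal buffer in larger and larger steps instead of allocating a new string on every append:

' Sketch: the characters accumulate in one growing buffer.
Dim sb As New System.Text.StringBuilder()
Dim sDelta As String = ""
While Not sDelta.EndsWith("*")
    sDelta = GetNextDelta()
    sb.Append(sDelta)
End While
Dim sSum As String = sb.ToString() ' one final allocation for the finished string

Each thread now produces a handful of allocations instead of one per delta, which is the difference between a heap that stays tidy and one that is mostly free blocks.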

One question that we are sometimes asked is why "Out of memory" is a fatal error. Since it is a managed world and there is some free memory, couldn’t the GC kick in and make some room? Well, it could, except for one thing. It needs memory to do that and it has to be contiguous memory. When the heap is badly fragmented, there isn’t much of that around and the GC will need around 50 MB of it. If it isn’t there to be had then we can’t GC and the process is doomed. Why don’t we pre-allocate 50 MB at the start? Because we don’t know how much we will need. It could be a fair bit less or a great deal more. To be sure, we would have to allocate a big chunk of your process space and that is not the way to make friends and influence people.

For my next rant, we have a different topic. Remote debugging.