Q: Aren't C++ pointers alone enough to "handle" GC? A: No.

A few days ago on ,
Nicola Musatti wrote:

-
As for GC, pure implementations exist.

\[that add no new extensions to ISO C++\]  

Not for a pure definition of "pure," they don't. :-)

To explain why C++ pointers are insufficient (unless their semantics were to be changed
at least a little, which would mean breaking existing code), consider two counterexamples:

1. Not for a compacting GC. Certainly a bald pointer can't point directly to an object
that moves around in memory, because C++ pointers are required to be stable, to always
have the same value while pointing to the same object. Changing the semantics of a
pointer to make it track will break lots of code, starting with set<T*> ,
because such tracking pointers cannot be ordered (their values will after all be changed
arbitrarily at unpredictable times by the GC). There are also other restrictions,
but that's one of the most noticeable. [Aside: Such a tracking pointerlike abstraction
is needed, and is provided in C++/CLI. It just can't be spelled * without
fundamentally scuttling ISO C++ conformance, is all.]

2. Not for a non-compacting GC, either. This case can be got a lot closer, but even
Great Circle / Boehm style collectors impose restrictions that break some conforming
C++ programs. In particular, they restrict, if only slightly, the operations that
Standard C++ allows on pointers. Consider the following well-formed ISO C++ program
with well-defined semantics:

  int* pi = new int(42); // line 1

  pi = (int*)((int)pi ^ 0xaaaaaaaa);

  // ... do other work ...

  pi = (int*)((int)pi ^ 0xaaaaaaaa);

  cout << *pi; // perfectly ok, prints "42",
won't crash

  delete pi; // ok

Add-on GCs can't see such disguised pointers, and are liable to reclaim the memory
allocated in line 1 before its later use, resulting in an attempt to access freed
memory. Boom.

This isn't perverse or theoretical, by the way. Consider "two-way pointers" as
one example of a well-known implementation technique where two pointers are XOR'd
together like this for a perfectly reasonable and legal use. In particular, a motivation
behind two-way pointers is that you can have a more space-efficient doubly linked
list if you store only one (not two) pointer's worth of storage in each node. But
how can the list still be traversable in both directions? The idea is that each node
stores, not a pointer to one other node, but a pointer to the previous node XOR'd
with a pointer to the next node. To traverse the list in either direction, at each
node you get a pointer to the next node by simply XORing the current node's two-way
pointer value with the address of the last node you visited, which yields the address
of the next node you want to visit. For more details, see:

  "Running Circles Round You, Logically"

  by Steve Dewhurst

  C/C++ Users Journal (20, 6), June 2002

I don't think the article is available online, alas, but Steve's website has some source
code demonstrating the technique
.

This perfectly standards-conforming and useful technique won't work correctly with
any GC implementation I know of that does not extend the language so that pointers
can retain their full standard meaning.

Steve's technique works perfectly fine and unbroken, however, under C++/CLI. It works
because C++/CLI preserves exactly the full semantics of * pointers
without any limitations. To do so, C++/CLI needed to add a new abstraction for GC
semantics instead of pretending that raw pointers are by themselves a complete solution
for safe use in a GC environment (they aren't, only because they were never designed
to be).

For more about the design motivations behind the ^ declarator (aka
a "handle"), see also Brandon Bray's excellent blog entry Behind
the Design: Handles
posted earlier today.