Value Type Representation Between the Original and Revised C++

For my current work on machine translation from the original language design [thing1] to the revised design of the language [thing2], I have been trying to understand the possible usages of a managed value type [V] and the pointer forms of that type [V*, __box V*]. Artur Laksberg and Mahesh Hariharan have both provided much helpful feedback.

Here is the canonical trivial value type used in the thing1 language spec:

            __value struct V { int i; };

            __gc struct R { V vr; };

In thing1, we can have four syntactic variants of a value type [where forms 2 and 3 are the same semantically]:

  1. V v = { 0 };
  2. V *pv = 0;
  3. V __gc *pvgc = 0; // Form (2) is an implicit form of (3)
  4. __box V* pvbx = 0;  // must be local

Form (1) is the canonical value object, and it is reasonably well understood, except when someone attempts to invoke an inherited virtual method such as ToString(). For example,

            v.ToString(); // error!

In order to invoke this method, the compiler must have access to the associated virtual table of the base class. Because value types are in-state storage without an associated vptr, this requires that v be boxed. In thing1, implicit boxing is not supported; the boxing must be explicitly specified by the programmer, as in

            __box( v )->ToString(); // thing1: note the arrow

The primary motive behind this design was pedagogical: the aim was to make the underlying mechanism visible to the programmer so that she would understand the ‘cost’ of not providing an instance of the method within her value type. Were V to contain an instance of ToString, the boxing would not be necessary.

In thing2 [yes, referring to the two languages in this way is annoying, isn’t it?], the implicit boxing is carried out transparently:

            v.ToString(); // thing2

but at the cost of possibly encouraging the class designer to introduce an instance of ToString within V. Implicit boxing is preferred because, while there is usually one class designer, there is an unlimited number of users, none of whom would have the freedom to modify V to eliminate the possibly onerous explicit box.

Another difference in the value type between thing1 and thing2 is the removal of support for a default constructor. [It has been explained to me that this is because there are instances in which the CLR can create an instance of the value type without invoking the associated default constructor. That is, the invocation of the default constructor that thing1 allowed within a value type cannot be guaranteed. Given that absence of guarantee, it was felt to be better to drop the support altogether rather than have it be non-deterministic in its application.]

This is not as bad as it might seem because each object of a value type is zeroed out automatically, so that the members of a local instance are not undefined. This also means that in thing1 a default constructor that simply zeroed out its members was redundant. The problem is that a non-trivial default constructor in a thing1 program has no mechanical mapping to thing2. The code within the constructor will need to be migrated into a named init function that is then explicitly invoked by the user.
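The migration of a non-trivial default constructor can be sketched in ISO C++, which stands in here for the managed syntax. This is only an illustration under stated assumptions: the struct below is a plain native struct rather than a thing2 value type, and the member scale, the function init, and the helper make_initialized are hypothetical names invented for the example.

```cpp
#include <cassert>

// ISO C++ stand-in for a thing2 value type: no user-declared default
// constructor, so value-initialization (V v{}) zero-fills the members.
struct V {
    int i;
    int scale;  // hypothetical second member, for illustration only

    // Hypothetical named initializer holding the logic that a thing1
    // default constructor would have contained.
    void init() {
        i = 42;
        scale = 2;
    }
};

// Builds a V the way a migrated caller would: zeroed first, then init().
inline V make_initialized() {
    V v{};     // members are zero here, mirroring the zeroed value type
    assert(v.i == 0 && v.scale == 0);
    v.init();  // the explicit call replaces the implicit constructor
    return v;
}
```

The burden shifts to every user of V: forgetting the call to init leaves a valid but zeroed object, which is exactly why the mapping cannot be done mechanically.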

The declaration of a value type object within thing2 is otherwise unchanged. [Which means there is still no support for a destructor within a value type. When you couple that with the continued requirement that non-POD native classes be pointer members within the value type, this makes the use of a value type for wrapping non-POD native classes virtually useless.]

Forms (2) and (3) can address nearly anything in this world or the next [that is, anything managed or native]. So, for example, all the following are permitted in thing1:

            R* r;

            pv = &v;            // address a value type on the stack
            pv = __nogc new V;  // address a value type on native heap
            pv = pvgc;          // we are not sure what this addresses
            pv = pvbx;          // address a boxed value type on managed heap
            pv = &r->vr;        // an interior pointer to value type within a
                                // reference type on the managed heap

So, a V* can address a location within an activation record [and therefore can be dangling] or global data segment, within the native heap [and therefore can be undefined], within the managed heap [and therefore will be tracked if it should be relocated by the gc], and within the interior of a reference type object on the managed heap [again, requires tracking].

Forms (2) and (3) map into interior_ptr<V>, although the revised language supports both interior_ptr<V> and V*. The primary behavior difference is that the interior_ptr is a tracking pointer; that is, if the object addressed is on the managed heap and that object is relocated by the gc, the interior_ptr is updated with its new address. A V* is restricted to addressing memory outside the managed heap. It would be an error to attempt to assign a V* the address, for example, of &r->vr, or the address held in pvbx [that is, a __box V*]. An interior_ptr requires a nullptr to indicate a pointer to no object; a V* would require a 0. For example,

V *pv = 0; // may not address within managed heap

interior_ptr<V> pvgc = nullptr;
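Why tracking matters can be modeled in ISO C++. The sketch below is an analogy, not the CLR mechanism: a std::vector whose storage moves when it grows plays the role of the managed heap, and the hypothetical TrackingPtr recomputes the address on each access, whereas a real interior_ptr is updated by the runtime. Heap, TrackingPtr, and relocate are names invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct V { int i; };
struct R { V vr; };

// Toy "managed heap": a vector whose storage moves when it grows,
// standing in for a collector that relocates objects.
using Heap = std::vector<R>;

// Hypothetical tracking pointer: remembers which heap slot it refers to
// and recomputes the address of the interior member on every access.
struct TrackingPtr {
    Heap* heap;
    std::size_t slot;
    V& operator*() const { return (*heap)[slot].vr; }
    V* operator->() const { return &(*heap)[slot].vr; }
};

// Forces the heap storage to be reallocated; returns true if the
// storage address changed. The old address is kept only as an integer,
// so the stale pointer is never dereferenced.
inline bool relocate(Heap& heap) {
    auto before = reinterpret_cast<std::uintptr_t>(heap.data());
    heap.reserve(heap.capacity() * 2 + 1024);
    auto after = reinterpret_cast<std::uintptr_t>(heap.data());
    return before != after;
}
```

A raw V* taken before relocate would still hold the old address afterwards, which is the situation the V* restriction rules out for the managed heap.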

Form (4) is a tracking handle. It addresses the whole object that has been boxed within the managed heap [remember that boxing copies the value type into a reference-type equivalent of the value]. It is translated in the revised language into a V^:

            V^ pvbx = nullptr; // __box V* pvbx = 0;
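The copy semantics of boxing can be modeled in ISO C++. This is a sketch by analogy only: Object, BoxedV, and box are hypothetical stand-ins for the root reference type, the boxed form of V, and the boxing operation, and std::shared_ptr merely approximates a handle to a heap object.

```cpp
#include <memory>
#include <string>

// Stand-in for the root reference type: supplies the virtual ToString.
struct Object {
    virtual ~Object() = default;
    virtual std::string ToString() const { return "Object"; }
};

// Stand-in for the value type: plain storage, no vptr, no ToString.
struct V { int i; };

// Hypothetical box: a heap object that holds a copy of the value and,
// by deriving from Object, gains a vptr and the inherited ToString.
struct BoxedV : Object {
    V value;
    explicit BoxedV(const V& v) : value(v) {}
};

// Models the boxing step: copies v into a newly allocated box.
inline std::shared_ptr<BoxedV> box(const V& v) {
    return std::make_shared<BoxedV>(v);
}
```

Because the box holds a copy, a write through the box leaves the original v untouched, and the inherited virtual call becomes possible only through the box.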

The following declarations in the original language design all map to interior_ptrs in the revised language design, because the types they address are managed value types [the types within the System namespace and a managed enumeration]:

            Int32 *pi;   -> interior_ptr<Int32> pi;
            Boolean *pb; -> interior_ptr<Boolean> pb;
            E *pe;       -> interior_ptr<E> pe; // Enumeration

The built-in types are not considered managed types, although they do serve as aliases to the types within the System namespace. Thus the following mappings hold true between thing1 and thing2:

            int * pi;      -> int * pi;
            int __gc * pi; -> interior_ptr< int > pi;

So, when translating a V* in your existing thing1 program, the most conservative strategy is to always turn it into an interior_ptr<V>. This is how it was treated under the original language. In the revised language, the programmer has the option of restricting a pointer to a value type to addresses outside the managed heap by specifying V* rather than interior_ptr<V>. If, on translating your program, you can do a transitive closure of all its uses and be sure that no assigned address is within the managed heap, then leaving it as V* is fine. All V __gc * should, of course, go to interior_ptr<V>.