Reference vs. Value: Types, Passing and Marshalling

The .NET Programming Model makes a clean distinction between Reference and Value on three different levels. These three levels interact directly, and it is important to understand the distinction. This article may be old hat to many of you, but it is confusing enough on a daily basis over here that I decided to write it all down so we’d have a place to look to keep it all straight. And I figured if we needed to keep it straight, then you might too. The main reason for writing this is getting to the Marshalling bit, but the other pieces are necessary background.

Understanding how different types Marshal (if they Marshal at all) is essential to getting Addin Models correct. Even if you are sure you understand the first two bits, I encourage you to skim over them and go to the end, because an understanding of all of this is critical to understanding my next article, coming in a week or so: Implementing Contracts: Proxies and Adapters.


Reference Types vs. Value Types

This is the first level, and probably the most well understood. In .NET all types are reference types by default unless they derive from System.ValueType. Of course, the various languages don't allow you to derive directly from System.ValueType; instead, there is some syntax to indicate that a type is a value or reference type. In C#, "class" is reference and "struct" is value. In VB .NET they are "Class" and, now, "Structure" (the keyword "Type" used to be used for struct-like things in VB, but Type means something else now…).

But what does that mean: what is the difference between a class and a struct? Those of you who know C++ (traditional, not managed) will remember that in that language the only difference between class and struct was whether members were public by default (struct, yes; class, no). Both classes and structs were really value types: it was not the type that determined "reference" but how the memory was allocated. If the object was declared on the stack, it was a value instance; if it was allocated on the heap, the variable held a pointer, a reference to the value.

In .NET, of course, the developer is freed from worrying about memory allocations. But the concept of "value" and "reference" live on in the types themselves. The differences end up being similar to C++ conceptually. There are three main differences in value types and reference types: how they are constructed/initialized, how assignment works, and what happens when you pass them into other functions. BTW, from now on I’m going to use C# in examples because I am more familiar with it.

First, let’s declare two simple types to use as examples:

public struct ValType
{
    public int member;
}

public class RefType
{
    private int member;

    public int Member
    {
        get { return member; }
        set { member = value; }
    }
}

Now, let's see how these things are constructed and initialized. To use one of these things, you must declare a variable of the type and then initialize it. This part is deliberately the same in C#: the same set of steps, even though what happens underneath is different.

public static void UseTypes()
{
    ValType vt = new ValType();
    RefType rt = new RefType();
}

Both types come with default parameterless constructors. But they do two different things. For ValType, you can think of the variable vt as the object itself (allocated "on the stack" conceptually), but in the case of the RefType, the variable is *not* the object, but a reference to the object (allocated "on the heap" conceptually).

The default constructor of a Value Type does not instantiate the object: it is already there. It simply initializes all of the fields to their default values, so "constructors" for value types are really more like initialization functions, or "initializers." You cannot use an instance of a value type before it is initialized, but you don't have to call a constructor to do it. I could have written the code above as follows, and it would be just as valid; assigning its only field is enough to make vt usable:

public static void UseTypes()
{
    ValType vt;
    vt.member = 5;

    RefType rt = new RefType();
}

"Constructors" for value types are useful, though, if there are many fields that need initialization – one stop shopping. It is illegal to directly initialize members in a struct declaration like this:

public struct ValType
{
    public int member = 0;   // compile error: a struct cannot initialize its fields here
}

Note that it is also illegal to declare a parameterless constructor in a struct (C# 10 later relaxed this rule). If you want to initialize the fields to other than default values, you either assign them directly, outside of the declaration, or declare a parameterized constructor and do the initialization there. If you make the fields private, with property accessors, you can no longer initialize the instance field by field, because calling a property is "using" the instance (it calls the underlying set_ method), and you cannot use the instance until it is initialized. You then need either a parameterized constructor, or a call to the default constructor followed by the property setters. Make sense?
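Here is a minimal sketch of the private-field case. PrivateValType and StructInitDemo are hypothetical names made up for illustration; the sketch only shows the two legal ways to give such a struct a non-default value, with the illegal one left as a comment.

```csharp
// Hypothetical struct for illustration: the field is private, so it can only
// be reached through the Member property.
public struct PrivateValType
{
    private int member;

    // Parameterized constructor: assigns the field directly.
    // (Older C# versions require every field to be assigned here.)
    public PrivateValType(int initial)
    {
        member = initial;
    }

    public int Member
    {
        get { return member; }
        set { member = value; }
    }
}

public static class StructInitDemo
{
    public static int ViaConstructor()
    {
        PrivateValType v = new PrivateValType(5);
        return v.Member;
    }

    public static int ViaDefaultThenSetter()
    {
        // new PrivateValType() zeroes the field, so v counts as initialized
        // and calling the property setter afterwards is allowed.
        PrivateValType v = new PrivateValType();
        v.Member = 7;
        return v.Member;
    }

    // This would NOT compile (use of unassigned local variable):
    //     PrivateValType v;
    //     v.Member = 7;
}
```

ViaConstructor returns 5 and ViaDefaultThenSetter returns 7; the commented-out version fails because the setter call "uses" v before anything has been assigned to it.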

With Reference Types, conversely, the constructor actually constructs the instance of the object: the memory is allocated, the constructor code is run, and a reference to that new memory is assigned back to the variable. So you cannot use a reference type without calling a constructor, as above. The compiler rejects a local variable that was never assigned, and a variable that holds null (for example, a field that was never set) throws a NullReferenceException when you use it. Newly allocated memory starts out zeroed, so the default parameterless constructor effectively leaves every member at its default value (zero or null). But in reference types you *are* allowed to declare and implement a parameterless constructor and do whatever you want in it. Reference types also allow member initializers like this:

public class RefType
{
    private int member = 5;

    public int Member
    {
        get { return member; }
        set { member = value; }
    }
}

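The field initializer runs as part of constructing each new instance, before the body of the constructor, so "member" is already 5 by the time any constructor code you write sees it.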
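A small sketch of that ordering, using a hypothetical class InitOrder: the parameterless constructor records what it sees on entry and then changes the field, so you can observe both the initializer and the constructor body at work.

```csharp
// Hypothetical class for illustration.
public class InitOrder
{
    // Field initializer: runs each time an instance is constructed,
    // before the body of the constructor below.
    private int member = 5;

    // Records the value the constructor body saw when it started.
    public int SeenByConstructor { get; private set; }

    // A class may declare its own parameterless constructor
    // and do whatever it wants in it.
    public InitOrder()
    {
        SeenByConstructor = member;   // already 5 here
        member = member * 2;          // the body can still change it
    }

    public int Member
    {
        get { return member; }
    }
}
```

For `new InitOrder()`, SeenByConstructor is 5 and Member is 10.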

Assignment, the = operator, is different for Reference types and Value types as well. We have seen the operator used above for initialization; assigning other instances is an extension of the same thing:

public void ReassignTypes()
{
    ValType vt1 = new ValType();
    RefType rt1 = new RefType();

    ValType vt2 = new ValType();
    RefType rt2 = new RefType();

    Console.WriteLine("vt1.member = " + vt1.member);
    Console.WriteLine("ReferenceEquals(vt1, vt2) ? " + ReferenceEquals(vt1, vt2));

    vt2.member = 5;

    vt1 = vt2;

    Console.WriteLine("vt1.member = " + vt1.member);
    Console.WriteLine("ReferenceEquals(vt1, vt2) ? " + ReferenceEquals(vt1, vt2));

    Console.WriteLine("rt1.Member = " + rt1.Member);
    Console.WriteLine("ReferenceEquals(rt1, rt2) ? " + ReferenceEquals(rt1, rt2));

    rt2.Member = 5;

    rt1 = rt2;

    Console.WriteLine("rt1.Member = " + rt1.Member);
    Console.WriteLine("ReferenceEquals(rt1, rt2) ? " + ReferenceEquals(rt1, rt2));
}

This little program’s output is as follows:

vt1.member = 0
ReferenceEquals(vt1, vt2) ? False
vt1.member = 5
ReferenceEquals(vt1, vt2) ? False
rt1.Member = 0
ReferenceEquals(rt1, rt2) ? False
rt1.Member = 5
ReferenceEquals(rt1, rt2) ? True

ReferenceEquals is the way we determine identity in .NET. In native programming, identity was determined by the physical address of the memory. But in .NET memory is managed by the garbage collector, and objects can be moved about, so addresses are not constant for a particular object. We use ReferenceEquals to tell us if two objects have the same identity. So what does this output show us? For vt1 and vt2, the assignment copies the value from one instance to the other: vt1.member becomes 5, but vt1 and vt2 remain separate instances. (Strictly speaking, ReferenceEquals always returns False for value types: each argument is boxed into its own new object, a process described in the next paragraph, so even ReferenceEquals(vt1, vt1) is False.) For rt1 and rt2, the assignment changes the reference held in rt1 to point at the other chunk of memory, so after the assignment both variables reference the same instance.
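To make the identity point concrete, here is a minimal sketch using the article's ValType and RefType plus a hypothetical IdentityDemo class. For value types, Equals (which ValueType implements as a field-by-field comparison) is the meaningful question to ask; ReferenceEquals only makes sense for reference types.

```csharp
public struct ValType
{
    public int member;
}

public class RefType
{
    private int member;

    public int Member
    {
        get { return member; }
        set { member = value; }
    }
}

public static class IdentityDemo
{
    // Each argument is boxed into its own new object before the comparison,
    // so this is false even though both arguments are the same variable.
    public static bool StructSameVariable()
    {
        ValType vt = new ValType();
        return object.ReferenceEquals(vt, vt);
    }

    // ValueType.Equals compares the fields of the two instances.
    public static bool StructSameContents()
    {
        ValType vt1 = new ValType();
        ValType vt2 = new ValType();
        vt2.member = 5;
        vt1 = vt2;                       // copies the value
        return vt1.Equals(vt2);
    }

    // After the assignment both variables refer to one object.
    public static bool ClassAfterAssignment()
    {
        RefType rt1 = new RefType();
        RefType rt2 = new RefType();
        rt1 = rt2;
        return object.ReferenceEquals(rt1, rt2);
    }
}
```

StructSameVariable returns false, while StructSameContents and ClassAfterAssignment both return true.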

We can't have a complete discussion of Value Types without mentioning boxing and unboxing (see the MSDN article on boxing and unboxing for further information). In short, boxing is what happens when you assign a value type to a variable of type System.Object. System.Object is a reference type, but it is also the base class of System.ValueType, so the assignment is allowed, and it is handled as a special case: a new object is created on the heap, the value is copied into it, and a reference to that object is assigned to the variable. Unboxing goes the other way: when you cast the object back to the value type (an explicit cast is required) and assign it to a variable of that type, the value is copied from the referenced object back into the local variable.
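A minimal sketch of the copy semantics, with a hypothetical BoxingDemo class: once the value is boxed, changing the local variable no longer affects the boxed copy.

```csharp
public struct ValType
{
    public int member;
}

public static class BoxingDemo
{
    // Returns { local value after the change, value recovered from the box }.
    public static int[] BoxThenChangeLocal()
    {
        ValType vt = new ValType();
        vt.member = 5;

        // Boxing: a new heap object is created and vt's value is copied into it.
        object boxed = vt;

        // Changing the local variable does not touch the boxed copy.
        vt.member = 1111;

        // Unboxing requires an explicit cast and copies the value back out.
        ValType unboxed = (ValType)boxed;

        return new int[] { vt.member, unboxed.member };
    }
}
```

BoxThenChangeLocal returns { 1111, 5 }: the box captured the value as it was at the moment of the assignment.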

The next difference between Reference Types and Value types opens up the next level. Reference Types and Value Types behave differently when they are passed into a function. But there are also two forms of passing arguments: Passing by Value and Passing by Reference. Both forms apply to both kinds of types.


Passing by Value vs. Passing by Reference

Both Value Types and Reference Types can be passed either by Value or by Reference. This is starting to get confusing already. By default – in other words, if you do nothing special – what you are doing is passing by value.

So let’s look at that first. When you pass a Value type by value, you are literally passing the value of the type. The value is copied into the argument. If the value of a member is altered within the call, it is not reflected in the passed in value. This is demonstrated here:

public static void PassValueType()
{
    ValType vt;
    vt.member = 5;

    Console.WriteLine("vt.member = " + vt.member);

    PassValueTypeByValue(vt);

    Console.WriteLine("vt.member = " + vt.member);
}

public static void PassValueTypeByValue(ValType vt)
{
    Console.WriteLine("vt.member = " + vt.member);

    vt.member = 1111;

    Console.WriteLine("vt.member = " + vt.member);
}

The output for this is as follows:

vt.member = 5
vt.member = 5
vt.member = 1111
vt.member = 5

So you can see vt.member is changed within the call, but the original passed-in vt is not changed. Value types passed by value are copied into their destination.

Reassigning the variable in the call produces the exact same result, because as we have seen, assignments to value types actually just copy the value from one instance to the other. So this code produces the exact same output:

public static void PassValueTypeByValue(ValType vt)
{
    Console.WriteLine("vt.member = " + vt.member);

    ValType vtx = new ValType();
    vtx.member = 1111;
    vt = vtx;

    Console.WriteLine("vt.member = " + vt.member);
}

Why is this important? Because it is different than with reference types.

With Reference types, it is the reference that is passed by value rather than the value. So if I pass a reference type by value and change its state by altering one of its members, the change *is* reflected on the other side of the call.

public static void PassReferenceType()
{
    RefType rt = new RefType();
    rt.Member = 5;

    Console.WriteLine("rt.Member = " + rt.Member);

    PassReferenceTypeByValue(rt);

    Console.WriteLine("rt.Member = " + rt.Member);
}

public static void PassReferenceTypeByValue(RefType rt)
{
    Console.WriteLine("rt.Member = " + rt.Member);

    rt.Member = 1111;

    Console.WriteLine("rt.Member = " + rt.Member);
}

The output of the above code, then, is this:

rt.Member = 5
rt.Member = 5
rt.Member = 1111
rt.Member = 1111

But, if I change the code to do a reassignment, I am actually trying to change the reference in the call, but the reference was passed by value, so the change is *not* reflected on the other side. In other words this code:

public static void PassReferenceTypeByValue(RefType rt)
{
    Console.WriteLine("rt.Member = " + rt.Member);

    RefType rtx = new RefType();
    rtx.Member = 1111;

    rt = rtx;

    Console.WriteLine("rt.Member = " + rt.Member);
}

Produces this result:

rt.Member = 5
rt.Member = 5
rt.Member = 1111
rt.Member = 5

This all makes sense if you think about it. Passing a value type by value passes the value; passing a reference type by value passes the reference. So for value types you can't alter the caller's value, and for reference types you can't alter the caller's reference, but you can alter the state of the object it refers to. Reference types are one level of indirection removed from value types.
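The four by-value cases can be condensed into one sketch. ByValueSummary and its helper methods are hypothetical names; each helper either mutates or reassigns its parameter, and Run reports what the caller sees afterwards.

```csharp
public struct ValType
{
    public int member;
}

public class RefType
{
    private int member;

    public int Member
    {
        get { return member; }
        set { member = value; }
    }
}

public static class ByValueSummary
{
    // All four parameters are passed by value (no ref/out).
    static void MutateValue(ValType vt)   { vt.member = 1111; }                   // changes the copy
    static void ReassignValue(ValType vt) { vt = new ValType { member = 1111 }; } // changes the copy
    static void MutateRef(RefType rt)     { rt.Member = 1111; }                   // changes the shared object
    static void ReassignRef(RefType rt)   { rt = new RefType { Member = 1111 }; } // changes the copied reference

    // Returns what the caller sees after each call, in the order above.
    public static int[] Run()
    {
        ValType v1 = new ValType { member = 5 };
        MutateValue(v1);

        ValType v2 = new ValType { member = 5 };
        ReassignValue(v2);

        RefType r1 = new RefType { Member = 5 };
        MutateRef(r1);

        RefType r2 = new RefType { Member = 5 };
        ReassignRef(r2);

        return new int[] { v1.member, v2.member, r1.Member, r2.Member };
    }
}
```

Run returns { 5, 5, 1111, 5 }: only mutating through a reference type is visible to the caller.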

The .NET Programming model provides us with one more level of indirection for both Value Types and Reference Types, and that is, predictably enough, Passing by Reference. If you pass a Value type by reference, you are allowed to change the value, and if you pass a reference type by reference you are allowed to change the reference. So let’s consider the above examples by reference instead.

In C# the "ref" keyword indicates passing by reference. Both the parameter of the function and the argument passed to the function have to be decorated with "ref" for this to work. In other words the signature must specify passing by reference, and the code that passes the argument must match.

So passing a value type by reference looks like this:

public static void PassValueType()
{
    ValType vt;
    vt.member = 5;

    Console.WriteLine("vt.member = " + vt.member);

    PassValueTypeByReference(ref vt);

    Console.WriteLine("vt.member = " + vt.member);
}

public static void PassValueTypeByReference(ref ValType vt)
{
    Console.WriteLine("vt.member = " + vt.member);

    vt.member = 1111;

    Console.WriteLine("vt.member = " + vt.member);
}

Predictably, the output of passing a value type by reference looks the same as passing a reference type by value – we have elevated the value type to the same level of indirection that a reference type is by default.

vt.member = 5
vt.member = 5
vt.member = 1111
vt.member = 1111

But, as in the value type example above, reassigning the instance inside the call produces this exact same output. It is not the same as changing a reference, because it is still a value type that is being assigned to, and that means copying the value, not a reference; the difference is that this time the value is copied into the caller's variable. So this code produces the same output as above:

public static void PassValueTypeByReference(ref ValType vt)
{
    Console.WriteLine("vt.member = " + vt.member);

    ValType vtx = new ValType();
    vtx.member = 1111;

    vt = vtx;

    Console.WriteLine("vt.member = " + vt.member);
}

Next, when passing Reference types by reference, you can still alter the state of the object, just as when passing them by value. Again, we get the same output as above for this code:

public static void PassReferenceType()
{
    RefType rt = new RefType();
    rt.Member = 5;

    Console.WriteLine("rt.Member = " + rt.Member);

    PassReferenceTypeByReference(ref rt);

    Console.WriteLine("rt.Member = " + rt.Member);
}

public static void PassReferenceTypeByReference(ref RefType rt)
{
    Console.WriteLine("rt.Member = " + rt.Member);

    rt.Member = 1111;

    Console.WriteLine("rt.Member = " + rt.Member);
}

This is because it is really doing the same thing as we did when passing by value: it changes the state of the object the reference points to. But now we can also change the code to reassign the reference itself, and still get the same output:

public static void PassReferenceTypeByReference(ref RefType rt)
{
    Console.WriteLine("rt.Member = " + rt.Member);

    RefType rtx = new RefType();
    rtx.Member = 1111;

    rt = rtx;

    Console.WriteLine("rt.Member = " + rt.Member);
}

So the output is now exactly the same for all four examples of passing by reference, but for different reasons. With value types passed by reference, you can alter the state of the instance and reassign the instance, but because it is a value type, a reassignment copies the value into the caller's variable; there is no reference to reassign. With reference types, you can still alter the state of the referenced object, and now you can reassign the caller's reference as well.
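A classic way to see what "ref" buys you is a swap. The sketch below uses a hypothetical generic helper, RefSwap.Swap, that works the same way for value types and reference types because it reassigns the caller's variables themselves.

```csharp
public static class RefSwap
{
    // Both arguments are passed by reference, so assigning a and b
    // writes directly into the caller's variables.
    public static void Swap<T>(ref T a, ref T b)
    {
        T temp = a;
        a = b;
        b = temp;
    }
}
```

With `int x = 1, y = 2;` the call `RefSwap.Swap(ref x, ref y)` leaves x == 2 and y == 1; with two string (reference type) variables the references themselves trade places. Without "ref", the method would only swap its own copies and the caller would see no change.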

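C# also provides another keyword that indicates passing by reference: the "out" keyword. The difference is that, whether you pass a value or a reference type, the called function must assign the parameter before it returns, and it may not read the parameter before assigning it. The argument may or may not have been initialized outside the call; with "out" it does not matter. With "ref" calls, by contrast, the variable must be initialized before calling the function. So none of the four pass-by-reference examples above compile as written with "out", because each one starts by reading the parameter in Console.WriteLine. Drop that first line and the reassignment versions work. For the value type, assigning its only field also counts as initializing it, but for the reference type, setting rt.Member reads the unassigned reference and is a compile error. I'll leave it as an exercise for the reader to play with "out" value and reference type parameters.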
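As a starting point for that exercise, here is a minimal sketch. OutDemo and its methods are hypothetical names; the commented-out lines show what the compiler rejects.

```csharp
public struct ValType
{
    public int member;
}

public class RefType
{
    private int member;

    public int Member
    {
        get { return member; }
        set { member = value; }
    }
}

public static class OutDemo
{
    public static void CreateValueType(out ValType vt)
    {
        // Console.WriteLine(vt.member);   // error: use of unassigned out parameter
        vt.member = 1111;                   // assigning every field initializes vt
    }

    public static void CreateReferenceType(out RefType rt)
    {
        // rt.Member = 1111;                // error: rt has not been assigned yet
        rt = new RefType();                 // must assign the reference first
        rt.Member = 1111;
    }

    public static int[] Run()
    {
        ValType vt;      // no initialization needed before an out call
        RefType rt;
        CreateValueType(out vt);
        CreateReferenceType(out rt);
        return new int[] { vt.member, rt.Member };
    }
}
```

Run returns { 1111, 1111 }. Notice that neither local needs to be initialized before the call, which would be a compile error with "ref".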


Marshalling by Value vs. Marshalling by Reference

Finally we get to the good part. Marshalling is important to the Managed Addin Framework (MAF) because we assume addins are isolated from each other and the host across a remoting boundary. They don’t have to be, but in order for them to be allowed to be, the system must assume the boundary is there. This places restrictions on the model, and these restrictions have to do with Marshalling.

Marshalling is what happens when you pass an object across a remoting boundary. For the addin model, we usually use separate AppDomains as the unit of isolation. We can also use separate processes. Marshalling between either AppDomains or processes is roughly the same, and those are what are considered here. Marshalling cross-machine using Web Services is a separate subject.

By default neither Value Types nor Reference Types can be Marshalled at all. In other words, neither ValType nor RefType as declared in the examples above could be passed across a remoting boundary. If you tried, you’d get a "SerializationException." I’ll explain what that means in a minute. This is done on purpose, of course. We want you to be explicit and know what you are doing when you Marshal something.

So how do you "try" to Marshal an object? You pass it into a function across a remoting boundary, or return it from a function through the boundary. I'm not going to go into all of the details of .NET remoting in this article; RealProxy, TransparentProxy and the rest of the infrastructure are covered in the .NET remoting documentation. What I am going to cover is what it means to make a type Marshal by Value or Marshal by Reference.

One interesting thing here is that it is not possible to marshal a value type by reference. The best you can do is make a value type marshal by value. This is a good thing: it keeps the model symmetrical. Reference types, on the other hand, *can* be made to marshal either by value or by reference. There is a good reason for this, and it has to do with how far the object is being marshalled. For Addin models, where objects are only expected to go as far as another AppDomain or process, the best thing to do is to make your reference types marshal by reference, again to keep the model symmetrical. Unfortunately, there are cases when this is out of our control.

So let's look at Marshal By Value first. What does this mean? It is similar to passing by value: to Marshal by Value we copy the value of the object from one side of the boundary to a new instance on the other side. The way this is done is through a process called "serialization." I'm not going to go into too much detail on the process of serialization here; it is a book in itself. See the .NET serialization documentation for more information.

A Marshal by Value object, then, is also said to be "Serializable." There are two ways to mark a type as Serializable. The simplest is to add the SerializableAttribute to its declaration:

[Serializable]
public struct ValType
{
    public int member;
}

When it is applied, all fields, regardless of visibility, are serialized (copied), unless specifically excluded by applying the NonSerializedAttribute. Of course this attribute should be used with care: excluding fields could render the copy on the other side invalid.
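To illustrate what "all fields, regardless of visibility, unless excluded" means, here is a deliberately simplified sketch. ByValueCopy.CopyFields and the Settings class are hypothetical; this is not how the .NET formatters are implemented. It is a shallow, reflection-based copy that reads the same metadata flags that [Serializable] and [NonSerialized] compile to, whereas a real serializer also recurses through the transitive closure of field types.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

// Hypothetical type: one public field, one private field, one excluded field.
[Serializable]
public class Settings
{
    public int Width;
    private string name;
    [NonSerialized] private object cache;

    public Settings() { }

    public Settings(int width, string name, object cache)
    {
        Width = width;
        this.name = name;
        this.cache = cache;
    }

    public string Name { get { return name; } }
    public object Cache { get { return cache; } }
}

public static class ByValueCopy
{
    // Hypothetical helper, NOT the .NET formatter: a shallow copy that takes every
    // instance field declared on the type (public or private) and skips
    // [NonSerialized] fields. It does not walk base classes or recurse.
    public static object CopyFields(object source)
    {
        Type type = source.GetType();
        if ((type.Attributes & TypeAttributes.Serializable) == 0)
            throw new SerializationException(type.Name + " is not marked [Serializable]");

        // This sketch assumes a public parameterless constructor.
        object copy = Activator.CreateInstance(type);

        const BindingFlags all = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
        foreach (FieldInfo field in type.GetFields(all))
        {
            if ((field.Attributes & FieldAttributes.NotSerialized) != 0)
                continue;   // excluded: the copy keeps the default value
            field.SetValue(copy, field.GetValue(source));
        }
        return copy;
    }
}
```

Copying `new Settings(3, "main", new object())` yields a distinct object with Width 3 and Name "main", but a null Cache: exactly the kind of gap that can make a copy invalid on the other side if the excluded field mattered.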

What this means is that the transitive closure of all types held in the fields of a serializable type (and their fields, on down) must be Marshallable, either by value or by reference. If it is not, a SerializationException results, meaning serialization failed and your program can't work. It is this transitive closure requirement that made us define the MAF contract layer as a closed system. More on that as we go along.

Versioning is an issue with serialized types, of course. If you try to deserialize v1 of a type into an instance of v2 of that type (or vice versa), there could be new or missing members, and serialization may fail. Work was done in Whidbey (.NET Framework 2.0) to make serialization more version resilient. But with regard to MAF this versioning issue is moot: for this very reason, the only things allowed to cross a remoting boundary in MAF must be declared in the contract layer, and contract assemblies do not version. Value types declared in contract assemblies *must* be serializable to work, but they can never version, so this restriction, along with the transitive closure restriction, means that serialization won't fail.

The other way to mark a type as Serializable is to implement the System.Runtime.Serialization.ISerializable interface. This gives you fine-grained control over the serialization process, and the serialization documentation covers it in detail, too. Implementing ISerializable does not replace the SerializableAttribute: you still need to apply the attribute, which also makes it clear that this is a serializable type. ISerializable gives you a way to work around the versioning issue, but again, for MAF you don't need it. In fact, for MAF the contract-level serializable value types should be simple compositions of primitive types, other contract-level serializable value types, or contracts, and that's it. They should contain no implementation. For this reason, implementing ISerializable should be unnecessary. See my previous entry Contracts and IContract for more information.

Marshal by Reference is, as you would expect, the ability to Marshal a reference to an object across the remoting boundary. The object is not copied; instead, a Proxy is created in the new domain that refers back to the original object. Calls into the proxy cross the boundary and are made on the original instance. Again, see the .NET remoting documentation for more information on how proxies are created and used.

The way you mark a type as Marshal by Reference is completely different from Marshal by Value. To make an object Marshal by Reference you must derive it from System.MarshalByRefObject. Obviously this imposes a strict requirement on your object hierarchy: you have to know you are doing it. This is the reason value types cannot marshal by reference: you can't change their derivation chain. So only reference types can marshal by reference.

Of course transitive closure comes into play here too. It is not as restrictive as in the serialization case, because we don't have to worry about private members. But the transitive closure of all types exposed in the public API of the Marshal by Reference type must also be marshallable, again either by reference or by value. Again, it is the transitive closure requirement here that was a main driving force behind MAF.

But as noted, reference types *can* marshal by value. You can just as easily apply the SerializableAttribute to a reference type as to a value type. In fact, you can apply the SerializableAttribute to a reference type that derives from System.MarshalByRefObject. I'll cover this last case in a minute; there really is a valid reason for doing this, and some types in the .NET Framework are actually implemented this way.

I said above that for addin models you should make reference types marshal by reference to keep the model symmetrical. This is important. If you think about it, a reference type that marshals by *value* exhibits two different behaviors depending on whether it is passed to a function within an AppDomain or passed to a function across a remoting boundary. Within an AppDomain the reference type is still a reference type: I can change its state, and that state is reflected in the original object after the call. But if it is passed across a boundary to me, I only get a copy of the object, and if I alter its state I do not alter the original object. It has in effect become a value type once it is marshalled. As the recipient of the object, I can't know for sure which case it is (have I received a copy or not?); it just looks like a local reference type to me. So breaking the symmetry is bad for addin models.

Of course with MAF the only types allowed to cross a boundary are defined within the contract layer. And the "reference types" of MAF are contracts. We have seen that contracts are actually just interfaces, they have no implementation, at the contract level. The transitive closure of types exposed by contracts must be other contracts, serializable value types defined within the contract layer or primitive types, meeting the restriction.

But of course, Contracts must be implemented somewhere. So, you can tell from this that Contracts must be implemented on reference types that derive from MarshalByRefObject. The actual type of the class that implements the contract is not visible across the domain (and really should have no other public surface besides the contract), only the contract is visible, but nonetheless the class must derive from MarshalByRefObject in order to Marshal by Reference.
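Here is a minimal sketch of that shape. All names (ICalculatorContract, Point2D, CalculatorContractImpl, CalculatorFactory) are hypothetical, and the sketch leaves out IContract, which real MAF contracts build on as described in my earlier entry. The contract is a plain interface, the contract-level value type is a serializable composition of primitives, and the implementing class derives from MarshalByRefObject but exposes nothing publicly beyond the contract.

```csharp
using System;

// Hypothetical contract: just an interface, no implementation.
public interface ICalculatorContract
{
    double Add(double a, double b);
    Point2D Scale(Point2D p, double factor);
}

// Hypothetical contract-level value type: a serializable composition of
// primitive types with no behavior. It marshals by value.
[Serializable]
public struct Point2D
{
    public double X;
    public double Y;
}

// Implementation side: derives from MarshalByRefObject so that it marshals
// by reference. It is internal, so callers only ever see the contract.
internal sealed class CalculatorContractImpl : MarshalByRefObject, ICalculatorContract
{
    public double Add(double a, double b)
    {
        return a + b;
    }

    public Point2D Scale(Point2D p, double factor)
    {
        // p is a copy, so changing it does not affect the caller's value.
        p.X *= factor;
        p.Y *= factor;
        return p;
    }
}

public static class CalculatorFactory
{
    // Hands out the implementation typed only as the contract.
    public static ICalculatorContract Create()
    {
        return new CalculatorContractImpl();
    }
}
```

Within a single AppDomain this behaves like any other object; the MarshalByRefObject base only starts to matter when the instance is handed across a boundary, at which point the other side receives a proxy rather than a copy.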

To close out this section, let's consider why one would make a reference type serializable, even a MarshalByRef type. The reason, as alluded to above, has to do with Web Services. One cannot marshal a reference cross-machine; objects must be stateless and serializable. It is not always practical to make every type one wants to send cross-machine a value type; one might want a type hierarchy, for instance, which value types cannot provide. A Serializable MarshalByRef type is a special case of this. Here the type author wants the type to marshal by reference when possible (MarshalByRef always wins when crossing AppDomain and process boundaries) but to serialize when going cross-machine. Of course, such an object must be implemented with the utmost care. Another reason to mark a reference type as Serializable is to allow its state to be persisted with standard serialization; again, see the serialization documentation for more information. None of this pertains to MAF or addins, though.
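The rules in this section can be summarized as a small decision function. MarshalRules.Classify and MarshalKind are hypothetical and are not a .NET API; the sketch simply encodes what the article describes for AppDomain and process boundaries, applied to the runtime type of an object being passed: MarshalByRefObject wins, otherwise [Serializable] means copy, otherwise the call fails with a SerializationException.

```csharp
using System;
using System.Reflection;

// Hypothetical classification, not a .NET API.
public enum MarshalKind
{
    NotMarshallable,   // passing it across would fail with a SerializationException
    ByValue,           // [Serializable]: copied to the other side
    ByReference        // MarshalByRefObject: the other side gets a proxy
}

public static class MarshalRules
{
    public static MarshalKind Classify(Type runtimeType)
    {
        // Deriving from MarshalByRefObject wins, even if [Serializable] is also applied.
        if (typeof(MarshalByRefObject).IsAssignableFrom(runtimeType))
            return MarshalKind.ByReference;

        // [Serializable] compiles to this metadata flag.
        if ((runtimeType.Attributes & TypeAttributes.Serializable) != 0)
            return MarshalKind.ByValue;

        return MarshalKind.NotMarshallable;
    }
}

// Hypothetical sample types covering each case.
public struct PlainValue { public int member; }
public class PlainRef { public int Member; }
[Serializable] public struct SerializableValue { public int member; }
[Serializable] public class SerializableRef { public int Member; }
public class PlainMbr : MarshalByRefObject { }
[Serializable] public class SerializableMbr : MarshalByRefObject { }
```

PlainValue and PlainRef classify as NotMarshallable, SerializableValue and SerializableRef as ByValue, and both PlainMbr and SerializableMbr as ByReference.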

As noted at the beginning of this very long article, my next article will be about implementing contracts, with a real example of how this stuff works. All of this value/reference material is a prerequisite; we're going to put it into play.