Type Fundamentals
Jeffrey Richter
In the October issue, I introduced many of the fundamental concepts related to types in the Microsoft® .NET common language runtime. In particular, I discussed how all types are derived from the System.Object type, and showed the various mechanisms (for example, C# operators) that a programmer can use to cast from one type to another. Finally, I mentioned how namespaces are used by compilers and how they are ignored by the common language runtime.

Primitive Types

Certain data types are used so commonly that many compilers allow your code to manipulate them using simplified syntax. For example, you could allocate an integer using the following syntax in C#:
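A minimal sketch of that verbose form. It assumes a current C# compiler, where System.Int32 exposes only a parameterless constructor, so the value is assigned in a second step; the VerboseInt class and its Make method are hypothetical names used for illustration.

```csharp
using System;

public static class VerboseInt
{
    // Hypothetical helper: builds an integer the long way.
    public static System.Int32 Make()
    {
        // Explicitly "construct" the value; a new Int32 starts out as 0.
        System.Int32 a = new System.Int32();
        a = 5;
        return a;
    }

    public static void Main()
    {
        Console.WriteLine(Make()); // prints 5
    }
}
```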
But I'm sure you'll agree that declaring and initializing an integer using this syntax is rather cumbersome. Fortunately, many compilers (including the C# compiler) allow you to use syntax similar to the following instead:
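A minimal sketch of the simplified form, using the C# int keyword and a literal; SimpleInt and Make are hypothetical names used for illustration.

```csharp
using System;

public static class SimpleInt
{
    // Hypothetical helper: the keyword and a literal do all the work.
    public static int Make()
    {
        int a = 5;
        return a;
    }

    public static void Main()
    {
        Console.WriteLine(Make()); // prints 5
    }
}
```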
This certainly makes the code more readable. And, of course, the intermediate language (IL) that is generated when using either syntax is identical.

Any data types directly supported by the compiler are called primitive types. Primitive types map directly to types that exist in the base class library. For example, in C# an int maps directly to the System.Int32 type. Because of this, the following two lines of code are identical to the two lines of code shown previously:
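As a hedged sketch of that interchangeability, the keyword can be used with constructor-style syntax and the library name with literal syntax. The example assumes a current C# compiler (so new int() yields zero rather than taking an argument); AliasDemo and SameType are hypothetical names.

```csharp
using System;

public static class AliasDemo
{
    // True when the keyword and the library name denote the same type.
    public static bool SameType()
    {
        int a = new int();      // keyword with constructor-style syntax; a is 0
        System.Int32 b = 5;     // library type name with literal syntax
        return a.GetType() == b.GetType()
            && typeof(int) == typeof(System.Int32);
    }

    public static void Main()
    {
        Console.WriteLine(SameType()); // prints True
    }
}
```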
Figure 1 shows the base class library types that have corresponding primitives in C# (other languages will offer similar primitive types).
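The mapping can also be read back from the runtime itself. This is a minimal sketch, assuming the keyword set of a current C# compiler (which may not match Figure 1 exactly); PrimitiveMap and Names are hypothetical names.

```csharp
using System;

public static class PrimitiveMap
{
    // Returns the base class library type name behind each C# keyword listed.
    public static string[] Names()
    {
        Type[] types = {
            typeof(sbyte), typeof(byte), typeof(short), typeof(ushort),
            typeof(int), typeof(uint), typeof(long), typeof(ulong),
            typeof(char), typeof(float), typeof(double), typeof(bool),
            typeof(decimal), typeof(object), typeof(string)
        };
        string[] names = new string[types.Length];
        for (int i = 0; i < types.Length; i++)
            names[i] = types[i].FullName;
        return names;
    }

    public static void Main()
    {
        // One library type name per line, in the order listed above.
        foreach (string name in Names())
            Console.WriteLine(name);
    }
}
```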
Reference and Value Types

When an object is allocated from the managed heap, the new operator returns the memory address of the object. You usually store this address in a variable. This is called a reference type variable because the variable does not actually contain the object's bits; instead, the variable refers to the object's bits.

There are some performance issues to consider when working with reference types. First, the memory must be allocated from the managed heap, which could force a garbage collection to occur. Second, reference types are always accessed via their pointers. So every time your code references any member of an object on the heap, code must be generated and executed to dereference the pointer in order to perform the desired action. This adversely affects both size and speed.

In addition to reference types, the virtual object system supports lightweight types called value types. Value type objects, in their ordinary (unboxed) form, are not allocated as separate objects on the garbage-collected heap, and the variable representing the object does not contain a pointer to an object; the variable contains the object itself. Since the variable contains the object, a pointer does not have to be dereferenced in order to manipulate the object. This, of course, improves performance.

The code in Figure 2 demonstrates how reference types and value types differ. In Figure 2, the Rectangle type is declared using struct instead of the more common class. In C#, a type declared using struct is a value type, while types declared using class are reference types. Other languages may have different syntax for describing value types versus reference types. For example, C++ uses the __value modifier.

Recall the following line of code discussed in the section on primitive types:
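A sketch of that line, assuming a current C# compiler (parameterless constructor, then assignment). The IsValueType check is an addition for illustration, and RecallDemo is a hypothetical name.

```csharp
using System;

public static class RecallDemo
{
    public static bool Int32IsValueType()
    {
        // a is a local value, not a reference to a separate heap object.
        System.Int32 a = new System.Int32();
        a = 5;
        // The runtime reports System.Int32 as a value type.
        return a == 5 && typeof(System.Int32).IsValueType;
    }

    public static void Main()
    {
        Console.WriteLine(Int32IsValueType()); // prints True
    }
}
```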
When this statement is compiled, the compiler detects that the System.Int32 type is a value type and optimizes the resulting IL code so that this "object" is not allocated from the heap; instead, this object is placed on the thread's stack in the local variable a.

When possible, you should use value types instead of reference types because your application's performance will be better. In particular, you should declare a type as a value type if all of the following are true:
Value type objects have two representations: an unboxed form and a boxed form. Reference types are always in a boxed form.

Value types are implicitly derived from System.ValueType. This type offers the same methods as defined by System.Object. However, System.ValueType overrides the Equals method so that it returns true if the values of the two objects' instance fields match. In addition, System.ValueType overrides the GetHashCode method so that it produces a hash code value using an algorithm that takes into account the values in the objects' instance fields. When defining your own value types, it is highly recommended that you override and provide explicit implementations for the Equals and GetHashCode methods.

Since you cannot declare a new value type or a new reference type using a value type as a base class, value types should not have virtual functions, cannot be abstract, and are implicitly sealed (a sealed type cannot be used as the base of a new type).

Reference type variables contain the memory address of objects in the heap. By default, when a reference type variable is created, it is initialized to null, indicating that the reference type variable doesn't currently point to a valid object. Attempting to use a null reference type variable causes a NullReferenceException exception. By contrast, value type variables always contain a value of the underlying type. By default, all members of the value type are initialized to zero. It is not possible to generate a NullReferenceException exception when accessing a value type.

When you assign a value type variable to another value type variable, a copy of the value is made. When you assign a reference type variable to another reference type variable, only the memory address is copied.

Because of the previous point, two or more reference type variables may refer to a single object in the heap. This allows operations on one variable to affect the object referenced by the other variable.
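The assignment and null behavior described above can be sketched as follows. ValuePoint, RefPoint, and CopyDemo are hypothetical types introduced only for this illustration.

```csharp
using System;

// Hypothetical types for illustration only.
public struct ValuePoint { public int X; }
public class RefPoint { public int X; }

public static class CopyDemo
{
    public static int ValueCopy()
    {
        ValuePoint v1 = new ValuePoint(); // fields start out as zero
        v1.X = 5;
        ValuePoint v2 = v1;  // copies the whole value
        v2.X = 8;            // changes only v2
        return v1.X;         // still 5
    }

    public static int ReferenceCopy()
    {
        RefPoint r1 = new RefPoint();
        r1.X = 5;
        RefPoint r2 = r1;    // copies only the address
        r2.X = 8;            // r1 and r2 refer to the same object
        return r1.X;         // now 8
    }

    public static bool NullThrows()
    {
        RefPoint r = null;   // refers to no object
        try { return r.X == 0; }
        catch (NullReferenceException) { return true; }
    }

    public static void Main()
    {
        Console.WriteLine(ValueCopy());     // prints 5
        Console.WriteLine(ReferenceCopy()); // prints 8
        Console.WriteLine(NullThrows());    // prints True
    }
}
```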
On the other hand, value type variables each have their own copy of the object's data, and it is not possible for operations on one value type variable to affect another.

There are rare situations when the runtime must initialize a value type and is unable to call its default constructor. For example, this can happen when a thread local value type must be allocated and initialized when an unmanaged thread first executes managed code. In this situation, the runtime can't call the type's constructor but still ensures that all members are initialized to zero or null. For this reason, it is recommended that you don't define a parameterless constructor on a value type. In fact, the C# compiler (and others) consider this an error and won't compile the code. This problem is rare, and it never occurs on reference types. There are no restrictions on parameterized constructors for either value types or reference types.

Since unboxed value types are not allocated on the heap, the storage allocated for them is freed as soon as the method that defines an instance of the type is no longer active. This also means that unboxed value type objects cannot receive a notification when their memory is reclaimed. However, a boxed value type will have its Finalize method called when it is garbage-collected. You are strongly discouraged from implementing a value type with a Finalize method. As with a parameterless constructor, C# considers this an error and will not compile the source code.

Boxing and Unboxing

There are many situations in which it is convenient to treat a value type as a reference type. Let's say that you wanted to create an ArrayList object (a type defined in the System.Collections namespace) to hold a set of Points. The code might look like Figure 3.

With each iteration of the loop, a Point value type is initialized. Then, the Point is stored in the ArrayList. But let's think about this for a moment. What is actually being stored in the ArrayList?
Is it the Point structure, the address of the Point structure, or something else entirely? To get the answer, you must look up the ArrayList's Add method and see what type its parameter is defined as. In this case, you see that the Add method is prototyped in the following manner:
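In the current base class library, the method is declared as public virtual int Add(object value). A small sketch that reads the parameter type back through reflection; AddSignature and ParameterType are hypothetical names.

```csharp
using System;
using System.Collections;
using System.Reflection;

public static class AddSignature
{
    // Looks up ArrayList.Add and reports the type of its single parameter.
    public static Type ParameterType()
    {
        MethodInfo add = typeof(ArrayList).GetMethod("Add");
        return add.GetParameters()[0].ParameterType;
    }

    public static void Main()
    {
        Console.WriteLine(ParameterType()); // prints System.Object
    }
}
```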
The previous code plainly shows that Add takes an Object as a parameter. Object always identifies a reference type. But here I'm passing p, which is a Point value type. For this code to work, the Point value type must be converted into a true heap-managed object, and a reference to this object must be obtained.

Converting a value type to a reference type is called boxing. Internally, here's what happens when a value type is boxed:
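Boxing can be observed directly from C#. In this minimal sketch, a Point is added to an ArrayList and the original variable is then changed; the stored element keeps the old value because Add received a boxed copy. The Point struct with public x and y fields is an assumption, as is the BoxingDemo class.

```csharp
using System;
using System.Collections;

// Hypothetical Point value type, assumed for this illustration.
public struct Point { public int x, y; }

public static class BoxingDemo
{
    public static int StoredX()
    {
        ArrayList list = new ArrayList();
        Point p = new Point();
        p.x = 1;
        p.y = 2;
        list.Add(p);   // p is boxed: its fields are copied into a new heap object
        p.x = 99;      // changes only the unboxed variable p
        Point stored = (Point)list[0]; // unbox and copy the fields back out
        return stored.x;               // 1, not 99
    }

    public static void Main()
    {
        Console.WriteLine(StoredX()); // prints 1
    }
}
```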
When the Add method is called, memory is allocated in the heap for a Point object. The members currently residing in the Point value type (p) are copied into the newly allocated Point object. The address of the Point object (a reference type) is returned and is then passed to the Add method. The Point object will remain in the heap until it is garbage-collected. The Point value type variable (p) can be reused or freed since the ArrayList never knows anything about it.

Boxing enables a unified view of the type system, where a value of any type can ultimately be treated as an object.

The opposite of boxing is, of course, unboxing. Unboxing retrieves a reference to the value type (data fields) contained within an object. Internally, the following is what happens when a reference type is unboxed:
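The observable outcomes of an unboxing cast in C# can be sketched as follows: a cast to the exact boxed type succeeds, a cast to a different value type throws InvalidCastException, and a cast of a null reference throws NullReferenceException. UnboxDemo and its methods are hypothetical names.

```csharp
using System;

public static class UnboxDemo
{
    public static int UnboxOk()
    {
        object o = 5;     // boxed Int32
        return (int)o;    // unbox, then copy the value out
    }

    public static bool WrongTypeThrows()
    {
        object o = 5;     // boxed Int32, not Int64
        try { long n = (long)o; return n < 0; }
        catch (InvalidCastException) { return true; }
    }

    public static bool NullThrows()
    {
        object o = null;  // no boxed object to unbox
        try { int n = (int)o; return n < 0; }
        catch (NullReferenceException) { return true; }
    }

    public static void Main()
    {
        Console.WriteLine(UnboxOk());         // prints 5
        Console.WriteLine(WrongTypeThrows()); // prints True
        Console.WriteLine(NullThrows());      // prints True
    }
}
```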
The following code demonstrates boxing and unboxing:

From this code, can you guess how many boxing operations occur? You might be surprised to discover that the answer is three! Let's analyze the code carefully to really understand what's going on. First, an Int32 unboxed value type (v) is created and initialized to 5. Then an Object reference type (o) is created and it wants to point to v. But reference types must always point to objects in the heap, so C# generates the proper IL code to box v and stores the address of the boxed version of v in o. Next, 123 is stored in the unboxed value type v; this has no effect on the boxed version of v, so the boxed version keeps its value of 5. Note that when o is later cast back to an Int32, o is unboxed (which returns a pointer to the data in o), and then the data in o is memory copied to an unboxed value type.

Now, you have the call to WriteLine. WriteLine wants a String object passed to it but you don't have a String object. Instead, you have these three items: an Int32 unboxed value type (v), a string, and an Int32 reference (or boxed) type (o). These must somehow be combined to create a String. To accomplish this, the C# compiler generates code that calls the String object's static Concat method. There are several overloaded versions of Concat. All of them perform identically; the difference is in the number of parameters. Since you want to format a string from three items, the compiler chooses the following version of the Concat method:
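In the current base class library, that overload is declared as public static string Concat(object arg0, object arg1, object arg2). The sketch below rebuilds the sequence from the description above and writes the Concat call out by hand, so each boxing operation is visible. It is an illustration only: newer C# compilers may emit different IL for the + expression, and ConcatDemo and Build are hypothetical names.

```csharp
using System;

public static class ConcatDemo
{
    public static string Build()
    {
        Int32 v = 5;     // unboxed value type
        Object o = v;    // box 1: o refers to a boxed copy holding 5
        v = 123;         // changes only the unboxed v

        // Hand-written equivalent of: v + ", " + (Int32) o
        return String.Concat((Object)v,          // box 2
                             ", ",               // already a String reference
                             (Object)(Int32)o);  // unbox, then box 3
    }

    public static void Main()
    {
        Console.WriteLine(Build()); // prints 123, 5
    }
}
```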
For the first parameter, arg0, v is passed. But v is an unboxed value type and arg0 is an Object, so v must be boxed and the address of the boxed v is passed for arg0. For the arg1 parameter, the address of the ", " string is passed, identifying the address of a String object. Finally, for the arg2 parameter, o (a reference to an Object) is cast to an Int32. This creates a temporary Int32 value type that receives the unboxed version of the value currently referred to by o. This temporary Int32 value type must be boxed once again, with its memory address being passed for Concat's arg2 parameter.
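A cheaper way to write the same call is to pass o without casting it back to Int32. A minimal sketch of that variant, with the same v and o as described above; NoCastDemo and Build are hypothetical names.

```csharp
using System;

public static class NoCastDemo
{
    public static string Build()
    {
        Int32 v = 5;     // unboxed value type
        Object o = v;    // o refers to a boxed copy holding 5
        v = 123;         // changes only the unboxed v

        // o is already an Object reference, so it needs no unbox and no re-box.
        return v + ", " + o;
    }

    public static void Main()
    {
        Console.WriteLine(Build()); // prints 123, 5
    }
}
```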
This line is identical to the previous version except that I've removed the (Int32) cast that preceded the variable o. This code is more efficient because o is already a reference type to an Object and its address may simply be passed to the Concat method. So, removing the cast saved both an unbox and a box operation.
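Boxing can be reduced further by never building a string from v at all. The following is a sketch, assuming a variant with two separate WriteLine calls that each receive v directly; OneBoxDemo and Run are hypothetical names.

```csharp
using System;

public static class OneBoxDemo
{
    // Returns the two values written, so the result can be checked.
    public static int[] Run()
    {
        Int32 v = 5;            // unboxed value type
        Object o = v;           // boxes v
        v = 123;                // changes only the unboxed v
        Console.WriteLine(v);   // binds to WriteLine(Int32): prints 123
        int first = v;

        v = (Int32)o;           // unbox o and copy its value into v
        Console.WriteLine(v);   // prints 5
        return new int[] { first, v };
    }

    public static void Main()
    {
        Run();
    }
}
```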
How many boxing operations do you count in this code? The answer is one. There is only one boxing operation because there is a WriteLine method that accepts an Int32 as a parameter:
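In the current base class library, that overload is declared as public static void WriteLine(int value). A small sketch that confirms its presence through reflection; WriteLineOverload and HasInt32Overload are hypothetical names.

```csharp
using System;
using System.Reflection;

public static class WriteLineOverload
{
    public static bool HasInt32Overload()
    {
        // Ask for the overload whose single parameter is an Int32.
        MethodInfo m = typeof(Console).GetMethod("WriteLine", new Type[] { typeof(int) });
        return m != null && m.GetParameters()[0].ParameterType == typeof(int);
    }

    public static void Main()
    {
        Console.WriteLine(HasInt32Overload()); // prints True
    }
}
```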
In the two calls to WriteLine, the variable v (an Int32 unboxed value type) is passed by value. Now, it may be that WriteLine will box this Int32 internally, but you have no control over that. The important thing is that you've done the best you could and have eliminated the boxing from your code.

Conclusion

The concepts discussed in this column are extremely important to all .NET developers. You should really understand the difference between reference types and value types. You must also understand which operations require boxing, and if you're using a compiler that boxes value types automatically (like C# and Visual Basic®) you should also learn when compilers are going to do this and what effect it has on your code. I can't emphasize enough that a misinterpretation of these concepts can easily cause you to create subtle bugs and performance slowdowns in your program.
Jeffrey Richter is the author of Programming Applications for Microsoft Windows (Microsoft Press, 1999), and cofounder of Wintellect (https://www.Wintellect.com), a software education, debugging, and consulting firm. He specializes in programming/design for .NET and Win32. Jeff is currently writing a Microsoft .NET Frameworks book, and offers .NET seminars.
From the December 2000 issue of MSDN Magazine