Pure C++
Generic Programming Under .NET
Stanley B. Lippman
Contents
What Are Parameterized Types?
The Type Parameter List
Type Instantiation
Visual Studio® 2005 brings the type parameter model of generic programming to the Microsoft® .NET Framework. Parameterized types are, of course, standard fare for C++ programmers. So, for those who are unfamiliar with them, I'll give a brief introduction to generic programming in my next few columns.
The primary idea behind generic programming is the delivery of an invariant code base that supports a potentially infinite set of types. There are two general models used for generic programming: the Universal Type Container Model (UTCM) and the Type Parameter Model (TPM).
Under the UTCM, the type information associated with the object is stripped. This is simple to support because it is reductive. All objects are stored in a uniform and opaque manner. In a monolithic type system such as the Common Type System (CTS), the universal type container is Object; all CTS types are either derived directly or derived indirectly from Object. In a language like C, void* is the universal type.
Under the TPM, the binding of type information associated with the object is factored out and delayed. Values that may vary from one invocation to the next are factored out into parameters. That's why this implementation model is called parameterized types—it's more complex, but also more powerful.
For example, let's say that you're implementing the IEnumerator interface of the System::Collections namespace. It seems pretty trivial, providing two methods and a property. But providing a single implementation available to all users is incredibly hard in a strongly typed language. The intractable aspect of our implementation is flagged by "???":
interface class IEnumerator
{
property ??? Current { ??? get(); }
bool MoveNext();
void Reset();
};
The type system requires you to statically identify the type associated with the property's backing store and the get accessor return type, but of course that's not possible. There are an infinite number of potential types that users will need to enumerate over. So what do you do?
The simpler general solution is the UTCM, in which Object serves as container:
interface class IEnumerator
{
property Object^ Current { Object^ get(); }
bool MoveNext();
void Reset();
};
This provides one degree of separation. It allows you to support a potentially infinite number of types with a single, invariant code base. And for the passive storage and retrieval of objects of reference types, it works quite elegantly.
Things become somewhat less elegant as soon as you need to retrieve and manipulate the objects as concrete types. This requires a downcast back into the original object type. Unfortunately, the compiler is without the necessary type information to guarantee cast correctness and so it falls to the programmer to intervene manually with an explicit downcast, as shown in the following example:
extern void f( Object^ anyTypeWorks );
Object^ o = "a string of all things";
// no downcast ... passive storage
f( o );
// downcast ... we need to manipulate
String^ s = safe_cast<String^>( o );
For implementing collections, things become even more problematic. There is no way to statically constrain a collection to only hold objects of a single type under the universal type container model. This can only be provided at the program level, and is somewhat complicated and error prone. Moreover, because it is a program solution, it is only able to be applied during runtime. Again, you are left without compiler support.
Apart from safety and complexity, there is also a performance issue with large-scale storage and retrieval of value types under the universal type container model. These three problems are solved under type parameterization.
What Are Parameterized Types?
The type parameter model provides a second degree of separation, eliminating both downcasting and boxing and allowing for compile-time flagging of element type violations of a homogeneous container. It is a two-step solution. In the first step, you substitute a type parameter as a placeholder for the actual type, analogous to what you do with a function:
interface class IEnumerator
{
property typeParameter Current { typeParameter get(); }
bool MoveNext();
void Reset();
}
In the second step, you indicate to the compiler (the machine reader of the program) that typeParameter is a placeholder and not a real program entity. You do this through a generic signature referred to as the parameter list. In C++/CLI, this is introduced with either the generic keyword, if you choose the common language runtime (CLR) generic mechanism, or the template keyword, if you choose to use the C++ template type mechanism, as shown in Figure 1.
Figure 1 Indicating typeParameter
Generic Keyword
generic <typename typeParameter>
interface class IEnumerator
{
property typeParameter Current { typeParameter get(); }
...
};
Template Keyword
template <typename typeParameter>
interface class IEnumerator
{
property typeParameter Current { typeParameter get(); }
...
};
This results in a type-independent interface definition. Later, when a class implements IEnumerator, it must provide an actual type to bind to the typeParameter placeholder. You do this by pairing the parameterized class name with an actual type enclosed in brackets, such as IEnumerator<int>.
C++/CLI supports two parameterized type mechanisms, templates and generics, for defining parameterized reference, value, and interface classes, functions, and delegates. On the surface, at least, the definition of a parameterized generic or template type is syntactically equivalent (other than the use of the template or generic keyword). In other ways, they are remarkably different.
Consider the two stack declarations in Figure 2, the template instance (tStack) making use of the dynamic vector container provided by a CLI implementation of the Standard Template Library (STL) and the generic instance (gStack) making use of the dynamic List<T> container provided by the System::Collections::Generic namespace. Both parameterized type collection libraries are new to Visual Studio 2005.
Figure 2 Template and Generic Stack Declaration
C++/CLI Template Stack Declaration
#include <cliext/vector>
using namespace cliext;
template <class elemType>
ref class tStack
{
// the CLI vector is a reference class ...
vector<elemType> ^m_stack;
int top;
public:
tStack();
elemType pop();
void push( elemType et );
...
};
C++/CLI Generic Stack Declaration
using namespace System::Collections::Generic
generic <class elemType>
public ref class gStack {
List<elemType> ^m_stack;
int top;
public:
gStack();
elemType pop();
void push( elemType et );
...
};
You create an instance of the parameterized type by specifying an actual type within angle brackets following the class name. For example, Figure 3 shows the template stack instantiated with an integer and a string type argument in turn. Figure 4 shows the two same type arguments that are used in order to create equivalent instances of the generic stack.
Figure 4 Instantiating the Generic Stack
void demo_generic_Stack()
{
// an int value type argument...
gStack<int>^ is = gcnew gStack<int>( 10 );
for ( int ix = 0; ix < 10; ix++ )
is->push( ix*2 );
int elem_cnt = is->size();
for ( int ix = 0; ix < elem_cnt; ++ix )
Console::WriteLine( "({0}) {1}", ix+1, is->pop());
// a String^ reference type argument...
gStack<String^> ^ss = gcnew gStack<String^>( 10 );
ss->push( "Pooh" ); ss->push( "Piglet" );
ss->push( "Rabbit" ); ss->push( "Eeyore" );
elem_cnt = ss->size();
for ( int ix = 0; ix < elem_cnt; ++ix )
Console::WriteLine( "({0}) {1}", ix+1, ss->pop());
}
Figure 3 Instantiating the Template Stack
void demo_template_Stack()
{
// an int value type argument...
tStack<int>^ is = gcnew tStack<int>( 10 );
for ( int ix = 0; ix < 10; ix++ )
is->push( ix*2 );
int elem_cnt = is->size();
for ( int ix = 0; ix < elem_cnt; ++ix )
Console::WriteLine( "({0}) {1}", ix+1, is->pop());
// a String^ reference type argument...
tStack<String^> ^ss = gcnew tStack<String^>( 10 );
ss->push( "Pooh" ); ss->push( "Piglet" );
ss->push( "Rabbit" ); ss->push( "Eeyore" );
elem_cnt = ss->size();
for ( int ix = 0; ix < elem_cnt; ++ix )
Console::WriteLine( "({0}) {1}", ix+1, ss->pop());
}
The actual manipulation of the objects of the parameterized types, such as is and ss, are exactly the same as if they were objects of a nonparameterized type. One benefit of a parameterized type is that a single-source definition can spawn potentially an infinite number of specific type instances. In the example, both the generic and template stack classes support a string and an integer class out of the same parameterized class source code. There are no real constraints on using them with any and every type known to the application. As you'll see in a subsequent column, this is not the case with all parameterized types.
The generic and template definition and specific type instances are nearly equivalent here, though that isn't the case with all parameterized types, or there would be little benefit in C++/CLI supporting both mechanisms. As I go through the details of the two mechanisms in subsequent columns, you'll see additional differences. Even with these differences, it seems the better approach to integrate their presentation, highlighting the differences as I encounter them, rather than say the same thing multiple times.
The Type Parameter List
Each type parameter is prefaced with either the class or typename keyword. These keywords do not hold any platform significance—for example, class is not meant to suggest a native type nor is typename meant to suggest a Common Language Infrastructure (CLI) type. Both indicate that the name following represents a parameterized type placeholder that will be replaced by a user-specified type argument.
The reason for the two keywords is historical. In the original template specification, Stroustrup reused the existing class keyword to specify a type parameter rather than introduce a new keyword that might break existing programs. And up until the ISO-C++ standard, the use of the class keyword was the only way to declare a type parameter.
Reuse of existing keywords seems to always sow confusion. Did the use of class to indicate a type parameter limit the available type arguments to be class types rather than, say, a built-in or a pointer type? No. Then isn't the use of class in this context misleading? Well, yes. So, there was some feeling that not having introduced a new keyword caused unnecessary confusion. But that was not why the keyword typename was added.
Rather, typename was added to C++ to allow the parsing of template definitions. This topic is somewhat advanced, and I'll only briefly touch on it here. A fuller description can be found in Stroustrup's Design and Evolution of C++ (Addison-Wesley, 1994).
It is not always possible for the compiler to distinguish between type declarations and expressions in some cases. If the compiler encounters the expression Parm::name in a template definition and Parm is a template type parameter representing a class, does name refer to a type member or to a data member of Parm?
template <class Parm, class U>
Parm minus( Parm* array, U value )
{
Parm::name * p; // Is this a pointer declaration or
// a multiplication expression?
// By default treated as expression.
}
By default, this statement is treated as an expression multiplying the operands Parm::name and p. The keyword typename was introduced to allow the programmer to override this default interpretation. For example, to declare p as a pointer of type Parm::name, you would rewrite the template function as follows:
template <class Parm, class U>
Parm minus( Parm* array, U value )
{
typename Parm::name * p; // ok: pointer declaration
}
Since the keyword was now on the payroll, it was pretty much a no-brainer to fix the confusion caused by the reuse of the class keyword. Given the extensive body of existing code, books, articles, talks, and postings using the class keyword, it couldn't just be sent packing. And that is why C++ supports both keywords.
Following the class or typename keyword is an identifier that serves as a placeholder within the template or generic definition. Each identifier within the parameter list must be unique. However, across declarations both the keyword and the identifier can vary:
template <class T>
public ref class tStack;
// ok: both the keyword and identifier can vary across
// declarations of the same type
template <typename elemType>
public ref class tStack {};
The scope of the identifier persists for the extent of the type declaration. In the forward declaration of tStack, this is terminated by the semicolon, and the name is never referenced. In the actual definition, the identifier is visible both within the class definition and within each member function of the class definition defined out-of-line.
Type Instantiation
A template or generic definition specifies how individual classes or functions can be constructed given a set of one or more actual types. The time of the instantiation is one of the major differences between templates and generics. Template instantiation is done at compile time; generic instantiation is done by the CLR at run time. (I'll address these in more detail in my next column.)
The template definition serves as a diagram for the automatic generation of type-specific instances; the compiler literally plugs in the specific type arguments provided by the user. The generic definition is more like a blueprint; the runtime constructs the type-specific instances, modifying the general syntax based on whether the type argument is a reference or value type. For example, a stack class for objects of type int and a stack class for objects of type String are created automatically from the template or generic definition when you write the following:
tStack<int> ^si;
tStack<String^> ^ss;
This generation of a class from the template definition is called template instantiation—this is how it is discussed in the ISO-C++ standard. In the literature describing generics, the class generation is referred to as construction—this again follows from the distinction between a template and a blueprint. Here, I use "instantiation" to describe this process. When a stack class for objects of type String is instantiated, each occurrence of the template parameter within the generic or template definition is replaced with type String. The correctness of the definition is verified for that type.
The name of the instantiation is Stack<int> or Stack<String^>. The <int> or <String^> tokens that follow the name are called the template arguments in ISO-C++. They are referred to as type arguments in the generic literature, and in this case I will follow their example. The type arguments must be specified in a comma-separated list and bracketed by the less than and greater than tokens. The name of an instantiation must always specify the type arguments explicitly. Unlike type arguments for functions instantiations, the type arguments for class instantiations are never deduced from the context in which the class instantiation is used—what that means will be made clear in a future column on parameterized functions and function type deduction.
A class instantiation can be used by the general program wherever a nonparameterized class type can be used. Similarly, the objects of an instantiated class are declared and used exactly the same as objects of a nonparameterized class.
Finally, it is important to realize that there is no special relationship—as there is, for example, between a derived and base class—between two instances of a generic (or template) type bound to independent type arguments. In the example, the two stack instantiations represent objects of two independent classes. You could not, for example, initialize or assign one to the other without explicitly programming for that. Nor does the String instantiation of the stack have access permission to the nonpublic members of the integer instantiation of the stack.
That pretty much covers what is common between templates and generics. More interesting, of course, are the differences. I briefly mentioned one difference—templates are instantiated at compile time while generics are instantiated by the runtime. The consequences of this are actually far-reaching, and will be the topic of the next column. See you then!
Send your questions and comments to purecpp@microsoft.com.
Stanley B. Lippman is an Architect with the Visual C++ team at Microsoft. He began working on C++ with its inventor, Bjarne Stroustrup, in 1984 at Bell Laboratories. In between, he worked in feature animation at Disney and DreamWorks, was a Distinguished Consultant with JPL, and was a Software Technical Director on Fantasia 2000.