

Netting C++

Resource Cleanup

Stanley B. Lippman

Code download available at: NettingC++2006_08.exe (166 KB)

In my last column, we successfully wrapped my native Text Query Language (TQL) application, which processes natural language texts, in a .NET reference class. The application executed correctly, but I made a number of subtle program design errors that I'd like to correct in this column. For those who haven't committed the code to memory, Figure 1 shows how we left it.

Figure 1 Initial TQL Reference Class

#include "TextQuery.h"

ref class TQL {
public:
    // constructor creates a TextQuery object on native heap
    // destructor frees it ...
    TQL()  { pTQuery = new TextQuery; }
    ~TQL() { delete pTQuery; }

    // the operations we wish to publish within .NET
    // these will generally be inlined by the compiler
    void build_up_text() { pTQuery->build_up_text(); }
    void query_text()    { pTQuery->query_text(); }

private:
    // our native object through which to invoke
    // our application compiled in Section 1 ...
    TextQuery *pTQuery;
};

// Our Managed TQL Wrapper Put to Work ...
int main() {
    TQL ^tq = gcnew TQL;
    tq->build_up_text();
    tq->query_text();
    return 0;
}

The first problem is pretty trivial, but it would be embarrassing if the class were made generally available: all types within a module default to internal visibility. That is, they can't be seen from outside the module. This is a good default because it keeps module-only types from polluting the global namespace of the application. You must explicitly declare as public every class you want to make visible across modules:

public ref class TQL {
public:
    // no change ...
};

This also holds for native classes you compile under C++/CLI, which is the opposite of how ISO C++ treats native types, where nothing restricts a class to its module. So if you need your native class to be visible across modules, you'll also need to label that class public. This is an extension to ISO C++ in Visual C++® 2005.

The second problem involves cleaning up resources other than memory on the common language runtime (CLR) heap, which is garbage collected. A garbage-collected heap has deep ramifications for destructors, so that's where we'll start. The Microsoft® .NET Framework has no notion of a destructor, so we've artificially mapped destruction onto .NET, as I'll explain shortly.

First, remember that destruction is a two-phase operation. When you delete a pointer, for example, the compiler-generated code first checks whether the pointer is null. If it is, nothing else happens. Otherwise, it invokes the destructor associated with the object being addressed, and then the actual heap memory is freed. For CLR types, these two phases are carried out by two different components. The compiler generates the invocation of the destructor, while the freeing of the memory is always handled by the garbage collector. When you invoke a destructor in .NET, the memory associated with the reference object is not yet reclaimed, and it won't be until the garbage collector kicks in.
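In native C++ the two phases can be spelled out by hand. The ISO C++ sketch below is an illustration only: `two_phase_delete` and `Tracked` are hypothetical names, and the helper ignores details such as class-specific or sized operator delete. It performs the null check, runs the destructor, and then releases the memory, which is roughly what `delete p;` does for a simple class type. Under the CLR, that last step belongs to the garbage collector instead.

```cpp
#include <cassert>
#include <new>

// Counts how many times the destructor of Tracked has run.
static int g_destructor_calls = 0;

// Hypothetical class used only to observe the destructor call.
struct Tracked {
    ~Tracked() { ++g_destructor_calls; }
};

// Hypothetical helper: roughly what `delete p;` does for a type with no
// class-specific operator delete, written out as two explicit phases.
template <typename T>
void two_phase_delete(T* p) {
    if (p == nullptr) {
        return;               // null pointer: nothing else happens
    }
    p->~T();                  // phase 1: run the destructor
    ::operator delete(p);     // phase 2: release the heap memory
}
```

Passing a null pointer does nothing; passing `new Tracked` runs the destructor once and then frees the storage.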

So, the problem with my initial implementation is this: each TQL object will eventually be reclaimed by the garbage collector (the second phase of destruction), but its destructor (the first and, from my point of view, more important phase of destruction) will never be invoked. Before the garbage collector reclaims the memory associated with an object, it invokes the object's Finalize method, if one is present. This is referred to as finalization. Think of Finalize as a kind of super-destructor, since it is not tied to the program lifetime of the object. When, or even whether, a Finalize method is invoked remains undefined, which is why garbage collection is said to exhibit nondeterministic finalization.

Nondeterministic finalization works well for dynamic memory management. When available memory becomes sufficiently scarce, the garbage collector kicks in and things pretty much just work. In a garbage-collected environment, destructors that only free memory are unnecessary.

Nondeterministic finalization does not work well, however, when an object maintains a critical resource such as a database connection, native heap memory, or a lock of some sort. In such cases, you need to release that resource as soon as possible. In the native world, that is done by pairing a constructor with a destructor. As soon as the lifetime of the object ends, either because the local block in which it is declared completes or because the stack unwinds after a thrown exception, the destructor kicks in and the resource is automatically released. This approach works very well, but it was unfortunately missing from the CLR world.
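For reference, here is that native pairing in a minimal ISO C++ form. `ScopedResource` is a hypothetical stand-in for a lock or connection; the log string records that the destructor runs both when the block ends normally and when the stack unwinds because of an exception.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Records acquire/release events so their order can be inspected.
static std::string g_log;

// Hypothetical stand-in for a critical resource such as a lock or connection.
class ScopedResource {
public:
    explicit ScopedResource(const char* name) : name_(name) {
        g_log += "acquire " + name_ + ";";   // acquisition in the constructor
    }
    ~ScopedResource() {
        g_log += "release " + name_ + ";";   // release in the destructor
    }
    ScopedResource(const ScopedResource&) = delete;
    ScopedResource& operator=(const ScopedResource&) = delete;

private:
    std::string name_;
};

// Normal exit: the destructor runs when the block ends.
void use_normally() {
    ScopedResource r("a");
    g_log += "work;";
}

// Exceptional exit: the destructor runs while the stack unwinds.
void use_and_throw() {
    ScopedResource r("b");
    throw std::runtime_error("failure");
}
```

In the exceptional case, `r` is destroyed before control reaches the caller's catch handler.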

It soon became clear that programmers using .NET needed a canonical way to indicate that the potentially scarce resources held by objects of a type must be disposed of quickly. The design solution is the System::IDisposable interface, whose single Dispose method contains the cleanup code. The primary drawback of this solution is that Dispose must be invoked explicitly by the user. This is error-prone and therefore a step backwards. The C# language provides a modest form of automation through its using statement, which, when the programmer remembers to write it, generates the call to Dispose for the objects acquired in the statement.
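To make the drawback concrete, the sketch below transposes the pattern into ISO C++. It is an analogy, not .NET code: `Disposable`, `Connection`, and `UsingGuard` are hypothetical names. Calling `dispose()` by hand works only if the caller remembers to do it, while the guard, playing the part of the C# using statement, generates the call at scope exit.

```cpp
#include <cassert>

// Hypothetical native analogue of System::IDisposable: a single
// virtual member that holds the cleanup code.
struct Disposable {
    virtual void dispose() = 0;
    virtual ~Disposable() = default;
};

// Counts how many connections have actually been disposed.
static int g_disposed = 0;

// Hypothetical resource-owning type.
struct Connection : Disposable {
    bool open = true;
    void dispose() override {
        if (open) {           // tolerate repeated calls
            open = false;
            ++g_disposed;
        }
    }
};

// Rough analogue of the C# using statement: calls dispose() when the
// guard leaves scope, including during exception unwinding.
class UsingGuard {
public:
    explicit UsingGuard(Disposable& d) : d_(d) {}
    ~UsingGuard() { d_.dispose(); }
    UsingGuard(const UsingGuard&) = delete;
    UsingGuard& operator=(const UsingGuard&) = delete;

private:
    Disposable& d_;
};
```

A `Connection` that is neither disposed by hand nor wrapped in a guard simply never runs its cleanup code, which is exactly the error the explicit-call design invites.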

Beginning with Visual Studio® 2005, Visual C++ instead translates the class destructor into the class Dispose method. The destructor is renamed internally to Dispose, and the reference class is automatically extended to implement the IDisposable interface:

// internal transformation of destructor under C++/CLI
public ref class TQL : IDisposable {
    ...
    void Dispose() {
        // suppress finalize method for this object
        // then generate the user code ...
        System::GC::SuppressFinalize(this);
        delete pTQuery;
    }
};

When a destructor is invoked explicitly under C++/CLI, or when delete is applied to a tracking handle, the underlying Dispose method is invoked automatically. If the class is a derived class, a call to the base class Dispose method is inserted at the end of the synthesized method.
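This ordering matches what native C++ already does for destructors: the derived-class body runs first, and the base-class destructor is invoked when it completes. The short ISO C++ sketch below, with hypothetical class names, records that order; it illustrates the native rule that the synthesized Dispose chain mirrors, not the C++/CLI compiler's actual output.

```cpp
#include <cassert>
#include <string>

// Records the order in which destructors run.
static std::string g_order;

// Hypothetical base class; the virtual destructor lets delete-through-base
// reach the derived destructor.
struct Base {
    virtual ~Base() { g_order += "~Base;"; }
};

// Hypothetical derived class; its destructor body runs before Base's.
struct Derived : Base {
    ~Derived() override { g_order += "~Derived;"; }
};
```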

While this transformation is a good thing, by itself it's not good enough. First, reference objects aren't bound to a scope, so unless the programmer explicitly deletes the reference object, the destructor doesn't get invoked. Second, since the destructor now maps to Dispose rather than Finalize, the garbage collector has no method to invoke. So, at first glance, this design change seems to have been a mistake!

It is certainly the mistake I made in my implementation. As a native programmer, I didn't realize that, because of the nature of garbage collection, a destructor under .NET isn't a complete solution to managing objects. Therefore, I didn't think to explicitly provide a finalizer for the garbage collector to invoke. For just one object, probably nobody would notice. In a running system that spawns a new TQL object for every query session, the leaked native memory would be quite an embarrassment. For your convenience, here again is the code that declares the TQL object:

// the destructor is never invoked on tq ...
// and we have not provided a finalizer ...
// so the native memory held by tq is never freed ...
int main() {
    TQL ^tq = gcnew TQL;
    tq->build_up_text();
    tq->query_text();
    return 0;
}

You need to provide a finalizer, and I'll show you how in a minute. First, let's walk through how the CLR features in Visual C++ simulate deterministic finalization by syntactically binding a reference object to a local or class scope, each of which represents a deterministic lifetime. The tricky part is that .NET itself doesn't support this, so we had to be clever.

Visual C++ supports the declaration of an object of a reference class on the local stack or as a member of a class by declaring the object using the type name but without requiring its formal top hat (^). All uses of the object, such as invoking a member function, are done through the member selection dot (.) rather than arrow (->). At the end of the block, the associated destructor, transformed into Dispose, is invoked automatically:

// OK, this invokes our destructor ...
int main() {
    TQL tq;
    tq.build_up_text();
    tq.query_text();
    return 0;
}   // destructor is invoked here, as tq goes out of scope ...

For those of you with more of a library bent to your syntax, there is an auto_handle<> template that functions equivalently. (I prefer going hatless in my designs.) As with the C# using statement, this is syntactic sugar rather than a defiance of the underlying .NET constraint that all reference types must be allocated on the CLR heap. The underlying semantics remain unchanged, except that the destructor is invoked automatically. In effect, the destructor is again paired with the constructor as an automated acquisition/release mechanism tied to an object's lifetime.
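Native programmers will recognize the pattern. In ISO C++, std::unique_ptr ties the destruction of a heap-allocated object to a scope, much as the library template (msclr::auto_handle) ties the destructor of a CLR-heap object to one. The sketch below is a native analogy only; `TextQueryStub` and `session` are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Number of TextQueryStub objects currently alive.
static int g_live = 0;

// Hypothetical stand-in for the native TextQuery object.
struct TextQueryStub {
    TextQueryStub() { ++g_live; }
    ~TextQueryStub() { --g_live; }
};

// The object lives on the heap, but its destruction is tied to the
// scope of the owning unique_ptr.
void session() {
    std::unique_ptr<TextQueryStub> tq(new TextQueryStub);
    assert(g_live == 1);
}   // unique_ptr's destructor deletes the object here
```

As with auto_handle<>, the object itself still lives on the heap; only the timing of its cleanup is bound to the enclosing block.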

The problem with this solution is that you can’t force programmers to use it and, therefore, you can’t safely provide a destructor without providing a finalizer as well. Here’s how to do that:

public ref class TQL {
public:
    // constructor creates a TextQuery object on native heap
    TQL() { pTQuery = new TextQuery; }

    // destructor frees it
    ~TQL() { delete pTQuery; }

    // finalizer frees it, called by garbage collector if
    // destructor is not invoked ...
    !TQL() { delete pTQuery; }

    // build_up_text() and query_text() as before ...

private:
    TextQuery *pTQuery;
};

The ! prefix is meant to suggest the analogous tilde (~) that introduces a class destructor: both post-lifetime methods have a token prefixing the name of the class. If the synthesized Finalize method belongs to a derived class, an invocation of the base class Finalize method is inserted at its end. If the destructor is invoked explicitly, the finalizer is suppressed.
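The suppression rule is what keeps the two `delete pTQuery;` statements from ever both running. The ISO C++ sketch below models that rule; it is not how the CLR is implemented. `Wrapper`, `dispose()`, and `collect()` are hypothetical: `dispose()` stands in for the destructor, and `collect()` stands in for the garbage collector reaching the object at some later, unspecified time. On either path, the resource is released exactly once.

```cpp
#include <cassert>

// Counts how many times the (simulated) native resource is freed.
static int g_frees = 0;

// Hypothetical native model of a reference class that defines both a
// destructor and a finalizer. Not CLR code; it only mimics the rule
// that an explicit destructor call suppresses finalization.
class Wrapper {
public:
    // Plays the role of ~TQL(): run deterministically by user code.
    void dispose() {
        release();
        finalize_suppressed_ = true;   // mimics GC::SuppressFinalize(this)
    }

    // Plays the role of the garbage collector reaching the object later.
    void collect() {
        if (!finalize_suppressed_) {
            release();                 // mimics running !TQL()
        }
    }

private:
    // Stands in for `delete pTQuery;`. Running it twice would be a
    // double free; the suppression flag is what prevents that.
    void release() { ++g_frees; }

    bool finalize_suppressed_ = false;
};
```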

There are some remaining issues, as always, with these sorts of language designs. A finalizer, in general, is inefficient. (For a great discussion of this issue, see the book CLR via C#, Second Edition, by Jeffrey Richter.) When possible, it's preferable not to define one. But currently there is no way for a class that defines a destructor to require that its objects always be declared so that deterministic finalization is guaranteed. I can't force users of my class to always declare local reference objects:

void f() {
    TQL t;                  // ok, guaranteed to be disposed
    TQL ^ht = gcnew TQL;    // oops. No guarantee. Need a finalizer, then ...
    // ...
}

One possibility would be to use the accessibility of the finalizer to signal whether you allow objects to be used without a deterministic lifetime. That is, placing a finalizer in a private section would indicate that you disallow the use of objects outside of a bound scope. However, in the current extensions design, all finalizers are public, regardless of the accessibility the user specifies.

Therefore, the finalizer always seems to be necessary as a failsafe companion to the destructor of a reference class just as, canonically, you generally need to pair a copy constructor with a copy assignment operator, or an operator new with an operator delete.

The other remaining issue, admittedly trivial, is that the code for the destructor and the finalizer tends to be the same, so it is vexing to some of us to have to either duplicate the code or fret over how it should canonically be designed. Michael Vanier, a programming language professor at Caltech, suggested a ~! syntax telling the compiler to use a single body of code for both destruction and finalization. I really like that idea, and perhaps the ECMA C++/CLI committee will too in some future revision!
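Until something like ~! exists, one workaround often seen in C++/CLI code is to write the cleanup once, in the finalizer, and have the destructor call it (`~TQL() { this->!TQL(); }`). The ISO C++ sketch below models the same idea with hypothetical names: both post-lifetime entry points delegate to a single `release()` body, and nulling the pointer makes a second call harmless.

```cpp
#include <cassert>

// Counts how many TextQueryStub objects have been deleted.
static int g_deletes = 0;

// Hypothetical stand-in for the native TextQuery.
struct TextQueryStub {
    ~TextQueryStub() { ++g_deletes; }
};

// Hypothetical native model: the cleanup code is written once and both
// the destructor and the finalizer stand-in delegate to it.
class TQLModel {
public:
    TQLModel() : pTQuery_(new TextQueryStub) {}
    ~TQLModel() { release(); }       // deterministic path (destructor)
    void finalize() { release(); }   // failsafe path (finalizer stand-in)

    TQLModel(const TQLModel&) = delete;
    TQLModel& operator=(const TQLModel&) = delete;

private:
    void release() {
        delete pTQuery_;             // deleting a null pointer is a no-op
        pTQuery_ = nullptr;          // makes a second release() harmless
    }

    TextQueryStub* pTQuery_;
};
```

Even if both paths run, as when `finalize()` is called and the object later goes out of scope, the native object is deleted only once.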

So, for a small piece of code, there were quite a few mistakes. Why? I think it’s because these issues—visibility of types, nondeterministic finalization—don’t exist in native programming and, therefore, represent leaks in our thinking about .NET. Next time we’ll look at regular expressions—in the quest to turn verbose C++/CLI into pithy Perl. Until then, may your code compile without error and execute to completion swiftly and correctly.

Send your questions and comments for Stanley to purecpp@microsoft.com.

Stanley B. Lippman began working on C++ with its inventor, Bjarne Stroustrup, in 1984 at Bell Laboratories. Later, Stan worked in feature animation both at Disney and DreamWorks and served as a Software Technical Director on Fantasia 2000. He has since served as Distinguished Consultant with JPL, and an Architect with the Visual C++ team at Microsoft. Thanks to Jim Hogg and Michael Vanier for their help with this column.