Named Return Value Optimization in Visual C++ 2005

Here is a white paper that I will soon be submitting. Feel free to leave your comments.

The Microsoft Visual C++ optimizing compiler is always looking for new techniques and optimizations to give the programmer higher performance where possible. This article shows how the compiler tries to eliminate redundant copy constructor and destructor calls in various situations.

Typically, when a method returns an instance of an object, a temporary object is created and copied to the target object via the copy constructor. The C++ standard allows the copy constructor to be elided (even if this results in different program behavior), which has the side effect of enabling the compiler to treat both objects as one (see section 12.8, Copying class objects, paragraph 15) [1]. The VC8.0 compiler makes use of the flexibility that the standard provides and adds a new feature, the Named Return Value Optimization (NRVO). NRVO eliminates the copy constructor and destructor calls for a stack-based return value, which removes redundant work and improves overall performance. Note that this can lead to different behavior between optimized and non-optimized programs (see the Optimization Side Effects section).

There are some cases in which the optimization will not take place (see Optimization Limitations section for samples). The more common ones are:

a) Different paths returning different named objects.

b) Multiple return paths (even if the same named object is returned on all paths) with exception handling (EH) states introduced.

c) The named object returned is referenced in an inline asm block.

 

Optimization Description

Here is a simple example in Figure 1 to illustrate the optimization and how it is implemented:

A MyMethod (B &var)

{

          A retVal;

          retVal.member = var.value + bar(var);

          return retVal;

}

Figure 1 – Original Code

 

The program that uses the above function may have a construct such as:

valA = MyMethod(valB);

 

The value that is returned from MyMethod is created in the memory space pointed to by valA through the use of a hidden argument. Here is what the function looks like when we expose the hidden argument and explicitly show the constructors and destructors:

A MyMethod (A &_hiddenArg, B &var)

{

          A retVal;

          retVal.A::A(); // constructor for retVal

          retVal.member = var.value + bar(var);

          _hiddenArg.A::A(retVal); // the copy constructor for A

          retVal.A::~A(); // destructor for retVal

          return;

 

}

Figure 2 – Hidden argument code without NRVO (pseudo code)

 

The above code exposes an optimization opportunity. The basic idea is to eliminate the temporary stack-based value (retVal) and use the hidden argument instead. Consequently, this eliminates the copy constructor and destructor calls for the stack-based value. Here is the NRVO-based optimized code:

 

A MyMethod(A &_hiddenArg, B &var)

{

          _hiddenArg.A::A();

          _hiddenArg.member = var.value + bar(var);

          return;

}

Figure 3 – Hidden argument code with NRVO (pseudo code)

 

Code Samples

Sample 1: Simple Example:

#include <stdio.h>

class RVO

{

public:

           

            RVO(){printf("I am in constructor\n");}

            RVO (const RVO& c_RVO) {printf ("I am in copy constructor\n");}

            ~RVO(){printf ("I am in destructor\n");}

            int mem_var;

};

RVO MyMethod (int i)

{

            RVO rvo;

            rvo.mem_var = i;

            return (rvo);

}

int main()

{

            RVO rvo;

            rvo=MyMethod(5);

}

Figure 4 – Sample1.cpp

 

Compiling sample1.cpp with and without NRVO turned on will yield different behavior.

Without NRVO (cl /Od sample1.cpp), the expected output would be:

I am in constructor

I am in constructor

I am in copy constructor

I am in destructor

I am in destructor

I am in destructor

 

With NRVO (cl /O2 sample1.cpp), the expected output would be:

I am in constructor

I am in constructor

I am in destructor

I am in destructor

 

 

Sample 2: More complex sample:

#include <stdio.h>

class A {

  public:

    A() {printf ("A: I am in constructor\n");i = 1;}

   ~A() { printf ("A: I am in destructor\n"); i = 0;}

    A(const A& a) {printf ("A: I am in copy constructor\n"); i = a.i;}

    int i, x, w;

};

 class B {

  public:

    A a;

    B() { printf ("B: I am in constructor\n");}

    ~B() { printf ("B: I am in destructor\n");}

    B(const B& b) { printf ("B: I am in copy constructor\n");}

};

A MyMethod()

{

    B* b = new B();

    A a = b->a;

    delete b;

    return (a);

}

int main()

{

    A a;

    a = MyMethod();

}

Figure 5 – Sample2.cpp

 

The output without NRVO (cl /Od sample2.cpp) will look like:

A: I am in constructor

A: I am in constructor

B: I am in constructor

A: I am in copy constructor

B: I am in destructor

A: I am in destructor

A: I am in copy constructor

A: I am in destructor

A: I am in destructor

A: I am in destructor

 

When NRVO kicks in (cl /O2 sample2.cpp), the output will be:

A: I am in constructor

A: I am in constructor

B: I am in constructor

A: I am in copy constructor

B: I am in destructor

A: I am in destructor

A: I am in destructor

A: I am in destructor

 

Optimization Limitations:

There are some cases where the optimization won’t actually kick in. Here are a few samples of such limitations:

 

Sample 3: Exception Sample:

In the face of exceptions the hidden argument must be destructed within the scope of the temporary that it is replacing. To illustrate:

//RVO class is defined above in figure 4

#include <stdio.h>

RVO MyMethod (int i)

{

            RVO rvo;

            rvo.mem_var = i;

            throw "I am throwing an exception!";

            return (rvo);

}

int main()

{

            RVO rvo;

            try

            {

                        rvo=MyMethod(5);

            }

            catch (const char* str)

            {

                        printf ("I caught the exception\n");

            }

}

Figure 6 – Sample3.cpp

 

Without NRVO (cl /Od /EHsc sample3.cpp), the expected output would be:

I am in constructor

I am in constructor

I am in destructor

I caught the exception

I am in destructor

 

If the “throw” gets commented out, the output will be:

I am in constructor

I am in constructor

I am in copy constructor

I am in destructor

I am in destructor

I am in destructor

 

Now, if the “throw” gets commented out and NRVO is triggered, the output will look like:

I am in constructor

I am in constructor

I am in destructor

I am in destructor

 

That is to say, sample3.cpp as it is in Figure 6 will behave the same with and without NRVO.

 

Sample 4: Different Named Object Sample:

To make use of the optimization all exit paths must return the same named object. To illustrate consider sample4.cpp:

#include <stdio.h>

class RVO

{

public:

           

            RVO(){printf("I am in constructor\n");}

            RVO (const RVO& c_RVO) {printf ("I am in copy constructor\n");}

            int mem_var;

};

RVO MyMethod (int i)

{

            RVO rvo;

            rvo.mem_var = i;

            if (rvo.mem_var == 10)

                        return (RVO());

            return (rvo);

}

int main()

{

            RVO rvo;

            rvo=MyMethod(5);

}

Figure 7 – Sample4.cpp

The output with optimizations enabled (cl /O2 sample4.cpp) is the same as with optimizations disabled (cl /Od sample4.cpp). NRVO does not take place because not all return paths return the same named object.

I am in constructor

I am in constructor

I am in copy constructor

If you change the above sample to return rvo in all exit paths (as shown below in Figure 8, Sample4_Modified.cpp), the optimization will eliminate the copy constructor:

 

#include <stdio.h>

class RVO

{

public:

           

            RVO(){printf("I am in constructor\n");}

            RVO (const RVO& c_RVO) {printf ("I am in copy constructor\n");}

            int mem_var;

};

RVO MyMethod (int i)

{

            RVO rvo;

            if (i==10)

                        return (rvo);

            rvo.mem_var = i;

            return (rvo);

}

int main()

{

            RVO rvo;

    rvo=MyMethod(5);

}

Figure 8 – Sample4_Modified.cpp modified to make use of NRVO

 

The output (cl /O2 Sample4_Modified.cpp) will look like:

I am in constructor

I am in constructor

 

Sample 5: EH Restriction Sample:

Figure 9 below shows the same sample as Figure 8, except with the addition of a destructor to the RVO class. Having multiple return paths and introducing such a destructor creates EH states in the function. Because tracking which objects need to be destructed is complex, the compiler avoids the return value optimization. This is something that Visual C++ 2005 will need to improve in the future.

//RVO class is defined above in figure 4

#include <stdio.h>

RVO MyMethod (int i)

{

            RVO rvo;

            if (i==10)

                        return (rvo);

            rvo.mem_var = i;

            return (rvo);

}

int main()

{

            RVO rvo;

            rvo=MyMethod(5);

}

Figure 9 – Sample5.cpp

 

Compiling Sample5.cpp with and without optimization will yield the same result:

I am in constructor

I am in constructor

I am in copy constructor

I am in destructor

I am in destructor

I am in destructor

To make use of NRVO, try to eliminate the multiple return points in such cases by changing MyMethod to be something like:

RVO MyMethod (int i)

{

            RVO rvo;

            if (i!=10)

                        rvo.mem_var = i;

            return (rvo);

}

Sample 6: Inline asm Restriction:

Another case where the compiler avoids performing NRVO is when the named return object is referenced in an inline asm block. To illustrate, consider the sample below (sample6.cpp):

#include <stdio.h>

//RVO class is defined above in figure 4

RVO MyMethod (int i)

{

            RVO rvo;

            __asm {

                        mov eax,rvo //comment this line out for RVO to kick in

                        mov rvo,eax //comment this line out for RVO to kick in

            }

            return (rvo);

}

int main()

{

            RVO rvo;

            rvo=MyMethod(5);

}

Figure 10 – Sample6.cpp

Compiling sample6.cpp with optimization turned on (cl /O2 sample6.cpp) will still not take advantage of NRVO. That is because the object returned was actually referenced in an inline asm block. Hence the output with and without optimizations will look like:

I am in constructor

I am in constructor

I am in copy constructor

I am in destructor

I am in destructor

I am in destructor

From the output, it is clear that the elimination of the copy constructor and destructor calls did not take place. If the asm block gets commented out, such calls will get eliminated.

Optimization Side Effects:

The programmer should be aware that such an optimization might affect the flow of the application. The following example illustrates such a side effect:

#include <stdio.h>

int NumConsCalls=0;

int NumCpyConsCalls=0;

class RVO

{

public:

           

            RVO(){NumConsCalls++;}

            RVO (const RVO& c_RVO) {NumCpyConsCalls++;}

};

RVO MyMethod ()

{

            RVO rvo;

            return (rvo);

}

int main()

{

           RVO rvo;

           rvo=MyMethod();

            int Division = NumConsCalls / NumCpyConsCalls;

            printf ("Constructor calls / Copy constructor calls = %d\n",Division);

}

Figure 11 – Sample7.cpp

 

Compiling Sample7.cpp with no optimizations enabled (cl /Od sample7.cpp) will yield what most users expect. The “constructor” is called twice and the “copy constructor” is called once and hence the division (2/1) yields 2.

 

Constructor calls / Copy constructor calls = 2

On the other hand, if the above code is compiled with optimization enabled (cl /O2 sample7.cpp), NRVO will kick in and the “copy constructor” call will be eliminated. Consequently, NumCpyConsCalls will be zero, leading to an integer division by zero. Since sample7.cpp does nothing to guard against this, the application will most likely crash.

 

 

References:

[1] The C++ Standard Incorporating Technical Corrigendum 1

 BS ISO/IEC 14882:2003 (Second Edition)

Thanks,

  Ayman B. Shoukry

  Program Manager, VC++ Team