Share via


Bad Recursion Revisited

We have internal email lists for questions about programming languages. Here's one that came across recently that I thought illustrated a good point about language design.

An interview candidate gave the following awful implementation of the factorial function. (Recall that factorial is notated "n!", and is defined as the product of all the integers from 1 to n. 0! is defined as 1. So 4! = 4 x 3 x 2 x 1 = 24.)

If you note that n! = n x ((n-1)!) then a recursive solution comes to mind. When asked to implement the factorial function in C, an interview candidate came up with this:

int F(int x){
return (x > 1) ? (x * F(--x)) : x;
}

Now, leaving aside for the moment the fact that this badly-named function does no bounds checking on the inputs, potentially consumes the entire stack, returns a wrong or nonsensical answer for inputs less than one, and is a recursive solution to a problem that can easily be solved with a simple lookup table, it has another big problem -- it doesn't even return the correct answer for any input greater than one either! F(4) will return 6, not 24, in Microsoft C.

And yet the seemingly equivalent C#, JScript and VBScript programs return 24.

The question was "What the heck is up with that?"

This looks perfectly straightforward. If x is 4, then we evaluate the consequence of the conditional operator...

4 * F(3)
= 4 * 3 * F(2)
= 4 * 3 * 2 * F(1)
= 4 * 3 * 2 * 1
= 24

right? What is broken with C?

Page 52 of K&R has the answer.

"C, like most languages, does not specify the order in which the operands of an operator are evaluated. (The exceptions are &&, ||, ?: and ','.) For example, in a statement like x = f()+g(); f may be evaluated before g or vice versa; thus if either f or g alters a variable on which the other depends, x can depend on the order of evaluation."

The Microsoft C compiler actually evaluates the function call first, which causes x to be decremented before the multiplication. So this is actually the same as

3 * F(3)
= 3 * 2 * F(1)
= 3 * 2 * 1
= 6.

Why does the specification call out that the compiler can choose any order for subexpression evaluation? Because then the compiler can choose to optimize the order so that it consumes a small number of stack or register slots for the results.

Of course, C was written back in the dark ages -- the 1970's -- when indeed most languages were pretty ill-specified and full of terrible "gotchas" like this. JScript, VBScript, C#, and most modern languages do guarantee that functions will be evaluated in left-to-right order. In all these languages,

x = f() + g() * h();

will evaluate f, then g, then h.

As K&R notes "The moral is that writing code that depends on order of evaluations is a bad programming practice in any language." Amen to that!

Comments

  • Anonymous
    April 28, 2005
    Just FYI:

    I tried it using gcc and I got the right answers. I used cygwin's gcc version 3.3.3 on Win XP SP2:

    ------------------------SNIP--------------------------------
    #include "stdio.h"

    int F(int x)
    {
    return (x > 1) ? (x * F(--x)) : x;
    }

    int main()
    {
    printf("F(1): %dn",F(1));
    printf("F(2): %dn",F(2));
    printf("F(3): %dn",F(3));
    printf("F(4): %dn",F(4));

    return 0;
    }
    ------------------------SNIP--------------------------------
  • Anonymous
    April 28, 2005
    This also illustrates the tendency for many C/C++ programmers to try and cram as much 'functionality' into one expression.

    This is usually a pointless exercise in efficiency, at the cost of readability (and often, as in this case, correctness).

    For some reason, it's often perceived as bad form in C/C++ to write such a function like so:

    int factoral( int x) {
    if (x < 0) {
    return -1; /* error result */
    }

    if (x <= 2) {
    return( x);
    }

    return( x * factoral( x-1));
    }


    Granted - this example still doesn't deal with stack or arithmetic overflow issues and it still uses a recursive algorithm when an iterative one would be better in most respects.

    However my point is that I contend that this example is every bit as efficient as an implementation that tries to cram the whole logic into a single expression (the magic of compilers takes care of the efficiency), while being much easier to read, comprehend, and get correct.

    Why do many (most?) programmers still believe that the original example's style is still more 'elegant'?
  • Anonymous
    April 28, 2005
    The comment has been removed
  • Anonymous
    April 28, 2005
    This also highlights the fact that so many colleges/universities teach recursion in their programming courses, the classic example being Factorials and the Fibonacci number sequence.

    What they do not tell you is that recursion is something that should only be used in very rare cases.
  • Anonymous
    April 28, 2005
    "How would parens help? The PROBLEM is that the --x is evaluated first!"

    Oh well, uh... I don't know how parens would help. I think ths real solution is that I probably should read your articles more closey. :)
  • Anonymous
    April 28, 2005
    I think something is wrong with your blog. Somehow I posted in the past to a response to your comment. Is it really not using the server-side time?
  • Anonymous
    April 28, 2005
    How would parens help? The PROBLEM is that the --x is evaluated first!
  • Anonymous
    April 28, 2005
    I have heard the tendancy for programmers to love these tiny, jewel-like, over-clever expressions called "APL Syndrome", as APL particularly encourages that style of programming. However, googling that for the origin of the phrase results in an avalanche of pages about antiphospholipid syndrome and acute promyelocytic leukemia.

    It's only natural to try and find a clever and concise way to represent an algorithm. What some programmers fail to realize is that "clever" and "extremely dense" are not necessarily good qualities for code to have! They work against goals like correctness, debuggability and understandability.
  • Anonymous
    April 28, 2005
    The problem with code like xF(--x) isn't simply order-of-evaluation -- it goes much deeper. The C standard says that, between any pair of adjacent sequence points, you can modify a given object no more than once, and that if you modify it, you may read it only to determine the new value to be stored. So x=x+1 is ok, as is x++; but in xF(--x) you modify x once (with the predecrement), read it once to determine the value to be stored, and read it a second time for the multiplication. The standard says that this is undefined behaviour; a conforming compiler is permitted to do anything at all when compiling or running such code, including format your hard disk.

    Order of evaluation isn't as bad as this -- the order is merely unspecified, rather than actually undefined.

    'Sequence points' occur at: the end of a statement; after an expression evaluated for controlling an if, while, or for; a sequencing comma (though not a function-argument-separating comma); after the antecedent of a logical || or &&; after the antecedent of a ?: conditional operator; and immediately before a function call (though you don't know at what point the function is called in the evaluation of the surrounding expression).
  • Anonymous
    April 28, 2005
    I have heard the tendancy for programmers to love these tiny, jewel-like, over-clever expressions called "APL Syndrome", as APL particularly encourages that style of programming. However, googling that for the origin of the phrase results in an avalanche of pages about antiphospholipid syndrome and acute promyelocytic leukemia.

    It's only natural to try and find a clever and concise way to represent an algorithm. What some programmers fail to realize is that "clever" and "extremely dense" are not necessarily good qualities for code to have! They work against goals like correctness, debuggability and understandability.
  • Anonymous
    April 28, 2005
    Billy Bob:

    The problem is that expression evaluation in C can only be ordered by 'sequence points'. Between sequence points, the compiler is permitted to order the evaluation of sub-expressions however it likes, and having side-effects of sub-expressions modify the values of elements used elsewhere in the expression results in undefined behavior. (Wow, that's a mouthful).

    Precedence does not factor into sequence points, so in general, parens do not affect the order of evaluation of sub-expressions (except to the extent that the compiler must ensure that the final result of the expression is correct according to precedence rules).

    See the comp.lang.c FAQ, Section 3, for the gory details on this sometimes confusing subject:

    http://www.faqs.org/faqs/C-faq/faq/
  • Anonymous
    April 28, 2005
    The best part of the proposed solution is that the decrement on the x is totally unnecessary. Replacing F(--x) with F(x-1) produces the correct solution (still inefficiently, but possibly slightly less so).
  • Anonymous
    April 28, 2005
    Yikes! You have totally spooked me now, since I have always assumed that the order of evaluation was defined by the associativity of an operator [I was at least aware that there is no defined eval order for function parameters]
  • Anonymous
    April 28, 2005
    As a current undergraduate at an American University, my first CS class used Scheme in a "functional" style to get us away from any preconceived notions about computing. And we were shown a bunch of "elegant"/"clever" solutions and required to solve our problems recursively.

    I suppose this style is not good in C because of its lack of tail recursion and poor in the real world due to the messiness of real problems.
  • Anonymous
    April 28, 2005
    I agree with mikeb.

    Its bad form to think that a one liner function is more 'elegant' than a verbose one. Programmers always have to keep in mind that performance is not their one and only goal, maintainability is an equally important goal.

    Mikeb's function is so much easier to 'read' compared to the original one, where I had to mentally run through a few 'what-if-x-is-1' type of scenarios to determine which item would be evaluated.

    Recursion vs. iteration is a good debate. Programmers (especially beginners) like the elegance provided by recursive functions, since its something like a self-solving system - the solution refers to a subset of the problem (turtles all the way down?).

    A couple of crash-and-burn episodes with recursion will teach them to better analyze the more suitable approach.
  • Anonymous
    April 28, 2005
    The comment has been removed
  • Anonymous
    April 28, 2005
    Brian: replacing F(--x) with F(x-1) helps, but does not make the function entirely correct. It makes the expression well-defined as far as C goes, but the function still returns an in correct result when the input is 0.

    Phylyp: as far as precedence rules go, I've always liked the one put forth by Steve Oualline in "Practical C":

    --------------------
    There are fifteen precedence rules in C (&& comes before || comes before ?:). The practical programmer reduces these to two:

    1) Multiplication and division come before addition and subtraction.
    2) Put parentheses around everything else.
    --------------------
  • Anonymous
    April 29, 2005
    The comment has been removed
  • Anonymous
    April 30, 2005
    Brian had the best post:

    "The best part of the proposed solution is that the decrement on the x is totally unnecessary. Replacing F(--x) with F(x-1) produces the correct solution (still inefficiently, but possibly slightly less so)."

    That seems like a much clearer way to write it anyway!


    David W.
  • Anonymous
    May 01, 2005
    mikeb's precedence rules: Hear, Hear!

    I'm surprised the compiler doesn't issue some warning on expressions like x + (x++). Of course it can't catch all these problems (especially with aliasing), but at least catch the simple ones.
  • Anonymous
    May 02, 2005
    The comment has been removed
  • Anonymous
    May 02, 2005
    > The standard says that this is undefined behaviour; a conforming compiler is permitted to do anything at all when compiling or running such code, including format your hard disk.

    Rufus is correct, and indeed, the runtime can format your hard disk. If you did something like

    a[i] = ++i;

    then if the increment is done first, it might write a byte beyond where the array is valid, and that could potentially corrupt memory such that the "format disk?" boolean gets flipped accidentally.

    However, I'm not sure why it is that the compiler could format the disk when given a bad program. Surely the compiler can choose to give an error, or choose to generate bad code, or whatever, but the compiler can't itself crash and die.