Subtleties of C# IL codegen

It must be CLR week over at The Old New Thing because it's been non-stop posts about C# lately. Raymond's last two technical posts have been about null checks and no-op instructions generated by the jitter when translating IL into machine code.  

I'll comment on both posts here, but I want to get the no-op discussion done first, because there are some subtleties to it. I believe that Raymond's statement that the jitter does not generate no-ops when not debugging is not entirely correct. This is not a mere nitpick -- as we'll see, whether it does so or not actually has semantic relevance in rare stress cases.

Now, I'll slap a disclaimer of my own on here: I know way more about the compiler than about the jitter/debugger interaction. This is my understanding of how it works. If someone who actually works on the jitter would like to confirm my and Raymond's interpretations of what we see going on here, I'd welcome that.

Before I get into the details, let me point out that in the C# compiler, "debug info emitting on/off" and "IL optimizations on/off" are orthogonal settings. One controls whether debug info is emitted, the other controls what IL the code generator spits out. It is sensible to set them as opposites but you certainly do not have to.
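
To make the orthogonality concrete, here is a sketch of the four combinations as csc command lines (Program.cs is just a placeholder source file; /debug and /optimize are the relevant switches):

csc /debug+ /optimize- Program.cs
csc /debug- /optimize+ Program.cs
csc /debug+ /optimize+ Program.cs
csc /debug- /optimize- Program.cs

The first line is what a typical Debug configuration does and the second is close to a typical Release configuration, but the last two combinations are every bit as legal.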

With optimizations off, the C# compiler emits no-op IL instructions all over the place.  With debug info on and optimizations off, some of those no-ops will be there to be targets of breakpoints for statements or fragments of expressions which would otherwise be hard to put a breakpoint on.

The jitter then cheerfully turns IL no-ops into x86 no-ops. I suspect that it does so whether there is a debugger attached or not.
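
To get a feel for what that looks like, take a trivial method such as static int Add(int x, int y) { return x + y; }. Compiled with optimizations off, the IL comes out roughly like this -- an illustrative sketch, since the exact listing varies by compiler version:

IL_0000:  nop            // placeholder for the opening brace -- a handy breakpoint target
IL_0001:  ldarg.0
IL_0002:  ldarg.1
IL_0003:  add
IL_0004:  stloc.0        // the return value is spilled to a temporary local
IL_0005:  br.s IL_0007   // jump to the single exit point of the method
IL_0007:  ldloc.0
IL_0008:  ret

With optimizations on, the same method is just ldarg.0, ldarg.1, add, ret -- no nops, no temporary, no branch.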

Furthermore, I have not heard that the jitter ever manufactures no-ops out of whole cloth for debugging purposes, as Raymond implies. I suspect -- but I have not verified -- that if you compile your C# program with debug info on AND optimizations on, then you'll see a lot fewer no-ops in the jitted code (and your debugging experience will be correspondingly worse). The jitter may of course generate no-ops for other purposes -- padding code out to word boundaries, etc.

Now we come to the important point: It is emphatically NOT the case that a no-op cannot affect the behaviour of a program, as many people incorrectly believe.

In C#, "lock (expression) statement" is syntactic sugar for something like

temp = expression;
System.Threading.Monitor.Enter(temp);
try { statement } finally { System.Threading.Monitor.Exit(temp); }
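
As a concrete sketch of that expansion -- the class and member names here are invented purely for illustration:

class Counter
{
    private static readonly object sync = new object();
    private static int count;

    static void Increment()
    {
        lock (sync)
        {
            count++;
        }
    }

    // Roughly what the compiler generates for Increment above:
    static void IncrementExpanded()
    {
        object temp = sync;
        System.Threading.Monitor.Enter(temp);
        try
        {
            count++;
        }
        finally
        {
            System.Threading.Monitor.Exit(temp);
        }
    }
}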

The x86 jitter has the nice property that the code it generates guarantees that an exception is never thrown between the Enter and the try. This means that the finally always executes if the lock has been taken, which means that the locked resource is always unlocked.

That is, unless the C# compiler generates a no-op IL instruction between the Enter and the try! The jitter turns that into a no-op x86 instruction, and it is possible for another thread to cause a thread abort exception while the thread that just took the lock is in the no-op. This is a long-standing bug in C# which we will unfortunately not be fixing for C# 3.0.

If the scenario I've described happens then the finally will never be run, the lock will never be released and hey, now we're just begging for a deadlock.

That's the only situation I know of in which emitting a no-op can cause a serious semantic change in a program -- turning a working program into a deadlocking one. And that sucks.

I've been talking with some of the CLR jitter and threading guys about ways we can fix this more robustly than merely removing the no-op. I'm hoping we'll figure something out for some future version of the C# language.

As for the bit about emitting null checks: indeed, at the time of a call to an instance method, whether virtual or not, we guarantee that the object of the call is not null by throwing an exception if it is. The way this is implemented in IL is a little odd. There are two instructions we can emit: call, and callvirt. call does NOT do a null check and does a non-virtual call. callvirt does do a null check and does a virtual call if it is a virtual method, or a non-virtual call if it is not.

If you look at the IL generated for a non-virtual call on an instance method, you'll see that sometimes we generate a call, sometimes we generate a callvirt. Why? We generate the callvirt when we want to force the jitter to generate a null check. We generate a call when we know that no null check is necessary, thereby allowing the jitter to skip the null check and generate slightly faster and smaller code.

When do we know that the null check can be skipped? If you have something like (new Foo()).FooNonVirtualMethod() we know that the allocator never returns null, so we can skip the check. It's a nice, straightforward optimization, but the realization in the IL is a bit subtle.
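
Here is a small sketch of both cases; Foo, M and GetFoo are invented names, and the IL in the comments shows only the relevant instructions, not a complete listing:

class Foo
{
    public void M() { }                        // a non-virtual instance method

    static Foo GetFoo() { return new Foo(); }  // could, as far as the caller knows, return null

    static void Demo()
    {
        Foo f = GetFoo();
        f.M();            // f might be null, so the compiler emits
                          //   callvirt instance void Foo::M()
                          // to force the null check even though M is not virtual.

        (new Foo()).M();  // the allocator never returns null, so the compiler emits
                          //   newobj instance void Foo::.ctor()
                          //   call   instance void Foo::M()
                          // and the jitter can skip the null check.
    }
}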

Comments

  • Anonymous
    August 17, 2007
    The JIT does emit different code when a debugger is attached.  I don't know if that specifically has an effect on no-ops; but it wouldn't be surprising.  It would be fairly easy to see if the assembly was NGENed or attached to by the debugger after the code had been run outside the debugger.

  • Anonymous
    August 17, 2007
     >> This means that the finally always executes if the lock has been taken, which means that the locked resource is always unlocked.
     Not always: if it's in a background thread then the finally might not be called if the process ends (all foreground threads have ended). I found that out the hard way when I had a finally that sometimes never completed in some code I wrote :-) Great blog by the way! Regards, Lee

  • Anonymous
    August 20, 2007
    i'm sure there is a legitimate reason that the Monitor.Enter(temp) can't go in the try. but i don't know what it is. so ... why not? it would seemingly make sense.

  • Anonymous
    August 20, 2007
    Mikey: because if the Monitor.Enter were in the try block and it caused an exception, the finally block would always get executed.  You can only assume that if Monitor.Enter threw an exception that it didn't lock.  If it didn't lock then there's no reason to enter the finally block to ensure that Monitor.Exit is called.  So, it's outside the try block.

  • Anonymous
    August 20, 2007
    peter: true. oops. but surely there is a reasonably sensible way to handle that in the try? (i.e. wrap it in an inner try, perhaps, [not pretty i suppose]) or check if you received the lock before exiting.

  • Anonymous
    August 20, 2007
     Mikey: sure, you can skip the whole C# lock keyword and do whatever you want with Monitor.Enter, Monitor.Exit, try and finally, adding any number of checks you want. But what would be the point? Likely you'd want it to be debug-only, so you've got that complexity; add that to all the other complexities and can you guarantee that all instances of that code will be fault free? Keep in mind, the scenario that Eric discusses will only occur on a debug build on a multi-processor computer where two threads have to be executing the same instruction (essentially) at the same time. That's an extremely rare occurrence. Yes, it might happen; but I wouldn't suggest changing your code to compensate for it. Debugging multithreaded code has many other problems.

  • Anonymous
    August 20, 2007
     > or check if you received the lock before exiting.
     As I said, we're talking with the CLR guys to try to do something like that. For example, we could have a version of Enter which reports whether the lock was taken, and then put the Enter in the try. Then "lock(x) statement" would translate to

     bool mustExit = false;
     object temp = x;
     try {
       Enter(temp, out mustExit);
       statement;
     }
     finally {
       if (mustExit) Exit(temp);
     }

     Enter would have to set the out parameter atomically with taking out the lock. However, there are drawbacks to that approach as well. I'll probably write a blog article about that at some point.

  • Anonymous
    August 20, 2007
    Even if you patched up the block to guarantee that the finally block would be entered, isn't there still the potential problem that the same thing could happen in the finally block? e.g.:  try { ... } finally { NOP; cleanup; } Is it guaranteed that an exception cannot be thrown between the start of the finally block and the first statement of the finally block?  If not, it seems like that would need to be patched as well.

  • Anonymous
    August 20, 2007
    Peter: "Keep in mind, the scenario that Eric discusses will only occur on a debug built on a multi-processor computer and two threads have to be executing the same instruction (essentially) at the same time." That's not how I read it.  It seemed that Eric was saying that the NOP can occur even when not in debug mode.  It also seems that the problem wasn't from two threads executing the same code, but instead, that one thread is attempting to take the lock when another thread kills it.  This is probably (hopefully) still rare, but not as rare as what you're describing.

  • Anonymous
    August 20, 2007
    Derek: to be clear, yes you can get NOPs to be emitted in release mode; but you'd have to disable optimizations.  Optimizations by default are only disabled for debug mode. To correct myself: it's not that the two threads would be executing the same instruction at the same time, it's that one thread would need to be executing the NOP after Monitor.Enter (the try block begins at the instruction after that, which may also be a nop, i.e. the next instruction is not a "try" instruction) and the other thread would have to call that thread's Abort method while the NOP instruction was being executed.  I would think that would be even more rare.

  • Anonymous
    August 20, 2007
     Eric: could the .try directive simply not include the NOP following the Monitor.Enter to solve the problem? This is what I'm seeing in IL:

       L_0011: call void [mscorlib]System.Threading.Monitor::Enter(object)
       L_0016: nop
       L_0017: nop
       L_0018: ldc.i4.1
       L_0019: stloc.1
       L_001a: nop
       L_001b: leave.s L_0025
       L_001d: ldloc.2
       L_001e: call void [mscorlib]System.Threading.Monitor::Exit(object)
       L_0023: nop
       L_0024: endfinally
       L_0025: nop
       .try L_0017 to L_001d finally handler L_001d to L_0025

     What would be the debugging consequences of changing it to:

       .try L_0016 to L_001d finally handler L_001d to L_0025

  • Anonymous
    August 20, 2007
     peter: do you mean starting the .try at L_0018 instead of L_0016? otherwise you're including two nops. also, wouldn't the fact that you aren't including the nops mean you can no longer put a breakpoint on the start of the try { statement? i think a nop before a try is valid and fine; it just seems that the generated Monitor.Enter should be within the try, with an 'achievedLock' boolean result from .Enter.

  • Anonymous
    August 20, 2007
    Eric perhaps you can help out with this? Any explanation? http://11011.net/archives/000714.html

  • Anonymous
    September 02, 2007
     Hi there!  Great post, but I have two questions: "...The jitter may of course generate no-ops for other purposes -- padding code out to word boundaries, etc." Isn't it true that code is ALWAYS padded out to word boundaries by default? Why do we need nops for that? "...I suspect -- but I have not verified -- that if you compile your C# program with debug info on AND optimizations on..." Maybe that should read "debug info off AND optimizations on"? Please correct me if I'm wrong...

  • Anonymous
    September 04, 2007
     > Isn't it true that code is ALWAYS padded out to word boundaries by default?
     No, that is not true. (Hint: you need to think about all possible architectures, not just x86.)

     > Why do we need nops for that?
     The jump instruction is faster on a 64 bit machine if it jumps to an instruction aligned on an eight byte boundary. The jitter may therefore choose to introduce nops so that frequently targeted instructions -- like loop beginnings -- are aligned to an eight byte boundary.

     > Maybe it should read "debug info off AND optimizations on"?
     No, I meant "on". This echoes my earlier point that debug info vs optimization is orthogonal. Having debug info on does not change the IL codegen.

  • Anonymous
    September 06, 2007
    Eric thanks for the answers! Another portion: "...it is possible for another thread to cause a thread abort exception while the thread that just took the lock is in the no-op..."

  1. Does this mean, say, thread A has acquired the lock and is standing on the nop - but thread B at this point causes thread A to abort? Or do you see any more complicated scenarios?
  2. Why simply including Monitor.Enter in the try block wouldn't help?
  • Anonymous
    September 07, 2007
  1. That is the scenario I had in mind, yes.
  2. Because then you have the opposite problem. What if the thread abort happens before the lock is taken out? The finally will then release a lock that was never taken, which is potentially as bad as never releasing a taken lock. What we need is Enter(object obj, out bool entered). If we had such a method then we could generate

     object temp = expr;
     bool entered = false;
     try { Enter(temp, out entered); statement }
     finally { if (entered) Exit(temp); }

     which would have none of these problems. I am hoping that in a future version of C#/CLR we have such a method available to us.
  • Anonymous
    September 07, 2007
    The comment has been removed

  • Anonymous
    September 07, 2007
     > were you hoping that the C# compiler would then use that new Enter method for the lock keyword?
     Yes.

     > What is the likelihood of red bits being changed to accommodate that?
     Low. But not zero. (The implementation of that functionality already exists in the red bits, it is just not publicly exposed.)

     > If Thread.Abort was deprecated, would a new Enter still be needed?
     If wishes were horses, would beggars ride? I try to not reason from counterfactuals. It is unlikely that Thread.Abort will be deprecated, and even if it were, deprecated does not mean nonexistent.

     > Seems like a better idea
     That's the idea, yes.
