IL offset 0 vs. Native offset 0

09/08/2005

Within a function, offset 0 into the native code stream corresponds to the very first native instruction in that function. Since the function is ultimately executed via native code (and not via interpreted IL), it's safe to say that native offset 0 corresponds to the very start of the function. When native-debugging, if you place a function-level breakpoint, it is placed at native offset 0.
Likewise, offset 0 into the IL stream corresponds to the very first IL instruction in that function. However, since the IL doesn't describe the prolog, IL offset 0 starts after the prolog. (There's no IL for the epilog either.)
Thus a breakpoint placed at IL offset 0 will skip the prolog. In practice, this only matters if you want to debug the prolog. Since the prolog has only one exit point, execution always falls through to IL offset 0, so a breakpoint there is guaranteed to be hit.

Take a trivial function that compiles to IL:

 
int Add(int x, int y)
{
    int z = x+y;
    return z;
}

Here's a merged view of the IL (the IL_xxxx lines, red in the original), the native x86 (the lines starting with a hex offset, in normal font) and the source (in bold in the original).
[update:] Note that this is specifically fully unoptimized, debuggable code. That way nothing gets inlined, breakpoints all work, you can inspect all locals, etc. Once you enable optimizations, everything gets folded into a single add instruction (see comments for details).

 int Add(int x, int y) 
 { 
00000000  push        edi       <-- start of prolog, Native Offset 0
00000001  push        esi  
00000002  push        ebx  
00000003  push        ebp  
00000004  mov         ebx,ecx 
00000006  mov         esi,edx 
00000008  cmp         dword ptr ds:[001AA30Ch],0 
0000000f  je          00000016 
00000011  call        769AF339   <-- End of prolog  

00000016  xor         edi,edi   <-- zero out local #0
00000018  xor         ebp,ebp   <-- zero local #1

     int z = x+y; 
  IL_0000:  ldarg.0
  IL_0001:  ldarg.1
  IL_0002:  add
0000001a  lea         eax,[ebx+esi]    <-- Here's the native code for IL offset 0. 

  IL_0003:  stloc.1
0000001d  mov         ebp,eax 

    return z; 
  IL_0004:  ldloc.1
  IL_0005:  stloc.0
0000001f  mov         edi,ebp   
 } 

  IL_0006:  ldloc.0
  IL_0007:  ret
00000021  mov         eax,edi 

00000023  pop         ebp    <-- epilog and return (return value is in eax). 
00000024  pop         ebx  
00000025  pop         esi  
00000026  pop         edi  
00000027  ret

I often find this 3-way view convenient. As another pet project, I'd love to add a debugger tool window that automatically stitches these 3 views together.
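
The correspondence in the listing above can also be written down as a small lookup table. The following is only a minimal sketch in C++: the names IlToNativeEntry, AddMap, NativeOffsetForIl and kNoMapping are made up for illustration, and the offsets are copied by hand from the Add() listing. The CLR exposes its own IL-to-native mapping through the debugging API; this sketch does not attempt to reproduce that.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical record for illustration: the IL offset where a range starts
// and the native offset of the first instruction generated for it.
struct IlToNativeEntry {
    std::uint32_t ilOffset;
    std::uint32_t nativeOffset;
};

// Sentinel for "no entry covers this IL offset" (an assumption of this sketch).
constexpr std::int64_t kNoMapping = -1;

// Offsets copied by hand from the Add() listing; entries are sorted by ilOffset.
// The prolog (native 0x00..0x19) and epilog (native 0x23..0x27) have no IL,
// so they do not appear here.
std::vector<IlToNativeEntry> AddMap() {
    return {
        {0x0, 0x1a},  // ldarg.0, ldarg.1, add -> lea eax,[ebx+esi]
        {0x3, 0x1d},  // stloc.1               -> mov ebp,eax
        {0x4, 0x1f},  // ldloc.1, stloc.0      -> mov edi,ebp
        {0x6, 0x21},  // ldloc.0, ret          -> mov eax,edi
    };
}

// Returns the native offset of the last entry whose ilOffset is <= the
// requested IL offset, or kNoMapping if no such entry exists.
std::int64_t NativeOffsetForIl(const std::vector<IlToNativeEntry>& map,
                               std::uint32_t ilOffset) {
    std::int64_t result = kNoMapping;
    for (const IlToNativeEntry& entry : map) {
        if (entry.ilOffset > ilOffset) break;  // sorted, so no later entry can match
        result = entry.nativeOffset;
    }
    return result;
}
```

With this table, NativeOffsetForIl(AddMap(), 0) yields 0x1a, whereas a native function-level breakpoint sits at native offset 0. The 26 bytes in between are the prolog and the zero-initialization of the locals, which is exactly the code an IL offset 0 breakpoint skips.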

Comments

Anonymous
September 08, 2005
OT: Is the JIT-compiler that bad? 20 instructions for a simple "return x+y"?
Anonymous
September 08, 2005
I understand most of that x86 assembly, but what exactly are these three lines in the prolog doing?

00000008 cmp dword ptr ds:[001AA30Ch],0
0000000f je 00000016
00000011 call 769AF339
Anonymous
September 08, 2005
I should have clarified: this is fully-debuggable code with all optimizations disabled (even the simple ones).

For example, you'll notice it refrained from inlining anything; and all the locals are still available, and it eagerly zero-initialized things, etc.

When I throw the switch and run as optimized, it folds everything. If I call it with constants, like:
int z2 = Add(5,6);
Console::WriteLine(z2);

It optimizes very nicely to:
00000058 mov ecx,0Bh
0000005d call 75AD2A98

Even with vars, it's still smart and produces this code:
int z2 = Add(x1,y1);
0000007a mov eax,dword ptr [ebp-4Ch]
0000007d add esi,eax
Console::WriteLine(z2);
0000007f mov ecx,esi
00000081 call 75AD2A98
Anonymous
September 08, 2005
Eric W - those lines are basically some instrumentation at the start of the method (like a "Function-Enter hook" for the CLR). They only appear in non-optimized code.
