The case of the inconsistent right shift results…
One of our testers just filed a bug against something I’m working on. They reported that code which calculated 1130149156 >> -05701653 produced different results on 32-bit and 64-bit operating systems: on 32-bit machines it reported 0, but on 64-bit machines it reported 0x21a.
To dig into it a bit deeper, I put together a simple reproduction of the scenario:
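```cpp
#include <tchar.h>    // _tmain, _TCHAR (MSVC)
#include <iostream>

int _tmain(int argc, _TCHAR* argv[])
{
    __int64 shift = 0x435cb524;
    __int64 amount = 0x55;   // 85: larger than the 64 bits in an __int64
    // Undefined behavior: the count is >= the width of the left operand.
    __int64 result = shift >> amount;
    std::cout << shift << " >> " << amount << " = " << result << std::endl;
    return 0;
}
```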
That’s pretty straightforward, and it *does* reproduce the behavior: on x86 it reports 0, and on x64 it reports 0x21a. I can understand the x86 result (you’re shifting right by more bits than the value holds, so everything shifts off the end and you get 0), but not the x64 one. What’s going on?
Well, for starters, I asked our C language folks. I know I’m shifting by more than the width of the operand (0x55 is 85, and an __int64 is only 64 bits wide), but the results should be the same, right?
Well, no. The immediate answer I got was:
From C++03, 5.8/1: “The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.”
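Staying inside that rule means checking the count before shifting. Here’s a minimal sketch, assuming (as a hypothetical policy) that any count outside 0 to 63 should behave as if every bit shifted off the end: 0 for non-negative values and -1 for negative ones. The helper name checked_sar and that policy are illustrative choices, not anything the standard prescribes.

```cpp
#include <cstdint>

// Hypothetical helper: an arithmetic right shift that is defined for every count.
// Policy (an assumption for illustration): counts outside [0, 63] act as if all
// bits shifted off the end, leaving only copies of the sign bit.
constexpr std::int64_t checked_sar(std::int64_t value, std::int64_t count)
{
    if (count < 0 || count >= 64) {
        return value < 0 ? -1 : 0;
    }
    // count is in [0, 63] here, so the shift itself is well defined.
    // (Right-shifting a negative value is implementation-defined before C++20;
    // mainstream compilers perform an arithmetic shift.)
    return value >> count;
}
```

With a helper like this, checked_sar(0x435cb524, 0x55) is 0 on every platform, while in-range counts such as checked_sar(0x435cb524, 21) still give 0x21a.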
OK, it’s undefined behavior. But that doesn’t explain why the two platforms disagree. When in doubt, let’s go to the assembly…
```asm
000000013F5215D3  mov    rax,qword ptr [amount]
000000013F5215D8  movzx  ecx,al
000000013F5215DB  mov    rax,qword ptr [shift]
000000013F5215E0  sar    rax,cl
000000013F5215E3  mov    qword ptr [result],rax
000000013F5215E8  mov    rdx,qword ptr [shift]
```
The relevant instruction is sar rax,cl: a shift arithmetic right of “shift” by “amount”. Note that only the low byte of “amount” ever reaches CL (movzx ecx,al).
What about the x86 version?
```asm
00CC14CA  mov   ecx,dword ptr [amount]
00CC14CD  mov   eax,dword ptr [shift]
00CC14D0  mov   edx,dword ptr [ebp-8]
00CC14D3  call  @ILT+85(__allshr) (0CC105Ah)
00CC14D8  mov   dword ptr [result],eax
00CC14DB  mov   dword ptr [ebp-28h],edx
```
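Now that’s interesting. The x64 version uses the processor’s shift instruction directly, but on 32-bit machines it calls a C runtime library helper (__allshr), since a 32-bit x86 processor has no single instruction that shifts a 64-bit value. And the one whose result is weird is the x64 version.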
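To get a feel for what such a helper has to do, here’s a rough sketch of a 64-bit arithmetic right shift built from 32-bit halves. This is a hypothetical illustration (the name sar64_from_halves is made up), not the actual __allshr source. It simply shows one reasonable way a helper could treat counts of 64 or more, and that treatment is consistent with the 0 we saw on x86.

```cpp
#include <cstdint>

// Hypothetical sketch, NOT Microsoft's __allshr: arithmetic right shift of a
// 64-bit value (passed as raw bits) using only 32-bit shifts.
constexpr std::uint64_t sar64_from_halves(std::uint64_t bits, unsigned count)
{
    std::uint32_t hi = static_cast<std::uint32_t>(bits >> 32);
    std::uint32_t lo = static_cast<std::uint32_t>(bits);
    // Copies of the sign bit that get shifted in from the left.
    std::uint32_t fill = (hi & 0x80000000u) ? 0xFFFFFFFFu : 0u;

    if (count >= 64) {
        // Every bit shifts off the end; only the sign fill remains.
        hi = fill;
        lo = fill;
    } else if (count == 32) {
        // The low word becomes the old high word.
        lo = hi;
        hi = fill;
    } else if (count > 32) {
        // The low word comes entirely from the high word (shift by 1..31).
        lo = (hi >> (count - 32)) | (fill << (64 - count));
        hi = fill;
    } else if (count > 0) {
        // Bits move from the high word into the low word (shift by 1..31).
        lo = (lo >> count) | (hi << (32 - count));
        hi = (hi >> count) | (fill << (32 - count));
    }
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

The separate count == 32 case exists because shifting a 32-bit value by 32 would itself be undefined in C++.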
While I don’t have an x64 processor manual, I *do* have a 286 processor manual from back in the day (I have all sorts of stuff in my office). And in my 80286 manual, I found:
“If a shift count greater than 31 is attempted, only the bottom five bits of the shift count are used. (the iAPX 86 uses all eight bits of the shift count.)”
A co-worker sent me the corresponding text from the current Intel manual (SAL/SAR/SHL/SHR):
The destination operand can be a register or a memory location. The count operand can be an immediate value or the CL register. The count is masked to 5 bits (or 6 bits if in 64-bit mode and REX.W is used). The count range is limited to 0 to 31 (or 63 if 64-bit mode and REX.W is used). A special opcode encoding is provided for a count of 1.
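The masking rule is easy to model in C++. This sketch (the function names are hypothetical) applies the same 5-bit or 6-bit mask the manual describes before shifting, so the C++ itself never shifts out of range. Incidentally, 0x55 has the same low 5 bits as low 6 bits (0x15), so a 32-bit SAR would also shift by 21.

```cpp
#include <cstdint>

// Models SAR r64, CL in 64-bit mode with REX.W: only the low 6 bits of the count are used.
constexpr std::int64_t emulate_sar64(std::int64_t value, unsigned count)
{
    return value >> (count & 0x3F);
}

// Models SAR r32, CL: only the low 5 bits of the count are used.
constexpr std::int32_t emulate_sar32(std::int32_t value, unsigned count)
{
    return value >> (count & 0x1F);
}
// Note: right-shifting negative values is implementation-defined before C++20;
// mainstream compilers shift arithmetically, matching SAR.
```

One consequence of masking is that a count equal to the operand width does nothing at all: emulate_sar64(1, 64) is 1, not 0.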
So the mystery is now solved. The 64-bit shift of 0x55 only considers the low 6 bits of the count. The low 6 bits of 0x55 are 0x15, or 21, and 0x435cb524 >> 21 is 0x21a.
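The arithmetic can be double-checked at compile time. This sketch also looks at the tester’s original count, assuming that -05701653 is a C/C++-style octal literal (leading 0), which makes it -1541035 in decimal. Under that assumption, its low byte (all that movzx ecx,al keeps) is 0x55, the same amount used in the repro, and its low 6 bits are again 21.

```cpp
#include <cstdint>

// The tester's left operand: 1130149156 is 0x435cb524.
constexpr std::int64_t shift_value = 1130149156;

// Assumption: the tester's count is a C/C++ octal literal (leading 0).
constexpr std::int64_t tester_count = -05701653;   // -1541035 in decimal

// Two's-complement bit pattern of the count (conversion to unsigned is well defined).
constexpr std::uint64_t count_bits = static_cast<std::uint64_t>(tester_count);

constexpr unsigned low_byte = static_cast<unsigned>(count_bits & 0xFF);  // what movzx ecx,al keeps
constexpr unsigned low6 = static_cast<unsigned>(count_bits & 0x3F);      // what a 64-bit SAR uses

constexpr std::int64_t shifted = shift_value >> low6;  // in-range shift, well defined
```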
One could argue that this is a bug in the __allshr function on x86, but you really can’t argue with “the behavior is undefined”. Both platforms are doing the “right thing”. That’s the beauty of the “behavior is undefined” wording: the compiler would be perfectly within spec if it decided to reformat my hard drive when it encountered this (although I’m happy it doesn’t).
Now our feature crew just needs to figure out how best to resolve the bug.