Detect Shift Overflow
This is an intellectual exercise: when shifts a 32-bit unsigned integer in C++, how to detect whether the calculation overflows efficiently?
Here is the function prototype. shl_overflow will return true if v << cl overflows (cl is between 0 and 31. And we assume that sizeof(unsigned long) == 4 and sizeof(unsigned long long) == 8).
bool shl_overflow(unsigned long v, int cl)
The most natural way to implement this function is to extend v to 64-bit integer:
bool shl_overflow(unsigned long v, int cl)
{
unsigned long long vl = v;
return (vl << cl >> 32) != 0;
}
Now, let’s dig into the assembly world. We’ll limit the discussion on x86.
mov eax, DWORD PTR _v$[esp-4]
mov ecx, DWORD PTR _cl$[esp-4]
xor edx, edx
call __allshl
xor eax, eax
or eax, edx
jne overflow
The implementation has to use three specific registers: eax, edx and ecx. And there is an expensive external function call.
If you step into __allshl in the debugger, you can find that it will use shld to shift 64-bit integer. VC provides some intrinsics which map to CPU instructions. For example, __ll_lshift will map to shld.
Because the high dword of vl is 0, we can simplify our code:
bool shl_overflow(unsigned long v, int cl)
{
unsigned long long vl = __ll_lshift(v, cl);
return (static_cast<unsigned long>(vl >> 32)) != 0;
}
The assembly looks like:
mov eax, DWORD PTR _v$[esp-4]
mov ecx, DWORD PTR _cl$[esp-4]
xor edx, edx
shld edx, eax, cl
test edx
jne overflow
Much better now.
Another approach is based on bit representation.
bool shl_overflow(unsigned long v, int cl)
{
v = _rotl(v, cl);
unsigned long index;
return _BitScanForward(&index, v) ? index >= cl : false;
}
The idea is simple. If v << cl overflows, that means the most significant cl bits of v should contains "1".
There are two ways to test that.
1. Scan v from the least significant bits to the most, and test the index against 32 – cl. However, we have to handle the case when cl = 0.
2. Rotate v cl bits left first, so the most significant cl bits will be the least significant cl bits. Then we can scan and test the index against cl directly.
Notice that, the scan may fail if v is 0. The second way is simpler and more efficient.
The assembly looks like:
mov ecx, DWORD PTR _cl$[esp-4]
mov eax, DWORD PTR _v$[esp-4]
rol eax, cl
bsf eax, eax
je notoverflow
cmp eax, ecx
jl overflow
It only uses two registers. It can also be extended to handle 64-bit shift. One drawback is an extra conditional jump (The extra jump can be replaced by "cmovz eax, ecx", but there is no way to ask the compiler to generate that)