Optimization Note (C++) 1: push, pop, call _chkstk
I was looking at assembly code, trying to improve an important performance scenario, when I found a strange call to _chkstk:
011E100E push 10h
011E1010 pop eax
011E1011 call _chkstk (011E1810h)
011E1016 mov ecx,esp
My understanding is that the C++ compiler generates a _chkstk call when a function allocates more than 4 KB of local variables. But here the code is allocating only 16 bytes. This matters because _chkstk is not cheap:
011E1810 push ecx
011E1811 lea ecx, [esp+4]
011E1815 sub ecx,eax
011E1817 sbb eax,eax
011E1819 not eax
011E181B and ecx,eax
011E181D mov eax,esp
011E181F and eax,0FFFFF000h
011E1824 cmp ecx,eax
011E1826 jb cs20 (011E1832h)
011E1828 mov eax,ecx
011E182A pop ecx
011E182B xchg eax,esp
011E182C mov eax,dword ptr [eax]
011E182E mov dword ptr [esp],eax
011E1831 ret
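To make the cost concrete, here is a rough C++ model of the arithmetic in the listing above. It is only a sketch: the names ChkstkResult and model_chkstk are made up for illustration, the stack pointer value used in the note below is an assumption, and the probing path behind the cs20 label is not modeled because the listing does not show it.

```cpp
#include <cstdint>

// Hypothetical model of the fast path in the listing above; not the CRT source.
struct ChkstkResult {
    std::uint32_t new_top;   // value loaded into esp by "xchg eax,esp"
    bool needs_probe;        // true when "jb cs20" would be taken
};

ChkstkResult model_chkstk(std::uint32_t esp_at_entry, std::uint32_t size) {
    std::uint32_t esp = esp_at_entry - 4;              // push ecx
    std::uint32_t ecx = esp + 4;                       // lea ecx,[esp+4]
    bool borrow = ecx < size;                          // carry flag set by the sub
    ecx -= size;                                       // sub ecx,eax
    std::uint32_t mask = borrow ? 0u : 0xFFFFFFFFu;    // sbb eax,eax / not eax
    ecx &= mask;                                       // and ecx,eax (clamps to 0 on wraparound)
    std::uint32_t page = esp & 0xFFFFF000u;            // mov eax,esp / and eax,0FFFFF000h
    return ChkstkResult{ecx, ecx < page};              // cmp ecx,eax / jb cs20
}
```

With the 16-byte request (eax = 10h) and an assumed entry esp of 0x0012FF00, the model stays on the fast path: the new stack top lands in the same 4 KB page, so no probing is needed. Even so, every instruction in the listing still runs just to move esp down by 16 bytes.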
After digging through layers of macros, I finally found that this _chkstk call is generated by an _alloca call for 16 bytes of storage. Here is a repro case:
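#include <windows.h>  // RECT
#include <tchar.h>    // _tmain, _TCHAR
#include <malloc.h>   // _alloca
int _tmain(int argc, _TCHAR* argv[]) {
    RECT * pRect = (RECT *) _alloca(sizeof(RECT));  // this _alloca produces the _chkstk call
    pRect->left = 10;
    return pRect->left;
}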
Since the object is always needed, the fix is easy: declare it as an ordinary local variable, so that it is allocated on the stack together with the other locals (for free).
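A minimal sketch of that fix, under some assumptions: Rect is a hypothetical stand-in for the Win32 RECT so that the snippet builds without windows.h, and left_of_local_rect is an invented name. The real code would simply declare a RECT local in place of the _alloca call.

```cpp
// Hypothetical stand-in for the Win32 RECT (four LONG fields).
struct Rect {
    long left;
    long top;
    long right;
    long bottom;
};

// Same logic as the repro, but the object is a plain local variable.
// Its space is part of the function's fixed stack frame, so there is
// no _alloca and therefore no _chkstk call for it.
int left_of_local_rect() {
    Rect rect = {};      // zero-initialized local instead of _alloca(sizeof(RECT))
    rect.left = 10;
    return static_cast<int>(rect.left);
}
```

The function behaves the same as the repro; only the way the 16 bytes are reserved changes.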