A reader wrote in recently and asked me why the non-caching version of the following code was faster than the caching version. This goes against the intuition of any longtime C/C++ developer. The reader found Mark’s blog post that confirmed his findings, but wanted to know *why* it was true…
// caching example
void m(Object[] ar) {
    int len = ar.Length; // cache the length
    for (int i = 0; i < len; i++) {
        // do something with ar[i]
    }
}

// non-caching example
void m(Object[] ar) {
    for (int i = 0; i < ar.Length; i++) {
        // do something with ar[i]
    }
}
The time for a loop like this is often noticeably affected by the array bounds check performed each time you index into the array. Bounds checking is an important feature of managed code, as it greatly reduces the risk of a common class of security vulnerabilities known as buffer overflows.
But if you look at the code carefully, you will see that the checks are redundant: the loop starts at 0 (the low bound of the array) and stops before the end of the array, so neither bound can ever be violated inside the loop. If the engine recognizes this pattern, it can check just once, before the loop, that 0 and ar.Length are within the bounds of the array. This process is known as hoisting. That is, we hoist, or pull, the range checks out of the loop and perform them once rather than on every iteration.
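To make the transformation concrete, here is a sketch written in Java, whose arrays carry the same bounds-check semantics as CLR arrays (Java is used here only so the example is runnable; the class and method names are invented for this illustration). Neither method is real JIT output; the explicit check in the first one just makes visible the redundant work that hoisting removes.

```java
class BoundsCheckDemo {
    // Conceptually, every ar[i] performs a range check before the load.
    static int sumChecked(int[] ar) {
        int sum = 0;
        for (int i = 0; i < ar.length; i++) {
            if (i < 0 || i >= ar.length)   // redundant: the loop condition already guarantees this
                throw new ArrayIndexOutOfBoundsException(i);
            sum += ar[i];
        }
        return sum;
    }

    // After hoisting, the engine proves 0 <= i < ar.length once, up front,
    // so the loop body can index the array with no per-access check.
    static int sumHoisted(int[] ar) {
        int sum = 0;
        for (int i = 0; i < ar.length; i++)
            sum += ar[i];                  // check eliminated by the JIT
        return sum;
    }
}
```

Both methods compute the same result; the point is that the second form gives the JIT a pattern it can prove safe once instead of on every iteration.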
So the reason the "non-caching" sample is faster is that the engine recognizes its pattern but not the pattern of the caching example. Could we teach the JIT about the other form? Sure, and maybe we will someday, but every new special case increases JIT compilation time, so we have to strike a balance.
The moral here is not "learn all the JIT-tricks" as those change from release to release. The lesson here is more along the lines of measure, measure, measure when you are doing perf work -- and don't try to outsmart the system ;-)
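The "measure, measure, measure" advice can be put into practice with even a tiny harness. The sketch below is again in Java for runnability; `LoopTimer` and `timeNanos` are names invented for this illustration, and a serious measurement should use a real benchmarking framework (e.g. JMH for Java, BenchmarkDotNet for .NET) that handles warmup, dead-code elimination, and statistics properly.

```java
class LoopTimer {
    // Time a loop body, taking the best of several runs.
    static long timeNanos(Runnable body, int reps) {
        body.run();                        // warm-up run so the JIT gets a chance to compile
        long best = Long.MAX_VALUE;
        for (int r = 0; r < reps; r++) {
            long t0 = System.nanoTime();
            body.run();
            best = Math.min(best, System.nanoTime() - t0);
        }
        return best;                       // best-of-N damps scheduler noise
    }
}
```

Comparing the cached and non-cached loop shapes through a harness like this, on the runtime you actually ship on, is what settles questions like the one above.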
Please check out the managed code section of Improving .NET Application Performance and Scalability book which our local perf expert contributed heavily to.
Hope that was helpful!
- Anonymous
April 23, 2005
Any reason you don't just suggest foreach() in this kind of scenario? Since, after all, foreach() over an array ends up expanding to the pattern that is recognized by the JIT upon compilation. Obviously if you need the index ya gotta stick with the for(), but that's usually the rarer case.
Just curious,
Drew - Anonymous
April 23, 2005
Thanks, it was helpful indeed. But then again, I'm a little unhappy with that.
It's an area where developers can only guess what the right thing to do is. In your example, what I would have considered a safe bet was wrong, and the opposite was true.
In Ian Hoff's example (on channel 9), a similar call (Array.GetUpperBound) was the cause of a 25% performance hit.
Profiling each and every loop is just too much work. There has to be more consistent guidance to let us make safe 80/20 bets.
Performance Considerations on MSDN is a very good summary, but it is probably outdated and way too short on the JIT.
Improving Managed Code Performance says:
Avoid repetitive field or property access.
I'd like to call for enhanced FxCop rules on performance, so we could get an automated grip on the problem without having to second guess every little loop.
I understand that the JIT compiler can only perform so much optimization, but wouldn't it actually be better to leave out obscure optimizations that cannot be applied consistently?
Why can't the C# compiler come to the rescue here? You could put const/readonly propagation and all the special knowledge into it.
During (release) compilation to IL you have (almost) all the time in the world. - Anonymous
April 24, 2005
Has this been changed in Whidbey? I've made this flawed optimization before specifically after a "measure, measure, measure" step. It wasn't flawed when I did it, as that loop construct became 39% faster. I just tried it again, and it continues to be faster. - Anonymous
April 24, 2005
Brad: Thanks for the answer. That reader you mentioned was me. ;-)
Lee: I can confirm that. The for loop that caches the length is no longer slower in Whidbey. It seems to perform pretty much identically. - Anonymous
April 25, 2005
I don't think that's convenient, not only for developers but also for other languages/compilers. It just confused me. - Anonymous
April 25, 2005
I know about that optimization, but what about ArrayLists? I use them a whole lot more than arrays. - Anonymous
May 02, 2005
Brad Abrams explains a JIT optimisation which could catch you out.
This kinda stuff is interesting,... - Anonymous
June 11, 2006
This article discusses looping performance at the lowest level (native). It dispels a few myths that have been circulating the blogosphere for quite some time and makes some general performance recommendations. - Anonymous
June 25, 2006
[code language="C#"]for (int i = 0; i < collection.Count; i++){ //do something with... - Anonymous
June 25, 2006
I have posted an entry on my blog on how to write efficient for loops by storing the length before... - Anonymous