Probing a Hidden .NET Runtime Performance Enhancement
Jomo Fisher--Matt Warren once told me that the runtime has a performance optimization for methods called through an interface: if only a small number of implementations of a particular interface method are in use, the runtime can reduce the overhead of those calls. Coming from the C++ world, where a vtable is a vtable, this seemed a little odd to me. I finally got around to trying it out myself, and he was right. Here's the code so you can try it for yourself:
using System;
using System.Diagnostics;
class Program {
    static void Main(string[] args) {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        DoManyTimes(new Call1());
        DoManyTimes(new Call2());
        DoManyTimes(new Call3());
        sw.Stop();
        Console.WriteLine(sw.ElapsedMilliseconds);
    }
    interface ICall {
        void Do();
    }
    class Call1 : ICall { public void Do() { } }
    class Call2 : ICall { public void Do() { } }
    class Call3 : ICall { public void Do() { } }
    static void DoManyTimes(ICall ic) {
        for (int i = 0; i < 100000000; ++i)
            ic.Do();
    }
}
On my machine this code reports values of around 2300 ms. Now, make a slight change and use only the Call1 class:
DoManyTimes(new Call1());
DoManyTimes(new Call1());
DoManyTimes(new Call1());
Now I get numbers like ~1800 ms. Generally, I observed the following:
- It doesn't seem to matter how many implementations of ICall exist. What matters is how many implementations of 'Do' are actually called.
- One implementation of 'Do' performs better than two, and two perform better than three. After three, it doesn't seem to matter how many there are.
- Delegate calls don't have an equivalent behavior.
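The delegate observation can be checked with a variation of the same benchmark. What follows is only a minimal sketch, not the code used for the measurements above: DelegateProbe, Do1, Do2, Do3, and Time are hypothetical names chosen for illustration, and the loop counts its calls purely so the result can be sanity-checked.

```csharp
using System;
using System.Diagnostics;

static class DelegateProbe {
    const int Iterations = 100000000;

    // Three empty targets, mirroring Call1.Do, Call2.Do and Call3.Do.
    static void Do1() { }
    static void Do2() { }
    static void Do3() { }

    // Invokes the delegate in a tight loop and returns the number of calls made.
    public static int DoManyTimes(Action call, int iterations) {
        int calls = 0;
        for (int i = 0; i < iterations; ++i) {
            call();
            ++calls;
        }
        return calls;
    }

    // Times three back-to-back loops, one per delegate, in milliseconds.
    public static long Time(Action a, Action b, Action c, int iterations) {
        Stopwatch sw = Stopwatch.StartNew();
        DoManyTimes(a, iterations);
        DoManyTimes(b, iterations);
        DoManyTimes(c, iterations);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main() {
        Console.WriteLine("one target:    " + Time(Do1, Do1, Do1, Iterations) + " ms");
        Console.WriteLine("three targets: " + Time(Do1, Do2, Do3, Iterations) + " ms");
    }
}
```

If delegates really have no equivalent behavior, the two timings should come out roughly the same. Build in release mode and run without a debugger attached, since both affect what the JIT generates.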
This posting is provided "AS IS" with no warranties, and confers no rights.
Comments
Anonymous
August 13, 2007
Is this only true of internal interfaces? If not, how is the number of implementations determined?
Anonymous
August 13, 2007
Jacob, I don't believe it's internal only--it's not a static thing. Rather, when the second or third implementation is encountered, the runtime uses more and more general-purpose strategies. I'm only guessing here.
Anonymous
August 14, 2007
The fastest version that I have seen was the combination of a generic method and structs instead of classes:
using System;
using System.Diagnostics;
class Program {
    static void Main(string[] args) {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        DoManyTimes(new Call1());
        DoManyTimes(new Call2());
        DoManyTimes(new Call3());
        sw.Stop();
        Console.WriteLine(sw.ElapsedMilliseconds);
    }
    interface ICall {
        void Do();
    }
    struct Call1 : ICall { public void Do() { } }
    struct Call2 : ICall { public void Do() { } }
    struct Call3 : ICall { public void Do() { } }
    static void DoManyTimes<T>(T ic) where T : ICall {
        for (int i = 0; i < 100000000; ++i)
            ic.Do();
    }
}
In this version we obtain a result of ~220 ms.
Anonymous
August 14, 2007
Desco, your technique is very interesting--your timings are equivalent to what I see when calling through class methods instead of interface methods. I also notice that it doesn't make any difference how many implementations of ICall are passed through. I would say this is a good tool for any C# dev's toolbox.
Anonymous
August 14, 2007
I dug a little more into your technique. It's not exactly a free lunch. You need to know the concrete type when calling DoManyTimes<T>. If you store your Call1, Call2, Call3 instances in ICall-typed variables, you lose the perf. For example, the following runs at normal (slow) speed:
using System;
using System.Diagnostics;
class Program {
    static void Main(string[] args) {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        DoManyTimes((ICall)new Call1());
        DoManyTimes((ICall)new Call2());
        DoManyTimes((ICall)new Call3());
        sw.Stop();
        Console.WriteLine(sw.ElapsedMilliseconds);
    }
    interface ICall {
        void Do();
    }
    struct Call1 : ICall { public void Do() { } }
    struct Call2 : ICall { public void Do() { } }
    struct Call3 : ICall { public void Do() { } }
    static void DoManyTimes(ICall ic) {
        for (int i = 0; i < 100000000; ++i)
            ic.Do();
    }
}
Anonymous
August 14, 2007
Sure. We get the performance bonus just because:
- No boxing occurs (ECMA-334 25.7.3)
- Simple method calls can be used instead of virtual ones
But the type information must be provided at compile time.
Anonymous
September 20, 2007
The generics/struct code performs well because struct-instantiations of generic methods are specialized at runtime. In other words, the CLR will actually create three methods, as though you had typed
static void DoManyTimesCall1(Call1 ic) {
    for (int i = 0; i < 100000000; ++i)
        ic.Do();
}
and similarly for DoManyTimesCall2 and DoManyTimesCall3. Possibly the JIT will even inline the call to Do inside the loop.
For ordinary interface dispatch on objects, performance depends on the dynamic pattern of calls, as you have seen. The runtime is tuned to respond well to a sequence of calls all on the same class of object. It would be interesting to test code in which a single interface dispatch site was fed a sequence of objects of differing types (e.g. cycling through classes Call1, Call2, and Call3).
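The experiment suggested at the end of this comment could be sketched roughly as follows. This is an illustration under assumptions, not a measured result: CyclingProbe and TimeCycling are hypothetical names, and the array lookup adds a small cost to every iteration, so the numbers are only comparable with each other, not with the timings in the post.

```csharp
using System;
using System.Diagnostics;

interface ICall { void Do(); }
class Call1 : ICall { public void Do() { } }
class Call2 : ICall { public void Do() { } }
class Call3 : ICall { public void Do() { } }

static class CyclingProbe {
    // Calls Do() through a single interface dispatch site, rotating through
    // the supplied targets on successive iterations. Returns elapsed milliseconds.
    public static long TimeCycling(ICall[] targets, int iterations) {
        Stopwatch sw = Stopwatch.StartNew();
        int next = 0;
        for (int i = 0; i < iterations; ++i) {
            targets[next].Do();
            if (++next == targets.Length) next = 0;
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main() {
        const int iterations = 100000000;
        // Same loop shape in both runs; only the mix of concrete types differs.
        ICall[] same = { new Call1(), new Call1(), new Call1() };
        ICall[] mixed = { new Call1(), new Call2(), new Call3() };
        Console.WriteLine("same type:   " + TimeCycling(same, iterations) + " ms");
        Console.WriteLine("mixed types: " + TimeCycling(mixed, iterations) + " ms");
    }
}
```

Both runs pay the same array-indexing cost, so any gap between them should come from the changing receiver type at the one call site.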