How fast is interop code?
If you’re in one kind of code and you’re calling another, what is the cost of the interop?
For example, .Net code can call native C++ code (like Windows APIs) and vice versa. Similarly with Foxpro and C++ code. .Net code is often referred to as Managed code because much is managed for the programmer, such as memory allocation. That leaves C++ code to be called “Unmanaged”. An easy way to interop with C++ code is to use COM (Component Object Model, or sometimes ActiveX) as glue. Whether it’s COM calling .Net or vice versa, the managed boundary is traversed twice: there and back. Similarly with Fox code calling COM code.
Fox code calling .Net code (e.g. A Visual Basic COM object is simple to create, call and debug from Excel) will have both Fox to COM and COM to .Net interop.
I want to measure raw interop performance, so I want to remove memory allocation and Unicode/String marshalling issues from the tests. I want to have a loop on one side call a very fast method on the other, so that most of the execution time is in the interop, not the loop or the method call. I want to use in-process, same thread calls, so remote procedure calls/marshalling are not being measured.
We’ll create a native C++ method that just returns consecutive integers. A simple loop in the .Net or Fox client that calls this method and keeps a running total would be a good perf test.
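The shape of that test can be sketched in plain C++. This is only an illustration of the measurement loop, not the Fox or VB client used later: NextInt and TimeLoop are hypothetical names, and the callee here lives in the same module, so no interop boundary is crossed.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the fast method on the far side of the boundary:
// it just returns consecutive integers.
static long g_counter = 0;
long NextInt() { return ++g_counter; }

// Calls fn() nTimes, keeps a running total, and measures the elapsed time.
// Returns {sum, elapsed seconds}.
template <typename Fn>
std::pair<std::int64_t, double> TimeLoop(Fn fn, long nTimes)
{
    std::int64_t sum = 0; // 64 bits so the total doesn't overflow
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < nTimes; ++i)
    {
        sum += fn();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return {sum, elapsed.count()};
}
```

Calling TimeLoop(NextInt, 1000000) returns the running total and the seconds taken. Because the callee does almost nothing, most of the time measured is the cost of making the call.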
Start with this sample ActiveX control code: Create an ActiveX control using ATL that you can use from Fox, Excel, VB6, VB.Net. You don’t need the events and methods from that sample, just the control itself.
(If you’re using VS2008, in the ATL Project wizard, select DLL and just choose “Finish”. When adding a method in Class View, make sure to choose the ITestCtrl Interface (defined in MyCtrl.IDL), not the ITestCtrl VCCodeStruct defined in MyCtrl_i.h. Similarly, if you’re adding an event, make sure to choose the _ITestCtrlEvents interface under MyCtrlLib in Class View. Also, you need to run the “Implement ConnectionPoint Wizard” and change the call to “Fire_MyEvent”, see https://msdn.microsoft.com/en-us/library/9h7xedd1.aspx)
When COM code is called from VB.Net or FoxPro, the calls are not quite direct: COM is used for creating and initializing the object, and some parameter/return value massaging is required per call. Then it’s either a call through IDispatch (late bound) or a straight virtual function (vTable) call on the interface (early bound). IOW, the performance would be slower than a direct PInvoke or DECLARE DLL call.
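The difference between the two kinds of call can be simulated in portable C++. This is not real COM: ICounter, Counter and InvokeByName are hypothetical names, and the name table only stands in for the lookup work a late-bound call has to do before it reaches the method.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical interface standing in for ITestCtrl; this is not real COM.
struct ICounter
{
    virtual long RetInt() = 0;
    virtual ~ICounter() = default;
};

class Counter : public ICounter
{
public:
    Counter()
    {
        // Name table used only by the late-bound path.
        m_methods["RetInt"] = [this] { return RetInt(); };
    }
    Counter(const Counter&) = delete;            // the table captures 'this'
    Counter& operator=(const Counter&) = delete;

    // Early bound: the caller knows the interface, so this is one virtual call.
    long RetInt() override { return ++m_value; }

    // Late bound: look the method up by name on every call, then invoke it.
    long InvokeByName(const std::string& name)
    {
        auto it = m_methods.find(name);
        if (it == m_methods.end())
        {
            throw std::runtime_error("unknown method: " + name);
        }
        return it->second();
    }

private:
    long m_value = 0;
    std::unordered_map<std::string, std::function<long()>> m_methods;
};
```

Both paths end up in the same RetInt, but the late-bound one pays for a string hash and a table lookup first, and it can fail at run time if the name doesn't exist.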
Let’s add a simple method RetInt with no parameters that just returns an int. Add a method to our COM Control by right clicking on the ITestCtrl interface in Class View and choosing Add->Method to start the “Add Method Wizard”
Since all COM interface method calls return HRESULTs, returning a value requires an additional parameter that is marked with the RetVal attribute and passed by reference. So make the Method Name “RetInt”, the Parameter type “LONG *”, and the Parameter Name “RetVal”. Check the Retval checkbox. Then choose “Add” to add the param to the method.
Add another method DoSum similarly. This method will run with no interop, so we have a baseline for comparison. (It runs the loop multiple times because it goes so much faster, but the timing measurement divides out the multiple runs.)
The resulting code is added to TestCtrl.CPP. Add the implementation:
static LONG g_Int = 0;
STDMETHODIMP CTestCtrl::RetInt(LONG* RetVal)
{
    *RetVal = ++g_Int; // just return consecutive integers
    return S_OK;
}
// DoSum will calculate the value with no interop whatsoever
STDMETHODIMP CTestCtrl::DoSum(LONG nTimes, LONG nInternalLoopCount, DOUBLE* Retval)
{
    LONGLONG nSum = 0; // initialized so the result is defined even if nInternalLoopCount is 0
    for (LONG j = 0 ; j < nInternalLoopCount ; j++) // this code runs so fast we have to do it multiple times
    {
        nSum = 0;
        for (LONG i = 1 ; i <= nTimes ; i++)
        {
            nSum += i;
        }
    }
    *Retval = (DOUBLE)nSum;
    return S_OK;
}
// RetIntStatic can be called directly via PInvoke or Declare Dll
extern "C" HRESULT __declspec(dllexport) WINAPI RetIntStatic(LONG *RetVal)
{
    *RetVal = ++g_Int;
    return S_OK;
}
You can add more methods, like a way to reset g_Int to get more accurate results, but I don’t really care about the results, just how long it takes to get them.
Of course, you’ll want to run perf tests using optimized Release builds, so you’re not including debug asserts, etc. A really smart optimizing compiler would remove the loops in DoSum altogether!
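To see why the loops are removable in principle, note that the sum 1 + 2 + … + n has the closed form n(n+1)/2. The sketch below compares the two; SumByLoop and SumClosedForm are hypothetical names for illustration, not part of the control.

```cpp
#include <cstdint>

// The same work DoSum's inner loop does: add 1..nTimes into a 64-bit total.
// (Assumes nTimes is below the 32-bit maximum so the loop counter can't overflow.)
std::int64_t SumByLoop(std::int32_t nTimes)
{
    std::int64_t sum = 0;
    for (std::int32_t i = 1; i <= nTimes; ++i)
    {
        sum += i;
    }
    return sum;
}

// What the loop could be reduced to: the closed form n * (n + 1) / 2.
std::int64_t SumClosedForm(std::int32_t nTimes)
{
    if (nTimes <= 0)
    {
        return 0; // the loop body never runs
    }
    const std::int64_t n = nTimes; // widen first so the product is 64 bit
    return n * (n + 1) / 2;
}
```

If a compiler made this substitution, the DoSum timing would no longer measure a loop at all, so it’s worth checking the generated code before trusting the baseline.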
If you have Foxpro, try running this Fox code. Notice that DoLoop can take either the Form or the Control as a parameter. There’s a RetInt method on each.
CLEAR ALL
CLEAR
MODIFY COMMAND PROGRAM() NOWAIT
_screen.FontName="Courier New" && Make font monospace, not proportional
SET DECIMALS TO 6
g_Int=0
ox=CREATEOBJECT("MyForm")
*ox.visible=1
nLoops=1000000
nInternalLoopCnt=1000
ns=SECONDS()
zObj=ox.oc && use temp var so we don't deref ox.oc in loop
r = zObj.DoSum(nLoops,nInternalLoopCnt)
?"Internal DoSum ",r,(SECONDS()-ns)/nInternalLoopCnt
?DoLoop(ox,nLoops, "With No Interop" )
?DoLoop(ox.oc,nLoops,"With COM Interop")
*Use early binding:
oy=CREATEOBJECTEx("MyCtrl.TestCtrl","","")
?DoLoop(oy,nLoops,"With COM Interop Early Bound")
*Try Declare DLL: like PInvoke
DECLARE integer _RetIntStatic@4 IN "d:\dev\vc\myctrl\release\myctrl.dll" as RetIntStatic integer @ Retval
?DoLoopStatic(nLoops,"With DeclareDLL interop")
FUNCTION DoLoop(zObj as object,nTimes as Integer, sDesc as String) as String
    LOCAL nSum
    nSum=0
    ns=SECONDS()
    FOR i = 1 TO nTimes
        nSum = nSum + zObj.RetInt()
    ENDFOR
    RETURN sDesc+" Sum= "+TRANSFORM(nSum) + " "+ TRANSFORM(SECONDS()-ns)
RETURN
FUNCTION DoLoopStatic(nTimes as Integer, sDesc as String) as String
    LOCAL nSum
    nSum=0
    nRetval=0
    ns=SECONDS()
    FOR i = 1 TO nTimes
        RetIntStatic(@nRetval)
        nSum = nSum + nRetval
    ENDFOR
    RETURN sDesc+" Sum= "+TRANSFORM(nSum) + " "+ TRANSFORM(SECONDS()-ns)
RETURN
DEFINE CLASS MyForm as Form
    ADD OBJECT OC as olecontrol WITH ;
        oleClass="MyCtrl.TestCtrl",;
        height=200,width=300
    left=200
    AllowOutput=.f.
    PROCEDURE RetInt as Integer
        g_Int = g_Int+1
        RETURN g_Int
ENDDEFINE
The DoSum call (Fox and VB) was consistent as expected: they both execute in about the same time because there is only one interop call.
I consistently saw the COM Interop loop taking about 50% longer than the non interop loop. This makes sense. The code that calls the COM object has to deal with all sorts of parameter types, marshalling, etc. The non interop did the entire calculation within Fox code.
The DoSum method, which has its own internal loop to do the calculation and does NO interop of any kind in that loop, runs roughly 2000 times faster. That implies the Fox loops execute about 2000 times more instructions per iteration.
Now I want to run a similar test using VB.Net. Let’s add a new project to the ActiveX control project from above.
Choose the Solution Explorer, right click on the solution, choose Add New Project, VB->Windows Forms Application. I put my VB Project within the folder of the TestCtrl project.
Right click on the project, and choose “Set As Startup Project” so hitting F5 will start this project.
If you’re on a 64 bit OS, then make sure you target x86 (Project->Properties->Compile->Advanced Compile Options->Target CPU->x86)
Add the ActiveX control to your toolbox: Right click on the toolbox, choose items\COM Components…TestCtrl class.
Now drag the control from the toolbox onto the form. Double-click on the form and paste in this code:
Public Class Form1
    'Note the path: "..\..\..\Release\MyCtrl.dll"
    <Runtime.InteropServices.DllImport( _
        "..\..\..\Release\MyCtrl.dll", _
        CallingConvention:=Runtime.InteropServices.CallingConvention.Winapi, _
        entrypoint:="_RetIntStatic@4")> _
    Friend Shared Function RetIntStatic(ByRef RetVal As Integer) As Integer
    End Function
    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim nLoops = 1000000
        Dim nInternalLoopCnt = 1000
        Dim sStopWatch = Stopwatch.StartNew
        Dim r = Me.AxTestCtrl1.DoSum(nLoops, nInternalLoopCnt)
        Console.WriteLine("Internal DoSum Native=" + r.ToString + " " + (sStopWatch.ElapsedMilliseconds / 1000 / nInternalLoopCnt).ToString)
        sStopWatch = Stopwatch.StartNew
        r = Me.DoSum(nLoops, nInternalLoopCnt)
        Console.WriteLine("Internal DoSum .Net =" + r.ToString + " " + (sStopWatch.ElapsedMilliseconds / 1000 / nInternalLoopCnt).ToString)
        Console.WriteLine(DoLoop(Me, nLoops, "With Late bound No interop, calling local VB.Net method"))
        Console.WriteLine(DoLoop(Me.AxTestCtrl1, nLoops, "With Late bound COM interop"))
        Console.WriteLine(DoLoopEarlyForm(Me, nLoops, "With Early bound No interop, calling local VB.Net method"))
        Console.WriteLine(DoLoopEarlyCtrl(Me.AxTestCtrl1, nLoops, "With Early bound COM interop"))
        Console.WriteLine(DoLoopPInvoke(nLoops, "With PInvoke interop"))
        End
    End Sub
    Function DoLoop(ByVal zObj As Object, ByVal nTimes As Integer, ByVal sDesc As String) As String
        Dim nSum = 0L
        Dim sStopWatch = Stopwatch.StartNew
        For i = 1 To nTimes
            nSum += zObj.RetInt
        Next
        Return sDesc + " Sum = " + nSum.ToString + " " + (sStopWatch.ElapsedMilliseconds / 1000).ToString()
    End Function
    Function DoLoopEarlyForm(ByVal zObj As Form1, ByVal nTimes As Integer, ByVal sDesc As String) As String
        Dim nSum = 0L
        Dim sStopWatch = Stopwatch.StartNew
        For i = 1 To nTimes
            nSum += zObj.RetInt
        Next
        Return sDesc + " Sum = " + nSum.ToString + " " + (sStopWatch.ElapsedMilliseconds / 1000).ToString()
    End Function
    Function DoLoopEarlyCtrl(ByVal zObj As AxMyCtrlLib.AxTestCtrl, ByVal nTimes As Integer, ByVal sDesc As String) As String
        Dim nSum = 0L 'L for Long so doesn't overflow 32 bits
        Dim sStopWatch = Stopwatch.StartNew
        For i = 1 To nTimes
            nSum += zObj.RetInt
        Next
        Return sDesc + " Sum = " + nSum.ToString + " " + (sStopWatch.ElapsedMilliseconds / 1000).ToString()
    End Function
    Function DoLoopPInvoke(ByVal nTimes As Integer, ByVal sDesc As String) As String
        Dim nSum = 0L 'L for Long so doesn't overflow 32 bits
        Dim sStopWatch = Stopwatch.StartNew
        For i = 1 To nTimes
            Dim RetVal = 0
            RetIntStatic(RetVal)
            nSum += RetVal
        Next
        Return sDesc + " Sum = " + nSum.ToString + " " + (sStopWatch.ElapsedMilliseconds / 1000).ToString()
    End Function
    Private Shared g_Int As Long
    Public Function RetInt() As Long
        g_Int += 1
        Return g_Int
    End Function
    Function DoSum(ByVal nTimes As Long, ByVal nInternalLoopCount As Long) As Double
        Dim nSum As Long
        For j = 1 To nInternalLoopCount ' calculated multiple times because it's fast
            nSum = 0
            For i = 1 To nTimes
                nSum += i
            Next
        Next
        Return nSum
    End Function
End Class
Run the code with the Output Window visible. Here, the VB code with interop ran maybe 40% slower, and several times slower than the Fox code. I realized that this was because of the late bound calls the VB code makes. The VB DoLoop method takes zObj as an Object, and I invoke the RetInt method on it. That means the VB runtime late binder is called to reflect on the object and see whether it has a RetInt method that can be called. Both the Form and the control have a method with this name. The late binding was code that I didn’t want to measure, so I added some strongly typed calls that forced early bound direct calls, which were much faster. For the non-interop code doing the entire calculation within VB, the late bound call was around 1000 times slower than the early bound one, due to the late binder code. For the interop case, the late bound call was about 30 times slower than the early bound one.
(Comparing .Net speed with native, the DoSum call (with no interop at all) in .Net was almost 3 times slower than Native, but that’s expected too: native code runs faster than managed.)
These early bound calls are several times faster than the Fox code too: they don’t have to do any parameter packing/checking.
However, even when the Fox code uses early binding, Fox still has to do a lot of parameter translation between Fox types and COM types.
The Fox and VB calls via PInvoke/Declare DLL were the fastest of all. They have to do the least parameter translation/packing/checking. This makes sense: the method call is declared to have N parameters of certain types, so less work needs to be done.
Using ILDasm to see the IL for RetInt, you can see that there isn’t much code. The Fox code for RetInt, however, causes much more code to run.
.method public instance int64 RetInt() cil managed
{
// Code size 24 (0x18)
.maxstack 2
.locals init ([0] int64 RetInt)
IL_0000: nop
IL_0001: ldsfld int64 WindowsApplication1.Form1::g_Int
IL_0006: ldc.i4.1
IL_0007: conv.i8
IL_0008: add.ovf
IL_0009: stsfld int64 WindowsApplication1.Form1::g_Int
IL_000e: ldsfld int64 WindowsApplication1.Form1::g_Int
IL_0013: stloc.0
IL_0014: br.s IL_0016
IL_0016: ldloc.0
IL_0017: ret
} // end of method Form1::RetInt
Or use the debugger to see the native code in DoSum: (cdq is Convert Doubleword to Quadword, which sign-extends EAX into EDX:EAX)
LONGLONG nSum=0;
for (LONG i = 1 ; i <= nTimes ; i++)
{
692B1DF1 8B C1 mov eax,ecx
692B1DF3 99 cdq
692B1DF4 03 D8 add ebx,eax
692B1DF6 13 EA adc ebp,edx
692B1DF8 8D 41 01 lea eax,[ecx+1]
692B1DFB 99 cdq
692B1DFC 03 F0 add esi,eax
692B1DFE 8B 44 24 20 mov eax,dword ptr [esp+20h]
692B1E02 13 FA adc edi,edx
692B1E04 83 C1 02 add ecx,2
692B1E07 48 dec eax
692B1E08 3B C8 cmp ecx,eax
692B1E0A 7E E5 jle CTestCtrl::DoSum+21h (692B1DF1h)
692B1E0C 3B 4C 24 20 cmp ecx,dword ptr [esp+20h]
692B1E10 7F 0B jg CTestCtrl::DoSum+4Dh (692B1E1Dh)
{
nSum += i;
692B1E12 8B C1 mov eax,ecx
692B1E14 99 cdq
692B1E15 89 44 24 10 mov dword ptr [esp+10h],eax
692B1E19 89 54 24 14 mov dword ptr [esp+14h],edx
}
This (optimized) code sums 32 bit values to a 64 bit running sum, so you can see instructions like “ADC”, which is AddWithCarry.
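What the cdq/add/adc sequence does can be mimicked in portable C++ by holding a 64 bit value as two 32 bit halves. This is a sketch for illustration only: Pair64, SignExtend, AddWithCarry and ToInt64 are hypothetical names, not anything the compiler or the control defines.

```cpp
#include <cstdint>

// A 64-bit value held as two 32-bit halves, the way the 32-bit x86 code
// keeps the running sum in a pair of registers.
struct Pair64
{
    std::uint32_t lo;
    std::uint32_t hi;
};

// Mimics cdq: sign-extend a 32-bit value into a high:low pair.
Pair64 SignExtend(std::int32_t value)
{
    Pair64 p;
    p.lo = static_cast<std::uint32_t>(value);
    p.hi = (value < 0) ? 0xFFFFFFFFu : 0u;
    return p;
}

// Mimics add (low halves) followed by adc (high halves plus the carry).
Pair64 AddWithCarry(Pair64 a, Pair64 b)
{
    Pair64 r;
    r.lo = a.lo + b.lo;                                   // add: may wrap around
    const std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;  // wrapped => carry out
    r.hi = a.hi + b.hi + carry;                           // adc
    return r;
}

// Reassemble the two halves into one signed 64-bit value.
std::int64_t ToInt64(Pair64 p)
{
    return static_cast<std::int64_t>(
        (static_cast<std::uint64_t>(p.hi) << 32) | p.lo);
}
```

Each iteration of the 32 bit loop pays for the sign extension and the second add, which is the extra work a native 64 bit add avoids.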
As an exercise, on 64 bit, create code like DoSum that natively handles 64 bit ints (or modify this code to use just 32 bits). You’ll see that the loop is trivial.
Hint: make sure you have the 64 bit tools installed.
Comments
Anonymous
May 29, 2008
But where is the conclusion? How fast is interop code? I still do not know.
Anonymous
May 29, 2008
Thanks for the interesting and well thought-out post. Overall I agree with your objective and methodology on measuring pure interop overhead. But, string marshalling and memory allocations are very real costs of most interop code I've seen, so I think in all fairness they should be included, to some degree, in a real-world discussion/measurement of interop performance. Again, thanks for the analysis and post!
Anonymous
May 29, 2008
Nice article. Good comparison. Once I was working on a project that ran into performance issues due to parameter checking/packing, because I was passing too many parameters and, moreover, I was calling MFC COM code from VB6, which was taking too much time. So I converted my whole VB6 ActiveX project to an MFC ActiveX project to overcome the performance issue. And you won’t believe it: the application was 73 times faster.
Anonymous
May 29, 2008
So the moral of the story is to reduce the number of calls to interop. It sure would be cool to see some examples of that strategy. Good article though. Gets the right idea across.
Anonymous
May 29, 2008
Excellent build-up, but there was no crescendo. Where are your conclusions (timings)?
Anonymous
May 29, 2008
If the point is to prove it's better to minimize interop and COM calls, I think you've made it. What does it mean in practical terms though? If I make one interop call to VFP from ASP.net and VFP does 100 calculations within itself (using the built-in db/cursor engine) and returns the result, couldn't this still be faster than doing the same sort of routine in managed code using a SQL backend? In that case interop might still be the way to go?
Anonymous
May 29, 2008
I think there are no figures published because when you register VS, you agree to license terms that prohibit using the product for any kind of benchmarking or performance-related comparisons of the Microsoft technologies. Read the license terms :-)
Anonymous
May 30, 2008
If you are using C++ and going from managed to unmanaged code I would avoid COM. That just adds an extra layer of marshalling. The VC++ compiler in Visual Studio 2005/2008 has efficient C++ .Net Interop code generation. Though, if you are using higher level languages, using COM might be the only way to go.
Anonymous
May 30, 2008
Any chance MS will release a code generator that automatically generates a C++ interop wrapper for a win32 call? This C++ wrapper could be included in a C# project for example. Generally, MS could make it a goal to have successive parts of the win32 api replaced with .NET equivalents over the next 2 years. Specialized APIs, such as device driver ones, are excluded.
Anonymous
June 02, 2008
If you are concerned about performance, why use .NET in the first place?
Anonymous
June 12, 2008
Hello Calvin, I've been a VFP programmer for at least 13 years. Thanks for being there sharing your ideas. My question is: what language are you most comfortable with? C#, C++, VB, etc. (to build desktop applications, of course). And if you were thinking of replacing VFP (in my case my only programming language), which one would you suggest shifting to? Thanks, Ramses Reinoso
Anonymous
June 12, 2008
Of course interop will have greater impact with the number of parameters also. So when looking for speed, sometimes it pays to serialise parameters to an XML string and only marshal the one string parameter ;o)
Anonymous
February 12, 2009
See this post for ideas to get started: Create a .Net UserControl that calls a web service that acts as an ActiveX control to use in Excel, VB6, Foxpro http://blogs.msdn.com/calvin_hsia/archive/2006/07/14/665830.aspx