Reverse P/Invoke Marshaling Performance
Platform Invoke (P/Invoke) allows managed code to call unmanaged functions exported by a DLL. Reverse P/Invoke goes the other way: managed code passes a managed delegate to native code, and native code uses it as a callback into managed code.
Both P/Invoke and reverse P/Invoke require parameter marshaling between managed code and native code. However, they may have different marshaling performance characteristics.
The example below shows three ways to marshal a string in reverse P/Invoke, and the performance of each.
First, the native code:
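#include "stdafx.h"

// Callback type. A delegate marshaled to a native function pointer uses the
// stdcall (WINAPI) convention by default on 32-bit Windows, so the pointer
// type states it explicitly.
typedef void (WINAPI *RunTest)(LPWSTR string);

void WINAPI PerfTest(RunTest test, int length, int loop)
{
    // Build a null-terminated test string of `length` 'a' characters.
    LPWSTR string = new WCHAR[length+1];
    LARGE_INTEGER begin;
    LARGE_INTEGER end;
    for(int i=0;i<length;i++) string[i]=L'a';
    string[length]=L'\0';

    printf("runnning function %p %d time with string of length %d.\n", test, loop, length);
    if (loop >0)
    {
        // Time `loop` calls into the managed callback.
        QueryPerformanceCounter(&begin);
        for(int i=0;i<loop;i++)
        {
            test(string);
        }
        QueryPerformanceCounter(&end);
        // The values printed are performance-counter ticks.
        printf("Total CPU Cycle: %I64d.\n", end.QuadPart - begin.QuadPart);
        printf("Average CPU Cycle: %I64d.\n", (end.QuadPart - begin.QuadPart)/loop);
    }
    delete[] string;
}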
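The native code above prints QueryPerformanceCounter deltas under the label "CPU Cycle". Those values are performance-counter ticks, and the tick rate depends on the machine; dividing by the value reported by QueryPerformanceFrequency turns them into time. The conversion might look like the minimal sketch below. The helper name TicksToNanoseconds is hypothetical and not part of the original program, and it is written as portable C++ so the arithmetic can be checked without Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not in the original program): convert a
// QueryPerformanceCounter tick delta to nanoseconds.
// `frequency` is ticks per second, as QueryPerformanceFrequency reports it.
// Whole seconds and the remainder are converted separately so the
// intermediate product stays within 64 bits for frequencies below ~9e9.
inline std::int64_t TicksToNanoseconds(std::int64_t ticks, std::int64_t frequency)
{
    assert(frequency > 0);
    const std::int64_t kNanosPerSecond = 1000000000LL;
    const std::int64_t wholeSeconds = ticks / frequency;
    const std::int64_t remainder = ticks % frequency;
    return wholeSeconds * kNanosPerSecond
         + (remainder * kNanosPerSecond) / frequency;
}
```

On Windows, the frequency would come from calling QueryPerformanceFrequency once and passing its QuadPart. The tick rate of the machine that produced the results in this post is not known, so the published figures are best compared with each other rather than read as absolute times.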
And the managed code:
using System;
using System.Runtime.InteropServices;

// Compile with /unsafe: Target3 uses a pointer.
public class Test
{
    public static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("Usage: rpinvoke stringlength loopcount");
            return;
        }
        int length = Int32.Parse(args[0]);
        int loop = Int32.Parse(args[1]);

        // Each method group binds to the PerfTest overload whose delegate
        // type matches the target's parameter type.
        Console.WriteLine("Testing MarshalAs(UnamangedType,LPWStr) with string length {0}.", length);
        PerfTest(Target1, length, loop);
        Console.WriteLine("Testing Marshal.PtrToStringUni with string length {0}.", length);
        PerfTest(Target2, length, loop);
        Console.WriteLine("Testing new string((char *) with string length {0}.", length);
        PerfTest(Target3, length, loop);
    }

    // Method 1: the interop marshaler creates the string before the call.
    static void Target1(string _str)
    {
    }

    // Method 2: receive the raw pointer and convert it with Marshal.PtrToStringUni.
    static void Target2(IntPtr _str)
    {
        string str = Marshal.PtrToStringUni(_str);
    }

    // Method 3: receive the raw pointer and use the string(char*) constructor.
    unsafe static void Target3(IntPtr _str)
    {
        string str = new string((char*)_str);
    }

    public delegate void CallBackDelegate(IntPtr str);
    public delegate void CallBackDelegate2([MarshalAs(UnmanagedType.LPWStr)]string str);

    [DllImport("nativedll.dll")]
    internal static extern void PerfTest(CallBackDelegate callback, int length, int loop);
    [DllImport("nativedll.dll")]
    internal static extern void PerfTest(CallBackDelegate2 callback, int length, int loop);
}
And the result:
C:\temp>rpinvoke.exe
Usage: rpinvoke stringlength loopcount
C:\temp>rpinvoke.exe 10 1000000
Testing MarshalAs(UnamangedType,LPWStr) with string length 10.
runnning function 0034209A 1000000 time with string of length 10.
Total CPU Cycle: 3278641482.
Average CPU Cycle: 3278.
Testing Marshal.PtrToStringUni with string length 10.
runnning function 003421D2 1000000 time with string of length 10.
Total CPU Cycle: 2944844982.
Average CPU Cycle: 2944.
Testing new string((char *) with string length 10.
runnning function 003422E2 1000000 time with string of length 10.
Total CPU Cycle: 227894670.
Average CPU Cycle: 227.
C:\temp>rpinvoke.exe 100 1000000
Testing MarshalAs(UnamangedType,LPWStr) with string length 100.
runnning function 0034209A 1000000 time with string of length 100.
Total CPU Cycle: 4376874681.
Average CPU Cycle: 4376.
Testing Marshal.PtrToStringUni with string length 100.
runnning function 003421D2 1000000 time with string of length 100.
Total CPU Cycle: 4277364030.
Average CPU Cycle: 4277.
Testing new string((char *) with string length 100.
runnning function 003422E2 1000000 time with string of length 100.
Total CPU Cycle: 825768279.
Average CPU Cycle: 825.
In both runs, the first two methods perform about the same, and the third is significantly faster. For small strings (10 characters), the third method is more than 10x faster (roughly 13x to 14x). For larger strings (100 characters), the gap narrows but is still about 5x. Note that the figures are QueryPerformanceCounter ticks, which the program labels as CPU cycles.
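The ratios quoted above can be reproduced from the averages in the output, and the same numbers give a rough idea of why the gap narrows as the string grows. The sketch below is only arithmetic on the published figures; the names Sample, Speedup, TicksPerExtraChar and FixedCostEstimate are hypothetical. The fixed-cost estimate is a straight-line extrapolation from just two measured lengths, so it should be read as an indication, not a measurement.

```cpp
#include <cassert>

// Average ticks per callback, copied from the output above.
struct Sample { double at10; double at100; };

const Sample kMarshalAs      = {3278.0, 4376.0}; // MarshalAs(UnmanagedType.LPWStr)
const Sample kPtrToStringUni = {2944.0, 4277.0}; // Marshal.PtrToStringUni
const Sample kNewString      = { 227.0,  825.0}; // new string((char*)ptr)

// How many times slower `slow` is than `fast`.
inline double Speedup(double slow, double fast)
{
    assert(fast > 0.0);
    return slow / fast;
}

// Extra ticks per additional character between the two measured lengths.
inline double TicksPerExtraChar(const Sample& s)
{
    return (s.at100 - s.at10) / (100.0 - 10.0);
}

// Linear extrapolation to length 0: a rough estimate of the per-call cost
// that does not depend on string length (assumes cost grows linearly).
inline double FixedCostEstimate(const Sample& s)
{
    return s.at10 - 10.0 * TicksPerExtraChar(s);
}
```

With these inputs, Speedup gives about 14.4 and 13.0 at length 10, and about 5.3 and 5.2 at length 100. TicksPerExtraChar comes out near 12.2, 14.8 and 6.6 for the three methods, while FixedCostEstimate is roughly 3156, 2796 and 161. Under the linear assumption, most of the difference is a fixed per-call cost in the first two methods, which would explain why the ratio shrinks as the per-character work grows.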
Comments
Anonymous
July 09, 2007
Can you explain why there is such a large difference?
Anonymous
July 10, 2007
Yeah, I wondered that. So I fired up .NET Reflector. Marshal.PtrToStringUni allocates a StringBuilder, copies in the new string data, then allocates a new string from the StringBuilder. In other words, it seems to be doing twice as much work as necessary. (The further overhead may be extra GC work, but I'm speculating.) This should probably be filed as a performance bug.