Saturday, December 03, 2005 

PInvoke and IJW

One of the additions in C++/CLI was a new way to call unmanaged code from managed code, which the Microsoft guys called IJW, short for "It Just Works".
The older way of invoking unmanaged code from managed code, the one used in C#, is called "Platform Invoke", or simply PInvoke.

I was very interested when I read in an article that IJW performs far better than PInvoke. The writer wrote two small programs: one in C#, which used PInvoke to call an unmanaged function, and one in C++, which used IJW to call the same unmanaged function.
But in one of the comments on the article, someone noted that the C# code pays a performance overhead for converting a StringBuilder into a char array. When this overhead was removed by using a data type other than StringBuilder, the C# program using PInvoke became the faster one!

I thought I would try it myself. I modified the two programs to make them as similar as possible. The input to the unmanaged function was an array of TCHAR. In C# I used a char array instead of a StringBuilder or even a string, to keep the C++ and C# code as close as possible. I used the QueryPerformanceCounter function to measure the elapsed time as accurately as possible.
Here are the two programs. I am sorry that this blogging application has no C++ or C# syntax highlighting, so the code may be somewhat difficult to read.

The C++ program:


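#include <windows.h>

using namespace System;

// Calls the unmanaged GetWindowsDirectory function x times through IJW.
void Test(int x)
{
    TCHAR Buffer[512];
    for (int i = 0; i < x; i++)
        GetWindowsDirectory(Buffer, 511);
}

// Times x calls with QueryPerformanceCounter and prints the elapsed counter value.
void DoTest(int x)
{
    LARGE_INTEGER Initial;
    if (!QueryPerformanceCounter(&Initial))
    {
        Console::WriteLine("Failed");
        return;
    }
    Test(x);
    LARGE_INTEGER Final;
    if (QueryPerformanceCounter(&Final))
    {
        // The difference is in counter ticks; the frequency printed in main gives ticks per second.
        __int64 ElapsedTime = Final.QuadPart - Initial.QuadPart;
        Console::WriteLine("It Took {0} cycles to perform {1} iterations", ElapsedTime, x);
    }
    else
    {
        Console::WriteLine("Failed");
    }
}

int main()
{
    LARGE_INTEGER Frequency;
    QueryPerformanceFrequency(&Frequency);
    // Format the value explicitly, as the C# version does.
    Console::WriteLine("Counter frequency = {0}", Frequency.QuadPart);
    DoTest(100);
    DoTest(1000);
    DoTest(100000);
    DoTest(1000000);
    Console::Read();
    return 0;
}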

The C# Program:

using System;
using System.Runtime.InteropServices;
using System.Security;

class Program
{
    [SuppressUnmanagedCodeSecurity]
    [DllImport("Kernel32.dll")]
    static extern bool QueryPerformanceCounter(out long PerformanceCount);

    [SuppressUnmanagedCodeSecurity]
    [DllImport("Kernel32.dll")]
    static extern bool QueryPerformanceFrequency(out long PerformanceFrequency);

    [SuppressUnmanagedCodeSecurity]
    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
    static extern uint GetWindowsDirectory([Out] char[] Buffer, uint uSize);

    // Calls the unmanaged GetWindowsDirectory function x times through PInvoke.
    static void Test(int x)
    {
        char[] Buffer = new char[512];
        for (int i = 0; i < x; i++)
        {
            GetWindowsDirectory(Buffer, 511);
        }
    }

    // Times x calls with QueryPerformanceCounter and prints the elapsed counter value.
    static void DoTest(int x)
    {
        long Initial;
        if (!QueryPerformanceCounter(out Initial))
        {
            Console.WriteLine("Failed");
            return;
        }
        Test(x);
        long Final;
        if (QueryPerformanceCounter(out Final))
        {
            // The difference is in counter ticks; the frequency printed in Main gives ticks per second.
            long ElapsedTime = Final - Initial;
            Console.WriteLine("It Took {0} cycles to perform {1} iterations", ElapsedTime, x);
        }
        else
        {
            Console.WriteLine("Failed");
        }
    }

    static void Main()
    {
        long PerformanceFrequency;
        QueryPerformanceFrequency(out PerformanceFrequency);
        Console.WriteLine("Performance Frequency = {0}", PerformanceFrequency);
        DoTest(100);
        DoTest(10000);
        DoTest(100000);
        DoTest(1000000);
        Console.Read();
    }
}

The result was that the C# program was faster by a small margin. This seemed strange to me, because I expected the C++ program to be faster.

I used the ILdasm tool to look at the IL of the two generated exe files. Something caught my attention: in the C++ application, the call to the unmanaged function GetWindowsDirectory is made using the stdcall calling convention. This was the IL for the GetWindowsDirectory function:

uint32 modopt([mscorlib/*23000001*/]System.Runtime.CompilerServices.CallConvStdcall/*01000001*/) GetWindowsDirectoryW(char*, uint32) /* 06000058 */

On the other hand, in the IL of the C# program the imported methods were not marked with the stdcall calling convention; the winapi calling convention was used instead. This is the IL of one of the PInvoke declarations in the C# application (the one shown here is QueryPerformanceCounter):

.method /*06000001*/ private hidebysig static pinvokeimpl("Kernel32.dll" winapi) bool QueryPerformanceCounter([out] int64& PerformanceCount) cil managed preservesig

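I searched the documentation and found that stdcall is described as the default calling convention when using PInvoke. That did not match the winapi marker I saw in the IL. (As far as I can tell, winapi is not a calling convention of its own; it tells the runtime to use the platform's default convention.) To remove any doubt, I forced the function call to use the stdcall calling convention by specifying it explicitly in the DllImport attribute, like this: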

[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto, CallingConvention = CallingConvention.StdCall)]

I tested the two programs again and found that the C++ application was the faster one this time, but again only by a small margin.

Here are some notes to keep in mind when comparing PInvoke and IJW:

  1. Theoretically, a managed data type should be marshaled to the same unmanaged type under PInvoke as under IJW. This is not true in all cases. Arjun Bijanki from the Visual C++ team has noted in a discussion that some data types can be marshaled to different types under IJW than under PInvoke.
  2. The MSIL generated for PInvoke is not the same as that generated for IJW, even for very similar code.
  3. Not all things mentioned in the docs are true!
  4. From the syntax point of view, the IJW method is easier and more elegant. It is strange to find something that can be done more easily and conveniently in C++ than in C#, but it is true. Microsoft has played it well this time!