Sunday, May 11, 2008

C++/CLI vs C# (4)

Finished the 2nd point proposed for analysis:

2. Study the unmanaged to managed wrapping methods and make a performance estimation for both managed languages

Testing system

Single-core Celeron M processor, 1.5 GHz.
Each test executed the method call 100 million times.

Tested code
A C++ DLL contains a class with a method that takes a structure pointer as its parameter (code shown below). The purpose of the exercise is to wrap the C++ class in a managed class and use that managed class to pass a structure as a parameter.

struct MyStruct
{
    int SomeId;
    double SomePrice;
};

class EXAMPLEUNMANAGEDDLL_API TestInterop
{
public:
    TestInterop();
    void PassStructIn(MyStruct* myStruct);
    void PassStructIn(int* myStruct);
    virtual ~TestInterop();
};

Also provided a C bridge that allows calling the class method by passing the object pointer as a parameter (needed for P/Invoke calls):
extern "C" EXAMPLEUNMANAGEDDLL_API void PassStructInBridge(TestInterop* pObject, MyStruct* myStruct);
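
The bridge body is not shown in this post, so here is a minimal sketch of what such a forwarding function could look like. The empty EXAMPLEUNMANAGEDDLL_API definition, the method body, and the LastId/Total accessors are assumptions added only so the sketch compiles on its own; the real DLL may do something entirely different inside PassStructIn.

```cpp
#include <cassert>

// Assumption: in the real DLL this macro expands to an export/import
// attribute; it is defined empty here so the sketch compiles anywhere.
#define EXAMPLEUNMANAGEDDLL_API

struct MyStruct
{
    int SomeId;
    double SomePrice;
};

class EXAMPLEUNMANAGEDDLL_API TestInterop
{
public:
    TestInterop() : lastId_(0), total_(0.0) {}

    // Hypothetical body: the post does not show what the method does.
    void PassStructIn(MyStruct* myStruct)
    {
        lastId_ = myStruct->SomeId;
        total_ += myStruct->SomePrice;
    }

    virtual ~TestInterop() {}

    // Hypothetical accessors, added only to make the effect observable.
    int LastId() const { return lastId_; }
    double Total() const { return total_; }

private:
    int lastId_;
    double total_;
};

// C linkage gives the function an unmangled name that P/Invoke can import.
// The bridge only forwards the call to the object it receives.
extern "C" EXAMPLEUNMANAGEDDLL_API void PassStructInBridge(TestInterop* pObject, MyStruct* myStruct)
{
    assert(pObject != nullptr && myStruct != nullptr);
    pObject->PassStructIn(myStruct);
}
```

The point of the design is that the managed side only ever sees a plain function taking two pointers, so it does not need to know anything about the C++ class layout or name mangling.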

Test case results
- native C++ test: called the class method through the C bridge
for 100 million calls the time is 1127 milliseconds
- C# test: imported the C bridge function with P/Invoke, wrapped the P/Invoke functions in a managed class, and created a managed version of the structure passed as parameter. (Note: it is also possible to wrap the class by P/Invoking the mangled method name directly. That approach should have the same performance.)
for 100 million calls the time is 10950 milliseconds
- C++/CLI test: tested two cases. In the first, the managed code used the unmanaged class directly through C++ interop. In the second, a managed wrapper (in a separate DLL) holds a pointer to the unmanaged class and forwards method calls to the unmanaged code (the struct passed as parameter is the definition from the unmanaged header file; no managed version was created).
100 million calls with C++ interop and no bridge/wrapper take 4280 milliseconds
100 million calls with the C++/CLI bridge as managed wrapper take 4031 milliseconds
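
The timing code itself is not included in this post. A rough sketch of how the native loop could be timed is shown below; TimeCallsMs is a hypothetical helper name, and std::chrono is used here as an assumption for illustration (it is not what the original 2008 test used).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `call` callCount times and returns the
// elapsed wall-clock time in milliseconds.
template <typename Fn>
double TimeCallsMs(std::int64_t callCount, Fn&& call)
{
    assert(callCount >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::int64_t i = 0; i < callCount; ++i)
    {
        call();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For the native test the callable would be something like a lambda that invokes PassStructInBridge on a single object and struct, with 100000000 as the call count. An optimizing compiler may inline or remove a call with no visible side effects, so the measured function should do some observable work.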

Conclusions
Wrapping the unmanaged code in C++/CLI (~4000 ms) is roughly 2.5 to 2.7 times faster than wrapping it in C# (~11000 ms), depending on which of the two C++/CLI results is used (10950 ms against 4280 ms or 4031 ms).
This speed difference is probably due to the fact that the C++/CLI case passed the unmanaged struct directly as the parameter, so no marshaling took place.
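
To put the totals in per-call terms, the arithmetic can be sketched as below. NanosecondsPerCall and SlowdownFactor are hypothetical helper names used only for this illustration.

```cpp
#include <cassert>

// Converts a total time in milliseconds into an average cost per call
// in nanoseconds (1 ms = 1.0e6 ns).
double NanosecondsPerCall(double totalMs, double callCount)
{
    assert(callCount > 0);
    return totalMs * 1.0e6 / callCount;
}

// How many times slower one measurement is than another.
double SlowdownFactor(double slowerMs, double fasterMs)
{
    assert(fasterMs > 0);
    return slowerMs / fasterMs;
}
```

With the measured totals and 100 million calls, this gives about 11.3 ns per call for native C++, 109.5 ns for C#, and 42.8 ns and 40.3 ns for the two C++/CLI cases. The C# to C++/CLI ratio comes out at about 2.56 and 2.72 respectively.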

Looking at the test results, I think OpenCascade should be wrapped with C++/CLI if there will be many repetitive calls to the unmanaged code. I will analyze the application structure and think about the functional areas where repetitive calls might be used.

1 comment:

Ciprian Khlud said...

The tests are biased.

Native-to-managed and managed-to-native calls are always made via marshaling. That checking and copying of data is time-consuming.

The native-to-native call you used makes no difference, even if you export it via C linkage.

So every time you make a call from managed C# code, you will lose that marshaling performance.

Anyway, there is the unsafe keyword: http://www.codersource.net/csharp_unsafe_code.html
which lets you avoid losing performance to marshaling (or at least not as much).

C# code, both native-to-native (via the unsafe keyword) and managed-to-managed, performs at the level of STL code in C++, sometimes faster, sometimes slower, but not by a factor of 2.5.

If you expand your testing methodology, I can write unsafe code for you and hopefully get the same speed.

P.S. The strong point of C# over C++ is not speed in some specific benchmark. The strong points are: LINQ, standard XML support, clean syntax, stricter compiler checking, etc. For a specific managed-to-native call, you can reduce the number of calls to that code, which is the same approach taken in scientific applications that use Java.