Performance Tuning a .NET Application
05.05.2009 • 23:10
I have been working on a ray tracing engine in C#, and working on it is a constant battle for performance. Performance is the currency of ray tracing: each time you optimize your code, you get to implement some new effect. This optimization cycle has led to some interesting insights into performance tuning in .NET, and C# specifically.
If there's one thing you need to realize it's this: the more you think you know, the less you really know. You may think that some clever little hack or algorithm will increase performance, but there is only one way to tell: measure! I cannot recount all the times I thought I had made an optimization only to discover that performance was the same, if not actually worse. How can this be? You might see a piece of code which appears to have some obvious optimization, but the thing that will catch you unaware in many (and surprisingly subtle) cases is the CPU cache. The cache stores the data you've accessed most recently very close to the CPU. It has this wonderful ability to make performance tuning seem a very non-deterministic job at times, when something you did a few lines of code earlier indirectly affects the performance of the code executing now. At the level of micro-optimizations, the cache makes the abstraction that each instruction has a specific cost leaky, along with other features of modern CPUs such as out-of-order execution and branch prediction.
The most important aspect of performance tuning, and you'll hear this advice again and again, is that you have to measure the actual performance. You can single out different operations and measure them in a tight loop, but what I find the most useful is a profiler. It gives you the most realistic picture of the application's performance, because the different pieces of code get to influence how the cache is populated, which causes side effects you would not see in isolation. I use the profiler that ships with the Visual Studio Team System edition, and I'm afraid I have to point you to Google for any free alternatives.
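For the tight-loop approach, a minimal sketch could look like the following. The names `MicroBenchmark`, `MeasureMilliseconds` and `DotProducts` are made up for illustration and are not from my ray tracer; the only framework pieces used are `System.Diagnostics.Stopwatch` and `Action`. The warm-up call is there so that JIT compilation is not included in the timing.

```csharp
using System;
using System.Diagnostics;

static class MicroBenchmark
{
    // Calls the action once as a warm-up, then returns the fastest
    // of `runs` timed calls, in milliseconds.
    public static double MeasureMilliseconds(Action action, int runs)
    {
        action(); // warm-up: keeps JIT compilation out of the timings

        double best = double.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            action();
            watch.Stop();
            best = Math.Min(best, watch.Elapsed.TotalMilliseconds);
        }
        return best;
    }
}

static class Program
{
    static float _sink; // the result is stored here so the loop is not dead code

    // Hypothetical workload standing in for the operation under test.
    static void DotProducts()
    {
        float sum = 0f;
        for (int i = 0; i < 10000000; i++)
        {
            float x = i * 0.001f;
            sum += x * x + 1f;
        }
        _sink = sum;
    }

    static void Main()
    {
        double ms = MicroBenchmark.MeasureMilliseconds(DotProducts, 5);
        Console.WriteLine("Best of 5 runs: {0:F2} ms", ms);
    }
}
```

Taking the best of several runs is one way to reduce noise from other processes. A loop like this still tells you nothing about how the code behaves when the rest of the application is competing for the cache, which is why I prefer the profiler.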
Regardless of whether you have access to a profiler, there is a point which you have to keep in mind when you use Visual Studio. There are basically three ways to run your program during development: as a debug build, as a release build started from Visual Studio, and as a release build started outside Visual Studio. For some reason it took me a while before I realized the obvious fact that a release build is faster than a debug build, where a lot of optimizations are disabled and debugging code is inserted into your program. This switch alone almost doubled the performance of my ray tracer. What I didn't realize was that a release build started from Visual Studio still pays a penalty: with the debugger attached, the JIT compiler by default holds back optimizations such as method inlining. This little fact had caused me to make some optimizations which turned out to be redundant once the code ran outside of VS (where the performance about doubled once again). For ray tracing I have a Vector class which I use very heavily during computations, and I had found that turning public properties into read-only fields to avoid the extra layer of indirection gave about a 10-15% increase in performance inside VS. Outside VS the JIT inlines such trivial property getters, so the gain disappears. Do not waste your time with optimizations which the compiler will do for you (and without hurting the design of your application).
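One way to avoid being fooled by this is to have the program itself report how it is being run before it prints any timings. The sketch below is only an illustration: `MeasurementGuard`, `IsDebugBuild` and `Describe` are hypothetical names, and it assumes the `DEBUG` symbol is defined only in the debug configuration, which is the Visual Studio default. `Debugger.IsAttached` is the real framework property.

```csharp
using System;
using System.Diagnostics;

static class MeasurementGuard
{
    // True when the assembly was compiled with the DEBUG symbol defined.
    public static bool IsDebugBuild()
    {
#if DEBUG
        return true;
#else
        return false;
#endif
    }

    // Returns a warning when timings are likely to be misleading,
    // or null when the setup looks representative.
    public static string Describe(bool debugBuild, bool debuggerAttached)
    {
        if (debugBuild)
            return "Debug build: compiler optimizations are disabled.";
        if (debuggerAttached)
            return "Debugger attached: the JIT may hold back optimizations such as inlining.";
        return null;
    }

    static void Main()
    {
        string warning = Describe(IsDebugBuild(), Debugger.IsAttached);
        Console.WriteLine(warning ?? "Release build, no debugger attached.");
    }
}
```

Starting the release build with Ctrl+F5 (Start Without Debugging) or directly from the output folder should give the last message.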
There is a particular downside when optimizing .NET code, an aspect which is also one of the strengths of the platform. There has been much discussion on the web regarding the speed of .NET code vs. C++ code, and it turns out that in some cases .NET code runs just as fast as, if not faster than, native C++ code. This is due to its use of JIT compilation, which allows the runtime to optimize for the specific machine as well as perform increasingly better optimizations as the framework is improved. This essentially means that you get continuous free optimizations of all your .NET applications with each release of the runtime. The reason this can be a bit of a pain is that it makes it difficult to tell how relevant older articles on performance tuning are. I have experimented with different performance techniques which appear to no longer have any effect, probably because the problems they were supposed to solve have been fixed. To give a concrete example, Service Pack 1 for .NET 3.5 introduced a lot of optimizations for structs, such as inlining of functions, which the JIT compiler previously only did for classes. This change made any prior articles on the performance characteristics of structs, if not worthless, then at least unlikely to be accurate.
When developing a ray tracer, you sometimes need to face the reality that the precision of floating point values is finite, and at one point I was experimenting with the use of doubles rather than floats for most of the operations in the engine. There was a clear performance penalty for this (around 25% if I recall correctly), and the main reason is that the memory bandwidth and cache size are effectively halved when dealing with doubles, since each value takes 8 bytes instead of 4. Other factors, such as the different instructions used for operating on floats and doubles, affect the performance as well, but here I must admit my knowledge of the subject comes up a bit short.
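The halving is simple arithmetic, sketched below. The 32 KB cache size is an assumed figure for illustration only (actual sizes vary by CPU), and `FootprintEstimate` and its methods are hypothetical names. The estimate also ignores object headers and padding, so it is a best case.

```csharp
using System;

static class FootprintEstimate
{
    // Bytes taken by the three components of a vector.
    public static int VectorBytes(int componentBytes)
    {
        return 3 * componentBytes;
    }

    // How many whole vectors fit in a cache of the given size.
    public static int VectorsThatFit(int cacheBytes, int componentBytes)
    {
        return cacheBytes / VectorBytes(componentBytes);
    }

    static void Main()
    {
        const int cacheBytes = 32 * 1024; // assumption: 32 KB of data cache

        Console.WriteLine("float:  {0} bytes per vector, {1} vectors fit",
            VectorBytes(sizeof(float)), VectorsThatFit(cacheBytes, sizeof(float)));
        Console.WriteLine("double: {0} bytes per vector, {1} vectors fit",
            VectorBytes(sizeof(double)), VectorsThatFit(cacheBytes, sizeof(double)));
    }
}
```

With these numbers a float vector takes 12 bytes and a double vector 24, so 2730 float vectors fit where only 1365 double vectors do.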
One of the casualties when you wage the war for performance long enough inescapably seems to be the design and readability of your code. For my ray tracer I have on numerous occasions found myself having to choose between good, high-level abstractions and performance. Each layer of indirection you create in your system to increase readability and maintainability can have an adverse effect on performance. Since their introduction, I have been very fond of lambda expressions in C#, and these are a prime example of how you can keep your code clean and succinct at the cost of performance. I admit that in this case I have been sinful and haven't actually tested their performance, but tests I have read indicated a not insignificant penalty for calling delegates compared to ordinary function calls, and therefore I have tried to avoid them in the more trafficked parts of the engine.
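Since I preach measuring, here is a rough sketch of how the delegate penalty could be tested. It is not a measurement I have published results for, and `CallOverhead`, `SumDirect` and `SumViaDelegate` are hypothetical names. Both methods do the same work; the only difference is whether the squaring goes through a `Func<float, float>` or an ordinary static method that the JIT is free to inline.

```csharp
using System;
using System.Diagnostics;

static class CallOverhead
{
    static float Square(float x)
    {
        return x * x;
    }

    // Ordinary static call: a candidate for inlining by the JIT.
    public static float SumDirect(int n)
    {
        float sum = 0f;
        for (int i = 0; i < n; i++)
            sum += Square(i & 7);
        return sum;
    }

    // The same loop, but every call goes through a delegate.
    public static float SumViaDelegate(int n, Func<float, float> f)
    {
        float sum = 0f;
        for (int i = 0; i < n; i++)
            sum += f(i & 7);
        return sum;
    }

    static void Main()
    {
        const int n = 50000000;
        Func<float, float> square = x => x * x;

        // Warm-up so JIT compilation is not part of the timings.
        SumDirect(1000);
        SumViaDelegate(1000, square);

        Stopwatch watch = Stopwatch.StartNew();
        float direct = SumDirect(n);
        long directMs = watch.ElapsedMilliseconds;

        watch = Stopwatch.StartNew();
        float viaDelegate = SumViaDelegate(n, square);
        long delegateMs = watch.ElapsedMilliseconds;

        // The sums are printed so that neither loop can be optimized away.
        Console.WriteLine("direct:   {0} ms (sum {1})", directMs, direct);
        Console.WriteLine("delegate: {0} ms (sum {1})", delegateMs, viaDelegate);
    }
}
```

The numbers will depend on the runtime version and on whether a debugger is attached, so they are only meaningful for a release build run outside Visual Studio.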
To keep your code clean and elegant while still keeping a strong focus on performance, my main advice would be to isolate the more arcane optimizations to very specific places that other code only interacts with indirectly. Also, make sure to identify the hot parts of your code and optimize them aggressively while being more lax in other parts. For example, most of the execution time in my ray tracer somehow concerns the Vector class, so this is where I have kept a lot of focus. One common thing to do with this class is retrieving the value of a specific axis, so to get a bit of a performance increase I turned to unsafe pointer arithmetic:
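    public float this[byte axis]
    {
        get
        {
            unsafe
            {
                // Relies on _x, _y and _z being adjacent in memory, in that order.
                // axis must be 0, 1 or 2; nothing here checks the bounds.
                fixed (float* pX = &_x)
                    return *(pX + axis);
            }
        }
    }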
This is not the prettiest or most readable code you're likely to find in a C# program, but it is isolated in the Vector class, and other classes remain unaware of the implementation since it does not change the interface of the class.
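For anyone who wants to try the indexer in isolation, here is a self-contained sketch. It makes several assumptions that are not in my engine's code: the vector is written as a struct with an explicit sequential layout so that the field order the pointer arithmetic relies on is guaranteed, the constructor is invented for the example, and a `Debug.Assert` guards the axis in debug builds only. It has to be compiled with unsafe code allowed (the `/unsafe` compiler switch).

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

// Sequential layout keeps _x, _y and _z in declaration order.
[StructLayout(LayoutKind.Sequential)]
public struct Vector
{
    private float _x, _y, _z;

    public Vector(float x, float y, float z)
    {
        _x = x;
        _y = y;
        _z = z;
    }

    public float this[byte axis]
    {
        get
        {
            // Compiled out of release builds, so it costs nothing there.
            Debug.Assert(axis < 3, "axis must be 0, 1 or 2");
            unsafe
            {
                // Step from the address of _x to the requested component.
                fixed (float* pX = &_x)
                    return *(pX + axis);
            }
        }
    }
}

static class Program
{
    static void Main()
    {
        Vector v = new Vector(1f, 2f, 3f);
        Console.WriteLine("{0} {1} {2}", v[0], v[1], v[2]); // prints "1 2 3"
    }
}
```

Callers only see `v[axis]`, so the unsafe block can be replaced later without touching any other code. Whether it is still faster than the obvious alternative is, once again, something to measure on your runtime.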
To round off this lengthy article, I recall a story I read somewhere about a team who weren't happy with the performance of their application. Profiling the application led them to a piece of code which took up a huge amount of the CPU time. The team really labored to optimize this code as best they could, with unrolling and inlining and the like, until someone realized their folly: they had been optimizing the idle loop!