Performance Benchmarking
I'd like to know how efficient my module is, and whether changes I've made to try to make it more efficient have been effective. What's the best way of getting a measure of this?
Re: Performance Benchmarking
It's something that developers (and users) have been wanting for a long time. The dream is for CA to add a performance meter on a module-by-module basis, but it hasn't happened yet.
You might find some useful tips in this thread. All current methods are still a bit clunky and old-school, though.
______________________
Dome Music Technologies
Re: Performance Benchmarking
Grant's post is very succinct.
I suspect people are sick and tired of me ranting on about the need for proper metering in VM so I'll not repeat myself.
A few basic development tips...
It's pretty tricky optimizing these days. There are so many different configurations out there, so one thing you can do is build a diverse beta testing team; this will give you solid practical feedback. Also, although there are about five times more PC users than Mac users, you still need to cater for both platforms.
In pure programming terms, forget about the overhead of code executed just once per call to ProcessSample(); what really matters is what's going on in the inner loops. Search for the deepest loops and concentrate your efforts there.
Algorithm choice really matters inside deep loops.
If you are doing anything fancy then you need to focus on the math.
Avoid memory access where possible. Don't use large look-up tables when a modern CPU can calculate the function faster.
Run multiple instances of your module to amplify the load. If you can run 100 instances at once without any glitching then there's no point in worrying about optimization.
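If you want harder numbers than "does it glitch at 100 instances", a throwaway harness along these lines can settle questions like LUT-versus-Math.cos(). This is just a sketch; MicroBench and its dummy processSample() are made up for illustration, not part of any SDK:
Code:
// Throwaway harness: times a hot loop in isolation so you can compare
// two implementations of the same per-sample step.
public class MicroBench {
    // Dummy per-sample work; swap in whatever you want to measure.
    static double processSample(double phase) {
        return Math.cos(phase);
    }

    public static void main(String[] args) {
        final int N = 50_000_000;
        double acc = 0.0, phase = 0.0;
        // Warm-up pass so Hotspot has compiled the loop before timing.
        for (int i = 0; i < 1_000_000; i++) acc += processSample(phase += 1e-6);
        long t0 = System.nanoTime();
        for (int i = 0; i < N; i++) acc += processSample(phase += 1e-6);
        long t1 = System.nanoTime();
        // Print acc so the JIT can't dead-code-eliminate the loop.
        System.out.printf("%.2f ns/sample (acc=%f)%n", (t1 - t0) / (double) N, acc);
    }
}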
Re: Performance Benchmarking
This is an interesting point. My first module, Cross Fade Grid, applies a cosine-curve crossfade between input sources; the length of the crossfade is dynamically determined (it can be modulated via a control signal), but if we know that nothing's connected to the control signal input we could precompute a table of <2000 cosine values for [0...HALF_PI] and recompute it only when the knob gets turned. But is the cost of calling Math.cos(x) once per sample high enough to make this worthwhile, compared to the cost of the array fetch? Essentially, the meat of the ProcessSample() method is this:
Code:
public double processSample() {
    // No fade in progress: pass the current input straight through.
    if (crossFadeCount == 0) {
        return currentInput.GetValue();
    }
    double oldSample = oldInput.GetValue();
    double currentSample = currentInput.GetValue();
    // Cosine curve: 1.0 at the start of the fade, 0.0 at the end.
    double curveAmount = Math.cos(crossFadeRadians);
    double interpolated = (oldSample * curveAmount) + (currentSample * (1.0 - curveAmount));
    crossFadeCount--;
    crossFadeRadians += crossFadeRadiansDelta;
    return interpolated;
}
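(For concreteness, the table-based variant would be something like the sketch below; TABLE_SIZE and the index scaling are arbitrary illustrative choices, not settled code.)
Code:
// Illustrative sketch of the table idea: built once, and rebuilt only
// when the crossfade length actually changes.
private static final int TABLE_SIZE = 2000;
private final double[] cosTable = new double[TABLE_SIZE];

private void rebuildTable() {
    for (int i = 0; i < TABLE_SIZE; i++) {
        // Evenly spaced cosine samples over [0, HALF_PI].
        cosTable[i] = Math.cos((Math.PI / 2.0) * i / (TABLE_SIZE - 1));
    }
}

// In processSample(), the Math.cos() call would become an indexed fetch:
// double curveAmount = cosTable[(int) (crossFadeRadians * (2.0 / Math.PI) * (TABLE_SIZE - 1))];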
An orthogonal question: the UI rendering is not particularly optimised. There's a canvas that is redrawn in its entirety on refresh, and I'm using some of the Graphics2D primitives to paint gradient fills and suchlike. I could check the clipping rectangle and redraw only the portions of the grid inside it; I could invalidate only the parts of the canvas that have changed when an event modifies its content. I could replace the graphics primitive drawing operations with (potentially faster) bitmap blitting, or throw away the canvas altogether and represent everything using the standard UI controls. But does it matter? I'm assuming that UI painting is done on a separate schedule to sample processing, i.e. much less frequently and on a completely different thread. Is the impact big enough to make these kinds of optimisations worthwhile?
Re: Performance Benchmarking
(update: looking over this thread, I see a much more efficient way than Math.cos of getting a nice sigmoid curve...)
Re: Performance Benchmarking
As someone fairly new to DSP, I'm also curious to know where Values.FastAtan and Values.FastTanh might come in handy?
Re: Performance Benchmarking
Your Cross Fade Grid thing looks cool; it should add something like the Devious Machines Infiltrator sequencer functionality to VM.
Glad you found the related discussion between Reid and me.
I wouldn't worry too much about a single call to Math.cos(). Transcendentals should be avoided whenever possible: if you had to call one hundreds of times per sample it would become a serious bottleneck, as I think it's microcoded as a successive approximation - something like 1 - x^2/2! + x^4/4! - ... My old friend -2x^3 + 3x^2 is a cheaper option, but a single call per sample isn't going to create significant load.
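Just to make that concrete, here's a rough sketch of how the polynomial could slot into the processSample() above; crossFadePosition and crossFadePositionDelta are made-up names for a counter that runs from 0 to 1 over the fade:
Code:
// Sketch: -2x^3 + 3x^2 ("smoothstep") in place of Math.cos().
// crossFadePosition is assumed to run 0.0 -> 1.0 over the fade,
// incremented by a per-sample delta just like crossFadeRadians.
double x = crossFadePosition;
double s = x * x * (3.0 - 2.0 * x);   // 3x^2 - 2x^3, rises smoothly 0 -> 1
double interpolated = (oldSample * (1.0 - s)) + (currentSample * s);
crossFadePosition += crossFadePositionDelta;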
My point about LUTs was more about cache impact than GC. People worry about garbage collection, but in reality it's not a big deal, as modern computers have so much RAM that it hardly ever happens. Besides, modern GC algorithms are far more efficient than they were in the old days. The JVM uses a stack model, so most operations are self-cleaning and often don't even involve a stack, as Hotspot turns many stack operations into native register ops.
In many decades I've only needed arctan once. If I recall correctly it was in some code doing fancy ray-tracing based collision detection.
Your questions about graphics efficiency are more involved so I'll address that later.
Re: Performance Benchmarking
poetix wrote: ↑Tue Nov 29, 2022 10:31 am An orthogonal question: the UI rendering is not particularly optimised. [...] Is the impact big enough to make these kinds of optimisations worthwhile?
Small-scale drawing operations are pretty efficient, as the GPU will do the brunt of the work. However, the CPU still needs to tell the GPU what to do, and that overhead can get significant in some applications. A simple optimization like only refreshing things when they actually change state is probably worthwhile.
I use a mix of standard UI elements and primitives. Blitting to avoid gradients probably isn't worth it as GPU shaders are so fast.
Drawing should never be done in ProcessSample(), obviously. You should instead schedule drawing in a timer thread, and you can adjust the refresh rate for the best tradeoff.
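For instance (sketch only - gridCanvas and the dirty flag are hypothetical names, assuming a Swing-style canvas component):
Code:
// Repaint from a UI timer, never from the audio path. ~30 Hz is a
// reasonable starting refresh rate; tune it for load versus smoothness.
javax.swing.Timer uiTimer = new javax.swing.Timer(1000 / 30, e -> {
    if (displayStateChanged) {      // hypothetical (volatile) dirty flag
        displayStateChanged = false;
        gridCanvas.repaint();       // gridCanvas: your custom canvas component
    }
});
uiTimer.start();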
One thing to watch out for is thread safety issues that cause parts of your display to flicker or, worse still, get stuck in the wrong state. Such things are difficult to test for as they only show up once in a blue moon. An example is VM's virtual keyboard sometimes getting stuck notes.
Another problem is display artifacts when zoom isn't a simple ratio. I've not figured out an easy way to cure this yet.
The graphics load in your Cross Fade Grid module isn't going to be massive, but things can add up if you expect people to use many instances of it. One simple option is to offer a setting that disables or greatly simplifies the graphics (in my Granular Synth module, for instance, there's an option in the settings to only draw one grain instead of all of them).
I asked CA ages ago if we could have access to a module's background for full graphical control (and to improve dynamic skinning) but I doubt it will ever happen.
Re: Performance Benchmarking
Another use for arctan is for distortion waveshaping.
Reid
Cyberwerks Heavy Industries -- viewforum.php?f=76
Re: Performance Benchmarking
Thanks for pointing that out, Reid. I'd not considered arctan for distortion.
Those outer limits are nicely asymptotic, so you could easily tune the distortion just by scaling the input.
Although if all one wants is simple soft distortion, the far less expensive polynomial shown in blue is a close approximation in the range [-1,1].
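In code, the input-scaling trick looks something like this (a sketch only; the drive parameter and the renormalisation are my own choices):
Code:
// Arctan waveshaper: larger drive pushes the signal further into the
// asymptotic region, so the amount of distortion is tuned purely by
// scaling the input. Dividing by atan(drive) renormalises the output
// so that an input of 1.0 still maps to 1.0.
double shape(double sample, double drive) {
    return Math.atan(sample * drive) / Math.atan(drive);
}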