Performance Benchmarking
I'd like to know how efficient my module is, and whether changes I've made to try to make it more efficient have been effective. What's the best way of getting a measure of this?
Re: Performance Benchmarking
It's something that developers (and users) have been wanting for a long time. The dream is for CA to add a performance meter on a module-by-module basis, but it hasn't happened yet.
You might find some useful tips in this thread. All current methods are still a bit clunky and old-school, though.
______________________
Dome Music Technologies
Re: Performance Benchmarking
Grant's post is very succinct.
I suspect people are sick and tired of me ranting on about the need for proper metering in VM so I'll not repeat myself.
A few basic development tips...
It's pretty tricky optimizing these days. There are so many different configurations out there, so one thing you can do is build a diverse beta testing team; this will give you solid practical feedback. Also, although there are about five times more PC users than Mac users, you still need to cater for both platforms.
In pure programming terms, forget about the overhead of code executed just once per call to ProcessSample(); what really matters is what's going on in the inner loops. Search for the deepest loops and concentrate your efforts there.
Algorithm choice really matters inside deep loops.
If you are doing anything fancy then you need to focus on the math.
Avoid memory access where possible. Don't use large look-up tables when a modern CPU can calculate the function faster.
Run multiple instances of your module to amplify the load. If you can run 100 instances at once without any glitching then there's no point in worrying about optimization.
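If you want harder numbers than "does it glitch at 100 instances", a throwaway harness along these lines can settle questions like LUT-versus-Math.cos(). This is just a sketch; MicroBench and its dummy processSample() are made up for illustration, not part of any SDK:
Code:
// Throwaway harness: times a hot loop in isolation so you can compare
// two implementations of the same per-sample step.
public class MicroBench {
    // Dummy per-sample work; swap in whatever you want to measure.
    static double processSample(double phase) {
        return Math.cos(phase);
    }

    public static void main(String[] args) {
        final int N = 50_000_000;
        double acc = 0.0, phase = 0.0;
        // Warm-up pass so Hotspot has compiled the loop before timing.
        for (int i = 0; i < 1_000_000; i++) acc += processSample(phase += 1e-6);
        long t0 = System.nanoTime();
        for (int i = 0; i < N; i++) acc += processSample(phase += 1e-6);
        long t1 = System.nanoTime();
        // Print acc so the JIT can't dead-code-eliminate the loop.
        System.out.printf("%.2f ns/sample (acc=%f)%n", (t1 - t0) / (double) N, acc);
    }
}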
Re: Performance Benchmarking
This is an interesting point. My first module, Cross Fade Grid, applies a cosine-curve crossfade between input sources; the length of the crossfade is dynamically determined (it can be modulated via a control signal), but if we know that nothing's connected to the control signal input we could precompute a table of <2000 cosine values for [0...HALF_PI] and recompute it only when the knob gets turned. But is the cost of calling Math.cos(x) once per sample high enough to make this worthwhile, compared to the cost of the array fetch? Essentially, the meat of the ProcessSample() method is this:
Code:
public double processSample() {
    // No fade in progress: pass the current input straight through.
    if (crossFadeCount == 0) {
        return currentInput.GetValue();
    }
    double oldSample = oldInput.GetValue();
    double currentSample = currentInput.GetValue();
    // Cosine curve: 1.0 at the start of the fade, 0.0 at the end.
    double curveAmount = Math.cos(crossFadeRadians);
    double interpolated = (oldSample * curveAmount) + (currentSample * (1.0 - curveAmount));
    crossFadeCount--;
    crossFadeRadians += crossFadeRadiansDelta;
    return interpolated;
}
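(For concreteness, the table-based variant would be something like the sketch below; TABLE_SIZE and the index scaling are arbitrary illustrative choices, not settled code.)
Code:
// Illustrative sketch of the table idea: built once, and rebuilt only
// when the crossfade length actually changes.
private static final int TABLE_SIZE = 2000;
private final double[] cosTable = new double[TABLE_SIZE];

private void rebuildTable() {
    for (int i = 0; i < TABLE_SIZE; i++) {
        // Evenly spaced cosine samples over [0, HALF_PI].
        cosTable[i] = Math.cos((Math.PI / 2.0) * i / (TABLE_SIZE - 1));
    }
}

// In processSample(), the Math.cos() call would become an indexed fetch:
// double curveAmount = cosTable[(int) (crossFadeRadians * (2.0 / Math.PI) * (TABLE_SIZE - 1))];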
An orthogonal question: the UI rendering is not particularly optimised. There's a canvas that is redrawn in its entirety on refresh, and I'm using some of the Graphics2D primitives to paint gradient fills and suchlike. I could check the clipping rectangle and redraw only the portions of the grid inside it; I could invalidate only the parts of the canvas that have changed when an event modifies its content. I could replace the graphics primitive drawing operations with (potentially faster) bitmap blitting, or throw away the canvas altogether and represent everything using the standard UI controls. But does it matter? I'm assuming that UI painting is done on a separate schedule to sample processing, i.e. much less frequently and on a completely different thread. Is the impact big enough to make these kinds of optimisations worthwhile?
Re: Performance Benchmarking
(update: looking over this thread, I see a much more efficient way than Math.cos of getting a nice sigmoid curve...)
Re: Performance Benchmarking
As someone fairly new to DSP, I'm also curious to know where Values.FastAtan and Values.FastTanh might come in handy?
Re: Performance Benchmarking
Your Cross Fade Grid thing looks cool; it should add something like the Devious Machines Infiltrator sequencer functionality to VM.
Glad you found the related discussion between Reid and me.
I wouldn't worry too much about a single call to Math.cos(). Transcendentals should be avoided whenever possible: if you had to call one hundreds of times per sample it would become a serious bottleneck, as I think it's microcoded as a successive approximation - something like 1 - x^2/2! + x^4/4! - ... My old friend -2x^3 + 3x^2 is a cheaper option, but a single call per sample isn't going to create significant load.
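Just to make that concrete, here's a rough sketch of how the polynomial could slot into the processSample() above; crossFadePosition and crossFadePositionDelta are made-up names for a counter that runs from 0 to 1 over the fade:
Code:
// Sketch: -2x^3 + 3x^2 ("smoothstep") in place of Math.cos().
// crossFadePosition is assumed to run 0.0 -> 1.0 over the fade,
// incremented by a per-sample delta just like crossFadeRadians.
double x = crossFadePosition;
double s = x * x * (3.0 - 2.0 * x);   // 3x^2 - 2x^3, rises smoothly 0 -> 1
double interpolated = (oldSample * (1.0 - s)) + (currentSample * s);
crossFadePosition += crossFadePositionDelta;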
My point about LUTs was more about cache impact than GC. People worry about garbage collection, but in reality it's not a big deal, as modern computers have so much RAM that it hardly ever happens. Besides, modern GC algorithms are far more efficient than they were in the old days. The JVM uses a stack model, so most operations are self-cleaning and often don't even involve a stack, as Hotspot turns many stack operations into native register ops.
In many decades I've only needed arctan once. If I recall correctly it was in some code doing fancy ray-tracing based collision detection.
Your questions about graphics efficiency are more involved so I'll address that later.
Re: Performance Benchmarking
poetix wrote: ↑Tue Nov 29, 2022 10:31 am An orthogonal question: the UI rendering is not particularly optimised. [...] Is the impact big enough to make these kinds of optimisations worthwhile?
Small-scale drawing operations are pretty efficient, as the GPU will do the brunt of the work. However, the CPU still needs to tell the GPU what to do, and that overhead can get significant in some applications. A simple optimization like only refreshing things when they actually change state is probably worthwhile.
I use a mix of standard UI elements and primitives. Blitting to avoid gradients probably isn't worth it as GPU shaders are so fast.
Drawing should never be done in ProcessSample(), obviously. You should instead schedule drawing in a timer thread, and you can adjust the refresh rate for the best tradeoff.
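For instance (sketch only - gridCanvas and the dirty flag are hypothetical names, assuming a Swing-style canvas component):
Code:
// Repaint from a UI timer, never from the audio path. ~30 Hz is a
// reasonable starting refresh rate; tune it for load versus smoothness.
javax.swing.Timer uiTimer = new javax.swing.Timer(1000 / 30, e -> {
    if (displayStateChanged) {      // hypothetical (volatile) dirty flag
        displayStateChanged = false;
        gridCanvas.repaint();       // gridCanvas: your custom canvas component
    }
});
uiTimer.start();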
One thing to watch out for is thread safety issues that cause parts of your display to flicker or, worse still, get stuck in the wrong state. Such things are difficult to test for as they only show up once in a blue moon. An example is VM's virtual keyboard sometimes getting stuck notes.
Another problem is display artifacts when zoom isn't a simple ratio. I've not figured out an easy way to cure this yet.
The graphics load in your Cross Fade Grid module isn't going to be massive, but things can add up if you expect people to use many instances of it. One simple option is to offer a setting that disables or greatly simplifies the graphics (in my Granular Synth module, for instance, there's an option in the settings to only draw one grain instead of all of them).
I asked CA ages ago if we could have access to a module's background for full graphical control (and to improve dynamic skinning) but I doubt it will ever happen.
Re: Performance Benchmarking
Another use for arctan is for distortion waveshaping.
Reid
Cyberwerks Heavy Industries -- viewforum.php?f=76
Re: Performance Benchmarking
Thanks for pointing that out, Reid. I'd not considered arctan for distortion.
Those outer limits are nicely asymptotic, so you could easily tune the distortion just by scaling the input.
Although if all one wants is simple soft distortion, the far less expensive polynomial shown in blue is a close approximation in the range [-1,1].
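In code, the input-scaling trick looks something like this (a sketch only; the drive parameter and the renormalisation are my own choices):
Code:
// Arctan waveshaper: larger drive pushes the signal further into the
// asymptotic region, so the amount of distortion is tuned purely by
// scaling the input. Dividing by atan(drive) renormalises the output
// so that an input of 1.0 still maps to 1.0.
double shape(double sample, double drive) {
    return Math.atan(sample * drive) / Math.atan(drive);
}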