[PD] Findings regarding performance

Fri Dec 2 22:16:12 CET 2011

On 2011-12-02 at 12:41:00, Charles Henry wrote:

> You make a good point--I wasn't counting the data transfer that occurs
> between registers or the way that the compiler breaks out the steps
> involved, and of which I am mostly ignorant.

Ok, well, when you copy, there is a pipeline that goes from RAM to RAM
through the CPU, with each stage feeding directly into the next. When you
multiply, the pipeline goes from RAM to multiplier to RAM.
Depending on how the CPU is made, RAM access could be taking turns,
alternating between reading and writing, or there could be two RAM units, a
reader and a writer. I don't know how current machines are made, but
differences in this design can make a theoretical difference between
observing a 3/2 speed ratio and a 2/1 speed ratio between cases of [*~].
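
One possible reading of those two ratios, as a toy cost model in C. This
assumes that the "cases of [*~]" are signal*signal (two reads and one write
per sample) versus signal*scalar (one read and one write per sample), and
that the multiplication itself is completely hidden behind memory access.
Both function names are made up for this sketch.

```c
#include <assert.h>

/* Assumption: one RAM unit that alternates between reading and writing,
   so every access takes its own turn. */
unsigned cost_shared_port(unsigned reads, unsigned writes)
{
    assert(reads > 0 && writes > 0); /* a DSP loop both reads and writes */
    return reads + writes;
}

/* Assumption: a separate reader and writer running at the same time,
   so the busier of the two sets the pace. */
unsigned cost_split_ports(unsigned reads, unsigned writes)
{
    assert(reads > 0 && writes > 0);
    return reads > writes ? reads : writes;
}
```

Under those assumptions, the shared unit gives 3 turns against 2 (a 3/2
ratio), and the split units give 2 turns against 1 (a 2/1 ratio). Real
machines have caches and wider buses, so this is only the theoretical
picture.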

Pipelining means that the time of multiplication can be hidden behind the
time of memory access: the RAM access counts as 1 or 2 sub-CPUs, the
multiplier counts as 1 sub-CPU, and they all run at the same time. So, as
long as you do many things in a row to keep all parts busy, the total time
per operation will be only a bit more than max(time of each sub-CPU),
because the instructions' times will overlap as much as they can.
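
The overlap argument can be sketched with an idealized timing model in C.
This assumes every item goes through the same stages, that stage times are
fixed, and that nothing ever stalls; the function names and the time units
are hypothetical, not measurements of any real CPU.

```c
#include <assert.h>
#include <stddef.h>

/* No overlap: every item pays for every stage, one after the other. */
double sequential_time(const double *stage, size_t nstages, size_t nitems)
{
    double sum = 0.0;
    for (size_t i = 0; i < nstages; i++)
        sum += stage[i];
    return sum * (double)nitems;
}

/* Ideal overlap: the first item pays for every stage (filling the
   pipeline), then one more item finishes every max(stage) time units. */
double pipelined_time(const double *stage, size_t nstages, size_t nitems)
{
    assert(nstages > 0 && nitems > 0);
    double sum = 0.0, slowest = 0.0;
    for (size_t i = 0; i < nstages; i++) {
        sum += stage[i];
        if (stage[i] > slowest)
            slowest = stage[i];
    }
    return sum + slowest * (double)(nitems - 1);
}
```

With three stages of 1 unit each (read, multiply, write) and a block of 64
samples, the sequential total is 192 units and the pipelined total is 66,
which is about 1.03 units per sample: only a bit more than the slowest
stage.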

There are also several sub-CPUs for instruction decoding and other stuff I
haven't talked about.

Conditional jumps mean that you have to pause the pipeline long enough to
get the result of the decision, so as to know what to do next.
Loop-unrolling (perf8 and such) sets up longer todo-lists to reduce the
pausing by a factor of 4 or 8 or more.
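
A minimal sketch of what unrolling by 8 looks like in C. This is written in
the spirit of the perf8 routines but is not copied from Pd's source; the
names times_plain and times_unrolled8 are made up here, and the unrolled
version assumes the block size is a multiple of 8.

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: one loop test (a conditional jump) per sample. */
void times_plain(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

/* Unrolled by 8: one loop test per 8 samples. */
void times_unrolled8(const float *a, const float *b, float *out, size_t n)
{
    assert(n % 8 == 0); /* assumption: block size is a multiple of 8 */
    for (; n; n -= 8, a += 8, b += 8, out += 8) {
        /* Read all 16 inputs before writing any output, so the result
           is still right when out is the same buffer as a or b. */
        float a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
        float a4 = a[4], a5 = a[5], a6 = a[6], a7 = a[7];
        float b0 = b[0], b1 = b[1], b2 = b[2], b3 = b[3];
        float b4 = b[4], b5 = b[5], b6 = b[6], b7 = b[7];
        out[0] = a0 * b0; out[1] = a1 * b1;
        out[2] = a2 * b2; out[3] = a3 * b3;
        out[4] = a4 * b4; out[5] = a5 * b5;
        out[6] = a6 * b6; out[7] = a7 * b7;
    }
}
```

For a 64-sample block, the plain loop makes its loop decision 64 times and
the unrolled one makes it 8 times, which is the factor-of-8 reduction in
pausing. Whether that shows up as a measurable speedup depends on the
compiler and on the CPU's branch prediction.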

> So, using switch~ as in Roman's example involves 2 copy operations on 
> the signals.  Is that what we're seeing?

I don't know... maybe... I haven't looked much at d_ugen.c... and won't do 
it now.

  ______________________________________________________________________
| Mathieu BOUCHARD ----- telephone : +1.514.383.3801 ----- Montréal, QC