[PD] Findings regarding performance
matju at artengine.ca
Fri Dec 2 22:16:12 CET 2011
On 2011-12-02 at 12:41:00, Charles Henry wrote:
> You make a good point--I wasn't counting the data transfer that occurs
> between registers or the way that the compiler breaks out the steps
> involved, and of which I am mostly ignorant.
Ok, well, when you copy, the data moves through a pipeline that goes from
RAM, through the CPU, and back to RAM. When you multiply, the pipeline
goes from RAM to the multiplier and back to RAM.
Depending on how the CPU is made, RAM access could alternate between
reading and writing, or there could be two RAM units, a reader and a
writer. I don't know how current machines are made, but a difference here
can make a theoretical difference between observing a 3/2 speed ratio and
a 2/1 speed ratio between cases of [*~].
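
One possible reading of where those two ratios come from, as a sketch in
C. It assumes the two cases of [*~] are signal-times-scalar (one vector
read, one vector written) and signal-times-signal (two vectors read, one
written), and that memory traffic is the only cost. The struct and
function names are made up for this illustration.

```c
#include <stdio.h>

/* Memory traffic per sample; the counts below are assumptions. */
struct traffic { int reads; int writes; };

/* signal * scalar: read one input vector, write one output vector. */
static const struct traffic SCALAR_TIMES = {1, 1};
/* signal * signal: read two input vectors, write one output vector. */
static const struct traffic SIGNAL_TIMES = {2, 1};

/* One RAM unit taking turns between reading and writing: costs add up. */
static int cost_shared_port(struct traffic t) {
    return t.reads + t.writes;
}

/* Separate reader and writer working at the same time:
   the busier of the two decides the total. */
static int cost_split_ports(struct traffic t) {
    return t.reads > t.writes ? t.reads : t.writes;
}
```

Under those assumptions the shared unit gives 3 against 2 (the 3/2
ratio), and the separate reader and writer give 2 against 1 (the 2/1
ratio).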
Pipelining means that the time of multiplication can be hidden by the time
of memory access, as the RAM-access counts as 1 or 2 sub-CPUs, and the
multiplier counts as 1 sub-CPU, and they all run at the same time, so, as
long as you do many things in a row to keep all parts busy, the total time
will be only a bit more than max(time of each sub-CPU) because the
instructions' times will overlap as much as they can.
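
The "a bit more than max" claim can be put in numbers with a toy model,
sketched here in C. It assumes an idealized pipeline where each sub-CPU is
a stage with a fixed time per item and nothing ever stalls; the function
names and the stage times in the note below are invented for the example.

```c
#include <stddef.h>

/* No overlap: every item goes through all stages before the next starts. */
static long serial_time(const long *stage, size_t nstages, long n) {
    long sum = 0;
    for (size_t i = 0; i < nstages; i++) sum += stage[i];
    return n * sum;
}

/* Ideal overlap: the first item pays for every stage once (filling the
   pipeline), then each further item costs only the slowest stage. */
static long pipelined_time(const long *stage, size_t nstages, long n) {
    long sum = 0, slowest = 0;
    if (n <= 0) return 0;
    for (size_t i = 0; i < nstages; i++) {
        sum += stage[i];
        if (stage[i] > slowest) slowest = stage[i];
    }
    return sum + (n - 1) * slowest;
}
```

With hypothetical stage times of 2 (read), 1 (multiply) and 2 (write) and
a block of 64 samples, the serial model gives 320 time units and the
pipelined model gives 131, which is only slightly more than 64 * 2.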
There are also several sub-CPUs for programme-decoding and other stuff I
haven't talked about.
Conditional jumps mean that you have to pause the pipeline long enough to
get the result of the decision to know what the next thing to do might be.
Loop-unrolling (perf8 and such) sets up longer todo-lists to reduce the
pausing by a factor of 4 or 8 or more.
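
As an illustration of unrolling, here is a sketch in C of a plain multiply
loop next to one unrolled by 8. It is written in the spirit of the perf8
routines but is not copied from Pd's source; the function names are
hypothetical, and the unrolled version assumes the block size is a
multiple of 8.

```c
#include <stddef.h>

/* One loop test (a conditional jump) per sample. */
static void mul_simple(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

/* One loop test per 8 samples. Assumes n is a multiple of 8. */
static void mul_unrolled8(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i += 8) {
        out[i]     = a[i]     * b[i];
        out[i + 1] = a[i + 1] * b[i + 1];
        out[i + 2] = a[i + 2] * b[i + 2];
        out[i + 3] = a[i + 3] * b[i + 3];
        out[i + 4] = a[i + 4] * b[i + 4];
        out[i + 5] = a[i + 5] * b[i + 5];
        out[i + 6] = a[i + 6] * b[i + 6];
        out[i + 7] = a[i + 7] * b[i + 7];
    }
}
```

Both give the same output; the second just reaches the jump at the end of
the loop an eighth as often. A compiler may well do this on its own at
higher optimization levels.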
> So, using switch~ as in Roman's example involves 2 copy operations on
> the signals. Is that what we're seeing?
I don't know... maybe... I haven't looked much at d_ugen.c... and won't do
| Mathieu BOUCHARD ----- téléphone : +1.514.383.3801 ----- Montréal, QC