matju at artengine.ca
Tue Oct 16 17:39:56 CEST 2007
On Tue, 16 Oct 2007, Georg Holzmann wrote:
> Of course one has to use numpy, scipy and all that ... They use quite
> optimized BLAS/LAPACK algorithms, which should be faster than pure pd -
> but I never tried the pd-numpy DSP combination.
No, the main reason for the speedup is not the use of optimised
algorithms. It is simply that GCC optimises calls and loops much more
than pd does, because pd doesn't optimise at all.
Every time you send a message, pd looks up the connection list of your
outlet to find each receiver. For each receiver, it looks up the class,
and in that class it looks up the anything-slot, the float-slot or the
symbol-slot; if you use a named method, it goes through a linked list
of method-slots.
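The lookup chain described above can be sketched as a toy model. This is not pd's actual source: every name here (ToyOutlet, ToyClass, MethodSlot, send) is made up for illustration, and the order in which the slots are tried is an assumption. It only shows that each step is repeated for every message and every receiver.

```python
# Toy model of per-message dispatch. All names are hypothetical;
# this is not pd's real data layout or API.

class MethodSlot:
    """One node in a class's linked list of named methods."""
    def __init__(self, selector, handler, next_slot=None):
        self.selector = selector
        self.handler = handler
        self.next = next_slot

class ToyClass:
    def __init__(self, float_slot=None, symbol_slot=None,
                 anything_slot=None, methods=None):
        self.float_slot = float_slot
        self.symbol_slot = symbol_slot
        self.anything_slot = anything_slot
        self.methods = methods  # head of the linked list, or None

class ToyObject:
    def __init__(self, cls):
        self.cls = cls
        self.received = []

class ToyOutlet:
    def __init__(self):
        self.connections = []  # receivers connected to this outlet

def send(outlet, selector, value):
    """Deliver one message and return how many lookups it took."""
    lookups = 0
    for receiver in outlet.connections:      # walk the connection list
        cls = receiver.cls                   # look up the receiver's class
        lookups += 1
        handler = None
        if selector == "float":
            handler = cls.float_slot
            lookups += 1
        elif selector == "symbol":
            handler = cls.symbol_slot
            lookups += 1
        else:
            slot = cls.methods               # walk the named-method list
            while slot is not None:
                lookups += 1
                if slot.selector == selector:
                    handler = slot.handler
                    break
                slot = slot.next
        if handler is None:                  # fall back to the anything-slot
            handler = cls.anything_slot
            lookups += 1
        if handler is not None:
            handler(receiver, selector, value)
    return lookups

def record(obj, selector, value):
    obj.received.append((selector, value))

methods = MethodSlot("bang", record,
                     MethodSlot("set", record,
                                MethodSlot("clear", record)))
cls = ToyClass(float_slot=record, methods=methods)
outlet = ToyOutlet()
outlet.connections.append(ToyObject(cls))
outlet.connections.append(ToyObject(cls))

float_lookups = send(outlet, "float", 1.5)   # 2 lookups per receiver
clear_lookups = send(outlet, "clear", None)  # 4 lookups per receiver
print(float_lookups, clear_lookups)
```

In this model a named method that sits third in the list costs twice as many lookups as a float, and the whole cost is paid again on the next message.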
If you add up two arrays of 240 rows by 320 columns by 3 channels, that's
230400 additions, but because you have to manage counters and such, the
number of messages is triple or quadruple that, or worse.
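The arithmetic above, worked out as a small sketch. The overhead factors 3 and 4 are just the "triple or quadruple" from the text taken literally; they are not measured values.

```python
# Message-count estimate for adding two 240x320x3 arrays element by element.
rows, cols, channels = 240, 320, 3
additions = rows * cols * channels

# Assumed overhead factors for counters and bookkeeping messages.
estimates = {factor: additions * factor for factor in (3, 4)}

print(additions)      # 230400
print(estimates[3])   # 691200
print(estimates[4])   # 921600
```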
A simple program compiled with GCC (even without any optimisation option)
doesn't do the repetitive lookup, because it was decided in advance that
the call target wouldn't change, so GCC looked it up once and that was it.
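A rough analogy for "looked it up once", as a sketch only. Python resolves names at run time, so this merely imitates what a C compiler and linker settle before the program runs. HandlerTable and both function names are hypothetical. The point is the lookup count, not the timing.

```python
import operator

class HandlerTable:
    """Hypothetical stand-in for a run-time method table."""
    def __init__(self):
        self.handlers = {"+": operator.add}
        self.lookups = 0  # how many times a handler was looked up

    def resolve(self, name):
        self.lookups += 1
        return self.handlers[name]

def add_with_lookup_per_element(table, a, b):
    # Interpreter-like behaviour: find the handler again for every element.
    return [table.resolve("+")(x, y) for x, y in zip(a, b)]

def add_with_lookup_once(table, a, b):
    # Compiled-like behaviour: the target is fixed before the loop starts.
    add = table.resolve("+")
    return [add(x, y) for x, y in zip(a, b)]

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]

dynamic_table = HandlerTable()
dynamic_sum = add_with_lookup_per_element(dynamic_table, a, b)

static_table = HandlerTable()
static_sum = add_with_lookup_once(static_table, a, b)

print(dynamic_table.lookups, static_table.lookups)
```

Both versions produce the same sums. The first does one lookup per element, the second does one lookup in total, and that gap grows with the size of the array.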
For a patch that tries to do float operations as fast as possible, that is
still, most likely, the single biggest execution time saver, more so than
SIMD, SMP/hyperthreading, ... I would expect BLAS/LAPACK-specific
optimisations to matter only for really special operations that I very
rarely think about using when patching in pd.
_ _ __ ___ _____ ________ _____________ _____________________ ...
| Mathieu Bouchard - tél:+1.514.383.3801, Montréal QC Canada