[PD] puredata evolution

Wed May 30 13:56:21 CEST 2007

Charles Henry wrote:
> I think it depends on the application.... for the most part, we can't
> get a generic speedup from using multiple cores (forgive me if wrong)
> that would apply to every single pd program..... but some types of
> computations such as large ffts can be performed faster when
> distributed to different cores, in which case, the code for the fft

in the general case of this example (forking threads when appropriate to 
calculate single dsp units) you end up having to have a barrier at the end 
of each fork to collect up all the results and then pass them on to the 
next unit in the dsp chain.

now, i don't know any of the details of the pd rendering engine, but it 
would seem to me that the best way to deal with multithreading would be 
based around a particular algorithm for sorting the dsp chain. we calculate 
some kind of dependency tree with nodes at each point where two or more 
audio lines meet, then, in some kind of a reverse breadth-first traversal, 
process non-dependent edges in parallel.

say we have a patch like this:

                           [phasor~]
                           |
[sig~ 1] [osc~ 20]        [*~ 200]
|        |                |
[+~ 5]   [osc~] [sig~ 10] [osc~]
|  ______|      |  _______|
|  |            |  |
[*~]            [*~]
|  _____________|
|  |
[+~]
|\
| [*~ 2]
|/
[dac 1]

so we build an acyclic dependency graph like this, branching wherever a 
signal line splits or merges:

[dac]
|\
| 0
|/
1
|  \
2   3
|\  |\
4 5 6 7

in this case the edge 2-4 is the partial chain [sig~ 1]--[+~ 5], edge 2-5 
is the partial chain [osc~ 20]--[osc~], and so on.

so: we need a queue of partial chains, represented by edges of the 
dependency tree, and a collection of worker threads which pulls partial 
chains off the queue and processes them.

using the example above, we push edges 2-4, 2-5, 3-6, and 3-7 on to our 
process queue and process them in parallel. once both 2-4 and 2-5 have been 
processed, ie all dependencies for for node 2 have been satisfied, we can 
push 1-2 on to our process queue. once both 3-6 and 3-7 have been 
processed, we can push 1-3 on to our process queue. once both 1-2 and 1-3 
have been processed, we push [dac]-1 and 0-1 on to our queue. once 0-1 is 
done we push [dac]-0 on to our queue.

the only bits of shared memory here are the partial chain queue and the 
buffers at each node where splitting or merging has to happen.

does this make sense?

hmm. methinks this should be on pd-dev. someone care to cc? i'm not 
subscribed to pd-dev atm.

-- 
damian stewart | +44 7854 493 796 | damian at frey.co.nz
frey | live art with machines | http://www.frey.co.nz