[PD] puredata evolution
damian at frey.co.nz
Wed May 30 13:56:21 CEST 2007
Charles Henry wrote:
> I think it depends on the application.... for the most part, we can't
> get a generic speedup from using multiple cores (forgive me if wrong)
> that would apply to every single pd program..... but some types of
> computations such as large ffts can be performed faster when
> distributed to different cores, in which case, the code for the fft
in the general case of this example (forking threads when appropriate to
calculate single dsp units) you end up having to have a barrier at the end
of each fork to collect up all the results and then pass them on to the
next unit in the dsp chain.
now, i don't know any of the details of the pd rendering engine, but it
would seem to me that the best way to deal with multithreading would be
based around a particular algorithm for sorting the dsp chain. we calculate
some kind of dependency tree with nodes at each point where two or more
audio lines meet, then, in some kind of a reverse breadth-first traversal,
process non-dependent edges in parallel.
say we have a patch like this:
[sig~ 1] [osc~ 20] [*~ 200]
| | |
[+~ 5] [osc~] [sig~ 10] [osc~]
| ______| | _______|
| | | |
| [*~ 2]
so we build an acyclic dependency graph like this, branching wherever a
signal line splits or merges:
4 5 6 7
in this case the edge 2-4 is the partial chain [sig~ 1]--[+~ 5], edge 2-5
is the partial chain [osc~ 20]--[osc~], and so on.
so: we need a queue of partial chains, represented by edges of the
dependency tree, and a collection of worker threads which pulls partial
chains off the queue and processes them.
using the example above, we push edges 2-4, 2-5, 3-6, and 3-7 on to our
process queue and process them in parallel. once both 2-4 and 2-5 have been
processed, ie all dependencies for for node 2 have been satisfied, we can
push 1-2 on to our process queue. once both 3-6 and 3-7 have been
processed, we can push 1-3 on to our process queue. once both 1-2 and 1-3
have been processed, we push [dac]-1 and 0-1 on to our queue. once 0-1 is
done we push [dac]-0 on to our queue.
the only bits of shared memory here are the partial chain queue and the
buffers at each node where splitting or merging has to happen.
does this make sense?
hmm. methinks this should be on pd-dev. someone care to cc? i'm not
subscribed to pd-dev atm.
damian stewart | +44 7854 493 796 | damian at frey.co.nz
frey | live art with machines | http://www.frey.co.nz
More information about the Pd-list