[PD] threads in pd, dataflow

Sun Feb 23 05:13:49 CET 2014

On 23/02/14 08:16, Jonathan Wilkes wrote:
> On 02/21/2014 10:04 PM, Simon Wise wrote:
>> On 22/02/14 06:28, Jonathan Wilkes wrote:
>>> On 02/21/2014 06:41 AM, Simon Wise wrote:
>>
>>>> Something to really make pd parallel would involve treating fan-outs as
>>>> opportunities for the interpreter to launch each branch in a new thread,
>>>> implementing the inherent parallelism in the dataflow paradigm (e.g. in
>>>> the pd definition of fan-outs as being executed in undefined order). Here
>>>> the trigger object is used to force sequential execution where required,
>>>> just as it is now.
>>>
>>> Practically speaking, it's completely different for control than for signal
>>> domain. For signal domain fanouts there's an understanding that Pd gets
>>> stuff done when it needs to get done. In the control domain, there's even a
>>> philosophy of _never_ having fanouts at all. I don't know what the effect
>>> would be of trying to auto-parallellize a signal diagram, but I'm pretty
>>> sure trying to auto-parallellize a control diagram wouldn't make much of a
>>> dent.
>>
>> I was referring to parallelising using control fanouts only, but didn't make
>> that clear. 'No fanouts, always use triggers' is a very sensible policy to
>> avoid easily overlooked bugs when, as in pd, fanouts are just an implied
>> trigger with an undefined order.
>>
>
> [...]
>
>> Even the dsp<->gui problem would be addressed by a proper dataflow
>> implementation if it was done well. Keeping all the gui stuff in branches
>> which don't have ~ objects should result in these branches being separate
>> threads, and well implemented these would not be allowed to block ~ branches.
>
> To know whether a control branch interacts with the signal domain is to solve
> the halting problem, no?

Not really, and especially not if you allow a little syntactical help from the
programmer, as you note below. And note that the point of this is that the
interaction with the dsp generally does not have to happen in zero logical time
after it is initiated, although often a discrete sequence of interactions must
be applied together in a single dsp timeslice.

But also consider that we are already making several simplifying assumptions
and arbitrary (sometimes confusing) decisions as we turn the graph drawn as the
pd patch into trees in the dsp and the message domains so that we can traverse
them separately. If we allow fan-outs as parallel branches we change one of
those arbitrary decisions. Instead of assigning an arbitrary order and
re-writing the fan-out as a trigger, we create new independent trees which we
execute via a scheduler that runs in a similar way to any very basic OS
scheduler: when data is received for a tree it is put on a queue and executed
the next time one of the cpu threads that pd has running is free.

The usual priority queue stuff could be implemented regarding dsp interaction.
Scheduling on a basic level is very mundane stuff; optimisations of all sorts
at this point have been very well studied and can get as complex as you want.
Note that we already break cycles in the graph, so we can indeed take each
branch as a separate tree.

There are obviously interesting complications and decisions regarding cold
inlets. However, the point of this is that by using a fan-out the programmer is
indicating that the branches may be run in parallel, so cold inlets with data
coming from outside should simply be updated whenever that data arrives. Use a
trigger to ensure it is all part of the same tree if that is not good.
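
A rough sketch of that scheduling idea, in Python only for brevity. Nothing
here comes from the pd source: Node, fire, worker and run are hypothetical
names, the is_trigger flag stands in for a [trigger] object, and cold inlets
are not modelled at all. A fan-out puts each branch on a queue as its own tree;
a trigger keeps its branches in the current tree, in the order listed.

```python
import queue
import threading

class Node:
    """Hypothetical control object: applies fn, then feeds its outlets."""
    def __init__(self, name, fn, outlets=(), is_trigger=False):
        self.name = name
        self.fn = fn
        self.outlets = list(outlets)
        self.is_trigger = is_trigger  # models [trigger]: forced sequential order

def fire(node, value, tasks, log):
    out = node.fn(value)
    log.append(node.name)  # list.append is atomic in CPython
    if node.is_trigger or len(node.outlets) < 2:
        # Trigger or single connection: stay in this tree, run in listed order.
        for child in node.outlets:
            fire(child, out, tasks, log)
    else:
        # Fan-out: each branch becomes an independent tree on the queue.
        for child in node.outlets:
            tasks.put((child, out))

def worker(tasks, log):
    while True:
        item = tasks.get()
        try:
            if item is None:  # shutdown marker
                return
            node, value = item
            fire(node, value, tasks, log)
        finally:
            tasks.task_done()

def run(root, value, n_threads=2):
    """Execute the graph below root and return the order objects fired in."""
    tasks = queue.Queue()
    log = []
    threads = [threading.Thread(target=worker, args=(tasks, log))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    tasks.put((root, value))
    tasks.join()  # returns once every queued tree, including nested ones, is done
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return log
```

With a root that fans out to two plain objects and one trigger, the two plain
branches may finish in either order, while the trigger's children always fire
in the order given.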

>
> But you could have some kind of "seal" object that verifies the user thinks a
> subpatch or canvas is 100% pure control domain. And then Pd could take that to
> mean throw it in its own thread (and throw warnings/errors if it finds a message
> going to a signal object, or fudging with dsp in any way).
>
> It could look like a wax seal and always be at the top-left of the patch.

That's somewhat like the notion in functional languages of 'pure' functions
compared to ones with side effects. In this context the dsp could be considered
a side effect, in the same way any output from a functional program is
ultimately a side effect.

Each functional language deals with this differently, and each approach is
useful in different contexts. Unless you get into seriously strange constructs
like monads for output and remain a strictly pure language (lambda calculus is
Turing-complete after all), there is some syntactical way (like your wax seal)
to flag non-pure objects. In pd the ~ naming convention already does this, and
could be enforced by the interpreter.
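
As an illustration only, a check along those lines might look like the sketch
below. It assumes a made-up representation where a patch is a list of object
box texts and a nested list is a subpatch; signal_objects and check_sealed are
hypothetical names, not anything pd provides. It only catches objects named
with a trailing ~, so indirect routes to the dsp (through send/receive, say)
would need more than this.

```python
def signal_objects(patch):
    """Return the box texts in patch (and nested subpatches) whose
    object name ends in '~'."""
    found = []
    for obj in patch:
        if isinstance(obj, list):
            # Nested list: treat as a subpatch and search it too.
            found.extend(signal_objects(obj))
        else:
            words = obj.split()
            if words and words[0].endswith("~"):
                found.append(obj)
    return found

def check_sealed(patch):
    """Raise if a patch claimed to be pure control contains ~ objects."""
    offenders = signal_objects(patch)
    if offenders:
        raise ValueError("sealed patch contains signal objects: "
                         + ", ".join(offenders))
    return True
```

For example, check_sealed(["metro 100", "float", "+ 1"]) passes, while adding
"osc~ 440" anywhere in the patch or its subpatches raises an error.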

In pd the dsp and message passing domains are dealt with quite separately, and 
if we wanted to treat the message passing domain as a parallelisable dataflow 
graph with its effect on the dsp as one of its outputs (a side effect in 
functional languages jargon) then there is a wealth of research and 
implementations in the functional area to look into as a comparison.

A very crucial point here is that separating gui from dsp so that gui
calculations do not block dsp means allowing the gui part's notion of 'zero
logical time' to be distinct from the dsp part's notion of it. Essentially we
want the result of some gui calculation to be applied a dsp block or so later
rather than have the dsp miss the deadline for calculating the next block.

Currently this is most easily achieved explicitly by running a completely
different pd process for each part. In our sequential tree traversal
implementation we make sure that there are two message domain trees being
traversed, and that the one controlling dsp shares a system thread with the
(main) dsp and does as little as possible that is not strictly required for the
dsp output. The other tree can run a bit late; we do not care whether its
messages to the dsp thread are received in the same dsp block as the one the
calculation started in.

If we interpret the pd patch as a dataflow graph allowing parallel execution of 
fanouts then we create several trees and schedule them in a dataflow manner, 
executing them in a collection of system threads maintained by our scheduler for 
that purpose. The notion of zero logical time is then on a per tree basis, and 
the interaction with the dsp thread(s) resulting from that tree must be applied 
in one atomic interaction when it completes.
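
To make that last point concrete, here is a minimal sketch, again in Python and
again with invented names (DspState, post_batch, process_block). The assumption
is that each tree collects its dsp-bound messages while it runs and hands them
over as one batch when it completes, and that the dsp side picks up whatever
complete batches are waiting at the start of each block. A real implementation
would want a lock-free queue rather than Python's queue.Queue.

```python
import queue

class DspState:
    """Hypothetical stand-in for the dsp thread's parameters."""
    def __init__(self):
        self.params = {}
        self.inbox = queue.Queue()
        self.block = 0

    def post_batch(self, batch):
        """Called by a control tree when it completes: one put per tree,
        so the dsp side sees all of the tree's messages or none of them."""
        self.inbox.put(list(batch))

    def process_block(self):
        """Called once per dsp block. Applies every complete batch that is
        waiting, without waiting for trees that are still running."""
        while True:
            try:
                batch = self.inbox.get_nowait()
            except queue.Empty:
                break
            for name, value in batch:
                self.params[name] = value
        self.block += 1
        return dict(self.params)  # snapshot used for this block
```

A tree that finishes late simply has its batch applied one block later; it
never holds up the block being calculated.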

Simon