[PD] Threading in Pd/libpd

Wed Sep 28 22:16:37 CEST 2016

Thanks Jonathan.

Also, [readsf~] uses threading, and so do [udpsend] and [udpreceive], for obvious reasons involving system calls.

> Can you guarantee that the revisions you've implemented generate the same output as Pd Vanilla, for all cases?

I'd rather say it does not, at least not in all cases. At the very least there is going to be a delay involved. But if this leads to a different behaviour that is still deterministic, would that be bad? After all, the above-mentioned objects are not deterministic themselves, yet they are widely used, with a very high success rate. And what happens if you realize your [readsf~] glitches? You change your code so that it sends the [open( message earlier on. As the objects I am talking about ([fft~], [fiddle~], [sigmund~]) do not rely on system calls, I expect their behaviour to be more predictable than that of, e.g., [readsf~].
I think I'll see if I can put together a [blockthread~] object which can do something useful.
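
As a rough illustration of the kind of strategy a [blockthread~] could follow, here is a minimal sketch, in Python rather than C, only to show the scheduling. Everything in it is an assumption made for illustration: the `DeferredBlock` class, the `spectral_stub` function and the fixed latency of two large blocks are invented names and choices, not anything that exists in Pd or libpd.

```python
import threading


class DeferredBlock:
    """Hypothetical wrapper: collects `big` samples, processes them on a
    worker thread, and plays the result back with a fixed latency."""

    def __init__(self, process, big, small):
        assert big % small == 0
        self.process = process          # expensive per-block function
        self.big, self.small = big, small
        self.inbuf = []                 # input collected for the next big block
        self.pending = [0.0] * big      # result currently being played out
        self.pos = 0                    # read position inside self.pending
        self.worker = None              # thread computing the following result
        self.result = None

    def _run(self, block):
        self.result = self.process(block)

    def callback(self, samples):
        """Called once per audio callback with `small` input samples."""
        out = self.pending[self.pos:self.pos + self.small]
        self.pos += self.small
        self.inbuf.extend(samples)
        if len(self.inbuf) == self.big:
            # Fixed deadline: the block handed over one period ago must be
            # finished now. Waiting here keeps the output deterministic.
            if self.worker is not None:
                self.worker.join()
                self.pending = self.result
            else:
                self.pending = [0.0] * self.big
            self.pos = 0
            block, self.inbuf = self.inbuf, []
            # The worker has a whole large block period to do its job.
            self.worker = threading.Thread(target=self._run, args=(block,))
            self.worker.start()
        return out


def spectral_stub(block):
    # Stand-in for an expensive per-block computation such as an FFT.
    return [2.0 * x for x in block]


BIG, SMALL = 8, 2
signal = [float(i) for i in range(32)]
node = DeferredBlock(spectral_stub, BIG, SMALL)
output = []
for i in range(0, len(signal), SMALL):
    output.extend(node.callback(signal[i:i + SMALL]))

# The output is the processed signal, shifted by a constant number of samples.
latency = 2 * BIG
expected = [0.0] * latency + [2.0 * x for x in signal[:len(signal) - latency]]
assert output == expected
```

In this sketch the output differs from the synchronous case only by a constant delay, because the callback picks up the result at a fixed deadline rather than whenever the worker happens to finish. Blocking on `join` in a real audio thread would be a problem of its own; the sketch ignores that.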

Best,
Giulio

> From: Jonathan Wilkes <jancsika at yahoo.com>
>To: Giulio Moro <giuliomoro at yahoo.it>; Pd-List <pd-list at lists.iem.at> 
>Sent: Tuesday, 27 September 2016, 18:35
>Subject: Re: [PD] Threading in Pd/libpd
> 
>
>
>> So, probably this point has been discussed previously, I'd like to know:
>> - are there any existing objects doing this already?
>
>
>There is a creation argument to [coll] in Pd-l2ork that enables threading.
>
>> - what are the pitfalls that prevented such an approach from making its way into Pd?
>
>
>The second biggest pitfall is that such an approach can easily (and subtly) break determinism.
>
>The biggest pitfall is overestimating the benefit of the performance gains to the detriment of determinism.  Can you guarantee that the revisions you've implemented generate the same output as Pd Vanilla, for all cases?
>
>
>> - how can I help?
>
>
>A good place to start might be regression tests for block~.  I'd especially look at cases that use vline~ in conjunction with it, using very small delays, and make sure that you are getting the exact same samples output using your revised objects.
>
>
>-Jonathan
>
>>________________________________
>>From: Giulio Moro <giuliomoro at yahoo.it>
>>To: Pd-List <pd-list at lists.iem.at> 
>>Sent: Sunday, 18 September 2016, 2:23
>>Subject: Threading in Pd/libpd
>>
>>
>>
>>Hi all,
>>if I understand correctly, using the [block~] and [switch~] objects to increase the blocksize for a given subpatch means that the DSP computation for that subpatch is delayed until enough input samples have been collected, at which point the entire DSP stack for the subpatch is performed at once and the outputs are written to the output buffer.
>>This means that the DSP load is not spread over time; rather, it is concentrated in the single audio driver callback in which the buffer for that subpatch happens to be ready to be processed.
>>
>>
>>Now, if what I say makes sense, then this approach has the disadvantage that the CPU load is not spread evenly across audio callbacks, eventually causing dropouts if the computation takes too long in that one callback, and forcing you to increase the internal buffering of Pd ("Delay") to cope with this. At the same time, though, the CPU will be pretty much idle in all the other audio callbacks.
>>
>>
>>If we could spread the load of the expensive, but occasional, computation (say, an FFT) over multiple audio callbacks, then the CPU load would be more even, with no spikes, and there would be no need to increase Pd's internal buffering.
>>This would require making the output of the FFT available a few processing blocks after the one where it was started, whereas the current approach makes it available immediately. Some fine-tuning of the system would be required to work out how large this latency should be; in the worst case it would be the number of overlap samples as set by [block~] (as in: if the system cannot process these blocks fast enough, then you should lower your requirements, as your system cannot provide the required throughput). Now this may seem a downside, but the actual overall roundtrip latency of the Pd subpatch would not be much larger than the one currently achievable (if at all larger), with the added advantage that the rest of Pd could work at smaller blocksizes, and with "Delay" set to 0.
>>The ultimate advantage would be a more responsive system, in terms of I/O roundtrip, for most of the patch, except those subpatches where a longer latency is imposed by the algorithm anyway. Think for instance of a patch processing live the sound of an instrument, which also uses [sigmund~] to detect its pitch in order to apply some adaptive effect. A low roundtrip latency could be used for the processed instrument, while the latency imposed by [sigmund~] would only affect, e.g., the parameters of the effect. I can see this approach being useful in many cases.
>>Multi-core hardware would benefit further from this way of spreading the CPU usage.
>>
>>
>>I have hacked together a threaded version of [sigmund~] for use with libpd on Bela, which works fine, and I am wondering whether it is worth going down the route of making threaded versions of all objects with similar requirements (which I really would not want to do), or whether I should rather try to create some higher-level object (say [blockThread~]) that performs the threading strategy mentioned above.
>>It may be that [pd~] could probably(?) provide the solution requested, but it seems to me there is lots of overhead associated with it, and I do not see how to easily integrate it with our use of libpd.
>>
>>
>>So, probably this point has been discussed previously, I'd like to know:
>>- are there any existing objects doing this already?
>>- what are the pitfalls that prevented such an approach from making its way into Pd?
>>- how can I help?
>>
>>
>>Best,
>>Giulio
>
>