[PD] puredata evolution

Thu May 31 02:19:03 CEST 2007

Tim Blechmann wrote:
> On Wed, 2007-05-30 at 12:13 +0200, Niklas Klügel wrote:
>   
>>> I think it depends on the application... for the most part, we can't
>>> get a generic speedup from using multiple cores (forgive me if wrong)
>>> that would apply to every single pd program... but some types of
>>> computations such as large FFTs can be performed faster when
>>> distributed to different cores, in which case the code for the FFT
>>> has to be parallelized a priori.  Plus, the memory is tricky.  You can
>>> have a memory access bottleneck when using a shared memory resource
>>> between multiple processors.
>>> It's definitely a problem that is worth solving, but I'm not
>>> suggesting to do anything about it soon.  It sounds like something
>>> that would require a complete top-down re-design to be successful.
>>> yikes
>>>
>>> Chuck
>> I once wrote such a toolset that automatically scales up
>> with multiple threads throughout the whole network. it worked
>> by detecting cycles in the graph and the points where signals split,
>> segmenting the graph into autonomous sequential parts and essentially
>> adding some smart and lightweight locks everywhere the signals
>> split or merged. it even reassigned threads at the lock level to
>> "balance" the workload in the graph and to prevent deadlocks.
>> the code is/was around 2.5k lines of C++ and a bloody mess :)
>> so, i don't know much about the internals of pd, but it would probably
>> be possible.
>>     
>
> detaching FFTs (i.e. canvases with block sizes larger than 64) should be
> rather trivial ...
>
> distributing a synchronous DSP graph to several threads is not trivial,
> especially when it comes to a huge number of nodes. for small numbers of
> nodes the approach of jackdmp, using dynamic dataflow scheduling, is
> probably usable, but when it comes to huge DSP graphs, the
> synchronization overhead is probably too big, so the graph would have to
> be split into parallel chunks which are then scheduled ...
>   
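One way to obtain such chunks, close to the segmentation described earlier in the thread (cutting the graph wherever signals split or merge), is to collapse every run of one-to-one connections into a sequential chain. The sketch below assumes an acyclic graph given as successor lists (pd refuses DSP loops); `split_into_chains` is a hypothetical helper, not part of pd.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: partition an acyclic DSP graph into maximal
// sequential chains. succ[i] lists the nodes fed by node i.
// Each returned chain can run on one thread without internal locking;
// synchronization is only needed between chains (at splits and merges).
std::vector<std::vector<std::size_t>> split_into_chains(
    const std::vector<std::vector<std::size_t>>& succ) {
    const std::size_t n = succ.size();
    std::vector<std::size_t> in_degree(n, 0);
    std::vector<std::size_t> pred(n, 0); // meaningful only when in_degree == 1
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t s : succ[i]) { ++in_degree[s]; pred[s] = i; }

    // A node continues its predecessor's chain when it has exactly one
    // input (no merge) and that input's source has exactly one output (no split).
    auto continues_chain = [&](std::size_t v) {
        return in_degree[v] == 1 && succ[pred[v]].size() == 1;
    };

    std::vector<std::vector<std::size_t>> chains;
    for (std::size_t head = 0; head < n; ++head) {
        if (continues_chain(head)) continue; // reached from its predecessor's chain
        std::vector<std::size_t> chain{head};
        std::size_t v = head;
        while (succ[v].size() == 1 && continues_chain(succ[v][0])) {
            v = succ[v][0];
            chain.push_back(v);
        }
        chains.push_back(chain);
    }
    return chains;
}
```

For a graph 0→1, then a split 1→2 and 1→3, a merge 2→4 and 3→4, and finally 4→5, this yields the chains {0,1}, {2}, {3} and {4,5}: four chunks where only 2 and 3 can run in parallel. Whether the chunks are coarse enough to keep the synchronization overhead down on huge graphs is exactly the open question raised above.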
true, i didn't try big graphs, so i can't really say how it would behave.
it was more a fun project to see if it was doable. at that time i had
the impression that the locking and the re-assignment of threads
were quite efficient and done only on demand, when the graph
has more sequential parts than the number of created threads.
i am curious how it can be achieved in a lock-free way.
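A minimal sketch of the dynamic dataflow scheduling mentioned above for jackdmp: each node keeps an atomic count of inputs not yet computed in the current block, and whichever thread finishes a node's last predecessor makes that node ready, so no lock is taken per connection. This is an illustration under assumptions, not jackdmp's or pd's code: `Node` and `run_block` are hypothetical names, and the ready queue is guarded by a mutex for brevity (only the per-node counters are lock-free; a fully lock-free version would need a lock-free queue instead).

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical DSP node: the work for one block plus its outgoing edges.
struct Node {
    std::function<void()> process;       // compute one block
    std::vector<std::size_t> successors; // nodes fed by this node
    std::size_t num_inputs = 0;          // number of incoming edges
    std::atomic<std::size_t> pending{0}; // inputs not yet computed this block
};

// Run one DSP block over an acyclic graph with num_threads workers.
void run_block(std::vector<Node>& graph, unsigned num_threads) {
    std::mutex m;
    std::condition_variable cv;
    std::queue<std::size_t> ready;         // guarded by m
    std::size_t remaining = graph.size();  // guarded by m

    // Reset activation counters; source nodes are ready immediately.
    for (std::size_t i = 0; i < graph.size(); ++i) {
        graph[i].pending.store(graph[i].num_inputs);
        if (graph[i].num_inputs == 0) ready.push(i);
    }

    auto worker = [&] {
        for (;;) {
            std::size_t id;
            {
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [&] { return !ready.empty() || remaining == 0; });
                if (ready.empty()) return; // every node of this block is done
                id = ready.front();
                ready.pop();
            }
            graph[id].process();
            // Lock-free part: the thread that drops a counter to zero
            // is the one that activates the successor.
            std::vector<std::size_t> activated;
            for (std::size_t s : graph[id].successors)
                if (graph[s].pending.fetch_sub(1) == 1) activated.push_back(s);
            {
                std::lock_guard<std::mutex> lock(m);
                for (std::size_t s : activated) ready.push(s);
                --remaining;
            }
            cv.notify_all();
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
}
```

The counters are reset at the start of every block, so the same graph can be reused block after block. The cost that grows with graph size is one atomic decrement per connection plus one queue operation per node, which is the per-node synchronization overhead expected to dominate for huge graphs.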

about the issue of explicitly threading parts of the graph (which came
up later in the discussion), i must say i don't see why you would want to
do it. seeing how the number of cores is about to increase, i'd say that
it is counterproductive given how hardware is developing: the software
running on top of it already lags behind, and explicit threading would add
steady maintenance to the software involved as core counts change. from my
point of view a graphical dataflow language has the perfect semantics to
express the parallelism of a program in an intuitive way. therefore, rather
than adding constructs for explicit parallelism to a language that can
already express it, i'd say adding constructs for explicit serialization
of a process makes more sense.
maybe i'm talking nonsense here, please correct me.

so long...
Niklas