[PD] how to iterate over left and right channel separately in one Pd class?

Sat Jan 12 22:45:40 CET 2013

It's interesting, but rather compiler- and processor-specific. Such
code is maintenance-intensive. At the moment, ARM processors are
screaming loudest for optimization. The best thing for a community
project is probably plain C code that takes parallel processing into
account, because that won't go away for the next few decades. Functions
like copy_perform8(), times_perform8() etc. can profit from SIMD
instructions without any need for compiler intrinsics or asm code.
Well-structured data storage and access can yield a performance gain of
50% or more, in my experience.
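
As a rough illustration of plain C that leaves room for SIMD, here is a
minimal sketch in the spirit of an 8-samples-per-iteration perform
routine. The names sample_t and mul_vec8 are hypothetical, not Pd's
actual source, and whether a given compiler really emits SIMD
instructions for it depends on the compiler, its flags and the target.

```c
#include <stddef.h>

typedef float sample_t; /* stand-in for Pd's t_float; assumed single precision */

/* Multiply two signal vectors, eight samples per loop iteration.
 * Assumes n is a multiple of 8. The output may be the very same buffer
 * as one of the inputs (in-place use) or a disjoint one, but must not
 * partially overlap an input. */
static void mul_vec8(const sample_t *in1, const sample_t *in2,
                     sample_t *out, size_t n)
{
    for (size_t i = 0; i < n; i += 8) {
        /* Eight independent multiplies: no value depends on another,
         * so the compiler is free to map them onto vector instructions. */
        out[i + 0] = in1[i + 0] * in2[i + 0];
        out[i + 1] = in1[i + 1] * in2[i + 1];
        out[i + 2] = in1[i + 2] * in2[i + 2];
        out[i + 3] = in1[i + 3] * in2[i + 3];
        out[i + 4] = in1[i + 4] * in2[i + 4];
        out[i + 5] = in1[i + 5] * in2[i + 5];
        out[i + 6] = in1[i + 6] * in2[i + 6];
        out[i + 7] = in1[i + 7] * in2[i + 7];
    }
}
```

The point of the layout is that the samples sit contiguously in memory
and each iteration touches a fixed-size group of them.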

Another important thing: avoid float precision conversions. Throughout
Pd there are many untyped float defines and literal constants which
default to double, and I have introduced more when making libs
double-ready. Not good. I'll come back to this in another thread.
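
To make the conversion issue concrete, here is a small sketch with
hypothetical names (sample_t, GAIN_UNTYPED, GAIN_TYPED, scale_untyped,
scale_typed); it is not taken from Pd. An unsuffixed literal such as 0.5
has type double in C, so multiplying a float sample by it is a double
expression whose result is converted back to float on assignment.

```c
#include <stddef.h>

typedef float sample_t; /* stand-in for Pd's t_float in a single-precision build */

#define GAIN_UNTYPED 0.5              /* type double, whatever sample_t is */
#define GAIN_TYPED   ((sample_t)0.5)  /* follows the sample type */

/* The multiply has type double: the sample is promoted before the
 * multiply and the product is narrowed again when stored. */
static void scale_untyped(sample_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = buf[i] * GAIN_UNTYPED;
}

/* The whole expression stays in sample_t precision. */
static void scale_typed(sample_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = buf[i] * GAIN_TYPED;
}
```

With sample_t as float, sizeof(GAIN_UNTYPED * (sample_t)1) equals
sizeof(double), while sizeof(GAIN_TYPED * (sample_t)1) equals
sizeof(float). A compiler may optimize the conversions away in simple
cases like this one, but that is not something to rely on.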

Katja

On Sat, Jan 12, 2013 at 8:14 PM, Hans-Christoph Steiner <hans at at.or.at> wrote:
>
> If you are interested, there is still the hand-coded SIMD stuff from pd-devel:
> https://pure-data.svn.sourceforge.net/svnroot/pure-data/branches/pd-devel/v0-39
>
> .hc
>
> On 01/12/2013 09:34 AM, katja wrote:
>> Function copy_perform8() is also eligible for SIMD processing. I used
>> memcpy() because it is straightforward to use, while Pd's functions
>> pointed to the wrong locations for this case. On the reverb's total
>> load there is no significant performance difference.
>>
>> Katja
>>
>>
>> On Sat, Jan 12, 2013 at 1:00 AM, Hans-Christoph Steiner <hans at at.or.at> wrote:
>>>
>>> I recently learned that libc's memcpy actually uses things like SSE2 or SSSE3
>>> so it can be quite fast on CPUs from the past 10 years, especially of the last
>>> 5 years.
>>>
>>> It would be worth profiling to see if that's noticeable.
>>>
>>> .hc
>>>
>>> On 01/11/2013 05:12 PM, katja wrote:
>>>> Ok so I did the ugly thing with the right channel input and output pointers:
>>>>
>>>> memcpy(outR, inR, vectorsize * sizeof(t_float));
>>>> inR = outR;
>>>>
>>>> Works like a charm, thanks again.
>>>>
>>>> Katja
>>>>
>>>>
>>>>
>>>> On Fri, Jan 11, 2013 at 10:05 PM, Miller Puckette <msp at ucsd.edu> wrote:
>>>>> copy_perform assumes the data is 4-byte aligned so might save a test
>>>>> or two compared to memcpy() - but I really don't know.  I never
>>>>> benchmarked the two against each other :)
>>>>>
>>>>> M
>>>>>
>>>>> On Fri, Jan 11, 2013 at 09:36:41PM +0100, katja wrote:
>>>>>> Hi Miller,
>>>>>>
>>>>>> Thanks for the solution. The routines are in place so copying the
>>>>>> right channel input to output should do it. Is there any reason to
>>>>>> prefer copy_perform() over memcpy()? I'm trying to make the most
>>>>>> efficient reverb for RPi & Co.
>>>>>>
>>>>>> Katja
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 11, 2013 at 7:57 PM, Miller Puckette <msp at ucsd.edu> wrote:
>>>>>>> Hi Katja -
>>>>>>>
>>>>>>> There's one example of this in sigfft_dspx() - a complex FFT that 'natively'
>>>>>>> works on 2 signals in-place but has to deal with various cases in which
>>>>>>> buffers get re-used.  It's ugly but the basic idea is first to get the
>>>>>>> inputs copied to the outputs (unless they're already there in the correct
>>>>>>> order in which case nothing needs to be done) and then run the in-place
>>>>>>> algorithm.
>>>>>>>
>>>>>>> If the algo only works out-of-place (i.e. you need 4 distinct buffers, 2
>>>>>>> in and 2 out) the only way out is (at least conditionally) allocate temporary
>>>>>>> copies of the inputs before writing to any outputs.
>>>>>>>
>>>>>>> I may be able to add an optional way tilde objects can request that output
>>>>>>> buffers be distinct from input ones sometime in the future - but this is a
>>>>>>> couple of steps away for me right now :)
>>>>>>>
>>>>>>> M
>>>>>>>
>>>>>>> On Fri, Jan 11, 2013 at 03:32:09PM +0100, katja wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I'm working on a Pd class with stereo channels (reverb), and the
>>>>>>>> routine happens to be most efficient when iterating over the samples
>>>>>>>> per channel, instead of left and right together in the perform loop.
>>>>>>>> However, when doing two while loops in one object, one for left and
>>>>>>>> one for right, the right channel samples get overwritten because of
>>>>>>>> sample-wise in-place computation. Is this an inescapable truth? I
>>>>>>>> mean, I could write a left channel class and a right channel class
>>>>>>>> (actually did that to verify that it works), but it's inconvenient to
>>>>>>>> use. What could be an efficient way to get them in one object?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Katja
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Pd-list at iem.at mailing list
>>>>>>>> UNSUBSCRIBE and account-management -> http://lists.puredata.info/listinfo/pd-list
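
The approach discussed in this thread (get the inputs copied to the
outputs, then run each channel's loop in place) can be sketched roughly
as follows. This is a simplified illustration, not Pd code: sample_t,
process_channel and stereo_perform are hypothetical names, the
per-channel stage is a plain gain, and the sketch assumes that outR is
not the same buffer as inL or outL. That crossed case would need
separate handling, for example with a temporary copy.

```c
#include <stddef.h>
#include <string.h>

typedef float sample_t; /* stand-in for Pd's t_float */

/* Hypothetical in-place per-channel stage; a real reverb would go here. */
static void process_channel(sample_t *buf, size_t n, sample_t gain)
{
    for (size_t i = 0; i < n; i++)
        buf[i] *= gain;
}

/* Run the left and right channels in two separate loops.
 * Buffers are assumed to be either identical or disjoint.
 * inR may be the same buffer as outL; outR must differ from inL and outL. */
static void stereo_perform(const sample_t *inL, const sample_t *inR,
                           sample_t *outL, sample_t *outR, size_t n)
{
    /* Save the right input into its own output first, so that writing
     * the left output cannot overwrite it. */
    if (outR != inR)
        memcpy(outR, inR, n * sizeof(sample_t));
    if (outL != inL)
        memcpy(outL, inL, n * sizeof(sample_t));

    process_channel(outL, n, 0.5f);  /* left loop, in place */
    process_channel(outR, n, 0.25f); /* right loop, in place */
}
```

The pointer comparisons skip the copy when a channel is already being
processed in place, which also avoids calling memcpy() with identical
source and destination.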