[PD] Raspberry Pi does denormals

katja katjavetter at gmail.com
Wed Jan 23 19:23:37 CET 2013


I have now recompiled the Pd-0.44.0 release on the Raspberry Pi (which
took a few hours, and not only because the Pi is so slow) with
PD_BIGORSMALL enabled for ARM in m_pd.h. Using bigorsmalltest.pd from my
previous mail, I verified that the macro is indeed in effect.
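For reference, here is a sketch of the kind of test PD_BIGORSMALL performs: it looks only at the two most significant bits of a 32-bit float's exponent field. If both are 0, the magnitude is below 2^-63 (about 1.1e-19, which includes zero and all subnormals); if both are 1, it is at least 2^65 (about 3.7e19, which includes inf and nan). The names below and the list of architectures in the guard are illustrative assumptions, and the exact macro text in m_pd.h may differ.

```c
#include <stdint.h>

/* Architectures that get the real test. This list is an assumption,
 * modeled on "enabled for arm" above; elsewhere the check compiles away. */
#if defined(__i386__) || defined(__x86_64__) || defined(__arm__)
#define MY_BIGORSMALL(f) bigorsmall_f32(f)
#else
#define MY_BIGORSMALL(f) 0
#endif

/* Union for reading a float's bit pattern. gcc supports type punning
 * through a union, unlike casting a float* to an int*. */
typedef union {
    float f;
    uint32_t u;
} float_bits;

/* The two most significant exponent bits of an IEEE-754 single (bits 30, 29). */
#define EXP_TOP2 0x60000000u

/* Nonzero if |f| < 2^-63 (about 1.1e-19, including zero and subnormals)
 * or |f| >= 2^65 (about 3.7e19, including inf and nan). */
static inline int bigorsmall_f32(float f)
{
    float_bits b;
    uint32_t top;
    b.f = f;
    top = b.u & EXP_TOP2;
    return top == 0 || top == EXP_TOP2;
}
```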

Martin Brinkmann's patch chaosmonster1
(http://www.martin-brinkmann.de) gives a beautiful illustration of the
improvement. This patch is full of filters and delay lines. At its
initial settings, there is no subnormal problem. But if you move the
bottom slider to the right, the patch goes silent. With the Pd-0.44.0
release, CPU load then explodes; with the 'normalized' Pd, nothing
special happens.
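A simplified model of why the silent patch hurts (this is not chaosmonster1 itself): with zero input, the state of a feedback filter decays geometrically, y[n] = a * y[n-1], and after roughly ln(FLT_MIN)/ln(a) samples it drops below FLT_MIN into the subnormal range, where arithmetic on the Pi's VFP, as on many FPUs, is far slower. The helper name is hypothetical.

```c
#include <math.h>

/* Hypothetical helper: model a feedback filter fed with silence,
 * y[n] = a * y[n-1], and count the samples until y is no longer a
 * normal float (it became subnormal, or zero if the FPU flushes to zero). */
static long samples_until_subnormal(float start, float a, float *final_state)
{
    float y = start;
    long n = 0;
    while (fpclassify(y) == FP_NORMAL) {
        y = a * y;   /* one sample of silence through the feedback path */
        n++;
    }
    *final_state = y;
    return n;
}
```

With a = 0.99 the estimate above comes to about 8,700 samples, i.e. roughly 0.2 s at 44.1 kHz. Worse, with round-to-nearest the state can then get stuck: once it is a small enough subnormal (a few dozen multiples of the smallest subnormal), multiplying by 0.99 rounds back to the same value, so the slow path never ends on its own.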

And indeed, the PD_BIGORSMALL conditional checks come for free: with
the initial settings of chaosmonster1, performance is equivalent in
both Pd builds. Cool! Hopefully this is similar on armv7.

Katja



On Wed, Jan 23, 2013 at 5:01 PM, Hans-Christoph Steiner <hans at at.or.at> wrote:
>
> hey Katja,
>
> This also sounds like good evidence for your idea of writing C code that
> modern compilers optimize well.  Using unions for aliasing allows the compiler
> to do all the new tricks, and then writing loops that auto-vectorize gives us
> the real benefits.  I also think we can see some gains by using memcpy(), since
> in modern libc versions it is highly optimized for the given CPU, dynamically
> choosing routines based on which instructions are available. memcpy will
> use things like SSE2 if they are available.
>
> .hc
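Hans's memcpy() suggestion also works at the smallest scale, for reading a float's bits: a fixed-size memcpy between a float and a uint32_t is well-defined C regardless of strict aliasing, and optimizing compilers normally replace it with a single register move instead of calling the library. A minimal sketch with hypothetical function names:

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

/* Copy a float's bits into an integer. A fixed-size memcpy like this is
 * well-defined C and is normally compiled to a single register move. */
static inline uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* The reverse direction: reinterpret 32 bits as a float. */
static inline float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

For example, float_to_bits(1.0f) returns 0x3F800000, and bits_to_float(0x40000000u) returns 2.0f.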
>
> On 01/23/2013 07:47 AM, katja wrote:
>> Finally some good news on this topic. Earlier I stated that 'big or
>> small tests' are expensive on the Pi, but that is not necessarily
>> the case. Other conditions must have skewed my earlier impression.
>> I've now done a systematic test that rules out other influences.
>> A test class [lopass~] with exactly the same routine as [lop~] was
>> made, but compiled with the PD_BIGORSMALL() macro enabled. I verified
>> that [lopass~] is not affected by denormals. Comparing the performance
>> of [lop~] and [lopass~] shows that both objects cause equivalent CPU
>> load. In other words, the Raspberry Pi gives the 'big or small checks'
>> for free! At least in the case of this simple filter. Please try the
>> attached bigorsmalltest.zip on the Pi to check that I'm not dreaming.
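For readers without the zip at hand, a [lop~]-style routine with the check added could look roughly like this. It is a sketch, not the actual lopass~.c: the function names and signature are hypothetical. The design choice worth noting is that the state is tested once per block, after the sample loop, so the cost of the check is spread over the whole block rather than paid per sample.

```c
#include <stdint.h>
#include <string.h>

/* Big-or-small test on the exponent's top two bits (read via memcpy). */
static inline int state_bigorsmall(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    u &= 0x60000000u;
    return u == 0 || u == 0x60000000u;
}

/* Hypothetical one-pole lowpass block routine in the style of [lop~]:
 * y[n] = coef * x[n] + (1 - coef) * y[n-1]. */
static void lopass_block(const float *in, float *out, int n,
                         float coef, float *state)
{
    float feedback = 1.0f - coef;
    float last = *state;
    for (int i = 0; i < n; i++) {
        last = coef * in[i] + feedback * last;
        out[i] = last;
    }
    /* Once per block: flush a tiny (or huge, inf, nan) state to zero
     * so it cannot linger in the subnormal range. */
    if (state_bigorsmall(last))
        last = 0.0f;
    *state = last;
}
```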
>>
>> While I was at it, I also tried a big-or-small test using a union
>> instead of direct type aliasing (casting a float pointer to an int
>> pointer). The union has the advantage that the code respects strict
>> aliasing rules, so the compiler can still apply the optimizations those
>> rules allow. This test with unions did not cause extra CPU load on the
>> Pi either. If you want to verify this result, enable the call to
>> bigorsmall() instead of PD_BIGORSMALL in lopass~.c and recompile.
>>
>> The fact that these tests do not cause extra CPU load indicates that
>> they are executed in parallel with other instructions. Float and int
>> registers are apparently strictly separated on armv6; there is nothing
>> like Intel's xmm registers or armv7's NEON. As it happens, the
>> big-or-small tests operate on ints, aliases of the floats that must
>> be tested. Initially I assumed that moving floats from the VFP to the
>> ARM integer core would be expensive, but if the instructions execute
>> simultaneously, it may be an advantage instead. Another factor is that
>> ARM supports branch predication (conditional execution) in addition to
>> branch prediction. The two terms look almost the same, but the
>> mechanisms are very different. With predication, instructions from both
>> branches are issued, and those whose condition is false are simply
>> discarded instead of taking effect.
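The point about doing the test on the integer side can be made concrete with a flush that contains no if-statement at all: the big-or-small test is turned into an all-ones or all-zeros mask that is ANDed into the float's bits. This is a hypothetical variant, not what lopass~.c does, and whether gcc compiles either form to plain integer operations, a conditional move, or predicated instructions depends on the compiler version and flags; only the disassembly (gcc -S) shows what really runs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical branch-free flush: values with |f| below about 1e-19 or
 * above about 3.7e19 (including inf and nan) become +0.0f, all other
 * values pass through unchanged. */
static inline float flush_bigorsmall_branchless(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    uint32_t top = u & 0x60000000u;
    /* keep == 1 when the value lies in the "normal" audio range */
    uint32_t keep = (top != 0) & (top != 0x60000000u);
    uint32_t mask = 0u - keep;   /* 0xFFFFFFFF if keep, else 0 */
    u &= mask;
    memcpy(&f, &u, sizeof f);
    return f;
}
```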
>>
>> Conclusions from this limited test with [lop~] and [lopass~] do not
>> mean that all sorts of conditional checks are cheap on the Pi, or on
>> ARM in general. If PD_BIGORSMALL is enabled for the RPi via the
>> compile-time definition __arm__, it will also be enabled for armv7,
>> where the result may be very different. At the moment I don't have
>> access to an armv7 device. Maybe someone can recompile the test class
>> [lopass~] and run the tests on a Beagleboard or Cubieboard? Otherwise
>> I may be able to do it on my friend's PengPod once it has arrived.
>>
>> Katja
>>
>>
>> On Tue, Jan 22, 2013 at 8:54 PM, Miller Puckette <msp at ucsd.edu> wrote:
>>> thanks - I'd better try this and find out what's going on :)
>>>
>>> M
>>>
>>> On Mon, Jan 21, 2013 at 11:54:29AM +0100, katja wrote:
>>>> I tried the 0.44.0 build from your website. It has the same issue with
>>>> subnormal values. My test patch uses [lop~]. If inf or nan is fed
>>>> into [lop~], these 'values' keep circulating in the object, and it can
>>>> no longer process normal signal values.
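The inf/nan effect can be reproduced outside Pd with the same one-pole recursion. In the sketch below (hypothetical helper names, coefficient chosen arbitrarily), a single NaN input sample poisons the state for good, because NaN multiplied by the feedback coefficient is NaN again. Resetting non-finite state, in the spirit of a PD_BADFLOAT-style check, lets the filter recover.

```c
#include <math.h>

/* One sample of a one-pole lowpass, as in the [lop~] recursion. */
static float onepole_tick(float x, float coef, float *state)
{
    *state = coef * x + (1.0f - coef) * *state;
    return *state;
}

/* Feed one NaN sample followed by n zeros and return the final state.
 * If guard is nonzero, reset the state whenever it is inf or nan. */
static float run_after_nan(int n, int guard)
{
    float state = 0.0f;
    onepole_tick(NAN, 0.1f, &state);
    for (int i = 0; i < n; i++) {
        onepole_tick(0.0f, 0.1f, &state);
        if (guard && !isfinite(state))
            state = 0.0f;
    }
    return state;
}
```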
>>>>
>>>> I also tried my reverb stuff with compiler options specific to the Pi's processor:
>>>>
>>>> -march=armv6zk
>>>> -mcpu=arm1176jzf-s
>>>> -mtune=arm1176jzf-s
>>>>
>>>> With these options, gcc should be able to decide that RunFast mode is
>>>> permitted. But even in combination with -ffast-math (which in turn
>>>> sets -funsafe-math-optimizations and -fno-trapping-math amongst
>>>> others), denormals are still there. I'm literally out of options for
>>>> the moment. Sorry for not having better news.
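One possible workaround when the compiler does not enable flush-to-zero by itself is to set the bits at runtime. In the VFP status and control register (FPSCR), bit 24 is FZ (flush-to-zero) and bit 25 is DN (default NaN); RunFast mode on the ARM1176 also requires the floating-point exception traps to be disabled, which is the default. The sketch below is untested on the Pi and the helper name is hypothetical. Since FPSCR belongs to each thread's context, it would have to be called from the thread that runs Pd's DSP.

```c
#include <stdint.h>

/* Hypothetical helper: set FZ (bit 24) and DN (bit 25) in the ARM VFP
 * FPSCR. Returns 1 if the bits were set, 0 on targets without a
 * hardware VFP (including non-ARM machines), where it does nothing. */
static int enable_arm_flush_to_zero(void)
{
#if defined(__arm__) && defined(__VFP_FP__) && !defined(__SOFTFP__)
    uint32_t fpscr;
    /* Older assemblers may need the pre-UAL spellings fmrx/fmxr. */
    __asm__ volatile ("vmrs %0, fpscr" : "=r" (fpscr));
    fpscr |= (1u << 24) | (1u << 25);
    __asm__ volatile ("vmsr fpscr, %0" : : "r" (fpscr));
    return 1;
#else
    return 0;
#endif
}
```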
>>>>
>>>> Katja
>>>>
>>>>


