Hey Katja,<br><br>Would you mind sharing the 'normalised' Pd-0.44.0 for RPi, please?<br><br>Cheers,<br><br>Julian<br><br><br><br><div class="gmail_quote">On 23 January 2013 18:23, katja <span dir="ltr"><<a href="mailto:katjavetter@gmail.com" target="_blank">katjavetter@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Now I recompiled the Pd-0.44.0 release on Raspberry Pi (took me a few<br>
hours, not only because Pi is so slow) with PD_BIGORSMALL enabled for<br>
arm in m_pd.h. Using bigorsmalltest.pd from my previous mail I<br>
verified that the macro is indeed in effect.<br>
<br>
Martin Brinkmann's patch chaosmonster1<br>
(<a href="http://www.martin-brinkmann.de" target="_blank">http://www.martin-brinkmann.de</a>) gives a beautiful illustration of the<br>
improvement. This patch is full of filters and delay lines. At its<br>
initial settings, there is no subnormals problem. But if you set the<br>
bottom slider to the right, it goes silent. With the Pd-0.44.0 release,<br>
CPU load then explodes. With the 'normalized' Pd, nothing special happens.<br>
<br>
And indeed, the PD_BIGORSMALL conditional checks come for free: with<br>
initial settings of the chaosmonster1, performance is equivalent in<br>
both Pd builds. Cool! Hopefully this is similar on armv7.<br>
<span class="HOEnZb"><font color="#888888"><br>
Katja<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
<br>
<br>
On Wed, Jan 23, 2013 at 5:01 PM, Hans-Christoph Steiner <<a href="mailto:hans@at.or.at">hans@at.or.at</a>> wrote:<br>
><br>
> hey Katja,<br>
><br>
> This also sounds like good evidence for your idea of writing C code that<br>
> modern compilers optimize well. Using unions for aliasing allows the compiler<br>
> to do all the new tricks, then writing loops that auto-vectorize gives us the<br>
> real benefits. Also, I think we can see some gains by using memcpy() since in<br>
> modern libc versions it is highly optimized for the given CPU, dynamically<br>
> choosing the routines based on what instructions are available. memcpy will<br>
> use things like SSE2 or SSSE3 if they're available.<br>
><br>
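A minimal sketch of the two ideas in the paragraph above: a block loop written so that a compiler is free to auto-vectorize it, and memcpy() for copying a signal block. The function names block_add and block_copy are hypothetical and not taken from Pd. Whether the loop is actually vectorized depends on the compiler, the flags and the target CPU.<br>

```c
#include <stddef.h>
#include <string.h>

/* out[i] = a[i] + b[i]. The restrict qualifiers promise that the three
   blocks do not overlap, which removes one obstacle to vectorization. */
static void block_add(const float *restrict a, const float *restrict b,
                      float *restrict out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Copy n samples with memcpy() instead of a hand-written loop, leaving
   the choice of copy routine to the C library. */
static void block_copy(const float *restrict src, float *restrict dst,
                       size_t n)
{
    memcpy(dst, src, n * sizeof *src);
}
```

The restrict qualifier requires C99 or later.<br>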
> .hc<br>
><br>
> On 01/23/2013 07:47 AM, katja wrote:<br>
>> Finally some good news on this topic. Earlier I stated that 'big or<br>
>> small tests' are expensive for the Pi, but that is not necessarily<br>
>> the case. There must have been other conditions blurring my<br>
>> impression. I've now done a systematic test where other influences are<br>
>> ruled out. A test class [lopass~] with exactly the same routine as<br>
>> [lop~] was made, but compiled with PD_BIGORSMALL() macro enabled. It<br>
>> was verified that [lopass~] is not affected by denormals. Performance<br>
>> comparison of [lop~] and [lopass~] shows that both objects cause<br>
>> equivalent CPU load. Meaning, Raspberry Pi gives the 'big or small<br>
>> checks' for free! At least in the case of this simple filter. Please<br>
>> try attached bigorsmalltest.zip on the Pi to see if I'm not dreaming.<br>
>><br>
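A rough sketch of the kind of routine being compared above: a one-pole lowpass whose stored state is range-checked once per block and reset to zero when it is out of range. This is not the source of [lop~] or [lopass~]. The names lopass_state, out_of_range and lopass_perform are hypothetical, and the range test uses plain float comparisons as a stand-in for the integer-based macro.<br>

```c
#include <stddef.h>

/* Hypothetical filter state; the names are illustrative, not Pd's. */
typedef struct {
    float coef;  /* input weight, between 0 and 1 */
    float last;  /* previous output sample */
} lopass_state;

/* Stand-in range test using float comparisons: true for magnitudes
   outside roughly 1e-19..1e19, which includes zero, subnormals,
   inf and NaN (every comparison with NaN is false). */
static int out_of_range(float f)
{
    float a = f < 0.0f ? -f : f;
    return !(a >= 1e-19f && a <= 1e19f);
}

/* Process n samples, then check the stored state once per block.
   With flush == 0 the check is skipped, like an unprotected filter. */
static void lopass_perform(lopass_state *s, const float *in, float *out,
                           size_t n, int flush)
{
    float last = s->last;
    float feedback = 1.0f - s->coef;
    for (size_t i = 0; i < n; i++)
        last = out[i] = s->coef * in[i] + feedback * last;
    if (flush && out_of_range(last))
        last = 0.0f;
    s->last = last;
}
```

With coef = 0.5 and silent input, a state of 1.0 decays to 2^-64 within one 64-sample block and is reset to zero, long before it could reach the subnormal range.<br>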
>> While I was at the topic anyway, I also tried a big or small test with<br>
>> union instead of direct type aliasing. It has the advantage that the<br>
>> compiler can apply strict aliasing rules. This test with unions did<br>
>> not cause extra CPU load either on the Pi. If you want to verify this<br>
>> result, enable the call to bigorsmall() instead of PD_BIGORSMALL in<br>
>> lopass~.c and recompile.<br>
>><br>
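For illustration, a union-based big or small test could look roughly like the sketch below. It assumes 32-bit IEEE 754 floats, and it assumes the test looks at the top two exponent bits (mask 0x60000000). The names float_bits and bigorsmall_sketch are hypothetical; this is not the bigorsmall() from lopass~.c.<br>

```c
#include <stdint.h>

/* Union used to read the bit pattern of a float. */
typedef union {
    float    f;
    uint32_t ui;
} float_bits;

/* Direct type aliasing, shown for comparison only, reads the same bits
   through a cast pointer and breaks the strict aliasing rules:
       unsigned int bits = *(unsigned int *)&f;                       */

/* Nonzero when the top two exponent bits are both 0 (very small values,
   zero, subnormals) or both 1 (very large values, inf, NaN). */
static int bigorsmall_sketch(float f)
{
    float_bits pun;
    pun.f = f;
    uint32_t e = pun.ui & 0x60000000u;
    return e == 0 || e == 0x60000000u;
}
```

Reading a union member other than the one last written is permitted in C (C99 and later), so the compiler can keep its strict aliasing assumptions for the rest of the code.<br>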
>> The fact that these tests do not cause extra CPU load indicates that<br>
>> they are done in parallel with other instructions. Float and int<br>
>> registers are apparently strictly separated on armv6; there's no such<br>
>> thing as Intel's xmm registers or armv7's NEON. As it happens, the<br>
>> big or small tests are done on ints, aliases of the floats that must<br>
>> be tested. Initially I assumed that the transport of floats from vfp<br>
>> to the arm integer processor would be expensive, but if the<br>
>> instructions are done simultaneously it may be an advantage instead.<br>
>> Another thing is that ARM offers branch predication in addition to<br>
>> branch prediction. Those terms look almost the same but the mechanisms<br>
>> are very different. Predication is when the instructions of both branches<br>
>> are issued, and those whose condition fails simply have no effect.<br>
>><br>
>> Conclusions from the limited test with [lop~] and [lopass~] do not<br>
>> mean that all sorts of conditional checks are cheap on the Pi, or on<br>
>> ARM in general. If PD_BIGORSMALL is enabled for RPi using compile-time<br>
>> definition __arm__, it will also be enabled on armv7, where the result may<br>
>> be very different. At the moment I have no access yet to an armv7<br>
>> device. Maybe someone can recompile test class [lopass~] and do the<br>
>> tests on Beagleboard or Cubieboard? Otherwise I may be able to do it<br>
>> on my friend's PengPod when that has arrived.<br>
>><br>
>> Katja<br>
>><br>
>><br>
>> On Tue, Jan 22, 2013 at 8:54 PM, Miller Puckette <<a href="mailto:msp@ucsd.edu">msp@ucsd.edu</a>> wrote:<br>
>>> thanks - I'd better try this and find out what's going on :)<br>
>>><br>
>>> M<br>
>>><br>
>>> On Mon, Jan 21, 2013 at 11:54:29AM +0100, katja wrote:<br>
>>>> Tried the 0.44.0 build from your website. It has the same issue with<br>
>>>> subnormal values. My test patch is with [lop~]. If inf or nan is fed<br>
>>>> into [lop~], these 'values' keep circulating in the object, and it can no<br>
>>>> longer process normal signal values.<br>
>>>><br>
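The circulation described above follows from the feedback term of a one-pole lowpass, y[n] = c*x[n] + (1-c)*y[n-1]: once y is inf or nan, every later output is too. A small sketch, using hypothetical names (lowpass_step, state_after_bad_input) rather than the actual [lop~] code:<br>

```c
#include <math.h>

/* One step of a one-pole lowpass: y[n] = c*x[n] + (1-c)*y[n-1]. */
static float lowpass_step(float coef, float x, float last)
{
    return coef * x + (1.0f - coef) * last;
}

/* Feed one bad sample followed by n_clean normal samples and return
   the filter state that remains afterwards. */
static float state_after_bad_input(float coef, float bad, int n_clean)
{
    float last = lowpass_step(coef, bad, 0.0f);
    for (int i = 0; i < n_clean; i++)
        last = lowpass_step(coef, 0.25f, last);
    return last;
}
```

Passing NAN or INFINITY from math.h as the bad sample leaves the state at nan or inf no matter how many clean samples follow, provided the code is compiled without -ffast-math.<br>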
>>>> I also tried my reverb stuff with specific compiler options for Pi's processor:<br>
>>>><br>
>>>> -march=armv6zk<br>
>>>> -mcpu=arm1176jzf-s<br>
>>>> -mtune=arm1176jzf-s<br>
>>>><br>
>>>> With these options, gcc should be able to decide that RunFast mode is<br>
>>>> permitted. But even in combination with -ffast-math (which in turn<br>
>>>> sets -funsafe-math-optimizations and -fno-trapping-math amongst<br>
>>>> others), denormals are still there. I'm literally out of options for<br>
>>>> the moment. Sorry for not having better news.<br>
>>>><br>
>>>> Katja<br>
>>>><br>
>>>><br>
<br>
_______________________________________________<br>
<a href="mailto:Pd-list@iem.at">Pd-list@iem.at</a> mailing list<br>
UNSUBSCRIBE and account-management -> <a href="http://lists.puredata.info/listinfo/pd-list" target="_blank">http://lists.puredata.info/listinfo/pd-list</a><br>
</div></div></blockquote></div><br>