[PD-dev] double precision Pd: .patch files, tests and benchmarks

Tue Oct 4 15:41:00 CEST 2011

On Oct 4, 2011, at 5:38 AM, IOhannes m zmoelnig wrote:

> On 2011-10-04 09:06, katja wrote:
>>
>> Yesterday I forgot to mention why it should definitely not be built
>> with -O0 (unless for debug purposes): PD_BIGORSMALL is defined as an
>
> ah yes, this was indeed my fault.
> since i don't feel comfortable with editing m_pd.h to get a different
> build, i used CFLAGS="-DPD_FLOAT_PRECISION=64", which undid any
> optimization flags (which by default are "-O6", which i find a bit
> overdone; and "-g" is not set at all...)
>
> the proper way is to use CPPFLAGS="-DPD_FLOAT_PRECISION=64", which
> results in:
>
> osc-delay-perftest with 400 instances:
> debian           : 31%
> original         : 29%
> single           : 22%
> single(O0)       : 64%
> single(O2)       : 25%
> single(O2+loop)  : 22%
> single(pentium3) : 24%
> single(pentium4) : 22%
> single(prescott) : 22%
> single(core2)    : 22%
> single(core2+sse): 22%
> double           : 25%
> double(O0)       : 86%
> double(O2)       : 27%
> double(O2+loop)  : 26%
> double(pentium3) : 25%
> double(pentium4) : 24%
> double(prescott) : 24%
> double(core2)    : 24%
> double(core2+sse): 25%
>
> osc-delay-perftest with 1200 instances:
> debian           : 94%
> original         : 81%
> single           : 65%
> single(O0)       : ++%
> single(O2)       : 72%
> single(O2+loop)  : 66%
> single(pentium3) : 70%
> single(pentium4) : 66%
> single(prescott) : 65%
> single(core2)    : 59%
> single(core2+sse): 64%
> double           : 77%
> double(O0)       : ++%
> double(O2)       : 82%
> double(O2+loop)  : 77%
> double(pentium3) : 79%
> double(pentium4) : 75%
> double(prescott) : 75%
> double(core2)    : 71%
> double(core2+sse): 75%
>
> which is more in line with katja's measurements.
>
> this is (again) on an i5 650 @ 3.2GHz running in 32bit mode.
> optimization flags (as far as they can be reconstructed :-)):
> debian: "-g -O2" (this is what is dictated by debian policy)
> original: "-O6 -funroll-loops -fomit-frame-pointer"  (seems to be the
> default)
> single/double: ->original
> (O0): -O0
> (O2): -g -O2
> (O2+loop): -g -O2 -funroll-loops -fomit-frame-pointer
> (prescott): ->original + "-march=prescott"
> (core2): ->original + "-march=core2"
> (core2+sse): ->original + "-march=core2 -mfpmath=sse -msse2"
>
>
> so it seems like the biggest performance boost is given (on the tested
> platform) by compiling with "-g -O2 -funroll-loops
> -fomit-frame-pointer" (which is cool because i think this can even
> make it into debian, the way it is)
>
>
>> inline function (like it was already suggested by IOhannes a while
>> ago), but at -O0 nothing will be inlined. A benchmark howto would be
>> useful indeed.
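
For context, a minimal sketch of what such an inline range check could
look like for 32-bit IEEE 754 floats.  The names float_bits_sketch and
bigorsmall32_sketch are hypothetical, and the actual definition in the
double-precision patches may well differ (a 64-bit float needs its own
variant).  The point is only that a check like this tends to sit inside
per-sample loops, so at -O0 every call stays a real function call
instead of a few inlined bit operations.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the bits of a 32-bit float into an integer.
   Assumes float is an IEEE 754 single. */
static inline uint32_t float_bits_sketch(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Returns 1 when the exponent of f is far from zero in either direction
   (zero, denormals, very small or very large values, inf, NaN), else 0.
   Bits 30 and 29 are the two highest exponent bits; they are equal only
   when the biased exponent is below 64 or above 191. */
static inline int bigorsmall32_sketch(float f)
{
    uint32_t bits = float_bits_sketch(f);
    return (bits & 0x20000000u) == ((bits >> 1) & 0x20000000u);
}
```
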
>
>
> well, i usually just cram lots of the same object into a subpatch
> (until i get approximately 80% in the slowest environment, in order to
> not max out the CPU and get unknown side-effects), and measure it with
> the built-in load-meter (for loads <100% it behaves quite the same as
> top). nothing very dramatic.
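
The procedure described above could be automated along these lines.
This is only a sketch: the function name is hypothetical, and the choice
of [osc~] feeding [delwrite~] is an assumption about what the
osc-delay-perftest patch contains, not something taken from the actual
test patch.  It writes n identical object pairs in Pd's text patch
format.

```c
#include <stdio.h>

/* Writes a Pd patch with n pairs of [osc~] -> [delwrite~] to out.
   Returns the number of object boxes written, or -1 on a write error.
   The objects used here are an assumption, not the real test patch. */
int write_perftest_patch_sketch(FILE *out, int n)
{
    int i;

    if (fprintf(out, "#N canvas 0 0 450 300 10;\n") < 0)
        return -1;

    for (i = 0; i < n; i++)
    {
        int y = 10 + 20 * i;
        /* boxes are numbered in file order: osc~ is 2*i, delwrite~ is 2*i+1 */
        if (fprintf(out, "#X obj 10 %d osc~ %d;\n", y, 220 + i) < 0 ||
            fprintf(out, "#X obj 120 %d delwrite~ perftest-%d 100;\n", y, i) < 0)
            return -1;
    }

    /* connect outlet 0 of each osc~ to inlet 0 of its delwrite~ */
    for (i = 0; i < n; i++)
        if (fprintf(out, "#X connect %d 0 %d 0;\n", 2 * i, 2 * i + 1) < 0)
            return -1;

    return 2 * n;
}
```

Writing the output for, say, 400 pairs to a .pd file and opening it with
DSP switched on gives a load that can be read from the load meter; the
pair count is then adjusted until the slowest build sits near 80%.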

Nice tests, thanks for that.  I would be interested to see the effects
of auto-vectorization on these numbers.  Have you tried that?  If the
test patch doesn't include objects whose loops get vectorized, it
won't make a difference.
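
To make that concrete, here is a sketch of the kind of loop GCC's
auto-vectorizer can handle: independent element-wise arithmetic over
buffers that do not overlap.  The function name and signature are
hypothetical, not taken from Pd.  A loop with a sample-to-sample
dependency, such as a recursive filter or a phase accumulator, generally
cannot be vectorized this way, which is why the choice of objects in the
test patch matters.  With GCC, -ftree-vectorize enables the pass (it is
part of -O3), and on 32-bit x86 it also needs SSE turned on, e.g. with
-msse2.

```c
#include <stddef.h>

/* Element-wise multiply: out[i] = in1[i] * in2[i].
   "restrict" promises the compiler that the buffers do not overlap,
   which removes one obstacle to vectorization.  Every iteration is
   independent of the others. */
void times_perform_sketch(const float *restrict in1,
                          const float *restrict in2,
                          float *restrict out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = in1[i] * in2[i];
}
```

GCC can report which loops it vectorized; the option for that has
changed between releases (-ftree-vectorizer-verbose=N in older ones,
-fopt-info-vec in newer ones), so check the manual for your version.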

.hc

----------------------------------------------------------------------------

If you are not part of the solution, you are part of the problem.