[PD-dev] double precision Pd: .patch files, tests and benchmarks
Hans-Christoph Steiner
hans at at.or.at
Tue Oct 4 15:41:00 CEST 2011
On Oct 4, 2011, at 5:38 AM, IOhannes m zmoelnig wrote:
> On 2011-10-04 09:06, katja wrote:
>>
>> Yesterday I forgot to mention why it should definitely not be built
>> with -O0 (unless for debug purposes): PD_BIGORSMALL is defined as an
>
> ah yes, this was indeed my fault.
> since i don't feel comfortable with editing m_pd.h to get a different
> build, i used CFLAGS="-DPD_FLOAT_PRECISION=64", which undid any
> optimization flags (which by default are "-O6", which i find a bit
> overdone; and "-g" is not set at all...)
>
> the proper way is to use CPPFLAGS="-DPD_FLOAT_PRECISION=64", which
> results in:
>
> osc-delay-perftest with 400 instances:
> debian : 31%
> original : 29%
> single : 22%
> single(O0) : 64%
> single(O2) : 25%
> single(O2+loop) : 22%
> single(pentium3) : 24%
> single(pentium4) : 22%
> single(prescott) : 22%
> single(core2) : 22%
> single(core2+sse): 22%
> double : 25%
> double(O0) : 86%
> double(O2) : 27%
> double(O2+loop) : 26%
> double(pentium3) : 25%
> double(pentium4) : 24%
> double(prescott) : 24%
> double(core2) : 24%
> double(core2+sse): 25%
>
> osc-delay-perftest with 1200 instances:
> debian : 94%
> original : 81%
> single : 65%
> single(O2) : 72%
> single(O0) : ++%
> single(O2+loop) : 66%
> single(pentium3) : 70%
> single(pentium4) : 66%
> single(prescott) : 65%
> single(core2) : 59%
> single(core2+sse): 64%
> double : 77%
> double(O0) : ++%
> double(O2) : 82%
> double(O2+loop) : 77%
> double(pentium3) : 79%
> double(pentium4) : 75%
> double(prescott) : 75%
> double(core2) : 71%
> double(core2+sse): 75%
>
> which is more in line with katja's measurements.
>
> this is (again) on an i5 650 @ 3.2GHz running in 32bit mode
> optimization flags (as far as they can be reconstructed :-))
> debian: "-g -O2" (this is what is dictated by debian policy)
> original: "-O6 -funroll-loops -fomit-frame-pointer" (seems to be the
> default)
> single/double: ->original
> (O0): -O0
> (O2): -g -O2
> (O2+loop): -g -O2 -funroll-loops -fomit-frame-pointer
> (prescott): ->original + "-march=prescott"
> (core2): ->original + "-march=core2"
> (core2+sse): ->original + "-march=core2 -mfpmath=sse -msse2"
>
>
> so it seems like the biggest performance boost is given (on the tested
> platform), by compiling with "-g -O2 -funroll-loops
> -fomit-frame-pointer" (which is cool because i think this can even make
> it into debian, the way it is)
>
>
>> inline function (like it was already suggested by IOhannes a while
>> ago), but at -O0 nothing will be inlined. A benchmark howto would be
>> useful indeed.
>
>
> well, i usually just cram lots of the same object into a subpatch (until
> i get approximately 80% in the slowest environment, in order to not max
> out the CPU and get unknown side-effects), and measure it with the
> built-in load-meter (for loads <100% it behaves quite the same as top)
> nothing very dramatic.
Nice tests, thanks for that. I would be interested to see the effects
of auto-vectorization on these numbers. Have you tried that? If the
test patch doesn't include objects whose loops the compiler can
vectorize, it won't make a difference.
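
To make the auto-vectorization point concrete, here is a minimal sketch of
the kind of loop a compiler can usually vectorize. The function name
scale_add and its arguments are made up for illustration and are not taken
from the Pd sources. With GCC, the vectorizer is enabled by -O3 or by adding
-ftree-vectorize to -O2; whether a given loop actually gets vectorized
depends on the compiler version and target flags, so treat this as an
assumption to check rather than a guarantee.

```c
#include <stddef.h>

/* Hypothetical perform-style routine (not from Pd):
 * writes in[i] * gain + offset to out[i] for one block of n samples.
 * The loop has a simple counted form with no branches and no dependency
 * between iterations, which is what auto-vectorizers look for. */
void scale_add(const float *in, float *out,
               float gain, float offset, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = in[i] * gain + offset;
}
```

A loop whose result feeds back into the next iteration (a recursive filter,
for example) generally cannot be vectorized this way, so a test patch built
only from such objects would show no change.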
.hc
----------------------------------------------------------------------------
If you are not part of the solution, you are part of the problem.