[PD-dev] double precision Pd: .patch files, tests and benchmarks
IOhannes m zmoelnig
zmoelnig at iem.at
Tue Oct 4 11:38:12 CEST 2011
On 2011-10-04 09:06, katja wrote:
>
> Yesterday I forgot to mention why it should definitely not be built
> with -O0 (unless for debug purposes): PD_BIGORSMALL is defined an
ah yes, this was indeed my fault.
since i don't feel comfortable with editing m_pd.h to get a different
build, i used CFLAGS="-DPD_FLOAT_PRECISION=64", which replaced the
default CFLAGS and thereby dropped all optimization flags (the default
is "-O6", which i find a bit overdone; and "-g" is not set at all...)
the proper way is to use CPPFLAGS="-DPD_FLOAT_PRECISION=64", which
results in:
osc-delay-perftest with 400 instances:
debian           : 31%
original         : 29%
single           : 22%
single(O0)       : 64%
single(O2)       : 25%
single(O2+loop)  : 22%
single(pentium3) : 24%
single(pentium4) : 22%
single(prescott) : 22%
single(core2)    : 22%
single(core2+sse): 22%
double           : 25%
double(O0)       : 86%
double(O2)       : 27%
double(O2+loop)  : 26%
double(pentium3) : 25%
double(pentium4) : 24%
double(prescott) : 24%
double(core2)    : 24%
double(core2+sse): 25%
osc-delay-perftest with 1200 instances:
debian           : 94%
original         : 81%
single           : 65%
single(O2)       : 72%
single(O0)       : ++%
single(O2+loop)  : 66%
single(pentium3) : 70%
single(pentium4) : 66%
single(prescott) : 65%
single(core2)    : 59%
single(core2+sse): 64%
double           : 77%
double(O0)       : ++%
double(O2)       : 82%
double(O2+loop)  : 77%
double(pentium3) : 79%
double(pentium4) : 75%
double(prescott) : 75%
double(core2)    : 71%
double(core2+sse): 75%
which is more in line with katja's measurements.
this is (again) on an i5 650 @ 3.2GHz running in 32bit mode
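to put a number on the single vs. double difference, here is a small sketch that divides the "double" load by the "single" load for both patch sizes. the figures are copied from the tables above (default flags only); the names `load` and `double_overhead` are made up for this illustration.

```python
# CPU load in percent, copied from the "single" and "double" rows
# (default flags) of the two tables above.
load = {
    400: {"single": 22, "double": 25},
    1200: {"single": 65, "double": 77},
}


def double_overhead(instances):
    """Return the double-precision load divided by the single-precision load."""
    row = load[instances]
    return row["double"] / row["single"]


for n in sorted(load):
    print(f"{n} instances: double/single = {double_overhead(n):.2f}")
```

this only compares two measurements per patch size, so it says nothing about run-to-run variation.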
optimization flags (as far as they can be reconstructed :-))
debian: "-g -O2" (this is what is dictated by debian policy)
original: "-O6 -funroll-loops -fomit-frame-pointer" (seems to be the
default)
single/double: ->original
(O0): -O0
(O2): -g -O2
(O2+loop): -g -O2 -funroll-loops -fomit-frame-pointer
(prescott): ->original + "-march=prescott"
(core2): ->original + "-march=core2"
(core2+sse): ->original + "-march=core2 -mfpmath=sse -msse2"
so it seems like the biggest performance boost (on the tested
platform) is given by compiling with "-g -O2 -funroll-loops
-fomit-frame-pointer" (which is cool because i think this can even make
it into debian, the way it is)
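the CFLAGS vs. CPPFLAGS mix-up described earlier can be modelled in a few lines. this is a simplified sketch of the usual convention (a user-supplied CFLAGS replaces the default, CPPFLAGS is passed in addition), not Pd's actual build script; `DEFAULT_CFLAGS` is taken from the "original" entry in the legend, and `effective_flags` is a hypothetical helper.

```python
# Simplified model of how a build might assemble the compiler flags.
# This is an assumption for illustration, not Pd's real build logic.
DEFAULT_CFLAGS = "-O6 -funroll-loops -fomit-frame-pointer"


def effective_flags(env):
    """Return the flag string handed to the compiler for a given environment."""
    # A user-supplied CFLAGS replaces the default optimization flags entirely.
    cflags = env.get("CFLAGS", DEFAULT_CFLAGS)
    # CPPFLAGS is kept separately and is always passed along as well.
    cppflags = env.get("CPPFLAGS", "")
    return " ".join(part for part in (cppflags, cflags) if part)


# The define in CFLAGS: the optimization flags are lost.
wrong = effective_flags({"CFLAGS": "-DPD_FLOAT_PRECISION=64"})
# The define in CPPFLAGS: the default optimization flags survive.
right = effective_flags({"CPPFLAGS": "-DPD_FLOAT_PRECISION=64"})

print(wrong)
print(right)
```

in this model the first call yields only the define, while the second keeps the default optimization flags next to it.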
> inline function (like it was already suggested by IOhannes a while
> ago), but at -O0 nothing will be inlined. A benchmark howto would be
> useful indeed.
well, i usually just cram lots of the same object into a subpatch (until
i get approximately 80% in the slowest environment, in order to not max
out the CPU and get unknown side-effects), and measure it with the
built-in load-meter (for loads <100% it behaves quite the same as top)
nothing very dramatic.
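the "cram lots of the same object into a subpatch" step can also be scripted. below is a minimal sketch that writes the text of a Pd patch containing N copies of one object; it assumes the object under test needs no connections, and `make_patch`, the grid layout and the canvas size are made up for this example. the object name "osc-delay-perftest" is only borrowed from the test name above and may not match the actual abstraction file.

```python
def make_patch(object_name, count, columns=20):
    """Return the text of a Pd patch holding `count` copies of `object_name`."""
    # Canvas header: position, size and font size.
    lines = ["#N canvas 0 0 800 600 10;"]
    for i in range(count):
        # Lay the objects out on a simple grid so the patch stays readable.
        x = 10 + (i % columns) * 40
        y = 10 + (i // columns) * 25
        lines.append(f"#X obj {x} {y} {object_name};")
    return "\n".join(lines) + "\n"


# Hypothetical usage: 400 instances, as in the first table.
patch = make_patch("osc-delay-perftest", 400)
print(patch.count("#X obj"))
```

the resulting text would still have to be saved as a .pd file and opened with DSP switched on; reading the load meter remains a manual step.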
fgmasdr
IOhannes