[PD] what makes Pd-extended 0.43 so CPU-hungry?

Hans-Christoph Steiner hans at at.or.at
Fri Dec 7 23:46:59 CET 2012


Thanks for this write-up and all that testing, it's definitely very helpful.  So in the end, you're talking about Pd-extended on Debian only?  It sounds like your tests show that 0.43 was not slower on Mac OS X.

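It does look like the Debian-i386 builds don't have optimization turned on.  You can look at the build log to see exactly how it was built.  For example, the freeverb~ compile line below has no -O flag at all and includes -fno-inline-functions: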

http://autobuild.puredata.info/auto-build/2012-12-07/logs/2012-12-07_06.27.52_linux_debian-squeeze-i386_pd-extended.txt

cc -I"/home/pd/auto-build/pd-extended/pd/include/pd" -DPD -DVERSION='"1.2.1"' -fPIC -DPD -DHAVE_G_CANVAS_H -I/home/pd/auto-build/pd-extended/pd/src -Wall -W -ggdb -I/home/pd/auto-build/pd-extended/externals/Gem -I/home/pd/auto-build/pd-extended/externals/pdp/include -DUNIX -Dunix -DDL_OPEN -fPIC -g -fno-inline-functions -fno-omit-frame-pointer -DDEBUG_SOUNDFILE -Wstrict-aliasing=2 -o "freeverb~.o" -c "freeverb~.c"
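
To make it concrete why that matters for something like freeverb~, here is a minimal sketch.  It is not the actual freeverb~ source: comb_t, comb_tick() and perform_block() are made-up names for illustration.  As far as I know, GCC does not inline anything when it is not optimizing (unless a function is marked always_inline), so with flags like the ones above a 'static inline' helper stays a real function and gets called once per sample.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feedback comb filter state, similar in spirit to what a
   reverb runs per sample.  Not the actual freeverb~ code. */
typedef struct {
    float *buf;      /* delay line */
    size_t len;      /* delay length in samples */
    size_t pos;      /* current read/write index */
    float feedback;  /* feedback gain */
} comb_t;

/* Small per-sample helper, written as 'inline' in the C file. */
static inline float comb_tick(comb_t *c, float in)
{
    float out = c->buf[c->pos];
    c->buf[c->pos] = in + out * c->feedback;
    if (++c->pos >= c->len)
        c->pos = 0;
    return out;
}

/* Perform loop: one comb_tick() per sample.  Whether that ends up as a
   call or as inlined code depends on the optimization flags. */
void perform_block(comb_t *c, const float *in, float *out, size_t n)
{
    size_t i;
    assert(c != NULL && c->len > 0);
    for (i = 0; i < n; i++)
        out[i] = comb_tick(c, in[i]);
}
```

If you compile this once with -O0 and once with -O2 and compare the objdump -d output, the unoptimized object should show a call to comb_tick inside perform_block, while the optimized one normally has the helper folded into the loop.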

If you want to mess with the flags, try adding things to OPT_CFLAGS in packages/linux_make/Makefile; that should affect almost all of the build.  If you just want to test freeverb, you can do this:

cd externals/freeverb
make OPT_CFLAGS="-O6 -msse -msse2 -mfpmath=sse -ftree-vectorize -ftree-vectorizer-verbose=1"

Or things like that... I'd be very interested to hear about profiling results of using these flags.  I only did a little profiling when I stuck those in.
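
If you want to double-check that whatever you put in OPT_CFLAGS really reaches the compiler, one option is a tiny C file built with the same flags.  This is only a sketch: it relies on GCC's predefined macros __OPTIMIZE__, __NO_INLINE__, __SSE2__ and __SSE_MATH__, and the names build_flags_t, get_build_flags() and print_build_flags() are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type: which GCC predefined macros were set when
   this file was compiled. */
typedef struct {
    int optimizing; /* __OPTIMIZE__: an optimizing -O level was used */
    int no_inline;  /* __NO_INLINE__: GCC will not inline functions  */
    int sse2;       /* __SSE2__: SSE2 instructions are enabled       */
    int sse_math;   /* __SSE_MATH__: float math goes through SSE     */
} build_flags_t;

build_flags_t get_build_flags(void)
{
    build_flags_t f = {0, 0, 0, 0};
#ifdef __OPTIMIZE__
    f.optimizing = 1;
#endif
#ifdef __NO_INLINE__
    f.no_inline = 1;
#endif
#ifdef __SSE2__
    f.sse2 = 1;
#endif
#ifdef __SSE_MATH__
    f.sse_math = 1;
#endif
    return f;
}

/* Print the flags in one line, e.g. to stdout or a log file. */
void print_build_flags(FILE *fp)
{
    build_flags_t f = get_build_flags();
    assert(fp != NULL);
    fprintf(fp, "optimizing=%d no_inline=%d sse2=%d sse_math=%d\n",
            f.optimizing, f.no_inline, f.sse2, f.sse_math);
}
```

Call print_build_flags(stdout) from a small main() built with the flags under test.  With a compile line like the Debian one above (no -O), I would expect optimizing=0 and no_inline=1.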

.hc

On Dec 7, 2012, at 4:36 PM, katja wrote:

> Finally I have some clue what's wrong with Pd-E 0.43 for GNU/Linux, or
> for Debian Squeeze at least. Sorry that it took me so long to sit down
> and sort it out.
> 
> The problem is still there, with version 0.43.4: my live performance
> setups run with almost double CPU load, when compared to 0.42. Now I
> also tested with some comprehensive patches which are known to be pure
> vanilla, like Martin Brinkmann's 'chaosmonster'. Remarkably, these
> patches do not show an increased CPU load. Therefore I guessed that it
> must be in external classes.
> 
> I tried using callgrind and kcachegrind (thanks for the hint Jamie).
> Though callgrind makes Pd choke completely (while recording the
> complete call history of a process instead of taking samples), the
> output gave a clue. Freeverb~ was shown to make a couple hundred
> function calls within the perform loop, to functions which are written as
> 'inline' in the C file. An isolated freeverb~ instance turned out to
> do 10% CPU load. Admittedly, this computer (1.8 GHz core duo 2006) is
> not the latest. But freeverb~ normally does some 1% per instance.
> 
> So, freeverb~ is the messenger; without it I might not have noticed
> any problem. But what is the message? Is Pd-E 0.43 compiled without
> optimization? I searched for more inline functions in external libs,
> and found one in bsaylor/svf~. In this case again, the executable
> implements it as a call. The core code however is almost certainly
> compiled with properly inlined functions. There's one frequently
> called inline function in the API (PD_BIGORSMALL, which used to be a
> macro in the past). If this would be compiled as a call, a patch like
> 'chaosmonster' would definitely show performance loss.
> 
> Note that I'm talking about debian binaries so far, more precisely
> Pd-E 0.43.4 for debian squeeze, as downloaded from puredata.info
> downloads page. In contrast, I checked freeverb~ in the distribution
> for OSX i386, and here the inlining was done properly.
> 
> Another difference between those distributions: SSE instructions are
> used for OSX, not for debian. Simple operations like addition and
> multiplication of floats are done on the FPU in debian, while xmm
> registers are used with OSX. This also means that things like abs()
> and isnan() are function calls for debian, while they could be simple
> instructions on the xmm registers. (Instructions can be viewed by
> disassembling executables with the command objdump -d <file>.)
> 
> My conclusion from these observations: at least some Pd 0.43 externals
> for debian squeeze are compiled with -O0 for some reason (don't know
> about other Linuxes). How come? The template makefile (also used for
> freeverb~) has optimization -O6. The root makefile for the packages
> has certain optimization flags as well. Are they somehow conflicting,
> producing an undefined result? Not for OSX, apparently. But for debian
> something goes wrong. The build system stuff is really over my head,
> hopefully someone else has better overview to find the exact cause.
> 
> Katja
> 
> 
> 
> 
> On 5/6/12, Jamie Bullock <jamie.b.bullock at gmail.com> wrote:
>> 
>> Hi Katja,
>> 
>> 
>> On 5 May 2012, at 20:43, katja <katjavetter at gmail.com> wrote:
>> 
>>> 
>>> 
>>> I've tried to use Oprofile on Debian, but this gives me a kernel
>>> failure as soon as I start sampling. Does anyone know of a fine
>>> performance profiler for GNU/Linux?
>>> 
>>> Katja
>>> 
>>> 
>> 
>> You might want to try callgrind + kcachegrind...
>> 
>> http://www.slac.stanford.edu/BFROOT/www/Computing/Optimization/genprof.html
>> 
>> best,
>> 
>> Jamie
>> 
>> --
>> http://www.jamiebullock.com
>> 
>> 
>> 
> 



