PD and profiling (was: Re: [PD] Pd on OS X)

Sat Nov 16 00:59:32 CET 2002

On Tue, 12 Nov 2002, Adam Lindsay wrote:
> With all these discussions about performance, I was wondering if anyone
> has played around with "reference" patches that could be used for the
> sake of profiling. Would that be a useful thing, to come up with a patch
> or a suite of patches that give PD a general, typical (if there could
> possibly be such a thing) workout?

I'd like to point out that GridFlow 0.4-0.5 has a runtime profiler that
barely affects performance and is quite accurate.

I don't know how adding such a feature directly to PD/jMax would influence
performance, though. What makes it work especially well with GridFlow is
that most methods take a relatively long time to run: they typically do
between ten and a thousand math operations before passing control to the
next method, so the cost of measuring each call is small in comparison.

In GridFlow 0.6 I removed the profiler, but I think that's temporary: I
will find out how to integrate a new profiler with the new GridFlow in a
proper manner.

When I say runtime profiler, I mean that you can consult its results and
reset it while the patch is running, and that it tells you which _objects_
are doing the most work. A regular PD user can take advantage of it to
find out where the bottleneck is (although it's true that actually fixing
the bottleneck is more difficult).
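
To make the idea concrete, here is a rough sketch of per-object accounting
of that kind. It is not GridFlow's actual code: the names ProfEntry,
prof_add, prof_call, prof_reset and prof_hottest are made up for this
example, and it assumes the standard clock() is a good enough timer.

```c
#include <stddef.h>
#include <time.h>

/* One record per object instance (hypothetical layout, not GridFlow's). */
typedef struct {
	const char *name;     /* label shown to the user */
	clock_t total;        /* CPU time accumulated in this object's methods */
	unsigned long calls;  /* number of method calls recorded */
} ProfEntry;

/* Charge one method call of the given duration to an object. */
void prof_add(ProfEntry *e, clock_t elapsed) {
	e->total += elapsed;
	e->calls++;
}

/* Time one call of method(arg) and charge it to e. */
void prof_call(ProfEntry *e, void (*method)(void *), void *arg) {
	clock_t start = clock();
	method(arg);
	prof_add(e, clock() - start);
}

/* Clear all counters; this can be done at any time while running. */
void prof_reset(ProfEntry *table, size_t count) {
	for (size_t i = 0; i < count; i++) {
		table[i].total = 0;
		table[i].calls = 0;
	}
}

/* Return the entry with the largest accumulated time, or NULL if the
   table is empty. */
const ProfEntry *prof_hottest(const ProfEntry *table, size_t count) {
	const ProfEntry *best = NULL;
	for (size_t i = 0; i < count; i++)
		if (!best || table[i].total > best->total)
			best = &table[i];
	return best;
}
```

The overhead is two timer reads and two additions per method call, which
is why this sort of thing is cheap when each method does a lot of work and
could be much more noticeable when methods are tiny.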

Regular profiling (with gprof and profiler.rb and such) might be
interesting, but for different reasons, mostly for core/external writers.

> That's about the size of it, as -O2 does optimizations where there isn't
> a size-speed tradeoff, and -O3 does. I've noticed some places where
> Miller has hand-unrolled stuff in the code, anyway.

I hand-unrolled stuff in my code too; gcc, or at least some common
versions thereof, can't figure out some unrollings, especially when the
number of iterations is variable. What I do is transform this:

	while (n) {
		do_stuff_here;
		n--;
	}

into:

	while (n&3) {
		do_stuff_here;
		n--;
	}
	while (n) {
		do_stuff_here;
		do_stuff_here;
		do_stuff_here;
		do_stuff_here;
		n-=4;
	}

which is the fastest I could do without bloating the code size too much.
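
For a concrete instance of the same transformation, here is a sketch in
which do_stuff_here is "add one element of src into dst". The function
names add_simple and add_unrolled are only for illustration; they are not
taken from PD or GridFlow.

```c
#include <stddef.h>

/* Plain version: one element per iteration. */
void add_simple(float *dst, const float *src, size_t n) {
	while (n) {
		*dst++ += *src++;
		n--;
	}
}

/* Unrolled by four: first handle the n%4 leftover elements one at a
   time, then process what remains in groups of four. */
void add_unrolled(float *dst, const float *src, size_t n) {
	while (n & 3) {
		*dst++ += *src++;
		n--;
	}
	/* n is now a multiple of 4, so n -= 4 ends exactly at zero */
	while (n) {
		dst[0] += src[0];
		dst[1] += src[1];
		dst[2] += src[2];
		dst[3] += src[3];
		dst += 4;
		src += 4;
		n -= 4;
	}
}
```

Both functions give the same result for any n, including zero; the second
one just tests the loop condition about a quarter as often.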

________________________________________________________________
Mathieu Bouchard                       http://artengine.ca/matju