[PD-dev] [GEM] Further CVS changes

Tom Schouten doelie at zzz.kotnet.org
Thu Jan 30 20:39:05 CET 2003

> > I have also done experiments with MMX (which, I have to
> > admit, did not give the results I had hoped for, but maybe just
> > because I did not really know what I was doing ).
> I have added MMX code to my software; the asm code is generated with a
> script. The results I get with int32 are slightly slower than GCC's
> non-MMX output, and I'm doing pretty close to my best. However with int16
> and uint8 the MMX gets a certain percentage of improvement, though really
> not extraordinary... 30-40% ? maybe it's all the packet-handling going
> on around that makes the improvement appear less than it really is?

i got some (at first glance) counterintuitive results using mmx in pdp too. i
guess a lot of this strangeness has to do with memory bandwidth: simple
operations like add or scale are not much faster than their scalar integer c
counterparts. i did get a lot of speedup for the more compute-intensive stuff
like the biquad filters, iterated convolution and basically anything that needs
to do a lot of clipping. i also try to keep data copying to a minimum in
pdp, which seems to help too.
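
a rough sketch of why clipping-heavy code is a good fit for mmx (this is not pdp's actual code; the names clip_s16 and add_clip_s16 are made up for illustration). in scalar c every sample needs a widened add plus two compares, while mmx offers a packed saturating add (paddsw) that processes four int16 values per instruction:

```c
#include <stdint.h>
#include <stddef.h>

/* hypothetical helper, not taken from pdp:
   clip a 32 bit intermediate to the int16 range */
static int16_t clip_s16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* scalar saturating add of two int16 buffers, one sample at a time.
   the widening to int32 keeps the sum from overflowing before the clip. */
static void add_clip_s16(int16_t *dst, const int16_t *a,
                         const int16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = clip_s16((int32_t)a[i] + (int32_t)b[i]);
}
```

a plain (non-clipping) add has no such per-sample overhead in c, which would fit with it gaining little from mmx once memory bandwidth is the limit.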

the general rule seems to be: keep your memory accesses local and your data
size small. do as much as possible inside the pixel loop, or iterate several
times over one scanline instead of over the whole image.
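
the rule above can be sketched in plain c as follows. this is only an illustration under assumptions: the gain/offset operation, the function names and the int16 image layout (w pixels per scanline, h scanlines, stored contiguously) are invented for the example, and overflow is ignored to keep it short. all three variants compute the same result; they differ only in how far apart the two accesses to the same pixel are.

```c
#include <stdint.h>
#include <stddef.h>

/* variant 1: two passes over the whole image.
   by the time the second pass starts, the first pixels
   may already have been evicted from the cache. */
static void whole_image_passes(int16_t *img, size_t w, size_t h,
                               int16_t gain, int16_t offset)
{
    size_t n = w * h;
    for (size_t i = 0; i < n; i++) img[i] = (int16_t)(img[i] * gain);
    for (size_t i = 0; i < n; i++) img[i] = (int16_t)(img[i] + offset);
}

/* variant 2: both passes over one scanline before moving on,
   so the second pass works on data that was touched just before. */
static void scanline_passes(int16_t *img, size_t w, size_t h,
                            int16_t gain, int16_t offset)
{
    for (size_t y = 0; y < h; y++) {
        int16_t *row = img + y * w;
        for (size_t x = 0; x < w; x++) row[x] = (int16_t)(row[x] * gain);
        for (size_t x = 0; x < w; x++) row[x] = (int16_t)(row[x] + offset);
    }
}

/* variant 3: everything inside the pixel loop,
   each pixel is loaded and stored exactly once. */
static void fused_pass(int16_t *img, size_t w, size_t h,
                       int16_t gain, int16_t offset)
{
    size_t n = w * h;
    for (size_t i = 0; i < n; i++)
        img[i] = (int16_t)(img[i] * gain + offset);
}
```

variant 3 is the "do as much as possible inside the pixel loop" case; variant 2 is the fallback when the operations cannot be merged into a single loop body.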

