[PD-dev] [GEM] Further CVS changes
doelie at zzz.kotnet.org
Thu Jan 30 20:39:05 CET 2003
> > I have also done experiments with MMX (which, I have to
> > admit, did not give the results I had hoped for, but maybe just
> > because I did not really know what I was doing).
> I have added MMX code to my software; the asm code is generated with a
> script. The results I get with int32 are slightly slower than GCC's
> non-MMX output, and I'm doing pretty close to my best. However with int16
> and uint8 the MMX gets a certain percentage of improvement, though really
> not extraordinary... 30-40%? Maybe it's all the packet-handling going
> on around that makes the improvement appear less than it really is?
i got some (at first glance) counterintuitive results using mmx in pdp too. i
guess a lot of this strangeness has to do with memory bandwidth. simple
operations like add or scale are not much faster than their scalar integer c
counterparts. i did get a lot of speedup for the more compute-intensive stuff
like the biquad filters, iterated convolution and basically anything that needs
to do a lot of clipping. also, i try to limit data copying to a minimum in
pdp; this seems to help too.
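
to illustrate why clipping-heavy code is a good candidate: a minimal sketch of a saturating add in scalar c, not taken from pdp (the name add_clip_s16 is hypothetical). the scalar version needs a widening add plus two compares per sample, whereas the mmx paddsw instruction does four signed 16-bit saturating adds at once.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical helper, not pdp code: dst[i] = a[i] + b[i], clipped to int16 */
void add_clip_s16(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* widen to 32 bit so the sum itself cannot overflow */
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];
        /* the clipping step: two compares per sample in scalar code */
        if (sum > INT16_MAX) sum = INT16_MAX;
        if (sum < INT16_MIN) sum = INT16_MIN;
        dst[i] = (int16_t)sum;
    }
}
```

a plain add without the two compares does very little work per byte loaded, which fits the guess above that memory bandwidth dominates for the simple operations.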
the general rule seems to be: keep your memory accesses local and your data
size small. do as much as possible inside the pixel loop, or iterate several
times over one scanline instead of over the whole image.
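
the rule above can be sketched in plain c as three ways of applying the same two operations to a uint8 image. this is an illustration under assumptions, not pdp code: the function names and the two example operations (invert, halve) are made up. all three variants give identical output and differ only in how far apart in time the accesses to the same pixel are.

```c
#include <stddef.h>
#include <stdint.h>

/* two simple per-pixel operations (hypothetical examples) */
void invert_pass(uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) p[i] = (uint8_t)(255 - p[i]);
}

void halve_pass(uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) p[i] = (uint8_t)(p[i] >> 1);
}

/* variant a: each pass walks the whole image, so a large image is
   pulled through the cache once per pass */
void process_whole_image(uint8_t *img, size_t width, size_t height)
{
    invert_pass(img, width * height);
    halve_pass(img, width * height);
}

/* variant b: run all passes on one scanline before moving on, so the
   second pass touches data that was just touched by the first */
void process_per_scanline(uint8_t *img, size_t width, size_t height)
{
    for (size_t y = 0; y < height; y++) {
        uint8_t *row = img + y * width;
        invert_pass(row, width);
        halve_pass(row, width);
    }
}

/* variant c: do everything inside the pixel loop, one load and one
   store per pixel */
void process_fused(uint8_t *img, size_t width, size_t height)
{
    size_t n = width * height;
    for (size_t i = 0; i < n; i++)
        img[i] = (uint8_t)((255 - img[i]) >> 1);
}
```

whether variant b or c wins over a depends on image size relative to the cache, so it is worth timing on the target machine rather than assuming.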