[PD-dev] [GEM] Further CVS changes

guenter geiger geiger at xdv.org
Thu Jan 30 10:11:38 CET 2003

On Wed, 29 Jan 2003, chris clepper wrote:
> Also, what are people's thoughts on optimizing the code?  For
> example, I rewrote, but haven't committed, the RGBA part of pix_gain
> to use ints rather than floats in the loop and it's twice as fast now
> on PPC.  A lot can be done with the pix_ code in terms of
> optimization.  Here's a breakdown of CPU use for various pix_gain
> functions on a 1 GHz G4 processing DV_NTSC at 30fps:
> pix_gain - original RGBA- 49% cpu use
> pix_gain - int rewrite RGBA - 24%
> pix_gain - yuv scalar - 14% cpu
> pix_gain - yuv altivec - 2% cpu
> Obviously these results show that performance gains can be huge if
> certain optimizations are done.  Is anyone doing this for x86?  I
> see that there are two MMX functions that someone added, are there
> plans for more?

hi chris,

Your results are really impressive.
I did some experiments too with the pix code on Linux, and you are right
that we can gain a lot by optimizing the code.
I changed pix_add and others from float to integer. I have also done
experiments with MMX (which, I have to admit, did not give the
results I had hoped for, but maybe just because I did not really
know what I was doing).

At least we should get rid of float pixel processing on all platforms.
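
To illustrate the float-to-integer change, here is a minimal sketch of an
integer gain loop. It is not the actual pix_gain or pix_add code; the
function name gain_int and the 8.8 fixed-point format (256 means 1.0) are
assumptions made for the example. The gain is converted once outside the
loop, so the per-byte work is one multiply, one shift and one clamp.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not taken from the GEM sources.
// Scales every byte of an interleaved 8-bit pixel buffer by a gain,
// using 8.8 fixed point (256 == 1.0) and clamping the result at 255.
void gain_int(std::uint8_t* data, std::size_t count, float gain)
{
    // Keep the gain in a range that cannot overflow the 32-bit product.
    if (gain < 0.0f)   gain = 0.0f;
    if (gain > 255.0f) gain = 255.0f;

    // Convert to fixed point once, outside the loop.
    const std::uint32_t g = static_cast<std::uint32_t>(gain * 256.0f + 0.5f);

    for (std::size_t i = 0; i < count; ++i) {
        const std::uint32_t v = (data[i] * g) >> 8;  // integer multiply + shift
        data[i] = static_cast<std::uint8_t>(v > 255 ? 255 : v);
    }
}
```

A real version would probably want a separate gain per channel; this one
applies the same gain to all bytes to keep the example short.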

> It's probably a good idea to put structures in place,
> up front, to make sure that code compiles across platforms and there
> are not crashes on various processors.  Are #ifdefs enough at this
> point?  We (tigital and myself) are trying to figure out the best way
> to get this altivec code into CVS and have it not impact the x86 side
> of things.

#ifdefs are ugly.
If it is in some way possible, we should come up with macros or
templates for common optimizable functions. (I think there was a message
from Daniel about that).
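
One possible shape for the template idea, as a sketch only: the loop over
the buffer is written once, and each pix_ object supplies just the
per-byte operation. All names here (pix_apply2, SaturatedAdd,
SaturatedSub) are hypothetical and are not existing GEM code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names; nothing here is existing GEM API.
// The loop lives in one place; Op provides the per-byte arithmetic.
template <typename Op>
void pix_apply2(std::uint8_t* dst, const std::uint8_t* src,
                std::size_t count, Op op)
{
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = op(dst[i], src[i]);
}

// Add two bytes, clamping at 255.
struct SaturatedAdd {
    std::uint8_t operator()(std::uint8_t a, std::uint8_t b) const {
        const unsigned sum = unsigned(a) + unsigned(b);
        return static_cast<std::uint8_t>(sum > 255 ? 255 : sum);
    }
};

// Subtract two bytes, clamping at 0.
struct SaturatedSub {
    std::uint8_t operator()(std::uint8_t a, std::uint8_t b) const {
        return static_cast<std::uint8_t>(a > b ? a - b : 0);
    }
};
```

With this layout, an optimized loop would only have to be written for the
template (or for a specialization of it), not once per object.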

If that is not possible, put the architecture-dependent code in its
own source file, and write a non-altivec version of the same code.
Later someone may add the MMX version, but can concentrate on
that instead of having to go through ifdefs.
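
A rough sketch of that layout, shown as one listing with comments marking
where the files would be split. The file names and the functions
gain_kernel and process_frame are made up for the example, and I am
assuming the makefile compiles exactly one of the kernel source files per
platform.

```cpp
#include <cstddef>
#include <cstdint>

// ---- pix_gain_kernel.h (hypothetical) ----
// One declaration shared by every platform. gain_q8 is 8.8 fixed point
// (256 == 1.0) and is assumed to be at most 65535.
void gain_kernel(std::uint8_t* data, std::size_t count, unsigned gain_q8);

// ---- pix_gain_kernel_generic.cpp (hypothetical) ----
// Portable fallback, built on platforms without a tuned version.
void gain_kernel(std::uint8_t* data, std::size_t count, unsigned gain_q8)
{
    for (std::size_t i = 0; i < count; ++i) {
        const unsigned v = (data[i] * gain_q8) >> 8;
        data[i] = static_cast<std::uint8_t>(v > 255 ? 255 : v);
    }
}

// ---- pix_gain_kernel_altivec.cpp (hypothetical, not shown) ----
// Would define the same gain_kernel() with vector code; an MMX file
// could be added later the same way. Only one of these files is linked.

// ---- caller: identical on every platform, no #ifdef needed ----
void process_frame(std::uint8_t* pixels, std::size_t count, unsigned gain_q8)
{
    gain_kernel(pixels, count, gain_q8);
}
```

The caller never sees which version it got, so the choice of platform
stays in the build and out of the pix_ sources.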



