[PD-dev] [GEM] CVS-changes

chris clepper cclepper at artic.edu
Mon Feb 24 20:20:34 CET 2003


>hi. just some comments (although i have commented it in the cvs-logs)...
>
>1. TV is gone (now in pix)
>2. a wee change in the imageStruct-class: reallocate() only 
>allocates memory if the old buffer is too small.
>3. [pix_tIIR]: another time-domain filter for images.
>it is similar to [pix_biquad] (but faster) and [pix_blur] (but more 
>flexible). You can give 2 arguments: the number of feedback-taps and 
>the number of feedforward-taps (in this order).
>No yuv-optimization is made (to keep the code generic).
>A test:
>[pix_blur] takes (on my machine) between 16% and 26% (mean: ~20%)
>[pix_tIIR 1 0] takes approx. 24% to 28%.

cool.  i will have to take a look at this and see about writing yuv 
and altivec code for it.  i have some FIR code as well.  can you post 
some specifics of your testing like cpu/ram/movie file and frame rate 
so we can compare performance?
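
for reference, a time-domain IIR filter on images boils down to mixing each incoming frame with previously filtered frames.  below is a minimal sketch in C with a single feedback tap and no feedforward taps.  it is a generic one-pole filter and not the actual [pix_tIIR] source; the function name `temporal_iir_1tap`, the float state buffer and the two gain parameters are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical helper, not part of GEM: one-pole recursive filter over time.
 * frame: current image samples (8 bit), filtered in place
 * state: previous output, one float per sample, zeroed before the first frame
 * computes y[n] = in_gain * x[n] + fb_gain * y[n-1] for every sample */
void temporal_iir_1tap(unsigned char *frame, float *state, size_t n,
                       float in_gain, float fb_gain)
{
    assert(frame != NULL && state != NULL);
    for (size_t i = 0; i < n; i++) {
        float y = in_gain * (float)frame[i] + fb_gain * state[i];
        /* clamp to the 8-bit range before storing */
        if (y < 0.0f)   y = 0.0f;
        if (y > 255.0f) y = 255.0f;
        state[i] = y;                          /* remembered for the next frame */
        frame[i] = (unsigned char)(y + 0.5f);  /* round to nearest */
    }
}
```

call it once per frame with the same `state` buffer.  with both gains at 0.5 each output frame is an even mix of the new input and the previous output, which gives a simple motion blur.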

>an ignorant question: how can i make use of the compiler's 
>loop-unrolling? (how do i have to build the loops? my experiments 
>showed rather worse results when i wrote large loops (which could 
>have been unrolled by the compiler))

hmm, this sort of confirms what i've heard about this in regards to 
x86 vs PPC.  loop unrolling benefits PPC/RISC a lot since these chips 
tend to have lots of registers (PPC has 32), so by unrolling the 
loops you make sure all of the registers are filled constantly.  4x 
unrolling is typically a sweet spot.  now on x86 there are 
considerably fewer registers (8 general-purpose ones on 32-bit x86, 
and not all of them are free for general use) so this technique is 
not as effective, but current x86 chips like the p4/athlon have 
gotten really good at Out of Order Execution (OOOE) so unrolling 
should be a bit better on those.  i have not done any loop 
unrolling or load-hoisting or cache streaming for the GEM scalar code 
on PPC simply because it doesn't work cross platform.  instead the 
altivec code uses things like cache streaming to achieve even bigger 
speed increases over the scalar versions.  if letting the compiler 
unroll the loops (via its optimization options) turns out faster, 
then do that, since i suspect that when you compile for a cpu like 
the p3 it doesn't actually unroll them.  or try doing it with some 
MMX or SSE code and see if that shows an improvement.  i haven't 
done much optimizing on x86 so this 
is mostly second hand info, but at least x86 has some compilers that 
will produce ridiculously optimized code given the right 
circumstances.  (i wonder if the SPEC stuff has any convolution code 
in it, maybe the compiler intel uses to over inflate those scores 
would be of use...)
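
to make the 4x unrolling concrete, here is a small sketch in C: a plain per-sample loop and the same loop unrolled four times by hand, with the loads hoisted ahead of the stores.  the function names and the gain/256 scaling are only illustrative and are not taken from GEM.  with gcc you can also leave the loop plain and ask the compiler to do the unrolling with `-funroll-loops`; which variant wins depends on the cpu and compiler, so measure both.

```c
#include <assert.h>
#include <stddef.h>

/* plain loop: scale every 8-bit sample by gain/256 (gain in 0..256) */
void scale_plain(unsigned char *dst, const unsigned char *src,
                 size_t n, unsigned gain)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned char)((src[i] * gain) >> 8);
}

/* same operation, unrolled 4x by hand */
void scale_unrolled4(unsigned char *dst, const unsigned char *src,
                     size_t n, unsigned gain)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;
    size_t n4 = n - (n % 4);   /* largest multiple of 4 that fits in n */

    for (; i < n4; i += 4) {
        /* hoist the four loads so they can sit in separate registers */
        unsigned a = src[i];
        unsigned b = src[i + 1];
        unsigned c = src[i + 2];
        unsigned d = src[i + 3];
        dst[i]     = (unsigned char)((a * gain) >> 8);
        dst[i + 1] = (unsigned char)((b * gain) >> 8);
        dst[i + 2] = (unsigned char)((c * gain) >> 8);
        dst[i + 3] = (unsigned char)((d * gain) >> 8);
    }
    /* leftover samples when n is not a multiple of 4 */
    for (; i < n; i++)
        dst[i] = (unsigned char)((src[i] * gain) >> 8);
}
```

both functions should produce identical output for any n; the unrolled one just gives the cpu four independent operations per iteration to keep its registers and pipelines busy.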

>4. what else ?
>cannot remember.

i don't know, but thanks for the post on the changes.

cgc

>
>mfg.asd.r
>IOhannes