[PD-dev] [GEM] CVS-changes
chris clepper
cclepper at artic.edu
Mon Feb 24 20:20:34 CET 2003
>hi. just some comments (although i have commented it in the cvs-logs)...
>
>1. TV is gone (now in pix)
>2. a wee change in the imageStruct-class: reallocate() only
>allocates memory if the old buffer is too small.
>3. [pix_tIIR]: another time-domain filter for images.
>It is like [pix_biquad] (but faster) and like [pix_blur] (but more
>flexible). You can give 2 arguments: the number of feedback-taps and
>the number of feedforward-taps (in this order).
>No yuv-optimization is made (for the sake of genericity).
>A test:
>[pix_blur] takes (on my machine) between 16% and 26% (mean: ~20%)
>[pix_tIIR 1 0] takes approx. 24% to 28%.
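a grow-only reallocate(), as described in point 2 above, might look roughly like this (a hypothetical sketch, not the actual imageStruct code; the struct and member names are invented):

```cpp
#include <cstdlib>
#include <cstddef>

// sketch of a grow-only image buffer: reallocate() only touches
// memory when the old buffer is too small, so shrinking or
// same-size frames reuse the existing allocation.
struct PixBuffer {
    unsigned char *data = nullptr;
    std::size_t capacity = 0;            // bytes actually allocated

    // returns true if a fresh allocation happened
    bool reallocate(std::size_t needed) {
        if (needed <= capacity) return false;  // old buffer still fits: reuse it
        std::free(data);
        data = static_cast<unsigned char*>(std::malloc(needed));
        capacity = needed;
        return true;
    }
    ~PixBuffer() { std::free(data); }
};
```

the win is that a stream of same-sized frames costs one malloc total instead of one per frame.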
cool. i will have to take a look at this and see about writing yuv
and altivec code for it. i have some FIR code as well. can you post
some specifics of your testing like cpu/ram/movie file and frame rate
so we can compare performance?
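the general shape of such a filter, applied to one pixel value across successive frames, could be sketched like this (hypothetical code: [pix_tIIR]'s actual buffering and coefficient handling are not shown in the post, so this is only the textbook feedback/feedforward idea):

```cpp
#include <vector>
#include <deque>

// time-domain IIR filter with nfb feedback taps and nff feedforward
// taps, run once per frame on a single pixel channel:
//   y[n] = sum_i ff[i]*x[n-i] + sum_j fb[j]*y[n-1-j]
struct TimeIIR {
    std::vector<float> fb, ff;        // feedback / feedforward coefficients
    std::deque<float>  ybuf, xbuf;    // past outputs / inputs, newest first

    TimeIIR(std::vector<float> fbc, std::vector<float> ffc)
        : fb(fbc), ff(ffc),
          ybuf(fbc.size(), 0.f), xbuf(ffc.size(), 0.f) {}

    float step(float x) {
        if (!xbuf.empty()) { xbuf.push_front(x); xbuf.pop_back(); }
        float y = 0.f;
        for (std::size_t i = 0; i < ff.size(); ++i) y += ff[i] * xbuf[i];
        for (std::size_t j = 0; j < fb.size(); ++j) y += fb[j] * ybuf[j];
        if (!ybuf.empty()) { ybuf.push_front(y); ybuf.pop_back(); }
        return y;
    }
};
```

with fb = {0.5} and ff = {0.5} this is a simple one-pole smoother; a real image filter would run one such recursion per pixel per channel, which is why the genericity costs some speed.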
>an ignorant question: how can i make use of the compiler's
>loop-unrolling? (how do i have to build the loops? my experiments
>showed rather worse results when i used large loops (which could
>have been unrolled by the compiler))
hmm, this sort of confirms what i've heard about this in regards to
x86 vs PPC. loop unrolling benefits PPC/RISC a lot since these chips
tend to have lots of registers (PPC has 32), so by unrolling the
loops you make sure all of the registers are filled constantly. 4x
unrolling is typically a sweet spot. on x86 there are considerably
fewer registers (ia-32 has only 8 general-purpose ones), so the
technique is not as effective, but current x86 chips like the
p4/athlon have gotten really good at out-of-order execution, so
unrolling should help a bit on those too. i have not done any loop
unrolling, load-hoisting or cache streaming for the GEM scalar code
on PPC simply because it doesn't work cross-platform. instead the
altivec code uses things like cache streaming to achieve even bigger
speed increases over the scalar versions. if unrolling the loops
via compiler options is faster, then do that, because i suspect
that if you compile for a cpu like the p3 the compiler actually
doesn't unroll them. or try doing it with some MMX or SSE code and
see if that shows an improvement. i haven't done much optimizing on
x86, so this is mostly second-hand info, but at least x86 has some
compilers that will produce ridiculously optimized code given the
right circumstances. (i wonder if the SPEC stuff has any convolution
code in it; maybe the compiler intel uses to inflate those scores
would be of use...)
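as an illustration of the kind of 4x unrolling meant above, a scalar pixel loop might be hand-unrolled like this (a generic sketch, not GEM code; the function and its 8.8 fixed-point gain are invented for the example):

```cpp
#include <cstddef>

// 4x-unrolled scalar loop: four independent per-pixel operations per
// iteration give the scheduler (and the many PPC registers) more
// work to overlap. gain is 8.8 fixed point, so 128 means 0.5.
void scale_pixels(unsigned char *dst, const unsigned char *src,
                  std::size_t n, int gain)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {            // main unrolled body
        dst[i+0] = (unsigned char)((src[i+0] * gain) >> 8);
        dst[i+1] = (unsigned char)((src[i+1] * gain) >> 8);
        dst[i+2] = (unsigned char)((src[i+2] * gain) >> 8);
        dst[i+3] = (unsigned char)((src[i+3] * gain) >> 8);
    }
    for (; i < n; ++i)                      // leftover pixels
        dst[i] = (unsigned char)((src[i] * gain) >> 8);
}
```

the four statements in the body have no dependencies on each other, which is what lets the cpu (or the compiler's scheduler) overlap them; the tail loop handles widths that aren't a multiple of 4.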
>4. what else?
>cannot remember.
i don't know, but thanks for the post on the changes.
cgc
>
>mfg.asd.r
>IOhannes
>
>
>_______________________________________________
>PD-dev mailing list
>PD-dev at iem.kug.ac.at
>http://iem.kug.ac.at/cgi-bin/mailman/listinfo/pd-dev