[PD-dev] [GEM] Further CVS changes

Tom Schouten doelie at zzz.kotnet.org
Thu Jan 30 21:18:43 CET 2003


> > the general rule seems to be: keep your memory accesses local and your
> > data size small: do as much as possible inside the pixel loop, or iterate
> > several times over 1 scanline instead of the whole image.
>
> These points seem to hold true for all SIMD types.  Altivec is
> pretty much limited by memory bandwidth so it pays to do as much
> calculation on the data between memory accesses. In Altivec, there
> are also cache control functions to open up dedicated cache-lines to
> the vector unit, which help decrease memory load latencies.  Maybe
> there exists something similar to this for MMX?
>

iirc both the intel and amd extensions to mmx have such instructions, but 
they are not compatible with each other.
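
a rough sketch of how to stay out of the incompatible instruction sets: 
gcc and clang offer a __builtin_prefetch() hint and emit whatever the target 
supports (or nothing at all). the function name, the 256 byte look-ahead and 
the once-per-64-bytes spacing are assumptions for illustration, not something 
from GEM or pdp, and the look-ahead would need tuning per machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* how far ahead to prefetch, in bytes; a guess, tune per machine */
#define PREFETCH_AHEAD 256

/* hypothetical example op: saturating add of `gain` to every byte */
void scanline_add_sat(uint8_t *px, size_t n, uint8_t gain)
{
    assert(px != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__) || defined(__clang__)
        /* hint only: once per 64 bytes, ask for data we will touch soon.
           args: address, 1 = we intend to write, 0 = low temporal locality */
        if ((i & 63) == 0 && i + PREFETCH_AHEAD < n)
            __builtin_prefetch(px + i + PREFETCH_AHEAD, 1, 0);
#endif
        unsigned v = (unsigned)px[i] + gain;
        px[i] = (uint8_t)(v > 255 ? 255 : v);
    }
}
```

the prefetch is only a hint, so the result is the same with or without it; 
whether it helps at all has to be measured.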

> The structure of the processing chain is also a big factor.  GEM is
> basically a chain of for loops, which probably isn't ideal, but it is
> quite flexible.
>
> Matju, is GridFlow building a single loop and filling it with
> functions from a table?  That seems like it could be really
> efficient, especially with the decrease in memory accesses between
> objects.
>

hmm. instructions need to come from memory too ;) and most algos need state 
data, which will fill up your data cache very fast. so switching entirely to 
this method has the same problem as the "chain of loops" way.
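
to make the two extremes concrete, here is a minimal sketch. it is not how 
GEM or GridFlow actually do it; pixel_op and the three ops are hypothetical, 
and they are stateless on purpose, so the state data problem does not show up 
here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t (*pixel_op)(uint8_t);   /* hypothetical per-pixel op type */

uint8_t op_invert(uint8_t v) { return (uint8_t)(255 - v); }
uint8_t op_halve(uint8_t v)  { return (uint8_t)(v >> 1); }
uint8_t op_offset(uint8_t v) { return (uint8_t)(v > 235 ? 255 : v + 20); }

/* "chain of loops": every op walks the whole image once */
void run_chain(uint8_t *img, size_t n, const pixel_op *ops, size_t n_ops)
{
    for (size_t k = 0; k < n_ops; k++)
        for (size_t i = 0; i < n; i++)
            img[i] = ops[k](img[i]);
}

/* single loop filled from a table: each pixel is loaded and stored once */
void run_fused(uint8_t *img, size_t n, const pixel_op *ops, size_t n_ops)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t v = img[i];
        for (size_t k = 0; k < n_ops; k++)
            v = ops[k](v);
        img[i] = v;
    }
}
```

both give the same image for pointwise ops. the fused loop touches each pixel 
once, but it pays an indirect call per op per pixel, and every op's code and 
state has to stay cached for the whole run.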

but i agree that there should be a nice optimum somewhere in the middle. a 
while ago i was looking for a way to parametrize all these factorizations 
to let an optimization program (e.g. a genetic algo) decide what the best 
way to factor this is. but i gave up because it was a little more complex 
than i thought. however, there is a lot of research going on in this area 
(data transfer and storage exploration at imec.be for instance..)
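
the simplest version of such a parameter is probably a strip height. this is 
only a sketch under that assumption, not the parametrization i was after: 
strip_op, run_strips and the two ops are hypothetical names, and it only works 
as written for pointwise ops (anything that looks at neighbouring scanlines 
would need overlapping strips).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*strip_op)(uint8_t *px, size_t n);  /* hypothetical op type */

void strip_invert(uint8_t *px, size_t n)
{
    for (size_t i = 0; i < n; i++) px[i] = (uint8_t)(255 - px[i]);
}

void strip_halve(uint8_t *px, size_t n)
{
    for (size_t i = 0; i < n; i++) px[i] = (uint8_t)(px[i] >> 1);
}

/* run the whole op chain over `strip_rows` scanlines at a time.
   strip_rows == 1      : iterate the chain over one scanline
   strip_rows == height : the plain chain of loops over the whole image */
void run_strips(uint8_t *img, size_t width, size_t height, size_t strip_rows,
                const strip_op *ops, size_t n_ops)
{
    assert(strip_rows > 0);
    for (size_t y = 0; y < height; y += strip_rows) {
        /* the last strip may be shorter than strip_rows */
        size_t rows = height - y < strip_rows ? height - y : strip_rows;
        for (size_t k = 0; k < n_ops; k++)
            ops[k](img + y * width, rows * width);
    }
}
```

the output does not depend on strip_rows, only the memory access pattern 
does, so an optimizer (or a plain timing sweep) is free to pick the value 
that runs fastest on a given machine.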

tom



