On 8/1/06, <b class="gmail_sendername">Tim Blechmann</b> &lt;<a href="mailto:TimBlechmann@gmx.net">TimBlechmann@gmx.net</a>&gt; wrote:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

On Tue, 2006-08-01 at 11:28 +0200, geiger wrote:<br>&gt; &gt; Anyone feel like SIMDifying [arraycopy] (hint hint ;).&nbsp;&nbsp;I suppose<br>&gt; &gt; vasp is a SIMD version of [arraycopy].<br>&gt;<br>&gt; SIMD doesn't help in copying data. And in any case, introducing

<br>&gt; platform dependent code is only advisable in cases where it really<br>&gt; matters.<br><br>Are you sure about this? I don't have benchmarks on this, but I'm pretty sure<br>that moving 128-bit chunks of aligned memory is more efficient than moving four

<br>32-bit chunks of unaligned memory ...</blockquote><div><br>Cache lines and prefetching work the same way with or without SIMD, so for blocks that don't fit in L1 or L2 cache, memory bandwidth is the limit either way.&nbsp; Copying memory is inefficient no matter what code does it, since no actual work is done on the data.&nbsp; 
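<br><br>To make the quoted comparison concrete, here is a minimal C sketch of the two copy strategies: a plain loop over 32-bit words, and a loop of aligned 128-bit SSE2 loads and stores. It assumes an x86 compiler that defines `__SSE2__` (other targets fall back to the scalar loop), and the helper names `copy_words` and `copy_sse2` are hypothetical, not part of Pd or [arraycopy]. The SSE2 loop issues roughly a quarter as many load/store instructions, but once the arrays are much larger than L2 both versions wait on the same cache-line fills, so the gain would be expected to shrink.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics: __m128i, _mm_load_si128, _mm_store_si128 */
#endif

/* Hypothetical helper: copy n 32-bit words one at a time. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Hypothetical helper: copy n 32-bit words in 128-bit (four-word) chunks
 * using aligned SSE2 loads and stores. Both pointers must be 16-byte
 * aligned. Words left over after the last full chunk (and all words,
 * when compiled without SSE2) are copied one at a time. */
void copy_sse2(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(((uintptr_t)dst & 15) == 0 && ((uintptr_t)src & 15) == 0);
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_load_si128((const __m128i *)(src + i));
        _mm_store_si128((__m128i *)(dst + i), v);
    }
#endif
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Timing both functions on a buffer that fits in L1 and on one several times the size of L2 would be a simple way to check the cache argument on a particular machine.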

<br><br>Basic calls like memcpy() should have CPU-specific implementations on every platform (though Windows seems a little suspect).&nbsp; When the data is not in a single linear array, memcpy() might not be the most efficient way to copy it, but in general it is hard to beat.
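<br><br>A rough sketch of the linear versus non-linear distinction, using hypothetical helper names (`copy_linear`, `copy_block`, `copy_strided`) that are not from Pd: a contiguous array is a single memcpy() call, and a 2-D block whose rows are each contiguous can still use one memcpy() per row. memcpy() only copies one contiguous run of bytes, though, so a strided gather (for example pulling one channel out of an interleaved buffer) would need a call per element; a plain loop is the natural choice there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Contiguous case: one memcpy() call copies n floats. Hypothetical name. */
void copy_linear(float *dst, const float *src, size_t n)
{
    memcpy(dst, src, n * sizeof *src);
}

/* Row-wise case: copy a rows x cols block out of a larger 2-D array.
 * Consecutive rows start src_pitch (resp. dst_pitch) floats apart, and each
 * row is contiguous, so one memcpy() per row still applies. Hypothetical name. */
void copy_block(float *dst, size_t dst_pitch,
                const float *src, size_t src_pitch,
                size_t rows, size_t cols)
{
    assert(cols <= src_pitch && cols <= dst_pitch); /* rows must not overlap */
    for (size_t r = 0; r < rows; r++)
        memcpy(dst + r * dst_pitch, src + r * src_pitch, cols * sizeof *src);
}

/* Strided case: gather every stride-th float into a packed array, e.g. one
 * channel of an interleaved buffer. There is no contiguous run longer than
 * one element, so a plain loop is used instead of memcpy(). Hypothetical name. */
void copy_strided(float *dst, const float *src, size_t n, size_t stride)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i * stride];
}
```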

<br></div><br>cgc<br></div><br>