[PD] [OT] SSE/MMX tips?

Bill Gribble grib at billgribble.com
Thu Sep 8 03:30:00 CEST 2011

I noticed that your suggestion did not apply, but assumed it was a subtle riddle taunting me for an off-topic post!

I think the best I can do is two vector adds and two shifts in place of four scalar float adds per block of four floats. Not much of a savings, but given the loop and fetch overhead it may be worth it. I'll benchmark and see!
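Here is a minimal sketch of that in-register scan. It assumes SSE2 intrinsics (`_mm_slli_si128` does the byte shifts), and the function name `prefix_sum_sse` is hypothetical. Each block of four floats gets two shifted adds. Carrying the running total into the next block costs one more add plus a shuffle, and any leftover elements go through a scalar loop. The additions happen in a different order than a plain scalar scan, so results can differ in the last bits.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_add_ps, _mm_shuffle_ps, _mm_cvtss_f32 */
#include <emmintrin.h>  /* SSE2: _mm_slli_si128, _mm_castps_si128, _mm_castsi128_ps */

/* Hypothetical helper: in-place inclusive prefix sum (scan) of n floats. */
static void prefix_sum_sse(float *a, size_t n)
{
    __m128 carry = _mm_setzero_ps();  /* running total, broadcast to all 4 lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 x = _mm_loadu_ps(a + i);  /* [a0, a1, a2, a3] */
        /* shift up one lane (4 bytes): x = [a0, a0+a1, a1+a2, a2+a3] */
        x = _mm_add_ps(x, _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 4)));
        /* shift up two lanes (8 bytes): x = [a0, a0+a1, a0+a1+a2, a0+a1+a2+a3] */
        x = _mm_add_ps(x, _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(x), 8)));
        x = _mm_add_ps(x, carry);  /* add the total of all previous blocks */
        _mm_storeu_ps(a + i, x);
        /* broadcast lane 3 (the new running total) to every lane */
        carry = _mm_shuffle_ps(x, x, _MM_SHUFFLE(3, 3, 3, 3));
    }
    /* scalar tail for the last n % 4 elements */
    float total = _mm_cvtss_f32(carry);
    for (; i < n; i++) {
        total += a[i];
        a[i] = total;
    }
}
```

For example, scanning `{1, 2, 3, 4, 5, 6, 7, 8}` in place yields `{1, 3, 6, 10, 15, 21, 28, 36}`.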

It's really just for fun anyway. 

Bill Gribble

On Sep 7, 2011, at 20:59, Mathieu Bouchard <matju at artengine.ca> wrote:

> On Wed, 7 Sep 2011, Mathieu Bouchard wrote:
>> On Wed, 7 Sep 2011, Bill Gribble wrote:
>>> So far, iterating over plain floats seems to be the best I can come up with, but HADDPS is tantalizingly close to what I want to do. Any hints?
>> I once thought that, using commutativity and associativity, you could speed things up like this:
>> (f0+f1+f2+f3)+(f4+f5+f6+f7)+...
>> can be rearranged as:
>> (f0+f4+...)+(f1+f5+...)+(f2+f6+...)+(f3+f7+...)
> But what I said does not apply to your case, because you want a scan, whereas I didn't read carefully and assumed a fold.
> I don't know how to optimise a scan.
> _______________________________________________________________________
> | Mathieu Bouchard ---- tel: +1.514.383.3801 ---- Villeray, Montréal, QC
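For the fold case, the rearrangement Mathieu describes maps directly onto one SSE accumulator. Lane k collects f[k], f[k+4], f[k+8], ..., and the four lanes are added at the end. This is a minimal sketch assuming plain SSE, and the name `sum_sse` is hypothetical. It only computes the total, not the running sums, which is why it doesn't help with a scan. Reassociating the additions can also change float rounding slightly.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_setzero_ps, _mm_add_ps, _mm_loadu_ps, _mm_storeu_ps */

/* Hypothetical helper: sum of n floats as (f0+f4+...)+(f1+f5+...)+(f2+f6+...)+(f3+f7+...) */
static float sum_sse(const float *f, size_t n)
{
    __m128 acc = _mm_setzero_ps();  /* four independent partial sums */
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(f + i));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    /* combine the four partial sums, then add the n % 4 leftovers */
    float total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++)
        total += f[i];
    return total;
}
```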
