Bill Gribble
Wed Sep 7 13:17:23 CEST 2011

I am trying to code a simple operation using SSE2 instructions where possible.  I have a feeling that what I want is just a matter of a couple of shufps and haddps instructions (haddps being SSE3 rather than SSE2), but I can't work it out. Lazyweb, please help!

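The operation is integration, i.e. a running sum. I have a vector of 4 single floats (v4sf) and a carry-in float to start from. Each output element should be the carry-in plus the sum of all input elements up to and including that position.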

For example, this input

CI F0 F1 F2 F3
5  1  0  10 -5

should produce this output:

F0 F1 F2 F3
6  6  16 11
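
To pin down exactly what I mean, here is a minimal scalar sketch of the operation in plain C. The name integrate_scalar and its signature are made up for illustration; only the arithmetic matters.

```c
#include <stddef.h>

/* Hypothetical helper: running sum of n floats, starting from carry_in.
 * out[i] = carry_in + in[0] + ... + in[i].
 * Returns the last running total so it can serve as the carry-in
 * for the next block of input. */
static float integrate_scalar(float carry_in, const float *in,
                              float *out, size_t n)
{
    float acc = carry_in;
    for (size_t i = 0; i < n; i++) {
        acc += in[i];   /* accumulate the next input */
        out[i] = acc;   /* store the running total */
    }
    return acc;
}
```

With CI = 5 and inputs 1, 0, 10, -5 this gives 6, 6, 16, 11, matching the example above.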

So far iteration on plain floats seems to be the best I can come up with, but HADDPS is tantalizingly close to what I want to do.  Any hints?
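
For what it's worth, one shape this could take without HADDPS is the usual shift-and-add trick: shift the vector up by one lane and add, then shift the result up by two lanes and add again. The sketch below uses only SSE2 intrinsics from <emmintrin.h>. The function name integrate4_sse2 is invented for illustration, it uses __m128 instead of v4sf, and I have not measured whether it beats the plain-float loop. Note also that it sums in a different order than the scalar loop, so results can differ in the last bits for inputs that do not add exactly.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical sketch: running sum of the 4 floats in v, plus carry_in.
 * Lanes are written lowest first, so v = [f0, f1, f2, f3]. */
static __m128 integrate4_sse2(__m128 v, float carry_in)
{
    __m128 s;

    /* Shift up by one lane (4 bytes), zero-filling: [0, f0, f1, f2] */
    s = _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(v), 4));
    v = _mm_add_ps(v, s);   /* [f0, f0+f1, f1+f2, f2+f3] */

    /* Shift up by two lanes (8 bytes): [0, 0, f0, f0+f1] */
    s = _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(v), 8));
    v = _mm_add_ps(v, s);   /* [f0, f0+f1, f0+f1+f2, f0+f1+f2+f3] */

    /* Add the carry-in to every lane */
    return _mm_add_ps(v, _mm_set1_ps(carry_in));
}
```

The top lane of the result is the carry-in for the next group of four, so a loop over a longer buffer would extract it after each step.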

Bill Gribble

