[PD-dev] gcc 4.1 and auto-vectorization

Wed Nov 22 01:32:30 CET 2006

As a short follow-up, here is the skeleton of a DSP function that
gets auto-vectorized (under gcc 4.0.1/PPC at least):

#include <stdlib.h>

#define VECELEMS 4
#define ALIGNMENT (sizeof(float)*(VECELEMS))
#define ALIGNED(ptr) (((size_t)(ptr)&((ALIGNMENT)-1)) == 0)

typedef float *__restrict__ __attribute__((aligned(ALIGNMENT))) aligned_float_ptr;

void addfun(int n,float *dst,const float *src1,const float *src2)
{
     int i,a;   /* a is declared up front so this also builds in pre-C99 mode */
     if(ALIGNED(dst) && ALIGNED(src1) && ALIGNED(src2)) {
         /* all buffers are aligned: pass that on to the compiler via the pointer type */
         aligned_float_ptr d = (aligned_float_ptr)dst;
         aligned_float_ptr s1 = (aligned_float_ptr)src1;
         aligned_float_ptr s2 = (aligned_float_ptr)src2;

         int nv = n/VECELEMS;   /* number of full VECELEMS-sized blocks */
         /* this loop will be auto-vectorized */
         for(i = 0; i < nv; ++i,d += VECELEMS,s1 += VECELEMS,s2 += VECELEMS)
             for(a = 0; a < VECELEMS; ++a)
                 d[a] = s1[a]+s2[a];

         /* scalar tail: d, s1 and s2 already point past the vectorized blocks */
         n -= nv*VECELEMS;
         for(i = 0; i < n; ++i)
             d[i] = s1[i]+s2[i];
     }
     else {
         for(i = 0; i < n; ++i)
             dst[i] = src1[i]+src2[i];
     }
}
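
The aligned branch above is only taken when all three buffers sit on a 16-byte boundary. As a minimal sketch of how a caller might get such buffers, assuming a POSIX system where posix_memalign() is available (the function discussed further down in this thread): the helper name alloc_aligned_floats is hypothetical and not part of PD, and the macros are repeated only to keep the snippet self-contained.

```c
#define _POSIX_C_SOURCE 200112L  /* exposes posix_memalign under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define VECELEMS 4
#define ALIGNMENT (sizeof(float)*(VECELEMS))
#define ALIGNED(ptr) (((size_t)(ptr)&((ALIGNMENT)-1)) == 0)

/* Hypothetical helper (not a PD function): allocate n floats whose start
   address is a multiple of ALIGNMENT bytes. Returns NULL on failure.
   The result is released with plain free(). */
static float *alloc_aligned_floats(size_t n)
{
    void *p = NULL;
    if(posix_memalign(&p, ALIGNMENT, n*sizeof(float)) != 0)
        return NULL;
    assert(ALIGNED(p));   /* same check that addfun uses to pick its fast path */
    return (float *)p;
}
```

Buffers obtained this way pass the ALIGNED() test as long as the pointer handed to the function is the start of the allocation; an offset such as buf+1 falls back to the scalar branch.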

Of course, in C++ this can be made much more flexible using templates.
Looking at the assembly output is not recommended - it's a mess. It's  
much better to code similar functionality using the vector primitives  
that gcc and MSVC provide.
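
For illustration, here is a rough sketch of the same addition written with gcc's generic vector extension (the vector_size attribute) instead of relying on the auto-vectorizer. This covers the gcc side only; MSVC has no such attribute and would need its own intrinsics. The name addfun_vec is made up for this example, and memcpy is used for the loads and stores so that the sketch makes no alignment or aliasing assumptions about the buffers.

```c
#include <string.h>

/* gcc vector extension: four packed floats in 16 bytes */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical counterpart to addfun: dst[i] = src1[i] + src2[i] */
void addfun_vec(int n, float *dst, const float *src1, const float *src2)
{
    int i = 0;

    /* full blocks of four floats, added with one vector operation each */
    for(; i + 4 <= n; i += 4) {
        v4sf a, b, c;
        memcpy(&a, src1 + i, sizeof a);   /* load without assuming alignment */
        memcpy(&b, src2 + i, sizeof b);
        c = a + b;                        /* element-wise add on the vector type */
        memcpy(dst + i, &c, sizeof c);    /* store */
    }

    /* scalar tail for the remaining n % 4 elements */
    for(; i < n; ++i)
        dst[i] = src1[i] + src2[i];
}
```

With buffers that are known to be aligned, the memcpy calls could presumably be replaced by direct casts to v4sf pointers, at the price of bringing back the alignment check from the skeleton above.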

best greetings,
Thomas

On 20.11.2006 at 00:16, Thomas Grill wrote:

>
> On 19.11.2006 at 22:57, Mathieu Bouchard wrote:
>
>> On Sun, 19 Nov 2006, Thomas Grill wrote:
>>> On 18.11.2006 at 22:16, Mathieu Bouchard wrote:
>>>> perhaps it would be a good start to reimplement newbytes(n)  
>>>> using memalign(16,n) instead of malloc(n).
>>> A few years ago I introduced aligned memory allocation in the
>>> pd-devel branch.
>>
>> I see how you did it. Is it because posix_memalign() isn't as  
>> portable as we'd like it to be? (I wrote "memalign" by mistake,  
>> which is the name of a deprecated function that does a similar job)
>>
>> It seems like a lot of memory is allocated unaligned. Is that
>> normal? If the memory allocations you've aligned cover the most
>> speed-critical memory, then why did Tim say that about memory
>> alignment?
>
> The point is that I only introduced and used the aligned memory
> functions for the SIMD codelets, which are used for DSP and array
> processing. I'm sure that there are aligned memory allocation
> functions on either platform (maybe not necessarily
> posix_memalign...), but I wanted to stay as close as possible to
> the original PD memory functions.
> I don't think it makes much sense to use aligned memory for
> anything other than DSP and tables. If one wanted to use it with
> auto-vectorization, the header code would be much the same as the
> one in the DSP perform functions, with some casting to aligned
> pointers, so that the compiler knows about it. Aliasing is another
> thing, though.
>
> greetings,
> Thomas