alsamm (was Re: [PD-dev] Re: [PD] RME hammerfall)

Tim Blechmann TimBlechmann at gmx.net
Tue Apr 19 19:49:15 CEST 2005


hi wini, hi devs 

after some profiling, i figured out that the alsamm driver is burning a
lot of cpu in alsamm_send_dacs ... here is the output of "opreport -l /usr/local/bin/pd":


CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        symbol name
29630    38.6436  alsamm_send_dacs
5847      7.6257  tabosc4_tilde_perform
5578      7.2749  block_prolog
4451      5.8050  copyvec_simd
4362      5.6889  testaddvec_simd
3119      4.0678  oss_send_dacs
2019      2.6332  peakvec_simd
1577      2.0567  sighip_perform
1560      2.0346  dsp_tick
1410      1.8389  testcopyvec_simd
978       1.2755  sigthrow_perfsimd
973       1.2690  env_tilde_accum_simd
834       1.0877  zerovec_simd
780       1.0173  sys_getrealtime
698       0.9103  sys_domicrosleep
659       0.8595  plus_perf_simd
<snip>


there are two loops that slow things down:
               :                
  5313  4.8734 :        for (i = 0, fp2 = fp1 + chn*sys_dacblocksize; i < oframes; i++,fp2++)
               :          {
  2296  2.1060 :            float s1 = *fp2 * F32MAX;
               :            /* better but slower, better never clip ;-)
               :               buf[i]= CLIP32(s1); */
  3278  3.0068 :            buf[i]= ((int) s1 & 0xFFFFFF00);
  1052  0.9650 :            *fp2 = 0.0;
               :          }
               :      }

and

   253  0.2321 :      for (chn = 0; chn < ichannels; chn++) {
               :        
    60  0.0550 :        t_alsa_sample32 *buf = (t_alsa_sample32 *) dev->a_addr[chn];
               :      
 17254 15.8265 :        for (i = 0, fp2 = fp1 + chn*sys_dacblocksize; i < iframes; i++,fp2++)
               :          {
               :            /* mask the lowest bits, since subchannels info can make zero samples nonzero */
 10438  9.5744 :            *fp2 = (float) ((t_alsa_sample32) (buf[i] & 0xFFFFFF00))  
               :              * (1.0 / (float) INT32_MAX);
               :          }      
               :      }


the problem is that the samples have to be transferred from the sse registers 
to the general purpose registers to do the bitmask operations:

               : 80ba444:       movaps %xmm2,%xmm1
   845  0.7751 : 80ba447:       movss  (%edx),%xmm0
  1451  1.3309 : 80ba44b:       mulss  %xmm1,%xmm0
   311  0.2853 : 80ba44f:       cvttss2si %xmm0,%eax
  1262  1.1576 : 80ba453:       xor    %al,%al
  1705  1.5639 : 80ba455:       mov    %eax,(%esi,%ecx,4)
  1052  0.9650 : 80ba458:       movl   $0x0,(%edx)
  4581  4.2020 : 80ba45e:       add    $0x1,%ecx
     2  0.0018 : 80ba461:       mov    0xffffffe8(%ebp),%ebx
   664  0.6091 : 80ba464:       add    $0x4,%edx
               : 80ba467:       cmp    %ebx,%ecx
     4  0.0037 : 80ba469:       jl     80ba447 <alsamm_send_dacs+0x12c>

and

               : 80ba68e:       movaps %xmm2,%xmm1
  4652  4.2671 : 80ba691:       mov    (%esi,%ecx,4),%eax
 12579 11.5382 : 80ba694:       add    $0x1,%ecx
               : 80ba697:       xor    %al,%al
    70  0.0642 : 80ba699:       cvtsi2ss %eax,%xmm0
  3665  3.3618 : 80ba69d:       mulss  %xmm1,%xmm0
  2051  1.8813 : 80ba6a1:       movss  %xmm0,(%edx)
  3737  3.4278 : 80ba6a5:       add    $0x4,%edx
   888  0.8145 : 80ba6a8:       mov    0xffffffe0(%ebp),%ebx
     3  0.0028 : 80ba6ab:       cmp    %ebx,%ecx
               : 80ba6ad:       jl     80ba691 <alsamm_send_dacs+0x376>


i think a better way would be to hand-code these two loops with sse instructions, 
at least for x86 ... not sure if this is also a problem on the ppc platform ...
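
a minimal sketch of what such hand-written loops could look like, using
compiler intrinsics instead of inline assembly. assumptions (none of this is
taken from the alsamm source): the function names and SKETCH_SCALE are made up
for illustration, F32MAX and INT32_MAX are both treated as 2^31 after rounding
to float, the intrinsics are SSE2 (plain SSE has no packed float <-> int32
conversion inside the xmm registers), and unaligned loads/stores are used
because the alignment of the buffers is not known here. like the original
output loop, this does not clip.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>              /* SSE2 intrinsics */
#endif

/* assumed scale: INT32_MAX rounds to 2^31 when stored as a float */
#define SKETCH_SCALE 2147483648.0f

/* output path: scale floats to 32 bit ints, clear the low 8 bits,
   zero the source buffer. hypothetical name, no clipping. */
void floats_to_s32_masked(float *src, int32_t *dst, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128  scale = _mm_set1_ps(SKETCH_SCALE);
    const __m128i mask  = _mm_set1_epi32(-256);      /* 0xFFFFFF00 */
    const __m128  zero  = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4) {
        __m128  f = _mm_mul_ps(_mm_loadu_ps(src + i), scale);
        /* truncating convert and mask both stay in the xmm registers */
        __m128i v = _mm_and_si128(_mm_cvttps_epi32(f), mask);
        _mm_storeu_si128((__m128i *)(dst + i), v);
        _mm_storeu_ps(src + i, zero);
    }
#endif
    /* scalar tail, also the fallback when SSE2 is not available */
    for (; i < n; i++) {
        int32_t v = (int32_t)(src[i] * SKETCH_SCALE);
        dst[i] = v & ~0xFF;
        src[i] = 0.0f;
    }
}

/* input path: clear the low 8 bits, convert to float, scale to +-1 */
void s32_masked_to_floats(const int32_t *src, float *dst, size_t n)
{
    const float inv = 1.0f / SKETCH_SCALE;
    size_t i = 0;
#if defined(__SSE2__)
    const __m128  inv4 = _mm_set1_ps(inv);
    const __m128i mask = _mm_set1_epi32(-256);       /* 0xFFFFFF00 */
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        v = _mm_and_si128(v, mask);
        _mm_storeu_ps(dst + i, _mm_mul_ps(_mm_cvtepi32_ps(v), inv4));
    }
#endif
    for (; i < n; i++)
        dst[i] = (float)(src[i] & ~0xFF) * inv;
}
```

each iteration handles four samples, and the convert and the mask are done
on the packed values, so nothing has to go through %eax. if the buffers
turn out to be 16 byte aligned, the aligned load/store variants could be
used instead.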

cheers... tim

-- 
mailto:TimBlechmann at gmx.de    ICQ: 96771783
http://www.mokabar.tk

latest mp3: kMW.mp3
http://mattin.org/mp3.html

latest cd: Goh Lee Kwang & Tim Blechmann: Drone
http://www.geocities.com/gohleekwangtimblechmannduo/

After one look at this planet any visitor from outer space 
would say "I want to see the manager."
				      William S. Burroughs



