alsamm (was Re: [PD-dev] Re: [PD] RME hammerfall)
Tim Blechmann
TimBlechmann at gmx.net
Tue Apr 19 19:49:15 CEST 2005
hi wini, hi devs
after some profiling, i figured out that the alsamm driver burns a lot of cpu
in alsamm_send_dacs. here is the output of "opreport -l /usr/local/bin/pd":
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples        %  symbol name
  29630  38.6436  alsamm_send_dacs
   5847   7.6257  tabosc4_tilde_perform
   5578   7.2749  block_prolog
   4451   5.8050  copyvec_simd
   4362   5.6889  testaddvec_simd
   3119   4.0678  oss_send_dacs
   2019   2.6332  peakvec_simd
   1577   2.0567  sighip_perform
   1560   2.0346  dsp_tick
   1410   1.8389  testcopyvec_simd
    978   1.2755  sigthrow_perfsimd
    973   1.2690  env_tilde_accum_simd
    834   1.0877  zerovec_simd
    780   1.0173  sys_getrealtime
    698   0.9103  sys_domicrosleep
    659   0.8595  plus_perf_simd
<snip>
two loops are responsible for the slowdown:
              :
 5313  4.8734 :     for (i = 0, fp2 = fp1 + chn*sys_dacblocksize; i < oframes; i++,fp2++)
              :     {
 2296  2.1060 :         float s1 = *fp2 * F32MAX;
              :         /* better but slower, better never clip ;-)
              :            buf[i]= CLIP32(s1); */
 3278  3.0068 :         buf[i]= ((int) s1 & 0xFFFFFF00);
 1052  0.9650 :         *fp2 = 0.0;
              :     }
              : }
and
  253  0.2321 : for (chn = 0; chn < ichannels; chn++) {
              :
   60  0.0550 :     t_alsa_sample32 *buf = (t_alsa_sample32 *) dev->a_addr[chn];
              :
17254 15.8265 :     for (i = 0, fp2 = fp1 + chn*sys_dacblocksize; i < iframes; i++,fp2++)
              :     {
              :         /* mask the lowest bits, since subchannels info can make zero samples nonzero */
10438  9.5744 :         *fp2 = (float) ((t_alsa_sample32) (buf[i] & 0xFFFFFF00))
              :             * (1.0 / (float) INT32_MAX);
              :     }
              : }
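For reference, the two loops above can be pulled out into standalone C functions. This is a minimal sketch assuming one non-interleaved channel of floats and 32-bit ALSA samples; the names `pd_to_alsa32` and `alsa32_to_pd` are hypothetical, and `F32MAX` is assumed to be `(float) INT32_MAX` because its real definition isn't shown here. The mask is written as `~0xFF` so the arithmetic stays in signed `int32_t` (same bits as `& 0xFFFFFF00`), and the input scale factor is single precision here, whereas the `1.0` literal in the original makes it double.

```c
#include <stddef.h>
#include <stdint.h>

/* assumption: the real F32MAX definition is not shown in the post */
#define F32MAX ((float) INT32_MAX)

/* output side (hypothetical name): scale, truncate, drop the low 8 bits,
   then clear the pd buffer, like the first loop */
static void pd_to_alsa32(float *fp, int32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float s1 = fp[i] * F32MAX;      /* no clipping: fp[i] must lie in [-1, 1) */
        buf[i] = (int32_t) s1 & ~0xFF;  /* truncate toward zero, then mask */
        fp[i] = 0.0f;                   /* clear the pd output buffer */
    }
}

/* input side (hypothetical name): drop the low 8 bits (subchannel info),
   then scale back to [-1, 1), like the second loop */
static void alsa32_to_pd(const int32_t *buf, float *fp, size_t n)
{
    const float scale = 1.0f / (float) INT32_MAX;
    for (size_t i = 0; i < n; i++)
        fp[i] = (float) (buf[i] & ~0xFF) * scale;
}
```

Written this way, each sample still makes a round trip through a general purpose register for the conversion and the mask, exactly as in the disassembly.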
the problem is that each sample has to be moved between the sse registers and
the general purpose registers (cvttss2si / cvtsi2ss) just to apply the bitmask
(xor %al,%al), one sample at a time:
              : 80ba444:  movaps %xmm2,%xmm1
  845  0.7751 : 80ba447:  movss  (%edx),%xmm0
 1451  1.3309 : 80ba44b:  mulss  %xmm1,%xmm0
  311  0.2853 : 80ba44f:  cvttss2si %xmm0,%eax
 1262  1.1576 : 80ba453:  xor    %al,%al
 1705  1.5639 : 80ba455:  mov    %eax,(%esi,%ecx,4)
 1052  0.9650 : 80ba458:  movl   $0x0,(%edx)
 4581  4.2020 : 80ba45e:  add    $0x1,%ecx
    2  0.0018 : 80ba461:  mov    0xffffffe8(%ebp),%ebx
  664  0.6091 : 80ba464:  add    $0x4,%edx
              : 80ba467:  cmp    %ebx,%ecx
    4  0.0037 : 80ba469:  jl     80ba447 <alsamm_send_dacs+0x12c>
and
              : 80ba68e:  movaps %xmm2,%xmm1
 4652  4.2671 : 80ba691:  mov    (%esi,%ecx,4),%eax
12579 11.5382 : 80ba694:  add    $0x1,%ecx
              : 80ba697:  xor    %al,%al
   70  0.0642 : 80ba699:  cvtsi2ss %eax,%xmm0
 3665  3.3618 : 80ba69d:  mulss  %xmm1,%xmm0
 2051  1.8813 : 80ba6a1:  movss  %xmm0,(%edx)
 3737  3.4278 : 80ba6a5:  add    $0x4,%edx
  888  0.8145 : 80ba6a8:  mov    0xffffffe0(%ebp),%ebx
    3  0.0028 : 80ba6ab:  cmp    %ebx,%ecx
              : 80ba6ad:  jl     80ba691 <alsamm_send_dacs+0x376>
i think it would be better to hand-code these two loops with sse instructions,
at least on x86. i'm not sure whether this is also a problem on ppc ...
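A minimal sketch of what hand-coding the two loops could look like, using SSE2 intrinsics to process four samples per iteration so that the conversions (cvttps2dq, cvtdq2ps) and the mask (pand) all stay in the xmm registers. It assumes SSE2 (packed integer AND and packed float/int conversion are not part of plain SSE), uses the hypothetical names `pd_to_alsa32_sse2` and `alsa32_to_pd_sse2`, assumes `F32MAX == (float) INT32_MAX`, and falls back to scalar code for a tail of fewer than four samples. It is not a drop-in patch for alsamm.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* output side: float -> int32, low 8 bits masked, pd buffer cleared */
static void pd_to_alsa32_sse2(float *fp, int32_t *buf, size_t n)
{
    const __m128 scale = _mm_set1_ps((float) INT32_MAX); /* assumed F32MAX */
    const __m128i mask = _mm_set1_epi32(~0xFF);          /* 0xFFFFFF00 */
    const __m128 zero = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 f = _mm_mul_ps(_mm_loadu_ps(fp + i), scale);
        __m128i s = _mm_and_si128(_mm_cvttps_epi32(f), mask); /* truncate + mask */
        _mm_storeu_si128((__m128i *) (buf + i), s);
        _mm_storeu_ps(fp + i, zero);
    }
    for (; i < n; i++) {  /* scalar tail */
        buf[i] = (int32_t) (fp[i] * (float) INT32_MAX) & ~0xFF;
        fp[i] = 0.0f;
    }
}

/* input side: mask the subchannel bits, int32 -> float in [-1, 1) */
static void alsa32_to_pd_sse2(const int32_t *buf, float *fp, size_t n)
{
    const __m128 scale = _mm_set1_ps(1.0f / (float) INT32_MAX);
    const __m128i mask = _mm_set1_epi32(~0xFF);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i s = _mm_and_si128(_mm_loadu_si128((const __m128i *) (buf + i)), mask);
        _mm_storeu_ps(fp + i, _mm_mul_ps(_mm_cvtepi32_ps(s), scale));
    }
    for (; i < n; i++)  /* scalar tail */
        fp[i] = (float) (buf[i] & ~0xFF) * (1.0f / (float) INT32_MAX);
}
```

Unaligned loads and stores are used because the alignment of the mmap'd ALSA areas is not assumed; if both buffers are known to be 16-byte aligned, `_mm_load_ps`/`_mm_store_ps` can replace them. Note that cvttps2dq returns 0x80000000 for out-of-range inputs, so a sample of +1.0 would wrap to full negative scale; a `_mm_min_ps`/`_mm_max_ps` pair before the conversion would give the clipping the commented-out CLIP32 line was after, for two extra instructions per four samples.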
cheers... tim