[piksel] Re: [PD] Piksel'05

Christian Klippel ck at mamalala.de
Wed Aug 17 14:58:25 CEST 2005


hello all,

On Wednesday 17 August 2005 11:28, Gisle Frøysland wrote:

[...snip...]

>
> I have often been surprised by linux developers' seeming lack of interest
> in anything that goes on outside their own application. The general
> attitude seems to be that 'the other stuff is not my responsibility, so I
> just have to blindly rely on what other coders are doing'. This is
> especially true for video apps, where everybody seems to be closing their
> eyes to things like the lack of a unified high-quality playback-engine and
> free hq codecs.
>

some words about optimized code.....

a long time ago i was coding the vdsp stuff for jmax; vdsp was meant as a 
video extension for jmax. to get the best performance, i banged my head into 
mmx coding, with good results.

but coding in mmx (or sse, sse2... i can only speak for the intel-compatible 
world here, and when i say mmx, i also mean sse etc.) requires some 
careful thought, and some knowledge of how those instructions are processed 
inside the cpu. i think that many people just don't want to go through that 
effort, or they simply don't see that this effort is _really_ needed to 
produce mmx code that is faster than plain c.

you can _not_ just use some mmx and think it will give you a boost. if you 
don't take care of the execution pipelines, data alignment, etc, the mmx 
code may turn out to be no faster, or even slower, than the c counterpart. 
this has to do with the way those instructions are scheduled.
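
to make the comparison with the "c counterpart" concrete, here is a minimal 
sketch, assuming an 8-bit saturating add as the image operation. the function 
names add_sat_c and add_sat_simd are made up for this illustration, and it 
uses sse2 compiler intrinsics instead of hand-written mmx assembly. it only 
shows where alignment enters the picture; whether the simd path is actually 
faster has to be measured on the target cpu.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* sse2 intrinsics */
#endif

/* plain c reference: add two 8-bit images, clamping each pixel at 255 */
void add_sat_c(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(sum > 255 ? 255 : sum);
    }
}

/* simd version (hypothetical name): 16 pixels per step where sse2 exists */
void add_sat_simd(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
    assert(dst != NULL && a != NULL && b != NULL);
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        /* loadu/storeu accept any address. buffers known to be 16-byte
           aligned could use _mm_load_si128/_mm_store_si128 instead. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    /* leftover pixels (or all of them, without sse2) go through plain c */
    add_sat_c(dst + i, a + i, b + i, n - i);
}
```

both functions take the same arguments, so the simd one can be swapped in 
and checked against the plain c one on the same buffers before any timing 
is done.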

for example, in mmx the instructions are pipelined. but not every instruction 
can be executed in any pipeline. some instructions can be executed in both, 
some only in a specific pipeline. now, if you don't care about the 
instruction sequence in mmx code, it can happen that two instructions cannot 
be issued together, or that one needs the result of the one before it, so 
the pipeline is blocked and the next instruction has to wait. every such 
stall costs cycles, and in an inner loop that runs over every pixel those 
cycles add up quickly. ordering the instructions so that they can run in 
parallel is known as instruction-pairing.
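
the idea behind pairing can be sketched at the c level, as an analogy only: 
real pairing is done by ordering the individual instructions in assembly, 
and a compiler is free to rearrange the code below. the function names 
sum_single and sum_paired are made up for this illustration. the first loop 
is one long chain where every add waits for the previous one; the second 
keeps two independent chains, so a cpu with two pipelines at least has the 
chance to work on both at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* one dependency chain: each add needs the result of the previous add */
uint32_t sum_single(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* two independent chains: the adds into s0 and s1 do not depend on each
   other, so they do not have to wait for one another */
uint32_t sum_paired(const uint8_t *p, size_t n)
{
    uint32_t s0 = 0, s1 = 0;
    size_t i = 0;
    assert(p != NULL || n == 0);
    for (; i + 2 <= n; i += 2) {
        s0 += p[i];
        s1 += p[i + 1];
    }
    if (i < n)          /* odd length: one element is left over */
        s0 += p[i];
    return s0 + s1;
}
```

both functions return the same sum; only the dependency structure of the 
loop body differs.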

unless those facts are taken care of, using mmx instructions will not 
give you much gain.

i guess that some of those constraints also apply to altivec, but i'm not sure.

for those interested, here you can find the mmx code i wrote for some 
image operations: http://mamalala.de/mmx_ops.tgz
they are really fast, feel free to use them wherever they make sense.

some good reads about mmx and optimizations:
http://www.hayestechnologies.com/en/techsimd.htm#SSE2
http://www.gamedev.net/reference/articles/article1987.asp
http://casl.csa.iisc.ernet.in/ComputerArchitecture/pentopt.html
(there was also another site, but i lost the link ... some mr. tomasi, with 
pixel32 or the like ....)

once i have finished my current hardware stuff, i would be happy to help 
people out with optimizing some code, or at least give some hints and tips.

> We should learn from the linux audio community and their effort to advocate
> the low-latency patches, which are finally in the latest kernels.
> We not only need a video-jack but also a video-alsa!
>

yes !

> cheers
> -gisle
>

greetings,

chris

> ps - I hope we can continue these discussions during piksel05 and follow up
> on the efforts contained in the piksel video framework.




