[PD] parallelism in pd

Wed Apr 17 18:02:49 CEST 2002

On Wed, Apr 17, 2002 at 10:38:51AM -0400, Karl MacMillan wrote:
> On Tue, 2002-04-16 at 20:10, Miller Puckette wrote:
> > My understanding of this is all anecdotal (although I was heavily involved
> > at the hardware level of all this a few years ago.)   The reason Intel
> > processors are more likely to outrun available memory is that they have
> > higher CPU "front side bus" bandwidth than AMD processors, hence the same
> > memory system will be harder put to keep them happy.  I believe the
> > current P4 has 3.2 GB/sec FSB bandwidth whereas DDR2100 memory has 2.1 GB/sec,
> > so unless you can keep your FSB idle 70% of the time your dual-processor-
> > plus-DDR2100 system, for example, will be limited by memory bandwidth, and
> > even in that case memory latency will be greater than for a uniprocessor
> > since the two CPUs will have to queue for memory accesses.
> > 
> 
> The only advantage of the dual processor systems, however, is that you
> get twice the cache. Depending on how much cache helps your particular
> application and how well the OS keeps the same process on the same cpu,
> you might get a boost from that. For some apps this might outweigh the
> increased memory latency. Also, does SSE or SSE2 have any of the cache
> control instructions like altivec? Altivec allows you to tell the cpu
> to fill the cache with data from main memory and the streaming continues
> during interrupts and context switches (I think I remember that
> correctly). Also, you can mark the data in the cache as least recently
> used so that it gets flushed from the cache first. Pretty cool!
> 
> Miller, do you have any papers on how the dsp graphs were spread across
> the processors in the ISPW? Was the parallelism automatic or did the
> user have to do it explicitly?
> 
> Karl
> 

Hi again,

I agree that it's cool to effectively double cache size; indeed you double
everything else too (floating point units, e.g.).  Basically, the more the
merrier; usually, increasing resources does end up increasing throughput once
you learn how to take advantage of it.  We don't know to what extent this
increase will be limited by the central memory bottlenecks of today's
multiprocessors.
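
As a back-of-envelope check of the bandwidth figures quoted above (3.2 GB/sec
per FSB, 2.1 GB/sec for DDR2100), here is a minimal sketch of the arithmetic.
It assumes that each CPU can demand the full FSB rate and that the demands
simply add up against one shared memory system; the function name is made up
for illustration, and real machines will behave less tidily.

```python
def required_idle_fraction(fsb_gb_per_s, mem_gb_per_s, n_cpus):
    """Fraction of the time each CPU's bus must sit idle so that the combined
    demand of n_cpus does not exceed what the shared memory can deliver.

    Simplifying assumption: every CPU could otherwise saturate its own bus,
    and all of them draw from a single memory system.
    """
    peak_demand = n_cpus * fsb_gb_per_s
    # Largest fraction of the time a bus can be busy without outrunning memory.
    busy = min(1.0, mem_gb_per_s / peak_demand)
    return 1.0 - busy


if __name__ == "__main__":
    for n in (1, 2):
        idle = required_idle_fraction(3.2, 2.1, n)
        print(f"{n} CPU(s): bus idle at least {idle:.0%} of the time")
```

Under these assumptions the two-CPU case comes out at roughly two thirds idle,
which is in line with the "70%" figure quoted above.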

On that subject, I just read something on aceshardware.com that suggests that
a new AMD architecture will actually support multiple memory banks to match
multiple CPUs.  This could result in some real gains.

Personally, I've been content to use less massive computers and simply wait
the extra 6 months or so that it seems to take for the uniprocessors to
overtake any existing multiprocessor --- with many, many fewer system
headaches.  Plus, I save a lot of money that way, and my machine is easy
to carry around...

I wrote up the ISPW stuff as:

"FTS: A Real-time Monitor for Multiprocessor Music Synthesis." 
Computer Music Journal 15(3): pp. 58-67, 1991,

which you can get from 

http://www.crca.ucsd.edu/~msp/publications.html

The parallelism was explicit; at the top of each window you checked a box
to select which processor it lived in (so inlets/outlets could turn into
interprocessor paths.)  It wasn't great; people spent a lot of time reorganizing
patches among the (up to 6) processors...
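
To make the explicit scheme a bit more concrete, here is a small sketch of the
bookkeeping it implies: each window is assigned a processor by hand, and any
connection whose two ends sit on different processors becomes an
interprocessor path. The window names, the dictionary layout, and the function
name are all hypothetical; this is not how FTS was actually implemented.

```python
MAX_PROCESSORS = 6  # "up to 6" processors, per the description above


def interprocessor_paths(window_cpu, connections):
    """Return the connections whose two ends live on different processors.

    window_cpu maps a window name to the processor chosen for it;
    connections is a list of (source_window, destination_window) pairs.
    """
    for name, cpu in window_cpu.items():
        if not 0 <= cpu < MAX_PROCESSORS:
            raise ValueError(f"window {name!r} assigned to invalid processor {cpu}")
    return [(src, dst) for src, dst in connections
            if window_cpu[src] != window_cpu[dst]]


# Hypothetical patch: four windows spread over two processors.
window_cpu = {"oscillators": 0, "filters": 0, "reverb": 1, "output": 1}
connections = [("oscillators", "filters"),
               ("filters", "reverb"),
               ("reverb", "output")]

if __name__ == "__main__":
    for src, dst in interprocessor_paths(window_cpu, connections):
        print(f"{src} -> {dst} crosses processors")
```

Moving a window to another processor changes which connections cross, which
gives some idea of why reorganizing a patch by hand took so much time.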

cheers
Miller