[PD] Super computer made of legos and Raspberry Pi computers

Andy Farnell padawan12 at obiwannabe.co.uk
Sun Sep 16 22:26:50 CEST 2012


On Sun, Sep 16, 2012 at 10:24:45AM -0300, Alexandre Torres Porres wrote:
> now my question is;
> 
> spending 4k to build a Pi supercomputer can give you more power and
> possibilities than with a top of the line MAC for example (which will cost
> just as much, and be a quad core 2.7 intel i7, 1.6GHz bus, 16GB Ram).


We keep using the word 'supercomputer', and maybe a bit of
perspective would help clarify matters of scale. 

Back in the mists of time /\/\/\ ...... wavy lines ...../\/\/\

A computer that a small business might own could be moved by one person
if they really needed the exercise. After the 1980s they were called 
microcomputers and you could pick one up and carry it.

A minicomputer had a special room of its own, and was between ten and 
maybe fifty times faster. You could get a good one for a hundred thousand 
dollars. Minis were generally for mid-level industrial organisations.
Notice the power factor here between the everyman's computer and the
"top of the range" generally available model; it has remained roughly
constant. The biggest price differential is over the smallest value
curve, as you would expect in a commercial mass market.

A mainframe was an order of magnitude more powerful than a standard
computer, having a whole floor to itself. Mainframes were generally
for bulk data processing and were owned by governments or very
large corporations. They were characterised by their I/O: rows of tape
machines and teleprinters, more like a giant computerised office.

A supercomputer is, by definition, that which is on the cutting edge of
feasible research. Most supercomputers are in a single location rather than
distributed or opportunistic; they usually have a building dedicated to
them and a power supply suitable for a small town of a thousand homes
(a few MW). A team of full-time staff is needed to run them. They cost a 
few hundred million to build and a few tens of millions per year to operate. 
Current supercomputers are measured in tens of petaFLOPS, ten to a hundred 
times more powerful than the equivalent mainframe, and are primarily 
used for scientific modelling.

To put this operational scale versus nomenclature into today's terms 
(taking into account one order of magnitude shift in power):

A microcomputer would probably be classed as a wearable, embedded or
essentially invisible computer operating at a few tens or hundreds
of MFLOPS, costing between one and ten dollars and operating
from a lithium battery. If your credit card has an active RFID chip, it
probably has more CPU power than an early business computer.
The Raspberry Pi, gumsticks, and PIC-based STAMPs occupy this spectrum. 

The word minicomputer now tends to denote a small desktop, notebook
or smartphone, or anything that is considered 'mini' compared
to the previous generation, and probably having the capabilities of a
full desktop from two or three years ago.

A powerful standard computer, the kind for a gaming fanatic or
at the heart of a digital music/video studio, is about five to ten 
times as powerful as the smallest micro (a much smaller gap than 
one might think) despite the large difference in power consumption
and cost. These run at a few GFLOPS. 

What used to be a 'minicomputer' is now what might be used in a
commercial renderfarm, essentially a room of clustered boxes
costing tens of thousands of dollars and running up a hefty
domestic-sized electricity bill. Total CPU power is in
the range of 10 GFLOPS to 1 TFLOPS.

The current guise of the 'mainframe' is what we would now see as a 
Data Center, a floor of an industrial unit, probably much like
your ISP or hosting company, with many rows of racked independent
units that can be linked into various cluster configurations 
for virtual services, network presence and data storage. 
Aggregate CPU power is in the region of 10 TFLOPS to 0.5 PFLOPS.

Supercomputers are still supercomputers; by definition they are
beyond the wildest imagination and schoolboy fantasies unless
you happen to be a scientist who gets to work with them.
A bunch of Lego bricks networked together does not give you 20 PFLOPS,
so it does not a supercomputer make. 

However, a different point of view has been emerging since the mid-1990s,
based on concentrated versus distributed models. Since the 
clustering of cheap and power-efficient microcomputers is now 
possible because of operating system and networking advances, 
we often hear of amazing feats of collective CPU power obtained 
by hooking together old Xboxes with GPUs (Beowulf clusters, in the
TFLOPS range), or using opportunistic distributed networks to get
amazing power out of unused cycles, e.g. SETI at home/BOINC and other
volunteer arrays, or 'botnets' used by crackers (tens to hundreds of
TFLOPS).


Some guides to growth here with interesting figures on the estimated
cost per GFLOP over the last 50 years:

https://en.wikipedia.org/wiki/FLOPS

 
> I'm guessing that CPU wize it would be more powerful indeed; even thought
> it's a modest one, that's 64 cores against 4...

So the issue now is that a parallel model of computing needs the
problem cast into a program that works in this way. Some algorithms
are trivially rewritten to work well on clusters, but many are not.
The aggregate power isn't a full indicator of the expected speedup.
A multi-core machine has fast data connections between cores but
relatively little memory per core, whereas a cluster may have gigabytes
of memory at each node but much slower data throughput between nodes.
 
> what I'm not familiar to is how supercomputing works and optimizes the work
> by splitting it into all CPU units. 

This is an important area of computer science. In summary, if the overhead 
of splitting off a subproblem, sending it to a node/core,
collecting the result and re-integrating it into the final solution
is less than it would cost to compute it on a more powerful single node,
then you have a speedup. This is where algorithm design gets fun :) 
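
As a rough sketch of that break-even point (the model and numbers below
are mine, purely illustrative, in Python):

    # Back-of-envelope: parallelism only pays off while the per-chunk
    # overhead stays small relative to the compute time saved.
    # Assumes one coordinator splits/merges serially, so the overhead
    # term grows with the number of nodes.
    def parallel_time(serial_time, nodes, overhead_per_chunk):
        return serial_time / nodes + nodes * overhead_per_chunk

    serial = 100.0                       # seconds on one node
    for n in (2, 8, 64):
        print(n, parallel_time(serial, n, overhead_per_chunk=0.5))
    # 2 -> 51.0 s, 8 -> 16.5 s, 64 -> 33.6 s: past a point, adding
    # nodes makes the job slower, not faster.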

Message passing protocols serve to split up the data according to 
schemes that mirror the algorithm, a bit like routers in the internet.
Wavefront broadcast, bifurcation, all manner of schemes are used to
break up and reassemble the sub-processes. Andrew Tanenbaum wrote
one of the early and very accessible books on it all, called "Distributed
Operating Systems".

If _all_ the data needs to be present everywhere in the system then
distributed models fail, because the data throughput problem starts
to dominate the advantage gained by parallel computation. So only
certain kinds of program can be run on 'supercomputers' that work
this way. Your average desktop application like Pro Tools probably
wouldn't benefit much from running on the IBM Sequoia, because it isn't 
written to take advantage of that architecture. 
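
That communication wall is essentially Amdahl's law: if a fraction s
of the job is stuck being serial (or spent shipping data around), the
speedup is capped at 1/s no matter how many nodes you throw at it.
A quick illustration (my numbers):

    # Amdahl's law: speedup = 1 / (s + (1 - s) / n)
    def speedup(serial_fraction, nodes):
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)

    for n in (4, 64, 4096):
        print(n, round(speedup(0.10, n), 1))
    # 4 -> 3.1, 64 -> 8.8, 4096 -> 10.0
    # with 10% of the run serial, even 4096 nodes buys less than 10x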



cheers,
Andy




