[PD] crazy weird stuff happening.

Andrew (Andy) W. Schmeder andy at a2hd.com
Mon Nov 1 00:01:42 CET 2004

On Sun, 2004-10-31 at 07:04, Tom Dunstan wrote:
> convinced that i'm being excessive. it runs fine for about 20 minutes
> then all of a sudden the whole system goes into a melt down. the mouse
> is really slow and X looks as if its going to give up. (in some cases it
> does). my computer goes all crazy and the hard disk light is flashing
> like mad. the cpu meters (both pd and linux 'top' ) don't seem to be
> peaking out, its hovering around 30%.

a couple ideas....

1) bad sectors on the disk.  ide controllers tend to induce *long*
delays when they hit a bad sector.  use the smartmontools (e.g.,
smartctl -H /dev/hda) to get a health report on the drive.

2) running out of ram (i.e., attempting to allocate more than what is
available).  this is very likely since you mention that X is getting
killed.  linux gets really sluggish when it runs out of ram.
eventually the kernel will kill one or more processes to free up space,
but it can take a while before it decides which process to kill.  I've
noticed that it hits the disk a lot during this time, even if you don't
have swap enabled (not sure why; maybe it's making an emergency swap
area).  the load average also goes way up, which correlates with the
sluggish interactivity.  since you've tested with multiple versions of
pd, I'd guess some external or other process is leaking memory, being
left hanging, etc.

I manage some remote servers for work, and there was one that would
consistently "crash" after about two weeks of uptime.  in fact, it was
not offline (you could ping it), but it could not even spawn a shell, so
i was unable to log in and diagnose it.  eventually i determined that a
cron job was not exiting, so every day a new copy of that process would
hang around until the machine ran out of memory and essentially froze.
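if you want to watch for that kind of slow leak, here's a rough sketch
(my own helper names, pure python, linux-only since it reads /proc) that
shows free memory and lists the biggest processes by resident size:

```python
import os
import re

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of kB values."""
    values = {}
    for line in text.splitlines():
        m = re.match(r"(\w+):\s+(\d+)\s*kB", line)
        if m:
            values[m.group(1)] = int(m.group(2))
    return values

def rss_kb(status_text):
    """Extract resident set size (VmRSS, in kB) from /proc/<pid>/status text."""
    m = re.search(r"VmRSS:\s+(\d+)\s*kB", status_text)
    return int(m.group(1)) if m else 0

def top_memory_hogs(n=5):
    """Return the n largest processes as (rss_kb, pid, name) tuples."""
    hogs = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/status" % pid) as f:
                status = f.read()
        except OSError:
            continue  # process exited while we were scanning
        name = re.search(r"Name:\s+(\S+)", status)
        hogs.append((rss_kb(status), pid, name.group(1) if name else "?"))
    return sorted(hogs, reverse=True)[:n]

if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    print("free: %d kB" % mem.get("MemFree", 0))
    for kb, pid, name in top_memory_hogs():
        print("%8d kB  %5s  %s" % (kb, pid, name))
```

run it every few minutes (cron, or a while loop); a process whose rss
only ever grows is your suspect.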

regarding the latency with jack, etc.: first of all, if you really care
about latency, i think it's important to actually measure roundtrip
latency with a loopback test.  the reason is that the audio subsystem is
surprisingly complex; functionally it's a black box (or at least a
nearly opaque one).  block size/count at the application level is just
that -- the underlying driver and/or hardware could easily be doing
something completely different (e.g., some devices actually operate with
a fixed block size, and alsa's plughw layer can really confuse things by
converting rates and formats behind your back).  a program might claim
to have latency x when in fact the true latency is 2x or some other
number.  it never ceases to amaze me how lazy people are when it comes
to testing (myself included, i suppose ;).
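for what it's worth, the heart of a loopback test is simple: play a
short click out the sound card, record it back through a patch cable
from output to input, and find where the click shows up in the
recording.  the capture/playback plumbing depends on your setup, so this
sketch (my own function names) just fakes a recording with a known
512-sample delay to show the measurement itself:

```python
def cross_correlate_delay(played, recorded):
    """Return the lag (in samples) at which `recorded` best matches `played`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(recorded) - len(played) + 1):
        score = sum(p * recorded[lag + i] for i, p in enumerate(played))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

if __name__ == "__main__":
    rate = 44100
    click = [1.0, 0.7, -0.5, 0.2]     # short test impulse we "played"
    true_delay = 512                  # pretend roundtrip: 512 samples
    recorded = [0.0] * true_delay + click + [0.0] * 1000
    lag = cross_correlate_delay(click, recorded)
    print("roundtrip latency: %d samples (%.1f ms at %d Hz)"
          % (lag, 1000.0 * lag / rate, rate))
    # prints "roundtrip latency: 512 samples (11.6 ms at 44100 Hz)"
```

with a real cable in the loop, whatever lag this reports is the true
roundtrip latency, regardless of what the application claims.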

I think that jack is a truly remarkable piece of work, and it's quite
possible that it simply "understands" how to use the alsa api to get the
best possible performance (although this discussion is meaningless
without loopback testing).  in any case, the cpu overhead is negligible
unless you are dithering/resampling, so I would not worry about it.

cheers & best of luck,
andy.

--
Andrew (Andy) W. Schmeder
<andy \at a2hd \dot com>
http://www.a2hd.com/