[PD] pd and tcp: what to do against crashes?

Martin Peach martin.peach at sympatico.ca
Sat Feb 21 18:59:05 CET 2009


Hi Roman,
I think it probably comes down to the code not checking for all possible 
error conditions. Under UDP you can send as much as you like to 
nonexistent receivers, but TCP needs an active connection.
Most likely the code is just assuming that everything is working properly.
It sounds as though data sent to a client whose connection has just 
dropped, but has not yet timed out, goes into never-never land and the 
thread hangs.
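
A minimal sketch of one way to keep a send from hanging, assuming POSIX sockets (poll() and MSG_DONTWAIT). send_with_timeout is a hypothetical helper for illustration only, not code from [tcpserver] or [netserver]. It waits a bounded time for the socket to become writable and gives up instead of blocking forever:

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0 /* not available everywhere; assumption: SIGPIPE is handled elsewhere */
#endif

/* Hypothetical helper: try to send len bytes on fd, waiting at most
 * timeout_ms each time the socket is not writable.
 * Returns the number of bytes actually sent (less than len if the
 * peer stopped accepting data), or -1 on a socket error. */
ssize_t send_with_timeout(int fd, const char *buf, size_t len, int timeout_ms)
{
    size_t sent = 0;

    assert(buf != NULL);
    while (sent < len) {
        struct pollfd pfd;
        int ready;
        ssize_t n;

        pfd.fd = fd;
        pfd.events = POLLOUT;
        pfd.revents = 0;

        ready = poll(&pfd, 1, timeout_ms);
        if (ready == 0)
            break;                      /* timed out: peer is not draining data */
        if (ready < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal, retry */
            return -1;
        }
        if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
            return -1;                  /* connection is broken or closed */

        /* MSG_DONTWAIT makes this single call non-blocking. */
        n = send(fd, buf + sent, len - sent, MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                continue;               /* go back to poll() and wait again */
            return -1;
        }
        sent += (size_t)n;
    }
    return (ssize_t)sent;
}
```

A caller could treat a short return value as a sign that the client has stalled and decide to drop it rather than queue more data for it.
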
It would be nice to have a setup that could reliably reproduce the bug; 
then it would be much easier to fix. Probably having two machines 
connected and pulling the cable out of one at the right moment would do it.
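
As a rough local stand-in for the pulled cable, one could use a peer that simply never reads. This is an assumption: it exercises the same "nobody is draining the socket" condition, but it is not identical to a dropped network link. The sketch below uses a hypothetical helper, fill_until_stall, to show that the kernel accepts only a finite amount of data before a send would have to block:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical test helper: switch fd to non-blocking mode and keep
 * sending until the kernel refuses more data because the peer is not
 * reading. Returns the number of bytes accepted before the stall, or
 * -1 on an unexpected error. The peer end must stay open while this runs. */
long fill_until_stall(int fd)
{
    char chunk[4096] = {0};
    long total = 0;
    int flags = fcntl(fd, F_GETFL, 0);

    assert(flags != -1);
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    for (;;) {
        ssize_t n = send(fd, chunk, sizeof chunk, 0);
        if (n > 0) {
            total += n;                 /* kernel buffered this much */
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;                      /* buffer full: a blocking send would hang here */
        if (n < 0 && errno == EINTR)
            continue;                   /* interrupted by a signal, retry */
        return -1;
    }
    return total;
}
```

Called on one end of a socketpair() whose other end is never read, it returns once the buffers are full. That is the point at which a blocking send in the server thread would stop returning.
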
Anyway I'll stop speculating now and have a look at the code...

Martin


Roman Haefeli wrote:
> hi all
> 
> i've been working now quite some time with setups, where different
> instances of pd spread over the world are connected with each other over
> another instance of pd (i.e. serverpatch). i tried different classes for
> establishing tcp connections between clients and servers, namely
> [netclient]/[netserver], [tcpclient]/[tcpserver] or a mix of the two. no
> matter, what configuration is used, server crashes are likely to happen
> from time to time. the 'server' means here the instance of pd, that is
> running the patch containing either [tcpserver] or [netserver]. crash
> means: pd is still running, but not responding. when i start pd with gui
> for debugging purposes, the gui is also still there, but doesn't
> respond.
> 
> when i am testing on my own, running several instances of pd on my local
> box (or on some more boxes, i have access to), everything runs fine,
> even under heavy load of data being exchanged between the clients. at
> most, there are some drop-outs, but never crashes. however, when having
> a netpd-session with several people connected from everywhere, crashes
> happen much more often. from my experience, i can tell, that those
> crashes are more likely to happen, if one or more clients have an
> unreliable internet connection (or weak wifi signal etc). since tcp is
> connection-aware - tcp requires connection establishment (handshake) but
> also connection termination - and some clients just disappear without
> proper termination, the server still expects them to be there. this is
> also indicated by the number of connected clients reported by the
> server: when a client loses connection and then reconnects, the number
> is higher than the real number of connected clients. if this happens
> several times, the reported number of connected clients rises, because
> connections weren't terminated correctly. 
> 
> now, when another client is sending 'broadcast' messages (messages meant
> to be sent to all connected clients), the server still tries to send the
> messages to the disappeared clients. 
> another situation: if the client, that disappeared, sent a dump request
> to another client just before vanishing, the other client will try to
> send the whole dump to the vanished client. i wonder now, what happens,
> if all those messages cannot be delivered by the server. i suspect this
> to be the cause of the crashes.
> 
> from the pd user side, there seems to be no way to address this issue,
> since there is no way for the server (i.e. the patch around
> [netserver]/[tcpserver]) to tell, if a client silently disappeared. so
> the server will still try to deliver all the messages. i am suspecting,
> that some buffer overrun occurs here, but i cannot tell really without
> understanding the code of [netserver] or [tcpserver]. also i don't know,
> at which level those buffer overruns would happen: somewhere in the
> external (netserver/tcpserver) code, in the pd code, or even in the
> kernel/OS? the only thing, that i know, is that i haven't seen apache or
> some other tcp server crashing because of clients having bad connection.
> so there must be a solution to this problem, but i don't know where to
> look for it. another problem is that, from a pd user perspective, one
> has very little control over the things happening at tcp level. if you
> need to send a big amount of data, there is no mechanism provided to
> send the data at maximum available bandwidth. so you either send
> everything at once, which fills the internal 4kb buffer of [net*] or
> [tcp*], so that a long drop-out occurs, until the buffer is emptied
> again. or the data is sent with time intervals between each message in
> order to artificially reduce the bandwidth used. the latter approach has
> the disadvantage of not using the whole available bandwidth. also, in
> userspace you don't see, if a message could be delivered or not, which
> will, as described in the above situations, lead to the situation, that
> more messages will be sent to a non-existing receiver, which might fill
> some buffer, which _probably_ leads to a crash of pd.
> 
> because of the above problems, i came to the conclusion, that it is
> currently not possible to have several instances of pd connected with
> each other without the system (i.e. one or more instances of pd) crashing
> from time to time. i know, that pd's main goal is computing audio and not
> networking, but still it would be a big benefit, if the audio and
> networking would reliably work together in pd.
> currently, i don't know what is the best approach to face those issues:
> giving more control to the userspace or make the net classes of pd less
> prone to clients not behaving 'correctly' at tcp level. i do know, that
> i will not be able to fix those issues myself, therefore i would like to
> see, if more people are interested in helping to work this out. or if
> people think, that pd is the wrong tool to work with such setups, i
> would like to know that as well. 
> 
> oops.. sorry for the long post..
> 
> roman