[PD] pd and tcp: what to do against crashes?

Sun Feb 22 22:25:15 CET 2009

On Sun, 2009-02-22 at 15:17 -0500, Martin Peach wrote:
> Roman Haefeli wrote:
> > On Sat, 2009-02-21 at 12:59 -0500, Martin Peach wrote:
> >> Hi Roman,
> >> I think it probably comes down to the code not checking for all possible 
> >> error conditions. 
> > 
> > cool, if it would be as simple as that.
> > 
> >> Under udp you can send as much as you like to 
> >> nonexistent receivers but tcp needs an active connection.
> >> Most likely the code is just assuming that everything is working properly.
> >> It sounds as though data being sent to a client whose connection has 
> >> just dropped but before it has timed out, will go into nevernever land 
> >> and the thread will hang.
> 
> After looking at the actual code, I think the above is not true. The TCP 
> stack will just keep trying to send the buffer until it times out; how 
> long that takes seems to be system dependent. I don't see why that 
> should cause Pd to crash.
> 
> > 
> > where is neverneverland?  i mean, in tcp protocol, the receiver has to
> > confirm, that it received the messages, so i guess, the sender needs to
> > keep all the messages, that were sent to the vanished client, but were
> > never confirmed, right? 
> >
> 
> Yes, the TCP code keeps trying to send for a while. From the code it 
> looks like an error "tcp_server: send blocked xxx msec" should be 
> printed if the send() function doesn't return quickly, but I think that 
> will only happen if there is some local problem with the network.
> The send() man page says:
> "When the message does not fit into the send buffer of the socket, 
> send() normally blocks, unless the socket has been placed in 
> non-blocking I/O mode. In non-blocking mode it would return EAGAIN in 
> this case. The select(2) call may be used to determine when it is 
> possible to send more data. "
> 
> So I guess it's plausible that Pd is getting stuck when the send buffer 
> is overrun (in blocking mode send() doesn't return until there is some 
> room in the buffer, although it does return if the buffer is not full 
> even if it can't be sent). The error message will never get printed 
> because send has blocked forever.
> 
> I think netserver uses the exact same code.

good to know, since it appears to have the exact same problem.

> I guess they should either be using select() to see if a socket is 
> writeable before calling send() on it, or opening the socket in 
> non-blocking mode and checking for errors like EAGAIN, and in either 
> case shut down a socket whose send buffer is full.
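
for reference, a rough c sketch of the two options mentioned above: asking
select() whether a socket is writable, and sending on a non-blocking socket
while treating EAGAIN as "buffer full". the helper names (set_nonblocking,
socket_writable, try_send) are made up for illustration and are not taken
from the [tcpserver] or [netserver] sources. it only reports "would block"
and deliberately leaves open what the caller does then:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: switch a socket to non-blocking mode. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: poll with a zero timeout.
   Returns 1 if the socket accepts data right now, 0 if not, -1 on error. */
static int socket_writable(int fd)
{
    fd_set wfds;
    struct timeval tv = {0, 0};
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    int r = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (r < 0) return -1;
    return (r > 0 && FD_ISSET(fd, &wfds)) ? 1 : 0;
}

/* Hypothetical helper: assumes fd is already non-blocking.
   Returns the number of bytes queued (possibly fewer than len),
   0 if the send buffer is full, -1 on any other error. */
static ssize_t try_send(int fd, const void *buf, size_t len)
{
    assert(len > 0); /* keeps the return value 0 unambiguous */
    ssize_t n = send(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) return 0;
    return n;
}
```

a caller would have to keep the unsent remainder somewhere whenever
try_send() returns less than len, which is exactly the part that needs a
policy decision.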

hm.. i doubt that this is a good idea. with the current implementation of
all [net*] and [tcp*] classes, it is very easy to fill the send buffer:
you only need to send a large number of messages in zero logical time,
and the socket would be closed. i guess either those classes would have
to handle this situation in a more intelligent way (i don't know yet
what that would mean, though), or there needs to be more control in
userspace. i already mentioned it before: if every net class output a
bang whenever its send buffer is emptied, one could design a patch so
that it only sends messages when the other end is listening and the
buffer is not full. this way it would even be possible to transmit at
the maximum available bandwidth. i don't know how this could be achieved
without giving at least that amount of control to userspace.
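
to make the "bang when the send buffer is emptied" idea a bit more
concrete, here is a minimal c sketch of a per-connection outgoing queue.
everything in it (outq, OUTQ_CAP, outq_push, outq_flush) is hypothetical
and not part of any existing [net*] or [tcp*] class. it assumes a platform
where send() accepts MSG_DONTWAIT (linux, bsd, os x; not windows). an
object built on this could output its bang whenever outq_flush() returns 1:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define OUTQ_CAP 65536 /* arbitrary capacity, chosen for illustration */

typedef struct {
    char data[OUTQ_CAP];
    size_t len; /* number of bytes still waiting to be sent */
} outq;

/* Append a message to the queue.
   Returns 0 on success, -1 if it does not fit (the caller decides
   whether to drop the message, wait, or close the connection). */
static int outq_push(outq *q, const void *buf, size_t n)
{
    assert(q != NULL);
    if (n > OUTQ_CAP - q->len) return -1;
    memcpy(q->data + q->len, buf, n);
    q->len += n;
    return 0;
}

/* Send as much as the socket accepts without blocking.
   Returns 1 if the queue is now empty (the point where a "drained"
   bang could be output), 0 if bytes remain, -1 on a real error. */
static int outq_flush(outq *q, int fd)
{
    while (q->len > 0) {
        ssize_t n = send(fd, q->data, q->len, MSG_DONTWAIT);
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK) return 0;
            if (errno == EINTR) continue;
            return -1;
        }
        /* Drop the bytes that were accepted, keep the rest in order. */
        memmove(q->data, q->data + n, q->len - (size_t)n);
        q->len -= (size_t)n;
    }
    return 1;
}
```

outq_flush() would be called after every outq_push() and again whenever
select() reports the socket as writable, so pd itself never blocks in
send().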

> A way around it could be to have the clients always reply to messages, 
> then have the server shut down the connections that don't answer in time.

yeah, this would work with [tcpserver], but not with [netserver]: it
doesn't provide a method for closing connections, afaik. 

but to me it sounds awkward to reimplement at a higher level a task
that should be done at the tcp level. i don't think that a protocol on
top of tcp should work this way. it would also make message-based data
transmission very slow, since for each message to be sent you would
need to wait for twice the latency (one full round trip).
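
just for completeness, the server-side bookkeeping for the "shut down
connections that don't answer in time" approach could look roughly like
this in c. the names (client_slot, reap_silent_clients) and the idea of
storing a per-client reply timestamp are assumptions for illustration,
not something [tcpserver] does:

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

typedef struct {
    int fd;            /* -1 marks an unused slot */
    time_t last_reply; /* when this client last answered a message */
} client_slot;

/* Close every client that has not replied within timeout_s seconds.
   Returns the number of connections that were closed. */
static int reap_silent_clients(client_slot *c, size_t n,
                               time_t now, double timeout_s)
{
    assert(c != NULL || n == 0);
    int closed = 0;
    for (size_t i = 0; i < n; i++) {
        if (c[i].fd < 0) continue;
        if (difftime(now, c[i].last_reply) > timeout_s) {
            close(c[i].fd);
            c[i].fd = -1;
            closed++;
        }
    }
    return closed;
}
```

the server would update last_reply whenever a client answers and call
reap_silent_clients() periodically. the round-trip cost per message
described above stays the same.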

> In playing around with [tcpclient] and web servers I noticed that the 
> server always closes the connection as soon as each request has been 
> answered, so that problem doesn't really arise for Apache.

you're right. actually, i can't think of many setups that are similar
to what i described in my first post of the thread: one server with many
clients that constantly stay connected. it seems to be the least trivial
setup.

roman
