[PD] pd and tcp: what to do against crashes?

Sun Feb 22 21:17:32 CET 2009

Roman Haefeli wrote:
> On Sat, 2009-02-21 at 12:59 -0500, Martin Peach wrote:
>> Hi Roman,
>> I think it probably comes down to the code not checking for all possible 
>> error conditions. 
> 
> cool, if it would be as simple as that.
> 
>> Under udp you can send as much as you like to 
>> nonexistent receivers but tcp needs an active connection.
>> Most likely the code is just assuming that everything is working properly.
>> It sounds as though data being sent to a client whose connection has 
>> just dropped but before it has timed out, will go into nevernever land 
>> and the thread will hang.

After looking at the actual code, I think the above is not true. The TCP 
stack will just keep trying to send the buffer until it times out; how 
long that takes seems to be system dependent. I don't see why that 
should cause Pd to crash.

> 
> where is neverneverland?  i mean, in tcp protocol, the receiver has to
> confirm, that it received the messages, so i guess, the sender needs to
> keep all the messages, that were sent to the vanished client, but were
> never confirmed, right? 
>

Yes, the TCP code keeps trying to send for a while. From the code it 
looks like an error "tcp_server: send blocked xxx msec" should be 
printed if the send() function doesn't return quickly, but I think that 
will only happen if there is some local problem with the network.
The send() man page says:
"When the message does not fit into the send buffer of the socket, 
send() normally blocks, unless the socket has been placed in 
non-blocking I/O mode. In non-blocking mode it would return EAGAIN in 
this case. The select(2) call may be used to determine when it is 
possible to send more data."

So I guess it's plausible that Pd is getting stuck when the send buffer 
fills up. In blocking mode send() doesn't return until there is some 
room in the buffer, although it does return right away while there is 
room, even if the data can't actually be delivered to the peer. The 
error message will never get printed because send() never returns.
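
To illustrate the buffering behaviour described above, here is a rough 
sketch (not taken from the tcpserver source). fill_send_buffer() is a 
hypothetical helper name, and it assumes a platform where MSG_DONTWAIT 
is available, so the demo itself can't hang. It keeps sending to a 
socket whose peer never reads and reports how many bytes the kernel 
accepted before the buffer filled up.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper: write to fd until its send buffer is full.
 * Returns the number of bytes the kernel accepted, or -1 on an
 * unexpected error. MSG_DONTWAIT makes each call non-blocking, so a
 * full buffer shows up as EAGAIN instead of a hang. */
static long fill_send_buffer(int fd)
{
    char chunk[4096] = {0};
    long total = 0;

    for (;;) {
        ssize_t n = send(fd, chunk, sizeof chunk, MSG_DONTWAIT);
        if (n > 0) {
            total += n;          /* accepted, though nobody has read it */
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;        /* buffer full: a blocking send() would wait here */
        return -1;               /* some other error */
    }
}
```

Every successful send() here only means the data was queued locally. 
Without MSG_DONTWAIT the call that gets EAGAIN is the one that would 
block.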

I think netserver uses the exact same code.
I guess both should either be using select() to see if a socket is 
writable before calling send() on it, or opening the socket in 
non-blocking mode and checking for errors like EAGAIN, and in either 
case shut down a socket whose send buffer is full.
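
A minimal sketch of that idea, combining both checks. All three 
function names are made up for illustration and are not from the 
tcpserver or netserver source. It assumes a POSIX system and a 
descriptor below FD_SETSIZE.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put a socket into non-blocking mode.
 * Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: 1 if fd can take more data within timeout_ms,
 * 0 if not, -1 if select() failed. fd must be below FD_SETSIZE. */
static int socket_writable(int fd, int timeout_ms)
{
    fd_set wfds;
    struct timeval tv;

    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int r = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (r < 0) return -1;
    return (r > 0 && FD_ISSET(fd, &wfds)) ? 1 : 0;
}

/* Hypothetical helper: send without ever blocking.
 * Returns the number of bytes queued (possibly fewer than len),
 * 0 if the send buffer is full, or -1 on any other error. */
static ssize_t send_or_give_up(int fd, const void *buf, size_t len)
{
    int w = socket_writable(fd, 0);
    if (w <= 0) return w;        /* 0: buffer full, -1: select() failed */

    ssize_t n = send(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;                /* filled up between select() and send() */
    return n;
}
```

The server would call set_nonblocking() once when it accepts a client, 
and treat a return of 0 or -1 from send_or_give_up() as the cue to 
close that client's socket. A short write (fewer bytes than len) would 
still need the remainder resent later.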

A way around it could be to have the clients always reply to messages, 
then have the server shut down the connections that don't answer in time.
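
The bookkeeping for that workaround could look roughly like this. The 
client_t struct and both function names are hypothetical, and the 
timeout value is left to the caller. The server records when each 
client last answered and periodically closes the ones that have been 
silent too long.

```c
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical per-client record kept by the server. */
typedef struct {
    int fd;             /* socket descriptor, -1 if the slot is unused */
    time_t last_reply;  /* when this client last answered */
} client_t;

/* Call whenever a reply arrives from the client. */
static void client_replied(client_t *c, time_t now)
{
    c->last_reply = now;
}

/* Close every client that has been silent for more than timeout_s
 * seconds. Returns the number of connections dropped. */
static int drop_silent_clients(client_t *clients, size_t n,
                               time_t now, double timeout_s)
{
    int dropped = 0;

    for (size_t i = 0; i < n; i++) {
        if (clients[i].fd < 0)
            continue;                       /* unused slot */
        if (difftime(now, clients[i].last_reply) > timeout_s) {
            close(clients[i].fd);
            clients[i].fd = -1;             /* mark slot as free */
            dropped++;
        }
    }
    return dropped;
}
```

In a server loop this would be called with time(NULL) every second or 
so, before sending, so that a vanished client is dropped before its 
send buffer has a chance to fill.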

In playing around with [tcpclient] and web servers I noticed that the 
server always closes the connection as soon as each request has been 
answered, so that problem doesn't really arise for Apache.

Martin