Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken connection restore cannot happen faster than the connection timeout #426

Open
andriytk opened this issue May 8, 2019 · 1 comment

Comments

@andriytk
Copy link
Contributor

andriytk commented May 8, 2019

When an established connection is breaking for some reason (because the TCP Ack does not come within the TCP User Timeout, for example) re-establishing the connection is bound by the connection timeout - it just cannot happen faster. Consider the following scenario:

  1. The connection is breaking at A side and A.1 sends the connect request to B.
  2. B.2 receives the request, but the remote endpoint is still valid there, so it replies with the ConnectionRequestCrossed (code) and tries to probe the existing connection.
  3. The probing fails and B.1 sends its connection request to A.
  4. A.1 gets ConnectionRequestCrossed reply and gets stuck on the "resolved" MVar at findRemoteEndPoint until the connection timeout fires.
  5. A.2 receives the connection request and replies with ConnectionRequestCrossed.
  6. B.3 receives the ConnectionRequestCrossed reply and puts the "crossed" MVar.
  7. The connection timeout fires on A.1, it releases the "resolved" MVar and removes the stale endpoint. The new connection request is sent to B.
  8. B.2 immediately accepts the request (because the "crossed" MVar was already set).

Here is the log of the described scenario:

ssu1: May 07 11:27:05 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=188193 sent=100951
ssu1: May 07 11:27:05 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=87242 sent=65160
ssu1: May 07 11:27:06 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=44503 sent=26171
ssu1: May 07 11:27:06 ssu1 p[6344]: runScheduledAction: exception=ProcessLinkException pid://172.28.128.18:9070:0:485043 DiedNormal remote=172.28.128.19:9070:0 stack=["Network.Transport.TCP.runScheduledAction (src/Network/Transport/TCP.hs:(1946,1)-(1977,38))","Network.Transport.TCP.withScheduledAction.\\ (src/Network/Transport/TCP.hs:1985:34-71)","Network.Transport.TCP.withScheduledAction (src/Network/Transport/TCP.hs:(1983,1)-(1986,94))","Network.Transport.Internal.mapIOException (src/Network/Transport/Internal.hs:149:1-42)","Network.Transport.TCP.apiSend (src/Network/Transport/TCP.hs:(819,1)-(849,49))"]
ssu1: May 07 11:27:07 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=125734 sent=72780
ssu1: May 07 11:27:07 ssu1 p[6344]: handleIncomingMessages.prematureExit: err=user error (recvExact: Socket closed) st=RemoteEndPointValid
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: exception=Network.Socket.ByteString.sendMany: resource vanished (Broken pipe) remote=172.28.128.19:9070:0 stack=["Network.Transport.TCP.runScheduledAction (src/Network/Transport/TCP.hs:(1946,1)-(1977,38))","Network.Transport.TCP.withScheduledAction.\\ (src/Network/Transport/TCP.hs:1985:34-71)","Network.Transport.TCP.withScheduledAction (src/Network/Transport/TCP.hs:(1983,1)-(1986,94))","Network.Transport.TCP.apiSend (src/Network/Transport/TCP.hs:(819,1)-(849,49))","Network.Transport.TCP.apiConnect (src/Network/Transport/TCP.hs:(765,1)-(779,11))"]
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: remoteState=RemoteEndPointFailed: user error (recvExact: Socket closed)
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: exception=Network.Socket.ByteString.sendMany: timeout (Connection timed out) remote=172.28.128.19:9070:0 stack=["Network.Transport.TCP.runScheduledAction (src/Network/Transport/TCP.hs:(1946,1)-(1977,38))","Network.Transport.TCP.withScheduledAction.\\ (src/Network/Transport/TCP.hs:1985:34-71)","Network.Transport.TCP.withScheduledAction (src/Network/Transport/TCP.hs:(1983,1)-(1986,94))","Network.Transport.TCP.apiSend (src/Network/Transport/TCP.hs:(819,1)-(849,49))","Network.Transport.TCP.apiConnect (src/Network/Transport/TCP.hs:(765,1)-(779,11))"]
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: remoteState=RemoteEndPointFailed: Network.Socket.ByteString.sendMany: resource vanished (Broken pipe)
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: exception=Network.Socket.ByteString.sendMany: invalid argument (Bad file descriptor) remote=172.28.128.19:9070:0 stack=["Network.Transport.TCP.runScheduledAction (src/Network/Transport/TCP.hs:(1946,1)-(1977,38))","Network.Transport.TCP.withScheduledAction.\\ (src/Network/Transport/TCP.hs:1985:34-71)","Network.Transport.TCP.withScheduledAction (src/Network/Transport/TCP.hs:(1983,1)-(1986,94))","Network.Transport.TCP.apiSend (src/Network/Transport/TCP.hs:(819,1)-(849,49))","Network.Transport.TCP.apiConnect (src/Network/Transport/TCP.hs:(765,1)-(779,11))"]
ssu1: May 07 11:27:07 ssu1 p[6344]: runScheduledAction: remoteState=RemoteEndPointFailed: Network.Socket.ByteString.sendMany: timeout (Connection timed out)
ssu1: May 07 11:27:07 ssu1 p[6344]: resetIfBroken: remoteEP failed
ssu1: May 07 11:27:07 ssu1 p[6344]: removeRemoteEndPoint: 172.28.128.19:9070:0
ssu1: May 07 11:27:07 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:07 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,True,18)
ssu1: May 07 11:27:07 ssu1 p[6344]: socketToEndPoint: AddrInfo {addrFlags = [AI_ADDRCONFIG,AI_V4MAPPED], addrFamily = AF_INET, addrSocketType = Stream, addrProtocol = 6, addrAddress = 172.28.128.19:9070, addrCanonName = Nothing}
ssu1: May 07 11:27:07 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:07 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:07 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:07 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu5: May 07 11:27:08 ssu5 p[6375]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.18:9070:0,False,2)
ssu5: May 07 11:27:08 ssu5 p[6375]: handleConnectionRequest.go: ConnectionRequestCrossed: 172.28.128.18:60992
ssu5: May 07 11:27:08 ssu5 p[6375]: ProbeSocket: 172.28.128.18:9070:0
ssu5: May 07 11:27:08 ssu5 p[6375]: handleIncomingMessages.prematureExit: err=Network.Socket.recvBuf: invalid argument (Bad file descriptor) st=RemoteEndPointValid
ssu5: May 07 11:27:08 ssu5 p[6375]: resetIfBroken: remoteEP failed
ssu5: May 07 11:27:08 ssu5 p[6375]: removeRemoteEndPoint: 172.28.128.18:9070:0
ssu5: May 07 11:27:08 ssu5 p[6375]: createConnectionTo.go: ourAddr=172.28.128.19:9070:0 theirAddr=172.28.128.18:9070:0
ssu5: May 07 11:27:08 ssu5 p[6375]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.18:9070:0,True,17)
ssu5: May 07 11:27:08 ssu5 p[6375]: socketToEndPoint: AddrInfo {addrFlags = [AI_ADDRCONFIG,AI_V4MAPPED], addrFamily = AF_INET, addrSocketType = Stream, addrProtocol = 6, addrAddress = 172.28.128.18:9070, addrCanonName = Nothing}
ssu5: May 07 11:27:08 ssu5 p[6375]: createConnectionTo.go: ourAddr=172.28.128.19:9070:0 theirAddr=172.28.128.18:9070:0
ssu5: May 07 11:27:08 ssu5 p[6375]: createConnectionTo.go: ourAddr=172.28.128.19:9070:0 theirAddr=172.28.128.18:9070:0
ssu1: May 07 11:27:08 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:08 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:08 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,False,18)
ssu1: May 07 11:27:08 ssu1 p[6344]: handleConnectionRequest.go: ConnectionRequestCrossed: 172.28.128.19:36454
--------------------  connectTimeout is fired !!!
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: removeRemoteEndPoint 172.28.128.19:9070:0
ssu1: May 07 11:27:09 ssu1 p[6344]: removeRemoteEndPoint: 172.28.128.19:9070:0
ssu1: May 07 11:27:09 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,True,19)
ssu1: May 07 11:27:09 ssu1 p[6344]: socketToEndPoint: AddrInfo {addrFlags = [AI_ADDRCONFIG,AI_V4MAPPED], addrFamily = AF_INET, addrSocketType = Stream, addrProtocol = 6, addrAddress = 172.28.128.19:9070, addrCanonName = Nothing}
ssu1: May 07 11:27:09 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,False,19)
ssu1: May 07 11:27:09 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,False,19)
ssu1: May 07 11:27:09 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,False,19)
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: ourAddr=172.28.128.18:9070:0 theirAddr=172.28.128.19:9070:0
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: 172.28.128.19:9070:0 connId=1024
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: 172.28.128.19:9070:0 connId=1025
ssu1: May 07 11:27:09 ssu1 p[6344]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.19:9070:0,False,19)
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: 172.28.128.19:9070:0 connId=1026
ssu1: May 07 11:27:09 ssu1 p[6344]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.19:9070:0
ssu1: May 07 11:27:09 ssu1 p[6344]: createConnectionTo.go: 172.28.128.19:9070:0 connId=1027
ssu1: May 07 11:27:09 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=125734 sent=82536
ssu1: May 07 11:27:09 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=43198 sent=10136
ssu1: May 07 11:27:09 ssu1 p[6344]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.19:9070:0
ssu5: May 07 11:27:09 ssu5 p[6375]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.18:9070:0,True,17)
ssu5: May 07 11:27:09 ssu5 p[6375]: handleConnectionRequest.go: ConnectionRequestAccepted: 172.28.128.18:60994
ssu5: May 07 11:27:09 ssu5 p[6375]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.18:9070:0,False,17)
ssu5: May 07 11:27:09 ssu5 p[6375]: findRemoteEndPoint: (addr, new?, id)=(172.28.128.18:9070:0,False,17)
ssu5: May 07 11:27:09 ssu5 p[6375]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.18:9070:0
ssu5: May 07 11:27:09 ssu5 p[6375]: createConnectionTo.go: 172.28.128.18:9070:0 connId=1024
ssu5: May 07 11:27:09 ssu5 p[6375]: createConnectionTo.go: 172.28.128.18:9070:0 connId=1025
ssu5: May 07 11:27:09 ssu5 p[6375]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.18:9070:0
ssu5: May 07 11:27:09 ssu5 p[6375]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.18:9070:0
ssu5: May 07 11:27:09 ssu5 p[6375]: handleIncomingMessages: got CreatedNewConnection from 172.28.128.18:9070:0
ssu5: May 07 11:27:09 ssu5 p[6375]: Network.Socket.ByteString.sendMany: total=125661 sent=86880
ssu1: May 07 11:27:11 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=188257 sent=86880
ssu1: May 07 11:27:11 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=188193 sent=153488
ssu1: May 07 11:27:11 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=188257 sent=157832
ssu1: May 07 11:27:14 ssu1 p[6344]: Network.Socket.ByteString.sendMany: total=273038 sent=218648

In this case, the connection timeout (connectTimeout) was 2 seconds.

Is this a bug or it's how it is supposed to work?

Originally posted by @andriytk in haskell-distributed/network-transport-tcp#74 (comment)

@andriytk
Copy link
Contributor Author

andriytk commented May 8, 2019

A sequence diagram of how the crossed connection is handled usually to facilitate the problem understanding:
image

@LaurentRDC LaurentRDC transferred this issue from haskell-distributed/network-transport-tcp Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants