Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implementing gracefulClose. #417

Merged
merged 13 commits into from
Aug 28, 2019
Merged

Conversation

kazu-yamamoto
Copy link
Collaborator

@kazu-yamamoto kazu-yamamoto commented Jul 2, 2019

Suppose that we have an HTTP/2 server. When the server tries to close the connection, it sends a GOAWAY frame. And suppose that the server closes the socket immediately. Requests from the client may reach to the server after the socket is gone. In this case, TCP RST is sent from the server to the client. The client misunderstands that this communication failed.

To close sockets gracefully, we should do:

  • shutdown(2) TX to send TCP FIN
  • recv(2) to check if TCP FIN has been arrived from the peer. This operation should be timed out.
  • Both in normal and error cases, deallocate the socket with close(2)

To implement this, many things are missing in the network library:

  • Actual implementation of getSocketOption and setSocketOption for RecvTimeout
  • Setting sockets back to blocking. This is necessary because that both non-blocking sockets without timeout and blocking sockets with timeout return EWOULDBLOCK.
  • Receiving data from blocking sockets without triggering the IO manager.

This PR implements all above. To keep the signatures of getSocketOption and setSocketOption, I took the Windows way. That is, the argument is Int in milliseconds.

@kazu-yamamoto kazu-yamamoto requested review from eborden and Mistuke July 2, 2019 06:46
@kazu-yamamoto
Copy link
Collaborator Author

@Mistuke I implemented some stuffs for Windows but they are not perfect. Would you give look and consider how we can implement?

@kazu-yamamoto
Copy link
Collaborator Author

I would love to implement SendTimeout if necessary.

@lpeterse
Copy link

lpeterse commented Jul 2, 2019

Hi,

I'm curious about this one as this is a general problem and applies to more or less all TCP-based protocols. I just want to make sure my current understanding of the topic is correct or maybe I'd learn something here.

In the scenario you describe, the server sends GOAWAY and then immediately closes the socket (2 subsequent syscalls). Depending on how SO_LINGER is configured, the server's TCP stack transmits the GOAWAY data and then the socket goes into FIN_WAIT/TIME_WAIT/CLOSED state. The server is certainly no longer interested in client data as it initiated the active close.

The client may or may not receive the GOAWAY data.

  • In case it does, it knows this is the termination of the HTTP conversation from the server side. It may try to send more data from its own side and then send a GOAWAY as well and close the socket. If the client's TCP stack ACKs the GOAWAY and receives a RST (as you described) it would IMHO be perfectly fine to be notified about the broken connection as it's no longer of any use. It the server was meant to be receiving outstanding data from the client it shouldn't have closed the connection.
  • In case it does not, it will encounter a recv call returning 0 meaning there's no more data to come. I think in case of HTTP2 it may assume that the connection has been closed abnormally.

Endpoints SHOULD always send a GOAWAY frame before closing a connection so that the remote peer can know whether a stream has been partially processed or not. For example, if an HTTP client sends a POST at the same time that a server closes a connection, the client cannot know if the server started to process that POST request if the server does not send a GOAWAY frame to indicate what streams it might have acted on. An endpoint might choose to close a connection without sending a GOAWAY for misbehaving peers.

My question is: What improvement would it yield to additionally use the stream termination signalling on the TCP level? My suspicion is that the GOAWAY/GOAWAY mechanism has been introduced to overcome exactly this shortcoming of most TCP implementations (not all TCP stacks support or correctly implement shutdown).

@kazu-yamamoto
Copy link
Collaborator Author

kazu-yamamoto commented Jul 2, 2019

@lpeterse I think that you are confused.
The good termination is like this:

  1. S -> C: HTTP/2 GOAWAY (send)
  2. C -> S: TCP ACK
  3. S -> C: TCP FIN (shutdown TX)
  4. C -> S : TCP FIN (recv returns 0)
  5. S: close

What I tries to solve is that the current common pattern where close is called between 1. and 2.

  1. S -> C: HTTP/GOAWAY (send) and close
  2. C -> S: TCP ACK
  3. S -> C: TCP RST

We can use shutdown TX instead of close but this solves only normal cases. What about the case where TCP FIN does not come back from the client? We need to call close for error cases when shutdown fails or recv fails/reaches timeout.

In my understanding, the combination of LINGER and close does not help. You should read "UNIX network programming" to understand this.

@Mistuke
Copy link
Collaborator

Mistuke commented Jul 4, 2019

@kazu-yamamoto I'm currently on holiday and traveling for the next 3 weeks and didn't take a laptop with me. But I'll take a look sometime tomorrow.

@eborden
Copy link
Collaborator

eborden commented Jul 10, 2019

Sorry I've been on vacation, I'll get to reviewing this soon.

@kazu-yamamoto
Copy link
Collaborator Author

It appeared that gracefulShutdown stops everything when waiting for FIN since it is a blocking socket. I need to consider workaround.

@kazu-yamamoto
Copy link
Collaborator Author

kazu-yamamoto commented Jul 11, 2019

The following code works as expected:

gracefulClose :: Socket -> Int -> IO ()
gracefulClose s tm = (sendRecvFIN `E.finally` close s) `E.catch` ignore
  where
    sendRecvFIN = do
        -- Sending TCP FIN.
        shutdown s ShutdownSend
        -- Waiting TCP FIN.
        void $ timeout (tm * 1000) $ recv s 1024
        return ()

    ignore (E.SomeException _e) = return ()

But I don't want to use timeout. I'm now thinking of another approach based on threadDelay.

@kazu-yamamoto
Copy link
Collaborator Author

@eborden I re-implemented gracefulClose. I think this does not use extra resources such as Haskell thread (timeout) or native thread (safe FFI call). Would you please review this version?

@Mistuke Would you suggest how to implement recvBufNoWait for Windows?

@Mistuke
Copy link
Collaborator

Mistuke commented Jul 18, 2019

@kazu-yamamoto sorry for the late reply. I'm currently on vacation somewhere up a mountain and have no laptop.

I'll respond more in detail when I'm back this weekend.

But this is a bit tricky to implement using the non-async interface. On Windows the blocking mode is determined by how the handle was opened. So in order to get recv to not block and also block on the same handle then in recvBufNoWait you have two options (IOCP would have been able to handle this easily)

You can 1) temporarily switch the handle into non-blocking mode using

u_long mode = 1;  // 1 to enable non-blocking socket
ioctlsocket(sock, FIONBIO, &mode);

Or 2) you can first peek the socket to see if any data is available and only call recv after you're sure it won't block since data is available using

unsigned long l;
ioctlsocket(s, FIONREAD, &l);

The problem here though is that this call returning a 0 does not distinguish between no data being available and the socket being closed.

Documentation for both of these can be found here https://docs.microsoft.com/en-us/windows/win32/winsock/winsock-ioctls

Probably switching the blocking mode is the easiest way. But it may give unexpected result if multiple functions access the same socket at the same time.

@kazu-yamamoto
Copy link
Collaborator Author

@Mistuke Current implementation keeps non-blocking, not switch to blocking.

@Mistuke
Copy link
Collaborator

Mistuke commented Jul 22, 2019

@kazu-yamamoto I don't quite follow. It's non-blocking on Linux yes since recv won't block. But on windows there's no guarantee for this as network doesn't open the handle and GHC never opens handles in non-blocking. Or am I missing something?

@kazu-yamamoto
Copy link
Collaborator Author

@Mistuke Sorry. I'm not familiar with Windows. My comment above would not make sense.

@Mistuke
Copy link
Collaborator

Mistuke commented Jul 22, 2019

So the only problem I can see with converting the handle temporarily to non-blocking is that in a threaded program this may interfere with a blocking send if they're called at the same time.

So I think FIONREAD approach is the best option. Do you have a way for me to test if it is working as expected @kazu-yamamoto? I'll be back this weekend and can take a stab at it.

@kazu-yamamoto
Copy link
Collaborator Author

  • Server
  1. cabal install http2
  2. Run the following code:
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Concurrent (forkFinally)
import Control.Exception (SomeException(..))
import qualified Control.Exception as E
import Control.Monad (forever, void)
import Data.ByteString.Builder (byteString)
import Network.HTTP.Types (ok200)
import Network.HTTP2.Server
import Network.Socket

main :: IO ()
main = runTCPServer "80" $ \s _peer -> runHTTP2Server s
  where
    runHTTP2Server s = E.bracket (allocSimpleConfig s 4096)
                                 (\config -> run config server)
                                 freeSimpleConfig
    server _req _aux sendResponse = sendResponse response []
      where
        response = responseBuilder ok200 header body
        header = [("Content-Type", "text/plain")]
        body = byteString "Hello, world!\n"

runTCPServer :: String -> (Socket -> SockAddr -> IO a) -> IO a
runTCPServer port server = withSocketsDo $ do
    addr <- resolve
    E.bracket (open addr) close loop
  where
    resolve = do
        let hints = defaultHints {
                addrFlags = [AI_PASSIVE]
              , addrSocketType = Stream
              }
        head <$> getAddrInfo (Just hints) Nothing (Just port)
    open addr = do
        sock <- socket (addrFamily addr) (addrSocketType addr) (addrProtocol addr)
        setSocketOption sock ReuseAddr 1
        withFdSocket sock $ setCloseOnExecIfNeeded
        bind sock $ addrAddress addr
        listen sock 1024
        return sock
    -- for "close"
    loop sock = forever $ do
        (conn, peer) <- accept sock
        void $ forkFinally (server conn peer) (clear conn)
    clear conn _ = close conn `E.catch` ignore
      where
        ignore (SomeException _) = return ()
    -- for "gracefulClose"
{-
    loop sock = forever $ do
        (conn, peer) <- accept sock
        void $ forkFinally (server conn peer) (const $ gracefulClose conn 5000)
-}
  • Client
  1. Download h2spec
  2. Type h2spec to test localhost:80
  3. You can see some errors relating to GOAWAY.
  • After gracefulClose is implemented
  1. Modify runTCPServer according to the comments
  2. Do the same thing. No failure should happen.

@kazu-yamamoto
Copy link
Collaborator Author

@Mistuke Any progress?

@Mistuke
Copy link
Collaborator

Mistuke commented Jul 31, 2019 via email

@Mistuke
Copy link
Collaborator

Mistuke commented Aug 3, 2019

I have a patch https://gist.github.com/Mistuke/1a69e57cd17c59f3e809274500cc5ec0 but have so far been unable to compile the example. having some cabal issues. Since I'm only after seeing if recvBufNoWait works correctly I can probably just use a testcase that only uses network no?

@kazu-yamamoto
Copy link
Collaborator Author

OK. I will try to make a test case only with network.

@Mistuke
Copy link
Collaborator

Mistuke commented Aug 4, 2019

So how do you compile the other testcase? Do you just install network globally? I was trying to use cabal v2-build and v2-repl -b http2 which definitely builds http2 but it's not available in the repl. I then tried sandboxes which gave a similar error. I can just install it globally if that is how you do it

@kazu-yamamoto
Copy link
Collaborator Author

I install everything globally. I don't use repl.

@kazu-yamamoto
Copy link
Collaborator Author

@Mistuke I added your patch. Since git am does not handle your gistfile, I just left your credit as comment. If you don't like please let me know.

Anyway, I also added a test case. Let's wait for AppVeyor.

@kazu-yamamoto
Copy link
Collaborator Author

@Mistuke AppVeyor fails due to the new test case. Please check it out!

@Mistuke
Copy link
Collaborator

Mistuke commented Aug 10, 2019

diff --git a/Network/Socket/Buffer.hsc b/Network/Socket/Buffer.hsc
index 0f71ccd..36ad50b 100644
--- a/Network/Socket/Buffer.hsc
+++ b/Network/Socket/Buffer.hsc
@@ -147,22 +147,20 @@ recvBufNoWait s ptr nbytes = withFdSocket s $ \fd -> do
     alloca $ \ptr_bytes -> do
       res <- c_ioctlsocket fd #{const FIONREAD} ptr_bytes
       avail <- peek ptr_bytes
-      r <- if res == #{const NO_ERROR} then
+      r <- if res == #{const NO_ERROR} && avail > 0 then
                c_recv fd (castPtr ptr) (fromIntegral nbytes) 0{-flags-}
            else if avail == 0 then
                -- Socket would block, could also mean socket is closed but
                -- can't distinguish
                return (-1)
-           else return $ fromIntegral avail
-      if r >= 0 || avail == 0 then
-        return $ fromIntegral r
-        else do
-          err <- c_WSAGetLastError
-          if err == #{const WSAEWOULDBLOCK}
-             || err == #{const WSAEINPROGRESS} then
-              return (-1)
-            else
-              return (-2)
+           else do err <- c_WSAGetLastError
+                   if err == #{const WSAEWOULDBLOCK}
+                       || err == #{const WSAEINPROGRESS} then
+                       return (-1)
+                     else
+                        return (-2)
+      return $ fromIntegral r
+
 #else
     r <- c_recv fd (castPtr ptr) (fromIntegral nbytes) 0{-flags-}
     if r >= 0 then

that should do it. Sorry for the long delay! I'm a bit swamped with work these days.

@kazu-yamamoto
Copy link
Collaborator Author

@Mistuke I works well! Thanks!

@kazu-yamamoto
Copy link
Collaborator Author

@eborden Would you review and merge this PR?

@kazu-yamamoto
Copy link
Collaborator Author

@nh2 May I ask you to review this PR, too?

@nh2
Copy link
Member

nh2 commented Aug 13, 2019

Yes, I will take a look.

@kazu-yamamoto
Copy link
Collaborator Author

Gentle ping @eborden @nh2

clock = 200
-- shutdown sometime returns ENOTCONN.
-- Probably, we don't want to log this error.
ignore (E.SomeException _e) = return ()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd advise using E.IOException instead to avoid ignoring async exceptions. Especially given the usage of threadDelay, it's reasonable that an external thread may end up trying to kill this operation.

Copy link
Member

@nh2 nh2 Aug 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Here you're ignoring all exceptions, not only ENOTCONN; we'd have to check errno for that.
  • While it may be acceptable that close ignores exceptions (see
    -- | Close the socket. This function does not throw exceptions even if
    -- the underlying system call returns errors.
    --
    -- If multiple threads use the same socket and one uses 'fdSocket' and
    -- the other use 'close', unexpected behavior may happen.
    -- For more information, please refer to the documentation of 'fdSocket'.
    close :: Socket -> IO ()
    ), I don't think it's OK here, because you're catching this not only over close s but also sendRecvFIN. I don't think we should ingore exceptions of sendRecvFIN. It calls shutdown, which has proper errno raising behaviour, and we should re-raise those exceptions. At least all the ones that aren't ENOTCONN, but even in that case I'm not sure why it's legitimate to not throw on ENOTCONN; the comment probably needs expansion. The user could also call accidentally call gracefulClose on an unconnected socket; wouldn't we want to tell them that they did something wrong by raising the ENOTCONN?
  • Somewhat unrelatedly, @snoyberg and I think that even close should probably not silently discard EBADF as it currently does, because that's a user error. (The other errnos leave the socket in undefined state so there it's probably more OK.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snoyberg You are right. As a quick response, I changed it to IOExeption. But I should consider @nh2's comments later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nh2 I will consider error handling for gracefulClose more. Error handling for other APIs should be discussed in #151.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nh2 It seems to me that my experience was based on the previous wrong implementation using unsafe calls. Using the current implementation, I don't need to ignore ENOTCONN anymore since it does not occur in tests. So, let's remove ignore.

r <- recvBufNoWait s buf bufSize
let delay' = delay + clock
when (r == -1 && delay' < tm) $ do
threadDelay (clock * 1000)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of looping with a threadDelay, what if we had an MVar which can be filled with either a MoreData or TimeoutTripped data constructor. We use registerTimeout to set a timeout to putMVar the TimeoutTripped constructor, and registerFd to putMVar the MoreData. Then we don't need an arbitrary delay length and avoid any kind of arbitrary delays or busy-looping.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snoyberg Does 5ee9828 implement what you want?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mistuke GHC for Windows does not provide GHC.Event. Should we use the previous busy-loop implementation for Windows?

-- application which can read it. So, let's stop receiving
-- to prevent attacks.
r <- recvBufNoWait s buf bufSize
let delay' = delay + clock
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you're counting the time you asked to sleep, not the time that was actually slept (which can be a lot more when the system is busy).

I think if we implement loop-based sleeping (which we may not have to do based on @snoyberg's comment), we should re-check the time that has passed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nh2 We have to use threadDelay on WIndows due to IO manager's limitation. Though your point is absolutely right, network should not depends on time nor hourglass unfortunately. Do you have any ideas to check the time using base only?

@Mistuke
Copy link
Collaborator

Mistuke commented Aug 22, 2019

@kazu-yamamoto yes that is the only way to get it to work for now. Ev.registerFd is never going to work on Windows since it's I/O is not polling based.

@kazu-yamamoto
Copy link
Collaborator Author

I rescued Windows and GHC 7.8 on Unix. CIs are now green.

@kazu-yamamoto
Copy link
Collaborator Author

If base does not provide functionality to check time, I think that this PR is ready for merging.

kazu-yamamoto added a commit to kazu-yamamoto/network that referenced this pull request Aug 28, 2019
@kazu-yamamoto kazu-yamamoto merged commit 2f62dd3 into haskell:master Aug 28, 2019
@kazu-yamamoto kazu-yamamoto deleted the close branch August 28, 2019 03:34
@kazu-yamamoto
Copy link
Collaborator Author

Merged. Thank you all for reviewing!

@kazu-yamamoto
Copy link
Collaborator Author

FYI:
https://kazu-yamamoto.hatenablog.jp/entry/2019/09/20/165939

fisx added a commit to wireapp/wire-server that referenced this pull request Oct 1, 2019
…!)"

This reverts commit a3bbd89.

The approach is not viable because of snoyberg/http-client#394
The fact that we don't have haskell/network#417 in our
version of the network package probably doesn't help either.
fisx added a commit to wireapp/wire-server that referenced this pull request Oct 1, 2019
…!)"

This reverts commit a3bbd89.

The approach is not viable because of snoyberg/http-client#394
The fact that we don't have haskell/network#417 in our
version of the network package probably doesn't help either.
fisx added a commit to wireapp/wire-server that referenced this pull request Oct 2, 2019
…!)"

This reverts commit a3bbd89.

The approach is not viable because of snoyberg/http-client#394
The fact that we don't have haskell/network#417 in our
version of the network package probably doesn't help either.
fisx added a commit to wireapp/wire-server that referenced this pull request Oct 15, 2019
…!)"

This reverts commit a3bbd89.

The approach is not viable because of snoyberg/http-client#394
The fact that we don't have haskell/network#417 in our
version of the network package probably doesn't help either.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants