
Conversation


@sandersaares commented Jul 22, 2025

Test to accompany bug report #853.

@seanmonstar
Member

Thanks for providing a test case! What you describe in the issue sounds plausible, but I notice that in this test flow control isn't touched, so neither side will ever grant any more connection window space, so by design it definitely won't be able to make further progress.

```rust
        );
    }

    thread::sleep(CHECK_FOR_PROGRESS_INTERVAL);
```
@seanmonstar
Member

You'd also want to use something like tokio::time::sleep(CHECK_FOR_PROGRESS_INTERVAL).await to not block any of the other tasks on this thread.

@sandersaares
Author

Good catch - updated.

@sandersaares
Author

sandersaares commented Jul 22, 2025

> What you describe in the issue sounds plausible, but I notice that in this test flow control isn't touched, so neither side will ever grant any more connection window space, so by design it definitely won't be able to make further progress.

It is not clear what you mean by this. Why would flow control need to be touched? The connection has a non-zero window (before it gets allocated to pending_open requests that could not possibly use it, based on the theory above), so all the requests should be transmitted through that window. Am I misunderstanding something?

@seanmonstar
Member

In HTTP/2, there's a window on each stream and on the connection as a whole, and the default is fairly small: just 64 KB. Flow control management with h2 is manual. Without the other side granting more window on the connection, no stream will be allowed to send more data.

hyper will manage it for you automatically... and perhaps you do in your larger app. But it is a reason why this specific test will hang. With 10 KB bodies, it should stop around the 7th request (-ish, since prioritization/ordering of DATA frames is not specified).
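[Editor's note: a stdlib-only arithmetic sketch of the claim above, not using h2 itself. It assumes HTTP/2's default 65,535-byte initial connection window, 10 KiB (10 * 1024 byte) bodies, and that no WINDOW_UPDATE frames are ever granted; the function name is hypothetical.]

```rust
// How many requests fit through the connection window before the
// transfer stalls, if the window is never replenished. The stalled
// request is the first one whose body cannot be sent in full.
fn stalled_request(window: u64, body_len: u64) -> u64 {
    window / body_len + 1
}

fn main() {
    let default_window = 65_535; // HTTP/2 default initial window (RFC 9113)
    let body = 10 * 1024;        // 10 KiB bodies, as in the test case
    // 6 full bodies (61,440 bytes) fit; the 7th can only send 4,095 bytes.
    assert_eq!(stalled_request(default_window, body), 7);
    println!("transfer stalls on request {}", stalled_request(default_window, body));
}
```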

@sandersaares
Author

If this test were failing because the connection window is not being extended, it would be failing also with CONCURRENCY = 1, wouldn't it? As configured in the PR, each increment of CONCURRENCY translates to 100 requests of 10 KB, so 1 MB of data, which would exceed a 64KB window.

Yet, with lower values of CONCURRENCY the test appears to pass just fine. Even 10 000 requests go through just fine with CONCURRENCY=1.

This suggests to me that the problem is not the lack of connection window extension.

@seanmonstar
Member

OK, I've debugged a bit more, and was reminded that h2 will automatically release the connection window that a stream has used once all references to that particular stream are dropped, which is why WINDOW_UPDATE frames with stream-id=0 are being sent. That clears that issue up.

Fiddling with the numbers in your test case, and adding a few more printlns, it seems there is a correlation between streams being refused and eventually the connection window not having capacity. I'll need to investigate that a little more.
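[Editor's note: the auto-release behavior described above can be sketched as a stdlib-only toy model. This is not h2's real API; `Conn`, `send_body`, and `drop_stream` are hypothetical names used for illustration.]

```rust
// On the wire, returning a stream's consumed bytes to the connection
// window shows up as a WINDOW_UPDATE frame with stream-id 0.
struct WindowUpdate {
    stream_id: u32,
    increment: u64,
}

struct Conn {
    window: u64, // remaining connection-level send window
}

impl Conn {
    // Sending a body consumes connection window, capped at what's left.
    fn send_body(&mut self, len: u64) -> u64 {
        let sent = len.min(self.window);
        self.window -= sent;
        sent
    }

    // Dropping the last reference to a stream hands its consumed bytes
    // back to the connection, emitting a connection-level WINDOW_UPDATE.
    fn drop_stream(&mut self, consumed: u64) -> WindowUpdate {
        self.window += consumed;
        WindowUpdate { stream_id: 0, increment: consumed }
    }
}

fn main() {
    let mut conn = Conn { window: 65_535 };
    let sent = conn.send_body(10 * 1024);
    assert_eq!(sent, 10 * 1024);
    let upd = conn.drop_stream(sent);
    assert_eq!(upd.stream_id, 0);    // connection-level frame
    assert_eq!(conn.window, 65_535); // capacity fully restored
    println!("WINDOW_UPDATE(stream_id=0, increment={})", upd.increment);
}
```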

@benjaminp

The lack of request_capacity() calls in this test case means it relies entirely on the stream-dropping auto-release to keep the connection window open. From the client's perspective, when CONCURRENCY is sufficiently large, it's possible that all open streams have partially transmitted request bodies and all local-half-closed streams have partially transmitted response bodies. Neither side can make progress, because the window only changes when a stream completes. I think this state could explain the deadlock?
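[Editor's note: a stdlib-only toy model of this hypothesis. The even split of the connection window across in-flight streams is an illustrative assumption, since DATA frame scheduling is unspecified, but it shows why the test could pass at CONCURRENCY=1 yet hang at higher concurrency.]

```rust
// If window capacity is returned only when a stream completes, and the
// window is split evenly across `concurrency` partially sent bodies,
// then no stream ever finishes once each share is smaller than a body.
fn deadlocks(window: u64, body_len: u64, concurrency: u64) -> bool {
    window / concurrency < body_len
}

fn main() {
    let window = 65_535;  // HTTP/2 default initial connection window
    let body = 10 * 1024; // 10 KiB bodies, as in the test case
    assert!(!deadlocks(window, body, 1)); // CONCURRENCY=1: each body fits, completes, releases
    assert!(!deadlocks(window, body, 6)); // shares of ~10,922 bytes still fit a body
    assert!(deadlocks(window, body, 7));  // shares of ~9,362 bytes: all bodies stuck partial
    println!("deadlock threshold reached at concurrency 7");
}
```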

@sandersaares
Author

sandersaares commented Aug 27, 2025

Sounds plausible. Forgive my ignorance, but where in the Hyper stack does this "make sure you reserve enough connection window before you try sending something" logic go? In our real scenario, where we found this behavior, we were just using Hyper as the higher layer above h2, and I posted this as an h2 bug simply to try to minimize the repro.

Perhaps that was a mistake and I removed important machinery from the picture in my minimization attempt. If we take the full picture of a Hyper-based HTTP/2 client, where would such logic go? I was under the assumption that Hyper would itself manage the different requests on the same connection. Is that assumption false? Or should we be looking at this as a Hyper bug, whereby it is failing to do that?

@seanmonstar
Member

hyper does manage capacity for you automatically, yes. When I investigated some last month, I do believe there's a bug in h2 that is missing either a window update or release or something.

But my current priorities have me looking into other things at the moment.
