-
-
Notifications
You must be signed in to change notification settings - Fork 495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concurrent calls to nng_dial_start() for a "tls+tcp://" REQ socket are returning NNG_ECONNSHUT on Windows #1668
Comments
Here's a self-contained test for vs2022, with the output of one run on my machine. I highlighted the case where dialing a tls+tcp listener from multiple REQ sockets on different threads failed. Most of the time, all 4 dials fail, but sometimes one works. The prebuilt libs are this version of NNG: And mbed SHA: 981743d nng_version() returns "1.6.0pre", mbedtls_version_get_string "2.28.3". |
What do you mean by that? Do you mean Pair1 polyamorous mode? Because normal pair1 mode is just for 1 to 1 connections. |
Sorry, yes, I think I had enabled polyamorous mode. You can ignore that comment, it was just another configuration I tried to see if a multithreaded TLS client would fail for multiple protocols. |
I am also having the same issue with pair1 polyamorous mode thats why I was surprised with you reporting that it works. |
Ah, I see. Sorry for the poor detail on the PAIR1 experiment - I'm not 100% sure how I configured it. |
The peer has to be polyamorous if you're dialing multiple times. Generally it's not a good idea to keep redialing like this -- if you want to be resilient against a server that isn't listening, us NNG_FLAG_NONBLOCK with your dialing -- it will keep retrying in the background (but you don't get a synchronous notification of the connection) I'll take a look at your test code. |
The example is REQ/REP. I don't see anything fundamentally wrong. Perhaps there is something weird going on in mbedTLS. |
So one thing that may matter is peer authentication options for TLS+TCP. I haven't looked at your specific certs and options, but that can make a big difference. Having said that, I see you're using NNG_OPT_TLS_AUTH_MODE_NONE, so none of that should matter. |
Yep, this issue is specifically about req/rep. Never mind the pair1 detail. |
I still haven't had a chance to look at this. Can you tell me, are you using the current main/tip from NNG, and what version of mbedTLS are you using? |
I have the same issue and these are my environment details
|
SO I'm starting to think that there is something different in the mbedTLS 3.0 API -- perhaps some part of it that I expect to be concurrent is not. |
Btw, using mbedTLS 2.4.0 works fine. |
Okay good info, I'll put 2.4 through our testing. |
@jritts can you adjust your application and see if it still is an issue? I'm thinking we had conflated several issues here, and the 50 msec timeouts might have been conflating the picture somewhat. |
I suspect mbedTLS 2.28 (which is the LTS release of it) probably works fine too. That is the latest supported LTS version I think. Version 3.x of mbedTLS (there are many) may not be TLS... and I need to explore it further. |
Will double check but I do think this is a distinct issue - I uploaded a standalone VS project in an earlier comment which had reproduced it for me. We only see it in windows. I haven't yet tested with the newer version of mbedtls - hope to do so later today. |
I suspect that the root cause here might be also a bogus timer handing that basically resulted in premature wakeups of the dialer. I just discovered this today. If you're in a position to try out the main branch (which is now in heavy development, so YMMV) please do so. |
I am opening multiple REQ sockets from different threads to the same REP socket, and observing nng_dial_start() returning NNG_ECONNSHUT. This is on Windows.
Strangely, nng_dial_start() only returns that error most of the time; once in a while, the tls+tcp connection succeeds.
If I switch both nodes to "tcp://" (and bypass the TLS config calls), dialing always succeeds.
With the identical REP code, but simply switching the listener to the wss:// schema, browser websocket connections always succeed.
No such issues with tls+tcp PAIR1 connections.
The text was updated successfully, but these errors were encountered: