[BUG] TCP sessions not retried when closing #3544

Open
spacetourist opened this issue Dec 19, 2024 · 0 comments

OpenSIPS version you are running

version: opensips 3.4.8 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: 57ffb3b7a
main.c compiled on 00:00:00 Aug 21 2024 with gcc 11

Describe the bug
Every day my OpenSIPS instances see a few calls fail because an async HTTP request times out. The error is as follows:

ERROR:rest_client:_resume_async_http_req: async GET timed out

I managed to capture this happening and can see that OpenSIPS reuses a connection for a few requests within a second, while the remote end (HAProxy) times out after 250 ms and resets the connection. OpenSIPS is racing against this timeout and occasionally loses, as shown in the screen capture:

[image: screen capture of the packet trace]

You can see the FIN on the session just one microsecond (1/1000000 s) before the send operation.

To Reproduce
I'm not sure how to reproduce this. You would probably need a similar environment, with the server keeping the connection alive for 250 ms.

Expected behavior
Ideally OpenSIPS would detect this scenario and retry immediately and silently. At present it seems to wait for the entire request timeout (configured as 5 s) before the async call returns and reports the error.
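
To illustrate the kind of behaviour I mean, here is a minimal Python sketch of "retry once if a reused keep-alive connection turns out to be stale". It is not how rest_client is implemented, and every name in it (`get_with_stale_retry`, `conn_factory`, `STALE_ERRORS`) is hypothetical. It assumes the request is an idempotent GET, so sending it a second time is safe, and that the stale connection surfaces as a connection error rather than as silence.

```python
import http.client

# Errors that typically mean the peer closed an idle keep-alive connection
# just before (or while) we tried to reuse it.
STALE_ERRORS = (
    http.client.RemoteDisconnected,
    ConnectionResetError,
    BrokenPipeError,
)


def get_with_stale_retry(conn_factory, path, conn=None):
    """Issue a GET on `conn`, or on a new connection if `conn` is None.

    If a *reused* connection fails with a stale-connection error, close it
    and retry exactly once on a fresh connection. A failure on a brand-new
    connection is treated as a real error and re-raised.

    Returns (connection, status, body) so the caller can keep reusing
    the connection that actually worked.
    """
    reused = conn is not None
    if conn is None:
        conn = conn_factory()
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return conn, resp.status, resp.read()
    except STALE_ERRORS:
        conn.close()
        if not reused:
            raise  # fresh connection failed: do not mask a genuine outage
        # Immediate, silent retry on a new connection.
        conn = conn_factory()
        conn.request("GET", path)
        resp = conn.getresponse()
        return conn, resp.status, resp.read()
```

With the standard library, `conn_factory` could be something like `lambda: http.client.HTTPConnection("example.invalid", timeout=5)` (placeholder host). The point of the sketch is only that the failure is detected at send time and retried straight away, instead of waiting out the full request timeout.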

Relevant System Logs
Here is the 5 s timeout as shown in the xlog output:

2024-12-19T07:59:49.746085+00:00 opensips[1159185]: NOTICE:INVITE[1] (NEW CALL)
2024-12-19T07:59:54.479360+00:00 opensips[1159185]: ERROR:rest_client:_resume_async_http_req: async GET timed out
2024-12-19T07:59:54.479870+00:00 opensips[1159185]: ERROR:INVITE[1] [ASYNC_RESUME] data transmission timeout - return 504.

I also have a PCAP if required.

OS/environment information
AlmaLinux 9

Additional context
I can tune the server timeouts etc. if that would minimise or remove the issue, but I am keen to hear whether anything can be done on the OpenSIPS side to address the issue at source.
