[BUG] Memory leak in TLS connection ERROR when HEP is enabled for TLS #3496

Open
zooptwopointone opened this issue Oct 16, 2024 · 15 comments
@zooptwopointone

OpenSIPS version you are running

flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: 78cdea979
main.c compiled on  with gcc 12

Describe the bug
There appears to be a PKG memory leak when TLS connections fail while HEP tracing is enabled for TLS.

To Reproduce

Using wolfSSL:

modparam("tls_mgm", "server_domain", "REDACTED")
modparam("tls_mgm", "certificate", "[cert1]/etc/opensips/tls/cert1.pem")
modparam("tls_mgm", "private_key", "[cert1]/etc/opensips/tls/cert1.key")
modparam("tls_mgm", "tls_method", "[cert1]TLSv1_2")
modparam("tls_mgm", "ca_dir", "[cert1]/etc/ssl/certs")
modparam("tls_mgm", "require_cert", "[cert1]0")
modparam("tls_mgm", "verify_cert", "[cert1]0")
modparam("tls_mgm", "client_domain", "client1")
modparam("tls_mgm", "certificate", "[client1]/etc/opensips/tls/cert1.pem")
modparam("tls_mgm", "private_key", "[client1]/etc/opensips/tls/cert1.key")
modparam("tls_mgm", "tls_method", "[client1]TLSv1_2")
modparam("tls_mgm", "ca_dir", "[client1]/etc/ssl/certs")
modparam("tls_mgm", "require_cert", "[client1]0")
modparam("tls_mgm", "verify_cert", "[client1]0")
modparam("proto_tls", "tls_max_msg_chunks", 8)

loadmodule "proto_hep.so"
loadmodule "tracer.so"
modparam("proto_hep", "hep_id", "[hep_dst]1.1.1.1:9060;transport=udp;version=3")
modparam("proto_hep", "homer5_on", 1)
modparam("proto_hep", "hep_capture_id", 2870)
modparam("tracer", "trace_on", 1)
modparam("tracer", "trace_id", "[tid]uri=hep:hep_dst")
modparam("proto_tls", "trace_on", 1)
modparam("proto_tls", "trace_destination", "hep_dst")

Run the following loop, which opens a TCP connection to the TLS port and sends only zero bytes from /dev/zero instead of a TLS handshake, so that the TLS negotiation fails and the connection is closed.

while true ; do  nc SIPSERVER 5061 >/dev/null < /dev/zero; done

This eventually results in the following error messages, and it also starts to cause problems with real calls.

2024-10-16T20:24:56.256222+00:00 ID3270LA /usr/sbin/opensips[2500793]: WARNING:core:fm_malloc: not enough contiguous free pkg memory (8 bytes left, need 168), attempting defragmentation... please increase the "-M" command line parameter!
2024-10-16T20:24:56.256222+00:00 ID3270LA /usr/sbin/opensips[2500793]: WARNING:core:fm_malloc: not enough contiguous free pkg memory (8 bytes left, need 168), attempting defragmentation... please increase the "-M" command line parameter!
2024-10-16T20:24:56.256331+00:00 ID3270LA /usr/sbin/opensips[2500793]: ERROR:core:fm_malloc: not enough free pkg memory (8 bytes left, need 168), please increase the "-M" command line parameter!
2024-10-16T20:24:56.256331+00:00 ID3270LA /usr/sbin/opensips[2500793]: ERROR:core:fm_malloc: not enough free pkg memory (8 bytes left, need 168), please increase the "-M" command line parameter!
2024-10-16T20:24:56.256429+00:00 ID3270LA /usr/sbin/opensips[2500793]: ERROR:proto_hep:create_hep3_message: no more pkg mem!
2024-10-16T20:24:56.256429+00:00 ID3270LA /usr/sbin/opensips[2500793]: ERROR:proto_hep:create_hep3_message: no more pkg mem!
2024-10-16T20:24:56.256525+00:00 ID3270LA /usr/sbin/opensips[2500793]: ERROR:proto_hep:add_hep_chunk: invalid call! bad input params!
2024-10-16T20:24:56.256525+00:00 ID3270LA /usr/sbin/opensips[2500793]: ERROR:proto_hep:add_hep_chunk: invalid call! bad input params!
2024-10-16T20:24:56.256621+00:00 ID3270LA /usr/sbin/opensips[2500793]: ERROR:core:create_trace_message: failed to add correlation id! aborting trace...!
2024-10-16T20:24:56.256621+00:00 ID3270LA /usr/sbin/opensips[2500793]: ERROR:core:create_trace_message: failed to add correlation id! aborting trace...!
2024-10-16T20:24:56.256716+00:00 ID3270LA /usr/sbin/opensips[2500793]: ERROR:core:trace_message_atonce: failed to create the message!
2024-10-16T20:24:56.256716+00:00 ID3270LA /usr/sbin/opensips[2500793]: ERROR:core:trace_message_atonce: failed to create the message!

Commenting out

modparam("proto_tls", "trace_on", 1)
modparam("proto_tls", "trace_destination", "hep_dst")

and running the same test never results in failure. To narrow down the problem, I tested with the -m 64 -M 2 options on a system not taking any calls. I tried looking through the code to spot where the memory fails to be released when a TLS connection fails, but it is more involved than I can spend time on right now.

Expected behavior
TLS connection failures should clean up the memory used by HEP (speculation), just as a successfully closed connection does.

Relevant System Logs
Provided above.

OS/environment information

Debian 12
OpenSIPS installed from the packages provided by opensips.org

Additional context

TLS errors are frequent on our network because a load balancer makes a plain TCP connection to the port to validate reachability; it is currently not capable of negotiating TLS. This was not a problem with OpenSIPS version 2.4.

@zooptwopointone zooptwopointone changed the title Memory leak in TLS connection ERROR when HEP is enabled for TLS [Bug] Memory leak in TLS connection ERROR when HEP is enabled for TLS Oct 17, 2024
@zooptwopointone zooptwopointone changed the title [Bug] Memory leak in TLS connection ERROR when HEP is enabled for TLS [BUG] Memory leak in TLS connection ERROR when HEP is enabled for TLS Oct 17, 2024
@bogdan-iancu
Member

@zooptwopointone, please see this page for information on how to deal with OOM issues: https://opensips.org/Documentation/TroubleShooting-OutOfMem. Try to get a pkg mem dump from the process reporting the pkg OOM.

@bogdan-iancu bogdan-iancu self-assigned this Oct 18, 2024
@zooptwopointone
Author

zooptwopointone commented Oct 29, 2024

dump.log

Here is the output from a server configured with 4 MB of pkg memory after it started to have the problem.

I followed the link you provided; this is the result of running the opensips-cli .... command.


Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@github-actions github-actions bot added the stale label Nov 14, 2024
@zooptwopointone
Author

This is still an issue.

@stale stale bot removed the stale label Nov 19, 2024

github-actions bot commented Dec 5, 2024

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@github-actions github-actions bot added the stale label Dec 5, 2024
@zooptwopointone
Author

No updates from me, just waiting to see if there is more information I can provide.

@stale stale bot removed the stale label Dec 7, 2024

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@github-actions github-actions bot added the stale label Dec 22, 2024
@zooptwopointone
Author

This is still an issue.

@stale stale bot removed the stale label Dec 31, 2024
@zooptwopointone
Author

Adding a comment so this does not go stale.


github-actions bot commented Feb 6, 2025

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@github-actions github-actions bot added the stale label Feb 6, 2025
@zooptwopointone
Author

Adding a comment so this does not go stale again.

@zooptwopointone
Author

@bogdan-iancu Sorry, maybe I needed to tag you. The log you requested is attached above.

@github-actions github-actions bot removed the stale label Feb 7, 2025
@bogdan-iancu
Member

@zooptwopointone, thanks for the heads-up. Yes, I see 3733 HEP chunks still allocated in the mem dump. The question (or rather, a request for confirmation): does this leak happen only when there is a TLS failure? If so, can you give more detail on the failure?

@zooptwopointone
Author

zooptwopointone commented Feb 9, 2025

@bogdan-iancu Yes, this only happens with TLS and HEP. I am not sure how to explain the failure further, other than that you get memory allocation errors that cause new valid connections to fail. If I remember correctly, this affects all SIP communication (UDP/TCP/TLS) at that point, as none of them can allocate memory.

From looking at the code (I am only hacker-level in C), it appears that for plain TCP the connection is established first and only then is HEP tracing done, so if the TCP connection fails, the HEP memory allocation never happens. For TLS, what I remember seeing is that once the TCP connection is established, the HEP memory is allocated and then TLS is negotiated; if the negotiation fails, there did not seem to be a place where the HEP memory is freed on failure. Again, I am not great at C, and it has been quite a while since I looked at it.

Let me know how else I can help.
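
To make the suspected pattern concrete, here is a minimal C sketch of it. This is a simplified model and not OpenSIPS code: every name in it (pool_alloc, pool_free, handle_connection, failed_handshakes_until_oom) and both size constants are hypothetical and chosen only for illustration. It assumes that tracing state is taken from a fixed per-process pool once the TCP connection is accepted, and that the TLS failure path may or may not release it.

```c
#include <assert.h>
#include <stddef.h>

/* All names and sizes below are hypothetical; they model the suspected
 * pattern only and do not come from the OpenSIPS source. */
#define POOL_SIZE 4096u   /* arbitrary size of the simulated per-process pool */
#define CTX_SIZE   128u   /* arbitrary size of one tracing context */

static size_t pool_used;  /* bytes currently taken from the simulated pool */

static int pool_alloc(size_t n)
{
    if (n > POOL_SIZE - pool_used)
        return -1;        /* pool exhausted */
    pool_used += n;
    return 0;
}

static void pool_free(size_t n)
{
    assert(n <= pool_used);
    pool_used -= n;
}

enum { CONN_OK = 0, CONN_TLS_FAILED = -1, CONN_OOM = -2 };

/* handshake_ok simulates the outcome of the TLS negotiation.
 * free_on_failure selects the leaky (0) or the fixed (1) failure path. */
int handle_connection(int handshake_ok, int free_on_failure)
{
    /* TCP connection accepted: tracing state is allocated up front. */
    if (pool_alloc(CTX_SIZE) != 0)
        return CONN_OOM;

    if (!handshake_ok) {
        /* The leaky variant returns here without releasing the context. */
        if (free_on_failure)
            pool_free(CTX_SIZE);
        return CONN_TLS_FAILED;
    }

    /* Normal connection end: the context is released. */
    pool_free(CTX_SIZE);
    return CONN_OK;
}

/* Resets the pool, then counts how many failed handshakes complete before
 * the pool is exhausted; returns max_attempts if that never happens. */
int failed_handshakes_until_oom(int free_on_failure, int max_attempts)
{
    int i;

    pool_used = 0;
    for (i = 0; i < max_attempts; i++) {
        if (handle_connection(0, free_on_failure) == CONN_OOM)
            break;
    }
    return i;
}
```

With these constants, failed_handshakes_until_oom(0, 1000) exhausts the simulated pool after 32 failed handshakes, while failed_handshakes_until_oom(1, 1000) completes all 1000 attempts. If the real code follows a similar shape, the fix would be to release the tracing state on the handshake failure path as well.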


Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@github-actions github-actions bot added the stale label Feb 24, 2025