Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

notification: Fix memory leak of pending substreams #296

Merged
merged 4 commits into from
Dec 3, 2024

Conversation

lexnv
Copy link
Collaborator

@lexnv lexnv commented Dec 2, 2024

This PR fixes a subtle memory leak that can happen in the following edge-case situation:

  • connection is established and substream outbound is initiated with remote peer
  • the substream ID is tracked until the substream either completes successfully or fails
  • the connection is closed soon after, leading to no substream events ever being generated

For this edge-cases, we need to remove the tracking of the substream ID when the connection is reported as closed.

This has been detected after running a node for more than 2 days with the following generic metrics PR:

Closes: #295

cc @paritytech/networking

@lexnv lexnv added the bug Something isn't working label Dec 2, 2024
@lexnv lexnv self-assigned this Dec 2, 2024
@lexnv lexnv requested a review from dmitry-markin December 2, 2024 09:34
NotificationError::Rejected,
)
.await;
}
// user either initiated an outbound substream or an outbound substream was
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs need to be updated.

Signed-off-by: Alexandru Vasile <[email protected]>
lexnv added a commit that referenced this pull request Dec 3, 2024
Similar to #296, there is a
possibility of leaking memory in the following edge-case:
- T0: Connection is established and outbound substream is initiated with
peer
  - This maps the substream ID to the request bytes information
- T1: Connection is closed before the service has a chance to report
`TransportEvent::SubstreamOpened` or
`TransportEvent::SubstreamOpenFailure`

In this case, if we connect and immediately disconnect with a request in
flight, we are effectively leaking the request bytes.


Detected by:
- #294


### Dashboard

- We are leaking ~111 requests over 3 days timespan:

<img width="1484" alt="Screenshot 2024-12-03 at 10 41 01"
src="https://github.com/user-attachments/assets/f6701017-4add-4aa1-aee1-e1f8d33d54f3">


cc @paritytech/networking

Signed-off-by: Alexandru Vasile <[email protected]>
@lexnv lexnv merged commit af19055 into master Dec 3, 2024
8 checks passed
@lexnv lexnv deleted the lexnv/pending-outbound-fix branch December 3, 2024 14:07
lexnv added a commit that referenced this pull request Dec 3, 2024
## [0.8.3] - 2024-12-03

This release includes two fixes for small memory leaks on edge-cases in
the notification and request-response protocols.

### Fixed

- req-resp: Fix memory leak of pending substreams
([#297](#297))
- notification: Fix memory leak of pending substreams
([#296](#296))

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request Dec 4, 2024
## [0.8.3] - 2024-12-03

This release includes two fixes for small memory leaks on edge-cases in
the notification and request-response protocols.

### Fixed

- req-resp: Fix memory leak of pending substreams
([#297](paritytech/litep2p#297))
- notification: Fix memory leak of pending substreams
([#296](paritytech/litep2p#296))

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
lexnv added a commit to paritytech/polkadot-sdk that referenced this pull request Dec 4, 2024
## [0.8.3] - 2024-12-03

This release includes two fixes for small memory leaks on edge-cases in
the notification and request-response protocols.

### Fixed

- req-resp: Fix memory leak of pending substreams
([#297](paritytech/litep2p#297))
- notification: Fix memory leak of pending substreams
([#296](paritytech/litep2p#296))

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Krayt78 pushed a commit to Krayt78/polkadot-sdk that referenced this pull request Dec 18, 2024
## [0.8.3] - 2024-12-03

This release includes two fixes for small memory leaks on edge-cases in
the notification and request-response protocols.

### Fixed

- req-resp: Fix memory leak of pending substreams
([paritytech#297](paritytech/litep2p#297))
- notification: Fix memory leak of pending substreams
([paritytech#296](paritytech/litep2p#296))

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
dudo50 pushed a commit to paraspell-research/polkadot-sdk that referenced this pull request Jan 4, 2025
## [0.8.3] - 2024-12-03

This release includes two fixes for small memory leaks on edge-cases in
the notification and request-response protocols.

### Fixed

- req-resp: Fix memory leak of pending substreams
([paritytech#297](paritytech/litep2p#297))
- notification: Fix memory leak of pending substreams
([paritytech#296](paritytech/litep2p#296))

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

notifications: Investigate possible memory leak with pending outbound substream tracking
3 participants