Reference counting changes #1951

Open
wants to merge 21 commits into
base: main

gdamore (Contributor) commented Nov 29, 2024

This converts the main part of NNG to use atomic reference counting instead of the earlier, hackier lock-based approaches. It should be both safer and faster!

codecov bot commented Nov 29, 2024

Codecov Report

Attention: Patch coverage is 83.73984% with 60 lines in your changes missing coverage. Please review.

Project coverage is 68.09%. Comparing base (ec714f0) to head (6726fc1).

Files with missing lines                  Patch %   Lines
src/core/pipe.c                           76.27%    4 Missing and 10 partials ⚠️
src/nng.c                                 59.25%    6 Missing and 5 partials ⚠️
src/sp/transport/socket/sockfd.c          70.96%    3 Missing and 6 partials ⚠️
src/core/aio.c                            63.15%    1 Missing and 6 partials ⚠️
src/core/socket.c                         90.41%    2 Missing and 5 partials ⚠️
src/core/dialer.c                         88.46%    0 Missing and 3 partials ⚠️
src/supplemental/websocket/websocket.c    93.18%    1 Missing and 2 partials ⚠️
src/core/listener.c                       92.00%    0 Missing and 2 partials ⚠️
src/platform/posix/posix_tcpdial.c        75.00%    0 Missing and 1 partial ⚠️
src/sp/protocol/pubsub0/sub.c             0.00%     0 Missing and 1 partial ⚠️
... and 2 more

❗ There is a different number of reports uploaded between BASE (ec714f0) and HEAD (6726fc1).

HEAD has 3 uploads less than BASE: 4 uploads on BASE (ec714f0), 1 upload on HEAD (6726fc1).

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1951       +/-   ##
===========================================
- Coverage   81.87%   68.09%   -13.79%     
===========================================
  Files          94       93        -1     
  Lines       24006    20392     -3614     
  Branches     3199     3047      -152     
===========================================
- Hits        19655    13886     -5769     
+ Misses       4276     3966      -310     
- Partials       75     2540     +2465     


Operations that might be performed during teardown, such as reaping,
waiting, closing, and freeing, should only be done if the aio has been
properly initialized.  This matters for certain simple cases where
inline aio objects are used and initialization of an outer object can
fail before the enclosed aio is initialized.

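
A minimal sketch of the guard described above, assuming a zero-filled outer object and a flag that only the init routine sets. The `demo_aio` and `demo_outer` types and all `demo_*` functions are hypothetical stand-ins for illustration, not NNG's actual `nni_aio` implementation.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an inline async I/O object (not nni_aio). */
typedef struct demo_aio {
    bool initialized; /* set only by demo_aio_init */
    int  stop_calls;  /* counts teardown work that actually ran */
} demo_aio;

/* Hypothetical outer object that embeds the aio inline. */
typedef struct demo_outer {
    int      fd;
    demo_aio aio;
} demo_outer;

void demo_aio_init(demo_aio *aio)
{
    aio->stop_calls  = 0;
    aio->initialized = true;
}

/* Teardown is a no-op unless init ran.  This relies on the enclosing
 * object having been zero-filled when it was allocated. */
void demo_aio_stop(demo_aio *aio)
{
    if (!aio->initialized) {
        return;
    }
    aio->stop_calls++;
}

void demo_aio_fini(demo_aio *aio)
{
    if (!aio->initialized) {
        return;
    }
    aio->initialized = false;
}
```

With this shape, an error path in the outer object's constructor can call the same teardown routine whether or not it got as far as initializing the enclosed aio.
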
Once a context has started closing, further attempts to close it
will return NNG_ECLOSED.  What was I thinking to ever do anything
else?

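
One way to get this "first close wins" behavior is an atomic flag that is set exactly once. The sketch below assumes C11 atomics; `demo_ctx`, `demo_ctx_close`, and the `DEMO_*` codes are hypothetical names (with `DEMO_ECLOSED` playing the role of `NNG_ECLOSED`, using an illustrative value), and this is not necessarily how NNG implements it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative result codes; DEMO_ECLOSED stands in for NNG_ECLOSED. */
enum { DEMO_OK = 0, DEMO_ECLOSED = 1 };

/* Hypothetical context object. */
typedef struct demo_ctx {
    atomic_bool closing;
} demo_ctx;

void demo_ctx_init(demo_ctx *c)
{
    atomic_init(&c->closing, false);
}

int demo_ctx_close(demo_ctx *c)
{
    /* atomic_exchange returns the previous value, so exactly one
     * caller observes false and proceeds with the teardown. */
    if (atomic_exchange(&c->closing, true)) {
        return DEMO_ECLOSED;
    }
    /* ... start the actual teardown here ... */
    return DEMO_OK;
}
```
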
This is used first for the nni_pipe, but we will use it
for the other primary objects as well.  This should simplify
teardown and hopefully eliminate some races.

It does mean that pipe destruction goes through an additional
context switch, for now at least.  This shouldn't be on the hot
data path anyway.

For now this uses plain reference counters, which should be simpler
and hopefully more reliable.

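
A plain atomic reference counter can be sketched as follows, assuming C11 `<stdatomic.h>`. The `demo_refcnt` type, its functions, and the `demo_count_fini` callback are hypothetical and only illustrate the general technique; they are not copied from the NNG sources.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical reference counter with a finalizer callback. */
typedef struct demo_refcnt {
    atomic_uint cnt;
    void (*fini)(void *); /* runs when the last reference is dropped */
    void *data;
} demo_refcnt;

void demo_refcnt_init(
    demo_refcnt *rc, unsigned initial, void *data, void (*fini)(void *))
{
    atomic_init(&rc->cnt, initial);
    rc->fini = fini;
    rc->data = data;
}

/* Take a reference.  The caller must already hold one. */
void demo_refcnt_hold(demo_refcnt *rc)
{
    unsigned old =
        atomic_fetch_add_explicit(&rc->cnt, 1, memory_order_relaxed);
    assert(old > 0);
    (void) old;
}

/* Drop a reference; whoever drops the last one runs the finalizer. */
void demo_refcnt_rele(demo_refcnt *rc)
{
    unsigned old =
        atomic_fetch_sub_explicit(&rc->cnt, 1, memory_order_acq_rel);
    assert(old > 0);
    if (old == 1) {
        rc->fini(rc->data);
    }
}

/* Example finalizer: counts how many times it was invoked. */
void demo_count_fini(void *arg)
{
    (*(int *) arg)++;
}
```

The acquire-release ordering on the decrement is what lets the finalizer see every write made by other threads before they dropped their references; no lock is needed on the hold or release path.
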
This is a major change, but it should eliminate some of the problems
we have seen with use-after-free bugs in shutdown.  It should also
be faster as we don't need to use locks as much.

This updates the pipe to use contiguous storage for the transport data
as well as the pipe protocol data.  It updates sockfd to use this, and
eliminates the need for the sockfd transport to do its own asynchronous
reaping, thereby hopefully closing a shutdown race.

The other transports will shortly get the same treatment.

Also fixes a valgrind complaint about uninitialized data in the socket test.

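
The contiguous layout mentioned in this commit could look roughly like the sketch below: one allocation holds the pipe header, then the protocol's private data, then the transport's private data, so a single free releases everything. The `demo_pipe` type, its fields, and the helper names are assumptions made for illustration; the real structure in NNG may be laid out differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pipe header; the private data regions follow it in
 * the same allocation. */
typedef struct demo_pipe {
    void *proto_data; /* protocol-private region */
    void *tran_data;  /* transport-private region */
} demo_pipe;

/* Round n up so the region that follows is suitably aligned. */
static size_t demo_align(size_t n)
{
    size_t a = _Alignof(max_align_t);
    return (n + a - 1) / a * a;
}

demo_pipe *demo_pipe_alloc(size_t proto_size, size_t tran_size)
{
    size_t     hdr = demo_align(sizeof(demo_pipe));
    size_t     psz = demo_align(proto_size);
    demo_pipe *p   = calloc(1, hdr + psz + tran_size);

    if (p == NULL) {
        return NULL;
    }
    p->proto_data = (uint8_t *) p + hdr;
    p->tran_data  = (uint8_t *) p + hdr + psz;
    return p;
}

/* One free releases the header and both private regions together. */
void demo_pipe_free(demo_pipe *p)
{
    free(p);
}
```

Because the transport data lives and dies with the pipe in this arrangement, the transport has nothing of its own left to reap asynchronously.
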
This avoids certain kinds of challenging deadlocks during finalization,
but it does require users of the optimized nni_aio_init function to
explicitly call nni_aio_stop before calling nni_aio_fini.

As a minor benefit, this should reduce the number of mutex entry/exit
blocks for very short-lived objects (such as rapidly recycling contexts).

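
The ordering contract described above (stop first, then fini) can be modeled with a small mock. Everything here is hypothetical: `demo_op` and its functions only mimic the calling convention and do not reproduce the behavior or signatures of `nni_aio_stop` or `nni_aio_fini`.

```c
#include <stdbool.h>

/* Hypothetical operation object that models the stop-then-fini rule. */
typedef struct demo_op {
    bool stopped;   /* set once no callback can still be running */
    bool finalized; /* set once resources have been released */
} demo_op;

void demo_op_init(demo_op *op)
{
    op->stopped   = false;
    op->finalized = false;
}

/* A real implementation would cancel the operation and wait here for
 * any in-flight callback to return. */
void demo_op_stop(demo_op *op)
{
    op->stopped = true;
}

/* fini itself never waits, so it refuses to run unless the caller
 * stopped the operation first.  Returns true on success. */
bool demo_op_fini(demo_op *op)
{
    if (!op->stopped) {
        return false;
    }
    op->finalized = true;
    return true;
}

/* Owner teardown in the required order. */
bool demo_owner_teardown(demo_op *op)
{
    demo_op_stop(op);
    return demo_op_fini(op);
}
```

Keeping the waiting in the stop step means the fini step never has to block, which is the kind of separation that helps avoid deadlocks during finalization.
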
If an error occurs, the application gets to know about it.  No
external factor can cause us to spin for memory, since this is not
accessible via the network.

This particular problem reported by the sanitizer was unlikely to
have any real impact, but using this boolean and no longer clearing
the expire_q avoids a possible NULL pointer dereference.