-
Notifications
You must be signed in to change notification settings - Fork 408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make sure own identity does not get wiped during TVU deduplication #5380
base: master
Are you sure you want to change the base?
make sure own identity does not get wiped during TVU deduplication #5380
Conversation
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most importantly, I think this solves the problem. Nice find!
That said, I would prefer to either assert that we will always find our own index or fully handle the case where we don't.
Left a few other nits along the way
turbine/src/cluster_nodes.rs
Outdated
// This node's index within shuffled nodes. | ||
let index = nodes.by_ref().position(pred).unwrap(); | ||
// This node's index should be somewhere within shuffled indices. | ||
let index = nodes.by_ref().position(pred)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to just unwrap/expect on the return value, might as well just unwrap here, closer to the root of the problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in 92202a0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_retransmit_addrs()
returns aResult
get_retransmit_addrs()
callsget_retransmit_peers()
(the function that this .unwrap() is in)
Why not have get_retransmit_peers()
return a Result
and allow the caller decide how to handle it ? While we have seemingly found the cause of this panic, I don't think it hurts to make our error handling more robust
@@ -310,6 +311,11 @@ pub fn new_cluster_nodes<T: 'static>( | |||
if broadcast { | |||
weighted_shuffle.remove_index(index[&self_pubkey]); | |||
} | |||
// Paranoid check to ensure the correct operation of the weighted shuffle. | |||
assert!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe debug_assert
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not get called very often (once every few seconds), and this ensures the thing crashes early (rather than in any of the unwraps in the code). Basically this is the check that would get hit if the invariant of "have my own key in the list" is ever violated in the future changes to dedup logic and/or if there are bugs in gossip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't had the chance to review yet but will do so today. Please hold off on merging until I'm able to review
Problem
fn dedup_tvu_addrs
in cluster_nodes.rs is too aggressive and can accidentally delete node's own identity if a staked node was ever running on the same TVU socket address.Summary of Changes
fn dedup_tvu_addrs
does not accidentally wipe node's own identityFixes #5356