
stop unstaked nodes from pushing EpochSlots into the cluster #5141

Merged: 6 commits into anza-xyz:master, Mar 14, 2025

Conversation

@alexpyattaev commented Mar 4, 2025

Problem

  • EpochSlots is 70% of gossip traffic
  • Unstaked nodes do not need to send it

Summary of Changes

  • Prevent them from sending the message

Fixes #
Partially #5034

@alexpyattaev force-pushed the epoch_slots_unstaked branch from 038f5b2 to d20ce72 on March 4, 2025 20:11
@alexpyattaev (Author)

@gregcusack @bw-solana please take a look if this is what we need to stop unstaked nodes from polluting gossip

@gregcusack

> @gregcusack @bw-solana please take a look if this is what we need to stop unstaked nodes from polluting gossip

Just to confirm, based on a side convo: we are holding off on this until EpochSlots are ready to be fully removed, right?

@alexpyattaev (Author)

> Just to confirm, based on a side convo: we are holding off on this until EpochSlots are ready to be fully removed, right?

As far as I understand, we can do this right away, since EpochSlots made by unstaked nodes do little besides pollute gossip (repair will prioritize staked nodes anyway, and unstaked nodes do not take part in consensus).

@alexpyattaev marked this pull request as ready for review on March 7, 2025 11:47
@alexpyattaev (Author) commented Mar 7, 2025

@alessandrod mentioned that FD are happy with EpochSlots gone, can we at least start testing this simple solution that will cut bandwidth by 50%? Waiting for a complete solution can take months.

@bw-solana left a comment


Left a couple comments on the code.

But it sounds like we need to get aligned on direction...

My understanding is EpochSlots are used in repair and ancestor hash repair sampling services. If unstaked nodes stop pushing out EpochSlots, what is the expected behavior change? More concentrated repair/sampling load on the staked nodes?

If we're comfortable with the behavior changes for these services, this seems like high impact gossip bandwidth reduction.

@@ -79,6 +79,14 @@ impl ClusterSlotsService {
cluster_slots_update_receiver: ClusterSlotsUpdateReceiver,
exit: Arc<AtomicBool>,
) {
let node_id = cluster_info.id();
let my_stake = bank_forks
    .read()
    .unwrap()
    .root_bank()
    .current_epoch_stakes()
    .node_id_to_stake(&node_id)
    .unwrap_or_default();


I believe we need to derive each of these in the loop since they can change

@alexpyattaev (Author)

addressed in df0ffb9
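
For illustration, a minimal sketch of what deriving these inside the loop could look like; the accessors are the ones shown in the diff above, but the loop shape and gating here are assumptions, not the exact committed code:

```rust
loop {
    // Re-derive own stake each pass so a node that becomes staked
    // (or unstaked) mid-run changes behavior without a restart.
    let my_stake = bank_forks
        .read()
        .unwrap()
        .root_bank()
        .current_epoch_stakes()
        .node_id_to_stake(&node_id)
        .unwrap_or_default();
    let i_am_staked = my_stake > 0;
    // ... push EpochSlots into CRDS only when i_am_staked ...
}
```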

&cluster_slots_update_receiver,
&cluster_info,
);
// only staked nodes push EpochSlots into CRDS


would be better to include the "why" here

@alexpyattaev (Author)

addressed in df0ffb9
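
A comment carrying the "why" might read something like this (a sketch based on the reasoning in this thread, not the exact text committed):

```rust
// Only staked nodes push EpochSlots into CRDS: repair sampling is
// stake-weighted, so EpochSlots from unstaked nodes are effectively
// never used, and unstaked nodes take no part in consensus; pushing
// them only adds gossip traffic.
```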

@AshwinSekar

> My understanding is EpochSlots are used in repair and ancestor hash repair sampling services

Speaking from the consensus angle, one of the conditions for kicking off ancestor hashes repair is observing 52%+ of stake on a dead block via EpochSlots. If unstaked nodes do not push EpochSlots, this doesn't matter.
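
For concreteness, that 52% condition amounts to a stake-fraction check of roughly this shape (a sketch with assumed names and threshold handling, not the actual agave code):

```rust
// Hypothetical shape of the trigger: start ancestor hashes repair for a
// dead slot once the stake observed on it via EpochSlots crosses ~52%.
const ANCESTOR_HASHES_REPAIR_THRESHOLD: f64 = 0.52; // assumed constant name

fn should_start_ancestor_hashes_repair(observed_stake: u64, total_stake: u64) -> bool {
    observed_stake as f64 / total_stake as f64 >= ANCESTOR_HASHES_REPAIR_THRESHOLD
}
```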

However, when sampling for ancestor hashes repair (unlike normal repair) we select peers purely based on EpochSlots. For correctness it does not matter if we exclude unstaked nodes here, since we must already have seen that enough staked nodes have frozen this slot. Sampling only from staked nodes might add latency; however, if we're in a situation that relies on ancestor hashes repair, the cluster is already temporarily stuck.

I think the more important factor is whether we want to restrict regular repair to only staked nodes.

@alexpyattaev (Author)

Regular repair already heavily prefers staked nodes. The weight function is literally just the node's stake, and unstaked nodes are given a stake of 1.
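
In sketch form, the weighting described here (and visible in the filter_map quoted in a later comment) boils down to stake plus one; the helper name is hypothetical, the real selection lives in serve_repair.rs:

```rust
// Each peer's sampling weight is its stake in lamports plus one, so
// unstaked peers (stake 0) end up with weight 1 while staked peers
// keep weights many orders of magnitude larger.
fn repair_sample_weights(peer_stakes: &[u64]) -> Vec<u64> {
    peer_stakes.iter().map(|&stake| stake.saturating_add(1)).collect()
}
```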

@alexpyattaev force-pushed the epoch_slots_unstaked branch from df0ffb9 to 92475ba on March 8, 2025 18:21
@bw-solana

The code changes on this PR LGTM as far as the mechanics of removing EpochSlots, but I want to make sure everyone is on board that we aren't accidentally rugging any downstream services.

It sounds like we are cleared to remove EpochSlots from gossip for unstaked nodes from a consensus (ancestor hash repair sampling) perspective, according to @AshwinSekar (correct me if this is wrong).

Are we okay from a repair perspective? Any additional code changes we would need to make this work? @behzadnouri ?

Any concerns for FD? CC @ptaffet-jump

Comment on lines 105 to 111
let my_stake = bank_forks
.read()
.unwrap()
.root_bank()
.current_epoch_stakes()
.node_id_to_stake(&node_id)
.unwrap_or_default();


@alexpyattaev (Author)

Done in e3df68e

@alexpyattaev (Author)

Further digging: repair weights for unstaked nodes are set to 1, while repair weights for staked nodes are in the millions and above. Raw data from https://github.com/alexpyattaev/agave/blob/4a8e72c36ff6f36cdfe712af7b6edf2cc7825f59/core/src/repair/serve_repair.rs#L1087 looks like this:

14077635496876, 27799088767508, 100008717125, 1997717120, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1

A similar thing is going on with ancestor hashes:

.filter_map(|(i, ci)| Some((slot_peers.get(ci.pubkey())? + 1, i)))

actual odds look like this:

len(weights) = 5732                       # total number of nodes in cluster
max(weights) = 13322909430537740          # max weight (in lamports)
np.median(weights) = 1.0                  # yep, most nodes are unstaked
sum(weights == 1) = 4405                  # 4405 unstaked nodes; that is also their total sampling weight
sum(weights[weights > 1]) = 377029995643360008  # total weight of all staked nodes
4405 / 377029995643360008 = 1.1683420552477143e-14  # chance of an unstaked node getting picked at all

So I think we have very low odds of actually picking any unstaked node even today.

@bw-solana

> Further digging: repair weights for unstaked nodes are set to 1 […] So I think we have very low odds of actually picking any unstaked node even today.

This matches my understanding. My takeaway: this change would have no measurable effect on repair concentration on staked nodes or on protocol security.

@gregcusack self-requested a review on March 13, 2025 19:06
@behzadnouri previously approved these changes Mar 13, 2025

@behzadnouri left a comment


please wait for Ashwin to also approve

@jeffwashington

> I think the more important factor is whether we want to restrict regular repair to only staked nodes.

It appears @AshwinSekar is mainly concerned with whether to restrict regular repair to staked nodes.
It appears @alexpyattaev has demonstrated with math that repair is already effectively restricted to staked nodes.

I think there is value in getting this in and getting the testing going. We have spilled a lot of ink on epoch slots.

@wen-coding commented Mar 13, 2025

> > I think the more important factor is whether we want to restrict regular repair to only staked nodes.
>
> It appears @AshwinSekar is mainly concerned with whether to restrict regular repair to staked nodes. It appears @alexpyattaev has demonstrated with math that repair is already effectively restricted to staked nodes.
>
> I think there is value in getting this in and getting the testing going. We have spilled a lot of ink on epoch slots.

Ashwin is OOO today, but I did chat with him earlier this week about this change. He thought it should be fine.

@alexpyattaev (Author)

Side note: an unstaked node will still push 3-4 EpochSlots messages on startup even with this patch applied. It happens in a completely different part of the code that I missed, and only once, so no need to patch it.

@alessandrod

backport to 2.2?

@bw-solana

> backport to 2.2?

I support this, but it would be good to make sure we've collected adequate signal on testnet for the "remove deprecated values from gossip pull messages" change, so as not to confuse things.

@gregcusack - do we have confirmation yet? I think we're still around 40% Agave stake on testnet.

@gregcusack commented Mar 14, 2025

No confirmation yet, since we're still waiting for a little more stake on Agave on testnet. TBH I believe the risk on that backport is very low, but good to make sure.

@alexpyattaev merged commit 145d562 into anza-xyz:master on Mar 14, 2025
47 checks passed
@alexpyattaev deleted the epoch_slots_unstaked branch on March 14, 2025 06:28
@alessandrod added the v2.2 (Backport to v2.2 branch) label on Mar 14, 2025
mergify bot commented Mar 14, 2025

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

mergify bot pushed a commit that referenced this pull request Mar 14, 2025
* stop unstaked nodes from pushing EpochSlots into the cluster
* reload own stake on every epoch in case I become staked
* use epoch specs to reduce contention for bank forks

---------

Co-authored-by: Alex Pyattaev <[email protected]>

Big thanks to Behzad for code suggestions.

(cherry picked from commit 145d562)
alexpyattaev added a commit that referenced this pull request Mar 21, 2025
stop unstaked nodes from pushing EpochSlots into the cluster (backport of #5141) (#5286)

stop unstaked nodes from pushing EpochSlots into the cluster (#5141)

* stop unstaked nodes from pushing EpochSlots into the cluster
* reload own stake on every epoch in case I become staked
* use epoch specs to reduce contention for bank forks

---------

Co-authored-by: Alex Pyattaev <[email protected]>

Big thanks to Behzad for code suggestions.

(cherry picked from commit 145d562)

Co-authored-by: Alex Pyattaev <[email protected]>