Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tpu-client-next: add client selection to validator #5201

Merged

Conversation

KirillLykov
Copy link

@KirillLykov KirillLykov commented Mar 9, 2025

Problem

This PR adds a flag use_tpu_client_next to the validator configuration (without cla for now) which allows to use tpu-client-next instead of ConnectionCache for RPC.

Summary of Changes

Copy link

mergify bot commented Mar 9, 2025

If this PR represents a change to the public RPC API:

  1. Make sure it includes a complementary update to rpc-client/ (example)
  2. Open a follow-up PR to update the JavaScript client @solana/web3.js (example)

Thank you for keeping the RPC clients in sync with the server API @KirillLykov.

@KirillLykov KirillLykov force-pushed the klykov/add-client-selection-validator branch 2 times, most recently from 0be26b3 to 26233f7 Compare March 11, 2025 14:50
@KirillLykov KirillLykov added the v2.2 Backport to v2.2 branch label Mar 11, 2025
Copy link

mergify bot commented Mar 11, 2025

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

))?;

let client = TpuClientNextClient::new(
runtime.handle().clone(),
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we use for the client code the same runtime as used for RPC itself. When using ConnectionCache, it is contrary uses it's own runtime.

This runtime used by rpc-json server and internally we use it to do blocking spawn. It looks logical to use the same runtime also for sending over the network because it decreases the complexity of the system, increases utilization of the given runtime. Due to limited resources, it seems to be advantageous to use less runtimes to avoid competition between them for the idle core. But maybe there are might be some other considerations, @alexpyattaev .

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to pass the runtime handle from a higher layer function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why but the runtime was introduced to be owned by rpc instead of validator (as for other services). This is outside of the scope of this PR.

@@ -600,8 +790,10 @@ pub fn service_runtime(
// NB: `rpc_blocking_threads` shouldn't be set too high (defaults to num_cpus / 2). Too many
// (busy) blocking threads could compete with CPU time with other validator threads and
// negatively impact performance.
let rpc_threads = 1.max(rpc_threads);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two checks were just moved inside this function, nothing new.

@KirillLykov KirillLykov marked this pull request as ready for review March 11, 2025 15:15
steveluscher
steveluscher previously approved these changes Mar 11, 2025
Copy link

@steveluscher steveluscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting to see how this performs.

Is this ::create_foo() approach preferable to one of these?

  1. Split into JsonRpcServiceLegacy and JsonRpcService. Make the choice of the correct one based on the config. Signature stays ::new() for both. Share as much implementation as appropriate.
    if config.use_tpu_client_next {
        JsonRpcService::new(/* ... */)
    } else {
        JsonRpcServiceLegacy::new(/* ... */)
    }.map_err(ValidatorError::Other)?
  2. Keep JsonRpcService but change the ::new() method to accept the union of both configs, match on them, and do the right thing internally.
    let config = if config.use_tpu_client_next {
        JsonRpcServiceNextConfig { /* ... */ }
    } else {
        JsonRpcServiceConfig { /* ... */ }
    };
    let (json_rpc_service, client_updater) = JsonRpcService::new(
        config,
    ).map_err(ValidatorError::Other)?;

Anyway, whatever is fine if it works! Just suggestions.

@KirillLykov
Copy link
Author

KirillLykov commented Mar 12, 2025

@steveluscher the problem with approach (1) is that it breaks existing API of public crate RPC. So what can be done is to add a new JsonRpcServiceNext and leave JsonRpcService as it is. But than I need to extract trait to store in validator instance of JsonRpcService.

Regarding why I used create_ instead of new: I thought that since I have to return now not only the service object but also a client in form of NotifyKeyUpdate, new might be sort of confusing. But maybe a better solution would to use new that returns service object which has client as attribute (and get method)?

To the question why do we need this client to be returned: there is Admin RPC call that changes the validator identity and in this I need to close all the client connections and create a new connection with a new certificate (that uses this new validator key).

@alexpyattaev
Copy link

@steveluscher the problem with approach (1) is that it breaks existing API of public crate RPC. So what can be done is to add a new JsonRpcServiceNext and leave JsonRpcService as it is. But than I need to extract trait to store in validator instance of JsonRpcService.

You can break public API if you bump the version of the crate. As long as semver is observed breaking API is ok.

@KirillLykov
Copy link
Author

KirillLykov commented Mar 12, 2025

@steveluscher what do you think about introducing a Builder? see 14a08c7

Why i pass two arguments to build instead of having them as part of the Builder structure: because I must use &Keypair since Keypair is not Clone. Not sure if this build method looks nice. But the good part is that number of lines added to validator.rs has decreased which I think was your concern.

@KirillLykov KirillLykov force-pushed the klykov/add-client-selection-validator branch from 2574afd to 14a08c7 Compare March 12, 2025 16:53
steveluscher
steveluscher previously approved these changes Mar 12, 2025
Copy link

@steveluscher steveluscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicer! Still really don't like the long list of positional arguments, and would rather a config struct, but I'm happy to leave that up to a stylistic preference.

cool(
  true,
  true,
  0,
  true,
  1,
  false,
  true,
  "0",
  true,
  "true",
)

… number of lines added to validator.rs has decreased which I think was your concern.

My main concern is actually how easy it will be to delete the deprecated path without leaving JsonRpcService in a weird state (ie. with weird method names that no longer make sense). My optimization criteria would be:

  • be able to delete the old code in as few steps as possible
  • once deleted, the remaining code looks like it would if you'd written it from scratch

@KirillLykov
Copy link
Author

@steveluscher makes sense. I rewrote this with Config instead.

@KirillLykov
Copy link
Author

Probably, would be good to get feedback from @t-nelson since these changes are to be backported to 2.2.

Copy link

@steveluscher steveluscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this! I didn't feel the need to re-review because it was only a suggestion. I like this better, for sure, though!

@KirillLykov KirillLykov merged commit 5df0d56 into anza-xyz:master Mar 17, 2025
58 checks passed
mergify bot pushed a commit that referenced this pull request Mar 17, 2025
Add a flag `use_tpu_client_next` to the validator configuration (without cla for now) which allows to use tpu-client-next instead of `ConnectionCache` for RPC.

(cherry picked from commit 5df0d56)
@t-nelson
Copy link

i don't understand why the config type was introduced in this change set. why was it not done in a preliminary pr? it is totally independent change and obscures the one that this pr claims to make

@KirillLykov
Copy link
Author

KirillLykov commented Mar 18, 2025

i don't understand why the config type was introduced in this change set. why was it not done in a preliminary pr? it is totally independent change and obscures the one that this pr claims to make

I see what you mean: that this PR description is saying that this PR is about the flag but it also adds quite some code to RPC. Originally, these changes on RPC side were not part of this PR, I introduced them while addressing PR comments. So the point of attention has drifted a bit which creates confusion. But adding this flag alone would be a very small change of introducing a flag which is not used anywhere.
Do you think it is worth reverting current PR, extracting this flag PR and adding on top rest of the changes for the sake of the backporting?
UPD: After some discussions, I think there is no need in backporting this PR because ForwardingStage changes will not be ready in time since I need to prioritize something else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v2.2 Backport to v2.2 branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants