-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IGNITE-24484 RAFT heartbeat coalescing #5221
base: main
Are you sure you want to change the base?
Conversation
@@ -88,19 +93,78 @@ public Future<Message> requestVote(final PeerId peerId, final RequestVoteRequest | |||
|
|||
@Override | |||
public Future<Message> appendEntries(final PeerId peerId, final AppendEntriesRequest request, | |||
final int timeoutMs, final RpcResponseClosure<AppendEntriesResponse> done) { | |||
final int timeoutMs, final RpcResponseClosure<AppendEntriesResponse> done) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatting issues here and below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was formatted by our rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our rules forbid reformatting of jraft package completely
@@ -198,7 +198,7 @@ void executesWrongJobClassOnRemoteNodesAsync(String jobClassName, int errorCode, | |||
JobTarget.anyNode(clusterNode(node(1)), clusterNode(node(2))), | |||
JobDescriptor.builder(jobClassName).units(units()).build(), | |||
null | |||
).get(1, TimeUnit.SECONDS)); | |||
).get(10, TimeUnit.SECONDS)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this required ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Computer tasks become slower. I do not think exactly why, but running any task takes more time than before the patch. And also compute that sute on TC becomes longer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definintely a blocker for PR merging.
|
||
if (connect(peerId)) { // Replicator should be started asynchronously by node joined event. | ||
if (isHeartbeatRequest(request)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is already done in the upper level in
org.apache.ignite.raft.jraft.core.Replicator#sendEmptyEntries(boolean, org.apache.ignite.raft.jraft.rpc.RpcResponseClosure<org.apache.ignite.raft.jraft.rpc.RpcRequests.AppendEntriesResponse>, boolean)
Just call sendHeartbeat from this method.
if (isHeartbeat) {
....
this.heartbeatInFly = this.rpcService.sendHeartbeat(this.options.getPeerId().getEndpoint(), request, coalesce,
this.options.getElectionTimeoutMs() / 2, heartbeatDone);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, but it would be more complicated due to component dependencies. We already use this manner of
checking in AppendEntriesRequestProcessor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was implemented this way in the PoC without any dependency issues.
Ok. let's keep it as is, but at least remove copypaste by moving isHearbeat(..) to utitlities class.
* @param request Append entries request. | ||
* @return True if that request is heartbeat or false otherwise. | ||
*/ | ||
private static boolean isHeartbeatRequest(final AppendEntriesRequest request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed. See the comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as AppendEntriesRequestProcessor#isHeartbeatRequest
final Executor rpcExecutor) { | ||
final RpcClient rc = this.rpcClient; | ||
public <T extends Message> CompletableFuture<Message> invokeWithDone( | ||
PeerId peerId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid reformatting raft code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I applied formatting that is used in our entire project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatting of jraft code is forbidden by style rules.
* | ||
* @return The future. | ||
*/ | ||
CompletableFuture<Message> invokeAsync( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not able to understand why this interface is introduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface is requested to use it for coalessing requests. This interface allows us to not immediately send messages to the network. Instead, you can handle the message manually.
org.apache.ignite.raft.jraft.rpc.impl.core.DefaultRaftClientService#sendHeartbeat
scheduler = opts.getScheduler(); | ||
messagesFactory = opts.getRaftMessagesFactory(); | ||
|
||
scheduler.schedule(this::onSentHeartbeat , opts.getElectionTimeoutMs(), TimeUnit.MILLISECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This scheme is not efficient.
Currently each raft group generate it's own timer event in r.startHeartbeatTimer(Utils.nowMs());
Which means events multiplication linear to groups number and excessive CPU usage.
Instead, a single event per all groups can be triggered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably we can create another ticket for this? Or you insist to do this improvement in this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with another ticket.
7e8bf95
to
788341c
Compare
https://issues.apache.org/jira/browse/IGNITE-24484