-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add guard to setMode for NTEs #598
base: palantir-cassandra-2.2.18
Are you sure you want to change the base?
Conversation
please fix the formatting to match the Cassandra style. You should be able to use the auto formatter. |
logger.warn("Attempted to change mode from NTE to non-NTE", SafeArg.of("attemptedSetMode", m), SafeArg.of("msg", msg)); | ||
else | ||
logger.debug("Attempted to change mode from NTE to non-NTE", SafeArg.of("attemptedSetMode", m), SafeArg.of("msg", msg)); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplify: log with warn no matter the value of log
, I think this is always worth a warning log
break; | ||
} | ||
Uninterruptibles.sleepUninterruptibly(1, TimeUnit.SECONDS); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be able to do less aggressive polling while keeping the reactivity by doing something like
while (true) {
try {
bootstrapStream.get(timeBetweenCheck, TimeUnit.SECONDS);
logger.info("Bootstrap streaming completed for tokens {}", tokens);
break;
} catch (TimeoutException e) {
if (hasNonTransientError(NonTransientError.BOOTSTRAP_ERROR)) {
logger.info("Aborting waiting for bootstrap streaming to complete because detected a bootstrap error.",
SafeArg.of("nonTransientErrors", getNonTransientErrors()));
break;
}
} catch (Exception e) {
logger.error("Error while waiting for bootstrap stream to complete", e);
break;
}
}
If we stop waiting for the streams, is there a way to also notify peers/shut down stream early? Or will the following disableNode
handle this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, why is this less aggressive polling? if possible, i'd like to avoid using exceptions for control flow and i think i don't recognize the benefit here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
notify peers/shut down stream early
i'll test this tomorrow to see if disableNode
kills the streams as we would hope. with operator, the node will be deleted after entering NTE anyway though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
disableNode
will not kill the streams. i just realized that in the original instance that this happened, the streams continued even after the NTE blip
When a node enters non-transient error, it should never regain a normal operation mode (hence non-transient). This PR adds a guard for this and additionally short-circuits the bootstrap when the node enters NTE. This addition is necessary because the bootstrap NTE can occur from different threads than the one calling
bootstrapStream.get()
.