Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add guard to setMode for NTEs #598

Open
wants to merge 6 commits into
base: palantir-cassandra-2.2.18
Choose a base branch
from
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 19 additions & 2 deletions src/java/org/apache/cassandra/service/StorageService.java
Original file line number Diff line number Diff line change
Expand Up @@ -1543,6 +1543,13 @@ private void setMode(Mode m, boolean log)
@VisibleForTesting
void setMode(Mode m, @Safe String msg, boolean log)
{
if (operationMode == Mode.NON_TRANSIENT_ERROR && m != Mode.NON_TRANSIENT_ERROR) {
if (log)
logger.warn("Attempted to change mode from NTE to non-NTE", SafeArg.of("attemptedSetMode", m), SafeArg.of("msg", msg));
else
logger.debug("Attempted to change mode from NTE to non-NTE", SafeArg.of("attemptedSetMode", m), SafeArg.of("msg", msg));
return;
Copy link
Contributor

@inespot inespot Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify: log with warn no matter the value of log, I think this is always worth a warning log

}
operationMode = m;
if (log)
logger.info(m.toString(), SafeArg.of("msg", msg));
Expand Down Expand Up @@ -1599,9 +1606,19 @@ private boolean bootstrap(final Collection<Token> tokens)
ListenableFuture<StreamState> bootstrapStream = bootstrapper.bootstrap(streamStateStore, !replacing && useStrictConsistency); // handles token update
try
{
bootstrapStream.get();
while (true) {
if (bootstrapStream.isDone()) {
bootstrapStream.get();
logger.info("Bootstrap streaming completed for tokens {}", tokens);
break;
}
if (hasNonTransientError(NonTransientError.BOOTSTRAP_ERROR)) {
logger.info("Stopped waiting for bootstrap streaming to complete because detected a bootstrap error.", SafeArg.of("nonTransientErrors", getNonTransientErrors()));
break;
}
Uninterruptibles.sleepUninterruptibly(1, TimeUnit.SECONDS);
}
Copy link
Contributor

@inespot inespot Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to do less aggressive polling while keeping the reactivity by doing something like

while (true) {
    try {
        bootstrapStream.get(timeBetweenCheck, TimeUnit.SECONDS);
        logger.info("Bootstrap streaming completed for tokens {}", tokens);
        break;
    } catch (TimeoutException e) {
        if (hasNonTransientError(NonTransientError.BOOTSTRAP_ERROR)) {
            logger.info("Aborting waiting for bootstrap streaming to complete because detected a bootstrap error.",
                    SafeArg.of("nonTransientErrors", getNonTransientErrors()));
            break;
        }
    } catch (Exception e) {
        logger.error("Error while waiting for bootstrap stream to complete", e);
        break;
    }
}

If we stop waiting for the streams, is there a way to also notify peers/shut down stream early? Or will the following disableNode handle this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, why is this less aggressive polling? if possible, i'd like to avoid using exceptions for control flow and i think i don't recognize the benefit here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notify peers/shut down stream early

i'll test this tomorrow to see if disableNode kills the streams as we would hope. with operator, the node will be deleted after entering NTE anyway though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disableNode will not kill the streams. i just realized that in the original instance that this happened, the streams continued even after the NTE blip

isBootstrapMode = false;
logger.info("Bootstrap streaming completed for tokens {}", tokens);
return !StorageService.instance.hasNonTransientError(StorageServiceMBean.NonTransientError.BOOTSTRAP_ERROR);
}
catch (Throwable e)
Expand Down