Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjusting disconnectedBehavior Option to Prevent Timeout When Redis Shutdown #2866

Open
MagicalLas opened this issue May 29, 2024 · 6 comments
Labels
type: feature A new feature
Milestone

Comments

@MagicalLas
Copy link

MagicalLas commented May 29, 2024

Feature Request

Is your feature request related to a problem? Please describe

Sometimes need to shut down Redis for maintenance, such as version upgrades. When shutdown or kill redis-server, connection is disconnected immediatly. But some requests do not fail immediately but instead experience timeouts, increasing application latency.

Currently, our Redis client options are configured as follows:

ClientOptions options = ClientOptions.builder()
  .autoReconnect(true)
  .disconnectedBehavior(DisconnectedBehavior.REJECT_COMMANDS)
// ...

The DisconnectedBehavior.REJECT_COMMANDS option appears to cancel commands when the connection is lost. However, if autoReconnect is not set to false, commands in the CommandHandler.stack are not canceled but are placed into the disconnectedBuffer. Therefore, ongoing commands are not rejected if autoReconnect is true, even with the client option modified.

Describe the solution you'd like

Condition for canceling commands in the CommandHandler.stack should be based solely on rejectCommandsWhileDisconnected and not combined with autoReconnect.

(sample implementation)

Describe alternatives you've considered

  1. make other client options

adding a little more complexity, which can be beneficial for migrations such as versioning because they don't change existing behavior

  1. just disable reconnect option.

Adjusting the autoReconnect option can solve this issue immediately, but it would require implement custom reconnect connection.
I want to avoid writing custom code for reconnections by using the auto-reconnect feature.

Teachability, Documentation, Adoption, Migration Strategy

Maybe we need to update docs for disconnectedBehavior option.

@tishun
Copy link
Collaborator

tishun commented Jun 3, 2024

Related to #547 and specifically dde3748

@tishun
Copy link
Collaborator

tishun commented Jun 28, 2024

@MagicalLas ,

I've been thinking about this and I am not sure this is the right way to go.
I think there are two cases here that can't be served both:

  1. On one hand we can have a disconnect caused by network issues. In this case, with auto-reconnect enabled and reliability set to anything else than AT_MOST_ONCE I think the expectation would be to NOT reject the commands, but instead attempt to execute them again when you reconnect
  2. On the other hand in a maintenance case (such as yours) obviously the driver could not do that before the instance is up, which would lead to timeouts (similar to what you experience)

The problem is that the driver has no way to distinguish between these two different scenarios, without implementing some additional logic to notify it upon shutdown (e.g. push a pubsub message to a control channel that would let the driver disable auto-reconnect). As far as I know the Redis server does not currently send any notification when it is being gracefully shut down.

Without the above custom logic any configuration change you make could potentially compromise the first scenario (reducing timeout value, changing reliability setting, disable auto-reconnect).

@tishun tishun added the status: waiting-for-feedback We need additional information before we can continue label Jun 28, 2024
@MagicalLas
Copy link
Author

I agree with your view that if the reliability setting is AT_LEAST_ONE and autoReconnect is true, the expected behavior would be to retry the requests. However, we already specify options such as disconnectedBehavior (DisconnectedBehavior.REJECT_COMMANDS). Even with a reliability setting of AT_LEAST_ONE, it doesn't seem strange to me that commands queued during a disconnect are rejected according to DisconnectedBehavior.

When DisconnectedBehavior.REJECT_COMMANDS is set with a reliability level of AT_LEAST_ONE, new commands are rejected. How about applying the same logic to commands that are already in the queue?

So, I feel that it would not be a major issue if the settings for DisconnectedBehavior take precedence. Please let me know your thoughts.

@tishun
Copy link
Collaborator

tishun commented Jul 2, 2024

Scratch what I said in my last comment. You are correct:

    /**
     * Behavior of connections in disconnected state.
     */
    public enum DisconnectedBehavior {

        /**
         * Accept commands when auto-reconnect is enabled, reject commands when auto-reconnect is disabled.
         */
        DEFAULT,

        /**
         * Accept commands in disconnected state.
         */
        ACCEPT_COMMANDS,

        /**
         * Reject commands in disconnected state.
         */
        REJECT_COMMANDS,
    }

The ACCEPT_COMMAND and REJECT_COMMANDS both serve the purpose to override the default behaviour.

As such, when REJECT_COMMANDS is selected then we need to override any reconnect logic.

By the way - wouldn't this also mean that (in your case) - if you disconnect for a reason different than the version upgrade you would loose all the commands in the queue after reconnecting?

@tishun tishun removed the status: waiting-for-feedback We need additional information before we can continue label Jul 2, 2024
@tishun tishun added for: team-attention An issue we need to discuss as a team to make progress status: waiting-for-triage labels Jul 17, 2024
@tishun
Copy link
Collaborator

tishun commented Aug 8, 2024

@mp911de Seems to me this is a meaningful fix, what do you think?

@tishun tishun added type: feature A new feature and removed for: team-attention An issue we need to discuss as a team to make progress status: waiting-for-triage labels Aug 8, 2024
@tishun tishun added this to the 6.5.0.RELEASE milestone Aug 8, 2024
@mp911de
Copy link
Collaborator

mp911de commented Aug 8, 2024

I need to have another look, this seems more intricate in the details than anticipated.

@tishun tishun modified the milestones: 6.5.0.RELEASE, 6.6.0.RELEASE Nov 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: feature A new feature
Projects
None yet
Development

No branches or pull requests

3 participants