I'm experiencing an issue that seems similar to #2940 in that Lettuce does not seem to be resubscribing Sharded PubSub subscriptions automatically, except I'm using Lettuce 6.5.5.RELEASE, in which the referenced issue should have been fixed.
I may be misunderstanding how things are supposed to work or have misconfigured something, so if that's the case, please let me know.
Current Behavior
Assume that you have a number of applications connected to a Redis cluster with two shards, using .connectPubSub(...).async().ssubscribe(topic).... The subscriptions are distributed across the two shards.
Then, remove one shard (either manually, or via autoscaling policy, such as in an AWS ElastiCache deployment).
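For concreteness, the subscription setup described above can be sketched roughly as follows. This is a simplified version, not my exact configuration: the URI, channel name, and listener body are placeholders.

```java
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.models.partitions.RedisClusterNode;
import io.lettuce.core.cluster.pubsub.RedisClusterPubSubAdapter;
import io.lettuce.core.cluster.pubsub.StatefulRedisClusterPubSubConnection;

public class ShardedSubscriber {

    public static void main(String[] args) {
        // Placeholder URI; in my case this points at the cluster's configuration endpoint.
        RedisClusterClient client = RedisClusterClient.create("redis://localhost:6379");

        StatefulRedisClusterPubSubConnection<String, String> connection = client.connectPubSub();

        // As in my real listener, only message(...) is overridden.
        connection.addListener(new RedisClusterPubSubAdapter<String, String>() {
            @Override
            public void message(RedisClusterNode node, String channel, String message) {
                System.out.println(channel + ": " + message);
            }
        });

        // Sharded subscription; the slot of "my-topic" determines which shard serves it.
        connection.async().ssubscribe("my-topic");
    }
}
```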
By watching debug logs, it appears that when subscriptions are made with the regular non-sharded .subscribe call, once Lettuce is disconnected from the shard that is going away and reconnects to the new shard, it issues another SUBSCRIBE command. This can also be verified by connecting to the Redis cluster via the CLI and running CLIENT LIST (I can see the connection that was transferred over, and that the latest command run on that connection was subscribe) and PUBSUB CHANNELS.
However, when subscriptions are made with the sharded .ssubscribe, Lettuce reconnects to the new shard, but there are no debug logs indicating that an SSUBSCRIBE command was issued. Connecting via the CLI, CLIENT LIST shows that the application did successfully reconnect to the new shard, but the latest command is cluster|myid instead of ssubscribe, and PUBSUB SHARDCHANNELS shows only the subscriptions that were originally created on that shard and none from the transferred connections.
This difference in behavior (where SUBSCRIBE reconnects successfully, but SSUBSCRIBE does not) also applies to test-initiated failovers (initiated via the AWS ElastiCache console), with the same outcome.
The result is that some number of Sharded PubSub messages are lost because there are no active subscribers for those messages.
Input Code
I can paste my Lettuce client configuration if desired or if that would be helpful, in case you think this might be a problem with my configuration.
Expected behavior/code
Since SUBSCRIBE seems to automatically resubscribe on failovers and auto-scale-in for Redis clusters, I would have expected SSUBSCRIBE to also do the same.
Environment
Lettuce version(s): 6.5.5.RELEASE
Redis version: 7.2.4 (Valkey 8.0.1)
Additional context
As a tangentially related question, does Lettuce handle slot movement/rebalancing for SSUBSCRIBE, such as when nodes are added and slots are redistributed? I couldn't really find documentation on how slot movement works with Sharded Pub/Sub in general, and I'm not familiar enough with Lettuce and Redis to figure it out from reading the code, though I gave it a try. Mostly, my concern is whether it's something I'd need to implement myself, or whether it's handled by Lettuce.
I did some additional testing and created a simple Java console application, with only the Lettuce 6.5.5.RELEASE client and Netty 4.1.119.Final. I configured six RedisClusterClient instances: two clients handling publish and spublish respectively, two subscriber clients, and two sharded subscriber clients (for a total of four topics). I was able to confirm that sharded subscriptions can stop receiving messages both when adding new nodes to the ElastiCache cluster (scaling up) and when removing nodes (scaling down).
In both cases, the non-sharded subscriptions seemed unaffected, which differs from the sharded behavior.
If it isn't a configuration mistake on my end, and it isn't intended behavior (am I perhaps supposed to override the sunsubscribed method in my RedisPubSubListener, or listen for some topology refresh event, and manually re-establish the sharded subscription when slots move?), then this seems to be an unexpected bug of some kind.
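To illustrate the kind of manual workaround I have in mind, here is a minimal sketch, assuming that listening on the client's event bus for cluster topology changes is the right hook (the URI and channel name are placeholders, and re-issuing SSUBSCRIBE unconditionally is just the simplest possible strategy):

```java
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.event.ClusterTopologyChangedEvent;
import io.lettuce.core.cluster.pubsub.StatefulRedisClusterPubSubConnection;

public class ResubscribeWorkaround {

    public static void main(String[] args) {
        RedisClusterClient client = RedisClusterClient.create("redis://localhost:6379");
        StatefulRedisClusterPubSubConnection<String, String> connection = client.connectPubSub();

        connection.async().ssubscribe("my-topic");

        // On every topology change, blindly re-issue SSUBSCRIBE. This is redundant when
        // the subscription survived, but (if this approach is sound) it would restore the
        // subscription when the owning shard was removed or the slot moved.
        client.getResources().eventBus().get()
                .filter(ClusterTopologyChangedEvent.class::isInstance)
                .subscribe(event -> connection.async().ssubscribe("my-topic"));
    }
}
```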
Without pasting the code, the general options I have configured are:
- ClientResources using a custom DnsAddressResolverGroup to specify things like the NioDatagramChannel type, DnsNameResolverChannelStrategy.ChannelPerResolution, retries on timeout, TCP fallback, no DNS caching, etc.
- The implementation of RedisPubSubListener only overrides message(...)
- SocketOptions with extended keepalive options enabled, idle set to 30 seconds and an interval of a few seconds after idle
- TcpUserTimeout set to 30 seconds
- ConnectTimeout set to 10 seconds
- PeriodicRefresh set to 60 seconds
- DynamicRefreshSources set to false
- EnableAllAdaptiveRefreshTriggers set to true
- CloseStaleConnections set to true
- NodeFilter enabled, filtering out NodeFlag.FAIL and NodeFlag.EVENTUAL_FAIL
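The shape of that configuration is roughly the following sketch (durations match the list above; the custom DNS resolver setup is elided, and the keepalive interval is a stand-in for "a few seconds"):

```java
import java.time.Duration;

import io.lettuce.core.SocketOptions;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.models.partitions.RedisClusterNode.NodeFlag;

public class ClientOptionsSketch {

    public static ClusterClientOptions build() {
        SocketOptions socketOptions = SocketOptions.builder()
                .connectTimeout(Duration.ofSeconds(10))
                .keepAlive(SocketOptions.KeepAliveOptions.builder()
                        .enable()
                        .idle(Duration.ofSeconds(30))
                        .interval(Duration.ofSeconds(5)) // stand-in for "a few seconds"
                        .build())
                .tcpUserTimeout(SocketOptions.TcpUserTimeoutOptions.builder()
                        .enable()
                        .tcpUserTimeout(Duration.ofSeconds(30))
                        .build())
                .build();

        ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
                .enablePeriodicRefresh(Duration.ofSeconds(60))
                .dynamicRefreshSources(false)
                .enableAllAdaptiveRefreshTriggers()
                .closeStaleConnections(true)
                .build();

        return ClusterClientOptions.builder()
                .socketOptions(socketOptions)
                .topologyRefreshOptions(refreshOptions)
                // Filter out nodes flagged FAIL or EVENTUAL_FAIL.
                .nodeFilter(node -> !(node.getFlags().contains(NodeFlag.FAIL)
                        || node.getFlags().contains(NodeFlag.EVENTUAL_FAIL)))
                .build();
    }
}
```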