Cassandra helm chart scaling ring down #32130

Closed
sergio-moreira-everi opened this issue Feb 21, 2025 · 3 comments

@sergio-moreira-everi

sergio-moreira-everi commented Feb 21, 2025

Name and Version

bitnami/cassandra 12.1.1 image bitnami/cassandra:4.1.3-debian-11-r84

What is the problem this feature will solve?

More a question than a feature request.
If I want to scale down a ring, will the helm chart decommission nodes and ensure the data is properly transferred to the remaining nodes?
I have a ring with 4 'nodes' and a keyspace with replication factor 3; I want to scale it back down to 3 'nodes' but am worried about data loss.

What is the feature you are proposing to solve the problem?

more documentation than anything

github-actions bot added the triage (Triage is needed) label Feb 21, 2025
github-actions bot removed the triage (Triage is needed) label Feb 25, 2025
github-actions bot assigned migruiz4 and unassigned carrodher Feb 25, 2025
@migruiz4
Member

Hi @sergio-moreira-everi,

Currently, manual action would be required to remove a cassandra node from the cluster.

By default, cassandra nodes are configured with preStop lifecycle actions that run nodetool drain to gracefully stop the node before a restart.

Since the chart does not know when a node is being permanently scaled down, it is not possible to automatically run nodetool decommission to migrate data to other nodes.

Since you have replication, Cassandra has mechanisms to automatically detect that a node has been lost for too long and repair from the existing replicas, but if you would like to do it gracefully, you need to log into the pod and run the decommission command manually.

By running the decommission command, the readiness probe (nodetool status) will start to fail on the node, but the liveness probe (nodetool info) will continue to work, so the pod will stop receiving connections but will continue running until it is scaled down.
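
For illustration, a minimal sketch of that manual step, assuming a release named cassandra where cassandra-3 is the node being removed (adjust pod names, namespace and release name to your deployment):

```bash
# Decommission the highest-ordinal node (the one the StatefulSet removes
# first when scaling down); this streams its data to the remaining replicas.
kubectl exec cassandra-3 -- nodetool decommission

# Watch the ring from another node; cassandra-3 shows as UL (Up/Leaving)
# while streaming and eventually disappears from the output.
kubectl exec cassandra-0 -- nodetool status

# Once the node has left the ring, scale the release down to 3 replicas.
helm upgrade cassandra bitnami/cassandra --reuse-values --set replicaCount=3
```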

I hope this answered your question.

@migruiz4
Member

migruiz4 commented Mar 12, 2025

Forgot to mention: if you see the pod running into 'Error' status after you decommission it, that is expected.

The decommission command will stop the Cassandra process, and subsequent restarts will fail with the following error:

ERROR [main] 2025-03-12 11:40:46,600 CassandraDaemon.java:887 - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: This node was decommissioned and will not rejoin the ring unless -Dcassandra.override_decommission=true has been set, or all existing data is removed and the node is bootstrapped again

This is expected; it means the Cassandra process won't start again because it was decommissioned. If you wanted to restore the pod, e.g. cassandra-3, you would need to delete its PVC or set the option cassandra.override_decommission=true as the error mentions.
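
For example, and assuming your chart version exposes a jvmExtraOpts value for passing extra JVM flags (check the chart's values.yaml for the exact key; this is an assumption, not something verified against 12.1.1), the override could be set like this:

```bash
# Hypothetical sketch: pass the JVM flag through the chart's extra JVM options
# so the decommissioned node is allowed to rejoin the ring with its old data.
helm upgrade cassandra bitnami/cassandra --reuse-values \
  --set jvmExtraOpts="-Dcassandra.override_decommission=true"
```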

Important: PVCs are persisted after scaling down a StatefulSet (even after removing it). If a StatefulSet called cassandra finds an existing PVC called data-cassandra-3, the pod cassandra-3 will reuse it.
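
As a sketch, assuming the data-cassandra-3 claim name mentioned above (verify the actual name with kubectl get pvc first):

```bash
# List the claims left behind by the scaled-down StatefulSet.
kubectl get pvc

# Delete the decommissioned node's claim so that, if you ever scale back up,
# cassandra-3 bootstraps with an empty data directory instead of reusing the
# old decommissioned data.
kubectl delete pvc data-cassandra-3
```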

@sergio-moreira-everi
Author

Hey @migruiz4
Yes, I was particularly concerned that running the decommission command manually would cause the pod to be auto-killed for failing its health check, and then a new pod would get created that just restores the ring to the previous state. But if, as you say, we can decommission the node (pod) manually using nodetool and it just hangs around while the ring settles its data with the new topology, then I think that works well. Then we can just scale the replica count down and it should be fine.

And, as you say, if we decide to scale up again it's better to make sure the PVC is nuked for the decommissioned node so it makes a new one during the scaling process.

I think this answers my question. Thanks. Hope it helps others.
