[Test] Allow allocation in mixed cluster #129680

Merged
merged 3 commits into elastic:main from es-129644-fix
Jun 19, 2025

ywangd
Member

@ywangd ywangd commented Jun 19, 2025

The RunningSnapshotIT upgrade test adds shutdown markers to all nodes and removes them once all nodes are upgraded. If an index gets created in a mixed cluster, for example by ILM or deprecation messages, the index cannot be allocated because all nodes are shutting down. Since the cluster ready check between node upgrades expects a yellow cluster, the unassigned index prevents the ready check from succeeding, and the check eventually times out. This PR fixes it by removing the shutdown marker from the first upgraded node so that it can host new indices.

Resolves: #129644
Resolves: #129645
Resolves: #129646
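
The failure mode described above can be illustrated with a small toy model. This is only a sketch and not the actual test code: the three-node cluster, the `Node` class, and the helper names are assumptions made for illustration, and the allocation rule is simplified to "a new index needs at least one node without a shutdown marker".

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    # The test puts a shutdown marker on every node before the upgrade starts.
    shutdown_marker: bool = True


def can_allocate_new_index(nodes):
    """Simplified rule: a new index needs one node that is not shutting down."""
    return any(not node.shutdown_marker for node in nodes)


def ready_check_passes(nodes, new_index_created):
    """Toy stand-in for the ready check that waits for a yellow cluster.

    If a new index exists but cannot be allocated, the check never passes.
    """
    if not new_index_created:
        return True
    return can_allocate_new_index(nodes)


def upgrade_first_node(nodes, remove_marker):
    """Hypothetical helper: upgrade node 0 and optionally clear its marker."""
    if remove_marker:
        nodes[0].shutdown_marker = False


# Before the fix: every node keeps its marker, so a newly created index is stuck.
before_fix = [Node("node-0"), Node("node-1"), Node("node-2")]
upgrade_first_node(before_fix, remove_marker=False)
stuck_before_fix = not ready_check_passes(before_fix, new_index_created=True)

# After the fix: the first upgraded node can host the new index.
after_fix = [Node("node-0"), Node("node-1"), Node("node-2")]
upgrade_first_node(after_fix, remove_marker=True)
passes_after_fix = ready_check_passes(after_fix, new_index_created=True)
```

In this model the ready check only fails when an index is created mid-upgrade, which matches the intermittent nature of the failures in the linked issues.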

@ywangd ywangd requested a review from nicktindall June 19, 2025 04:21
@ywangd ywangd added >test Issues or PRs that are addressing/adding tests :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v9.1.0 labels Jun 19, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Jun 19, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Contributor

@nicktindall nicktindall left a comment

Nice, so I assume this still doesn't allow the snapshot to complete during the upgrade because there will be 2 shards that can't be assigned, because only 1 node has no shutdown marker?

@ywangd
Member Author

ywangd commented Jun 19, 2025

> there will be 2 shards that can't be assigned

Those two shards remain on their initial nodes. They are not unassigned because they are not new shards. The snapshot cannot complete because:

  1. We still have 2 nodes hosting shards that are shutting down
  2. The 2 shards cannot move anywhere because the index is created with 1 shard per node

So yeah, the snapshot can complete only when all nodes are upgraded and the remaining shutdown markers are removed.
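
The two conditions above can be sketched as a toy model. It is not Elasticsearch code: the node names, the function names, and the `max_shards_per_node` parameter are hypothetical, and the completion rule is taken directly from this comment (a snapshot waits while any shard of the index sits on a node that is shutting down).

```python
def can_relocate(source, shard_hosts, shutting_down, all_nodes, max_shards_per_node=1):
    """Return True if the shard on `source` has somewhere else to go.

    A target must not be shutting down and must be below the per-node
    shard limit that the index was created with.
    """
    for target in all_nodes:
        if target == source or target in shutting_down:
            continue
        if shard_hosts.count(target) < max_shards_per_node:
            return True
    return False


def snapshot_can_complete(shard_hosts, shutting_down):
    """Toy rule: no shard of the index may sit on a shutting-down node."""
    return not any(host in shutting_down for host in shard_hosts)


all_nodes = ["node-0", "node-1", "node-2"]
shard_hosts = ["node-0", "node-1", "node-2"]  # one shard per node

# Mid-upgrade with this PR applied: only node-0 has lost its marker.
shutting_down = {"node-1", "node-2"}

# Shards on marked nodes that cannot move: node-0 already holds its one shard.
stuck_shards = [
    host
    for host in shard_hosts
    if host in shutting_down
    and not can_relocate(host, shard_hosts, shutting_down, all_nodes)
]
blocked_mid_upgrade = not snapshot_can_complete(shard_hosts, shutting_down)

# Once every marker is removed, nothing holds the snapshot back.
completes_after_cleanup = snapshot_can_complete(shard_hosts, set())
```

Here `stuck_shards` contains the shards on `node-1` and `node-2`: they stay assigned to their original nodes, but the snapshot still has to wait for them.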

@ywangd ywangd added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jun 19, 2025
@ywangd
Member Author

ywangd commented Jun 19, 2025

@elasticmachine update branch

@elasticsearchmachine elasticsearchmachine merged commit 6858c32 into elastic:main Jun 19, 2025
27 checks passed
@ywangd ywangd deleted the es-129644-fix branch June 19, 2025 10:13
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Coordination Meta label for Distributed Coordination team >test Issues or PRs that are addressing/adding tests v9.1.0