Skip to content

Implement Delete-by-Query operation after Reshard #125519

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

ankikuma
Copy link
Contributor

No description provided.

@elasticsearchmachine elasticsearchmachine added v9.1.0 serverless-linked Added by automation, don't add manually labels Mar 24, 2025
@ankikuma ankikuma marked this pull request as ready for review June 6, 2025 03:55
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Jun 6, 2025
@ankikuma ankikuma added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. >non-issue Team:Distributed Indexing Meta label for Distributed Indexing team labels Jun 9, 2025
@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Jun 9, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

Comment on lines +3614 to +3620

public void deleteByQuery(ShardSplittingQuery query) throws Exception {
// System.out.println("Delete documents using ShardSplitQuery");
indexWriter.deleteDocuments(query);
indexWriter.flush();
indexWriter.commit();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can probably do this in stateless and avoid the need to publish this API here while we're incubating.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also I'd probably move the flush/commit out of the delete operation itself and let something higher up decide when it needs to schedule those.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, indexWriter is private, sorry.

It's interesting that this works, given InternalEngine.createWriter creates an AssertingIndexWriter in test, which is meant to throw when deleteDocuments is called. I'm guessing that not throwing on the query-taking variant of deleteDocuments was an oversight.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I don't know if the flush and commit at this level is sufficient to allow us to move the split on this shard to DONE. When we do that, we're also going to drop the search filter for unowned documents, which means the search nodes need to be using the state we've just flushed. I think that implies we should be doing a refresh.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. >non-issue serverless-linked Added by automation, don't add manually Team:Distributed Indexing Meta label for Distributed Indexing team v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants