Implement Delete-by-Query operation after Reshard #125519

ankikuma · 2025-03-24T17:36:40Z

No description provided.

elasticsearchmachine · 2025-06-09T13:11:04Z

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

bcully · 2025-06-17T00:07:25Z

server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+
+    public void deleteByQuery(ShardSplittingQuery query) throws Exception {
+        // System.out.println("Delete documents using ShardSplitQuery");
+        indexWriter.deleteDocuments(query);
+        indexWriter.flush();
+        indexWriter.commit();
+    }


I think we can probably do this in stateless and avoid the need to publish this API here while we're incubating.

Also, I'd probably move the flush/commit out of the delete operation itself and let a higher-level caller decide when to schedule them.
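Here is a minimal sketch of the separation the reviewer suggests. It assumes hypothetical `ShardWriter`, `RecordingWriter`, `SplitCleanup`, and `CleanupCoordinator` types, none of which are Elasticsearch or Lucene APIs, and uses a `String` as a stand-in for the query. The engine-level method only issues the delete. A caller one level up decides when to flush and commit, and in this sketch it does so once after a batch of deletes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the engine's writer (not Lucene's IndexWriter).
interface ShardWriter {
    void deleteByQuery(String query);
    void flush();
    void commit();
}

// Records calls so the ordering chosen by the caller is observable.
class RecordingWriter implements ShardWriter {
    final List<String> calls = new ArrayList<>();
    public void deleteByQuery(String query) { calls.add("delete:" + query); }
    public void flush() { calls.add("flush"); }
    public void commit() { calls.add("commit"); }
}

// Engine-level operation: only issues the delete, with no durability side effects.
class SplitCleanup {
    static void deleteUnowned(ShardWriter writer, String query) {
        writer.deleteByQuery(query);
    }
}

// Hypothetical higher-level caller: decides when to flush and commit,
// here once after all deletes in the batch have been issued.
class CleanupCoordinator {
    static void run(ShardWriter writer, List<String> queries) {
        for (String q : queries) {
            SplitCleanup.deleteUnowned(writer, q);
        }
        writer.flush();
        writer.commit();
    }
}
```

Keeping durability out of the delete lets the caller put several deletes behind a single commit, or fold the commit into whatever flush or refresh schedule the shard already follows.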

Ah, indexWriter is private, sorry.

It's interesting that this works, given that InternalEngine.createWriter creates an AssertingIndexWriter in tests, which is meant to throw when deleteDocuments is called. I'm guessing that not throwing on the query-taking variant of deleteDocuments was an oversight.
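The gap described above can be reproduced with a small sketch. It assumes the test writer guards only the term-based overload. `BaseWriter` and `AssertingWriter` are hypothetical classes that only mimic the shape of a writer with a by-term and a by-query delete overload. They are not Lucene's `IndexWriter` or Elasticsearch's `AssertingIndexWriter`, and the assertion message is made up. Overriding just one overload leaves the other unguarded, so code that deletes through the query variant never trips the assertion.

```java
// Hypothetical writer with two delete overloads, mimicking a by-term and a by-query variant.
class BaseWriter {
    long deleteByTerm(String... terms) { return terms.length; }
    long deleteByQuery(String... queries) { return queries.length; }
}

// Hypothetical test-only wrapper that is meant to forbid hard deletes,
// but only overrides the term-based overload.
class AssertingWriter extends BaseWriter {
    @Override
    long deleteByTerm(String... terms) {
        throw new AssertionError("hard deletes are not expected here");
    }
    // deleteByQuery is not overridden, so it falls through to BaseWriter unchecked.
}
```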

By the way, I don't know whether the flush and commit at this level are sufficient to let us move the split on this shard to DONE. When we do that, we're also going to drop the search filter for unowned documents, so the search nodes need to be reading the state we've just flushed. I think that implies we should be doing a refresh as well.
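Here is a sketch of the ordering this comment implies. It follows the reviewer's reasoning that a flush and commit alone do not guarantee that search nodes read the post-delete state. `ShardSplit` and its methods are hypothetical. Only the DONE transition and the unowned-document search filter come from the discussion. Marking the split DONE drops the filter, so the sketch refuses to do it until a refresh has happened.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one shard's split cleanup; not an Elasticsearch class.
class ShardSplit {
    final List<String> steps = new ArrayList<>();
    boolean searchersSeeDeletes = false;  // set only by refresh()
    boolean unownedFilterEnabled = true;  // search-side filter for unowned documents
    boolean done = false;

    void deleteUnowned() { steps.add("delete"); }

    // Assumed durable, but not assumed to be visible to search nodes yet.
    void flushAndCommit() { steps.add("flush+commit"); }

    // Assumed to make the post-delete state visible to searchers.
    void refresh() {
        steps.add("refresh");
        searchersSeeDeletes = true;
    }

    // Dropping the filter is only safe once searchers no longer see unowned documents.
    void markDone() {
        if (!searchersSeeDeletes) {
            throw new IllegalStateException("refresh before dropping the unowned-document filter");
        }
        unownedFilterEnabled = false;
        done = true;
        steps.add("done");
    }
}
```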

ankikuma added 5 commits March 21, 2025 13:29

Make ShardSplittingQuery public

1e7cf9d

Merge remote-tracking branch 'upstream/main' into 03182025/ReshardDel…

68794da

…eteByQuery Refresh

commit

233cea9

flush

1d08147

Merge remote-tracking branch 'upstream/main' into 03182025/ReshardDel…

3c6cdc6

…eteByQuery Refresh branch

elasticsearchmachine added v9.1.0 serverless-linked labels Mar 24, 2025

ankikuma added 3 commits June 4, 2025 22:18

refresh branch

86c9a46

remove print

393ca7e

Merge remote-tracking branch 'upstream/main' into 03182025/ReshardDel…

b7814c6

…eteByQuery Refresh branch

ankikuma marked this pull request as ready for review June 6, 2025 03:55

elasticsearchmachine added the needs:triage label Jun 6, 2025

ankikuma added :Distributed Indexing/Distributed, >non-issue, and Team:Distributed Indexing labels Jun 9, 2025

elasticsearchmachine removed the needs:triage label Jun 9, 2025

bcully reviewed Jun 17, 2025

bcully mentioned this pull request Jun 18, 2025

Add deleteByQuery to InternalEngine #129679

Merged
