Implement Delete-by-Query operation after Reshard #125519

Open. Wants to merge 8 commits into base: main.
@@ -94,6 +94,7 @@
import org.elasticsearch.index.seqno.SequenceNumbers;
import org.elasticsearch.index.shard.ShardId;
import org.elasticsearch.index.shard.ShardLongFieldRange;
import org.elasticsearch.index.shard.ShardSplittingQuery;
import org.elasticsearch.index.store.Store;
import org.elasticsearch.index.translog.Translog;
import org.elasticsearch.index.translog.TranslogConfig;
@@ -3610,4 +3611,11 @@ protected long estimateMergeBytes(MergePolicy.OneMerge merge) {
return 0L;
}
}

public void deleteByQuery(ShardSplittingQuery query) throws Exception {
// System.out.println("Delete documents using ShardSplitQuery");
indexWriter.deleteDocuments(query);
indexWriter.flush();
indexWriter.commit();
}
Comment on lines +3614 to +3620

Contributor:
I think we can probably do this in stateless and avoid the need to publish this API here while we're incubating.

Contributor:
also I'd probably move the flush/commit out of the delete operation itself and let something higher up decide when it needs to schedule those.
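The separation suggested above can be sketched with a toy writer (`FakeWriter` and the free-standing `deleteByQuery` here are illustrative stand-ins, not the real Engine/IndexWriter API): the delete call only marks documents, and the code one level up batches several deletes before scheduling a single flush and commit:

```java
import java.util.ArrayList;
import java.util.List;

class DeleteFlushSketch {
    // Minimal stand-in for an index writer that records operations in order.
    static class FakeWriter {
        final List<String> ops = new ArrayList<>();
        void deleteDocuments(String query) { ops.add("delete:" + query); }
        void flush()  { ops.add("flush"); }
        void commit() { ops.add("commit"); }
    }

    // The delete operation only marks documents; it never flushes or commits.
    static void deleteByQuery(FakeWriter writer, String query) {
        writer.deleteDocuments(query);
    }

    public static void main(String[] args) {
        FakeWriter writer = new FakeWriter();
        // Higher-level code decides when durability work happens: here it
        // batches two deletes behind one flush/commit pair.
        deleteByQuery(writer, "shard-0-unowned");
        deleteByQuery(writer, "shard-1-unowned");
        writer.flush();
        writer.commit();
        System.out.println(writer.ops);
        // → [delete:shard-0-unowned, delete:shard-1-unowned, flush, commit]
    }
}
```

The design point is that flush and commit are comparatively expensive, so letting the caller schedule them allows several delete-by-query passes to share one durability cycle.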

Contributor:
Ah, indexWriter is private, sorry.

It's interesting that this works, given InternalEngine.createWriter creates an AssertingIndexWriter in test, which is meant to throw when deleteDocuments is called. I'm guessing that not throwing on the query-taking variant of deleteDocuments was an oversight.

Contributor:
By the way, I don't know if the flush and commit at this level is sufficient to allow us to move the split on this shard to DONE. When we do that, we're also going to drop the search filter for unowned documents, which means the search nodes need to be using the state we've just flushed. I think that implies we should be doing a refresh.
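The visibility concern above can be sketched with a toy shard (`FakeShard` and its counters are illustrative, not the real shard API): a commit makes the deletes durable, but searchers keep their current point-in-time view until a refresh opens a new one, so dropping the unowned-document search filter is only safe after a refresh:

```java
class RefreshSketch {
    static class FakeShard {
        int committedDocs = 10;   // documents durably on disk
        int searchableDocs = 10;  // documents visible to the current searcher
        void deleteByQuery(int deleted) { committedDocs -= deleted; }
        void commit() { /* persists the deletes; searcher view is unchanged */ }
        void refresh() { searchableDocs = committedDocs; }
    }

    public static void main(String[] args) {
        FakeShard shard = new FakeShard();
        shard.deleteByQuery(4); // remove unowned documents after the split
        shard.commit();
        // Without a refresh, searches still see all 10 documents, so the
        // unowned-document filter must stay in place:
        System.out.println(shard.searchableDocs); // → 10
        shard.refresh();
        // Only now do searchers see the post-split state:
        System.out.println(shard.searchableDocs); // → 6
    }
}
```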

}
@@ -52,13 +52,13 @@
* It can be used to split a shard into N shards marking every document that doesn't belong into the shard
* as deleted. See {@link org.apache.lucene.index.IndexWriter#deleteDocuments(Query...)}
*/
final class ShardSplittingQuery extends Query {
public final class ShardSplittingQuery extends Query {
private final IndexMetadata indexMetadata;
private final IndexRouting indexRouting;
private final int shardId;
private final BitSetProducer nestedParentBitSetProducer;

ShardSplittingQuery(IndexMetadata indexMetadata, int shardId, boolean hasNested) {
public ShardSplittingQuery(IndexMetadata indexMetadata, int shardId, boolean hasNested) {
this.indexMetadata = indexMetadata;
this.indexRouting = IndexRouting.fromIndexMetadata(indexMetadata);
this.shardId = shardId;
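The javadoc above says the query marks every document that does not belong to a given shard as deleted. The ownership idea behind that can be sketched with a simplified routing function (`targetShard` using `String.hashCode` is a stand-in; the real IndexRouting uses a different hash and the index metadata):

```java
import java.util.List;
import java.util.stream.Collectors;

class SplitSketch {
    // Simplified routing: which of numShards shards owns this document.
    static int targetShard(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // The documents a split-style query would delete from shardId:
    // everything NOT routed to that shard.
    static List<String> unowned(List<String> docs, int shardId, int numShards) {
        return docs.stream()
                   .filter(d -> targetShard(d, numShards) != shardId)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> docs = List.of("a", "b", "c", "d");
        for (int shard = 0; shard < 2; shard++) {
            System.out.println("shard " + shard + " deletes: " + unowned(docs, shard, 2));
        }
        // → shard 0 deletes: [a, c]
        // → shard 1 deletes: [b, d]
    }
}
```

Each document is owned by exactly one target shard, so running the deletion on every shard removes each document from all shards but its owner, which is what makes the split correct.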