Skip to content

Refactor compaction to use a stream #32784

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

DAlperin
Copy link
Member

@DAlperin DAlperin commented Jun 21, 2025

This is a precursor to incremental compaction

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@DAlperin DAlperin requested a review from a team as a code owner June 21, 2025 19:25
@DAlperin DAlperin force-pushed the dov/compact-use-stream branch from 0184deb to bb3caa0 Compare June 21, 2025 19:29
@DAlperin DAlperin requested a review from bkirwi June 22, 2025 22:58
@DAlperin DAlperin force-pushed the dov/compact-use-stream branch from bb3caa0 to 643a59a Compare June 22, 2025 23:43
Copy link
Contributor

@bkirwi bkirwi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📈!!

let compact_span = debug_span!("compact::consolidate");
let res = tokio::time::timeout(
timeout,
// Compaction is cpu intensive, so be polite and spawn it on the isolated runtime.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lost a comment here.

e
)
})?
.map_err(|e| anyhow!(e))?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we're replacing the old compact method with a composition of the three new methods on the "main" codepath, but all the tests and admin tools still hit the old method. This means that we're keeping around some almost-dead code, and we're not giving the new stuff as much coverage as we otherwise would.

IMO it would be reasonable to either allow flagging between the old and new path, or to just implement fn compact with calls to the new methods...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, compact is reimplemented in terms of these new functions

)
.await?;
let (parts, run_splits, run_meta, updates) =
(batch.parts, batch.run_splits, batch.run_meta, batch.len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is destructuring a batch just to rebuild it from clones of all the parts, which feels odd... I think we can just pass on the batch?


if updates == 0 {
continue;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessary / has no effect currently, right?

If so let's skip it... as soon as we're doing actual incremental compactions, we'll want to treat these zero-output runs the same as every other run.

isolated_runtime: Arc<IsolatedRuntime>,
req: CompactReq<T>,
write_schemas: Schemas<K, V>,
) -> impl Stream<Item = Result<CompactRes<T>, anyhow::Error>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have this return HollowBatch instead of CompactRes for now, since it is not actually safe to apply to the spine.

@DAlperin DAlperin force-pushed the dov/compact-use-stream branch from 643a59a to 8267469 Compare June 24, 2025 00:44
@DAlperin DAlperin force-pushed the dov/compact-use-stream branch from 8267469 to fd51aed Compare June 24, 2025 00:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants