SegmentRefresh task compatibility with UpsertCompactMerge task #14633

Open
tibrewalpratik17 opened this issue Dec 11, 2024 · 0 comments
@tibrewalpratik17
Contributor

Creating this issue from comment: #14477 (comment)

The current approach for determining the latest uploaded segment in the UploadedRealtimeSegment refresh process relies on comparing segment creation times. However, this method introduces an edge case where segments with identical creation times can lead to incorrect metadata resolution and inconsistencies in the compact-merge process.

Specific Scenario:

  • Consider the following: Uploaded Segment U1, LLC Segments LLC1 and LLC2, which are merged to form a new uploaded segment U2 via UpsertCompactMerge task.

  • If the creation time for both U1 and U2 is identical, and the SegmentRefresh task refreshes U1 after U2 has been uploaded, the keys from U1 will dominate in the metadata manager.

  • As a result: U1 is incorrectly marked as refreshed in ZooKeeper (ZK) for U2. The system mistakenly considers U1 as already merged, preventing it from being picked again for the compact-merge process.

This issue can lead to metadata inconsistency, where the latest segment (U2) does not dominate despite being the most recent merge result. Though there are no data consistency issues as such, we still need U1 to be merged in the future, but that will not happen until U2 is deleted and its ZK metadata is cleared.
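
To make the tie concrete, here is a minimal sketch of the scenario. It is not Pinot code: `Segment`, `KeyOwners`, and `ownerAfterRefresh` are hypothetical names, and the rule "the incoming segment takes a key when its creation time is greater than or equal to the current owner's" is an assumption chosen to mirror the behaviour described above.

```java
import java.util.HashMap;
import java.util.Map;

final class CreationTimeTieSketch {

    // Hypothetical stand-in for an uploaded segment; not a Pinot class.
    static final class Segment {
        final String name;
        final long creationTimeMs;

        Segment(String name, long creationTimeMs) {
            this.name = name;
            this.creationTimeMs = creationTimeMs;
        }
    }

    // Hypothetical primary key -> owning segment map, loosely modelling
    // what the metadata manager tracks for an upsert table.
    static final class KeyOwners {
        private final Map<String, Segment> ownerByKey = new HashMap<>();

        // ASSUMPTION: the incoming segment wins a key when its creation
        // time is >= the current owner's, so a tie goes to whichever
        // segment was processed last.
        void addOrRefresh(Segment incoming, String... keys) {
            for (String key : keys) {
                Segment current = ownerByKey.get(key);
                if (current == null || incoming.creationTimeMs >= current.creationTimeMs) {
                    ownerByKey.put(key, incoming);
                }
            }
        }

        String ownerOf(String key) {
            return ownerByKey.get(key).name;
        }
    }

    // Replays the order from the scenario: U1 uploaded, U2 uploaded
    // (merge result), then U1 refreshed. Returns the final owner of one key.
    static String ownerAfterRefresh(long u1CreationTimeMs, long u2CreationTimeMs) {
        KeyOwners owners = new KeyOwners();
        Segment u1 = new Segment("U1", u1CreationTimeMs);
        Segment u2 = new Segment("U2", u2CreationTimeMs);

        owners.addOrRefresh(u1, "pk1"); // U1 uploaded
        owners.addOrRefresh(u2, "pk1"); // U2 uploaded by the compact-merge task
        owners.addOrRefresh(u1, "pk1"); // SegmentRefresh task refreshes U1
        return owners.ownerOf("pk1");
    }

    public static void main(String[] args) {
        // Identical creation times: the refreshed U1 takes the key back.
        System.out.println(ownerAfterRefresh(1000L, 1000L)); // prints "U1"
        // U2 strictly newer: U2 keeps the key, as intended.
        System.out.println(ownerAfterRefresh(1000L, 2000L)); // prints "U2"
    }
}
```

Under this assumed rule, the outcome depends only on processing order when the creation times are equal, which is the edge case described above.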
