The current approach for determining the latest uploaded segment in the UploadedRealtimeSegment refresh process relies on comparing segment creation times. However, this method introduces an edge case where segments with identical creation times can lead to incorrect metadata resolution and inconsistencies in the compact-merge process.
Specific Scenario:
Consider an uploaded segment U1 and LLC segments LLC1 and LLC2, which are merged into a new uploaded segment U2 by an UpsertCompactMerge task.
If the creation time for both U1 and U2 is identical, and the SegmentRefresh task refreshes U1 after U2 has been uploaded, the keys from U1 will dominate in the metadata manager.
As a result: U1 is incorrectly marked as refreshed in Zookeeper (ZK) for U2. The system mistakenly considers U1 as already merged, preventing it from being picked again for the compact-merge process.
This issue can lead to metadata inconsistency, where the latest segment (U2) does not dominate despite being the most recent merge result. Although this causes no data consistency issues as such, U1 still needs to be merged in the future, and that will not happen until U2 is deleted and its ZK metadata is cleared.
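The tie described above can be reproduced with a small, self-contained sketch. It is not Pinot code: `CreationTimeTieSketch`, `Segment`, `addOrRefresh`, and `ownerOf` are hypothetical names, and the rule that an incoming segment wins when its creation time is greater than or equal to the current owner's is an assumption made only to illustrate how processing order decides the outcome when creation times are equal.

```java
import java.util.HashMap;
import java.util.Map;

class CreationTimeTieSketch {
    // Hypothetical stand-in for an uploaded segment; not a Pinot class.
    static final class Segment {
        final String name;
        final long creationTimeMs;

        Segment(String name, long creationTimeMs) {
            this.name = name;
            this.creationTimeMs = creationTimeMs;
        }
    }

    // Hypothetical stand-in for the metadata manager: primary key -> owning segment.
    private final Map<String, Segment> keyOwner = new HashMap<>();

    // Assumed rule: the incoming segment takes the key when its creation time
    // is not older than the current owner's. On a tie, whichever segment is
    // processed last wins.
    void addOrRefresh(Segment incoming, String primaryKey) {
        keyOwner.merge(primaryKey, incoming, (current, candidate) ->
            candidate.creationTimeMs >= current.creationTimeMs ? candidate : current);
    }

    String ownerOf(String primaryKey) {
        Segment owner = keyOwner.get(primaryKey);
        return owner == null ? null : owner.name;
    }

    public static void main(String[] args) {
        // Identical creation times: U2 is uploaded first, then U1 is refreshed.
        CreationTimeTieSketch tied = new CreationTimeTieSketch();
        tied.addOrRefresh(new Segment("U2", 1000L), "pk1");
        tied.addOrRefresh(new Segment("U1", 1000L), "pk1");
        System.out.println("tie: " + tied.ownerOf("pk1"));

        // Distinct creation times: the later refresh of U1 no longer displaces U2.
        CreationTimeTieSketch distinct = new CreationTimeTieSketch();
        distinct.addOrRefresh(new Segment("U2", 1001L), "pk1");
        distinct.addOrRefresh(new Segment("U1", 1000L), "pk1");
        System.out.println("distinct: " + distinct.ownerOf("pk1"));
    }
}
```

Under this assumed rule, the tied case leaves U1 owning the key even though U2 is the merge result, while the case with distinct creation times keeps U2 as the owner.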
Creating this issue from comment: #14477 (comment)