-
Notifications
You must be signed in to change notification settings - Fork 9.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: skip duplicated puts when calculating put return time #19383
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: joshuazh-x The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hi @joshuazh-x. Thanks for your PR. I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted filessee 34 files with indirect coverage changes @@ Coverage Diff @@
## main #19383 +/- ##
==========================================
- Coverage 68.98% 68.90% -0.09%
==========================================
Files 420 420
Lines 35739 35753 +14
==========================================
- Hits 24656 24635 -21
- Misses 9660 9695 +35
Partials 1423 1423 Continue to review full report in Codecov by Sentry.
|
@@ -214,6 +216,12 @@ func putReturnTime(allOperations []porcupine.Operation, reports []report.ClientR | |||
continue | |||
} | |||
kv := keyValue{Key: op.Put.Key, Value: op.Put.Value} | |||
if persistedPutCount[kv] > 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true that we don't check uniqueness of put request here, however on line 92 we check if request was sent uniquely by client. This should prevent patching this, or is this a case that put is unique from client perspective but was persisted twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 92 is called after putReturnTime
where duplicated requests in wal entries will be hit. Maybe we can make it simple by skipping persisted dup requests in PutReturnTime
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My idea was to keep calculations of maps like clientPutCount
and putReturnTime
independent. Instead of trying to check all conditions everywhere, they are checked once at the top. Have you confirmed that there is no duplicate in client requests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are dups in client requests and they match those in wal. But skipping them won't help because the incorrect return time does not come from the dup requests. The calculation logic in putReturnTime
must apply to non-dup requests, otherwise it will propagate the twisted return time to other non-dup requests.
How about dedup the persisted requests at the top?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow, can you add a test case that exemplifies this?
Investigation makes sense, however I think we should fix it in different place. |
Let me check why there is one client request but 2 persisted records. |
Signed-off-by: Joshua Zhang <[email protected]>
91980e8
to
2a6cad2
Compare
@@ -214,6 +214,9 @@ func putReturnTime(allOperations []porcupine.Operation, reports []report.ClientR | |||
continue | |||
} | |||
kv := keyValue{Key: op.Put.Key, Value: op.Put.Value} | |||
if count := persistedPutCount[kv]; count > 1 { | |||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment LN218 to fail the new test.
@@ -318,6 +318,25 @@ func TestPatchHistory(t *testing.T) { | |||
{Return: infinite + 99, Output: model.MaybeEtcdResponse{Persisted: true}}, | |||
}, | |||
}, | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test simulates retried txn during a leader transition and ensures that other txn correctly adjust their return times in response to duplicated persisted operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the fix, the second operation (key2, value2)'s return time will be adjusted to 149 which is less than its start time 200.
cc @serathius |
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.
issue #19303
Skip duplicated persisted put requests when adjusting put return time.