gc: Add GCStateManager
which is planned to be the replacement of SafePointManager
#9169
Signed-off-by: MyonKeminta <[email protected]>
pkg/gc/gc_state_manager.go
Outdated
var oldTxnSafePoint uint64
newTxnSafePoint := target
minBlocker := target
var blockingBarrier *endpoint.GCBarrier
var blockingMinStartTSOwner *string
Suggested change:
- var oldTxnSafePoint uint64
- newTxnSafePoint := target
- minBlocker := target
- var blockingBarrier *endpoint.GCBarrier
- var blockingMinStartTSOwner *string
+ var (
+ 	minBlocker              = target
+ 	newTxnSafePoint         = target
+ 	oldTxnSafePoint         uint64
+ 	blockingBarrier         *endpoint.GCBarrier
+ 	blockingMinStartTSOwner *string
+ )
continue
}

if barrier.IsExpired(now) {
I noticed that this comparison occurs during or after the etcd transaction. If a disk failure makes the read/write operation take an extremely long time, would that affect the judgment of the barrier's expiration time?
I didn't get it. Can you explain in detail the bad case you are concerned about?
As I see it, it's safe for the actual expiration time to be a little later than the specified expiration time, as long as a GC barrier is guaranteed to be gone once it's regarded as expired (it is deleted immediately in the current transaction if IsExpired returns true). A transaction that runs too long should not affect correctness or safety.
pkg/gc/gc_state_manager.go
Outdated
// saturatingDuration returns a duration calculated by multiplying the given `ratio` and `base`, truncated within the
// range [0, math.MaxInt64] to avoid negative value and overflowing.
func saturatingDuration(ratio int64, base time.Duration) time.Duration {
You don't use this function in this PR. How about removing it until it's actually used?
Oh, right. I will move it to the next PR, where it's used.
// Regard it as NullKeyspaceID if the given one is invalid (exceeds the valid range of keyspace id), no matter
// whether it exactly matches the NullKeyspaceID.
if keyspaceID & ^constant.ValidKeyspaceIDMask != 0 {
return constant.NullKeyspaceID, nil
How about adding a warning log here?
For keyspaces where the keyspace-level GC is not enabled, this might constantly happen and should not be treated as a warning.
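The mask check quoted above can be illustrated in isolation. The constant values below (a 24-bit valid range and an all-ones null ID) are assumptions made for this sketch rather than values quoted in the thread, and normalizeKeyspaceID is a hypothetical name; the real function also returns an error and may do more than this.

```go
package main

import "fmt"

const (
	// Assumed for illustration: keyspace IDs occupy the low 24 bits.
	validKeyspaceIDMask uint32 = 0xFFFFFF
	// Assumed for illustration: the null keyspace is the all-ones value.
	nullKeyspaceID uint32 = 0xFFFFFFFF
)

// normalizeKeyspaceID maps any ID with bits set outside the valid mask to the
// null keyspace, whether or not it is exactly equal to nullKeyspaceID.
func normalizeKeyspaceID(id uint32) uint32 {
	if id&^validKeyspaceIDMask != 0 {
		return nullKeyspaceID
	}
	return id
}

func main() {
	fmt.Println(normalizeKeyspaceID(42))                           // in range, kept as-is
	fmt.Println(normalizeKeyspaceID(0x1000000) == nullKeyspaceID)  // bit 24 set, out of range
	fmt.Println(normalizeKeyspaceID(nullKeyspaceID) == nullKeyspaceID)
}
```

Because out-of-range IDs are folded silently, callers that routinely pass such values do not generate log noise, which matches the reason given for not adding a warning.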
return m.advanceGCSafePointImpl(keyspaceID, target, false)
}

func (m *GCStateManager) advanceGCSafePointImpl(keyspaceID uint32, target uint64, compatible bool) (oldGCSafePoint uint64, newGCSafePoint uint64, err error) {
Where is this called with compatible=true?
Not yet included in this PR.
See: #9134
pkg/gc/gc_state_manager.go
Outdated
downgradeCompatibleMode := false

var oldTxnSafePoint uint64
newTxnSafePoint := target
It seems that this assignment is meaningless
if downgradeCompatibleMode {
err1 = wb.SetGCBarrier(keyspaceID, endpoint.NewGCBarrier(keypath.GCWorkerServiceSafePointID, newTxnSafePoint, nil))
if err1 != nil {
return err1
}
I am not sure: do we need to reset downgradeCompatibleMode=false here?
It doesn't look necessary, as the transaction will exit soon, and the variable is also needed in the logs later.
zap.Uint64("newTxnSafePoint", newTxnSafePoint), zap.String("blocker", blockerDesc),
zap.Bool("downgradeCompatibleMode", downgradeCompatibleMode))
} else {
log.Info("txn safe point advancement unable to be blocked by the minimum blocker",
If I understand you correctly, when newTxnSafePoint != minBlocker, newTxnSafePoint = oldTxnSafePoint. So would this be better?
Suggested change:
- log.Info("txn safe point advancement unable to be blocked by the minimum blocker",
+ log.Info("txn safe point can't be advanced due to the minimum blocker is less than old txn safe point",
Is this the same as "txn safe point is remaining unchanged"?
In this log I emphasized the fact that there is something that's expected to block the txn safe point at a smaller value, but the txn safe point has exceeded it already. In this case, the major risk is that some blocker isn't effective, instead of txn safe point not being advanced.
pkg/gc/gc_state_manager.go
Outdated
zap.Uint64("newTxnSafePoint", newTxnSafePoint), zap.String("blocker", blockerDesc),
zap.Uint64("minBlockerTS", minBlocker), zap.Bool("downgradeCompatibleMode", downgradeCompatibleMode))
}
} else if newTxnSafePoint > oldTxnSafePoint {
Do we need the else?
Suggested change:
- } else if newTxnSafePoint > oldTxnSafePoint {
+ }
+ if newTxnSafePoint > oldTxnSafePoint {
Yes, because it may be called with a target that equals the current value.
panic("unreachable")
}
if newTxnSafePoint == minBlocker {
log.Info("txn safe point advancement is being blocked",
Why not print a log each time minBlocker is updated? Such a log would be more accurate and clear.
minBlocker may be updated multiple times while iterating over all potential blockers, which would produce too many logs. We only care about the final result here.
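The branch structure discussed in the last few threads can be summarized with a small sketch. Everything here is hypothetical (the blocker type, the advance function, and the outcome strings are not from the PR); it only mirrors the idea that minBlocker is tracked silently during iteration and the outcome is classified once at the end, including the case where a blocker exists below the old txn safe point and the case where the target equals the current value.

```go
package main

import "fmt"

// blocker is a hypothetical stand-in for anything that can hold the txn safe
// point back, such as a GC barrier or a min start TS from an older TiDB.
type blocker struct {
	desc string
	ts   uint64
}

// advance returns the new txn safe point and a one-line summary of the outcome.
func advance(old, target uint64, blockers []blocker) (uint64, string) {
	minBlocker := target
	blockerDesc := ""
	for _, b := range blockers {
		// minBlocker may change several times here; nothing is reported yet.
		if b.ts < minBlocker {
			minBlocker, blockerDesc = b.ts, b.desc
		}
	}
	newTS := minBlocker
	if newTS < old {
		newTS = old // the txn safe point never moves backwards
	}
	switch {
	case blockerDesc != "" && newTS == minBlocker:
		return newTS, "blocked by " + blockerDesc
	case blockerDesc != "":
		// A blocker exists but the safe point is already past it, so the
		// blocker is not effective.
		return newTS, "already past blocker " + blockerDesc
	case newTS > old:
		return newTS, "advanced"
	default:
		return newTS, "unchanged"
	}
}

func main() {
	bs := []blocker{{"barrier-a", 80}, {"barrier-b", 60}}
	fmt.Println(advance(50, 100, bs))
	fmt.Println(advance(70, 100, bs))
	fmt.Println(advance(50, 100, nil))
	fmt.Println(advance(100, 100, nil))
}
```

The final `default` case is why the `else` matters in the reviewed code: with a target equal to the current value and no blocker, none of the logging branches should fire.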
result := GCState{
KeyspaceID: keyspaceID,
}
if keyspaceID != constant.NullKeyspaceID {
In this code, why do we set result.IsKeyspaceLevel = true when checking if keyspaceID != constant.NullKeyspaceID?
If it's not keyspace level GC, the call to this function must have been redirected to the null keyspace. I'll add comments to explain this later.
// deprecated when these work are all done.

// GCStateManager is the manager for all kinds of states of TiKV's GC for MVCC data.
// nolint:revive
Why not remove it?
The linter suggests removing the GC prefix because it repeats the package name gc, but the name StateManager on its own makes no sense to me.
pkg/gc/gc_state_manager.go
Outdated
// GCStateManager is the manager for all kinds of states of TiKV's GC for MVCC data.
// nolint:revive
type GCStateManager struct {
lock syncutil.RWMutex
Suggested change:
- lock syncutil.RWMutex
+ syncutil.RWMutex
Is this the recommended style? It makes the Lock and Unlock methods exported and visible to external packages.
That makes sense. How about renaming it to mu, so that we don't need to write lock.Lock?
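The difference between the two styles can be seen in a short sketch. It uses the standard library's sync.RWMutex in place of syncutil.RWMutex (an assumption, to keep the example self-contained), and both struct types are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// embeddedManager embeds the mutex, so Lock, Unlock, RLock, and RUnlock are
// promoted and can be called by any code that holds an *embeddedManager.
type embeddedManager struct {
	sync.RWMutex
	value uint64
}

// namedManager keeps the mutex in an unexported field, so locking stays an
// internal detail behind the type's own methods.
type namedManager struct {
	mu    sync.RWMutex
	value uint64
}

// Set stores v under the write lock.
func (m *namedManager) Set(v uint64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.value = v
}

// Get reads the value under the read lock.
func (m *namedManager) Get() uint64 {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.value
}

func main() {
	e := &embeddedManager{}
	e.Lock() // callers can take the lock directly via the promoted method
	e.value = 1
	e.Unlock()

	n := &namedManager{}
	n.Set(2) // callers cannot reach n.mu from another package
	fmt.Println(e.value, n.Get())
}
```

With an exported struct type, the embedded form effectively adds the lock methods to the public API, which is the concern raised in the reply.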
// Returns a struct AdvanceTxnSafePointResult, which contains the old txn safe point, the target, and the new
// txn safe point it finally made it to advance to. If there's something blocking the txn safe point from being
// advanced to the given target, it may finally be advanced to a smaller value or remains the previous value, in which
// case the BlockerDescription field of the AdvanceTxnSafePointResult will be set to a non-empty string describing
Do blocker and barrier have the same meaning? If so, how about unifying them?
Apart from GC barriers, the blocker can also be a TiDB min start TS from an earlier version of TiDB that hasn't adopted the refactoring. It needs to be considered for compatibility.
pkg/gc/gc_state_manager.go
Outdated
return err1
})

if err != nil {
return result, err
// GCState represents the GC state of a keyspace, and additionally its keyspaceID and whether the keyspace-level GC is
// enabled in this keyspace.
// nolint:revive
How about removing it?
/retest
pkg/gc/gc_state_manager.go
Outdated
}
if newTxnSafePoint == minBlocker {
log.Info("txn safe point advancement is being blocked",
zap.Uint64("oldTxnSafePoint", oldTxnSafePoint), zap.Uint64("target", target),
Suggested change:
- zap.Uint64("oldTxnSafePoint", oldTxnSafePoint), zap.Uint64("target", target),
+ zap.Uint64("old-txn-safe-point", oldTxnSafePoint), zap.Uint64("target", target),
@ystaticy: adding LGTM is restricted to approvers and reviewers in OWNERS files.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: JmPotato, rleungx, ystaticy.
/retest
What problem does this PR solve?
Issue Number: Ref #8978
What is changed and how does it work?
This PR is split out from #9134 to reduce the single PR size a little.
Check List
Tests
Code changes
Side effects
Related changes
Release note