Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gc: Add GCStateManager which is planned to be the replacement of SafePointManager #9169

Merged
merged 48 commits into from
Apr 8, 2025

Conversation

MyonKeminta
Copy link
Contributor

What problem does this PR solve?

Issue Number: Ref #8978

What is changed and how does it work?

Add `GCStateManager` which is planned to be the replacement of `SafePointManager`.

This is part of the refactor task #8978 . The new module is not yet used until the next PR updates the gRPC service interfaces.

This PR is split out from #9134 to reduce the single PR size a little.

Check List

Tests

  • Unit test

Code changes

Side effects

  • Possible performance regression
  • Increased code complexity

Related changes

Release note

None.

Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Comment on lines 241 to 245
var oldTxnSafePoint uint64
newTxnSafePoint := target
minBlocker := target
var blockingBarrier *endpoint.GCBarrier
var blockingMinStartTSOwner *string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var oldTxnSafePoint uint64
newTxnSafePoint := target
minBlocker := target
var blockingBarrier *endpoint.GCBarrier
var blockingMinStartTSOwner *string
var (
minBlocker = target
newTxnSafePoint = target
oldTxnSafePoint uint64
blockingBarrier *endpoint.GCBarrier
blockingMinStartTSOwner *string
)

continue
}

if barrier.IsExpired(now) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that this comparison occurs during/after the etcd transcation. If certain disk failures cause the read/write operation to take an extremely long time, would this affect the judgment of the barrier time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get it. Can you explain the bad case you are concerning about in detail?
In my consideration, it's safe that the actual expiration time is a little longer than the specified expiration time, and guarantee a GC barrier to be expired once it's regarded expired (delete immediately in the currrent transaction if IsExpired returns true). It should not affect the correctness or safety when a transaction runs too long.


// saturatingDuration returns a duration calculated by multiplying the given `ratio` and `base`, truncated within the
// range [0, math.MaxInt64] to avoid negative value and overflowing.
func saturatingDuration(ratio int64, base time.Duration) time.Duration {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't use this function in this pr. How about removing it now until you use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh..
I will move it to the next PR where it's used

// Regard it as NullKeyspaceID if the given one is invalid (exceeds the valid range of keyspace id), no matter
// whether it exactly matches the NullKeyspaceID.
if keyspaceID & ^constant.ValidKeyspaceIDMask != 0 {
return constant.NullKeyspaceID, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a warning log here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For keyspaces where the keyspace-level GC is not enabled, this might constantly happen and should not be treated as a warning.

return m.advanceGCSafePointImpl(keyspaceID, target, false)
}

func (m *GCStateManager) advanceGCSafePointImpl(keyspaceID uint32, target uint64, compatible bool) (oldGCSafePoint uint64, newGCSafePoint uint64, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is compatible=true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet included in this PR.
See: #9134

downgradeCompatibleMode := false

var oldTxnSafePoint uint64
newTxnSafePoint := target
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this assignment is meaningless

Comment on lines +308 to +312
if downgradeCompatibleMode {
err1 = wb.SetGCBarrier(keyspaceID, endpoint.NewGCBarrier(keypath.GCWorkerServiceSafePointID, newTxnSafePoint, nil))
if err1 != nil {
return err1
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure. Do we need reset downgradeCompatibleMode=false here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks not necessary as the transaction will exit soon. And the variable is also needed in the logs later.

zap.Uint64("newTxnSafePoint", newTxnSafePoint), zap.String("blocker", blockerDesc),
zap.Bool("downgradeCompatibleMode", downgradeCompatibleMode))
} else {
log.Info("txn safe point advancement unable to be blocked by the minimum blocker",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand you correctly, when newTxnSafePoint != minBlocker, newTxnSafePoint = oldTxnSafePoint. So is this better?

Suggested change
log.Info("txn safe point advancement unable to be blocked by the minimum blocker",
log.Info("txn safe point can't be advanced due to the minimum blocker is less than old txn safe point",

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this same as txn safe point is remaining unchanged?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this log I emphasized the fact that there is something that's expected to block the txn safe point at a smaller value, but the txn safe point has exceeded it already. In this case, the major risk is that some blocker isn't effective, instead of txn safe point not being advanced.

zap.Uint64("newTxnSafePoint", newTxnSafePoint), zap.String("blocker", blockerDesc),
zap.Uint64("minBlockerTS", minBlocker), zap.Bool("downgradeCompatibleMode", downgradeCompatibleMode))
}
} else if newTxnSafePoint > oldTxnSafePoint {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the else?

Suggested change
} else if newTxnSafePoint > oldTxnSafePoint {
}
if newTxnSafePoint > oldTxnSafePoint {

Copy link
Contributor Author

@MyonKeminta MyonKeminta Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as it's possible that it's called with a target that equals to the current value.

panic("unreachable")
}
if newTxnSafePoint == minBlocker {
log.Info("txn safe point advancement is being blocked",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we not print log once minBlocker update? Such a log is more accurate and clear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that minBlocker is updated multiple times when iterating over all potential blockers, producing too much log. We only care about the final result here.

result := GCState{
KeyspaceID: keyspaceID,
}
if keyspaceID != constant.NullKeyspaceID {
Copy link
Contributor

@ystaticy ystaticy Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this code, why do we set result.IsKeyspaceLevel = true when checking if keyspaceID != constant.NullKeyspaceID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not keyspace level GC, the call to this function must have been redirected to the null keyspace. I'll add comments to explain this later.

// deprecated when these work are all done.

// GCStateManager is the manager for all kinds of states of TiKV's GC for MVCC data.
// nolint:revive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not remove it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linter suggests removing the GC prefix which is repeative with the packagename gc, but the name StateManager looks making nonsense to me.

// GCStateManager is the manager for all kinds of states of TiKV's GC for MVCC data.
// nolint:revive
type GCStateManager struct {
lock syncutil.RWMutex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lock syncutil.RWMutex
syncutil.RWMutex

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the recommended style? It makes the Lock and Unlock methods exported and visible in external packages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense. How about changing it to mu so that we don't need to write lock.Lock.

// Returns a struct AdvanceTxnSafePointResult, which contains the old txn safe point, the target, and the new
// txn safe point it finally made it to advance to. If there's something blocking the txn safe point from being
// advanced to the given target, it may finally be advanced to a smaller value or remains the previous value, in which
// case the BlockerDescription field of the AdvanceTxnSafePointResult will be set to a non-empty string describing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do blocker and barrier have the same meaning? If so, how about unifying them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the GC barriers, The blocker can also be a TiDB min start TS from an earlier version of TiDB that didn't follow up the refactoring. It need to be considered for compatibility concerns.

return err1
})

if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return result, err


// GCState represents the GC state of a keyspace, and additionally its keyspaceID and whether the keyspace-level GC is
// enabled in this keyspace.
// nolint:revive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about removing it?

Signed-off-by: MyonKeminta <[email protected]>
@MyonKeminta
Copy link
Contributor Author

/retest

}
if newTxnSafePoint == minBlocker {
log.Info("txn safe point advancement is being blocked",
zap.Uint64("oldTxnSafePoint", oldTxnSafePoint), zap.Uint64("target", target),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
zap.Uint64("oldTxnSafePoint", oldTxnSafePoint), zap.Uint64("target", target),
zap.Uint64("old-txn-safe-point", oldTxnSafePoint), zap.Uint64("target", target),

Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Apr 3, 2025
Copy link
Contributor

ti-chi-bot bot commented Apr 7, 2025

@ystaticy: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link
Contributor

ti-chi-bot bot commented Apr 8, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JmPotato, rleungx, ystaticy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Apr 8, 2025
Copy link
Contributor

ti-chi-bot bot commented Apr 8, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-04-03 07:23:54.170339939 +0000 UTC m=+1722727.854576032: ☑️ agreed by rleungx.
  • 2025-04-08 02:39:20.697125846 +0000 UTC m=+2137654.381361942: ☑️ agreed by JmPotato.

@MyonKeminta
Copy link
Contributor Author

/retest

@ti-chi-bot ti-chi-bot bot merged commit 9092ccc into tikv:master Apr 8, 2025
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants