-
Notifications
You must be signed in to change notification settings - Fork 490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pageserver: guard against WAL gaps in the interpreted protocol #10858
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
d2c1547
to
8497de9
Compare
7557 tests run: 7184 passed, 0 failed, 373 skipped (full report)Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
c642939 at 2025-02-20T15:52:13.121Z :recycle: |
Add the start LSN of the PG WAL from which the batch of records originated. New pageservers can read messages from old safekeepers, the field will be None. Old pageserver can read messages from new safekeepers. The field is ignored.
This field is always included with every batch of interpreted records. It's an optional field in the proto definition, but it doesn't have to be optional in the application level definition.
c612bd3
to
123ccac
Compare
123ccac
to
66e84d0
Compare
erikgrinaker
approved these changes
Feb 20, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Problem
The interpreted SK <-> PS protocol does not guard against gaps (neither does the Vanilla one, but that's beside the point).
Summary of changes
Extend the protocol to include the start LSN of the PG WAL section from which the records were interpreted.
Validation is enabled via a config flag on the pageserver and works as follows:
Case 1:
raw_wal_start_lsn
is smaller than the requested LSNThere can't be gaps here, but we check that the shard received records which it hasn't seen before.
Case 2:
raw_wal_start_lsn
is equal to the requested LSNThis is the happy case. No gap and nothing to check
Case 3:
raw_wal_start_lsn
is greater than the requested LSNThis is a gap.
To make Case 3 work I had to bend the protocol a bit.
We read record chunks of WAL which aren't record aligned and feed them to the decoder.
The picture below shows a shard which subscribes at a position somewhere within Record 2.
We already have a wal reader which is below that position so we wait to catch up.
We read some wal in Read 1 (all of Record 1 and some of Record 2). The new shard doesn't
need Record 1 (it has already processed it according to the starting position), but we read
past it's starting position. When we do Read 2, we decode Record 2 and ship it off to the shard,
but the starting position of Read 2 is greater than the starting position the shard requested.
This looks like a gap.
To make it work, we extend the protocol to send an empty
InterpretedWalRecords
to shardsif the WAL the records originated from ends the requested start position. On the pageserver,
that just updates the tracking LSNs in memory (no-op really). This gives us a workaround for
the fake gap.
As a drive by, make
InterpretedWalRecords::next_record_lsn
mandatory in the application level definition.It's always included.
Related: https://github.com/neondatabase/cloud/issues/23935