feat: version 2 catalog serialization #26183
Changed the catalog serialization format from bitcode to JSON. This applies to both the log files and the snapshots. This required copying the existing type definitions into two places:
- `v1`
- `latest`

The types in `latest` are the "in-memory" types used throughout the codebase. The `v1` types are kept only to preserve the ability to deserialize catalog files that predate this change; they are not meant to be used elsewhere in the code.
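The split between frozen `v1` types and the in-memory `latest` types can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the actual catalog code: the type `DatabaseSnapshot`, the header tags `CATv1|` and `CATv2|`, and the helper `decode_fields` are hypothetical, and the real bitcode/JSON decoding is replaced by a trivial text parser so the example needs only the standard library. What it is meant to show is that callers only ever receive a `latest` value; the `v1` type exists solely to be read and converted.

```rust
// Hypothetical sketch: module, type, and tag names are illustrative only and
// are not taken from the actual catalog crate.

/// Frozen copy of the pre-JSON types. Only used to read old catalog files.
mod v1 {
    pub struct DatabaseSnapshot {
        pub id: u32,
        pub name: String,
    }
}

/// The in-memory types used throughout the codebase.
mod latest {
    #[derive(Debug, Clone, PartialEq)]
    pub struct DatabaseSnapshot {
        pub id: u32,
        pub name: String,
    }
}

/// The single place where an old representation is upgraded. Today the two
/// definitions are identical copies; if `latest` diverges, the mapping goes here.
impl From<v1::DatabaseSnapshot> for latest::DatabaseSnapshot {
    fn from(old: v1::DatabaseSnapshot) -> Self {
        latest::DatabaseSnapshot { id: old.id, name: old.name }
    }
}

// Assumed (hypothetical) file-header tags identifying the on-disk format version.
const V1_TAG: &[u8] = b"CATv1|";
const V2_TAG: &[u8] = b"CATv2|";

/// Stand-in for the real decoders (bitcode for v1, JSON for v2): parses an
/// "<id>,<name>" UTF-8 payload so the sketch runs with the standard library alone.
fn decode_fields(payload: &[u8]) -> Option<(u32, String)> {
    let text = std::str::from_utf8(payload).ok()?;
    let (id, name) = text.split_once(',')?;
    Some((id.parse().ok()?, name.to_string()))
}

/// Reads a file of either version and always returns the latest in-memory type.
fn load_snapshot(bytes: &[u8]) -> Option<latest::DatabaseSnapshot> {
    if let Some(payload) = bytes.strip_prefix(V1_TAG) {
        // Old file: decode into the frozen v1 type, then convert.
        let (id, name) = decode_fields(payload)?;
        Some(v1::DatabaseSnapshot { id, name }.into())
    } else if let Some(payload) = bytes.strip_prefix(V2_TAG) {
        // Current file: decode directly into the latest type.
        let (id, name) = decode_fields(payload)?;
        Some(latest::DatabaseSnapshot { id, name })
    } else {
        None // unknown or missing version tag
    }
}
```

Keeping the upgrade path in a single `From` impl means that when `latest` eventually differs from `v1`, only that impl needs to change; nothing outside the loader has to know that `v1` exists.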
Looks good, just a question.
I think I understand the idea behind introducing the latest version, but it is not clear when the move from the existing v1 format to the latest one is supposed to happen.
Since catalog logs aren't written periodically (that is my understanding; I could be wrong), does that mean that unless the catalog itself changes, the user won't be migrated to JSON and will stay on bitcode? If that's the intention, that's fine, but I wanted to check whether I'm missing something in this PR.
The idea is that existing files written with
Correct on both points. Existing We could force a checkpoint on startup, e.g., if any files versioned before the latest are loaded; that way, the old files should never need to be deserialized again.
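The forced-checkpoint idea could be expressed as a small startup check. The sketch below is hypothetical and does not describe what this PR does (the PR performs no migration on startup): `LoadedFile`, `LATEST_VERSION`, `StartupAction`, and `plan_startup` are illustrative names, and it assumes each loaded file records its sequence number and the format version it was written with.

```rust
// Hypothetical sketch of the "force a checkpoint on startup" idea.
// All names here are illustrative assumptions, not the catalog's real API.

const LATEST_VERSION: u8 = 2;

#[derive(Debug, Clone, Copy, PartialEq)]
struct LoadedFile {
    sequence: u64, // position of the file in the catalog log
    version: u8,   // on-disk format version the file was written with
}

#[derive(Debug, PartialEq)]
enum StartupAction {
    /// Every file is already in the latest format; nothing to do.
    Nothing,
    /// Write a fresh checkpoint in the latest format covering everything up to
    /// `through_sequence`, so the old files never need to be read again.
    ForceCheckpoint { through_sequence: u64 },
}

fn plan_startup(files: &[LoadedFile]) -> StartupAction {
    let any_old = files.iter().any(|f| f.version < LATEST_VERSION);
    match files.iter().map(|f| f.sequence).max() {
        Some(last) if any_old => StartupAction::ForceCheckpoint { through_sequence: last },
        _ => StartupAction::Nothing,
    }
}
```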
A lot of replicated code, but I think it's ultimately cleaner to have clearly delineated versions.
It was mainly my unfamiliarity with when the log/checkpoint is serialized that tripped me up. It can stay as `latest`.
I'm not sure about the cycle, but if the server loads old-version files from disk, then writes a checkpoint file (at some periodic interval) and deletes the old log files, it is only a matter of time before the old log files disappear anyway. In that case we probably don't need to force a snapshot.
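The lifecycle described in the comment above can be sketched as follows, under the assumption (which the comment itself hedges) that a periodic checkpoint written in the latest format supersedes every log file up to its sequence number. `LogFile` and `retain_after_checkpoint` are hypothetical names used only for illustration.

```rust
// Hypothetical sketch: a checkpoint written in the latest format supersedes all
// log files up to and including its sequence number, so those files (including
// any old-format ones) no longer need to be kept or read.

#[derive(Debug, Clone, Copy, PartialEq)]
struct LogFile {
    sequence: u64, // position in the catalog log
    version: u8,   // on-disk format version
}

/// Returns the log files that must be kept after a checkpoint that covers
/// everything up to and including `checkpoint_sequence`.
fn retain_after_checkpoint(logs: &[LogFile], checkpoint_sequence: u64) -> Vec<LogFile> {
    logs.iter()
        .copied()
        .filter(|l| l.sequence > checkpoint_sequence)
        .collect()
}
```

If every new log file is written in the latest format, then every file left after the first checkpoint following the upgrade is also in the latest format, which is the reviewer's argument for why a forced snapshot is probably unnecessary.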
This introduces a new version for the catalog file formats (snapshot files and log files). The reason for introducing a new version is to change the serialization/deserialization format from `bitcode` to JSON. See #26180.

The approach taken was to copy the existing type definitions for both log and snapshot files into two places: a `v1` module and a `v2` module. Going forward:
- `v1` should not be changed. Its types are only there to enable deserialization of existing bitcode-serialized catalog files.
- `v2` can be modified in a backward-compatible manner, and new types can be added to the `v2` modules.

With this PR, old files are not overwritten. The server does not migrate any files on startup. See #26183 (comment)
Closes #26180
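One way to read "modified in a backward-compatible manner" is that any field added to a `v2` type must tolerate being absent from `v2` files written before the field existed. The sketch below is hypothetical: `TableDefinitionV2` and its `retention_days` field are invented for illustration, and a string map stands in for a parsed JSON object so the example runs with the standard library alone. In serde-based code the same effect is commonly achieved with `#[serde(default)]` on the new field.

```rust
use std::collections::HashMap;

// Hypothetical sketch: a string map stands in for a parsed JSON object.

#[derive(Debug, PartialEq)]
struct TableDefinitionV2 {
    name: String,
    // Hypothetical field added after v2 files already exist. It needs a default
    // (here `None`) so files written before the field existed still load.
    retention_days: Option<u32>,
}

fn table_from_fields(fields: &HashMap<&str, &str>) -> Result<TableDefinitionV2, String> {
    // Required since the first v2 release: missing means the file is invalid.
    let name = fields
        .get("name")
        .ok_or("missing required field `name`")?
        .to_string();
    // Added later: absence is not an error, it falls back to the default.
    let retention_days = match fields.get("retention_days") {
        Some(v) => Some(v.parse::<u32>().map_err(|e| format!("bad retention_days: {e}"))?),
        None => None,
    };
    Ok(TableDefinitionV2 { name, retention_days })
}
```

Removing or renaming a required field would break older `v2` files, which is why additive, defaulted changes are the safe kind of modification under this scheme.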