[CORE-8530] Handle out-of-sync producers by writing records to past-schema parquet #24955
base: dev
CI test results:
test results on build#61406
test results on build#61444
test results on build#61469
The C++ changes could use some testing, though this functionally looks pretty good to me
namespace iceberg {
bool schemas_equivalent(const struct_type& source, const struct_type& dest) {
chunked_vector<const nested_field*> source_stk;
Could use some simple tests?
Maybe the same for table metadata and/or catalog_schema_manager?
yeah fair point. slipped my mind.
auto source_copy = schema->schema_struct.copy();
auto compat_res = check_schema_compat(dest_type, schema->schema_struct);
Not necessarily related to this PR, but this seems like a really easy footgun to hit. If the general solution is to make a copy of the struct beforehand, should we make check_schema_compat take the source schema as non-const?
yeah good point. perhaps it would be most clear to pass by value.
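A minimal sketch of the pass-by-value idea, assuming a simplified, hypothetical `struct_type` whose fields carry a mutable `compat_note` annotation, and a hypothetical `compat_result` that hands the annotated copy back to the caller. The real `check_schema_compat` signature and annotation mechanism may differ. Because the callee owns its copy, a struct held in cached table metadata cannot be annotated as a side effect:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; the real iceberg types differ.
struct field {
    std::string name;
    std::string compat_note;  // annotation written during the compat check
};

struct struct_type {
    std::vector<field> fields;
};

struct compat_result {
    bool ok;
    struct_type annotated;  // the annotated copy handed back to the caller
};

// Hypothetical compat check. `source` is taken by value, so annotations are
// written into this function's own copy; a struct the caller keeps in cached
// table metadata is left untouched.
compat_result check_schema_compat(const struct_type& dest, struct_type source) {
    bool ok = source.fields.size() == dest.fields.size();
    for (std::size_t i = 0; i < source.fields.size(); ++i) {
        bool match = i < dest.fields.size() &&
                     source.fields[i].name == dest.fields[i].name;
        source.fields[i].compat_note = match ? "matched" : "new field";
        ok = ok && match;
    }
    return {ok, std::move(source)};
}
```

Callers that no longer need their original struct can pass it with `std::move` to skip the copy; the by-value signature makes that choice visible at the call site instead of relying on every caller to remember to copy first.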
Checks whether two structs are precisely equivalent[1] using a simultaneous depth-first traversal. The use case is performing schema lookups on cached table metadata by type rather than by ID.
[1] Exclusive of IDs but inclusive of order.
Signed-off-by: Oren Leiman <[email protected]>
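The simultaneous depth-first traversal can be sketched as follows. This assumes a hypothetical, simplified `nested_field` (name, requiredness, a type string, and child fields for nested structs) and uses a `std::vector` stack of field-list pairs in place of the `chunked_vector<const nested_field*>` stack from the diff; the real Iceberg types and comparison rules are richer. As the commit describes, IDs are ignored while field order matters:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified field model (not the real iceberg::nested_field).
struct nested_field {
    int id;                              // ignored by schemas_equivalent
    std::string name;
    bool required;
    std::string type;                    // e.g. "int", "string", "struct"
    std::vector<nested_field> children;  // fields of a nested struct, else empty
};

using struct_type = std::vector<nested_field>;

// Returns true when both trees have the same field order, names,
// requiredness, and types at every level. Field IDs are deliberately not
// compared. An explicit stack of (source, dest) field-list pairs walks both
// trees depth-first in lockstep.
bool schemas_equivalent(const struct_type& source, const struct_type& dest) {
    std::vector<std::pair<const struct_type*, const struct_type*>> stk;
    stk.emplace_back(&source, &dest);
    while (!stk.empty()) {
        auto [src, dst] = stk.back();
        stk.pop_back();
        if (src->size() != dst->size()) {
            return false;
        }
        for (std::size_t i = 0; i < src->size(); ++i) {
            const nested_field& s = (*src)[i];
            const nested_field& d = (*dst)[i];
            if (s.name != d.name || s.required != d.required || s.type != d.type) {
                return false;
            }
            // Descend into nested structs (children are empty for primitives).
            stk.emplace_back(&s.children, &d.children);
        }
    }
    return true;
}
```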
Search for a schema that matches the provided type.
Signed-off-by: Oren Leiman <[email protected]>
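A type-wise lookup over cached metadata might look like the following sketch. It assumes hypothetical `column`, `schema`, and `table_metadata` types in which a schema is a flat, ordered list of columns. It scans every schema the table has held, not just the current one, and compares by name and type while ignoring IDs:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, flattened schema model; the real table metadata differs.
struct column {
    int id;            // ignored when comparing
    std::string name;
    std::string type;
};

struct schema {
    int schema_id;
    std::vector<column> columns;
};

struct table_metadata {
    int current_schema_id;
    std::vector<schema> schemas;  // every schema the table has held
};

// Same order, names, and types; IDs are not compared.
bool columns_equivalent(const std::vector<column>& a, const std::vector<column>& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i].name != b[i].name || a[i].type != b[i].type) return false;
    }
    return true;
}

// Returns the ID of the first schema (current or historical) whose type
// matches `wanted`, or std::nullopt if none does.
std::optional<int> find_schema_id_by_type(const table_metadata& meta,
                                          const std::vector<column>& wanted) {
    for (const auto& s : meta.schemas) {
        if (columns_equivalent(s.columns, wanted)) return s.schema_id;
    }
    return std::nullopt;
}
```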
For catalog_schema_manager, we can use this to perform a type-wise schema lookup on cached metadata, resulting in a table_info bound to an arbitrary schema, possibly other than the current table schema. Also update catalog_schema_manager::get_ids_from_table_meta to try a type-wise lookup before performing the usual compat check. This way we can short-circuit a schema update if the desired schema is already present in the table. Also pass the source struct to check_schema_compat by value to avoid polluting cached table metadata with compat annotations.
Signed-off-by: Oren Leiman <[email protected]>
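The lookup order in get_ids_from_table_meta can be sketched as below. Everything here is hypothetical: each schema's type is summarised as a canonical string such as `"a:int,b:string"` (order matters, IDs omitted), and `evolve_table` stands in for the real compat check and schema update by simply appending a new current schema:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model: schema ID -> canonical type string.
struct table_meta {
    int current_schema_id;
    std::map<int, std::string> schemas;
};

struct ids_result {
    int schema_id;
    bool schema_updated;  // true if the table had to be evolved
};

// Hypothetical stand-in for the compat check + schema update path:
// register the record type as a brand new schema and make it current.
ids_result evolve_table(table_meta& meta, const std::string& record_type) {
    int new_id = meta.schemas.empty() ? 0 : meta.schemas.rbegin()->first + 1;
    meta.schemas[new_id] = record_type;
    meta.current_schema_id = new_id;
    return {new_id, true};
}

// Try a type-wise lookup over every schema first; only fall back to the
// compat check and schema update when no existing schema matches.
ids_result get_ids(table_meta& meta, const std::string& record_type) {
    for (const auto& [id, type] : meta.schemas) {
        if (type == record_type) {
            return {id, false};  // short-circuit: schema already present
        }
    }
    return evolve_table(meta, record_type);
}
```

Note that a match on a historical schema leaves `current_schema_id` unchanged, which is what lets an out-of-sync producer write without forcing another schema update.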
Rather than the current schema ID. By this point we should have ensured that the record schema exists in the table (either historically or currently). This change lets us look past the current schema to build a writer for historical data.
Signed-off-by: Oren Leiman <[email protected]>
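A hypothetical sketch of building a writer from the record's schema ID instead of the current one. `table_meta` and `parquet_writer_stub` are illustrative stand-ins, not the real writer API; the stub only records which schema its files would be written against:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical table metadata: schema ID -> ordered column names.
struct table_meta {
    int current_schema_id;
    std::map<int, std::vector<std::string>> schemas;
};

// Hypothetical writer stand-in: remembers the schema it writes against.
struct parquet_writer_stub {
    int schema_id;
    std::vector<std::string> columns;
};

// Look up the schema the record was resolved to, not meta.current_schema_id.
// The record's schema is assumed to already exist in the table, either
// historically or currently; a missing ID is treated as an error.
parquet_writer_stub make_writer(const table_meta& meta, int record_schema_id) {
    auto it = meta.schemas.find(record_schema_id);
    if (it == meta.schemas.end()) {
        throw std::out_of_range("record schema not present in table metadata");
    }
    return {it->first, it->second};
}
```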
Signed-off-by: Oren Leiman <[email protected]>
Builds on #24862. Interesting commits start at e215bfe.
Backports Required
Release Notes