
[CORE-8530] Handle out-of-sync producers by writing records to past-schema parquet #24955

Open
oleiman wants to merge 5 commits into dev from dlib/core-8530/write-historical-schema
Conversation

@oleiman oleiman (Member) commented Jan 28, 2025

Builds on #24862. Interesting commits start at e215bfe.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

@oleiman oleiman self-assigned this Jan 28, 2025
@oleiman oleiman force-pushed the dlib/core-8530/write-historical-schema branch from 20ddb11 to 8e9d51a Compare January 28, 2025 03:26
@oleiman oleiman marked this pull request as ready for review January 28, 2025 03:26
@oleiman oleiman marked this pull request as draft January 28, 2025 06:06
@oleiman oleiman force-pushed the dlib/core-8530/write-historical-schema branch 4 times, most recently from b4ffa8f to 0d4f346 Compare January 30, 2025 08:07
@oleiman oleiman marked this pull request as ready for review January 30, 2025 08:07
@vbotbuildovich (comment marked as outdated)

@oleiman oleiman force-pushed the dlib/core-8530/write-historical-schema branch 2 times, most recently from 770eb1d to 4940eea Compare January 30, 2025 18:37
@vbotbuildovich (Collaborator) commented Jan 30, 2025

CI test results

test results on build#61406

| test_id | test_kind | job_url | test_status | passed |
| --- | --- | --- | --- | --- |
| rptest.tests.archival_test.ArchivalTest.test_all_partitions_leadership_transfer.cloud_storage_type=CloudStorageType.S3 | ducktape | https://buildkite.com/redpanda/redpanda/builds/61406#0194b8ca-ef8b-4e94-8cdf-dc65e16620d1 | FLAKY | 1/2 |
| rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery | ducktape | https://buildkite.com/redpanda/redpanda/builds/61406#0194b8dd-56f2-4777-945b-eb3271a519fa | FLAKY | 1/2 |
| rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade | ducktape | https://buildkite.com/redpanda/redpanda/builds/61406#0194b8dd-56f3-4f40-bac0-2cefaaba54ea | FLAKY | 1/2 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=False.with_tiered_storage=False.with_iceberg=True.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 | ducktape | https://buildkite.com/redpanda/redpanda/builds/61406#0194b8ca-ef8b-4e94-8cdf-dc65e16620d1 | FLAKY | 1/2 |
| rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic | ducktape | https://buildkite.com/redpanda/redpanda/builds/61406#0194b8ca-ef8c-4348-be15-900097961eff | FLAKY | 1/2 |

test results on build#61444

| test_id | test_kind | job_url | test_status | passed |
| --- | --- | --- | --- | --- |
| rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery | ducktape | https://buildkite.com/redpanda/redpanda/builds/61444#0194bd88-6b8b-459d-948c-f3adbe590209 | FLAKY | 1/2 |
| rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery | ducktape | https://buildkite.com/redpanda/redpanda/builds/61444#0194bd8d-80f3-4231-bd1c-8143122d9f09 | FLAKY | 1/2 |
| rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade | ducktape | https://buildkite.com/redpanda/redpanda/builds/61444#0194bd88-6b8c-43da-97a1-4558b2638ed0 | FLAKY | 1/2 |
| rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade | ducktape | https://buildkite.com/redpanda/redpanda/builds/61444#0194bd8d-80f0-4139-b9b2-91f2b1163b67 | FLAKY | 1/2 |
| rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic | ducktape | https://buildkite.com/redpanda/redpanda/builds/61444#0194bd8d-80f0-4139-b9b2-91f2b1163b67 | FLAKY | 1/2 |

test results on build#61469

| test_id | test_kind | job_url | test_status | passed |
| --- | --- | --- | --- | --- |
| gtest_raft_rpunit.gtest_raft_rpunit | unit | https://buildkite.com/redpanda/redpanda/builds/61469#0194bea3-41a9-42b1-9fae-882f35ba3afd | FLAKY | 1/2 |
| rptest.tests.e2e_shadow_indexing_test.ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy.short_retention=True.cloud_storage_type=CloudStorageType.ABS | ducktape | https://buildkite.com/redpanda/redpanda/builds/61469#0194bf02-be31-48a6-8d5f-96391010ac62 | FLAKY | 1/3 |

@oleiman oleiman force-pushed the dlib/core-8530/write-historical-schema branch from 4940eea to ac6735d Compare January 31, 2025 16:47
@andrwng andrwng (Contributor) left a comment

The C++ changes could use some testing, though this functionally looks pretty good to me

src/v/iceberg/table_metadata.h (outdated, resolved)
src/v/iceberg/compatibility_utils.cc (outdated, resolved)
Comment on lines +13 to +15
namespace iceberg {
bool schemas_equivalent(const struct_type& source, const struct_type& dest) {
chunked_vector<const nested_field*> source_stk;
Contributor

Could use some simple tests?
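
For illustration, a minimal gtest-style sketch of such a test; the construction helpers shown (nested_field::create, field_required, int_type, string_type) and the header paths are assumptions about the iceberg type utilities rather than verbatim API:

#include <gtest/gtest.h>

#include "iceberg/compatibility_utils.h" // assumed location of schemas_equivalent
#include "iceberg/datatypes.h"           // assumed location of struct_type et al.

namespace iceberg {

// Two structs with identical names/types/order but different field IDs should
// compare as equivalent, since IDs are excluded from the comparison.
TEST(SchemasEquivalentTest, IgnoresFieldIds) {
    struct_type source;
    source.fields.push_back(
      nested_field::create(1, "id", field_required::yes, int_type{}));
    source.fields.push_back(
      nested_field::create(2, "name", field_required::no, string_type{}));

    struct_type dest;
    dest.fields.push_back(
      nested_field::create(10, "id", field_required::yes, int_type{}));
    dest.fields.push_back(
      nested_field::create(20, "name", field_required::no, string_type{}));

    EXPECT_TRUE(schemas_equivalent(source, dest));
}

// Reordering fields should break equivalence, since order is significant.
TEST(SchemasEquivalentTest, OrderMatters) {
    struct_type source;
    source.fields.push_back(
      nested_field::create(1, "id", field_required::yes, int_type{}));
    source.fields.push_back(
      nested_field::create(2, "name", field_required::no, string_type{}));

    struct_type dest;
    dest.fields.push_back(
      nested_field::create(2, "name", field_required::no, string_type{}));
    dest.fields.push_back(
      nested_field::create(1, "id", field_required::yes, int_type{}));

    EXPECT_FALSE(schemas_equivalent(source, dest));
}

} // namespace iceberg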

Contributor

Maybe same with table metadata and/or catalog_schema_manager

Member Author

yeah fair point. slipped my mind.

Comment on lines 291 to 292
auto source_copy = schema->schema_struct.copy();
auto compat_res = check_schema_compat(dest_type, schema->schema_struct);
Contributor

Not necessarily related to this PR, but this seems like a really easy footgun to hit. If the solution in general is to make a copy of the struct beforehand, should we make check_schema_compat take the source schema as non-const?

Member Author

yeah good point. perhaps it would be most clear to pass by value.
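
To illustrate the pass-by-value contract being discussed, a minimal self-contained sketch; field, schema, and check_compat are toy stand-ins, not the real check_schema_compat API:

#include <iostream>
#include <string>
#include <vector>

// Toy stand-ins for the real iceberg types, purely for illustration.
struct field {
    int id;
    std::string name;
};
struct schema {
    std::vector<field> fields;
};

// Taking `source` by value makes the defensive copy part of the contract: any
// compat annotations (here, IDs borrowed from `dest`) land on a private copy,
// so a cached schema passed by the caller is never polluted.
bool check_compat(const schema& dest, schema source) {
    for (size_t i = 0; i < source.fields.size() && i < dest.fields.size(); ++i) {
        source.fields[i].id = dest.fields[i].id; // mutate only the local copy
    }
    return source.fields.size() <= dest.fields.size();
}

int main() {
    schema cached{{{-1, "id"}, {-1, "name"}}};         // e.g. cached table metadata
    schema table{{{1, "id"}, {2, "name"}, {3, "ts"}}};
    check_compat(table, cached);
    std::cout << cached.fields[0].id << '\n';          // still -1: the cache is untouched
}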

Checks whether two structs are precisely equivalent[1] using a simultaneous
depth-first traversal.

The use case is for performing schema lookups on cached table metadata by type
rather than by ID.

[1] - Exclusive of IDs but inclusive of order.

Signed-off-by: Oren Leiman <[email protected]>
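
For illustration, a self-contained toy sketch of the simultaneous depth-first traversal described above; the node layout, field names, and equivalent() are illustrative stand-ins, not the actual iceberg struct_type/nested_field definitions or the schemas_equivalent implementation:

#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Toy schema nodes standing in for struct fields.
struct node {
    int id; // deliberately ignored by the comparison
    std::string name;
    bool required;
    std::string type; // leaf type name, or "struct" when it has children
    std::vector<node> children;
};

// Simultaneous depth-first walk with an explicit stack: two structs are
// equivalent when fields match by name, requiredness, type and order at every
// level, regardless of their field IDs.
bool equivalent(const node& a, const node& b) {
    std::vector<std::pair<const node*, const node*>> stk{{&a, &b}};
    while (!stk.empty()) {
        auto [l, r] = stk.back();
        stk.pop_back();
        if (
          l->name != r->name || l->required != r->required
          || l->type != r->type || l->children.size() != r->children.size()) {
            return false;
        }
        for (size_t i = 0; i < l->children.size(); ++i) {
            stk.emplace_back(&l->children[i], &r->children[i]);
        }
    }
    return true;
}

int main() {
    node a{0, "root", true, "struct",
           {{1, "id", true, "int", {}}, {2, "name", false, "string", {}}}};
    node b{0, "root", true, "struct",
           {{7, "id", true, "int", {}}, {9, "name", false, "string", {}}}};
    // Prints "true": the only differences are field IDs, which are ignored.
    std::cout << std::boolalpha << equivalent(a, b) << '\n';
}

The explicit stack keeps the walk iterative, loosely mirroring the chunked_vector-based traversal visible in the diff.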
Search for a schema that matches the provided type.

Signed-off-by: Oren Leiman <[email protected]>
For catalog_schema_manager, we can use this to perform a type-wise schema
lookup on cached metadata, resulting in table_info bound to an arbitrary
schema, possibly other than the current table schema.

Also update catalog_schema_manager::get_ids_from_table_meta to try a type-wise
lookup before performing the usual compat check. This way we can short-circuit
a schema update if the desired schema is already present in the table.

Also pass source struct to check_schema_compat by value to avoid polluting
cached table metadata with compat annotations.

Signed-off-by: Oren Leiman <[email protected]>
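
For illustration, a toy sketch of the type-wise lookup and short-circuit described above; toy_schema, toy_table, and find_equivalent are hypothetical stand-ins rather than the actual catalog_schema_manager interfaces:

#include <iostream>
#include <optional>
#include <string>
#include <vector>

// Toy stand-ins: a table keeps every schema it has ever had plus the current
// schema id, loosely mirroring Iceberg table metadata.
struct toy_schema {
    int schema_id;
    std::vector<std::string> field_types;
};
struct toy_table {
    std::vector<toy_schema> schemas;
    int current_schema_id;
};

// Type-wise lookup over cached metadata: return the first schema whose shape
// matches the desired one, whether or not it is the current schema.
std::optional<int>
find_equivalent(const toy_table& t, const std::vector<std::string>& want) {
    for (const auto& s : t.schemas) {
        if (s.field_types == want) {
            return s.schema_id;
        }
    }
    return std::nullopt;
}

int main() {
    toy_table t{{{1, {"int"}}, {2, {"int", "string"}}}, /*current_schema_id=*/2};
    // An out-of-sync producer shows up with the old single-column shape. The
    // lookup finds historical schema 1, so no catalog schema update is needed;
    // only on a miss would we fall through to the usual compat check path.
    if (auto id = find_equivalent(t, {"int"})) {
        std::cout << "use existing schema id " << *id << '\n';
    }
}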
Rather than the current schema ID. By this point we should have ensured that the
record schema exists in the table (either historically or currently). This
change lets us look past the current schema to build a writer for historical
data.

Signed-off-by: Oren Leiman <[email protected]>
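
A toy sketch of the writer-side selection described above; the names and shapes here are illustrative and not the actual datalake writer code:

#include <iostream>
#include <map>
#include <string>

int main() {
    // Toy table metadata: every known schema keyed by id, plus the current id.
    std::map<int, std::string> schemas{
      {1, "{id: int}"}, {2, "{id: int, name: string}"}};
    int current_schema_id = 2;

    // A record produced against the older schema arrives carrying schema id 1.
    int record_schema_id = 1;

    // Previously the writer would be built from schemas[current_schema_id];
    // with this change it is built from the record's own schema id, which by
    // this point is known to exist in the table (historically or currently).
    const std::string& writer_schema = schemas.at(record_schema_id);
    std::cout << "building parquet writer with schema " << writer_schema << '\n';
    (void)current_schema_id;
}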
@oleiman oleiman force-pushed the dlib/core-8530/write-historical-schema branch from ac6735d to f5a8ad8 Compare January 31, 2025 23:11
@oleiman oleiman requested a review from andrwng February 1, 2025 04:19