timeline_ancestor_detach: hardlinking where possible #8828

Open
koivunej opened this issue Aug 26, 2024 · 1 comment · May be fixed by #10729
Labels
c/storage/pageserver Component: storage: pageserver t/feature Issue type: feature, for new features or requests

Comments

@koivunej
Member

koivunej commented Aug 26, 2024

In all of the few uses that timeline ancestor detach (#6994) has seen so far, layers have had to be downloaded after reset_tenant, in the restart that follows completion of the prepare phase. The on-demand downloads are triggered by logical size calculation or compaction.

We could simply hard-link all Layer instances that happen to be resident already. Layer::keep_resident can be used both to test for residency and to keep the layer resident while the hard-linking happens. There are already TODO comments about this, and the way we fsync might need to change as well.
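A minimal sketch of the "hard-link where possible" idea, using only the Rust standard library. It does not model the pageserver's Layer type or Layer::keep_resident: plain file paths stand in for layers, and `LinkOutcome` and `hardlink_resident_layers` are hypothetical names invented for this illustration. In the real code the residency guard would presumably be held around the link call, so that the file cannot be evicted in between.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Result of trying to hard-link one layer file (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
enum LinkOutcome {
    /// The source file was present locally and is now linked into the destination.
    HardLinked,
    /// The destination already had a file with this name; nothing was changed.
    AlreadyPresent,
    /// The source file is not on local disk; it would still need an on-demand download.
    NotResident,
}

/// Hard-links every named file that exists in `src_dir` into `dst_dir`.
/// Both directories are assumed to be on the same filesystem, otherwise
/// `fs::hard_link` fails and the error is returned to the caller.
fn hardlink_resident_layers(
    src_dir: &Path,
    dst_dir: &Path,
    names: &[&str],
) -> io::Result<Vec<(String, LinkOutcome)>> {
    fs::create_dir_all(dst_dir)?;
    let mut outcomes = Vec::with_capacity(names.len());
    for name in names {
        let outcome = match fs::hard_link(src_dir.join(name), dst_dir.join(name)) {
            Ok(()) => LinkOutcome::HardLinked,
            // dst_dir exists at this point, so NotFound refers to the source file.
            Err(e) if e.kind() == io::ErrorKind::NotFound => LinkOutcome::NotResident,
            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => LinkOutcome::AlreadyPresent,
            Err(e) => return Err(e),
        };
        outcomes.push((name.to_string(), outcome));
    }
    // One directory fsync after all links, so the new entries are durable.
    // Opening a directory as a File works on Unix-like systems; this is an
    // assumption of the sketch, not a statement about how the pageserver fsyncs.
    fs::File::open(dst_dir)?.sync_all()?;
    Ok(outcomes)
}
```

Files reported as `NotResident` are simply skipped here, which matches the "where possible" framing: they would keep going through the existing on-demand download path.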

This is a performance optimization, but it is easy to do.

@koivunej koivunej added c/storage/pageserver Component: storage: pageserver t/feature Issue type: feature, for new features or requests labels Aug 26, 2024
@arpad-m
Member

arpad-m commented Feb 7, 2025

@mtyazici just ran into this today: he wondered why it takes 50 seconds after an ancestor detach before the compute is able to start. I would say this issue is responsible. The compute spends those 50 seconds in whatever is counted towards config_ms, which is consistent with the time being spent in logical size calculation.

And indeed, in the 50 seconds after the ancestor detach finishes, we see on-demand downloads in logical size calculation, and they download layers that were previously "remote copied":

2025-02-07T17:55:24.042260Z  INFO request{method=PUT path=/v1/tenant/7aac26897db093502c77f1faaf345d19-0008/timeline/5709d5b4d006100b23e4f5a9506774e8/detach_ancestor request_id=aa2c4e3e-309e-42ff-b5e5-7176b7720644}:detach_ancestor{tenant_id=7aac26897db093502c77f1faaf345d19 shard_id=0008 timeline_id=5709d5b4d006100b23e4f5a9506774e8}: remote copied layer=000000067F000040050000405D0000128000-030000000000000000000000000000000002__000000026101B9C1-00000002BF56C4C9-00000060
2025-02-07T17:55:27.575303Z  INFO initial_size_calculation{tenant_id=7aac26897db093502c77f1faaf345d19 shard_id=0008 timeline_id=5709d5b4d006100b23e4f5a9506774e8}:logical_size_calculation_task:get_or_maybe_download{layer=000000067F000040050000405D0000128000-030000000000000000000000000000000002__000000026101B9C1-00000002BF56C4C9}: on-demand download successful size=264265728
2025-02-07T17:55:27.575332Z  INFO initial_size_calculation{tenant_id=7aac26897db093502c77f1faaf345d19 shard_id=0008 timeline_id=5709d5b4d006100b23e4f5a9506774e8}:logical_size_calculation_task:get_or_maybe_download{layer=000000067F000040050000405D0000128000-030000000000000000000000000000000002__000000026101B9C1-00000002BF56C4C9}: completing layer init for other tasks waiters=1
2025-02-07T17:55:27.576064Z  INFO initial_size_calculation{tenant_id=7aac26897db093502c77f1faaf345d19 shard_id=0008 timeline_id=5709d5b4d006100b23e4f5a9506774e8}:logical_size_calculation_task:get_or_maybe_download{layer=000000067F000040050000405D000011D145-030000000000000000000000000000000002__00000001C4E319D1-000000026101B9C1}: downloading on-demand reason=file was not found
2025-02-07T17:55:29.810944Z  INFO initial_size_calculation{tenant_id=7aac26897db093502c77f1faaf345d19 shard_id=0008 timeline_id=5709d5b4d006100b23e4f5a9506774e8}:logical_size_calculation_task:get_or_maybe_download{layer=000000067F000040050000405D000011D145-030000000000000000000000000000000002__00000001C4E319D1-000000026101B9C1}: on-demand download successful size=211820544
2025-02-07T17:55:29.810973Z  INFO initial_size_calculation{tenant_id=7aac26897db093502c77f1faaf345d19 shard_id=0008 timeline_id=5709d5b4d006100b23e4f5a9506774e8}:logical_size_calculation_task:get_or_maybe_download{layer=000000067F000040050000405D000011D145-030000000000000000000000000000000002__00000001C4E319D1-000000026101B9C1}: completing layer init for other tasks waiters=1

@arpad-m arpad-m linked a pull request Feb 7, 2025 that will close this issue

2 participants