compute: add non-skippable apply config operations. #11344
base: main
The apply_config() step of compute start is controlled by the skip_pg_catalog_updates flag. This is a performance optimization that decreases compute startup time, but it introduces an extra dependency on cplane. Introduce a small subset of operations that always run, independent of this flag.
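A minimal sketch of the proposed control flow, assuming a trimmed-down spec with a single flag. The `Spec`, `Step`, and `plan_startup` names are hypothetical and only illustrate the branching; they are not the actual compute_ctl types.

```rust
/// Hypothetical, trimmed-down stand-in for the compute spec.
#[derive(Debug, Clone)]
pub struct Spec {
    pub skip_pg_catalog_updates: bool,
}

/// Steps the startup path would run (names are illustrative).
#[derive(Debug, PartialEq, Eq)]
pub enum Step {
    /// Full catalog sync; already includes the audit-related operations.
    ApplyConfig,
    /// Small idempotent subset that runs even when the full sync is skipped.
    ApplyConfigNonSkippable,
}

/// Decide which config steps to run for a given spec.
pub fn plan_startup(spec: &Spec) -> Vec<Step> {
    if spec.skip_pg_catalog_updates {
        // Fast path: skip the full sync, but still run the small subset.
        vec![Step::ApplyConfigNonSkippable]
    } else {
        // Regular path: the full sync covers the subset as well.
        vec![Step::ApplyConfig]
    }
}
```

Calling `plan_startup(&Spec { skip_pg_catalog_updates: true })` yields only the non-skippable step, which is the case the PR adds.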
I think I would rather see this as a refactor of self.apply_config() if possible. I'm happy to explain further if that doesn't make sense.
@@ -1619,8 +1640,24 @@ impl ComputeNode {
"updated postgresql.conf to set neon.disable_logical_replication_subscribers=false"
);
}
self.pg_reload_conf()?;
} else {
The PR description makes it sound like we should run this always, but the existence of the else seems to indicate we only run it sometimes. Can you clarify?
I didn't delete the operations from regular audit_config(), so they run as usual when the skip_pg_catalog_updates flag is set to False.
I agree that we can refactor apply_config() further, but before that I'd like to discuss whether this change of logic makes sense.
Please check my reasoning:
It will slow down compute starts, but most of the time it will be a no-op, so the operations will be really fast.
The alternative solution is to check on the cplane side of the operation whether log_level has changed, and set skip_pg_catalog_updates=False (or rather introduce a new flag, because we don't want to run all catalog updates just for this small change).
But this "if log_level has changed" query to cplane database also takes some time, so I'm not sure if compute start time will be better.
*also added this to PR description.
Ignore the approval, whoops.
@@ -1497,6 +1497,27 @@ impl ComputeNode {
Ok::<(), anyhow::Error>(())
}

/// Apply config operations that are not covered by `skip_pg_catalog_updates`
#[instrument(skip_all)]
pub fn apply_config_non_skippable(&self, compute_state: &ComputeState) -> Result<()> {
The alternative solution is to check if log_level has changed on the cplane side of the operation and set skip_pg_catalog_updates=False
I think this is the right approach, let's not do tricky exclusions in compute. We already do this for first starts after branch creation or when roles/DBs are created via API. So the logic should be:
- Trigger apply config without skipping updates on all running computes
- Mark the rest of the branches with force catalog updates, so the next start will force updates as well
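The two steps above can be sketched as follows. This is only an illustration under stated assumptions: `Branch`, `ComputeStatus`, `force_catalog_updates`, and `on_setting_changed` are hypothetical names, not the real cplane model.

```rust
/// Hypothetical compute status as seen by the control plane.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ComputeStatus {
    Running,
    Suspended,
}

/// Hypothetical per-branch record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Branch {
    pub id: u32,
    pub status: ComputeStatus,
    /// When true, the next start must not skip catalog updates.
    pub force_catalog_updates: bool,
}

/// React to a setting change: return the ids of branches whose running
/// compute needs an immediate apply-config without the skip flag, and mark
/// every other branch so that its next start forces catalog updates.
pub fn on_setting_changed(branches: &mut [Branch]) -> Vec<u32> {
    let mut apply_now = Vec::new();
    for branch in branches.iter_mut() {
        match branch.status {
            ComputeStatus::Running => apply_now.push(branch.id),
            ComputeStatus::Suspended => branch.force_catalog_updates = true,
        }
    }
    apply_now
}
```

The returned ids would be handed to whatever mechanism triggers apply config on live computes; that part is left out here because the thread does not describe it.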
This is a small-to-medium cplane change, so I think someone from cplane may help @clipperhouse and review this logic
I'm not following.
Looks like this will be some tricky exclusion on cplane side....
The feature is: allow changing audit_log_level for existing projects.
It requires an endpoint restart, because we need to add extensions to shared_preload_libraries.
So the question is who should watch for an audit_log_level change to run CREATE EXTENSION.
Options I see:
- Don't watch at all, always try to create extension pgaudit if audit logging is enabled. We already do this with the neon extension update, by the way. See post_apply_config(). So we already have an exception (which we can unite with this code later). I don't think 2 no-op DDLs will impact performance a lot.
- The start_compute operation needs to check the previous state of audit_log_level (I don't think we preserve it anywhere, so this will be some strange query).
- The setting change should update some flag in cplane for all existing branches, so that start_compute knows that it must force a catalog update. Plus, start_compute should reset this flag per branch when it's done.
Maybe this is not too hard to code, but I don't know if it's worth introducing complicated logic for this change.
WDYT?
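A rough sketch of the first option, assuming a two-state audit setting. The `AuditLogLevel` enum and `audit_extension_ddl` function are hypothetical; the only real piece is the Postgres `CREATE EXTENSION IF NOT EXISTS` form, which is what makes the statement a no-op when the extension is already installed.

```rust
/// Hypothetical two-state view of the audit setting; the real setting
/// may have more levels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuditLogLevel {
    Disabled,
    Enabled,
}

/// Return the DDL to run on every start, or None when audit logging is off.
/// IF NOT EXISTS keeps the statement idempotent, so repeated starts only
/// pay for a cheap no-op.
pub fn audit_extension_ddl(level: AuditLogLevel) -> Option<&'static str> {
    match level {
        AuditLogLevel::Disabled => None,
        AuditLogLevel::Enabled => Some("CREATE EXTENSION IF NOT EXISTS pgaudit"),
    }
}
```

The trade-off discussed in the thread still applies: this runs on every start regardless of whether the setting changed.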
See post_apply_config() So we already have an exception (which we can unite with this code later).
post_apply_config() is async, so I don't think it's a good idea to put anything audit-related there. It means there are zero guarantees that i) it will succeed at all; ii) it won't take a lot of time, in which case it will race with user actions and those actions won't be audited. This is not what I'd expect from an 'audit' feature. Please correct me if I'm missing something.
setting change should update some flag in cplane for all existing branches, so that start_compute knows that it must force catalog update. + start_compute should reset this flag per branch, when it's done.
Yes, this is somewhat my proposal, we already do this for branch creation, just in this case we need to update all live branches, so it will be more like
- 'Stop the world' -- suspend all computes, we already do this for enabling LR and adding new preload libs (soon)
- Mark all branches to do full config at next start
- This is likely it, but someone needs to review the current state of the code
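The mark-and-reset lifecycle described above might look roughly like this. Everything here is an assumption for illustration: `ControlPlane`, `mark_all`, and `take_skip_flag` are hypothetical names, suspending the computes is left out, and the sketch resets the mark as soon as the flag is read, whereas a real implementation would more likely reset it only after the start succeeds.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the per-branch cplane state.
#[derive(Debug, Default)]
pub struct ControlPlane {
    /// Branch id -> "do a full config at next start".
    force_full_config: HashMap<u32, bool>,
}

impl ControlPlane {
    /// Register the given branches with no pending full config.
    pub fn new(branch_ids: &[u32]) -> Self {
        Self {
            force_full_config: branch_ids.iter().map(|&id| (id, false)).collect(),
        }
    }

    /// After all computes are suspended, mark every branch.
    pub fn mark_all(&mut self) {
        for forced in self.force_full_config.values_mut() {
            *forced = true;
        }
    }

    /// Called on compute start: returns the value to send as
    /// skip_pg_catalog_updates and clears the mark for this branch.
    pub fn take_skip_flag(&mut self, branch_id: u32) -> bool {
        let forced = self
            .force_full_config
            .insert(branch_id, false)
            .unwrap_or(false);
        !forced
    }
}
```

With this shape, the first start after `mark_all()` gets `skip_pg_catalog_updates = false`, and later starts of the same branch go back to the fast path.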
post_apply_config() is async, I don't think it's a good idea to put any audit-related there
Agree.
'Stop the world' -- suspend all computes, we already do this for enabling LR and adding new preload libs (soon)
Wow, I didn't know that this code exists. It makes this change a lot simpler.
And it makes total sense to restart all branches when we enable audit log.
Thank you for the review. I'll create a subtask for cplane issue.
2916 tests run: 2723 passed, 65 failed, 128 skipped (full report).
Failures on Postgres 17 and Postgres 15. Flaky tests (2): Postgres 17, Postgres 15.
Test coverage report is not available.
The comment gets automatically updated with the latest test results: 15b06a6 at 2025-03-21T17:07:10.690Z
to fix audit_logging configuration.