Single primary cluster file per cluster #2229
Open
Description
Follow up of: #2200
Follow up of: #2204
Follow up of: #2216
Keep a single, long-lived cluster file per cluster UID that is written once and afterwards managed only by the FDB client library.
The problem is that multiple entities write the cluster file concurrently and without synchronization, which leads to races.
To decouple the fdbcli invocations, this change writes a temporary cluster file per fdbcli invocation, synthesized from the current connection string as reported by the FDB client library. The file is discarded immediately after use.
To remove the race with trying connection options, we adjust the lifecycle of the FDB admin client. We create a singleton instance per cluster UID, maintained by a singleton database provider. The initial connection always uses the seed connection string; afterwards, we only poll the FDB client library for updates to the connection string.
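The provider idea can be sketched as follows. This is a simplified illustration, not the implementation in this PR: `database` is a hypothetical stand-in for a real FDB database handle, and `databaseProvider`, `newDatabaseProvider`, and `get` are assumed names.

```go
package main

import (
	"fmt"
	"sync"
)

// database is a hypothetical stand-in for an FDB database handle.
type database struct {
	clusterUID string
	seed       string // connection string used for the initial connection
}

// databaseProvider hands out exactly one database instance per cluster UID.
type databaseProvider struct {
	mu        sync.Mutex
	databases map[string]*database
}

func newDatabaseProvider() *databaseProvider {
	return &databaseProvider{databases: make(map[string]*database)}
}

// get returns the existing instance for clusterUID, or creates one from the
// seed connection string on first use. Later calls ignore the seed argument,
// since updates are expected to come from the client library instead.
func (p *databaseProvider) get(clusterUID, seed string) *database {
	p.mu.Lock()
	defer p.mu.Unlock()

	if db, ok := p.databases[clusterUID]; ok {
		return db
	}

	db := &database{clusterUID: clusterUID, seed: seed}
	p.databases[clusterUID] = db
	return db
}

func main() {
	provider := newDatabaseProvider()
	first := provider.get("uid-1", "test:abcd@127.0.0.1:4500")
	second := provider.get("uid-1", "test:efgh@127.0.0.2:4500")
	// Both calls yield the same instance, created from the first seed.
	fmt.Println(first == second, second.seed)
}
```

The mutex makes first-use creation safe under concurrent reconciliations, so only one caller ever opens the connection for a given cluster UID.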
With that, the FDB client library can exclusively manage the cluster file.
Type of change
Discussion
n/a
Testing
Unit tests pass, but I have not yet managed to run the e2e tests.
Documentation
n/a
Follow-up
Not sure if there are still races with the lock client accessing the same cluster file.