Describe the bug
In a unit test suite that repeatedly and concurrently opens the same store, I'm experiencing a deadlock in flock, reached via nix::lockFile(rw) from nix::LocalStore::createTempRootsFile().
createTempRootsFile was not designed with concurrent instances in mind: it assumes at most one LocalStore instance per store in each process. See Additional context.
I have not experienced this when running the tests against a daemon. Perhaps the daemon introduces enough timing noise that the critical section responsible for the hang rarely (mis)aligns between workers. Alternatively, each daemon worker has its own pid, which is unique.
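The hang depends on a property of flock(2). On a local Linux file system, a lock belongs to the open file description, not to the process. Two independent open() calls on the same path in one process therefore conflict just as two processes would. Below is a minimal POSIX sketch, not Nix code; the function name and the /tmp template are made up for illustration. It uses LOCK_NB so the second request fails fast with EWOULDBLOCK. A blocking request, like the one behind nix::lockFile, would instead wait until the first descriptor is unlocked or closed.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Opens one temporary file twice in the same process, takes an exclusive
// flock() through the first descriptor, then tries the same through the
// second. Returns true if the second request is refused.
bool sameProcessFlockConflicts()
{
    char path[] = "/tmp/temproots-flock-XXXXXX";
    int tmp = mkstemp(path);
    if (tmp == -1)
        return false;
    close(tmp);

    // Two open() calls give two independent open file descriptions,
    // comparable to two store instances each opening the same file.
    int fd1 = open(path, O_RDWR | O_CLOEXEC);
    int fd2 = open(path, O_RDWR | O_CLOEXEC);

    bool conflict = false;
    if (fd1 != -1 && fd2 != -1 && flock(fd1, LOCK_EX) == 0) {
        // Without LOCK_NB this call would block this thread indefinitely,
        // because the lock it waits for is held by this same thread.
        conflict = flock(fd2, LOCK_EX | LOCK_NB) == -1 && errno == EWOULDBLOCK;
    }

    // Closing the descriptors releases the lock.
    if (fd2 != -1) close(fd2);
    if (fd1 != -1) close(fd1);
    unlink(path);
    return conflict;
}
```

On Linux with a local file system, sameProcessFlockConflicts() is expected to return true. NFS, where flock() is emulated with fcntl() locks, may behave differently.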
Steps To Reproduce
Triggered by running this test in the Nix sandbox (so that it uses a local, alternate store)
(Despite its name, this program is also responsible for triggering realisations. Its main reason to exist is evaluation, though.)
Expected behavior
This problem could be solved in multiple ways:
a. Document that ensuring one store instance per store per process is the caller's responsibility.
b. Use a process-wide Sync<std::map<Path, TempRoots>>, so that temp roots creation, and perhaps other operations, go through the shared TempRoots object.
c. Use a process-wide Sync<std::map<Path, LocalStore>> to deduplicate the store instances in openStore.
   - LocalStore already has thread-safety measures, including for temp roots.
   - This does not allow multiple instances with different settings!
d. Use subdirectories prefix/$pid/$random or prefix/$pid-$random instead of prefix/$pid. I'd need to know more about the garbage collector's file system state (which is undocumented). Clearing the $pid would be different. I don't know what the other implications are. Current state: see fnTempRoots below.
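Options b and c have the same shape: a process-wide, mutex-guarded map from a path to a shared object, so every instance that asks for the same path gets the same object. Below is a minimal sketch of option b in standard C++. It assumes std::mutex as a stand-in for Nix's Sync<> wrapper; TempRoots and getSharedTempRoots are hypothetical names, not existing Nix API. The map holds std::weak_ptr values, so an entry does not keep its object alive once every store instance using it is gone.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for the temp roots state shared by all store
// instances in this process that use the same temp roots file.
struct TempRoots
{
    std::string path;
};

// Returns the process-wide TempRoots object for `path`, creating it on
// first use. The static mutex serializes access to the map, the way a
// Sync<std::map<Path, TempRoots>> would.
std::shared_ptr<TempRoots> getSharedTempRoots(const std::string & path)
{
    static std::mutex mutex;
    static std::map<std::string, std::weak_ptr<TempRoots>> registry;

    std::lock_guard<std::mutex> lock(mutex);
    std::weak_ptr<TempRoots> & slot = registry[path];
    if (auto existing = slot.lock())
        return existing;  // another instance already created it

    auto created = std::make_shared<TempRoots>(TempRoots{path});
    slot = created;
    return created;
}
```

Two calls with the same temp roots path return the same pointer, while a different path yields a distinct object. Expired entries are never pruned from the map in this sketch.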
Additional context
The relevant section of createTempRootsFile (referenced below) is synchronized on the LocalStore instance, not on the file path. Also, openLockFile(_, true) is not an exclusive (O_EXCL) create operation, so multiple instances end up opening and trying to lock the same file. Fixing this race condition would solve the hang, but would not solve the erroneous clearing of other instances' temporary roots.
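By contrast, an exclusive create makes a second opener fail with EEXIST instead of silently sharing the file. A racing instance could then notice the collision and, for example, pick a different name. Below is a POSIX sketch, not Nix code; createExclusive, the demo function, and the pid-like file name are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical helper: creates `path` only if it does not exist yet.
// Returns a new descriptor, or -1 with errno == EEXIST when another
// instance has already created the file.
int createExclusive(const std::string & path)
{
    return open(path.c_str(), O_RDWR | O_CREAT | O_EXCL | O_CLOEXEC, 0600);
}

// Creates the same pid-like file name twice in a fresh temporary
// directory and reports whether the second, racing create was refused.
bool secondExclusiveCreateRefused()
{
    char dir[] = "/tmp/temproots-excl-XXXXXX";
    if (!mkdtemp(dir))
        return false;
    std::string path = std::string(dir) + "/1234";  // illustrative name only

    int first = createExclusive(path);
    int second = createExclusive(path);
    bool refused = first != -1 && second == -1 && errno == EEXIST;

    if (second != -1) close(second);
    if (first != -1) close(first);
    unlink(path.c_str());
    rmdir(dir);
    return refused;
}
```

This only detects the collision. As noted above, it does not by itself stop one instance from clearing another instance's temporary roots.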
Use subdirectories prefix/$pid/$random or prefix/$pid-$random
Yes, I think setting fnTempRoots to /nix/var/nix/temproots/<pid>-<counter> (where counter is an std::atomic that gets incremented in the LocalStore constructor) would do the trick. However, it would require changing LocalStore::findTempRoots() to support the new filename format, since currently it requires filenames to be a single integer.
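The pid-counter naming proposed in the reply could look roughly like the sketch below. All names here are hypothetical. The atomic counter stands in for the one that would be incremented in the LocalStore constructor. parseTempRootsOwner only illustrates a filename check that accepts both the current <pid> format and the proposed <pid>-<counter> format; it is not the full logic of LocalStore::findTempRoots().

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unistd.h>

// Hypothetical per-process counter, bumped once per store instance so
// that every instance gets its own temp roots file name.
static std::atomic<uint64_t> tempRootsCounter{0};

std::string makeTempRootsName(pid_t pid)
{
    return std::to_string(pid) + "-" + std::to_string(tempRootsCounter++);
}

// Extracts the owning pid from either "<pid>" or "<pid>-<counter>";
// returns std::nullopt for any other file name.
std::optional<pid_t> parseTempRootsOwner(const std::string & name)
{
    auto isNumber = [](const std::string & s) {
        return !s.empty() && s.find_first_not_of("0123456789") == std::string::npos;
    };

    auto dash = name.find('-');
    std::string pidPart = name.substr(0, dash);  // whole name if there is no dash
    if (!isNumber(pidPart) || pidPart.size() > 9)  // 9 digits keeps std::stoi in range
        return std::nullopt;
    if (dash != std::string::npos && !isNumber(name.substr(dash + 1)))
        return std::nullopt;
    return static_cast<pid_t>(std::stoi(pidPart));
}
```

For example, two successive calls to makeTempRootsName(1234) yield distinct names starting with "1234-". parseTempRootsOwner returns 1234 for both "1234" and "1234-7", and nothing for "1234-" or "abc".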
Metadata
Nix "master", 4f50b1d
Code references (at d467f7a):
- The createTempRootsFile section discussed above: nix/src/libstore/gc.cc, lines 57 to 65
- Initialization of fnTempRoots: nix/src/libstore/local-store.cc, line 112