Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent instances of the same store hang on void LocalStore::createTempRootsFile() #11979

Open
2 tasks
roberth opened this issue Nov 27, 2024 · 1 comment
Open
2 tasks
Labels
bug store Issues and pull requests concerning the Nix store

Comments

@roberth
Copy link
Member

roberth commented Nov 27, 2024

Describe the bug

In a unit test suite which repeatedly opens the same store, concurrently, I'm experiencing a deadlock in flock via nix::lockFile(rw) via nix::LocalStore::createTempRootsFile()

createTempRootsFile was not designed with concurrent instances in mind. See Additional context.

I have not experienced this when running the tests with a daemon, perhaps because it introduces timing noise so that the critical section responsible for the hang is unlikely to (mis)align with the other workers, or the worker pid is used, which is unique.

Steps To Reproduce

Triggered by running this test in the Nix sandbox (so that it uses a local, alternate store)

nix build github:nixops4/nixops4/8ec34a75da90973a21335a2d844928741756b2b2#packages.x86_64-linux.nixops4-eval-release

(This program is also responsible for triggering realisations, despite its name. It's main reason to exist is evaluation though.)

Expected behavior

This problem could be solved in multiple ways:

a. Document that ensuring one store instance per store per process is the caller's responsibility

b. Use process-wide Sync<std::map<Path, TempRoots>>, where temproots creation and perhaps other operations go through the shared TempRoots object

c. Use process-wide Sync<std::map<Path, LocalStore>>, to dedup the store instances in openStore

  • It already has thread safety measures, including for temp roots
  • Does not allow multiple instances with different settings!

d. Use subdirectories prefix/$pid/$random or prefix/$pid-$random instead of prefix/$pid. I'd need to know more about the garbage collector file system state (which is undocumented). Clearing the $pid would be different. I don't know what the other implications are. Current state: see fnTempRoots below.

Metadata

Nix "master", 4f50b1d

Additional context

This section of createTempRootsFile is synced to the LocalStore instance, not to the file path, and openLockFile(_, true) is not an exclusive (O_EXCL) create operation, so multiple instances end up trying to lock the same fd. Fixing this race condition would solve the hang, but does not solve the erroneous clearing of other instances' temporary roots.

nix/src/libstore/gc.cc

Lines 57 to 65 in d467f7a

if (pathExists(fnTempRoots))
/* It *must* be stale, since there can be no two
processes with the same pid. */
unlink(fnTempRoots.c_str());
*fdTempRoots = openLockFile(fnTempRoots, true);
debug("acquiring write lock on '%s'", fnTempRoots);
lockFile(fdTempRoots->get(), ltWrite, true);

Initialization of fnTempRoots:

, fnTempRoots(fmt("%s/%d", tempRootsDir, getpid()))

Checklist


Add 👍 to issues you find important.

@roberth roberth added bug store Issues and pull requests concerning the Nix store labels Nov 27, 2024
roberth added a commit to nixops4/nixops4 that referenced this issue Nov 27, 2024
Fixes tests hanging. Before this commit:

    nix build .#packages.x86_64-linux.nixops4-eval-release

See NixOS/nix#11979
roberth added a commit to nixops4/nixops4 that referenced this issue Nov 27, 2024
Fixes tests hanging. Before this commit:

    nix build .#packages.x86_64-linux.nixops4-eval-release

See NixOS/nix#11979
@edolstra
Copy link
Member

edolstra commented Nov 28, 2024

Use subdirectories prefix/$pid/$random or prefix/$pid-$random

Yes, I think setting fnTempRoots to /nix/var/nix/temproots/<pid>-<counter> (where counter is an std::atomic that gets incremented in the LocalStore constructor) would do the trick. However, it would require changing LocalStore::findTempRoots() to support the new filename format, since currently it requires filenames to be a single integer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug store Issues and pull requests concerning the Nix store
Projects
None yet
Development

No branches or pull requests

2 participants