How does Buck2 digest directories? Is it stable enough to share with other projects? #866

Open
Ericson2314 opened this issue Mar 5, 2025 · 4 comments
Labels
question Further information is requested

Comments

@Ericson2314

I see

I am asking because I have been talking to the rest of the Nix team about making a Merkle-based alternative to (or replacement for) NAR, and it would be nice to do this while avoiding NIH, especially if it allows us to reuse a bunch of high-quality FUSE and remote-execution network protocol work :).

We already export Git hashing on an experimental basis, but it is not quite fit for purpose. (The main issue is that a Nix store object can be just a single symlink, but without a root directory, Git cannot distinguish non-executable files, executable files, and symlinks.) And of course, using Blake for speed, which matters far more with binary artifacts than with source code, is quite attractive.
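To make the Git limitation concrete, here is a minimal sketch of Git's SHA-1 object hashing. The helper names `git_blob_id` and `git_tree_id` are made up for illustration, and the tree helper assumes file-only entries sorted by raw name (real Git sorts directory entries as if their names ended in `/`). It shows that the file mode lives in the parent tree entry, so a bare blob id is the same whether the bytes are a regular file, an executable, or a symlink target.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Hash content the way Git hashes a blob: sha1("blob <len>\\0" + content)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def git_tree_id(entries) -> str:
    """Hash a flat tree of (mode, name, object_id_hex) entries.

    Assumption: entries are files or symlinks only, so sorting by raw
    name bytes matches Git's ordering for this simplified case.
    """
    body = b"".join(
        mode + b" " + name + b"\0" + bytes.fromhex(oid)
        for mode, name, oid in sorted(entries, key=lambda e: e[1])
    )
    return hashlib.sha1(b"tree %d\0" % len(body) + body).hexdigest()


payload = b"some/target"
blob = git_blob_id(payload)  # identical for all three interpretations below

# The kind of object is only recorded in the mode of the parent tree entry.
as_regular = git_tree_id([(b"100644", b"x", blob)])
as_executable = git_tree_id([(b"100755", b"x", blob)])
as_symlink = git_tree_id([(b"120000", b"x", blob)])

print(blob)
print(as_regular, as_executable, as_symlink)
```

The three tree ids differ while the blob id stays the same, which is why a store object that is just one symlink or one executable needs some wrapper to carry its kind.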

@alexlian
Contributor

alexlian commented Mar 5, 2025

https://github.com/facebook/buck2/blob/main/docs/rfcs/drafts/digest-kinds.md which indicates an interest in developing a new Blake-based method.

More than an interest: it was implemented, as you can see if you search the code. We've migrated internally now, and it also passes through Sapling/EdenFS; IIRC it's configurable. We'd have to ask the Mononoke / EdenFS folks how that's done on their side, though.

@Ericson2314
Author

@alexlian thanks! I made the silly mistake of trying to find my code on my phone, there's a bit too much for that :).

Is your answer implying that Buck2 usually asks EdenFS to hash the directory on its behalf? Should I therefore go there to figure out what the exact hashing format is?

@alexlian
Contributor

alexlian commented Mar 6, 2025

I probably need to double check when/where Eden does it, though I think this is the code in buck2:

app/buck2_directory/src/directory/directory_data.rs

alexlian added the "question" label on Mar 6, 2025
@sluongng
Contributor

sluongng commented Mar 7, 2025

@Ericson2314 You might be interested in reading about Directory and DirectoryNode in Remote Build API https://github.com/search?q=repo%3Abazelbuild%2Fremote-apis%20%2Fmessage%20Directory%2F&type=code
They help encode directories, files, and symlinks into a Merkle tree of protobuf messages. This is used by Buck2 (at least the OSS version), Bazel, and some other build tools.
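As a rough illustration of that Merkle structure, here is a sketch that mirrors the shape of the `Directory` message (file nodes with a digest and executable bit, directory nodes with a child digest, symlink nodes with a target). It is not the real wire format: it assumes SHA-256 and a canonical JSON encoding as a stand-in for serialized protobuf, and the in-memory `tree` layout and the `directory_digest` helper are invented for this example.

```python
import hashlib
import json


def digest(data: bytes) -> dict:
    """A (hash, size) pair, similar in shape to a content digest."""
    return {"hash": hashlib.sha256(data).hexdigest(), "size_bytes": len(data)}


def directory_digest(tree: dict) -> dict:
    """Digest a directory described as name -> node.

    Hypothetical node encoding for this sketch:
      ("file", content_bytes, is_executable)
      ("symlink", target_string)
      ("dir", subtree_dict)
    """
    files, directories, symlinks = [], [], []
    for name in sorted(tree):  # children are kept in sorted order
        node = tree[name]
        kind = node[0]
        if kind == "file":
            files.append(
                {"name": name, "digest": digest(node[1]), "is_executable": node[2]}
            )
        elif kind == "dir":
            # A child directory is referenced only by its digest.
            directories.append({"name": name, "digest": directory_digest(node[1])})
        elif kind == "symlink":
            symlinks.append({"name": name, "target": node[1]})
        else:
            raise ValueError(f"unknown node kind: {kind!r}")
    message = {"files": files, "directories": directories, "symlinks": symlinks}
    # Stand-in for protobuf serialization: deterministic JSON bytes.
    encoded = json.dumps(message, sort_keys=True, separators=(",", ":")).encode()
    return digest(encoded)


example = {
    "bin": ("dir", {"tool": ("file", b"#!/bin/sh\n", True)}),
    "README": ("file", b"hello\n", False),
    "latest": ("symlink", "bin/tool"),
}
print(directory_digest(example))
```

Because each directory only stores the digests of its children, changing one nested file (or just its executable bit) changes every digest up to the root, while untouched subtrees keep their digests and can be reused.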

Both Buck2 and Bazel can compute the Merkle tree manually, or short-circuit to an xattr with a pre-computed digest provided by the underlying FUSE. I think in Buck2's case, both Eden and Buck2 use BLAKE3 to help with computing digests of larger blobs.
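The short-circuit pattern could look roughly like the sketch below. Everything specific here is an assumption rather than what Buck2, Bazel, or EdenFS actually do: the attribute name `user.example.digest` is hypothetical, and BLAKE2b from the Python standard library stands in for BLAKE3, which would need a third-party package.

```python
import hashlib
import os
from typing import Callable, Optional

# Hypothetical attribute name; a real filesystem would define its own.
XATTR_NAME = "user.example.digest"


def read_xattr(path: str, name: str) -> Optional[bytes]:
    """Return the xattr value, or None if unsupported or not set."""
    getxattr = getattr(os, "getxattr", None)  # only available on Linux
    if getxattr is None:
        return None
    try:
        return getxattr(path, name)
    except OSError:
        return None


def file_digest(
    path: str,
    xattr_reader: Callable[[str, str], Optional[bytes]] = read_xattr,
) -> str:
    """Prefer a digest pre-computed by the filesystem, else hash the file."""
    cached = xattr_reader(path, XATTR_NAME)
    if cached is not None:
        return cached.decode("ascii")
    hasher = hashlib.blake2b(digest_size=32)  # stand-in for BLAKE3
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Injecting `xattr_reader` keeps the fallback path testable on systems without xattr support. The shortcut is only sound if the filesystem and the build tool agree on the exact digest function and encoding.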
