Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nix builds have empty githash #4015

Closed
1 task done
enricozb opened this issue Apr 28, 2024 · 17 comments
Closed
1 task done

Nix builds have empty githash #4015

enricozb opened this issue Apr 28, 2024 · 17 comments
Labels
bug Something isn't working

Comments

@enricozb
Copy link
Contributor

Prerequisites

  • Put an X between the brackets on this line if you have done all of the following:
    • Check that your issue is not already filed.
    • Reduce the issue to a minimal, self-contained, reproducible test case. Avoid dependencies to mathlib4 or std4.

Description

When installing lean4 via a flake, it's version string does not contain a git sha1 hash. When hashing lean files (for downloading mathlib's cache), these hashes depend on the git commit hash. Thus using the mathlib cache fails when using a lean/lake through nix.

Proposed Solution

I wanted to open a PR immediately but did not do so since the guidelines request an issue is opened first. I have a patch here that adds the git hash into CMakeLists.txt before building. There is some precedence for this.

Context

Here is a Zulip discussion and an investigation of the issue.

Steps to Reproduce

When installing lean4 via its nix flake, it's version string does not contain a git sha1 hash. For example, entering a devshell with this flake.nix file

{
  description = "...";
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
  inputs.flake-utils.url = "github:numtide/flake-utils";

  inputs.lean4.url = "github:leanprover/lean4/v4.7.0";

  outputs = { self, nixpkgs, flake-utils, lean4 }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = nixpkgs.legacyPackages.${system};
        lean4-pkgs = lean4.packages.${system};
      in {
        devShells.default = pkgs.mkShell {
          packages = [ lean4-pkgs.lean-all lean4-pkgs.vscode ];
        };
      });
}

and running lean --version produces:

Lean (version 4.7.0, x86_64-unknown-linux-gnu, Release)

Notice there is no commit in the version.

Expected Behavior

A properly built lean binary outputs the following:

Lean (version 4.8.0, x86_64-unknown-linux-gnu, commit 23a038925adf, Release)

Impact

Add 👍 to issues you consider important. If others are impacted by this issue, please ask them to add 👍 to it.

@enricozb enricozb added the bug Something isn't working label Apr 28, 2024
@nomeata
Copy link
Collaborator

nomeata commented Apr 29, 2024

Just adding the git hash here would defeat most of benefits of using nix, such as a cache hit if you build identical sources.

I'd propose to implement it differently: a nix'ed lean binary ought to look in its own nix store path, and use the nix hash therein as the “githash”. This should have the uniqueness guarantees lake wants, without causing unnecessary rebuilds.

Happy to review a PR

@digama0
Copy link
Collaborator

digama0 commented Apr 29, 2024

Just adding the git hash here would defeat most of benefits of using nix, such as a cache hit if you build identical sources.

I think this is not an option because Lean.githash is a value available to code compiled using the generated lean compiler. Not using the right githash causes these programs to behave incorrectly, and the githash is also embedded in .olean files and lean will not load files with the wrong githash. (I think this design is too strict, but) this is what we have, and it means that you can't really produce a correct lean build if it doesn't have the right githash. Maybe you can get away with this for dev builds but definitely not releases.

@nomeata
Copy link
Collaborator

nomeata commented Apr 29, 2024

What is “this” you are referring to? You only quote the first sentence, but that’s not what I am proposing.

What is the correctness requirement for the githash value? Is the following sound: “If the sources that were used to built the lean executable differe, then the githash value differs” (I assumed it was).

So if the lean binary itself knows “a hash of all sources used to build it it” (which the nix store path provides), and the lean binary uses that hash as the “githash” everywhere (in oleans, as the value of Lean.githash, in lean --version), then all should be well, shouldn’t it? Or what could break?

@enricozb
Copy link
Contributor Author

enricozb commented Apr 29, 2024

Just adding the git hash here would defeat most of benefits of using nix, such as a cache hit if you build identical sources.

How would you have identical sources if the git hash is different? I'm probably misunderstanding this sentence.

What is the correctness requirement for the githash value?

The mathlib cached .olean files have a hash that is derived from the git hash of lean that was used to build them. If we were to use some nix hash as the "git hash", users of lean installed with nix would never be able to use the mathlib cache. This nix-hash lean would request files with a hash that do not exist in the remote cache.

Additionally, flakes expose the git hash used to build them through self.rev.

@nomeata
Copy link
Collaborator

nomeata commented Apr 29, 2024

How would you have identical sources if the git hash is different? I'm probably misunderstanding this sentence.

git commit --allow-empty -m "New commit"

or some form of rebasing.

If we were to use some nix hash as the "git hash", users of lean installed with nix would never be able to use the mathlib cache.

Yes, that’s what I would expect: It’s a different build of lean and it seems actively dangerous to use the same build artifacts. So sounds like a feature to me, not a bug.

Additionally, flakes expose the git hash used to build them through self.rev.

Right, but that’s dangerous. Imagine someone using such a flake, but then applies a patch using .override. If we use the git hash rather than something derived from the nix derivation hash they will get cache hits when they (probably) shouldn’t.

@enricozb
Copy link
Contributor Author

Fair enough, I think this is enough rationale not to pursue adding the git hash like I proposed. Since nix installs are not technically supported, nix users can patch in the git hash if they wish.

@nomeata
Copy link
Collaborator

nomeata commented Apr 29, 2024

Ah, I missed that the primary motivation was compatibility with mathlib’s cache (as opposed to, say, lake build rebuilding when necessary), sorry for that.

For the reasons above I am hesitant to make nix builds out of the box pretend that they equivalent to some particular release build.

Does setting LEAN_GITHASH on your end help? It is taken into account by lake, not sure if cache also uses it. It still is a bit tedious to find out the release’s hash and set the environment variable, but at least it would provide a way out for those who understand what they are doing.

@enricozb
Copy link
Contributor Author

Is LEAN_GITHASH read during runtime? Setting it to a garbage value and doing lake update still downloads the cache successfully (making me think it's not being used in the way I expect).

@nomeata
Copy link
Collaborator

nomeata commented Apr 29, 2024

It is a lake concept, and a new one at it. Judging from the cache source it is currently not heeded. One could argue that maybe it should.

@enricozb
Copy link
Contributor Author

Yeah that would simplify using the cache through a nix install. The devshell could then export LEAN_GITHASH locked to the lean4 revision that was used as an input to the flake.

I think we just need to change this line to have Lean.gitHash be overridden by an env var if present.

@nomeata
Copy link
Collaborator

nomeata commented Apr 29, 2024

Exactly. (Maybe a bit more robust to use a wrapper around the lake binary to set the environment variable – you wouldn’t want it to apply just because you happen to be in one project’s devshell but you run some lean from somewhere else.)

@enricozb
Copy link
Contributor Author

Yeah but I think the mathlib cache code runs lake (and lean) through elan, instead of letting the current PATH find those binaries.

@digama0
Copy link
Collaborator

digama0 commented Apr 30, 2024

It is important that cache uses Lean.githash i.e. the compile-time-known hash, rather than one produced from runtime data, because lean itself also checks the githash (it is stored in olean files) and will produce the "invalid header" message otherwise. So even if you hack cache to not check this value, it will result in cache downloading files that lean refuses to use.

@nomeata
Copy link
Collaborator

nomeata commented Apr 30, 2024

Doesn't lean adhere to LEAN_GITHASH? The original use case for this is for developers to use a different version of lean without rebuilding everything (which is of course risky, but also very useful when debugging), so I'd assume it would take precedence over the value compiled in at build time. Away from the computer right now, so can't check easily.

@Kha
Copy link
Member

Kha commented Aug 1, 2024

The Nix build has been officially deprecated #4895

@Kha Kha closed this as not planned Won't fix, can't repro, duplicate, stale Aug 1, 2024
@enricozb
Copy link
Contributor Author

enricozb commented Aug 1, 2024

Was there any Zulip discussion on deprecating the Nix build?

@nomeata
Copy link
Collaborator

nomeata commented Aug 1, 2024

Only in so far that a few questions about the nix setup were responded to by saying that it's unfortunately unsupported and users are on their own.

I do hope that someone will pick it up and continue to maintain a nix setup for lean, albeit in a separate repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants