artifact: permission error to read certificates #24462

Closed
ahjohannessen opened this issue Nov 14, 2024 · 6 comments

@ahjohannessen

I got this on Flatcar Linux this morning:

failed to download artifact "https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v0.4.35/grpc_health_probe-linux-amd64": getter subprocess failed: exit status 1: failed to download artifact: Get "https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v0.4.35/grpc_health_probe-linux-amd64": tls: failed to verify certificate: x509: failed to load system roots and no roots provided; open /etc/ssl/certs/ca-certificates.crt: permission denied

It seems something changed with regard to the artifact getter's permission to read certificates:

tls: failed to verify certificate: x509: failed to load system roots and no roots provided; open /etc/ssl/certs/ca-certificates.crt: permission denied

This started after upgrading from 1.9.1 to 1.9.3. I temporarily solved it by setting disable_filesystem_isolation = true, which is probably not a permanent fix or a good idea?

On Fedora CoreOS machines I do not have this issue (yet).

Nomad version

1.9.3

Operating system and Environment details

Flatcar Container Linux

Flatcar Container Linux by Kinvolk stable 4081.2.0 for VMware
core@app03 ~ $ uname -a
Linux app03 6.6.60-flatcar #1 SMP PREEMPT_DYNAMIC Tue Nov 12 16:20:46 -00 2024 x86_64 Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz GenuineIntel GNU/Linux

Fedora CoreOS:

Fedora CoreOS 41.20241027.3.0
core@app04:~$ uname -a
Linux app04 6.11.5-300.fc41.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Oct 22 20:11:15 UTC 2024 x86_64 GNU/Linux
@tgross
Member

tgross commented Nov 15, 2024

On Fedora CoreOS machines I do not have this issue (yet).

@ahjohannessen when you say you don't have this issue on CoreOS but you do on Flatcar, are you talking about the exact same version of Nomad? Also, don't both those distros run all the software as containers?

@tgross tgross self-assigned this Nov 15, 2024
@tgross tgross moved this from Needs Triage to Triaging in Nomad - Community Issues Triage Nov 15, 2024
@ahjohannessen
Author

ahjohannessen commented Nov 16, 2024

On Fedora CoreOS machines I do not have this issue (yet).

@ahjohannessen when you say you don't have this issue on CoreOS but you do on Flatcar, are you talking about the exact same version of Nomad? Also, don't both those distros run all the software as containers?

@tgross

Same version of Nomad. I install the binaries with ansible-nomad, no container install.

For things like consul, consul-template, nomad and vault I prefer setting it up running outside containers. Everything else goes into containers that Nomad controls :)

@tgross
Member

tgross commented Nov 18, 2024

Very puzzling... our Landlock library didn't change between 1.9.1 and 1.9.3 (we just upgraded it but that's not in shipped versions yet). #24157 landed in 1.9.2 but I don't see any way in which that could impact permissions for the getter subprocess, because (a) it only kicks in if you ask for it, and (b) it's applied after the artifact is downloaded, which is later than you see here. The go-getter library was updated for 1.9.0, so any change there would have impacted your 1.9.1 deployment as well.

A few more things for us to look at:

  • Can you verify the file permissions are identical between the two hosts?
  • Can you post the kernel version for both hosts, and check whether landlock is enabled for either?
  • Can you post the full log-line with a few lines of before-and-after context?

@ahjohannessen
Author

A few more things for us to look at:

  • Can you verify the file permissions are identical between the two hosts?

Which file permissions should I look at? The error comes from open /etc/ssl/certs/ca-certificates.crt: permission denied
so here are permissions for that file on the host:

Flatcar Linux:

core@app03 ~ $ ls -l /etc/ssl/certs/ca-certificates.crt 
lrwxrwxrwx. 1 root root 54 Nov 12 15:19 /etc/ssl/certs/ca-certificates.crt -> ../../../usr/share/ca-certificates/ca-certificates.crt

core@app03 ~ $ realpath /etc/ssl/certs/ca-certificates.crt | xargs ls -l
-rw-r--r--. 1 root root 263406 Nov 12 15:19 /usr/share/ca-certificates/ca-certificates.crt

Fedora CoreOS:

core@app04:~$ ls -l /etc/ssl/certs/ca-certificates.crt 
lrwxrwxrwx. 1 root root 20 Nov 13 14:15 /etc/ssl/certs/ca-certificates.crt -> ../tls-ca-bundle.pem

core@app04:~$ realpath /etc/ssl/certs/ca-certificates.crt | xargs ls -l
-r--r--r--. 1 root root 226489 Nov 13 14:15 /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
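
The same check (follow the symlink, then inspect the permissions of the real file) can be scripted so it runs identically on both hosts. This is a minimal sketch, not part of Nomad; the helper name `describe_path` is hypothetical.

```python
import os
import stat


def describe_path(path):
    """Return (real_path, mode_string) for a file that may be a symlink."""
    # realpath follows every symlink in the chain, like `realpath` in the shell
    real = os.path.realpath(path)
    # filemode renders the permission bits the way `ls -l` does, e.g. '-rw-r--r--'
    mode = stat.filemode(os.stat(real).st_mode)
    return real, mode


if __name__ == "__main__":
    cert = "/etc/ssl/certs/ca-certificates.crt"
    if os.path.exists(cert):
        real, mode = describe_path(cert)
        print(f"{cert} -> {real} ({mode})")
    else:
        print(f"{cert} not found on this host")
```

Note that plain file modes only show part of the picture: both hosts above report a world-readable bundle, so a difference in behaviour has to come from somewhere other than the mode bits.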

  • Can you post the kernel version for both hosts, and check whether landlock is enabled for either?

Flatcar Linux:

core@app03 ~ $ uname -r
6.6.60-flatcar
core@app03 ~ $ sudo dmesg | grep landlock || journalctl -kb -g landlock
[    0.199244] LSM: initializing lsm=lockdown,capability,landlock,selinux,integrity
[    0.199816] landlock: Up and running.

Fedora CoreOS:

core@app04:~$ uname -r
6.11.5-300.fc41.x86_64
core@app04:~$ sudo dmesg | grep landlock || journalctl -kb -g landlock
[    0.246114] LSM: initializing lsm=lockdown,capability,yama,selinux,bpf,landlock,ima,evm
[    0.247395] landlock: Up and running.
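
To compare many hosts, the `lsm=` kernel log line shown above can be parsed rather than read by eye. A small sketch, assuming the log line has the same `lsm=a,b,c` shape as the two lines in this comment; the function names are hypothetical.

```python
def lsm_names(line):
    """Extract the comma-separated LSM names from a kernel 'lsm=' log line."""
    _, _, rest = line.partition("lsm=")
    return [name for name in rest.strip().split(",") if name]


def has_landlock(line):
    """True if 'landlock' appears in the LSM list of the given log line."""
    return "landlock" in lsm_names(line)


if __name__ == "__main__":
    flatcar = "[    0.199244] LSM: initializing lsm=lockdown,capability,landlock,selinux,integrity"
    fedora = "[    0.246114] LSM: initializing lsm=lockdown,capability,yama,selinux,bpf,landlock,ima,evm"
    print(has_landlock(flatcar), has_landlock(fedora))  # prints: True True
```

Both hosts have Landlock active, so Landlock being enabled or disabled does not by itself explain the difference between them.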

  • Can you post the full log-line with a few lines of before-and-after context?

permission denied"
Nov 14 10:41:17 app03 nomad[524522]:  client.alloc_runner.task_runner: Task event: alloc_id=14290cbf-a0d9-024a-fd57-e04597e175a0 task=fetch-health-probe-iam type="Failed Artifact Download" msg="failed to download artifact \"https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v0.4.35/grpc_health_probe-linux-amd64\": getter subprocess failed: exit status 1: failed to download artifact: Get \"https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v0.4.35/grpc_health_probe-linux-amd64\": tls: failed to verify certificate: x509: failed to load system roots and no roots provided; open /etc/ssl/certs/ca-certificates.crt: permission denied" failed=false
Nov 14 10:41:17 app03 nomad[524522]:     2024-11-14T10:41:17.790Z [ERROR] client.alloc_runner.task_runner: prestart failed: alloc_id=14290cbf-a0d9-024a-fd57-e04597e175a0 task=fetch-health-probe-iam error="prestart hook \"artifacts\" failed: failed to download artifact \"https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v0.4.35/grpc_health_probe-linux-amd64\": getter subprocess failed: exit status 1: failed to download artifact: Get \"https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v0.4.35/grpc_health_probe-linux-amd64\": tls: failed to verify certificate: x509: failed to load system roots and no roots provided; open /etc/ssl/certs/ca-certificates.crt: permission denied"
Nov 14 10:41:17 app03 nomad[524522]: client.alloc_runner.task_runner: prestart failed: alloc_id=14290cbf-a0d9-024a-fd57-e04597e175a0 task=fetch-health-probe-iam error="prestart hook \"artifacts\" failed: failed to download artifact \"https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v0.4.35/grpc_health_probe-linux-amd64\": getter subprocess failed: exit status 1: failed to download artifact: Get \"https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v0.4.35/grpc_health_probe-linux-amd64\": tls: failed to verify certificate: x509: failed to load system roots and no roots provided; open /etc/ssl/certs/ca-certificates.crt: permission denied"
Nov 14 10:41:17 app03 nomad[524522]:     2024-11-14T10:41:17.979Z [ERROR] client.artifact: sub-process: OUTPUT="failed to download artifact: Get \"https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v0.4.35/grpc_health_probe-linux-amd64\": tls: failed to verify certificate: x509: failed to load system roots and no roots provided; open /etc/ssl/certs/ca-certificates.crt: permission denied"
Nov 14 10:41:17 app03 nomad[524522]:     2024-11-14T10:41:17.979Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=e5b16765-2881-afa5-c60b-c6f23754376d task=fetch-health-probe-mgmt type="Failed Artifact Download" msg="failed to download artifact \"https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v0.4.35/grpc_health_probe-linux-amd64\": getter subprocess failed: exit status 1: failed to download artifact: Get \"https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v0.4.35/grpc_health_probe-linux-amd64\": tls: failed to verify certificate: x509: failed to load system roots and no roots provided; open /etc/ssl/certs/ca-certificates.crt: permission denied" failed=false
Nov 14 10:41:17 app03 nomad[524522]: client.artifact: sub-process: OUTPUT="failed to download artifact: Get \"https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/v0.4.35/grpc_health_probe-linux-amd64\": tls: failed to verify certificate: x509: failed to load system roots and no roots provided; open /etc/ssl/certs/ca-certificates.crt: permission denied"

@tgross
Member

tgross commented Nov 21, 2024

Thanks @ahjohannessen! It looks like the problem is with the real path to the certs here: /usr/share/ca-certificates/ca-certificates.crt. Landlock follows symlinks so that we have protection of the real paths, and /usr/share isn't in the list of files added to the Landlock sandbox (see https://github.com/shoenig/go-landlock/blob/v1.2.2/path_linux.go#L110-L119).
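
The effect described here can be illustrated with a toy model: access is decided on the resolved path, so a symlink that lives in an allowed directory does not help if its target lives elsewhere. This is only a sketch of the idea, not how Landlock or go-landlock is implemented; `ALLOWED` is a hypothetical allow-list made up for illustration and is not the real list from the linked source file.

```python
import posixpath

# Hypothetical allow-list for illustration only (not the go-landlock list).
ALLOWED = ["/etc/ssl/certs"]


def is_allowed(real_path, allowed_paths):
    """True if real_path equals an allowed path or sits beneath one."""
    real_path = posixpath.normpath(real_path)
    for allowed in allowed_paths:
        allowed = posixpath.normpath(allowed)
        if real_path == allowed or real_path.startswith(allowed.rstrip("/") + "/"):
            return True
    return False


if __name__ == "__main__":
    link = "/etc/ssl/certs/ca-certificates.crt"
    target = "/usr/share/ca-certificates/ca-certificates.crt"  # Flatcar's real path
    print(is_allowed(link, ALLOWED))               # prints: True
    print(is_allowed(target, ALLOWED))             # prints: False
    print(is_allowed(target, ALLOWED + [target]))  # prints: True
```

The last call mirrors the idea of extending the sandbox with the real path instead of disabling isolation altogether.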

The way to work around this is to add that path to the client.filesystem_isolation_extra_paths setting. That should allow the artifact getter to use those certs without having to turn off the protection entirely.

@ahjohannessen
Author

ahjohannessen commented Nov 21, 2024

Did this on the client:

filesystem_isolation_extra_paths = ["f:r:/usr/share/ca-certificates/ca-certificates.crt"]

Fixes the issue :)
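
The value above appears to follow a `kind:mode:path` layout. The sketch below validates such a string before it goes into a config file. It rests on assumptions that are not confirmed in this thread: that `kind` is `f` (file) or `d` (directory), and that `mode` is made of the letters `r`, `w`, `c`, `x`. The function `parse_extra_path` is hypothetical and is not Nomad's own parser.

```python
def parse_extra_path(spec):
    """Split a 'kind:mode:path' string and check each part (assumed format)."""
    # maxsplit=2 keeps any ':' inside the path itself intact
    kind, mode, path = spec.split(":", 2)
    if kind not in ("f", "d"):  # assumption: f = file, d = directory
        raise ValueError(f"unknown kind {kind!r} in {spec!r}")
    if not set(mode) <= set("rwcx"):  # assumption: read, write, create, execute
        raise ValueError(f"unknown mode {mode!r} in {spec!r}")
    if not path.startswith("/"):
        raise ValueError(f"path must be absolute in {spec!r}")
    return {"kind": kind, "mode": mode, "path": path}


if __name__ == "__main__":
    print(parse_extra_path("f:r:/usr/share/ca-certificates/ca-certificates.crt"))
```

Granting read access to the single resolved file, as done here, keeps the exception narrower than allowing a whole directory.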
