Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set container file label on workload API directory #105

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

azdagron
Copy link
Member

@azdagron azdagron commented May 14, 2023

This allows the driver to be used within OpenShift without using an init container to set the label.

This PR attempts to set the label unconditionally. It will fail on existing deployments that mount the Workload API socket directory read-only into the CSI driver container. Results are logged.

Fixes #54.

@azdagron azdagron requested a review from MarcosDY as a code owner May 14, 2023 00:53
@azdagron azdagron force-pushed the azdagron/set-selinux-container-file-label branch 5 times, most recently from 8446c10 to 6d03ceb Compare May 14, 2023 01:47
@azdagron azdagron force-pushed the azdagron/set-selinux-container-file-label branch 2 times, most recently from ffecf38 to fc4fe9c Compare May 14, 2023 02:16
Copy link

@erikgb erikgb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @azdagron! I would like to test this in our clusters with enforcing SELinux. Are images from PRs available from CI, or do I have to build it myself to test? I have been able to reproduce the issue on Openshift, but also on Rancher RKE2 with SELinux enforcing.

@@ -50,7 +50,7 @@ spec:
# driver will mount this directory into containers.
- mountPath: /spire-agent-socket
name: spire-agent-socket-dir
readOnly: true
#readOnly: true
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#readOnly: true

Or we could explicitly set it to false, with a comment about why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh whoops, I didn't mean to leave this commented line in. But yes, I agree that being explicit and adding a comment would be very helpful and self-documenting.

// Set the SELinux label on the workload API directory. This allows the
// mount to be used within OpenShift, for example. This will fail if the
// Workload API socket directory is mounted read-only.
if err := chcon(config.WorkloadAPISocketDir, seLinuxContainerFileLabel, true); err != nil {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to check if SELinux is in enforcing mode before attempting this? That would allow users without enforcing SELinux to use a read-only mount and avoid log errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems reasonable. I'll look into how to do that.

@azdagron
Copy link
Member Author

azdagron commented May 14, 2023

Are images from PRs available from CI, or do I have to build it myself to test?

Unfortunately not at the moment. You'll have to build it yourself. Once it is merged, the nightly workflow will publish the image.

@azdagron
Copy link
Member Author

Hey, @erikgb ! I'd love to get this into a release soon. Have you had a chance to give this a go, by chance?

@erikgb
Copy link

erikgb commented May 29, 2023

Hey, @erikgb ! I'd love to get this into a release soon. Have you had a chance to give this a go, by chance?

Hi @azdagron! I had a look at building the image myself a couple of weeks ago. And the combination of a quite complex image build and me sitting on a corporate network appeared like a bad combination. 😉 But after your last poke, I did another attempt. And I think I managed to build the image from a local mirror with some minor adjustments to the build process.

I started off the upstream main branch, just to see that things were working as before. As I cannot "play around" in our Openshift clusters, I had to use our development RKE2 cluster - where I have reproduced the problem earlier.

  1. Build image from main branch, no change in config: Permission denied in the example-workload pod.
  2. Build image from your PR branch, no change in config: Same behavior in the example-workload pod. No error in the spiffe-csi-driver-container.
  3. Still use image from your PR branch, change volumeMount to read-write: Same behavior in the example-workload pod. No error in the spiffe-csi-driver-container.
  4. Add init container that runs chcon: The example-workload pod can connect to the agent socket and obtain it's identities.

So my testing so far shows that this PR does NOT currently fix the problem. Are you able to add some more logging to verify that I haven't messed up my local image build? Or maybe try it yourself in a cluster with SELinux enabled? I'll recommend Rancher RKE2 which we run in a single node configuration for early testing like this.

@erikgb
Copy link

erikgb commented May 29, 2023

After removing the init-container, this appears to work, but I'll guess that's because the SELinux settings are already applied on my node. If I change the host path for the agent sockets to a new directory, I'm back to the error in the example-workload: transport: Error while dialing: dial unix /spiffe-workload-api/agent.sock: connect: permission denied.

With the init-container in place this works again:

2023/05/29 12:09:59 Update:
2023-05-29T14:09:59.578614750+02:00 2023/05/29 12:09:59   SVIDs:
2023-05-29T14:09:59.578619524+02:00 2023/05/29 12:09:59     spiffe://infra-dev.mycompany.com/ns/egb-test/sa/default
2023-05-29T14:09:59.578622633+02:00 2023/05/29 12:09:59   Bundles:
2023-05-29T14:09:59.578626048+02:00 2023/05/29 12:09:59     infra-dev.mycompany.com (1 authorities)

@azdagron
Copy link
Member Author

Thanks for giving that a go @erikgb!

I've shuffled the code to unconditionally log SELinux status before attempting to do the chcon:

If you build and load the image correctly, you should get a log line like the following (except my status shows not enabled)

INFO	driver/driver.go:66	SELinux status	{"enabled": false, "enforceMode": -1, "processLabel": "", "fileLabel": ""}

Note that make docker-build does not end up loading the docker image into your local registry. You have to run docker load-images afterwards for that.

If you have time to give it a whirl, that would be helpful. In the meantime I'm going to attempt to get RKE2 up.

@azdagron azdagron force-pushed the azdagron/set-selinux-container-file-label branch from f36d619 to 7c0cfad Compare May 29, 2023 13:03
azdagron added 3 commits May 29, 2023 07:04
This allows the driver to be used within OpenShift without using an init
container to set the label.

Fixes #54.

Signed-off-by: Andrew Harding <[email protected]>
Signed-off-by: Andrew Harding <[email protected]>
Signed-off-by: Andrew Harding <[email protected]>
@azdagron azdagron force-pushed the azdagron/set-selinux-container-file-label branch from 7c0cfad to 84bcf56 Compare May 29, 2023 13:04
@erikgb
Copy link

erikgb commented May 29, 2023

@azdagron The added logging is interesting:

2023-05-29T13:34:16.189Z	INFO	spiffe-csi-driver/main.go:45	Starting.	{"version": "0.2.3-dev", "nodeID": "h1-a-iaut-kud01.mycompany.com", "workloadAPISocketDir": "/run/spire/sockets2", "csiSocketPath": "/spiffe-csi/csi.sock"}
2023-05-29T15:34:16.190791564+02:00 2023-05-29T13:34:16.190Z	INFO	driver/driver.go:68	SELinux status	{"enabled": false, "enforceMode": -1, "processLabel": "", "fileLabel": ""}
2023-05-29T15:34:16.190803336+02:00 2023-05-29T13:34:16.190Z	INFO	server/server.go:50	Listening...

Very strange indeed. SELinux is for sure enabled, causing the permission denied without the init container. Could there be some native requirements (inside container) for using https://github.com/opencontainers/selinux? I haven't looked at the code, but if it is (just) a Go wrapper around chcon, this won't work from the scratch base image. 🤔 If that's the case, I would expect an error, tho.

@erikgb
Copy link

erikgb commented May 29, 2023

I did a quick test using ubi9-minimal (which includes SELinux tools) as the base image, and no difference. 😞 There is something strange going on here....

@azdagron
Copy link
Member Author

Is SELinux fs visible in the CSI driver container? or does it need to be mounted somehow?

@erikgb
Copy link

erikgb commented May 29, 2023

@azdagron That was a good lead! Inspired by kubernetes-sigs/aws-ebs-csi-driver#1544 (comment) adding the volumes gives another behavior:

2023-05-29T14:06:17.915Z	INFO	spiffe-csi-driver/main.go:45	Starting.	{"version": "0.2.3-dev", "nodeID": "h1-a-iaut-kud01.mycompany.com", "workloadAPISocketDir": "/run/spire/sockets2", "csiSocketPath": "/spiffe-csi/csi.sock"}
2023-05-29T16:06:17.915939560+02:00 2023-05-29T14:06:17.915Z	INFO	driver/driver.go:68	SELinux status	{"enabled": true, "enforceMode": 1, "processLabel": "", "fileLabel": ""}
2023-05-29T16:06:17.916683952+02:00 2023-05-29T14:06:17.916Z	ERROR	spiffe-csi-driver/main.go:59	Failed to create driver	{"error": "failed to set the container file label on the Workload API socket directory: lsetxattr /run/spire/sockets2: invalid argument"}
2023-05-29T16:06:17.916696102+02:00 main.main
2023-05-29T16:06:17.916699682+02:00 	/code/cmd/spiffe-csi-driver/main.go:59
2023-05-29T16:06:17.916702715+02:00 runtime.main
2023-05-29T16:06:17.916705708+02:00 	/usr/local/go/src/runtime/proc.go:250

@azdagron
Copy link
Member Author

Ok, looks like I'm going to need to get RKE2 or something similar going to get this buttoned down.

Looks like this story might be improving for k8s 1.27 as well (with CSI driver support): https://kubernetes.io/blog/2023/04/18/kubernetes-1-27-efficient-selinux-relabeling-beta/

@empath-nirvana
Copy link

Any progress on this? We're up to k8s 1.29 now and that feature has been on by default since 1.28

@erikgb
Copy link

erikgb commented Mar 20, 2024

@empath-nirvana It would help if you could test this feature branch! And suggest improvements that might solve the issue.

@azdagron Are you able to rebase this PR? Seems like there are some conflicting files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unable to read csi driver mounted file in OpenShift
3 participants