harden prov args #49
Update after syncing with @asraa: hashes are actually needed, because other jobs from the calling workflow can upload artifacts too.
It's impossible to protect the hash-calculation step if it runs in the same job as the build. I'll see if we can use the job outputs to pass the entire binary as base64/hex instead. If this works well enough, we can consider doing the same for other data, i.e., not rely on the artifact registry.
Update: this does not work.
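For reference, the attempted approach would have looked roughly like the sketch below (all job/step/output names here are hypothetical). One likely reason it fails: GitHub documents a size cap on job outputs (on the order of 1 MB per output), far too small for most binaries.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      # Hypothetical output carrying the whole binary, base64-encoded.
      binary-b64: ${{ steps.encode.outputs.binary-b64 }}
    steps:
      - run: go build -o binary .
      - id: encode
        run: echo "::set-output name=binary-b64::$(base64 -w0 binary)"

  provenance:
    needs: build
    runs-on: ubuntu-latest
    steps:
      # Decode the binary back from the job output.
      - run: echo "${{ needs.build.outputs.binary-b64 }}" | base64 -d > binary
```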
Update: I've separated the part that prints the compiler arguments (including the filename) into a different job, and that part works. It is also possible to store the entire binary in the logs; there seems to be no size limit on logs. But a rogue compiler would be able to patch some files and subvert the output too. The more I think about it, the more I suspect that's not part of the threat model... :/ For example, Python allows running scripts pre-packaging, and in that case we have to trust the script anyway. This suggests we cannot provide guarantees if the scripts are outright malicious. I think the only way to get additional protection here is to "sandbox" the compilation ourselves and limit filesystem and network access, so that we share only a thin interface with the compiler (a Docker container would be a great start). That way we'd have more confidence in the hash we output in the provenance. I wonder if we can use a Dockerfile action for it (https://docs.github.com/en/actions/creating-actions/creating-a-docker-container-action); it does not seem to be the best option, because it still allows printing messages, etc. A sketch of such an action is below.
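A minimal sketch of what such a Docker container action could look like, following the linked docs (the action name, inputs, and Dockerfile are hypothetical):

```yaml
# action.yml — hypothetical container action that runs the compiler
# inside Docker, so it only sees the mounted workspace.
name: 'sandboxed-go-build'
description: 'Run the Go compiler inside a container (sketch)'
inputs:
  compile-args:
    description: 'Arguments to pass to the compiler'
    required: true
runs:
  using: 'docker'
  image: 'Dockerfile'
  args:
    - ${{ inputs.compile-args }}
```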
FYI, I tested uploading an artifact from another job, and uploading it a second time under the same name still works. So we do need the extra hash, and we cannot rely on immutability of the uploaded artifacts.
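The test looked roughly like this (hypothetical names): two jobs in the same workflow uploading under the same artifact name, with the second upload replacing the first.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "real binary" > binary
      - uses: actions/upload-artifact@v2
        with:
          name: binary
          path: binary

  rogue:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - run: echo "tampered binary" > binary
      - uses: actions/upload-artifact@v2
        with:
          # Same artifact name: this second upload replaces the first.
          name: binary
          path: binary
```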
Similar to this (and I'll review the PR in case it fits easily): can we achieve a "sandbox" around the binary upload by deleting any upload with the binary's name that comes from the dry run?
Yes, I think that will be covered in my last PR: I'm going to move the upload to the last step, where we upload the provenance. (It slipped through the cracks last Friday.) Glad that you caught it!
Today, we build (https://github.com/asraa/slsa-on-github/blob/main/.github/workflows/slsa-builder-go.yml#L204) and create the provenance (https://github.com/asraa/slsa-on-github/blob/main/.github/workflows/slsa-builder-go.yml#L221) in the same job.
Now consider what happens if the compiler is passed a command that can print: the output set by

```
echo "::set-output name=compiler-command::..."
```

could be overwritten. We want to protect against this by separating the jobs. This is particularly needed if we include the compiler command that was run in the provenance, since the command is printed by the builder, and the compiler command may overwrite the print in https://github.com/asraa/slsa-on-github/blob/main/build-go/pkg/build.go#L88. Within a step, it's possible to print multiple times, so we should always make our print the last command. Within a job, different steps can print, but that's fine, because the output we use to retrieve the value from other jobs is read from a specific step id (`steps.<id>.outputs.<name>`, mapped into the job's `outputs`); see the sketch below.
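A minimal sketch of this wiring (hypothetical job/step names): only the output of the named step id is exported from the job, so `::set-output` prints from other steps cannot replace it.

```yaml
jobs:
  print-command:
    runs-on: ubuntu-latest
    outputs:
      # Only this specific step id's output is exported from the job.
      compiler-command: ${{ steps.print.outputs.compiler-command }}
    steps:
      - id: print
        run: echo "::set-output name=compiler-command::go build -o binary"

  provenance:
    needs: print-command
    runs-on: ubuntu-latest
    steps:
      - run: echo "command was ${{ needs.print-command.outputs.compiler-command }}"
```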
If the compiler is tricked into running arbitrary commands, then it could backdoor the VM and hijack the last print, which contains the hash computation. This is fine: with arbitrary command execution, the attacker can backdoor the final binary anyway, so it does not violate the claim that "no additional stuff was included in the final binary besides what's defined in the repo".
If the compiler is tricked into overwriting the binary's hash but not the binary itself, this will be caught in the next job by re-verifying the hash (see the sketch below).
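A sketch of that verification (hypothetical artifact and output names, with the build job omitted for brevity): the next job recomputes the digest of the downloaded binary and compares it against the hash the build job reported.

```yaml
jobs:
  # (build job, which uploads the "binary" artifact and exports its digest
  # as needs.build.outputs.binary-hash, omitted for brevity)
  verify:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v2
        with:
          name: binary
      - run: |
          # Recompute the digest and compare with the hash the build job
          # reported; any mismatch fails the job.
          computed="$(sha256sum binary | awk '{print $1}')"
          [ "$computed" = "${{ needs.build.outputs.binary-hash }}" ] || exit 1
```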
According to https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts:

> Downloading files: You can only download artifacts that were uploaded during the same workflow run. When you download a file, you can reference it by name.
and the API https://docs.github.com/en/rest/reference/actions#artifacts, meaning we are guaranteed (today) that the artifacts were uploaded during the same workflow run. This further reduces the attack surface. It seems the hashes we use today for integrity are not strictly needed then... but they are probably good defense in depth.
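In workflow terms, a download step like the following (hypothetical artifact name) can only resolve artifacts uploaded earlier in the same run:

```yaml
      - uses: actions/download-artifact@v2
        with:
          # Resolves only within the current workflow run.
          name: binary
```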