Evaluate if seccomp-mdwe is a good addition to dangerzone #1082

Open
randomhydrosol opened this issue Feb 16, 2025 · 10 comments
Labels
enhancement New feature or request

Comments

@randomhydrosol

randomhydrosol commented Feb 16, 2025

I am the author of the seccomp-mdwe project and would like to see if it might be useful for Dangerzone. Currently, Dangerzone relies on gVisor for sandboxing, but gVisor by itself does not prevent an attacker from running new untrusted code, which could potentially enable a sandbox escape.

My library addresses this by installing a seccomp filter that completely disallows in-memory code generation. It has been built and tested with gVisor, and works on both x86_64 and arm64 architectures (though there is no 32-bit support).

In practical terms, even if an attacker exploits a vulnerability in LibreOffice or MuPDF, they cannot generate and execute new code in memory. Instead, they would be forced to rely on complex techniques such as constructing a ROP chain, significantly increasing the difficulty of a successful attack.
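To make the W^X idea concrete, here is a minimal sketch in Python of the rule being described: a request for memory that is writable and executable at the same time is refused. It only models the policy. It is not seccomp-mdwe's actual filter, whose rules are not shown in this thread, and the function name `violates_wx` is hypothetical. The `PROT_*` values are the standard Linux ones.

```python
# Standard Linux protection bits (see mmap(2)).
PROT_READ = 0x1
PROT_WRITE = 0x2
PROT_EXEC = 0x4


def violates_wx(prot: int) -> bool:
    """Return True if the requested protection is writable and executable at once."""
    return bool(prot & PROT_WRITE) and bool(prot & PROT_EXEC)


# Typical requests a program, a JIT, or an exploit might make.
requests = {
    "data page (rw-)": PROT_READ | PROT_WRITE,
    "code page (r-x)": PROT_READ | PROT_EXEC,
    "shellcode page (rwx)": PROT_READ | PROT_WRITE | PROT_EXEC,
}

for name, prot in requests.items():
    verdict = "deny" if violates_wx(prot) else "allow"
    print(f"{name}: {verdict}")
```

A real filter also has to stop the two-step variant (map a page writable, fill it, then switch it to executable), which a per-call check like this one cannot detect on its own.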

Please let me know if this is something you would be interested in integrating into the project.

@randomhydrosol
Author

randomhydrosol commented Feb 16, 2025

Note that this library has a dependency on the hardened malloc project.

I recommend adding hardened malloc to the container itself and injecting it into every process with LD_PRELOAD or similar. Whether that works, however, depends on the software used in the container not having serious memory-handling issues around its malloc() calls.
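As a rough illustration of the LD_PRELOAD approach, the following Python sketch builds an environment that preloads a shared library into a child process. The library path and the helper name `env_with_preload` are assumptions made for the example; the thread does not say where hardened malloc would be installed in the container image.

```python
import os


def env_with_preload(library_path: str, base_env=None) -> dict:
    """Return a copy of the environment with library_path prepended to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # LD_PRELOAD entries may be separated by colons or spaces; colons are used here.
    env["LD_PRELOAD"] = f"{library_path}:{existing}" if existing else library_path
    return env


# Hypothetical install location; adjust to wherever the library lives in the image.
HARDENED_MALLOC = "/usr/lib/libhardened_malloc.so"

env = env_with_preload(HARDENED_MALLOC, base_env={"PATH": "/usr/bin"})
print(env["LD_PRELOAD"])
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run`, so that only the processes started this way load the allocator.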

I am happy to assist with integration as needed.

@randomhydrosol
Author

Note that this is for the inner container. The outer container should ideally use PR_SET_MDWE (https://man7.org/linux/man-pages/man2/pr_set_mdwe.2const.html), which gVisor does not support.
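For reference, the prctl in question can be probed and enabled from Python through ctypes. This is a minimal sketch, not something taken from seccomp-mdwe or Dangerzone: the helper names are hypothetical, and the constants are the values from `<linux/prctl.h>` (the option was added in Linux 6.3). Only the harmless probe runs at import time, because enabling MDWE cannot be undone for the calling process.

```python
import ctypes
import os

# Values from <linux/prctl.h>.
PR_SET_MDWE = 65
PR_GET_MDWE = 66
PR_MDWE_REFUSE_EXEC_GAIN = 1


def _prctl(option: int, arg2: int = 0) -> int:
    """Call prctl(option, arg2, 0, 0, 0) and raise OSError on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    result = libc.prctl(ctypes.c_int(option), ctypes.c_ulong(arg2), zero, zero, zero)
    if result < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


def mdwe_supported() -> bool:
    """Return True if the running kernel understands PR_GET_MDWE."""
    try:
        _prctl(PR_GET_MDWE)
        return True
    except (OSError, AttributeError):
        # Older kernels, gVisor, or a libc without prctl end up here.
        return False


def enable_mdwe() -> None:
    """Irreversibly refuse mappings that are, or would become, writable and executable."""
    _prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN)


print("MDWE supported:", mdwe_supported())
```

A launcher would call `enable_mdwe()` only when `mdwe_supported()` returns True, and fall back to running without it otherwise.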

@apyrgio
Contributor

apyrgio commented Feb 17, 2025

Hey @randomhydrosol, thanks a lot for letting us know about your project. I'm not that familiar with exploits, ROP gadgets, and whatnot, so bear with me while I try to understand a bit more:

  1. Let me do a quick summary of what I understood: seccomp-mdwe would run within the gVisor sandbox. Probably needs to run as PID 1 in that sandbox, and then call Python. In doing that, we would make sure that LibreOffice / MuPDF cannot have memory sections that are both writable and executable. This would make RCEs in these programs more difficult, assuming they exploit a buffer overflow in the first place. The gVisor kernel would not be protected by this restriction, unless we run it externally with PR_SET_MDWE.
    • Did I get this right?
  2. Is this something that would make sense to add in gVisor as well, perhaps as an added protection for the sandboxed process?
  3. Any idea why PR_SET_MDWE does not work with gVisor? It seems that rclone was crashing with W^X in FreeBSD, but it turned out that this was a Go restriction (see cmd/link: unable to execute go binaries on a FreeBSD system with W^X enabled golang/go#48112). I think it's a good idea to let the gVisor devs know about this, but that's up to you.
  4. Do you perhaps know what's the oldest Linux kernel that supports seccomp-mdwe?

Finally, I tried to build your project in Debian Bookworm, but I get the following errors when running ./build.sh:

clang: error: '-ftrivial-auto-var-init=zero' hasn't been enabled; enable it at your own peril for benchmarking purpose only with '-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang'
clang: error: invalid linker name in argument '-fuse-ld=lld'
clang: error: invalid linker name in argument '-fuse-ld=lld'
clang: error: '-ftrivial-auto-var-init=zero' hasn't been enabled; enable it at your own peril for benchmarking purpose only with '-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang'
clang: error: invalid linker name in argument '-fuse-ld=lld'
clang: error: invalid linker name in argument '-fuse-ld=lld'

If you have any build instructions, that would help. Else, I can open an issue in your repo and continue the discussion there.

@randomhydrosol
Author

Yes, the gVisor kernel itself will not be protected unless the relevant prctl call is made. You cannot use seccomp-mdwe outside the gVisor sandbox for two reasons. First, outside the sandbox the kernel typically supports both 32-bit and 64-bit ABIs, whereas gVisor only supports a single 64-bit ABI for the architecture it was built for. It is impossible to write a seccomp filter for a 32-bit environment that effectively enforces W^X because syscalls like shmat are multiplexed, meaning there is always a way to obtain memory that is both writable and executable unless the kernel is the one enforcing W^X (for instance, via PR_SET_MDWE). However, the lack of W^X enforcement in the gVisor kernel itself is not a major concern, as the gVisor sentry is written in a memory-safe language; exploiting vulnerabilities to write arbitrary code into the sentry’s memory is far harder than exploiting the typical programs that the sandbox runs. If your host kernel does not support this prctl, that is not a dealbreaker.

Secondly, seccomp-mdwe on arm64 is limited to working only inside gVisor. A seccomp-based solution can never truly enforce W^X on arm64 because the linker will often try to do something like mprotect(..., PROT_EXEC | PROT_BTI | PROT_MTE, ...), which a strict seccomp filter disallows. This does not arise inside gVisor because gVisor implements the 4.4 kernel ABI, which lacks those extra permissions and thus avoids the linker issue. To be clear, seccomp-mdwe was purpose-built to enforce W^X specifically within gVisor, and any use outside gVisor is unsupported (even if it may work on x86_64).
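The arm64 point comes down to what each layer can see. The Python sketch below contrasts a stateless argument check, which is all a seccomp filter gets, with a stateful check in the spirit of PR_SET_MDWE, where the kernel knows the mapping's current protection. Both functions are simplified models with hypothetical names, and the "strict" rule (refuse any mprotect that requests PROT_EXEC) is an assumption about what such a filter does, not seccomp-mdwe's real rule set.

```python
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4
PROT_BTI = 0x10  # arm64-only flag


def strict_filter_allows_mprotect(new_prot: int) -> bool:
    """Stateless check: only the syscall argument is visible, so any PROT_EXEC is refused."""
    return not (new_prot & PROT_EXEC)


def kernel_mdwe_allows_mprotect(old_prot: int, new_prot: int) -> bool:
    """Stateful check: the mapping's current protection is known."""
    if (new_prot & PROT_WRITE) and (new_prot & PROT_EXEC):
        return False  # never writable and executable together
    if (new_prot & PROT_EXEC) and not (old_prot & PROT_EXEC):
        return False  # a non-executable mapping may not gain PROT_EXEC
    return True


# The dynamic linker re-protects a segment that is already executable.
old = PROT_READ | PROT_EXEC
new = PROT_READ | PROT_EXEC | PROT_BTI

print(strict_filter_allows_mprotect(new))     # False
print(kernel_mdwe_allows_mprotect(old, new))  # True
```

The same call is legitimate from the kernel's point of view but indistinguishable from an attack for the stateless filter, which is why the filter-based approach needs an environment where the linker never issues it.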

Regarding the build error, it likely happens because we compile with hardened flags as a best practice and test on Arch Linux or Gentoo, whose compilers are more recent than those in LTS distributions. To fix this, I plan to add a GCC build script specifically for LTS-based distributions like Debian stable.
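The "invalid linker name in argument '-fuse-ld=lld'" message quoted earlier is what clang typically prints when it cannot find the lld linker, so a quick preflight check can rule that part out. This Python sketch assumes the build needs `clang` and `ld.lld` on PATH, which is inferred from the error output rather than from the project's documentation; the helper name is hypothetical. It does not address the `-ftrivial-auto-var-init=zero` error, which depends on the clang version.

```python
import shutil


def missing_build_tools(tools=("clang", "ld.lld")) -> list:
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_build_tools()
if missing:
    print("Missing build tools:", ", ".join(missing))
else:
    print("clang and ld.lld were found on PATH")
```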

Once W^X is enforced, attackers with a code execution vulnerability can still execute code but are severely limited in the malicious actions they can perform. A typical sandbox escape is straightforward when one can write code directly into memory, but if the attacker has an exploit against gVisor and W^X is enforced, they cannot simply write their payload. Instead, they must construct a ROP chain, relying on existing code in the process—something that can be 10–20 times more difficult based on real-world exploit pricing. A secondary, more disturbing possibility is if an attacker finds a vulnerability in a privileged process outside the sandbox and tricks that process into accessing sandbox memory; if the memory is writable and executable, they can plant malicious code for the privileged process to run. Disallowing new executable pages mitigates this vector to a large extent. While Clang’s CFI can make ROP attacks harder still, Linux userspace is generally not compiled with CFI, so W^X is currently the best practical defense. ROP attacks on gVisor’s own kernel are less of a concern because the sentry is primarily written in a memory-safe language, making control flow hijacking significantly more difficult.

seccomp-mdwe need not run as PID 1 inside the sandbox; you can enable it exactly when untrusted code is about to run, and provide an option for users to decide if they want it enabled. As for why PR_SET_MDWE cannot be used inside gVisor, it is because gVisor tries to match the 4.4 kernel ABI, which predates this feature.

Further thoughts:

On Windows and macOS, the secure kernel or hypervisor typically enforces W^X and includes CFI-like techniques (for example, Control Flow Guard on Windows, or KTRR and PAC on macOS), making VM escapes significantly harder, though not strictly impossible. Hyper-V on Windows also helps mitigate processor-based vulnerabilities such as Spectre. By contrast, Linux does not currently have a reliable way to enforce W^X within the kernel itself, because doing so requires a hypervisor or another layer with privileges exceeding the Linux kernel, and no production implementation of that is publicly available. Consequently, a Linux VM escape remains a very pressing problem, and this is simply a tradeoff we have to accept for the time being. That said, given how strong their mitigations are, I would be quite surprised if an attacker, even a nation-state one, managed to defeat both gVisor and Hyper-V and escape to the host.

@randomhydrosol
Author

randomhydrosol commented Feb 17, 2025

A few other bypasses definitely come to mind:
Strong filesystem W^X is needed. The Docker container should have a read-only root filesystem, and any bind mounts or tmpfs mounts must be mounted noexec.

In my organization we use a kernel built with CFI, since W^X is impractical there, to at least make exploiting the host kernel from within the gVisor sandbox extremely hard, but that is a separate discussion for another day and another thread. Running gVisor inside something like Firecracker VMs on Linux may be a more isolated approach, but I’m not sure.

The no-new-privileges option must be set in the Docker security settings for the container (if not already set), and all capabilities except those that are needed should be dropped, as gVisor changes sandbox enforcement slightly based on the capabilities granted to the container.
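Put together, these container settings map onto standard `docker run` options. The Python sketch below only assembles an argument list; the image name, the command, and the helper name are placeholders, and it makes no claim about which of these options Dangerzone already passes.

```python
def hardened_run_args(image: str, command: list) -> list:
    """Build a `docker run` argument list with the hardening options listed above."""
    return [
        "docker", "run", "--rm",
        "--read-only",                             # read-only root filesystem
        "--tmpfs", "/tmp:rw,noexec,nosuid,nodev",  # scratch space, not executable
        "--security-opt", "no-new-privileges",     # sets the no_new_privs bit
        "--cap-drop", "ALL",                       # drop every capability
        image,
        *command,
    ]


# Hypothetical image name and command, for illustration only.
args = hardened_run_args("example/converter:latest", ["convert", "in.pdf"])
print(" ".join(args))
```

Any capability the workload really needs would then be added back explicitly with `--cap-add`, keeping the granted set as small as possible.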

@randomhydrosol
Author

Also feel free to open an issue in my repo if desired.

@randomhydrosol
Author

As for point 4, the oldest kernel that seccomp-mdwe supports is 4.4.

It has been tested on kernels from 4.4 through mainline on x86_64.

As for gVisor not running in an environment that enforces W^X (as in the FreeBSD issue you linked), I unfortunately do not have a clear answer.

@apyrgio
Contributor

apyrgio commented Feb 24, 2025

Thanks a lot for the detailed replies, and sorry it took me a while to respond.

Let me give you some context on where we are dev-wise. We're currently in the middle of decoupling our container image from our application, so that we can offer security updates in a more timely fashion (see #1006).

This means a couple of things for seccomp-mdwe. First of all, 0.9.0 will be focused on this feature, so other features will move to a subsequent milestone. Second, 0.7.0 and 0.8.0 were very stability/security heavy, and were lacking user-facing features. We want to improve our application's UX, and make it easier to use, which will be a tangible change for our users, and will have positive second-order effects on Dangerzone's security.

So, what I'm trying to say is that we expect the independent container updates feature to be the last in a long series of security updates, and then we want to switch focus to something higher up the stack. This means that integrating seccomp-mdwe will not be a high priority for us, even though it has its merits.

However, I plan to keep an eye on upcoming security incidents, and track if any of those would have been severely nerfed by seccomp-mdwe. If that's the case, we may choose to prioritize it.

@randomhydrosol
Author

I’m not entirely convinced that improving UX alone directly translates into stronger security. From my perspective, it feels more like marketing than a genuine mitigation strategy.

Enforcing W^X is a well-documented technique with real-world impact. According to Microsoft’s own engineers, implementing W^X in the browser process prevented exploitation of about half the vulnerabilities in the JavaScript engine, while also allowing for additional security features such as CFI (Control Flow Integrity) and ACG (Arbitrary Code Guard). They also noted that CFI is essentially meaningless if a process can generate and execute new code, as that capability directly subverts control-flow integrity.

Multiple sources reinforce this viewpoint. Microsoft Edge’s “Enhanced Security” disables the V8 JIT compiler to reduce the attack surface and re-enable certain exploit mitigations in the renderer process. Windows Defender Application Control (WDAC) enforces Dynamic Code Security at the system level when virtualization-based security rollback protection is enabled. Hypervisor-Protected Code Integrity (HVCI) extends W^X constraints to the Windows kernel, thereby mitigating advanced kernel-level attacks. Meanwhile, Lockdown Mode on Apple devices adds W^X to a PAC-hardened Safari, making exploits significantly harder to craft. App Control and .NET Hardening further demonstrate how restricting dynamically generated code helps prevent injection and bypass techniques.

As a business that depends heavily on this software, we want to give back to the software we use. However, if the timeline for integrating seccomp-mdwe is pushed back indefinitely, I cannot guarantee that I—or the business—will be able to help with its integration as previously promised.

@lsd-cat
Member

lsd-cat commented Feb 26, 2025

While we appreciate the interest and the offer for contribution, I do not think we can prioritize this type of mitigation strategy anytime soon. The whole purpose of the DZ threat model is knowing that exploits in office and media software will happen, and the most efficient mitigation is to keep the containers updated (which is why the team is focusing now on shipping independent container updates), well isolated and with as little surface for escapes as possible, as we are trying to do using gVisor. I fear that implementing application layer mitigations in this way would severely impact long-term maintainability of the software as a whole, for relatively little benefit.

This view might change in the future, especially as your software becomes more mature or more widely used, so that it no longer requires us to implement specific changes.

Projects
Status: Todo
Development

No branches or pull requests

3 participants