Evaluate if seccomp-mdwe is a good addition to dangerzone #1082
Note that this library depends on the hardened malloc project. While I recommend adding hardened malloc to the container itself and injecting it into every process with LD_PRELOAD or similar, that would depend on the software used in the container not having serious memory-handling issues with its malloc() calls. I am happy to assist with integration as needed.
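A minimal sketch of the LD_PRELOAD approach mentioned above, assuming the container image ships hardened malloc as a shared library. The install path `HARDENED_MALLOC` and the helper `env_with_preload` are hypothetical names for illustration. glibc's loader accepts space- or colon-separated LD_PRELOAD entries, and `/etc/ld.so.preload` is the system-wide alternative if every process in the container should get it.

```python
import os
from typing import Mapping, Optional

# Hypothetical install location: the real path depends on how the
# container image packages hardened malloc.
HARDENED_MALLOC = "/usr/local/lib/libhardened_malloc.so"


def env_with_preload(lib: str, base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `base` (default: os.environ) with `lib` first in LD_PRELOAD.

    Libraries already listed in LD_PRELOAD are kept after `lib`.
    """
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_PRELOAD", "").strip()
    env["LD_PRELOAD"] = f"{lib} {existing}" if existing else lib
    return env


# Usage: hand the resulting environment to the process that parses untrusted
# documents, e.g. subprocess.run(cmd, env=env_with_preload(HARDENED_MALLOC)).
print(env_with_preload(HARDENED_MALLOC, {"LD_PRELOAD": "/opt/other.so"})["LD_PRELOAD"])
# prints: /usr/local/lib/libhardened_malloc.so /opt/other.so
```

As the comment above warns, this only works if the preloaded programs do not have serious heap-handling bugs.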
Note that this is for the inner container. The outer container should ideally use PR_SET_MDWE (https://man7.org/linux/man-pages/man2/pr_set_mdwe.2const.html), which gVisor does not support.
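For the outer process, a minimal sketch of requesting PR_SET_MDWE from Python through `ctypes`, assuming a Linux host with glibc. The constant values are the ones in the kernel's `include/uapi/linux/prctl.h`; the prctl needs Linux 6.3 or later, and on older kernels (or inside gVisor) the call fails, so the helper reports `False` instead of raising. `enable_mdwe`, `mdwe_flags`, and `_prctl` are illustrative names, not part of seccomp-mdwe or Dangerzone.

```python
import ctypes

# Values from the Linux UAPI header include/uapi/linux/prctl.h.
PR_SET_MDWE = 65
PR_GET_MDWE = 66
PR_MDWE_REFUSE_EXEC_GAIN = 1 << 0  # no W+X mappings; no mapping may later gain PROT_EXEC


def _prctl(option: int, arg2: int = 0) -> int:
    """Call prctl(2) through the C library; return -1 if prctl is unavailable."""
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        prctl = libc.prctl
    except (OSError, AttributeError, TypeError):
        return -1  # not a Linux/glibc-like platform, or no prctl symbol
    return prctl(ctypes.c_int(option), ctypes.c_ulong(arg2),
                 ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))


def enable_mdwe() -> bool:
    """Request PR_MDWE_REFUSE_EXEC_GAIN for this process; False if rejected."""
    return _prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN) == 0


def mdwe_flags() -> int:
    """Return this process's current MDWE flags, or -1 if unsupported."""
    return _prctl(PR_GET_MDWE)
```

A launcher would call `enable_mdwe()` once, early, before starting the sandbox, and could then use `mdwe_flags()` to confirm that `PR_MDWE_REFUSE_EXEC_GAIN` is set.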
Hey @randomhydrosol, thanks a lot for letting us know about your project. I'm not that familiar with exploits, ROP gadgets and whatnot, so bear with me while I try to understand a bit more:
Finally, I tried to build your project in Debian Bookworm, but I get the following errors when running
If you have any build instructions, that would help. Otherwise, I can open an issue in your repo and continue the discussion there.
Yes, the gVisor kernel itself will not be protected unless the relevant prctl call is made. However, the lack of W^X enforcement in the gVisor kernel is not a major concern, as the gVisor Sentry is written in a memory-safe language; exploiting a vulnerability to write arbitrary code into the Sentry's memory is far harder than exploiting the typical programs the sandbox runs. If your host kernel does not support this prctl, it is not a huge dealbreaker.

You cannot use seccomp-mdwe outside the gVisor sandbox, for two reasons.

First, outside the sandbox the kernel typically supports both 32-bit and 64-bit ABIs, whereas gVisor only supports the single 64-bit ABI of the architecture it was built for. It is impossible to write a seccomp filter for a 32-bit environment that effectively enforces W^X, because syscalls like shmat are multiplexed, meaning there is always a way to obtain writable memory unless the kernel itself enforces W^X (for instance, via PR_SET_MDWE).

Second, on arm64, seccomp-mdwe only works inside gVisor. A seccomp-based solution can never truly enforce W^X on arm64 outside gVisor, because the dynamic linker will often try to do something like mprotect(..., PROT_EXEC | PROT_BTI | PROT_MTE, ...), which a strict seccomp filter disallows. This does not arise inside gVisor, because gVisor implements the 4.4 kernel ABI, which lacks those extra permissions and thus avoids the linker issue.

To be clear, seccomp-mdwe was purpose-built to enforce W^X within gVisor, and any use outside gVisor is unsupported (even if it may work on x86_64).

Regarding the build error, it likely happens because we compile with hardened flags as a best practice and test on Arch Linux or Gentoo, whose compilers are more recent than those in LTS distributions. To fix this, I plan to add a GCC build script specifically for LTS-based distributions like Debian stable.
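To make the arm64 limitation concrete, here is a toy model (not seccomp-mdwe's actual filter) of the decision a strict W^X seccomp filter has to make. A seccomp-BPF program only sees the syscall number and the raw argument values, not the current state of the memory involved, so it cannot tell whether an `mprotect(PROT_EXEC)` targets pages that were written earlier and has to reject it outright. The flag values are the Linux ones for x86_64/arm64 (`PROT_BTI` and `PROT_MTE` exist only on arm64); `wx_filter_allows` is a hypothetical name, and a real filter would also have to cover other paths such as `pkey_mprotect` and `shmat` with `SHM_EXEC`.

```python
# Protection and mapping flags as defined on Linux x86_64/arm64.
PROT_READ = 0x1
PROT_WRITE = 0x2
PROT_EXEC = 0x4
PROT_BTI = 0x10      # arm64 only
PROT_MTE = 0x20      # arm64 only
MAP_PRIVATE = 0x02
MAP_ANONYMOUS = 0x20


def wx_filter_allows(syscall: str, prot: int, flags: int = 0) -> bool:
    """Toy model of a strict W^X policy expressed only in terms of syscall arguments."""
    if syscall == "mmap":
        if (prot & PROT_WRITE) and (prot & PROT_EXEC):
            return False  # never writable and executable at the same time
        if (prot & PROT_EXEC) and (flags & MAP_ANONYMOUS):
            return False  # no fresh executable anonymous memory
        return True       # file-backed executable mappings pass, so libraries load
    if syscall == "mprotect":
        # The filter cannot see whether the pages were ever writable,
        # so any request that adds PROT_EXEC has to be refused.
        return not (prot & PROT_EXEC)
    return True


print(wx_filter_allows("mmap", PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS))  # True
print(wx_filter_allows("mmap", PROT_READ | PROT_WRITE | PROT_EXEC, MAP_ANONYMOUS))    # False
# The kind of request the arm64 loader makes on BTI systems is rejected too:
print(wx_filter_allows("mprotect", PROT_EXEC | PROT_BTI))                             # False
```

The last case is why a strict argument-only filter breaks normal program startup on arm64 hosts, while gVisor's 4.4-era ABI never produces that request.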
Once W^X is enforced, an attacker with a code execution vulnerability can still execute code but is severely limited in the malicious actions they can perform. A typical sandbox escape is straightforward when one can write code directly into memory, but if the attacker has an exploit against gVisor and W^X is enforced, they cannot simply write their payload. Instead, they must construct a ROP chain from code already present in the process, which, based on real-world exploit pricing, can be 10 to 20 times more difficult.

A secondary, more disturbing possibility is that an attacker finds a vulnerability in a privileged process outside the sandbox and tricks that process into accessing sandbox memory; if that memory is writable and executable, they can plant malicious code for the privileged process to run. Disallowing new executable pages mitigates this vector to a large extent.

While Clang's CFI can make ROP attacks harder still, Linux userspace is generally not compiled with CFI, so W^X is currently the best practical defense. ROP attacks on gVisor's own kernel are less of a concern because the Sentry is primarily written in a memory-safe language, making control-flow hijacking significantly more difficult.

seccomp-mdwe need not run as PID 1 inside the sandbox; you can enable it exactly when untrusted code is about to run, and give users an option to decide whether they want it enabled. As for why PR_SET_MDWE cannot be used inside gVisor: gVisor tries to match the 4.4 kernel ABI, which predates this feature.

Further thoughts: on Windows and macOS, the secure kernel or hypervisor typically enforces W^X and includes CFI-like techniques (for example, Control Flow Guard on Windows, or KTRR and PAC on macOS), making VM escapes significantly harder, though not strictly impossible. Hyper-V on Windows also helps mitigate processor-based vulnerabilities such as Spectre.
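A quick way to observe the effect of W^X enforcement is to ask for exactly what a payload-writing exploit needs: an anonymous mapping that is writable and executable at the same time. This is a minimal probe using Python's standard `mmap` module (Linux/Unix only; `can_map_write_exec` is a hypothetical helper name). In an unrestricted process it should return `True`; when PR_SET_MDWE or a W^X seccomp filter that returns an error is in effect, the mapping is refused and it returns `False`. A filter configured to kill the process instead of returning an error would terminate it.

```python
import mmap


def can_map_write_exec() -> bool:
    """Return True if this process may create an anonymous W+X mapping."""
    try:
        # fileno -1 makes CPython request an anonymous mapping.
        m = mmap.mmap(-1, mmap.PAGESIZE,
                      flags=mmap.MAP_PRIVATE,
                      prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    except (OSError, ValueError):
        return False  # refused, e.g. by W^X enforcement
    m.close()
    return True


print("W+X mappings allowed:", can_map_write_exec())
```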
By contrast, Linux does not currently have a reliable way to enforce W^X within the kernel itself, because doing so requires a hypervisor or another layer with higher privileges than the Linux kernel, and no production implementation of that is publicly available. Consequently, a Linux VM escape remains a very pressing problem, and this is simply a tradeoff we have to accept for the time being. That said, given how strong their mitigations are, I would be quite surprised if an attacker, even a nation-state one, managed to defeat both gVisor and Hyper-V and escape to the host.
A few other bypasses definitely come to mind:

In my organization, we use a kernel built with CFI (since enforcing W^X there is impractical) to at least make exploiting the host kernel from within the gVisor sandbox extremely hard, but that's a separate discussion for another day and another thread. Something like Firecracker microVMs on Linux running gVisor may be a more isolated approach, but I'm not sure.

The no_new_privs bit must be set in Docker's security settings for the container (if it isn't already), and all capabilities except those that are needed should be dropped, as gVisor adjusts sandbox enforcement slightly based on the capabilities granted to the container.
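The container settings above might look like this when building a `docker run` or `podman run` command line from Python. This is a sketch: `sandbox_run_args`, the image name, and the example capability are placeholders, and the thread does not say which capabilities a gVisor-based setup actually needs. `--security-opt no-new-privileges` sets the no_new_privs bit (which is also what allows an unprivileged process to install a seccomp filter), and `--cap-drop ALL` removes every capability before any are added back with `--cap-add`.

```python
from typing import List, Sequence


def sandbox_run_args(engine: str, image: str, command: Sequence[str],
                     keep_caps: Sequence[str] = ()) -> List[str]:
    """Build a container run command with no_new_privs set and minimal capabilities."""
    args = [
        engine, "run", "--rm",
        "--security-opt", "no-new-privileges",  # sets the no_new_privs bit
        "--cap-drop", "ALL",                     # start from zero capabilities
    ]
    for cap in keep_caps:                        # add back only what is needed
        args += ["--cap-add", cap]
    args.append(image)
    args.extend(command)
    return args


# Hypothetical image, command, and capability, for illustration only.
print(" ".join(sandbox_run_args("podman", "example/converter:latest",
                                ["convert"], keep_caps=["SYS_CHROOT"])))
# prints: podman run --rm --security-opt no-new-privileges --cap-drop ALL --cap-add SYS_CHROOT example/converter:latest convert
```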
Also feel free to open an issue in my repo if desired.
As for point 4, the oldest kernel that seccomp-mdwe supports is 4.4, and it has been tested on kernels from 4.4 through mainline on x86_64. When it comes to gVisor not running in an environment that enforces W^X (as per the BSD issue you linked), I unfortunately do not have a clear answer.
Thanks a lot for the detailed replies, and sorry it took me a while to respond. Let me give you some context on where we are dev-wise. We're currently in the middle of decoupling our container image from our application, so that we can offer security updates in a more timely fashion (see #1006). This means a couple of things for

So, what I'm trying to say is that we expect the independent container updates feature to be the last in a long series of security updates, and then we want to switch focus to something higher up the stack. This means that integrating

However, I plan to keep an eye on upcoming security incidents, and track if any of those would have been severely nerfed by
I’m not entirely convinced that improving UX alone directly translates into stronger security. From my perspective, it feels more like marketing than a genuine mitigation strategy.

Enforcing W^X is a well-documented technique with real-world impact. According to Microsoft’s own engineers, implementing W^X in the browser process prevented exploitation of about half the vulnerabilities in the JavaScript engine, while also allowing for additional security features such as CFI (Control Flow Integrity) and ACG (Arbitrary Code Guard). They also noted that CFI is essentially meaningless if a process can generate and execute new code, as that capability directly subverts control-flow integrity.

Multiple sources reinforce this viewpoint. Microsoft Edge’s “Enhanced Security” disables the V8 JIT compiler to reduce the attack surface and re-enable certain exploit mitigations in the renderer process. Windows Defender Application Control (WDAC) enforces Dynamic Code Security at the system level when virtualization-based security rollback protection is enabled. Hypervisor-Protected Code Integrity (HVCI) extends W^X constraints to the Windows kernel, thereby mitigating advanced kernel-level attacks. Meanwhile, Lockdown Mode on Apple devices adds W^X to a PAC-hardened Safari, making exploits significantly harder to craft. App Control and .NET Hardening further demonstrate how restricting dynamically generated code helps prevent injection and bypass techniques.

As a business that depends heavily on this software, we want to give back to the software we use. However, if the timeline for integrating seccomp-mdwe is pushed back indefinitely, I cannot guarantee that I, or the business, will be able to help with its integration as previously promised.
While we appreciate the interest and the offer to contribute, I do not think we can prioritize this type of mitigation strategy anytime soon. The whole premise of the DZ threat model is that exploits in office and media software will happen, and the most efficient mitigation is to keep the containers updated (which is why the team is now focusing on shipping independent container updates), well isolated, and with as little surface for escapes as possible, as we are trying to do using gVisor. I fear that implementing application-layer mitigations in this way would severely impact the long-term maintainability of the software as a whole, for relatively little benefit. This view might change in the future, especially as your software becomes more mature or more widely used, without requiring us to implement specific changes.
I am the author of the seccomp-mdwe project and would like to see if it might be useful for Dangerzone. Currently, Dangerzone relies on gVisor for sandboxing, but gVisor by itself does not prevent an attacker from running new untrusted code, which could potentially enable a sandbox escape.
My library addresses this by installing a seccomp filter that completely disallows in-memory code generation. It has been built and tested with gVisor, and works on both x86_64 and arm64 architectures (though there is no 32-bit support).
In practical terms, even if an attacker exploits a vulnerability in LibreOffice or MuPDF, they cannot generate and execute new code in memory. Instead, they would be forced to rely on complex techniques such as constructing a ROP chain, significantly increasing the difficulty of a successful attack.
Please let me know if this is something you would be interested in integrating into the project.