
Compiler-level changes #55

Open
beldmit opened this issue Aug 31, 2021 · 26 comments
@beldmit

beldmit commented Aug 31, 2021

During the last meeting we discussed possible compiler enhancements (e.g. explicitly switching optimizations off). Without them we have no way to defend against side-channel attacks, which means we can't safely use pure Rust crypto implementations.

I wonder if anyone from the Rust compiler team is involved in this group, so we have a chance to meet these requirements.

@tarcieri
Contributor

I've talked at length with various people who work on both rustc and LLVM about this. In fact there was a (now postponed) RFC for it:

rust-lang/rfcs#2859

See also this RustFest Global talk about it.

My understanding is there has been some work on the LLVM side by Google to add support for this to at least a RISC-V backend, although AFAIK that work is still not public.

It seems like there isn't interest in adding support for this to rustc until the corresponding changes have landed on the LLVM side.

@richsalz
Contributor

Do we want to use this issue for all compiler issues? I'd like auto-zeroization when dropping tagged variables.

@nikomatsakis

I'm wondering how much this is a compiler- vs. language-level issue. It seems like as much the latter. I'd be interested in hashing out a bit more concretely what's desired; if there's low-hanging fruit, it may be worth acting on.

@tarcieri
Contributor

tarcieri commented Aug 31, 2021

Zeroization is another issue that has come up before.

While there are library-level solutions to zeroizing data on drop (zeroize, clear_on_drop), there are much trickier problems than that which actually need the compiler's help, like ensuring values that are copied on move are also zeroed. Moves that wind up copying data are something which isn't immediately visible from the library level, and thus can't be easily solved that way. Some alternatives there are using something like Pin to prevent data from being moved, but there are cases where moves are required.

Additionally zeroize-on-drop is suboptimal for things like transient secrets. During the execution of a cryptographic algorithm there can be many transient secrets, and zeroizing them on drop may be a completely pointless performance hit if these secrets live on the stack and are immediately overwritten by new stack frames.

An alternative in this case is waiting until a particular section of code has completely finished executing, and then zeroizing memory which was used for the stack up to some high watermark, an approach I've sometimes heard referred to as "stack bleaching". I've opened a rust-internals issue about this before:

https://internals.rust-lang.org/t/annotations-for-zeroing-the-stack-of-sensitive-functions-which-deal-in-transient-secrets/11588

@tarcieri
Contributor

tarcieri commented Aug 31, 2021

I'm wondering how much this is a compiler- vs. language-level issue. It seems like as much the latter.

Yeah, particularly in regard to something like a "secret type", it's one of those things that touches every level, from the language itself (adding something like Secret<T>) to rustc to codegen backends like LLVM (and in that regard, LLVM's IR and all of the many passes which operate on it).

@romen

romen commented Sep 8, 2021

I've talked at length with various people who work on both rustc and LLVM about this. In fact there was a (now postponed) RFC for it:

rust-lang/rfcs#2859

See also this RustFest Global talk about it.

My understanding is there has been some work on the LLVM side by Google to add support for this to at least a RISC-V backend, although AFAIK that work is still not public.

It seems like there isn't interest in adding support for this to rustc until the corresponding changes have landed on the LLVM side.

During the last call, on top of zeroization and language support for secret types to convey intent and let the compiler frontend help developers, we also discussed related items which don't seem to be covered in the current issue.

One important part of the conversation that is not captured above, and which @beldmit referred to, was about disabling compiler optimization for a function or a code block, in order to prevent the compiler from transforming code that the developer already wrote in a constant-time fashion into non constant-time machine code.

This is orthogonal to having language features such as Secret<T> to let the compiler front-end assist the developer in writing constant-time code, or to zeroizing such variables automatically when they go out of scope: even if we already had these nice features, and the language ensured a function in its human-readable form is constant-time, the corresponding IR still goes through the rest of the compiler after being processed by the front-end.

I'll take hand-crafted, specialized C code as an example: often, to achieve constant-timeness in C, we use bitwise logic to transform comparisons into bit masks, which are applied to the results of operations that run even if their results will be discarded, so that instruction flow does not depend on secret inputs.
Unfortunately, compiler optimization layers have been catching up, and in some cases the compiler turns bitwise operations back into comparisons and jumps. The next step in this arms race between developers and compilers is to sprinkle volatile and __asm__ all around, but this (on top of making things harder to maintain, easier to get wrong, and ugly w.r.t. portability) no longer seems reliable as a workaround, as compilers are once again catching up and might optimize away even these tricks.
Concretely, here are a few relevant C examples of the things that developers who care about constant-time implementations need to do:

  • OpenSSL constant_time.h (but similar things are also spread in the rest of the OpenSSL codebase, without using these centralized definitions, for one reason or another)
  • cmovznz, value_barrier, and callers in NSS ECC

It would be nice if, instead of racing against each other, the communities of cryptographic developers, compiler developers, and language designers could reach an understanding of each other's needs and establish a foundation of basic tools that lets the different pieces cooperate instead of fighting. During the call, one compromise that seemed reasonable for the three communities was some way to annotate sections in which all compiler optimizations are off, so that what the developer writes in terms of bitwise operations on primitive types (and possibly __asm__ blocks) is translated 1:1 into the corresponding machine code.

  • Are you aware of active discussions involving both LLVM and Rust maintainers to pursue such annotations/attributes?
    (I would be interested also in developments in this direction using #pragma or something similar made available by LLVM/GCC towards C/C++, but I don't want to derail the conversation here).
  • Another alternative that I think was mentioned in the call, and I am sure was mentioned in previous discussions around this topic, was to have (and expose through language-specific mechanisms) compiler intrinsics for the basic building blocks (e.g., from the previous examples list, having cmovznz and value_barrier—for the various primitive register types—exposed directly to the programmer without having to build them fighting against the compiler optimizer). Are you aware of developments in this direction?

@tarcieri
Contributor

tarcieri commented Sep 8, 2021

I'm not aware of any work in that particular area however...

Another alternative that I think was mentioned in the call, and I am sure was mentioned in previous discussions around this topic, was to have (and expose through language-specific mechanisms) compiler intrinsics for the basic building blocks (e.g., from the previous examples list, having cmovznz and value_barrier—for the various primitive register types—exposed directly to the programmer without having to build them fighting against the compiler optimizer). Are you aware of developments in this direction?

Intrinsics for cmov-like instructions sound really interesting. I know this is a particularly tricky area with LLVM due to things like the x86-cmov-conversion pass. I'm not sure how difficult it would be to introduce a sort of cmov-with-optimization-barrier intrinsic, or how effectively an intrinsic like that could be implemented on non-x86 architectures, but it seems worth exploring.

@romen

romen commented Sep 8, 2021

I managed to find some of my previous notes on this topic, from when I started looking into it, before being distracted by other issues.

I find the last one quite disheartening, as it means yet another instance of "insecure by default", and also a case-by-case fight over CT-vs-performance arguments, on compiler code that cannot distinguish between IR blocks requiring CT and IR blocks that do not, and that always aims for the most efficient transformation possible.

@tarcieri

Intrinsics for cmov-like instructions sound really interesting. I know this is a particularly tricky area with LLVM due to things like the x86-cmov-conversion pass. I'm not sure how difficult it would be to introduce a sort of cmov-with-optimization-barrier intrinsic, or how effectively an intrinsic like that could be implemented on non-x86 architectures, but it seems worth exploring.

Mentioning the x86 cmov in particular was useful in finding once more this blog post by D. Sprenkels, which was the one that started me down the Rust side of this matter!

@tarcieri
Contributor

tarcieri commented Sep 8, 2021

this blog by D. Sprenkels

Yeah, that's a great blog post, and where I learned about that x86-cmov-conversion pass.

Unfortunately it seems to come to the conclusion that the best way around x86-cmov-conversion is to make the LLVM optimizer secret-type aware.

If there are alternatives, they might be worth exploring. But this is very much deep in the realm of the internals of the LLVM optimizer.

@romen

romen commented Sep 24, 2021

I thought I'd update the thread on this issue after the sync-up call today. These resources could serve both as items to jumpstart the survey of patterns/desiderata/properties and as material for anyone, from different backgrounds, to delve deeper into the problems, guidelines, and recommendations around side channels and constant-time code.

I apologize that they are not really Rust-centric, but I still think they are a good starting point, rich with examples from different points of view, without forcing anyone to parse through many academic papers.
They are not presented in any particular order.

@tarcieri
Contributor

During today's call @nikomatsakis suggested it might be possible to disable the LLVM optimizer at the level of individual functions.

If that were possible, it could potentially be used to implement a cmov-like function/intrinsic which bypasses the x86-cmov-conversion pass.

Separate but related: some past discussion on llvm-dev about avoiding optimizations for the purpose of sidechannel resistance:

https://groups.google.com/g/llvm-dev/c/qwogIiwHCAc/m/HGsKD8QFBAAJ

@tarcieri
Contributor

@chandlerc pointed me at this C++ example of a dataflow optimization barrier using an LLVM instruction which to my knowledge does not currently have a Rust equivalent.

It seems like the sort of (comparatively) simple solution being suggested earlier. Perhaps it wouldn't be too difficult to add as something like a nightly-only intrinsic.

It seems like an improvement over the current approach used by the subtle crate.

@spitters

@tarcieri since you mentioned RISC-V. There's some discussion on adding constant time to the Spec.
riscvarchive/riscv-v-spec#417

@tarcieri
Contributor

tarcieri commented Sep 27, 2021

I believe this is a working translation of the C++ code (please do not just copy and paste it into anything willy nilly, this is just for discussion): https://godbolt.org/z/ndqxe6qv4

However, it uses the now-deprecated nightly-only llvm_asm! macro.

@chandlerc

Also worth noting what aspect of this is potentially unsafe -- it relies on only being used with types that fit into a physical register. For a value that doesn't fit into a register, a different sequence would be needed (you'd need to pass the address of that value into the inline asm, and use a different constraint). I'm not aware of a truly generic way to express this directly in IR, but it should be possible to lower generic MIR-style stuff into some equivalent LLVM-IR that has the desired effect of blocking dataflow optimizations.

@tarcieri
Contributor

tarcieri commented Sep 27, 2021

@chandlerc yeah, I think the final deliverable is probably an unsafe function which could be used to reliably implement something like subtle::Choice::new (which takes a u8 argument)

Although perhaps it could have a safe API that looks like pub fn value_fence<T: Copy>(value: &T) -> T?

@tarcieri
Contributor

tarcieri commented Sep 28, 2021

Here's an attempt at a reference-based value_fence function:

https://godbolt.org/z/371Y7Gx5x

The generated assembly isn't the nice mov/ret anymore, but hopefully it's actually safe?

Edit: here's a version without the volatile flag, per @chandlerc's feedback below:

https://godbolt.org/z/nvMPYa7q7

@chandlerc

Not sure you want the volatile flag, you can just clobber memory while passing the pointer through. For example in C++: https://compiler-explorer.com/z/bh9WzvTPq

@tarcieri
Contributor

tarcieri commented Feb 5, 2022

Now that inline assembly is in beta, due for stabilization in Rust 1.59, I started playing around with a cmovnz wrapper and ran into something a little strange:

https://godbolt.org/z/vs83n31sa

The following function:

#[inline]
pub fn cmovnz(condition: u64, src: u64, dst: &mut u64) {
    let mut tmp = *dst;

    unsafe {
        asm!(
            "test {0}, {0}",
            "cmovnz {1}, {2}",
            in(reg) condition,
            inout(reg) tmp,
            in(reg) src
        );
    }

    *dst = tmp;
}

...was lowered to cmovne instead:

test    rdx, rdx
cmovne  rax, rcx

I was really hoping that inline assembly would be treated as a black box free of LLVM optimizations, but it seems that was not the case here.

@Amanieu @chandlerc any idea where this could happen within LLVM, and should I still be worried about CMOV instructions emitted from inline assembly being rewritten as branches?

@Amanieu

Amanieu commented Feb 5, 2022

cmovnz and cmovne are just different names for the same instruction. They assemble to the same bytes; LLVM is just arbitrarily picking one valid disassembly.

@tarcieri
Contributor

tarcieri commented Feb 6, 2022

As an experiment, I made a crypto-intrinsics crate which uses inline assembly to emit CMOV, ADX, and MULX instructions:

RustCrypto/utils#730

Still a bit worried about things like flags LLVM isn't aware of being clobbered, but it seems like it should be possible to start using these instructions from stable Rust programs soon.

@Amanieu

Amanieu commented Feb 6, 2022

You should not rely on flags being preserved outside of an asm! block. If you need to use flags then either consume the flags within the same asm! block or pass the flag as an integer using the setc instruction.

@tarcieri
Contributor

tarcieri commented Feb 8, 2022

Seems we'll need a different strategy for constructing assembly for ADX/MULX chains than mere function calls, then, in order to ensure the flags aren't clobbered.

Perhaps macro_rules! that construct the asm! could work /cc @dignifiedquire

@dignifiedquire

Yeah, I was afraid the flags might not work out. I suspect using some simple macros to assemble a single inline assembly call is the way to go here.

@tarcieri
Contributor

I've released an initial v0.1 version of a cmov crate:

On x86/x86_64 targets, it uses inline assembly to emit cmovz/cmovnz instructions which are guaranteed to execute in constant time.

On other targets, it provides a "best effort" fallback to a bitwise implementation which is not guaranteed to execute in constant time.

Hopefully the latter can be addressed by adding more architecture-specific inline assembly, particularly for popular architectures like AArch64.

@tarcieri
Contributor

tarcieri commented Mar 2, 2022

Thanks to @codahale, cmov now has aarch64 support.

I've also posted a release announcement here: https://users.rust-lang.org/t/ann-cmov-v0-1-constant-time-conditional-move-intrinsics-for-x86-x86-64-aarch64/72473
