RFC: No (opsem) Magic Boxes #3712

chorman0773 · 2024-10-15T20:33:39Z

Summary

Currently, the operational semantics of the type alloc::boxed::Box<T> is in dispute, but the compiler adds llvm noalias to it. To support it, the current operational semantics models have the type use a special form of the Unique (Stacked Borrows) or Active (Tree Borrows) tag, which has aliasing implications, validity implications, and also presents some unique complications in the model and in improvements to the type (e.g. Custom Allocators). We propose that, for the purposes of the runtime semantics of Rust, Box is treated as no more special than a user-defined smart pointer you can write today¹. In particular, it is given similar behaviour on a typed copy to a raw pointer.

Rendered

We maintain some trivial validity invariants (such as alignment and address space limits) that a user cannot define, but these invariants only depend upon the value of the Box itself, rather than on memory. ↩
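For readers following along, the "user-defined smart pointer you can write today" from the summary might look roughly like the sketch below. `MyBox` and `read_through_raw_after_move` are hypothetical names chosen for illustration; they are not taken from the RFC text, and the sketch ignores allocators, DSTs, and `Send`/`Sync`.

```rust
use std::ops::{Deref, DerefMut};
use std::ptr::NonNull;

/// Hypothetical user-defined owning pointer (illustrative; not from the RFC).
pub struct MyBox<T> {
    ptr: NonNull<T>,
}

impl<T> MyBox<T> {
    pub fn new(value: T) -> Self {
        // Reuse the global allocator through Box, then keep only the raw pointer.
        let raw = Box::into_raw(Box::new(value));
        // SAFETY: `Box::into_raw` never returns a null pointer.
        MyBox { ptr: unsafe { NonNull::new_unchecked(raw) } }
    }

    /// Raw pointer to the payload.
    pub fn as_ptr(&self) -> *mut T {
        self.ptr.as_ptr()
    }
}

impl<T> Deref for MyBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: `ptr` points to a live, initialized `T` owned by `self`.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T> DerefMut for MyBox<T> {
    fn deref_mut(&mut self) -> &mut T {
        // SAFETY: as above, and `&mut self` gives exclusive access.
        unsafe { self.ptr.as_mut() }
    }
}

impl<T> Drop for MyBox<T> {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `Box::into_raw` and is released exactly once.
        unsafe { drop(Box::from_raw(self.ptr.as_ptr())) }
    }
}

/// Take a raw pointer, move the owner, write through the owner, then read
/// through the raw pointer. Moving `MyBox` is a typed copy of a `NonNull`.
pub fn read_through_raw_after_move() -> i32 {
    let b = MyBox::new(41);
    let p = b.as_ptr();
    let mut moved = b;
    *moved += 1;
    // SAFETY: the allocation is still owned by `moved`; no reference is live.
    unsafe { *p }
}
```

Because the only field is a `NonNull`, moving a `MyBox` does not assert any aliasing requirement on the pointee; that is the behaviour the summary proposes for `Box` as well.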

text/3712-box-yesalias.md

Clarify the constraint on the invariant in footnote (Co-authored-by: Jacob Lifshay)

clarfonthey · 2024-10-15T21:16:47Z

It feels odd that one of the clear options is left out: why not expose a Unique<T> that has the same semantics as Box<T>, so any smart pointer type can use it?

Like, I definitely agree that Box shouldn't have special semantics that you can't reproduce elsewhere. But among the options, it feels pretty limiting to come to the conclusion that we should eliminate those semantics, rather than just making them reproducible elsewhere.

I agree that whatever happens shouldn't be specific to the Global allocator, though.

scottmcm · 2024-10-15T22:07:10Z

We maintain some trivial validity invariants (such as alignment and address space limits) that a user cannot define

I, at least, fully expect us to eventually have some way of writing alignment-obeying raw pointers in Rust in some way. If nothing else, slice::Iter is less good than it could be because of the lack of them, and optimizing that type is super-important.

(Transmuting between NonNull<T> and unsafe<'a> &'a T is one thing that might work, for example, though it's also possible that the opsem for that will end up saying that it doesn't and that something else is required instead.)

EDIT: added a word to try to communicate that I wasn't expecting this RFC to include such a type.

chorman0773 · 2024-10-15T22:17:20Z

I, at least, fully expect us to have some way of writing alignment-obeying raw pointers in Rust in some way. If nothing else, slice::Iter is less good than it could be because of the lack of them, and optimizing that type is super-important.

That's my hope for the future as well, but to avoid the RFC becoming too cluttered, I am refraining from defining such a type in this RFC.

juntyr · 2024-10-16T05:00:03Z

Is there a list of optimisations that depend on noalias being emitted for Box’es?

clarfonthey · 2024-10-16T05:21:20Z

The RFC seems pretty clear that noalias hasn't really provided many benefits compared to being an extra burden to uphold for implementers, but maybe it is worth seeing if there are any sources that can provide a bit more detail on that.

kennytm · 2024-10-16T05:40:25Z

text/3712-box-yesalias.md

+* A pointer with an address that is not well-aligned for `T` (or in the case of a DST, the `align_of_val_raw` of the value), or
+* A pointer with an address that offsetting that address (as though by `.wrapping_byte_offset`) by `size_of_val_raw` bytes would wrap arround the address space 
+
+The [`alloc::boxed::Box<T>`] type shall be laid out as though a `repr(transparent)` struct containing a field of type `WellFormed<T>`. The behaviour of doing a typed copy as type [`alloc::boxed::Box<T>`] shall be the same as though a typed copy of the struct `#[repr(transparent)] struct Box<T>(WellFormed<T>);`.


sorry but is the term "typed copy" explained somewhere?

(the explanations I could find are from pretty unofficial places like a reddit¹ and urlo² post)

Footnotes

"they are like memcpy, but the copy occurs with a type, giving the compiler some extra power." ↩

"a typed copy consists of essentially decoding the AM-bytes into the abstract value then encoding that abstract value back to AM-bytes at the new location." ↩

The urlo definition is probably good.
It's defined in the opsem, but I don't know if we have a very good written record of that other than what is spread around Zulip threads and GitHub issues.
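As a rough illustration of the "decode, then re-encode" description quoted above, here is a toy model of a typed copy at type `bool`, contrasted with an untyped byte copy. The function names are made up for this sketch, and `None` merely stands in for "the abstract machine would declare undefined behaviour"; it is a model, not how the opsem is actually specified.

```rust
/// Decode one byte as a `bool`; only 0 and 1 are valid representations.
pub fn decode_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}

/// Encode a `bool` back into its byte representation.
pub fn encode_bool(value: bool) -> u8 {
    value as u8
}

/// Toy typed copy at type `bool`: decode the source byte, then re-encode it.
/// `None` models the case where the copy would be undefined behaviour.
pub fn typed_copy_bool(src: u8) -> Option<u8> {
    decode_bool(src).map(encode_bool)
}

/// Untyped (memcpy-like) copy: bytes are moved without being interpreted.
pub fn untyped_copy(src: u8) -> u8 {
    src
}
```

In this picture, the RFC's proposal is about what the "decode" step for `Box<T>` is allowed to check: properties of the pointer value itself, and nothing about the memory it points to.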

scottmcm

I'm a fan of this. I think that people moving from Vec<T> to Box<[T]> having to deal with drastically-different soundness rules is a giant footgun, and getting rid of the special [ST]B behaviour here sounds good to me.
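A small sketch of the kind of `Vec<T>` code this refers to: a raw pointer is taken, the vector is moved, and the pointer is used afterwards. The function name is hypothetical. With `Vec` this relies only on the documented behaviour of `as_mut_ptr` and on moves of `Vec` not asserting uniqueness; the same shape written with `Box<[T]>` is the case whose status depends on the aliasing model under discussion, so it is deliberately not shown as working code.

```rust
/// Write through a raw pointer obtained before the vector was moved.
pub fn write_through_ptr_after_move() -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    let p: *mut i32 = v.as_mut_ptr();
    // Moving the `Vec` copies its (pointer, capacity, length) fields.
    let moved = v;
    // SAFETY: the buffer is still allocated and owned by `moved`, the index is
    // in bounds, and no reference into the buffer is live at this point.
    unsafe { *p = 10 };
    moved
}
```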

nikomatsakis · 2024-10-16T17:13:00Z

My general take:

The two "endpoints" here are

Efficient end-user abstractions: this allows most safe code to run faster. This would have strong alias requirements and would not expose raw/unsafe details. This permits non-obvious optimizations (e.g., small string optimization or 0-length capacity).
Building blocks for unsafe code: this exposes raw/unsafe details.

From what I can tell, we currently orient Box as the former but Vec and String as the latter. That seems backwards, since if anything those are far more useful as abstractions than Box is.

If I could go back in time, I think I would favor end-user abstractions and offer different types (e.g., RawVec or RawBuffer or something) that exposed their innards, but I think that ship has sailed, and we might as well embrace the current situation (which is nice in some ways too).

nikomatsakis · 2024-10-16T17:13:40Z

The RFC would benefit from some attempt to quantify the impact on performance, though our lack of standardized runtime benchmarks makes that hard.

traviscross · 2024-10-16T18:32:51Z

@rust-lang/opsem: We were curious in our discussions, does this RFC represent an existing T-opsem consensus?

chorman0773 · 2024-10-16T19:20:54Z

We were curious in our discussions, does this RFC represent an existing T-opsem consensus?

It does not represent any FCP done by T-opsem, which is why I've included them here. The claims I make, including those about the impact on the operational semantics, are included in the request for comment and consensus.

The RFC would benefit from some attempt to quantify the impact on performance, though our lack of standardized runtime benchmarks makes that hard.

I recall some perf PRs (using the default rustc-perf suite) being done to determine the impact, which showed negligible impact. I can probably pull them up at some point in the RFC's lifecycle.

[perf experiment] Don't emit noalias for box when compiling rustc itself rust#99527 is one such PR, and showed only a secondary regression of 1.9% in Cycle Count, but an improvement in Max RSS of 4.0% on average.

RalfJung · 2024-10-17T06:49:26Z

text/3712-box-yesalias.md

+
+(Note that we do not define this type in the public standard library interface, though an implementation of the standard library could define the type locally)
+
+The following are not valid values of type `WellFormed<T>`, and a typed copy that would produce such a value is undefined behaviour:


The Reference has been adjusted a while ago to state validity invariants positively, i.e. by listing what must be true, instead of listing what must not be true. IMO that's more understandable, and the RFC should be updated to also do that.
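Stated positively, the conditions from the quoted RFC text could be sketched as a predicate over the pointer value alone. This is only an illustration for sized `T`: `is_well_formed` is a hypothetical helper, the non-null condition is inferred from the exposition type wrapping `NonNull<T>`, and the DST case (which the RFC phrases via `size_of_val_raw` and `align_of_val_raw`) is left out.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical check of the pointer-only validity conditions for a sized `T`:
/// non-null, aligned for `T`, and `addr + size_of::<T>()` does not wrap.
pub fn is_well_formed<T>(ptr: *const T) -> bool {
    let addr = ptr as usize;
    let non_null = addr != 0;
    let aligned = addr % align_of::<T>() == 0;
    // `checked_add` fails exactly when the sum would wrap around `usize`.
    let no_wrap = addr.checked_add(size_of::<T>()).is_some();
    non_null && aligned && no_wrap
}
```

Note that the predicate never reads through `ptr`, which matches the footnote's point that these invariants depend only on the value of the `Box` itself, not on memory.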

RalfJung · 2024-10-17T06:59:35Z

I agree that whatever happens shouldn't be specific to the Global allocator, though.

There are patterns of using a custom per-Box allocator that are incompatible with the aliasing requirements, at least under our current aliasing models. See rust-lang/miri#3341 for an example. So if we always make Box be unique, we have to declare those allocators to be UB.

Is there a list of optimisations that depend on noalias being emitted for Box’es?

It's "every LLVM optimization that looks at alias information". The question is how much that matters in practice, which is hard to evaluate.

We were curious in our discussions, does this RFC represent an existing T-opsem consensus?

As Connor said, not in any formal sense. Several opsem members have expressed repeatedly that they want to see noalias on Box go, but I don't know whether we have team consensus on this.

My own position is that I love how this simplifies the model and Miri, I am just slightly concerned about this being an irreversible loss of optimization potential that we might regret later. Absence of evidence of optimization benefit is not evidence of absence. Our benchmarks likely just don't have a lot of functions that take Box<T> by value. However, that in itself is an indication that the optimization benefit is likely limited.

Is there a way we can query the ecosystem for functions taking Box<T> by value?

RalfJung · 2024-10-17T07:00:49Z

text/3712-box-yesalias.md

+    - While the easiest alternative is to do nothing and maintain the status quo, as mentioned this has suprisingly implications both for the operational semantics of Rust
+- Alternative 2: Introduce a new type `AlisableBox<T>` which has the same interface as `Box<T>` but lacks the opsem implications that `Box<T>` has.
+    - This also does not help remove the impact on the opsem model that the current `Box<T>` has, though provides an ergonomically equivalent option for `unsafe` code.
+- Alternative 3: We maintain the behaviour only for the unparameterized `Box<T>` type using the `Global` allocator, and remove it for `Box<T,A>` (for any allocator other than `A`), allowing unsafe code to use `Box<T, CustomAllocator>`


This is actually the status quo, since rust-lang/rust#122018

clarfonthey · 2024-10-17T18:49:02Z

Just to follow up on some of the discussion, it wasn't immediately clear to me that types similar to Box, like Vec and Arc, genuinely don't have these semantics, even though it is implied by the fact that Box is unique in this regard. I think this is worth emphasising more in the text of the RFC, since in a very real sense, we've been going without these semantics for most [citation needed] parts of Rust totally fine, which further emphasises that less will be lost by removing it for Box.

I would still love if there's more data showing the lack of returns on noalias optimisations, since it feels wrong that something with a lot of history and usage isn't helping that much, but the fact that Vec and other types don't have this optimisation at least helps us understand that it's not going to cause any performance issues if removed.

…C++ counterpointer `std::unique_ptr`, in the prior art section

chorman0773 · 2024-10-17T19:22:15Z

I am just slightly concerned about this being an irreversible loss of optimization potential that we might regret later

FTR, speaking right now as one of the main developers of lccc, my opinion is that the best way to mitigate any loss of future optimization potential is to just be far more granular with &mut. I don't think spamming extra noalias on parameters is necessary if we just emit more metadata on derefs (llvm's scoped noalias and dereferenceable, and lccc's unique and dereferenceable type attributes). If there are problems with doing that, I don't think this is a place where we should necessarily bend the whole language to this one attribute on one backend, especially given the complexity of maintaining the feature as-is, where we keep running into fundamental issues with the special treatment of Box. If there are legitimate optimizations lost as a result of the function level noalias, that can't be made up with more granular scoping on dereferences, that may be a different consideration, but my view is that the language after this change has the semantics necessary to justify at the very least a majority of the optimizations that may ultimately be lost, and it's up to rustc and llvm (and other compilers) to make use of those semantics if they wish to.

Just to follow up on some of the discussion, it wasn't immediately clear to me that types similar to Box, like Vec and Arc, genuinely don't have these semantics, even though it is implied by the fact that Box is unique in this regard. I think this is worth emphasising more in the text of the RFC, since in a very real sense, we've been going without these semantics for most parts of Rust totally fine, which further emphasises less will be lost by removing it for Box.

I mentioned Vec<T> in particular, and also mentioned std::unique_ptr<T> from C++, which is the most closely equivalent type, and lacks the same semantic implications (and also optimizations).

scottmcm · 2024-10-18T06:21:19Z

Just to follow up on some of the discussion, it wasn't immediately clear to me that types similar to Box, like Vec and Arc, genuinely don't have these semantics

And more than them just not having them, IIRC someone tried to implement Vec<T> as the obvious wrapper around Box<[MaybeUninit<T>]>, and in the process found out that lots of people are depending on Vec not having them.

RalfJung · 2024-10-18T08:04:35Z

would still love if there's more data showing the lack of returns on noalias optimisations, since it feels wrong that something with a lot of history and usage isn't helping that much

It helps for references. I suspect people added it for Box because "why not".

GoldsteinE · 2024-11-30T13:12:40Z

There’s always an option of having a Box<_> be noalias but not dereferenceable_on_entry. That both doesn’t lose the optimizations in @traviscross’s example and fixes the ManuallyDrop<_> issue. I am guessing (but I don’t have a proof) that actually dereferencing the Box<_> inside of a function body would lead LLVM to infer dereferenceability anyway. (In any case, dereferenceable_on_entry doesn’t even exist, so we’re only losing potential optimizations if it’s ever added.)

RalfJung · 2024-11-30T13:32:24Z

There’s always an option of having a Box<_> be noalias but not dereferenceable_on_entry.

That would have to be a weaker noalias than SB/TB, since their noalias requires dereferenceable_on_entry. SB doesn't support such a weaker noalias at all; TB would support it by treating Box<T> almost like a ZST reference.

chorman0773 · 2024-11-30T16:28:05Z

I am not aware of a single real-world usage of Vec that actually breaks the noalias rules.

FTR, I don't like this argument - whether or not it's true, it has no bearing on what is undefined behaviour in Rust. None of the proposed aliasing models for Rust that I've seen are "Exactly noalias, no more, no less". Whenever rustc emits llvm noalias, that must be justified by undefined behaviour at the Rust level.

Part of the point of this RFC is removing special cases in the memory model, so IMO it's completely against the proposal to add even more special-cased rules to SB or TB to handle something closer to what noalias does specifically for Box<T> and Vec<T>.

RalfJung · 2024-11-30T17:28:41Z

Rust doesn't yet have an aliasing model, only several WIP proposals. If there's some good benefit from having noalias on Vec arguments, it would be reasonable to say that the proposals should be extended to be able to support that.

traviscross · 2024-12-01T01:46:45Z

text/3712-box-yesalias.md

+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+For the remainder of this section, let `WellFormed<T>` designate a type for *exposition purposes only* defined as follows:
+```rust
+#[repr(transparent)]
+struct WellFormed<T: ?Sized>(core::ptr::NonNull<T>);
+```


Are there any differences semantically between what is proposed for Box<_> in this RFC and what would be true of MaybeDangling<Box<_>> today?

That is, since we already accepted RFC 3336, if there is no daylight between these, then it should say that normatively, and it perhaps could even lean into that for defining the semantics.

cc @RalfJung

We haven't spelled out whether MaybeDangling<Box<_>> would allow pointers too close to the end of the address space, but it seems reasonable to say "no" to that. In that case the answer to your question is yes, this RFC proposes to weaken Box so that its validity requirements are equivalent to MaybeDangling<Box<_>>.

Are there any differences semantically between what is proposed for Box<_> in this RFC and what would be true of MaybeDangling<Box<_>> today?

Restating Ralf's answer to this, MaybeDangling only removes aliasing invariants, but preserves the normal validity invariants of the type. This RFC proposes to remove the aliasing invariants from Box<_> as a whole, so it would naturally leave it in an identical state to MaybeDangling<Box<_>>.

However, this also spells out that validity invariant in full, which we cannot rely on existing types yet to do.

traviscross · 2024-12-01T01:56:44Z

text/3712-box-yesalias.md

+In the case of allocators[^3], without special handling of them in the language as well, the protectors assigned to `Box<T>` were violated by (almost) any non-trivial allocator that provides the storage itself (without delegating to something like `malloc` or `alloc::alloc::alloc`). This is because the allocators access the same memory that the `Box` stores to mark it as deallocated and available again. In an extreme example, the same memory could even be yielded back to another `Allocator::allocate` call. Solving this requires special casing `Allocator`s, which is a heavily unresolved discussion, only applying the special opsem behaviour to `Global`, which is opaque via the global allocator functions, or forgoing custom allocators for `Box` entirely (thus depriving anyone needing to use a custom allocator from the user-visible language features `Box` alone provides). With the exception of the former, which is desired for other optimization reasons though [heavily debated and not resolved](https://github.com/rust-lang/unsafe-code-guidelines/issues/442), these solutions are merely solving the symptom, not the problem.  
+
+Any `unsafe` code that may want to temporarily maintain aliased `Box<T>`s for various reasons (such as low-level copy operations), or may want to use something like `ManuallyDrop<Box<T>>`, is put into an interesting position: While they can use a user-defined smart pointer, this requires both care on the part of the smart pointer implementor, but also affects the ergonomics and expressiveness of that code, as `Box<T>` has many special language features that surface at the syntax level, which cannot be replicated today by a user-defined type.


The problems described in these motivation items are also solved by MaybeDangling, no?

Either way, the motivation should be adjusted to describe this. That is, it's confusing for the motivation to be written as though we did not already cover this ground and accept RFC 3336. If that RFC did solve the problem, but the idea is that the present proposal solves it better for Box<_> somehow, then that should be described here. Or alternatively, if there's some way in which RFC 3336 did not solve the problem, then that should be detailed specifically.

Only the last one, unless we say that in order to use a custom allocator with Box, you always have to wrap the box in MaybeDangling (in which case, I'm not sure how to create one in the first instance).
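For contrast with the `ManuallyDrop<Box<T>>` item in the quoted motivation, here is a sketch of the same "drop, then move" shape using `Vec`, which per this thread carries no aliasing requirements. The function name is hypothetical. As far as I can tell this is free of language-level UB, since a typed copy of a `Vec` only inspects its own fields, but treat that as an assumption of the sketch rather than a guarantee from the library docs.

```rust
use std::mem::ManuallyDrop;

/// Drop the contents in place, then move the (now dangling) owner.
pub fn move_after_manual_drop() -> usize {
    let mut slot = ManuallyDrop::new(vec![1u8, 2, 3]);
    // SAFETY: the vector is never dereferenced or dropped again after this.
    unsafe { ManuallyDrop::drop(&mut slot) };
    // Typed copy of a value whose buffer pointer now dangles. For `Vec` this
    // only copies a non-null pointer plus two integers.
    let moved = slot;
    // Only the size of the handle is inspected; the freed buffer is not touched.
    std::mem::size_of_val(&moved)
}
```

The RFC's claim is that the analogous move of a `ManuallyDrop<Box<T>>` is the problematic case under the current models, because `Box` is retagged on move.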

RalfJung · 2024-12-01T13:00:20Z

text/3712-box-yesalias.md

+
+In the case of `ManuallyDrop<T>`, because `Box<T>` asserts aliasing validity on a typed copy, and is invalidated on drop, it introduces unique behaviour - `ManuallyDrop<Box<T>>` *cannot* be moved after calling `ManuallyDrop::drop` *even* to trusted code known not to access or double-drop the `Box`. No other type in the language or library has the same behaviour[^2], as primitive references do not have any behaviour on drop (let alone behaviour that includes invalidating themselves), and only `Box<T>`, references, and aggregates containing references are retagged on move. There are proposed solutions on the `ManuallyDrop<T>` type, such as treating specially by supressing retags on move, but this is a novel idea (as `ManuallyDrop<T>` asserts non-aliasing validity invariants on move), and it would interfere with retags of references without justification. The proposed complexity is only required because of `Box<T>`.
+
+In the case of allocators[^3], without special handling of them in the language as well, the protectors assigned to `Box<T>` were violated by (almost) any non-trivial allocator that provides the storage itself (without delegating to something like `malloc` or `alloc::alloc::alloc`). This is because the allocators access the same memory that the `Box` stores to mark it as deallocated and available again. In an extreme example, the same memory could even be yielded back to another `Allocator::allocate` call. Solving this requires special casing `Allocator`s, which is a heavily unresolved discussion, only applying the special opsem behaviour to `Global`, which is opaque via the global allocator functions, or forgoing custom allocators for `Box` entirely (thus depriving anyone needing to use a custom allocator from the user-visible language features `Box` alone provides). With the exception of the former, which is desired for other optimization reasons though [heavily debated and not resolved](https://github.com/rust-lang/unsafe-code-guidelines/issues/442), these solutions are merely solving the symptom, not the problem.  


This doesn't properly explain why "(almost) any non-trivial allocator" violates noalias.

There is a class of non-trivial custom allocators that violates Stacked Borrows, specifically if it accesses "metadata memory" that is stored outside the region returned by the allocator, using the pointer that was passed in to deallocate. If that's what you mean, it should be stated explicitly. Is that really "almost any non-trivial allocator"? That seems like a strong claim. It took a while for Miri to run into this issue.

With Tree Borrows, at least some of these cases are not UB any more, since Tree Borrows supports the &Header pattern.

It took a while for Miri to run into this issue.

Most people likely don't use custom allocators with miri.
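To make the "metadata memory stored outside the region returned by the allocator" pattern concrete, here is a minimal sketch using the stable `GlobalAlloc` trait (the `Allocator` trait discussed in the RFC is unstable). `HeaderAlloc` and `recorded_size` are hypothetical names; the allocator simply places a `usize` header in front of each block and reads it back in `dealloc` through the pointer it was handed.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

/// Hypothetical allocator that keeps a `usize` header in front of each block.
pub struct HeaderAlloc;

impl HeaderAlloc {
    /// Layout of "header followed by payload" and the payload's offset in it.
    fn with_header(layout: Layout) -> Option<(Layout, usize)> {
        Layout::new::<usize>().extend(layout).ok()
    }

    /// Read the payload size recorded in the header in front of `ptr`.
    ///
    /// # Safety
    /// `ptr` must come from `HeaderAlloc::alloc(layout)` and not be freed yet.
    pub unsafe fn recorded_size(ptr: *mut u8, layout: Layout) -> usize {
        let (_, offset) = Self::with_header(layout).expect("layout was valid in alloc");
        unsafe { ptr.sub(offset).cast::<usize>().read() }
    }
}

unsafe impl GlobalAlloc for HeaderAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let (full, offset) = match Self::with_header(layout) {
            Some(pair) => pair,
            None => return std::ptr::null_mut(),
        };
        unsafe {
            let base = System.alloc(full);
            if base.is_null() {
                return base;
            }
            // The header lives *before* the pointer handed to the caller.
            base.cast::<usize>().write(layout.size());
            base.add(offset)
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        let (full, offset) = match Self::with_header(layout) {
            Some(pair) => pair,
            None => return,
        };
        unsafe {
            let base = ptr.sub(offset);
            // Metadata access outside the returned region, made through the
            // pointer passed to `dealloc`; a real allocator would use the value.
            let _recorded: usize = base.cast::<usize>().read();
            System.dealloc(base, full);
        }
    }
}
```

Called directly, as here, the pointer given to `dealloc` still has provenance over the whole underlying allocation. The concern raised in the comment above is the case where that pointer comes out of a `Box` whose retag covered only the payload, so the header read falls outside what the pointer may access under Stacked Borrows.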

RalfJung · 2025-01-28T02:10:26Z

Thus I think it's better that we not have a big footgun of code that's sound with Vec<T> being unsound with Box<[T]>

Is this footgun actually big, though? It gave me a lot of pause about doing this when @RalfJung said in #3712 (comment):

I am not aware of a single real-world usage of Vec that actually breaks the noalias rules.

Note that this is talking about applying noalias rules to Vec, which are not the same rules that Miri applies to Box today! Applying the rules of today's Box to Vec would definitely break a lot of code.

So if we want to avoid surprising inconsistencies between Vec and Box, we have to weaken the requirements around Box. Unless we also weaken the requirements for references, this means we end up with two rather different sets of aliasing requirements for different types mixed up in our memory model. That's a lot of extra complexity, and the gains from that are at this stage largely hypothetical.

RalfJung · 2025-01-28T12:32:48Z

To explain a bit more what I mean: there's a huge set of possible aliasing requirements we could attach to various types. For instance:

Stacked Borrows
Tree Borrows
something like LLVM noalias (see Should &mut-derived pointers be permanently "separate" from their siblings? unsafe-code-guidelines#450 for some discussion of how this differs from the first two)
no requirements at all

One can't mix and match arbitrary combinations of such models in the same language, but Tree Borrows and something noalias-like could almost certainly co-exist.

The first two are definitely not an option for Vec, since too much existing code would have UB. If we furthermore want to ensure that Box<[T]> and Vec<T> have the same fundamental alias requirements, we thus have the following options:

1: use LLVM noalias-like semantics for both Box and Vec
- 1a: keep Stacked/Tree Borrows for references
- 1b: use LLVM noalias-like semantics also for references
2: have no aliasing requirements for both Box and Vec

I'm not a fan of 1b, it allows code that IMO we want to forbid. 1a has the downside that we have two non-trivial alias requirements mixed into our model, which is a cost that should be justified -- and currently we are lacking the data to provide solid justification here. That said, it is entirely possible that in 3 years' time, examples justifying this will show up... it is really hard to get a convincing negative result in this space.

chorman0773 · 2025-01-28T14:05:50Z

I still maintain that much of the benefits we'll get will arise from more granular handling of &mut T and &T in codegen, rather than further special-casing the memory model.
Though I will concede that because Box in particular has a type-primitive Deref operation, the increased granularity won't always be available.

carbotaniuman · 2025-01-31T22:22:30Z

The combination of aliasing models is not as simple as you present, as currently references have at least noalias guarantees (the exact requirements are of course undecided). Box currently has the same aliasing requirements as references, while Vec has none.

I would like to say that unlike Box, whose documentation states that it is "A pointer type that uniquely owns a heap allocation of type T", Vec has no such wording or disclaimer on it, and I think it would be a disservice to unsafe code writers to equate it with Box or force upon them new aliasing requirements.

I am broadly in favor of unifying Vec and Box's aliasing requirements, and the two decisions are undoubtedly related. Still, were Vec to gain any aliasing requirements, that would surely require RFC approval, and as such it may be that we will ultimately have 3 different aliasing requirements for references, Box, and Vec.

tmandry · 2025-02-01T01:43:07Z

[perf experiment] Don't emit noalias for box when compiling rustc itself rust#99527 is one such PR, and showed only a secondary regression of 1.9% in Cycle Count, but an improvement in Max RSS of 4.0% on average.

Interesting. I would like to repeat this analysis, especially now that we do have a (small) runtime test suite.

Subject to those results, I'm in favor of this change. There are measurable costs to this annotation, like unicode-org/icu4x#2095 (comment) and preventing use with C interop, in addition to the cognitive overhead. If, after reasonably trying, we can't come up with evidence of a real benefit, we shouldn't keep carrying the cost indefinitely just because that evidence could arise at some point in the future.

I do think some of the comments on the RFC thread need to be addressed though.

RalfJung · 2025-02-01T07:46:01Z

The combination of aliasing models is not as simple as you present, as currently references have at least noalias guarantees (the exact requirements are of course undecided). Box currently has the same aliasing requirements as references, while Vec has none.

I don't know what you mean by this. We gave some thought to a weaker form of aliasing, more noalias-style, within Tree Borrows, and so far we have no reason to believe that there would be any problem with this combination.

I would like to say that unlike Box, which states that A pointer type that uniquely owns a heap allocation of type T,

It hasn't said that for very long though, see https://doc.rust-lang.org/1.65.0/std/boxed/struct.Box.html. If we can bugfix the Box docs that way, surely we can do the same with Vec. After all, generally code is only permitted to do things with a library type that are explicitly allowed by the docs.

Vec has no such wording or disclaimer on it, and I think it would be a disservice to unsafe code writers to equate it with Box or force upon them new aliasing requirements.

Given that we don't even know of an example of real-world code that uses Vec in a way that breaks the noalias rules (as has been stated repeatedly above), I think this is a fairly weak argument. I agree there is some risk here, but so far the evidence does not indicate this risk to be very high.

traviscross · 2025-02-01T11:09:53Z

There are measurable costs to this annotation, like unicode-org/icu4x#2095 (comment)...

It's perhaps worth noting that this particular case is accepted under Tree Borrows:

// MIRIFLAGS="-Zmiri-tree-borrows" cargo miri run
fn main() {
    let b = Box::new(0);
    let p = &raw const *b;
    let _b = b;
    _ = unsafe { *p };
}

Also, we separately accepted a fix for this in RFC 3336 ("MaybeDangling"):

// cargo miri run
fn main() {
    let b = MaybeDangling::new(Box::new(42));
    let p = &raw const *b;
    let _b = b;
    _ = unsafe { **p };
}

Playground link

tmandry · 2025-02-03T19:48:09Z

It's perhaps worth noting that this particular case is accepted under Tree Borrows:

That is good to know. Does it not violate noalias? Or perhaps we only apply that attribute on function arguments and not values? (Is there anywhere other than the compiler source where I can find this information?)

I think the answer should be "if it doesn't violate Tree Borrows it definitely doesn't violate any backend-specific annotations we place during codegen", but I also want to understand why that's true.

Also, we separately accepted a fix for this in RFC 3336 ("MaybeDangling"):

Just an observation: Despite reviewing that RFC, it took me quite some time to remember why it makes any sense that MaybeDangling would implement Deref. Maybe it's just a naming issue but it feels subtle, even for people who are comfortable writing unsafe code. The RFC does leave open questions for both the name and whether it should implement Deref safely.

(The answer, for those following along, is that MaybeDangling only removes the validity/"at rest" requirement that a pointer it wraps is dereferenceable and noalias. So it allows unsafe code to modify the pointer such that it does not meet those requirements, but that code must ensure that by the time the Deref is invoked, the reference created does not violate its invariants.)

RalfJung · 2025-02-03T19:52:51Z

Does it not violate noalias?

No.

we only apply that attribute on function arguments

Indeed, noalias as an attribute only exists for function arguments. LLVM has other kinds of alias annotations that can also be used within functions; their semantics are much less understood though and Rust does not use them.

I think the answer should be "if it doesn't violate Tree Borrows it definitely doesn't violate any backend-specific annotations we place during codegen", but I also want to understand why that's true.

Mostly it's true because Tree Borrows was carefully designed to satisfy this property, making some reasonable guesses for what these noalias annotations actually mean.
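As a hedged illustration of where a `noalias` attribute on a by-value `Box` argument could matter, consider the sketch below. The function name is made up, and nothing here claims that rustc or LLVM actually performs the described rewrite on this code.

```rust
/// Store through the box, store through an unrelated raw pointer, then load
/// from the box again.
///
/// # Safety
/// `other` must be valid for writes of an `i32`.
pub unsafe fn store_then_load(mut owned: Box<i32>, other: *mut i32) -> i32 {
    *owned = 1;
    // SAFETY: guaranteed by the caller.
    unsafe { *other = 2 };
    // If `owned` is assumed not to alias `other`, a compiler may treat this
    // load as the constant 1. Without that assumption it has to reload.
    *owned
}
```

With distinct pointers the result is 1 either way; the difference is only in what the optimizer is allowed to assume. The question raised earlier in the thread is how often functions of this shape, taking `Box<T>` by value, show up in real code.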

theemathas · 2025-02-04T16:54:59Z

Of note, String is internally a Vec<u8>. However, CString is internally a Box<[u8]>.

chorman0773 · 2025-02-04T17:17:56Z

That frankly seems like an incredible footgun, given that the latter is intended to be turned into a pointer.

traviscross · 2025-02-04T17:40:43Z

It'd be interesting to see whether it is an actual hazard or not in practice, and if so, what those patterns are. We could of course, if needed, backward-compatibly represent it internally as MaybeDangling<Box<[u8]>>.
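For reference, the documented way to turn a `CString` into a pointer and back is the `into_raw` / `from_raw` pair, sketched below with a hypothetical `round_trip` helper. This is the well-supported pattern; the hazard being speculated about would involve holding a pointer into a `CString` while the `CString` itself is still being moved around.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hand a C string out as a raw pointer, inspect it, and take ownership back.
pub fn round_trip(s: &str) -> usize {
    let owned = CString::new(s).expect("input must not contain interior NUL bytes");
    // `into_raw` consumes the `CString`, so no owner is left to conflict with.
    let raw: *mut c_char = owned.into_raw();
    // SAFETY: `raw` points to a valid NUL-terminated string we still own.
    let len = unsafe { CStr::from_ptr(raw) }.to_bytes().len();
    // SAFETY: `raw` came from `CString::into_raw` and is reclaimed exactly once.
    drop(unsafe { CString::from_raw(raw) });
    len
}
```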

carbotaniuman · 2025-02-05T05:54:05Z

If we can bugfix the Box docs that way, surely we can do the same with Vec.

Yes, but this bug fix was widely understood as making the current status of the docs reflect reality, and not as an introduction of noalias semantics that Box did not previously have.

After all, generally code is only permitted to do things with a library type that are explicitly allowed by the docs.

This statement does not seem to jibe with the fact that Vec has well-documented as_mut_ptr() methods that do not materialize references.

The main point of my comment is that making Vec noalias is not a "simple bugfix" but rather a deliberate violation of backwards compatibility that must be justified and supported through the RFC process.

In simpler words, the status quo is Box has noalias and Vec does not - noalias can not be arbitrary added to Vec just as it cannot be arbitrarily removed from Box.

RalfJung · 2025-02-05T07:38:23Z

This statement does not seem to jibe with the fact that Vec has well-documented as_mut_ptr() methods that do not materialize references.

I don't see what that has to do with the current discussion. Even if we add noalias, code using as_mut_ptr in the way covered by the docs will remain sound. If anything, the fact that we have explicit methods to express the intent of working with a vector "raw" (this also includes into_raw_parts) makes it clearer that we do not make any promises beyond that.
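
As a concrete reference point, here is a minimal sketch of the kind of use the `Vec::as_mut_ptr` docs describe: write into spare capacity through the raw pointer, then publish the length with `set_len`. The function name `fill_via_raw` is hypothetical, and the sketch says nothing about which aliasing model ends up being adopted.

```rust
/// Fills a vector with `0..n` by writing through the raw pointer returned by
/// `Vec::as_mut_ptr`, then publishing the new length with `set_len`.
fn fill_via_raw(n: usize) -> Vec<u32> {
    let mut v: Vec<u32> = Vec::with_capacity(n);
    let p = v.as_mut_ptr();
    for i in 0..n {
        // SAFETY: `i < n <= capacity`, so the write stays inside the allocation.
        unsafe { p.add(i).write(i as u32) };
    }
    // SAFETY: the first `n` elements were initialized by the loop above.
    unsafe { v.set_len(n) };
    v
}
```

No reference to the elements is created between `as_mut_ptr` and `set_len`; all element access goes through the one raw pointer.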

safinaskar · 2025-02-05T23:24:07Z

I think we should add noalias to both Box and Vec (at least in the case of the global allocator), because this matters for GPU and high-performance people (see #3712 (comment) for details). If we don't do this, GPU people will simply invent their own Rust-like language. (And the Austral example https://austral-lang.org/ shows us that it is, in fact, very easy to create a small language with a Rust-like borrow checker.)

saethlin · 2025-02-05T23:32:14Z

(see #3712 (comment) for details). If we don't do this, GPU people will simply invent their own Rust-like language

This is a significant leap to make from that comment, which acknowledges that its benchmarks are based on slices not Box. The point has been made repeatedly in this RFC that if users want to recover noalias they can add a reborrow to get a &mut.
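
A minimal sketch of that reborrow, assuming the hot loop lives in a function taking `&mut [i32]` (the function names are hypothetical): the owner stays a `Box`, while the callee receives a `&mut` argument, which is where rustc places `noalias` today.

```rust
/// Adds `delta` to every element. The `&mut [i32]` parameter is the place
/// where the uniqueness guarantee (and the `noalias` annotation) applies.
fn add_all(values: &mut [i32], delta: i32) {
    for v in values.iter_mut() {
        *v += delta;
    }
}

/// Owns a boxed slice but runs the loop through an explicit reborrow,
/// so the callee sees a unique `&mut` regardless of how `Box` is treated.
fn bump_boxed(mut boxed: Box<[i32]>, delta: i32) -> Box<[i32]> {
    add_all(&mut *boxed, delta); // `&mut *boxed` reborrows the Box contents
    boxed
}
```

Whether this recovers the same optimizations in a given workload is a measurement question; the sketch only shows where the reborrow goes.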

If people are very confident they can make a better language for a specific domain, that sounds great. Rust cannot be the best language for every application, and I would be quite worried if programming language innovation halted with Rust.

safinaskar · 2025-02-06T02:13:18Z

cc rust-lang/unsafe-code-guidelines#326

box_yesalias

43eb657

chorman0773 added the labels T-lang (relevant to the language team, which will review and decide on the RFC) and T-opsem (relevant to the operational semantics team, which will review and decide on the RFC) Oct 15, 2024

Add RFC Number

266f159

programmerjake reviewed Oct 15, 2024

text/3712-box-yesalias.md (outdated)

Update text/3712-box-yesalias.md

f5bcb71

Clarify the constraint o the invariant in footnote Co-authored-by: Jacob Lifshay <[email protected]>

scottmcm added the I-lang-nominated label (nominated for prioritizing at the next lang team meeting) Oct 15, 2024

scottmcm self-assigned this Oct 15, 2024

Address making Unique<T> public in Alternatives section

3dcbc86

kennytm reviewed Oct 16, 2024

scottmcm removed the I-lang-nominated label (nominated for prioritizing at the next lang team meeting) Oct 16, 2024

scottmcm approved these changes Oct 16, 2024

RalfJung reviewed Oct 17, 2024

Mention lack of noalias for other stdlib collections, as well as the C++ counterpointer `std::unique_ptr`, in the prior art section

f3ad832

traviscross reviewed Dec 1, 2024

RalfJung reviewed Dec 1, 2024

bjorn3 mentioned this pull request Dec 11, 2024

Potential undefined behavior from cyclic references between bz_stream and EState/DState. trifectatechfoundation/bzip2-rs#94

Open

traviscross added the I-lang-radar label (items that are on lang's radar and will need eventual work or consideration) and removed the I-lang-nominated label (nominated for prioritizing at the next lang team meeting) Jan 26, 2025

safinaskar mentioned this pull request Feb 6, 2025

What are the uniqueness guarantees of Box and Vec? rust-lang/unsafe-code-guidelines#326

Open

ojeda mentioned this pull request Feb 6, 2025

Rust wanted features Rust-for-Linux/linux#354

Open

40 tasks

hanna-kruppe mentioned this pull request Feb 9, 2025

CString being noalias is a footgun rust-lang/rust#136770

Open


		(Note that we do not define this type in the public standard library interface, though an implementation of the standard library could define the type locally)

		The following are not valid values of type `WellFormed<T>`, and a typed copy that would produce such a value is undefined behaviour:

		Any `unsafe` code that may want to temporarily maintain aliased `Box<T>`s for various reasons (such as low-level copy operations), or may want to use something like `ManuallyDrop<Box<T>>`, is put into an interesting position: While it can use a user-defined smart pointer, this both requires care on the part of the smart pointer implementor and affects the ergonomics and expressiveness of that code, as `Box<T>` has many special language features that surface at the syntax level, which cannot be replicated today by a user-defined type.
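
To make "user-defined smart pointer" concrete, here is a minimal sketch of such a type. `MyBox` is a hypothetical name, not something proposed by the RFC; it stores a raw `NonNull<T>` and borrows the global allocator through `Box` only at construction and destruction.

```rust
use std::ops::{Deref, DerefMut};
use std::ptr::NonNull;

/// Hypothetical `MyBox<T>`: owns a heap allocation through a raw `NonNull<T>`.
struct MyBox<T> {
    ptr: NonNull<T>,
}

impl<T> MyBox<T> {
    fn new(value: T) -> Self {
        // Allocate via `Box`, then keep only the raw pointer.
        let raw = Box::into_raw(Box::new(value));
        // SAFETY: `Box::into_raw` never returns null.
        MyBox { ptr: unsafe { NonNull::new_unchecked(raw) } }
    }
}

impl<T> Deref for MyBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: `ptr` is valid and owned by `self` until `drop` runs.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T> DerefMut for MyBox<T> {
    fn deref_mut(&mut self) -> &mut T {
        // SAFETY: `&mut self` gives exclusive access to the allocation.
        unsafe { self.ptr.as_mut() }
    }
}

impl<T> Drop for MyBox<T> {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `Box::into_raw` and is freed exactly once.
        unsafe { drop(Box::from_raw(self.ptr.as_ptr())) };
    }
}
```

The ergonomic gap shows up at the syntax level: with a real `Box<T>` you can move the value out with `*b`, whereas `MyBox<T>` only offers `Deref`/`DerefMut` access.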


		In the case of `ManuallyDrop<T>`, because `Box<T>` asserts aliasing validity on a typed copy, and is invalidated on drop, it introduces unique behaviour - `ManuallyDrop<Box<T>>` cannot be moved after calling `ManuallyDrop::drop` even to trusted code known not to access or double-drop the `Box`. No other type in the language or library has the same behaviour[^2], as primitive references do not have any behaviour on drop (let alone behaviour that includes invalidating themselves), and only `Box<T>`, references, and aggregates containing references are retagged on move. There are proposed solutions on the `ManuallyDrop<T>` type, such as treating it specially by suppressing retags on move, but this is a novel idea (as `ManuallyDrop<T>` asserts non-aliasing validity invariants on move), and it would interfere with retags of references without justification. The proposed complexity is only required because of `Box<T>`.

		In the case of allocators[^3], without special handling of them in the language as well, the protectors assigned to `Box<T>` were violated by (almost) any non-trivial allocator that provides the storage itself (without delegating to something like `malloc` or `alloc::alloc::alloc`). This is because the allocators access the same memory that the `Box` stores to mark it as deallocated and available again. In an extreme example, the same memory could even be yielded back to another `Allocator::allocate` call. Solving this requires one of three things: special-casing `Allocator`s, which is a heavily unresolved discussion; applying the special opsem behaviour only to `Global`, which is opaque via the global allocator functions; or forgoing custom allocators for `Box` entirely (thus depriving anyone needing to use a custom allocator of the user-visible language features `Box` alone provides). With the exception of the first, which is desired for other optimization reasons though [heavily debated and not resolved](https://github.com/rust-lang/unsafe-code-guidelines/issues/442), these solutions merely treat the symptom, not the problem.
